At AWS, we are committed to empowering organizations with tools that streamline data analytics and transformation processes. This integration enables data teams to efficiently transform and manage data using Athena with dbt Cloud’s robust features, enhancing the overall data workflow experience.
The Airflow REST API facilitates a wide range of use cases, from centralizing and automating administrative tasks to building event-driven, data-aware data pipelines. Event-driven architectures – The enhanced API allows seamless integration with external events, so that Airflow DAGs can be triggered by those events.
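For illustration, here is a minimal sketch of that pattern, assuming Airflow 2's stable REST API with basic authentication enabled; the host name, DAG id, and credentials are hypothetical placeholders.

```python
import requests

# Minimal sketch: trigger a DAG run through Airflow's stable REST API
# (POST /api/v1/dags/{dag_id}/dagRuns). Host, DAG id, and credentials
# are hypothetical placeholders; basic auth is assumed to be enabled.
AIRFLOW_HOST = "https://airflow.example.com"
DAG_ID = "orders_ingest"  # hypothetical DAG

response = requests.post(
    f"{AIRFLOW_HOST}/api/v1/dags/{DAG_ID}/dagRuns",
    json={"conf": {"source_event": "s3_object_created"}},
    auth=("api_user", "api_password"),
    timeout=30,
)
response.raise_for_status()
print(response.json()["dag_run_id"])
```

In an event-driven setup, a call like this would typically be issued by whatever reacts to the external event, for example an AWS Lambda function subscribed to an S3 notification.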
Amazon DataZone now supports authentication through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. Together, these capabilities enable terminal operators to enhance efficiency and competitiveness in an industry that is increasingly data-driven.
Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on premises, and from third-party sources. Using Amazon DataZone lets us avoid building and maintaining an in-house platform, allowing our developers to focus on tailored solutions.
At Atlanta’s Hartsfield-Jackson International Airport, an IT pilot has led to a wholesale data journey destined to transform operations at the world’s busiest airport, fueled by machine learning and generative AI. Data integrity presented a major challenge for the team, as there were many instances of duplicate data.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
Data is the foundation of innovation, agility, and competitive advantage in today’s digital economy. For technology and business leaders, strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Data quality is no longer a back-office concern.
Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications.
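As a rough sketch of the extraction step, the snippet below pulls recently updated issues from the Jira Cloud REST API so they could be landed in a data lake; the site URL, JQL filter, and credentials are hypothetical placeholders, and authentication is assumed to use a Jira API token.

```python
import requests

# Minimal sketch: extract recently updated issues from the Jira Cloud
# REST API so they can be landed in a data lake (for example, as JSON
# objects in S3). Site URL, JQL, and credentials are placeholders.
JIRA_SITE = "https://your-domain.atlassian.net"
AUTH = ("user@example.com", "api-token")  # email + Jira API token

response = requests.get(
    f"{JIRA_SITE}/rest/api/3/search",
    params={"jql": "project = DATA AND updated >= -1d", "maxResults": 100},
    auth=AUTH,
    timeout=30,
)
response.raise_for_status()
issues = response.json().get("issues", [])
print(f"Fetched {len(issues)} issues for data lake ingestion")
```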
As with many burgeoning fields and disciplines, we don’t yet have a shared canonical infrastructure stack or best practices for developing and deploying data-intensive applications. Why: Data Makes It Different. Not only is data larger, but models—deep learning models in particular—are much larger than before.
To build a data-driven business, it is important to democratize enterprise data assets in a data catalog. With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. GenericInMemoryCatalog stores the catalog data in memory.
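GenericInMemoryCatalog appears to refer to Apache Flink's built-in in-memory catalog implementation; under that assumption, a minimal PyFlink sketch might look like the following, with illustrative catalog and database names.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Minimal sketch, assuming the GenericInMemoryCatalog mentioned above
# is Apache Flink's built-in in-memory catalog. Names are illustrative.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Register an in-memory catalog; its metadata exists only for the
# lifetime of this session.
t_env.execute_sql("CREATE CATALOG my_catalog WITH ('type' = 'generic_in_memory')")
t_env.execute_sql("USE CATALOG my_catalog")
t_env.execute_sql("CREATE DATABASE IF NOT EXISTS demo_db")
print(t_env.list_catalogs())
```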
In today’s rapidly evolving financial landscape, data is the bedrock of innovation, enhancing customer and employee experiences and securing a competitive edge. Like many large financial institutions, ANZ Institutional Division operated with siloed data practices and centralized data management teams.
The need to integrate diverse data sources has grown exponentially, but there are several common challenges when integrating and analyzing data from multiple sources, services, and applications. First, you need to create and maintain independent connections to the same data source for different services.
If you can’t make sense of your business data, you’re effectively flying blind. Insights hidden in your data are essential for optimizing business operations, fine-tuning your customer experience, and developing new products — or new lines of business, like predictive maintenance. Azure Data Factory.
In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
This is a guest post co-written by Alex Naumov, Principal Data Architect at smava. smava believes in and takes advantage of data-driven decisions in order to become the market leader.
As businesses strive to make informed decisions, the amount of data being generated and required for analysis is growing exponentially. This trend is no exception for Dafiti, an ecommerce company that recognizes the importance of using data to drive strategic decision-making processes. We started with 115 dc2.large
Additionally, this forecasting system needs to provide data enrichment steps including byproducts, serve as the master data for semiconductor management, and enable further use cases at the BMW Group. To enable this use case, we used the BMW Group’s cloud-native data platform called the Cloud Data Hub.
In the final part of this three-part series, we’ll explore how data mesh bolsters performance and helps organizations and data teams work more effectively. Usually, organizations will combine different domain topologies, depending on the trade-offs, and choose to focus on specific aspects of data mesh.
With Amazon EMR 6.15, we launched AWS Lake Formation-based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta Lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.
To accelerate growth through innovation, the company is expanding its use of data science and artificial intelligence (AI) across the business to improve patient outcomes. This initiative alone has generated an explosion in the quantity and complexity of data the company collects, stores, and analyzes for insights.
The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure. While working in Azure with our customers, we have noticed several standard Azure tools people use to develop data pipelines and ETL or ELT processes. We counted ten ‘standard’ ways to transform and set up batch data pipelines in Microsoft Azure.
When global technology company Lenovo started using data analytics, it identified a new market niche for its gaming laptops and powered remote diagnostics so its customers could get the most from their servers and other devices.
Such a solution should use the latest technologies, including Internet of Things (IoT) sensors, cloud computing, and machine learning (ML), to provide accurate, timely, and actionable data. To take advantage of this data and build an effective inventory management and forecasting solution, retailers can use a range of AWS services.
We are excited to announce the general availability of Apache Iceberg in Cloudera Data Platform (CDP). These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. Why integrate Apache Iceberg with Cloudera Data Platform?
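As a hedged illustration of working with Iceberg tables from Spark (one of several engines that can share the same data), the sketch below assumes the Iceberg Spark runtime jar is on the classpath and uses a local Hadoop-type catalog; the catalog name, warehouse path, and table are made up and not CDP-specific.

```python
from pyspark.sql import SparkSession

# Minimal sketch: create and query an Apache Iceberg table from Spark.
# Assumes the Iceberg Spark runtime jar is available; the catalog name,
# warehouse path, and table identifiers are illustrative only.
spark = (
    SparkSession.builder.appName("iceberg-sketch")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

spark.sql("CREATE TABLE IF NOT EXISTS demo.db.events (id BIGINT, ts TIMESTAMP, payload STRING) USING iceberg")
spark.sql("INSERT INTO demo.db.events VALUES (1, current_timestamp(), 'hello')")
spark.sql("SELECT * FROM demo.db.events").show()
```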
Modak, a leading provider of modern data engineering solutions, is now a certified solution partner with Cloudera. Customers can now seamlessly automate migration to Cloudera’s Hybrid Data Platform — Cloudera Data Platform (CDP) to dynamically auto-scale cloud services with Cloudera Data Engineering (CDE) integration with Modak Nabu.
In today’s data-driven world, the ability to seamlessly integrate and utilize diverse data sources is critical for gaining actionable insights and driving innovation. Use case Consider a large ecommerce company that relies heavily on data-driven insights to optimize its operations, marketing strategies, and customer experiences.
ChatGPT> DataOps, or data operations, is a set of practices and technologies that organizations use to improve the speed, quality, and reliability of their data analytics processes. The goal of DataOps is to help organizations make better use of their data to drive business decisions and improve outcomes.
There’s no debate that the volume and variety of data are exploding and that the associated costs are rising rapidly. The proliferation of data silos also inhibits the unification and enrichment of data, which is essential to unlocking new insights. Enter the open data lakehouse.
dbt allows data teams to produce trusted data sets for reporting, ML modeling, and operational workflows using SQL, with a simple workflow that follows software engineering best practices like modularity, portability, and continuous integration/continuous delivery (CI/CD).
In today’s data-driven world, seamless integration and transformation of data across diverse sources into actionable insights is paramount. With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog.
Amazon Redshift, a data warehousing service, offers a variety of options for ingesting data from diverse sources into its high-performance, scalable environment. This native feature of Amazon Redshift uses massively parallel processing (MPP) to load objects directly from data sources into Redshift tables.
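The native feature described here appears to be Redshift's COPY command, which parallelizes loads from Amazon S3 across the cluster. A minimal sketch issued through the Redshift Data API follows; the cluster, database, user, IAM role, and S3 path are hypothetical placeholders.

```python
import boto3

# Minimal sketch: bulk-load Parquet objects from Amazon S3 into a
# Redshift table with COPY, issued through the Redshift Data API so no
# JDBC/ODBC driver is needed. Cluster, database, user, IAM role, and
# S3 path are hypothetical placeholders.
client = boto3.client("redshift-data", region_name="us-east-1")

copy_sql = """
    COPY sales.orders
    FROM 's3://example-bucket/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET
"""

response = client.execute_statement(
    ClusterIdentifier="example-cluster",
    Database="dev",
    DbUser="etl_user",
    Sql=copy_sql,
)
print(response["Id"])  # statement id; poll describe_statement() for status
```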
This means we can double down on our strategy – continuing to win the Hybrid Data Cloud battle in the IT department AND building new, easy-to-use cloud solutions for the line of business. It also means we can complete our business transformation with the systems, processes and people that support a new operating model.
The most impactful data-driven insights come from connecting the dots between all your data sources—across departments, services, on-premises tools, and third-party applications. But typically, connecting data requires complex extract, transform, and load (ETL) pipelines, taking hours or days.
Data platform architecture has an interesting history. A read-optimized platform that can integrate data from multiple applications emerged. In another decade, the internet and mobile started to generate data of unforeseen volume, variety and velocity. It required a different data platform solution. It is too expensive.
Data integration is the foundation of robust data analytics. It encompasses the discovery, preparation, and composition of data from diverse sources. In the modern data landscape, accessing, integrating, and transforming data from diverse sources is a vital process for data-driven decision-making.
You can’t talk about data analytics without talking about data modeling. The reasons for this are simple: Before you can start analyzing data, huge datasets like data lakes must be modeled or transformed to be usable. Building the right data model is an important part of your data strategy.
In the era of data, organizations are increasingly using data lakes to store and analyze vast amounts of structured and unstructured data. Data lakes provide a centralized repository for data from various sources, enabling organizations to unlock valuable insights and drive data-driven decision-making.
Organizations have spent a lot of time and money trying to harmonize data across diverse platforms, including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on. So questions linger about whether transformed data can be trusted.
In this post, we share how the AWS Data Lab helped Tricentis to improve their software as a service (SaaS) Tricentis Analytics platform with insights powered by Amazon Redshift. Although Tricentis has amassed such data over a decade, the data remains untapped for valuable insights.
Data people face a challenge. They must put high-quality data into the hands of users as efficiently as possible. As the latest iteration in this pursuit of high-quality data sharing, DataOps combines a range of disciplines. It synthesizes all we’ve learned about agile, data quality , and ETL/ELT.
In 2021, Showpad set forth the vision to use the power of data to unlock innovations and drive business decisions across its organization. In this post, we share how Showpad used QuickSight to streamline data and insights access across teams and customers. Showpad migrated over 70 dashboards with over 1,000 visuals.
Its AI/ML-driven predictive analysis enhanced proactive threat hunting and phishing investigations as well as automated case management for swift threat identification. Options included hosting a secondary data center, outsourcing business continuity to a vendor, and establishing private cloud solutions.
A data pipeline is a series of processes that move raw data from one or more sources to one or more destinations, often transforming and processing the data along the way. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.
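To make those stages concrete, here is a minimal, self-contained sketch of such a pipeline in Python with pandas; the file paths and column names are illustrative only.

```python
import pandas as pd

# Minimal sketch of the stages described above: ingest raw data,
# cleanse and filter it, standardize types, aggregate, and load the
# result to a destination. Paths and column names are illustrative.
def run_pipeline(source_csv: str, dest_parquet: str) -> None:
    raw = pd.read_csv(source_csv)                                         # ingestion

    clean = raw.drop_duplicates().dropna(subset=["order_id", "amount"])   # cleansing/filtering
    clean["amount"] = clean["amount"].astype(float)                       # standardization

    daily = (
        clean.groupby(["customer_id", "order_date"], as_index=False)
        .agg(total_amount=("amount", "sum"))                              # aggregation
    )

    daily.to_parquet(dest_parquet, index=False)                           # load (requires pyarrow)

run_pipeline("raw_orders.csv", "daily_totals.parquet")
```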