The need for streamlined data transformations
As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. Using Athena and the dbt adapter, you can transform raw data in Amazon S3 into well-structured tables suitable for analytics.
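For illustration, here is a minimal sketch of driving such transformations programmatically, assuming dbt-core 1.5+ and the dbt-athena adapter are installed and a profile pointing at Athena is already configured; the "staging" selector is a hypothetical model group.

```python
# A minimal sketch: invoke dbt programmatically so Athena materializes
# SQL models over raw S3 data registered in the Glue Data Catalog.
# Assumes dbt-core >= 1.5 with the dbt-athena adapter configured in
# profiles.yml; "staging" is a hypothetical model selector.
from dbt.cli.main import dbtRunner

runner = dbtRunner()
result = runner.invoke(["run", "--select", "staging"])
print("success:", result.success)
```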
Why: Data Makes It Different
In contrast to traditional software, a defining feature of ML-powered applications is that they are directly exposed to a large amount of messy, real-world data that is too complex to be understood and modeled by hand. However, the concept is quite abstract. Can’t we just fold it into existing DevOps best practices?
The combination of a data lake and a serverless paradigm brings significant cost and performance benefits. By monitoring application logs, you can gain insights into job execution and troubleshoot issues promptly, helping ensure the overall health and reliability of your data pipelines.
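As one way to do that monitoring, here is a sketch using the CloudWatch Logs API; the log group name matches AWS Glue's default error log group, and the one-hour window is an arbitrary choice.

```python
import time
import boto3

# A sketch of scanning recent pipeline logs for errors via CloudWatch Logs.
# "/aws-glue/jobs/error" is AWS Glue's default error log group; substitute
# the log groups your own pipeline writes to.
logs = boto3.client("logs")
response = logs.filter_log_events(
    logGroupName="/aws-glue/jobs/error",
    filterPattern="ERROR",
    startTime=int((time.time() - 3600) * 1000),  # last hour, in ms
)
for event in response["events"]:
    print(event["timestamp"], event["message"][:200])
```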
A high hurdle many enterprises have yet to overcome is accessing mainframe data via the cloud. Data professionals need to access and work with this information for businesses to run efficiently, and to make strategic forecasting decisions through AI-powered data models.
At Atlanta’s Hartsfield-Jackson International Airport, an IT pilot has led to a wholesale data journey destined to transform operations at the world’s busiest airport, fueled by machine learning and generative AI. Data integrity presented a major challenge for the team, as there were many instances of duplicate data.
The old stadium, which opened in 1992, provided the business operations team with data, but that data came from disparate sources, many of which were not consistently updated. The new Globe Life Field not only boasts a retractable roof, but it produces data in categories that didn’t even exist in 1992.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS analytics services.
Amazon Athena supports the MERGE command on Apache Iceberg tables, which allows you to perform inserts, updates, and deletes in your data lake at scale using familiar SQL statements that are ACID compliant (atomic, consistent, isolated, durable). Most databases use a transaction log to record changes made to the database.
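A sketch of running such a MERGE through the Athena API follows; the database, table, column, and bucket names are hypothetical placeholders.

```python
import boto3

# A sketch of an Iceberg MERGE submitted via the Athena API: update
# matching rows, insert new ones. All object names are placeholders.
athena = boto3.client("athena")
merge_sql = """
MERGE INTO orders AS t
USING orders_updates AS s
  ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET status = s.status
WHEN NOT MATCHED THEN INSERT (order_id, status) VALUES (s.order_id, s.status)
"""
athena.start_query_execution(
    QueryString=merge_sql,
    QueryExecutionContext={"Database": "lakehouse_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
```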
Their terminal operations rely heavily on seamless data flows and the management of vast volumes of data. Recently, EUROGATE has developed a digital twin for its Container Terminal Hamburg (CTH), generating millions of data points every second from Internet of Things (IoT) devices attached to its container handling equipment (CHE).
The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios. Your generated jobs can use a variety of data transformations, including filters, projections, unions, joins, and aggregations, giving you the flexibility to handle complex data processing requirements.
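To make those transformation types concrete, here is a sketch of a Glue job script combining a filter, a join, and an aggregation; the database and table names are hypothetical.

```python
# A sketch of a Glue job chaining the transformation types mentioned
# above (filter, join, aggregation). Catalog names are placeholders.
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from pyspark.sql import functions as F

glue_context = GlueContext(SparkContext.getOrCreate())

orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders").toDF()
customers = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="customers").toDF()

shipped_by_region = (
    orders.filter(F.col("status") == "shipped")        # filter
          .join(customers, "customer_id")              # join
          .groupBy("region")                           # aggregation
          .agg(F.sum("amount").alias("total_amount"))
)
```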
Enterprise data is brought into data lakes and data warehouses to carry out analytics, reporting, and data science use cases using AWS analytics services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on. Can it also help write SQL queries? The answer is yes.
Let’s expand the use case to run your data pipeline and perform extract, transform, and load (ETL) jobs when a new file lands in an Amazon Simple Storage Service (Amazon S3) bucket in your data lake. The modified architecture that supports data-aware scheduling is presented below.
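One common way to wire up that trigger is an S3 event notification invoking a Lambda function that starts the ETL job. A sketch follows; the job name "s3-landing-etl" and the argument key are hypothetical.

```python
import boto3

# A sketch of a Lambda handler for the S3-triggered pipeline described
# above: a new object landing in the bucket starts a (hypothetical)
# Glue ETL job, passing the new object's path as a job argument.
glue = boto3.client("glue")

def lambda_handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    run = glue.start_job_run(
        JobName="s3-landing-etl",
        Arguments={"--source_path": f"s3://{bucket}/{key}"},
    )
    return {"jobRunId": run["JobRunId"]}
```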
Insights hidden in your data are essential for optimizing business operations, fine-tuning your customer experience, and developing new products — or new lines of business, like predictive maintenance. And as businesses contend with increasingly large amounts of data, the cloud is fast becoming the logical place where analytics work gets done.
As the volume and complexity of analytics workloads continue to grow, customers are looking for more efficient and cost-effective ways to ingest and analyze data. Using a native AWS Glue connector increases agility, simplifies data movement, and improves data quality.
In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users who could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
This reduces the time and effort you need to learn, build, and run data integration jobs using AWS Glue data integration engines. For example, you can ask Amazon Q Developer to generate a complete extract, transform, and load (ETL) script or code snippet for individual ETL operations.
With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on open table formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta Lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.
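As an illustration of fine-grained access control, here is a sketch of a column-level Lake Formation grant; the role ARN, database, table, and column names are hypothetical placeholders.

```python
import boto3

# A sketch of granting column-level SELECT access with Lake Formation,
# so a role can read only approved columns of a table. All names below
# are placeholders.
lakeformation = boto3.client("lakeformation")
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/analyst"},
    Resource={
        "TableWithColumns": {
            "DatabaseName": "sales_db",
            "Name": "orders",
            "ColumnNames": ["order_id", "order_date"],  # only these columns
        }
    },
    Permissions=["SELECT"],
)
```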
In 2019, the BMW Group decided to re-architect and move its on-premises data lake to the AWS Cloud to enable data-driven innovation while scaling with the dynamic needs of the organization. To learn more about the Cloud Data Hub, refer to BMW Group Uses AWS-Based Data Lake to Unlock the Power of Data.
With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of. In the following sections, we demonstrate how to set up this connection and run queries using different AWS services.
The Perilous State of Today’s Data Environments
Data teams often navigate a labyrinth of chaos within their databases. Extrinsic control deficit: many of these changes stem from tools and processes beyond the immediate control of the data team.
These processes retrieve data from around 90 different data sources, resulting in updating roughly 2,000 tables in the data warehouse and 3,000 external tables in Parquet format, accessed through Amazon Redshift Spectrum and a data lake on Amazon Simple Storage Service (Amazon S3). We started with 115 dc2.large nodes.
The company’s orthodontics business, for instance, makes heavy use of image processing, to the point that unstructured data is growing at a pace of roughly 20% to 25% per month. For example, imaging data can be used to show patients how an aligner will change their appearance over time.
While working in Azure with our customers, we have noticed several standard Azure tools people use to develop data pipelines and ETL or ELT processes. We counted ten ‘standard’ ways to transform and set up batch data pipelines in Microsoft Azure. You can use them for big data analytics and machine learning workloads.
With exponentially growing data sources and data lakes, customers want to run more data integration workloads, including their most demanding transforms, aggregations, joins, and queries. For workloads such as data transforms, joins, and queries, you can use G.1X (1 DPU) and G.2X (2 DPU) workers; for the most demanding workloads, G.4X (4 DPU) and G.8X (8 DPU) workers are available.
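Worker type is set when the job is defined. Here is a sketch of creating a job on larger workers for a memory-intensive transform; the job name, role ARN, and script location are hypothetical placeholders.

```python
import boto3

# A sketch of choosing a larger Glue worker type for a demanding job.
# Name, role, and script location below are placeholders.
glue = boto3.client("glue")
glue.create_job(
    Name="heavy-aggregations",
    Role="arn:aws:iam::123456789012:role/GlueJobRole",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-scripts/heavy_aggregations.py",
        "PythonVersion": "3",
    },
    GlueVersion="4.0",
    WorkerType="G.4X",      # 4 DPU per worker for demanding transforms
    NumberOfWorkers=10,
)
```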
With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities such as filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps.
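The on-demand mode can be triggered from code. A sketch follows; the flow name is a hypothetical placeholder for a flow already configured in the AppFlow console.

```python
import boto3

# A sketch of the on-demand trigger mode: start a (hypothetical)
# pre-configured AppFlow flow by name.
appflow = boto3.client("appflow")
response = appflow.start_flow(flowName="salesforce-to-s3-daily")
print(response.get("executionId"))
```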
Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. Additionally, data is extracted from vendor APIs, including data related to product, marketing, and customer experience.
These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.
Data transforms businesses. That’s where the data lifecycle comes into play. Managing data and its flow, from the edge to the cloud, is one of the most important tasks in the process of gaining data intelligence. The firm also worked on creating a solid pipeline from the data warehouse to the data lake.
Using these adapters, Cloudera customers can use dbt to collaborate, test, deploy, and document their data transformation and analytic pipelines on CDP Public Cloud, CDP One, and CDP Private Cloud.
The Open Data Lakehouse
This variety can result in a lack of standardization, leading to data duplication and inconsistency.
Building a data platform involves various approaches, each with its unique blend of complexities and solutions. In this post, we delve into a case study for a retail use case, exploring how the Data Build Tool (dbt) was used effectively within an AWS environment to build a high-performing, efficient, and modern data platform.
Amazon Redshift, a data warehousing service, offers a variety of options for ingesting data from diverse sources into its high-performance, scalable environment. Its massively parallel processing (MPP) architecture reads and loads large amounts of data in parallel from files or from supported data sources.
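The COPY command is the canonical example of that parallel loading path. Here is a sketch using the Redshift Data API; the cluster, database, user, bucket, and role ARN are hypothetical placeholders.

```python
import boto3

# A sketch of a parallel load with COPY via the Redshift Data API.
# COPY splits the S3 file set across slices and loads in parallel.
# All identifiers below are placeholders.
redshift_data = boto3.client("redshift-data")
redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql="""
        COPY sales
        FROM 's3://my-ingest-bucket/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS PARQUET;
    """,
)
```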
CDP Data Hub: a VM/instance-based service that allows IT and developers to build custom business applications for a diverse set of use cases, with secure, self-service access to enterprise data. New services include Predict – Data Engineering (Apache Spark).
Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x better price-performance than other cloud data warehouses.
With the proliferation of IoT devices and the abundance of data generated by them, it has become possible to collect real-time data on inventory levels, customer behavior, and other key metrics. In the inventory management and forecasting solution, AWS Glue is recommended for data transformation.
With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog. It enables you to visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.
This means we can double down on our strategy: continuing to win the Hybrid Data Cloud battle in the IT department AND building new, easy-to-use cloud solutions for the line of business. It also means we can complete our business transformation with the systems, processes, and people that support a new operating model.
This allows data analysts and data scientists to rapidly construct the necessary data preparation steps to meet their business needs. In this scenario, you’re a data analyst at this company. Your role involves preprocessing raw customer review data to prepare it for downstream analytics.
In this post, we explore how AWS Glue can serve as the data integration service to bring data from Snowflake into your data integration strategy, enabling you to harness the power of your data ecosystem and drive meaningful outcomes across various use cases. You can review and customize it to suit your needs.
Building data lakes from the continuously changing transactional data of databases, and keeping those data lakes up to date, is a complex task and can be an operational challenge. You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes.
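Here is a sketch of applying such change data capture (CDC) records to a Delta table with MERGE semantics. It assumes a Spark session configured with the delta-spark package; the paths, column names, and the op flag convention are hypothetical.

```python
# A sketch of applying CDC changes (delete/update/insert) to a Delta
# table. Paths, columns, and the "op" flag convention are placeholders.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()
target = DeltaTable.forPath(spark, "s3://my-lake/customers")
changes = spark.read.parquet("s3://my-lake/cdc/customers/")

(target.alias("t")
 .merge(changes.alias("s"), "t.customer_id = s.customer_id")
 .whenMatchedDelete(condition="s.op = 'D'")      # apply deletes
 .whenMatchedUpdateAll(condition="s.op = 'U'")   # apply updates
 .whenNotMatchedInsertAll()                      # apply inserts
 .execute())
```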
However, you might face significant challenges when planning for a large-scale data warehouse migration. Identify all upstream and downstream applications, as well as business processes that rely on the data warehouse. This requires a dedicated team of 3–7 members building a serverless data lake for all data sources.
A read-optimized platform that could integrate data from multiple applications emerged: the data warehouse. In the following decade, the internet and mobile started to generate data of unforeseen volume, variety, and velocity. It required a different data platform solution. The value of data projects is difficult to realize.
“This frees up our local computer space, greatly automates the survey cleaning and analysis step, and allows our clients to easily access the data results.” — Harman Singh Dhodi, Analyst at HR&A Advisors, Inc. The first step in this process is mapping the digital divide.
Data integration is the foundation of robust data analytics. It encompasses the discovery, preparation, and composition of data from diverse sources. In the modern data landscape, accessing, integrating, and transforming data from diverse sources is a vital process for data-driven decision-making.
The YARN log analyzer provides four key functionalities, the first of which is uploading transformed YARN job history logs in CSV format (for example, cluster_yarn_logs_*.csv files). Jiseong Kim is a Senior Data Architect at AWS ProServe. He has a specialty in big data services and technologies and an interest in building customer business outcomes together.