The need for streamlined data transformations

As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. Using Athena and the dbt adapter, you can transform raw data in Amazon S3 into well-structured tables suitable for analytics.
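As a minimal sketch of driving such a transformation programmatically (assuming dbt-core 1.5 or later with the dbt-athena adapter installed and a profiles.yml that targets an Athena workgroup; the model selector is hypothetical):

```python
# Programmatic dbt invocation via dbt's Python entry point (dbt-core >= 1.5).
# Assumes a dbt project whose profiles.yml targets Athena through the
# dbt-athena adapter; "curated" is a hypothetical model selector.
from dbt.cli.main import dbtRunner

runner = dbtRunner()
# Compiles the SQL models and runs them against Athena, materializing
# well-structured tables over the raw data in Amazon S3.
result = runner.invoke(["run", "--select", "curated"])
print(f"dbt run succeeded: {result.success}")
```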
The combination of a data lake and a serverless paradigm brings significant cost and performance benefits. By monitoring application logs, you can gain insights into job execution and troubleshoot issues promptly, ensuring the overall health and reliability of data pipelines.
Amazon DataZone now supports authentication through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.
“The original proof of concept was to have one data repository ingesting data from 11 sources, including flat files and data stored via APIs on premises and in the cloud,” Pruitt says. “There are a lot of variables that determine what should go into the data lake and what will probably stay on premises,” Pruitt says.
Resultant recommended a new, on-prem data infrastructure, complete with data lakes, to provide stakeholders with a better way to manage data reliability, accuracy, and timeliness. The process included co-developing a comprehensive roadmap, project plan, and budget with the business operations team.
Additionally, integrating mainframe data with the cloud enables enterprises to feed information into data lakes and data lakehouses, which is ideal for letting authorized data professionals easily leverage the best and most modern tools for analytics and forecasting.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications. Search for the Jira Cloud connector.
Amazon Athena supports the MERGE command on Apache Iceberg tables, which allows you to perform inserts, updates, and deletes in your data lake at scale using familiar SQL statements that are compliant with ACID (Atomic, Consistent, Isolated, Durable).
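A hedged sketch of issuing such a MERGE from Python with boto3 (the database, tables, columns, and S3 output location are hypothetical):

```python
# Run an Iceberg MERGE on Athena via boto3. All names are hypothetical
# placeholders; the MERGE syntax follows Athena engine v3.
import boto3

athena = boto3.client("athena")

merge_sql = """
MERGE INTO datalake_db.customers AS t
USING datalake_db.customers_updates AS s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN UPDATE SET email = s.email
WHEN NOT MATCHED THEN INSERT (customer_id, email) VALUES (s.customer_id, s.email)
"""

response = athena.start_query_execution(
    QueryString=merge_sql,
    QueryExecutionContext={"Database": "datalake_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])
```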
Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world's largest corporations. Such data volumes are not easy to move, migrate, or modernize.

The challenges of a monolithic data lake architecture

Data lakes are, at a high level, single repositories of data at scale.
With this integration, you can now seamlessly query your governed data lake assets in Amazon DataZone using popular business intelligence (BI) and analytics tools, including partner solutions like Tableau. Joel has led data transformation projects on fraud analytics, claims automation, and Master Data Management.
cycle_end"', "sagemakedatalakeenvironment_sub_db", ctas_approach=False) A similar approach is used to connect to shared data from Amazon Redshift, which is also shared using Amazon DataZone. With a unified catalog, enhanced analytics capabilities, and efficient datatransformation processes, were laying the groundwork for future growth.
The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios. Your generated jobs can use a variety of data transformations, including filters, projections, unions, joins, and aggregations, giving you the flexibility to handle complex data processing requirements.
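A hedged sketch of the style of DataFrame code such generated jobs contain (the catalog database, tables, and columns are hypothetical):

```python
# Typical shape of a Glue Spark job: read from the Data Catalog, then
# apply filter, join, and aggregation transforms with Spark DataFrames.
from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql import functions as F

glue_context = GlueContext(SparkContext.getOrCreate())

# Hypothetical catalog tables
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders"
).toDF()
customers = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="customers"
).toDF()

# Filter, join, and aggregate, then write the result back to the data lake
result = (
    orders.filter(F.col("status") == "shipped")
    .join(customers, "customer_id")
    .groupBy("region")
    .agg(F.sum("amount").alias("total_amount"))
)
result.write.mode("overwrite").parquet("s3://my-bucket/curated/orders_by_region/")
```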
Let’s expand the use case to run your data pipeline and perform extract, transform, and load (ETL) jobs when a new file lands in an Amazon Simple Storage Service (Amazon S3) bucket in your data lake. The modified architecture to support data-aware scheduling is presented below.
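One common way to implement this kind of data-aware scheduling is with Airflow Datasets (a hedged sketch assuming Apache Airflow 2.4+, for example on Amazon MWAA; the bucket and task contents are hypothetical):

```python
# Data-aware scheduling with Airflow Datasets: the consumer DAG runs when
# the producer task updates the S3 dataset, rather than on a cron schedule.
from datetime import datetime

from airflow import DAG
from airflow.datasets import Dataset
from airflow.operators.python import PythonOperator

raw_files = Dataset("s3://my-datalake-bucket/raw/")

with DAG("producer", start_date=datetime(2024, 1, 1), schedule="@hourly") as producer:
    PythonOperator(
        task_id="land_file",
        python_callable=lambda: print("new file landed"),
        outlets=[raw_files],  # marks the dataset as updated
    )

with DAG("etl_pipeline", start_date=datetime(2024, 1, 1), schedule=[raw_files]) as consumer:
    PythonOperator(
        task_id="run_etl",
        python_callable=lambda: print("run ETL on new data"),
    )
```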
Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on. Maintaining lists of possible values for the columns requires continuous updates.
The Amazon EMR Flink CDC connector reads the binlog data and processes the data. Transformed data can be stored in Amazon S3. We use the AWS Glue Data Catalog to store the metadata such as table schema and table location. Verify all table metadata is stored in the AWS Glue Data Catalog.
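A hedged PyFlink sketch of the pattern (it assumes the MySQL CDC connector JAR is on the Flink classpath; hostnames, credentials, and table names are hypothetical, and a real S3 sink would use an upsert-capable table format such as Iceberg or Hudi):

```python
# Read a MySQL binlog with the Flink CDC connector and consume the change
# stream. Connection details are hypothetical placeholders.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE orders_cdc (
        order_id BIGINT,
        amount DECIMAL(10, 2),
        PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
        'connector' = 'mysql-cdc',
        'hostname' = 'mysql.example.com',
        'database-name' = 'shop',
        'table-name' = 'orders',
        'username' = 'flink',
        'password' = '********'
    )
""")

# The changelog stream needs an upsert-capable sink for S3 (e.g., an
# Iceberg or Hudi table); 'print' keeps this sketch self-contained.
t_env.execute_sql("""
    CREATE TABLE orders_sink (
        order_id BIGINT,
        amount DECIMAL(10, 2)
    ) WITH ('connector' = 'print')
""")

t_env.execute_sql("INSERT INTO orders_sink SELECT order_id, amount FROM orders_cdc").wait()
```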
But the features in Power BI Premium are now more powerful than the functionality in Azure Analysis Services, so while the service isn’t going away, Microsoft will offer an automated migration tool in the second half of this year for customers who want to move their data models into Power BI instead. Azure Data Factory.
In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse customers (such as Amazon Redshift users) who are looking to keep their data transform logic separate from storage and engine.
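dbt's Python models use a model(dbt, session) entry point; a minimal hedged sketch (the model and source names are hypothetical, and the concrete DataFrame API depends on the warehouse adapter):

```python
# models/orders_enriched.py: a dbt Python model. dbt.ref() returns a
# DataFrame whose concrete type (Spark, Snowpark, pandas) depends on the
# warehouse adapter; names here are hypothetical.
def model(dbt, session):
    dbt.config(materialized="table")

    orders = dbt.ref("stg_orders")

    # Keep completed orders only; the transform logic lives in the model,
    # separate from storage and engine.
    return orders.filter(orders.status == "completed")
```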
Amazon Q Developer can now generate complex data integration jobs with multiple sources, destinations, and data transformations. Generated jobs can use a variety of data transformations, including filter, project, union, join, and custom user-supplied SQL.
With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta Lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.
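As a sketch of the kind of fine-grained grant FGAC enforces, here is a hedged boto3 Lake Formation call (the principal ARN, database, table, and columns are hypothetical):

```python
# Grant SELECT on specific columns of a governed table; with FGAC enabled,
# EMR enforces these grants at query time. All names are hypothetical.
import boto3

lf = boto3.client("lakeformation")

lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/AnalystRole"},
    Resource={
        "TableWithColumns": {
            "DatabaseName": "sales_db",
            "Name": "orders",
            "ColumnNames": ["order_id", "region", "amount"],
        }
    },
    Permissions=["SELECT"],
)
```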
ML use cases rarely dictate the master data management solution, so the ML stack needs to integrate with existing data warehouses. There’s an emerging space of ML-focused feature stores such as Tecton or labeling solutions like Scale and Snorkel. Model Development.
The Perilous State of Today’s Data Environments

Data teams often navigate a labyrinth of chaos within their databases. Extrinsic Control Deficit: Many of these changes stem from tools and processes beyond the immediate control of the data team.
These processes retrieve data from around 90 different data sources, updating roughly 2,000 tables in the data warehouse and 3,000 external tables in Parquet format, accessed through Amazon Redshift Spectrum and a data lake on Amazon Simple Storage Service (Amazon S3). We started with 115 dc2.large nodes.
“Digitizing was our first stake at the table in our data journey,” he says. That step, primarily undertaken by developers and data architects, established data governance and data integration.
To enable this use case, we used the BMW Group’s cloud-native data platform called the Cloud Data Hub. In 2019, the BMW Group decided to re-architect and move its on-premises data lake to the AWS Cloud to enable data-driven innovation while scaling with the dynamic needs of the organization.
Here are a few examples that we have seen of how this can be done: Batch ETL with Azure Data Factory and Azure Databricks: In this pattern, Azure Data Factory is used to orchestrate and schedule batch ETL processes. Azure Blob Storage serves as the data lake to store raw data.
Model, understand, and transform the data

Comcast faced the challenge of collecting large amounts of information about potential security and reliability issues but with no easy way to make sense of it all, says Noopur Davis, corporate EVP, CISO, and chief product privacy officer.
With exponentially growing data sources and data lakes, customers want to run more data integration workloads, including their most demanding transforms, aggregations, joins, and queries. For workloads such as data transforms, joins, and queries, you can use G.1X (1 DPU) and G.2X (2 DPU) workers.
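A hedged sketch of choosing a worker type when defining a Glue job with boto3 (the job name, role ARN, and script location are hypothetical):

```python
# Create a Glue Spark job on G.2X workers (2 DPU each) for heavier
# transforms, joins, and aggregations. Names and ARNs are hypothetical.
import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="heavy-transform-job",
    Role="arn:aws:iam::123456789012:role/GlueJobRole",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/transform.py",
        "PythonVersion": "3",
    },
    GlueVersion="4.0",
    WorkerType="G.2X",
    NumberOfWorkers=10,
)
```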
These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.
Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. Additionally, data is extracted from vendor APIs, including data related to product, marketing, and customer experience.
Comparison of modern data architectures:
- Data warehouse: a centralized, structured, and curated data repository. Weaknesses: inflexible schema, poor for unstructured or real-time data.
- Data lake: raw storage for all types of structured and unstructured data.
To bring their customers the best deals and user experience, smava follows the modern data architecture principles with a data lake as a scalable, durable data store and purpose-built data stores for analytical processing and data consumption.
Data transforms businesses. That’s where the data lifecycle comes into play. Managing data and its flow, from the edge to the cloud, is one of the most important tasks in the process of gaining data intelligence. The firm also worked on creating a solid pipeline from the data warehouse to the data lake.
Modak Nabu reliably curates datasets for any line of business and personas, from business analysts to data scientists. Customers using Modak Nabu with CDP today have deployed data lakes and. This is the scale and speed that cloud-native solutions can provide — and Modak Nabu with CDP has been delivering the same.
With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of.

About the Authors

Chiho Sugimoto is a Cloud Support Engineer on the AWS Big Data Support team.
The goal, she explained, is to knock down data silos between those groups, using multiple data lakes supported by strong security and governance, to drive positive impact across the supply chain, manufacturing, and the clinical trials of new drugs.

Four ways to improve data-driven business transformation
With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities such as filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps.
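For instance, an already-configured flow can be triggered on demand from Python with boto3 (the flow name is hypothetical):

```python
# Start a configured Amazon AppFlow flow on demand; filtering and
# validation transformations defined on the flow run as part of it.
import boto3

appflow = boto3.client("appflow")

response = appflow.start_flow(flowName="salesforce-to-s3-daily")
print(response["flowStatus"], response.get("executionId"))
```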
Using these adapters, Cloudera customers can use dbt to collaborate, test, deploy, and document their data transformation and analytic pipelines on CDP Public Cloud, CDP One, and CDP Private Cloud.

The Open Data Lakehouse

This variety can result in a lack of standardization, leading to data duplication and inconsistency.
Amazon Redshift, a warehousing service, offers a variety of options for ingesting data from diverse sources into its high-performance, scalable environment. If storing operational data in a data warehouse is a requirement, synchronization of tables between operational data stores and Amazon Redshift tables is supported.
As the volume and complexity of analytics workloads continue to grow, customers are looking for more efficient and cost-effective ways to ingest and analyze data. This enables organizations to streamline data integration and analytics with OpenSearch Service. Select the secret you created, and on the Actions menu, choose Delete.
It comprises commodity cloud object storage, open data and open table formats, and high-performance open-source query engines. To help organizations scale AI workloads, we recently announced IBM watsonx.data, a data store built on an open data lakehouse architecture and part of the watsonx AI and data platform.
These nodes can implement analytical platforms like data lakehouses, data warehouses, or data marts, all united by producing data products. Divisions decide how many domains to have within their node; some may have one, others many. Nodes and domains serve business needs and are not technology mandated.
CDP Data Hub: a VM/Instance-based service that allows IT and developers to build custom business applications for a diverse set of use cases with secure, self-service access to enterprise data. Predict – Data Engineering (Apache Spark). This is Now. New Services.