Data Lake, Data Transformation and Strategy

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone has launched authentication support through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more. Lionel Pulickal is Sr.

Visualization

Visualization Data Lake Testing Data Governance

Data transformation takes flight at Atlanta’s Hartsfield-Jackson airport

CIO Business Intelligence

AUGUST 9, 2024

The original proof of concept was to have one data repository ingesting data from 11 sources, including flat files and data stored via APIs on premises and in the cloud, Pruitt says. “There are a lot of variables that determine what should go into the data lake and what will probably stay on premise,” Pruitt says.

Data Transformation

Data Transformation Machine Learning Data Lake Dashboards

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

AWS Big Data

OCTOBER 30, 2024

With this integration, you can now seamlessly query your governed data lake assets in Amazon DataZone using popular business intelligence (BI) and analytics tools, including partner solutions like Tableau. Joel has led data transformation projects on fraud analytics, claims automation, and Master Data Management.

Analytics

Analytics Visualization Data Governance Data-driven

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

In this article, I am drawing from firsthand experience working with CIOs, CDOs, CTOs and transformation leaders across industries. I aim to outline pragmatic strategies to elevate data quality into an enterprise-wide capability. This challenge remains deceptively overlooked despite its profound impact on strategy and execution.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

How to modernize data lakes with a data lakehouse architecture

IBM Big Data Hub

JULY 5, 2023

Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world’s largest corporations. Such data volumes are not easy to move, migrate or modernize. The challenges of a monolithic data lake architecture: Data lakes are, at a high level, single repositories of data at scale.

Data Lake

Data Lake Metadata Cost-Benefit Data Warehouse

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on. Subsequently, we’ll explore strategies for overcoming these challenges.

Metadata

Metadata Data Lake Modeling Data Warehouse

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

This post explores how the shift to a data product mindset is being implemented, the challenges faced, and the early wins that are shaping the future of data management in the Institutional Division. Divisions decide how many domains to have within their node; some may have one, others many.

Metadata

Metadata Data Governance Data Quality Data-driven

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse (such as Amazon Redshift) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

7 key Microsoft Azure analytics services (plus one extra)

CIO Business Intelligence

JUNE 29, 2022

The recent announcement of the Microsoft Intelligent Data Platform makes that more obvious, though analytics is only one part of that new brand. Here we take a look at Microsoft Azure’s essential analytics services, what they are used for, and how they come together to make a comprehensive stack for your analytics strategy in the cloud.

Data Lake

Data Lake Analytics Data Warehouse Machine Learning

How the BMW Group analyses semiconductor demand with AWS Glue

AWS Big Data

APRIL 26, 2023

This multinational production strategy follows an even more international and extensive supplier network. To enable this use case, we used the BMW Group’s cloud-native data platform called the Cloud Data Hub. To learn more about the Cloud Data Hub, refer to BMW Group Uses AWS-Based Data Lake to Unlock the Power of Data.

Forecasting

Forecasting Manufacturing Data Lake Big Data

Navigating the Chaos of Unruly Data: Solutions for Data Teams

DataKitchen

NOVEMBER 10, 2023

The Perilous State of Today’s Data Environments: Data teams often navigate a labyrinth of chaos within their databases. Extrinsic Control Deficit: Many of these changes stem from tools and processes beyond the immediate control of the data team.

Data Quality

Data Quality Testing Data Lake Data Integration

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.

Data Lake

Data Lake Snapshot Big Data Data-driven

Straumann Group is transforming dentistry with data, AI

CIO Business Intelligence

FEBRUARY 16, 2023

“Digitizing was our first stake at the table in our data journey,” he says. That step, primarily undertaken by developers and data architects, established data governance and data integration.

Unstructured Data

Unstructured Data Data Lake Prescriptive Analytics Data Warehouse

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

However, you might face significant challenges when planning for a large-scale data warehouse migration. Effective planning, thorough risk assessment, and a well-designed migration strategy are crucial to mitigating these challenges and implementing a successful transition to the new data warehouse environment on Amazon Redshift.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

Turning the page

Cloudera

JUNE 1, 2021

Cloudera will benefit from the operating capabilities, capital support and expertise of Clayton, Dubilier & Rice (CD&R) and KKR – two of the most experienced and successful global investment firms in the world recognized for supporting the growth strategies of the businesses they back. Our strategy.

Uncertainty

Uncertainty Cost-Benefit Risk Strategy

Connecting the Data Lifecycle

Cloudera

NOVEMBER 29, 2021

Data transforms businesses. That’s where the data lifecycle comes into play. Managing data and its flow, from the edge to the cloud, is one of the most important tasks in the process of gaining data intelligence. The firm also worked on creating a solid pipeline from the data warehouse to the data lake.

Data Lake

Data Lake Data Warehouse Data Architecture Reporting

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

AWS Big Data

AUGUST 19, 2024

In this post, we explore how AWS Glue can serve as the data integration service to bring the data from Snowflake for your data integration strategy, enabling you to harness the power of your data ecosystem and drive meaningful outcomes across various use cases. For more information on AWS Glue, visit AWS Glue.

Analytics

Analytics Data-driven Data Integration Data Lake

CIO 100 Award winners drive business results with IT

CIO Business Intelligence

AUGUST 7, 2024

Barnett recognized the need for a disaster recovery strategy to address that vulnerability and help prevent significant disruptions to the 4 million-plus patients Baptist Memorial serves. Options included hosting a secondary data center, outsourcing business continuity to a vendor, and establishing private cloud solutions.

IT

IT Insurance Cost-Benefit Testing

Reference guide to build inventory management and forecasting solutions on AWS

AWS Big Data

APRIL 11, 2023

By collecting data from store sensors using AWS IoT Core, ingesting it using AWS Lambda to Amazon Aurora Serverless, and transforming it using AWS Glue from a database to an Amazon Simple Storage Service (Amazon S3) data lake, retailers can gain deep insights into their inventory and customer behavior.

Forecasting

Forecasting Management IoT Data-driven

How HR&A uses Amazon Redshift spatial analytics on Amazon Redshift Serverless to measure digital equity in states across the US

AWS Big Data

DECEMBER 5, 2023

For files with known structures, a Redshift stored procedure is used, which takes the file location and table name as parameters and runs a COPY command to load the raw data into corresponding Redshift tables. Finally, the dashboard’s user-friendly interface made survey data more accessible to a wider range of stakeholders.

Measurement

Measurement Dashboards Data Warehouse Analytics
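
The loading step described in the HR&A teaser above (a procedure that takes a file location and a table name and runs a COPY command) can be illustrated with a minimal Python sketch. This is not the stored procedure from the post; the helper name `build_copy_statement`, the CSV format options, and the IAM role parameter are assumptions added for illustration.

```python
import re

# Accepts "table" or "schema.table" made of plain identifier characters only.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_copy_statement(file_location: str, table_name: str, iam_role: str) -> str:
    """Build a Redshift COPY statement for one raw file (hypothetical helper)."""
    # The table name is interpolated into SQL, so reject anything unusual.
    if not _IDENTIFIER.match(table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    # Assumption: raw files live in Amazon S3.
    if not file_location.startswith("s3://"):
        raise ValueError("file_location is assumed to be an S3 URI")
    if "'" in file_location or "'" in iam_role:
        raise ValueError("single quotes are not allowed in literals")
    # Assumption: the file is CSV with a single header row.
    return (
        f"COPY {table_name} "
        f"FROM '{file_location}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )
```

The statement is only built here, not executed; running it would need a database connection, which is outside the scope of this sketch.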

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Building data lakes from continuously changing transactional data of databases and keeping data lakes up to date is a complex task and can be an operational challenge. You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes.

Data Lake

Data Lake Dashboards Metrics Metadata
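
To make "managing inserts, updates, and deletes" in the teaser above concrete, here is a minimal in-memory Python sketch of applying change records to a keyed table. It does not use Delta or Amazon EMR Serverless; the `op` field with values `I`, `U`, and `D`, the `id` key, and the function name `apply_changes` are assumptions chosen for illustration.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def apply_changes(table: Iterable[Row], changes: Iterable[Row], key: str = "id") -> List[Row]:
    """Apply ordered change records to a table and return the new state."""
    # Index the current rows by primary key.
    current: Dict[Any, Row] = {row[key]: dict(row) for row in table}
    for change in changes:
        op = change["op"]
        # Strip the operation flag so only data columns are stored.
        row = {k: v for k, v in change.items() if k != "op"}
        if op in ("I", "U"):
            # Inserts and updates both behave as an upsert on the key.
            current[row[key]] = row
        elif op == "D":
            # Deleting a key that is already gone is treated as a no-op.
            current.pop(row[key], None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return sorted(current.values(), key=lambda r: r[key])
```

Changes are applied in order, so a later record for the same key wins; a table format with merge support plays this role at data lake scale.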

Data Mesh 101: How Data Mesh Helps Organizations Be Data-Driven and Achieve Velocity

Ontotext

FEBRUARY 12, 2024

Usually, organizations will combine different domain topologies, depending on the trade-offs, and choose to focus on specific aspects of data mesh. Once accomplished, an effective implementation spurs a mindset in which organizations prioritize and value data for decision-making, formulating strategies, and day-to-day operations.

Data-driven

Data-driven Data Lake Data Quality Business Objectives

Migrate from Amazon Kinesis Data Analytics for SQL Applications to Amazon Kinesis Data Analytics Studio

AWS Big Data

JUNE 29, 2023

In our solution, we create a notebook to access automotive sensor data, enrich the data, and send the enriched output from the Kinesis Data Analytics Studio notebook to an Amazon Kinesis Data Firehose delivery stream for delivery to an Amazon Simple Storage Service (Amazon S3) data lake.

Data Analytics

Data Analytics Analytics IoT Data Lake

Building Better Data Models to Unlock Next-Level Intelligence

Sisense

MAY 11, 2021

The reasons for this are simple: Before you can start analyzing data, huge datasets like data lakes must be modeled or transformed to be usable. According to a recent survey conducted by IDC, 43% of respondents were drawing intelligence from 10 to 30 data sources in 2020, with a jump to 64% in 2021! Discover why.

Modeling

Modeling Big Data IoT Data Warehouse

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

AWS Big Data

APRIL 25, 2024

In the era of data, organizations are increasingly using data lakes to store and analyze vast amounts of structured and unstructured data. Data lakes provide a centralized repository for data from various sources, enabling organizations to unlock valuable insights and drive data-driven decision-making.

Optimization

Optimization Data Lake Cost-Benefit Reporting

Tackling AI’s data challenges with IBM databases on AWS

IBM Big Data Hub

MARCH 14, 2024

This is supported by automated lineage, governance and reproducibility of data, helping to ensure seamless operations and reliability. IBM and AWS have partnered to accelerate customers’ cloud-based data modernization strategies.

Cost-Benefit

Cost-Benefit Metadata Optimization Management

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

erwin

JANUARY 11, 2019

Organizations have spent a lot of time and money trying to harmonize data across diverse platforms, including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on. Creating a High-Quality Data Pipeline.

Data Governance

Data Governance Risk Metadata Management

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

AWS Big Data

MARCH 3, 2023

From detailed design to a beta release, Tricentis had customers expecting to consume data from a data lake specific to only their data, and all of the data that had been generated for over a decade. Data export: As stated earlier, some customers want to get an export of their test data and create their data lake.

Software

Software Data Lake Testing Cost-Benefit

Use fuzzy string matching to approximate duplicate records in Amazon Redshift

AWS Big Data

FEBRUARY 8, 2023

This approach doesn’t solve for data quality issues in source systems, and doesn’t remove the need to have a holistic data quality strategy. For addressing data quality challenges in Amazon Simple Storage Service (Amazon S3) data lakes and data pipelines, AWS has announced AWS Glue Data Quality (preview).

Data Quality

Data Quality Testing Data Warehouse Unstructured Data

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

Watsonx.data is built on 3 core integrated components: multiple query engines, a catalog that keeps track of metadata, and storage and relational data sources which the query engines directly access.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Alation

AUGUST 11, 2022

Today, the brightest minds in our industry are targeting the massive proliferation of data volumes and the accompanying but hard-to-find value locked within all that data. But there are only so many data engineers available in the market today; there’s a big skills shortage. Mitesh: Metadata is the fuel for the engine.

Metadata

Metadata Data Warehouse Data Quality Data Lake

Showpad accelerates data maturity to unlock innovation using Amazon QuickSight

AWS Big Data

APRIL 5, 2023

The company decided to use AWS to unify its business intelligence (BI) and reporting strategy for both internal organization-wide use cases and in-product embedded analytics targeted at its customers. The company also used the opportunity to reimagine its data pipeline and architecture.

Dashboards

Dashboards Reporting Cost-Benefit Visualization

What is a Data Pipeline?

Jet Global

MAY 9, 2024

The key components of a data pipeline are typically: Data Sources: The origin of the data, such as a relational database, data warehouse, data lake, file, API, or other data store. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.

Data Lake

Data Lake Data Warehouse Business Intelligence Machine Learning
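
The tasks named in the data pipeline teaser above (ingestion, cleansing, standardization, filtering, aggregation) can be sketched as small chained Python functions. The record fields `region` and `amount`, the function names, and the rule of keeping only positive amounts are illustrative assumptions, not taken from the article.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def ingest(source: Iterable[Record]) -> List[Record]:
    """Ingestion: read every record from the source into memory."""
    return [dict(r) for r in source]


def cleanse(records: List[Record]) -> List[Record]:
    """Cleansing: drop records with a missing amount."""
    return [r for r in records if r.get("amount") is not None]


def standardize(records: List[Record]) -> List[Record]:
    """Standardization: normalize region names and convert amounts to float."""
    return [
        {"region": r["region"].strip().lower(), "amount": float(r["amount"])}
        for r in records
    ]


def keep_positive(records: List[Record]) -> List[Record]:
    """Filtering: keep only records with a positive amount."""
    return [r for r in records if r["amount"] > 0]


def aggregate(records: List[Record]) -> Dict[str, float]:
    """Aggregation: total amount per region."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)


def run_pipeline(source: Iterable[Record]) -> Dict[str, float]:
    """Run the stages in order, from raw source rows to per-region totals."""
    return aggregate(keep_positive(standardize(cleanse(ingest(source)))))
```

Keeping each stage as its own function makes it easy to test a stage in isolation or to swap one out, for example replacing the in-memory source with a file or API reader.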

Unlocking Trino’s Full Potential With Simba Drivers for BI & ETL

Jet Global

OCTOBER 1, 2024

With Simba drivers acting as a bridge between Trino and your BI or ETL tools, you can unlock enhanced data connectivity, streamline analytics, and drive real-time decision-making. Let’s explore why this combination is a game-changer for data strategies and how it maximizes the value of Trino and Apache Iceberg for your business.

Dashboards

Dashboards Data Lake Reporting Cost-Benefit

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why Data Mapping is Important: Data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.

Data Warehouse

Data Warehouse Reporting Data Transformation Visualization

Melting the ice — How Natural Intelligence simplified a data lake migration to Apache Iceberg

AWS Big Data

APRIL 28, 2025

Many organizations turn to data lakes for the flexibility and scale needed to manage large volumes of structured and unstructured data. Recently, NI embarked on a journey to transition their legacy data lake from Apache Hive to Apache Iceberg. NI’s leading brands, Top10.com

Data Lake

Data Lake Metadata Cost-Benefit Snapshot

How BMW Group built a serverless terabyte-scale data transformation architecture with dbt and Amazon Athena

AWS Big Data

APRIL 29, 2025

While enabling organization-wide efficiency, the team also applied these principles to the data architecture, making sure that CLEA itself operates frugally. After evaluating various tools, we built a serverless data transformation pipeline using Amazon Athena and dbt. The Source stage maintains raw data in its original form.

Data Transformation

Data Transformation Cost-Benefit Testing Data Lake

Stream real-time data into Apache Iceberg tables in Amazon S3 using Amazon Data Firehose

AWS Big Data

NOVEMBER 6, 2024

Second, because traditional data warehousing approaches are unable to keep up with the volume, velocity, and variety of data, engineering teams are building data lakes and adopting open data formats such as Parquet and Apache Iceberg to store their data. For Source, select Direct PUT.

Metadata

Metadata Data Lake Management Internet of Things

Petabyte-scale data migration made simple: AppsFlyer’s best practice journey with Amazon EMR Serverless

AWS Big Data

MAY 12, 2025

With a focus on privacy-first innovation, AppsFlyer empowers organizations to make data-driven decisions while respecting user privacy and compliance regulations. AppsFlyer provides tools for tracking user acquisition, engagement, and retention, delivering actionable insights to enhance ROI and streamline marketing strategies.

Metrics

Metrics Cost-Benefit Metadata Data Lake

Streamline AWS WAF log analysis with Apache Iceberg and Amazon Data Firehose

AWS Big Data

FEBRUARY 18, 2025

To optimize their security operations, organizations are adopting modern approaches that combine real-time monitoring with scalable data analytics. They are using data lake architectures and Apache Iceberg to efficiently process large volumes of security data while minimizing operational overhead.

Snapshot

Snapshot Optimization Data Lake Metadata

How to manage (and reduce) technical debt to innovate in the AI era

CIO Business Intelligence

MAY 19, 2025

We know, in particular, that we have to transform data management and create a data lake based on new technology stacks to improve governance and help the company become fully data-driven. For this reason, the CIO notes, I am planning a data transformation project to create a single enterprise data lake.

KPI

KPI Data Lake Data-driven Software
