Data Analytics, Data Architecture and Data Lake

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake ( Apache Iceberg ) using AWS Glue. Delete the bucket.

Data Lake

Data Lake Data Processing Optimization Machine Learning

Incremental refresh for Amazon Redshift materialized views on data lake tables

AWS Big Data

NOVEMBER 8, 2024

Amazon Redshift is a fast, fully managed cloud data warehouse that makes it cost-effective to analyze your data using standard SQL and business intelligence tools. Customers use data lake tables to achieve cost effective storage and interoperability with other tools.

Data Lake

Data Lake Data Warehouse Optimization Testing

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

AWS Big Data

NOVEMBER 22, 2024

At AWS, we are committed to empowering organizations with tools that streamline data analytics and transformation processes. This integration enables data teams to efficiently transform and manage data using Athena with dbt Cloud’s robust features, enhancing the overall data workflow experience.

Data Lake

Data Lake Data Warehouse Cost-Benefit Data Transformation

Webinars

How to Achieve High-Accuracy Results When Using LLMs

MORE WEBINARS

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

This week on the keynote stages at AWS re:Invent 2024, you heard from Matt Garman, CEO, AWS, and Swami Sivasubramanian, VP of AI and Data, AWS, speak about the next generation of Amazon SageMaker , the center for all of your data, analytics, and AI. The relationship between analytics and AI is rapidly evolving.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Eight Top DataOps Trends for 2022

DataKitchen

NOVEMBER 29, 2021

Data Gets Meshier. 2022 will bring further momentum behind modular enterprise architectures like data mesh. The data mesh addresses the problems characteristic of large, complex, monolithic data architectures by dividing the system into discrete domains managed by smaller, cross-functional teams.

Testing

Testing Data Lake Data Architecture Manufacturing

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open format files in Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries.

Data Lake

Data Lake Statistics Broadcasting Optimization

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Statistics Optimization

Centralize Your Data Processes With a DataOps Process Hub

DataKitchen

NOVEMBER 4, 2021

Data organizations often have a mix of centralized and decentralized activity. DataOps concerns itself with the complex flow of data across teams, data centers and organizational boundaries. It expands beyond tools and data architecture and views the data organization from the perspective of its processes and workflows.

Data Processing

Data Processing Data Lake Cost-Benefit Testing

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

In this blog post, we dive into different data aspects and how Cloudinary breaks the two concerns of vendor locking and cost efficient data analytics by using Apache Iceberg, Amazon Simple Storage Service (Amazon S3 ), Amazon Athena , Amazon EMR , and AWS Glue. 5 seconds $0.08 8 seconds $0.07 8 seconds $0.02 107 seconds $0.25

Data Lake

Data Lake Metadata Snapshot Analytics

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to its flexibility, for common use cases such as replication and ingestion, they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.

Data Integration

Data Integration Data Lake Statistics Data-driven

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

Use cases for Hive metastore federation for Amazon EMR Hive metastore federation for Amazon EMR is applicable to the following use cases: Governance of Amazon EMR-based data lakes – Producers generate data within their AWS accounts using an Amazon EMR-based data lake supported by EMRFS on Amazon Simple Storage Service (Amazon S3)and HBase.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

Need for a data mesh architecture Because entities in the EUROGATE group generate vast amounts of data from various sourcesacross departments, locations, and technologiesthe traditional centralized data architecture struggles to keep up with the demands for real-time insights, agility, and scalability.

IoT

IoT Machine Learning Metadata Data-driven

Automate replication of relational sources into a transactional data lake with Apache Iceberg and AWS Glue

AWS Big Data

FEBRUARY 14, 2023

Organizations have chosen to build data lakes on top of Amazon Simple Storage Service (Amazon S3) for many years. A data lake is the most popular choice for organizations to store all their organizational data generated by different teams, across business domains, from all different formats, and even over history.

Data Lake

Data Lake Statistics Data Architecture Finance

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

AWS Big Data

MARCH 28, 2023

As organizations across the globe are modernizing their data platforms with data lakes on Amazon Simple Storage Service (Amazon S3), handling SCDs in data lakes can be challenging.

Data Lake

Data Lake Testing Snapshot Big Data

Carhartt turns to data under new CIO

CIO Business Intelligence

NOVEMBER 25, 2022

Carhartt’s signature workwear is near ubiquitous, and its continuing presence on factory floors and at skate parks alike is fueled in part thanks to an ongoing digital transformation that is advancing the 133-year-old Midwest company’s operations to make the most of advanced digital technologies, including the cloud, data analytics, and AI.

Data Lake

Data Lake Data Warehouse Unstructured Data Data Architecture

Building a Beautiful Data Lakehouse

CIO Business Intelligence

MARCH 9, 2022

Applying artificial intelligence (AI) to data analytics for deeper, better insights and automation is a growing enterprise IT priority. But the data repository options that have been around for a while tend to fall short in their ability to serve as the foundation for big data analytics powered by AI.

Data Lake

Data Lake Unstructured Data Data Warehouse Big Data

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.

Data Lake

Data Lake Data Governance Machine Learning Cost-Benefit

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Big Data

JANUARY 24, 2023

This solution only replicates metadata in the Data Catalog, not the actual underlying data. To have a redundant data lake using Lake Formation and AWS Glue in an additional Region, we recommend replicating the Amazon S3-based storage using S3 replication , S3 sync, aws-s3-copy-sync-using-batch or S3 Batch replication process.

Data Architecture

Data Architecture Metadata Data Lake Snapshot

Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehouse

Cloudera

OCTOBER 23, 2024

In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI.

Metadata

Metadata Data Lake Dashboards Interactive

DataOps For Business Analytics Teams

DataKitchen

JANUARY 3, 2022

A DataOps process hub offers a way for business analytics teams to cope with fast-paced requirements without expanding staff or sacrificing quality. Analytics Hub and Spoke. The data analytics function in large enterprises is generally distributed across departments and roles. DataOps Process Hub.

Business Analytics

Business Analytics Analytics Testing Dashboards

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Several factors determine the quality of your enterprise data like accuracy, completeness, consistency, to name a few. But there’s another factor of data quality that doesn’t get the recognition it deserves: your data architecture. How the right data architecture improves data quality.

Data Architecture

Data Architecture Data Quality Strategy Data Lake

Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge

AWS Big Data

NOVEMBER 7, 2024

This post explores how you can use BladeBridge , a leading data environment modernization solution, to simplify and accelerate the migration of SQL code from BigQuery to Amazon Redshift. Tens of thousands of customers use Amazon Redshift every day to run analytics, processing exabytes of data for business insights.

Data Warehouse

Data Warehouse Reporting Big Data Data Lake

Data science vs data analytics: Unpacking the differences

IBM Big Data Hub

SEPTEMBER 19, 2023

Though you may encounter the terms “data science” and “data analytics” being used interchangeably in conversations or online, they refer to two distinctly different concepts. Meanwhile, data analytics is the act of examining datasets to extract value and find answers to specific questions.

Data Science

Data Science Data Analytics Prescriptive Analytics Analytics

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. This zero-ETL integration reduces the complexity and operational burden of data replication to let you focus on deriving insights from your data.

Analytics

Analytics Data Lake Metadata Data Warehouse

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

AWS Big Data

JANUARY 12, 2024

Solution overview The following diagram illustrates the high-level solution architecture. We have defined all layers and components of our design in line with the AWS Well-Architected Framework Data Analytics Lens. Amazon AppFlow can be used to transfer data from different SaaS applications to a data lake.

Data Lake

Data Lake Cost-Benefit Visualization Structured Data

The Future of the Data Lakehouse – Open

CIO Business Intelligence

JUNE 23, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.

Data Lake

Data Lake Data Warehouse Machine Learning Data-driven

The New Normal for FP&A: Data Analytics

Jedox

OCTOBER 22, 2020

The term “data analytics” refers to the process of examining datasets to draw conclusions about the information they contain. Data analysis techniques enhance the ability to take raw data and uncover patterns to extract valuable insights from it. Data analytics is not new.

Data Analytics

Data Analytics Analytics Unstructured Data Data mining

Deploy and Optimize Your Snowflake Environment Faster With Accelerators

CDW Research Hub

JULY 18, 2022

One modern data platform solution that provides simplicity and flexibility to grow is Snowflake’s data cloud and platform. These Snowflake accelerators reduce the time to analytics for your users at all levels so you can make data-driven decisions faster. Security Data Lake. Snowflake Health Check.

Optimization

Optimization Data Lake Data Warehouse Manufacturing

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.

Data Lake

Data Lake Data Warehouse Machine Learning Data-driven

Why Data Mesh Needs Data Virtualization

Data Virtualization

AUGUST 19, 2021

“Data mesh” is a new data analytics paradigm proposed by Zhamak Dehghani, one that is designed to move organizations from monolithic architectures such as the data warehouse and the data lake to more decentralized architectures. As long-time supporters of logical.

Data Lake

Data Lake Data Warehouse Data Analytics Analytics

Why Data Mesh Needs Data Virtualization

Data Virtualization

AUGUST 19, 2021

“Data mesh” is a new data analytics paradigm proposed by Zhamak Dehghani, one that is designed to move organizations from monolithic architectures such as the data warehouse and the data lake to more decentralized architectures. As long-time supporters of logical.

Data Lake

Data Lake Data Warehouse Data Analytics Analytics

Why the Data Journey Manifesto?

DataKitchen

JUNE 12, 2023

We had been talking about “Agile Analytic Operations,” “DevOps for Data Teams,” and “Lean Manufacturing For Data,” but the concept was hard to get across and communicate. I spent much time de-categorizing DataOps: we are not discussing ETL, Data Lake, or Data Science.

Testing

Testing Dashboards Data Lake Data Science

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Big Data

JANUARY 6, 2025

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. He specializes in migrating enterprise data warehouses to AWS Modern Data Architecture.

Analytics

Analytics Data Warehouse Metrics Big Data

Modernizing Data Analytics Architecture with the Denodo Platform on Azure

Data Virtualization

JANUARY 19, 2023

Reading Time: 2 minutes Today, many businesses are modernizing their on-premises data warehouses or cloud-based data lakes using Microsoft Azure Synapse Analytics. Unfortunately, with data spread.

Data Analytics

Data Analytics Data Lake Data Warehouse Analytics

Habib Bank manages data at scale with Cloudera Data Platform

Cloudera

NOVEMBER 17, 2022

The Solution: CDP Private Cloud brings a next-generation hybrid architecture with cloud-native benefits to HBL’s data platform. HBL started their data journey in 2019 when data lake initiative was started to consolidate complex data sources and enable the bank to use single version of truth for decision making.

Management

Management Data Lake Consulting Unstructured Data

Insiders Cite The Wondrous Benefits Of Big Data In Fortnite

Smart Data Collective

AUGUST 9, 2019

However, more mainstream games use big data as well. Fortnite is one of the games that uses big data to offer great service to its customers. Even Forbes Tech Council has written about the benefits of data lakes in Fortnite. As it turns out, Epic uses a data lake for this massive undertaking.

Big Data

Big Data Data Lake Data Architecture Machine Learning

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

AWS Big Data

NOVEMBER 13, 2023

Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. These upstream data sources constitute the data producer components.

Data Warehouse

Data Warehouse Analytics Data Lake Data Science

How Data Management and Big Data Analytics Speed Up Business Growth

BizAcuity

APRIL 14, 2022

Its effective data analytics that allows personalization in marketing & sales, identifying new opportunities, making important decisions and being sustainable for the long term. Competitive Advantages to using Big Data Analytics. Challenges associated with Data Management and Optimizing Big Data.

Big Data

Big Data Data Analytics Management Analytics

AI Challenges and How Cloudera Can Help

Cloudera

AUGUST 20, 2024

Trusted data is what makes the outputs of AI not just accurate, but impactful in decision making. Ensuring data is trustworthy comes with its own complications. Cloudera’s State of Enterprise AI and Modern Data Architecture survey identified several challenges when it comes to data.

Data Architecture

Data Architecture Data Lake Data Governance Data Warehouse

Announcing the 2020 Data Impact Award Winners

Cloudera

NOVEMBER 18, 2020

The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. It hosts over 150 big data analytics sandboxes across the region with over 200 users utilizing the sandbox for data discovery. times more effective than traditional mass marketing.

Internet Publishing and Broadcasting

Internet Publishing and Broadcasting Data-driven Broadcasting Digital Transformation

Addressing the Elephant in the Room – Welcome to Today’s Cloudera

Cloudera

JUNE 13, 2024

After countless open-source innovations ushered in the Big Data era, including the first commercial distribution of HDFS (Apache Hadoop Distributed File System), commonly referred to as Hadoop, the two companies joined forces, giving birth to an entire ecosystem of technology and tech companies. That’s today’s Cloudera.

Big Data

Big Data Machine Learning Contextual Data Data Lake

Migrate a petabyte-scale data warehouse from Actian Vectorwise to Amazon Redshift

AWS Big Data

MAY 30, 2024

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. Solution overview Amazon Redshift is an industry-leading cloud data warehouse.

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Structured Data

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

Incremental refresh for Amazon Redshift materialized views on data lake tables

Webinars

Trending Sources

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

Webinars

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

What is a Data Mesh?

Eight Top DataOps Trends for 2022

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Choosing an open table format for your transactional data lake on AWS

Centralize Your Data Processes With a DataOps Process Hub

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

How EUROGATE established a data mesh architecture using Amazon DataZone

Automate replication of relational sources into a transactional data lake with Apache Iceberg and AWS Glue

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

Carhartt turns to data under new CIO

Building a Beautiful Data Lakehouse

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehouse

DataOps For Business Analytics Teams

Data architecture strategy for data quality

Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge

Data science vs data analytics: Unpacking the differences

Top analytics announcements of AWS re:Invent 2024

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

The Future of the Data Lakehouse – Open

The New Normal for FP&A: Data Analytics

Deploy and Optimize Your Snowflake Environment Faster With Accelerators

The Future of the Data Lakehouse – Open

Why Data Mesh Needs Data Virtualization

Why Data Mesh Needs Data Virtualization

Why the Data Journey Manifesto?

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

Modernizing Data Analytics Architecture with the Denodo Platform on Azure

Habib Bank manages data at scale with Cloudera Data Platform

Insiders Cite The Wondrous Benefits Of Big Data In Fortnite

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

How Data Management and Big Data Analytics Speed Up Business Growth

AI Challenges and How Cloudera Can Help

Announcing the 2020 Data Impact Award Winners

Addressing the Elephant in the Room – Welcome to Today’s Cloudera

Migrate a petabyte-scale data warehouse from Actian Vectorwise to Amazon Redshift

Stay Connected