The need for streamlined data transformations
As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. Such tools let you extract insights from your data without the complexity of managing infrastructure.
A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure it, and run different types of analytics for better business insights.
Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure.
The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. First generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt.
This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. In practice, open table formats (OTFs) are used in a broad range of analytical workloads, from business intelligence to machine learning.
In the current industry landscape, data lakes have become a cornerstone of modern data architecture, serving as repositories for vast amounts of structured and unstructured data. Maintaining data consistency and integrity across distributed data lakes is crucial for decision-making and analytics.
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
Over the years, organizations have invested in creating purpose-built, cloud-based data lakes that are siloed from one another. A major challenge is enabling cross-organization discovery and access to data across these multiple data lakes, each built on a different technology stack.
But even with the backdrop of an AI-dominated future, many organizations still find themselves struggling with everything from managing data volumes and complexity to security concerns to rapidly proliferating data silos and governance challenges.
The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. Of those tables, some are larger (in terms of record volume, for example) than others, and some are updated more frequently than others.
In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI.
In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. We show two example scripts demonstrating a practical implementation of error handling for data conflicts in Iceberg streaming jobs.
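To make the conflict-handling idea concrete, here is a minimal sketch of retrying an Iceberg append when concurrent writers collide. It assumes a SparkSession already configured with an Iceberg catalog; the catalog name, table name, and retry policy are illustrative, not taken from the post's scripts.

```python
# Minimal sketch: retrying an Iceberg append when optimistic commits conflict.
# Assumes a SparkSession already configured with an Iceberg catalog
# (here called "glue_catalog"); table name and retry settings are illustrative.
import time

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-conflict-retry").getOrCreate()

MAX_RETRIES = 3

def append_with_retry(df, table="glue_catalog.db.events"):
    """Append a micro-batch, retrying if Iceberg's optimistic commit fails."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            df.writeTo(table).append()
            return
        except Exception as exc:  # Py4J wraps the underlying Java exception
            # Iceberg raises CommitFailedException when another writer
            # committed first; anything else should surface immediately.
            if "CommitFailedException" not in str(exc) or attempt == MAX_RETRIES:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff
```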
Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world's largest corporations. Such data volumes are not easy to move, migrate, or modernize.
The challenges of a monolithic data lake architecture
Data lakes are, at a high level, single repositories of data at scale.
Traditionally, data was seen as information to be put on reserve, only called upon during customer interactions or when executing a program. Today, the way businesses use data is much more fluid; data-literate employees use data across hundreds of apps, analyze it for better decision-making, and access it from numerous locations.
We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented at AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners, for easy reference. Starting with Amazon EMR release 6.7.0,
Cloudera customers run some of the biggest data lakes on earth. These lakes power mission-critical, large-scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes. Iterations of the lakehouse.
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.
The following are the key components of the Bluestone Data Platform: Data mesh architecture – Bluestone adopted a data mesh architecture, a paradigm that distributes data ownership across different business units. This enables data-driven decision-making across the organization.
To bring their customers the best deals and user experience, smava follows modern data architecture principles, with a data lake as a scalable, durable data store and purpose-built data stores for analytical processing and data consumption.
We had been talking about “Agile Analytic Operations,” “DevOps for Data Teams,” and “Lean Manufacturing for Data,” but the concept was hard to get across and communicate. I spent much time de-categorizing DataOps: we are not discussing ETL, data lakes, or data science.
In the past, First Service Credit Union’s chief data officer Ty Robbins struggled to integrate data from the legacy, non-relational, and often proprietary tabular databases on which many credit unions run. After moving its expensive, on-premises data lake to the cloud, Comcast created a three-tiered architecture.
In this blog, we will walk through how we can apply existing enterprise data to better understand and estimate Scope 1 carbon footprint using Amazon Simple Storage Service (Amazon S3) and Amazon Athena, a serverless interactive analytics service that makes it easy to analyze data using standard SQL.
Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x
By 2025, it’s estimated that the amount of data created, consumed, and stored will reach 180 zettabytes , with up to 90% of that unstructured and nearly all of it unused for decision making. The purpose of this blog isn’t to emphasize the cyber risk of dark data but to spotlight its implications.
After countless open-source innovations ushered in the Big Data era, including the first commercial distribution of HDFS (the Hadoop Distributed File System), commonly referred to as Hadoop, the two companies joined forces, giving birth to an entire ecosystem of technology and tech companies. But, What Happened to Hadoop?
Satori accelerates implementing data security controls on data warehouses like Amazon Redshift, is straightforward to integrate, and doesn’t require any changes to your Amazon Redshift data, schema, or how your users interact with data. Satori interacts with identity providers either via API or by using the SAML protocol.
AWS Glue can interact with streaming data services such as Kinesis Data Streams and Amazon MSK for processing and transforming CDC data. With Amazon Managed Service for Apache Flink Studio , you can build and run Apache Flink stream processing applications using standard SQL, Python, and Scala in an interactive notebook.
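As a purely local illustration of the "standard SQL on a stream" idea (Managed Service for Apache Flink Studio runs Flink SQL in notebooks in the same spirit), here is a PyFlink sketch; the datagen source, schema, and filter are made up for the example.

```python
# Local illustration of SQL-on-a-stream with PyFlink (pip install apache-flink).
# The in-memory datagen source stands in for a Kinesis/MSK CDC stream.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# A throwaway source that emits random rows, standing in for real CDC data.
t_env.execute_sql("""
    CREATE TABLE orders (
        order_id STRING,
        amount   DOUBLE,
        ts       TIMESTAMP(3)
    ) WITH ('connector' = 'datagen', 'rows-per-second' = '5')
""")

# Standard SQL over the stream, much as you would write it in a Studio notebook.
result = t_env.sql_query("SELECT order_id, amount FROM orders WHERE amount > 100")
result.execute().print()  # streams results until interrupted
```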
Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), then enterprise-wide data lakes versus smaller, typically BU-specific “data ponds”.
The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. It hosts over 150 big data analytics sandboxes across the region, with over 200 users utilizing the sandbox for data discovery. In its first six months of operation, OVO UnCover has proven to be 7.9
Customer 360 (C360) provides a complete and unified view of a customer’s interactions and behavior across all touchpoints and channels. This view is used to identify patterns and trends in customer behavior, which can inform data-driven decisions to improve business outcomes. Then, you transform this data into a concise format.
Success criteria alignment by all stakeholders (producers, consumers, operators, auditors) is key for a successful transition to a new Amazon Redshift modern data architecture. The success criteria are the key performance indicators (KPIs) for each component of the data workflow.
Amazon Athena is a serverless, interactive analytics service built on the Trino, PrestoDB, and Apache Spark open-source frameworks. Recently, Athena added support for creating and querying views on federated data sources to bring greater flexibility and ease of use to use cases such as interactive analysis and business intelligence reporting.
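A hypothetical sketch of what such a federated view might look like, submitted through boto3; the data source name ("mysql_cat", standing for a registered connector-backed catalog), schema, table, and results bucket are all placeholders, not names from the post.

```python
# Hypothetical sketch: creating an Athena view over a federated data source.
# "mysql_cat", "crm", "customers", and the S3 results location are placeholders.
import boto3

athena = boto3.client("athena")

ddl = """
CREATE OR REPLACE VIEW active_customers AS
SELECT c.customer_id, c.email
FROM "mysql_cat"."crm"."customers" AS c
WHERE c.status = 'active'
"""

athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
```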
In today’s world of complex data architectures and emerging technologies, databases can sometimes be undervalued and unrecognized. Via analyzes customer interactions to improve AI assistance. Vektis improves healthcare quality through data. Norfolk-Southern Corp’s 24/7 insights boost customer satisfaction.
This highlights the two companies’ shared vision on self-service data discovery, with an emphasis on collaboration and data governance. 2) When data becomes information, many (incremental) use cases surface. He is designing data architectures and is looking to prep and clean the data as part of the migration.
In another decade, the internet and mobile started to generate data of unforeseen volume, variety, and velocity, which required a different data platform solution. Hence, the data lake emerged to handle unstructured and structured data at huge volume. A data fabric comprises a network of data nodes (e.g.,
Data sources
As part of this data platform, we are ingesting data from a diverse set of sources, including: Transactional databases – active databases that store real-time data from various applications. AWS Glue – used to load files into Amazon Redshift through the S3 data lake.
After you create the table definition in the AWS Glue Data Catalog, you can use Athena to query the Data Catalog table.
Query the Data Catalog table using Athena
Athena is an interactive query service that makes it easy to analyze data in Amazon S3 and the AWS Glue Data Catalog using standard SQL.
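For reference, a minimal sketch of running such a query programmatically with boto3; the database, table, and results bucket names are placeholders.

```python
# Minimal sketch: querying a Glue Data Catalog table via Athena with boto3.
# Database, table, and results-bucket names are placeholders.
import time

import boto3

athena = boto3.client("athena")

qid = athena.start_query_execution(
    QueryString="SELECT * FROM my_table LIMIT 10",
    QueryExecutionContext={"Database": "my_glue_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)["QueryExecutionId"]

# Poll until the query finishes, then print the first page of results.
while True:
    status = athena.get_query_execution(QueryExecutionId=qid)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```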
“At Databricks, we’re focused on enabling customers to adopt the data lakehouse, and that’s an open data architecture that combines the best of the data warehouse and the data lake into one platform,” Ferguson says. “And data governance is critical to driving adoption.”
With Amazon Redshift, you can query data across your data warehouse, operational data stores, and data lake using standard SQL. Amazon Redshift RSQL is a native command-line client for interacting with Amazon Redshift clusters and databases.
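RSQL itself is a command-line tool; as a rough Python analogue (a sketch using the separately installable redshift_connector package rather than RSQL, with placeholder connection details and a sample table), the same kind of ad hoc SQL looks like this:

```python
# Rough Python analogue of an RSQL session, using redshift_connector
# (pip install redshift-connector). All connection values are placeholders.
import redshift_connector

conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="example-password",
)

cur = conn.cursor()
# Standard SQL works the same whether the data lives in local Redshift
# tables or, via Spectrum, in external (data lake) tables.
cur.execute("SELECT venuename FROM venue LIMIT 5")
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```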
In the future, the recommendation systems within the EDM will increasingly resemble those in consumer software like Amazon or Spotify — making pro-active, context-sensitive recommendations on which data sets or queries to use. Social interactions such as “following” other shoppers (as on Pinterest or Instagram) may also be built into the EDM.
These inputs reinforced the need for a unified data strategy across the FinOps teams. We decided to build a scalable data management product based on the best practices of modern data architecture. Our source system and domain teams were mapped as data producers, and they would have ownership of the datasets.
What Is a Data Mesh and How Does It Work?
(Figure 1 shows the overall idea of a data mesh with its major components.) Think of data mesh as an operational mode for organizations with a domain-driven, decentralized data architecture.
What Is a Data Contract?
(Figure 2 shows the concept of a data product.)