2023, Data Architecture and Data Lake

2023

Data Architecture

Data Lake

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Load data incrementally from transactional data lakes to data warehouses

AWS Big Data

OCTOBER 19, 2023

Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure.

Data Lake

Data Lake Data Warehouse Visualization Snapshot

Join 42,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Trending Sources

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. In practice, OTFs are used in a broad range of analytical workloads, from business intelligence to machine learning.

Metadata

Metadata Data Lake Snapshot Data Warehouse

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open format files in Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries.

Data Lake

Data Lake Statistics Broadcasting Optimization

Synchronize data lakes with CDC-based UPSERT using open table format, AWS Glue, and Amazon MSK

AWS Big Data

JULY 31, 2024

In the current industry landscape, data lakes have become a cornerstone of modern data architecture, serving as repositories for vast amounts of structured and unstructured data. Maintaining data consistency and integrity across distributed data lakes is crucial for decision-making and analytics.

Data Lake

Data Lake Marketing Data Processing Management

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

2023 AWS Analytics Superheroes We are excited to introduce the 2023 AWS Analytics Superheroes at this year’s re:Invent conference! A shapeshifting guardian and protector of data like Data Lynx? 11:30 AM – 12:30 PM (PDT) Ceasars Forum ANT318 | Accelerate innovation with end-to-end serverless data architecture.

Analytics

Analytics Data Lake Data Warehouse Data-driven

AWS re:Invent 2023 Amazon Redshift Sessions Recap

AWS Big Data

DECEMBER 18, 2023

Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads. What’s new with Amazon Redshift Want to learn more about the most recent features launched in Amazon Redshift?

Data Warehouse

Data Warehouse Machine Learning Data-driven Data Lake

Amazon Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools

AWS Big Data

FEBRUARY 26, 2025

Amazon Web Services (AWS) has been recognized as a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools. We were positioned in the Challengers Quadrant in 2023. The Gartner Magic Quadrant evaluates 20 data integration tool vendors based on two axesAbility to Execute and Completeness of Vision.

Data Integration

Data Integration Data Lake Data Warehouse Unstructured Data

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

For example, earlier in the year, we announced speed ups for string-based data processing up to 63x compared to alternative compression encodings such as LZO (Lempel-Ziv-Oberhumer) or ZStandard. At AWS re:Invent 2023, we extended data sharing capabilities to launch multi-data warehouse writes in preview.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

AWS Big Data

MARCH 28, 2023

As organizations across the globe are modernizing their data platforms with data lakes on Amazon Simple Storage Service (Amazon S3), handling SCDs in data lakes can be challenging.

Data Lake

Data Lake Testing Snapshot Big Data

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts. We recently announced the integration of Amazon Redshift data sharing with AWS Lake Formation. Take note of this role’s ARN to use later in the steps.

Data Lake

Data Lake Data Warehouse Marketing Management

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.

Data Lake

Data Lake Data Governance Machine Learning Cost-Benefit

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

5 Key Takeaways from Flink Forward 2023

Cloudera

NOVEMBER 27, 2023

Earlier this month (November 6 through 8, 2023) a few hundred Apache Flink enthusiasts descended upon a Hyatt Regency Lake near Seattle for the annual Flink Forward conference. For now, Flink plus Iceberg is the compute plus storage solution for streaming data. Flink is, relatively speaking, a newer technology.

Advertising

Advertising Data Lake Data Warehouse ROI

Building a vision for real-time artificial intelligence

CIO Business Intelligence

APRIL 12, 2023

After walking his executive team through the data hops, flows, integrations, and processing across different ingestion software, databases, and analytical platforms, they were shocked by the complexity of their current data architecture and technology stack. It isn’t easy.

Machine Learning

Machine Learning Cost-Benefit Data-driven Strategy

Porsche Carrera Cup Brasil gets real-time data boost

CIO Business Intelligence

MAY 21, 2024

Defining a strategic relationship In July 2023, Dener Motorsport began working with Microsoft Fabric to get at that data in real-time, specifically Fabric components Synapse Real-Time Analytics for data streaming analysis, and Data Activator to monitor and trigger actions in real-time.

Broadcasting

Broadcasting Recreation/Entertainment Manufacturing Data Lake

Educating ChatGPT on Data Lakehouse

Cloudera

MARCH 17, 2023

As the use of ChatGPT becomes more prevalent, I frequently encounter customers and data users citing ChatGPT’s responses in their discussions. I love the enthusiasm surrounding ChatGPT and the eagerness to learn about modern data architectures such as data lakehouses, data meshes, and data fabrics.

Unstructured Data

Unstructured Data Data Lake Data Warehouse Machine Learning

CIOs rise to the ESG reporting challenge

CIO Business Intelligence

JANUARY 30, 2024

“Always the gatekeepers of much of the data necessary for ESG reporting, CIOs are finding that companies are even more dependent on them,” says Nancy Mentesana, ESG executive director at Labrador US, a global communications firm focused on corporate disclosure documents.

Reporting

Reporting Data Quality Strategy Data-driven

The year’s top 10 enterprise AI trends — so far

CIO Business Intelligence

SEPTEMBER 21, 2023

To make all this possible, the data had to be collected, processed, and fed into the systems that needed it in a reliable, efficient, scalable, and secure way. Data warehouses then evolved into data lakes, and then data fabrics and other enterprise-wide data architectures.

Enterprise

Enterprise Consulting Modeling Cost-Benefit

Showpad accelerates data maturity to unlock innovation using Amazon QuickSight

AWS Big Data

APRIL 5, 2023

Showpad also struggled with data quality issues in terms of consistency, ownership, and insufficient data access across its targeted user base due to a complex BI access process, licensing challenges, and insufficient education. As of January 2023, Showpad’s QuickSight instance includes over 2,433 datasets and 199 dashboards.

Dashboards

Dashboards Reporting Cost-Benefit Visualization

CIO 100 Award winners drive business results with IT

CIO Business Intelligence

AUGUST 7, 2024

But Barnett, who started work on a strategy in 2023, wanted to continue using Baptist Memorial’s on-premise data center for financial, security, and continuity reasons, so he and his team explored options that allowed for keeping that data center as part of the mix. This is a new way to interact with the web and search.

IT Insurance Cost-Benefit Testing

Data Leaders Brief

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Load data incrementally from transactional data lakes to data warehouses

Webinars

Trending Sources

Run Apache XTable in AWS Lambda for background conversion of open table formats

Webinars

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Synchronize data lakes with CDC-based UPSERT using open table format, AWS Glue, and Amazon MSK

Your guide to AWS Analytics at AWS re:Invent 2023

AWS re:Invent 2023 Amazon Redshift Sessions Recap

Amazon Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

5 Key Takeaways from Flink Forward 2023

Building a vision for real-time artificial intelligence

Porsche Carrera Cup Brasil gets real-time data boost

Educating ChatGPT on Data Lakehouse

CIOs rise to the ESG reporting challenge

The year’s top 10 enterprise AI trends — so far

Showpad accelerates data maturity to unlock innovation using Amazon QuickSight

CIO 100 Award winners drive business results with IT

Stay Connected