2023 and Data Lake - Data Leaders Brief

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure it, and then run different types of analytics for better business insights.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Load data incrementally from transactional data lakes to data warehouses

AWS Big Data

OCTOBER 19, 2023

Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure.

Data Lake

Data Lake Data Warehouse Visualization Snapshot

United Airlines sets its flight plan for gen AI success

CIO Business Intelligence

DECEMBER 20, 2024

“United’s embrace of SageMaker and Bedrock as well as Amazon Q is going to be a game changer for building data products,” said Mai-Lan Tomsen Bukovec, AWS vice president of technology, who pointed to United Data Hub as a transformational component in its AI journey at re:Invent. […] That number has increased to 21% in just 18 months.

IT

IT Unstructured Data Experimentation Data Lake

AWS Lake Formation 2023 year in review

AWS Big Data

JANUARY 18, 2024

AWS Lake Formation and the AWS Glue Data Catalog form an integral part of a data governance solution for data lakes built on Amazon Simple Storage Service (Amazon S3) with multiple AWS analytics services integrating with them. In 2023, we released several updates to AWS Glue crawlers. Crawlers, salut!

Data Lake

Data Lake Metadata Data Governance Statistics

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open format files in an Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries.

Data Lake

Data Lake Statistics Broadcasting Optimization

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. AWS Glue 3.0 and later supports the Apache Iceberg framework for data lakes. […] The following diagram illustrates the solution architecture.

Data Lake

Data Lake Data Processing Metadata Snapshot

MongoDB Enhances Developer Data Platform

David Menninger's Analyst Perspectives

JANUARY 21, 2025

These include architectural optimizations to reduce memory usage and query times with more efficient batch processing to deliver better throughput, faster bulk writes and accelerated concurrent writes during data replication. […] also extends MongoDB’s Queryable Encryption capability, which was introduced in 2023.

Data Lake

Data Lake IoT Cost-Benefit Enterprise

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

Initially, data warehouses were the go-to solution for structured data and analytical workloads but were limited by proprietary storage formats and their inability to handle unstructured data. Eventually, transactional data lakes emerged to add the transactional consistency and performance of a data warehouse to the data lake.

Metadata

Metadata Data Lake Snapshot Data Warehouse

Synchronize data lakes with CDC-based UPSERT using open table format, AWS Glue, and Amazon MSK

AWS Big Data

JULY 31, 2024

In the current industry landscape, data lakes have become a cornerstone of modern data architecture, serving as repositories for vast amounts of structured and unstructured data. Maintaining data consistency and integrity across distributed data lakes is crucial for decision-making and analytics.

Data Lake

Data Lake Marketing Data Processing Management

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

AWS Big Data

MARCH 28, 2023

As organizations across the globe modernize their data platforms with data lakes on Amazon Simple Storage Service (Amazon S3), handling slowly changing dimensions (SCDs) in data lakes can be challenging.

Data Lake

Data Lake Testing Snapshot Big Data

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

MAY 24, 2023

When you build your transactional data lake using Apache Iceberg to solve your functional use cases, you need to focus on operational use cases for your S3 data lake to optimize the production environment. […] availability. […] parquet") df.sortWithinPartitions("review_date").writeTo("dev.db.amazon_reviews_iceberg").append()
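
The code fragment in the excerpt above sorts rows within each partition on review_date before appending to the Iceberg table. As a rough illustration of why that step can help (this is not the post's own code), the plain-Python sketch below mimics the effect: when a partition is sorted before being split into files, each file covers a narrow date range, so a reader filtering on one date can skip most files by checking per-file minimum and maximum values. The partition contents, the file size, and the helper names `write_files` and `files_to_scan` are all hypothetical.

```python
from datetime import date, timedelta
import random


def write_files(rows, rows_per_file):
    """Split rows into fixed-size 'files' and record each file's (min, max) date."""
    files = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        files.append((min(chunk), max(chunk)))
    return files


def files_to_scan(files, wanted):
    """Count files whose [min, max] range could contain the wanted date."""
    return sum(1 for lo, hi in files if lo <= wanted <= hi)


# Hypothetical partition: 1,000 review dates spread over 100 days, in arrival order.
rng = random.Random(0)
start_day = date(2023, 1, 1)
partition = [start_day + timedelta(days=rng.randrange(100)) for _ in range(1000)]

# Same rows written twice: once as they arrived, once sorted within the partition.
unsorted_files = write_files(partition, rows_per_file=100)
sorted_files = write_files(sorted(partition), rows_per_file=100)

wanted = start_day + timedelta(days=42)
unsorted_hits = files_to_scan(unsorted_files, wanted)
sorted_hits = files_to_scan(sorted_files, wanted)
print("files to scan without sorting:", unsorted_hits)
print("files to scan with sorting:", sorted_hits)
```

In Spark, the corresponding step is the `sortWithinPartitions` call seen in the fragment. How much it helps in practice depends on file sizes and on which columns queries filter by.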

Data Lake

Data Lake Snapshot Metadata Optimization

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

These announcements drive forward the AWS Zero-ETL vision to unify all your data, enabling you to better maximize the value of your data with comprehensive analytics and ML capabilities, and innovate faster with secure data collaboration within and across organizations.

Data Warehouse

Data Warehouse Analytics Data Lake Machine Learning

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

2023 AWS Analytics Superheroes: We are excited to introduce the 2023 AWS Analytics Superheroes at this year’s re:Invent conference! […] A shapeshifting guardian and protector of data like Data Lynx? […] 2:30 PM – 3:30 PM (PDT) Mandalay Bay […] ANT335 | Get the most out of your data warehousing workloads.

Analytics

Analytics Data Lake Data Warehouse Data-driven

Amazon Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools

AWS Big Data

FEBRUARY 26, 2025

Amazon Web Services (AWS) has been recognized as a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools. We were positioned in the Challengers Quadrant in 2023.

Data Integration

Data Integration Data Lake Data Warehouse Unstructured Data

AWS re:Invent 2023 Amazon Redshift Sessions Recap

AWS Big Data

DECEMBER 18, 2023

Sessions: ANT203 | What’s new in Amazon Redshift. Watch this session to learn about the newest innovations within Amazon Redshift, the petabyte-scale AWS Cloud data warehousing solution. […] Easily build and train machine learning models using SQL within Amazon Redshift to generate predictive analytics and propel data-driven decision-making.

Data Warehouse

Data Warehouse Machine Learning Data-driven Data Lake

Generative AI: 5 enterprise predictions for AI and security — for 2023, 2024, and beyond

CIO Business Intelligence

OCTOBER 25, 2023

Enterprise use of AI tools will only grow, with industries like manufacturing leading the charge. Our research shows that, mirroring the broader AI trend, enterprises across industry verticals sharply increased their use of AI from May 2023 to June 2023, with sustained growth through August 2023.

Enterprise

Enterprise Manufacturing Risk Data-driven

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.

Data Lake

Data Lake Data Governance Machine Learning Cost-Benefit

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts.

Data Lake

Data Lake Data Warehouse Marketing Management

The Increasing Importance of Open Table Formats

David Menninger's Analyst Perspectives

OCTOBER 31, 2024

I previously wrote about the importance of open table formats to the evolution of data lakes into data lakehouses. The concept of the data lake was initially proposed as a single environment where data could be combined from multiple sources to be stored and processed to enable analysis by multiple users for multiple purposes.

Data Lake

Data Lake Unstructured Data Data Warehouse Software

Achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore

AWS Big Data

NOVEMBER 11, 2024

In this query, the repository name is os-snapshot-repo and the snapshot name is 2023-11-18. Sesha Sanjana Mylavarapu is an Associate Data Lake Consultant at AWS Professional Services. She specializes in cloud-based data management and collaborates with enterprise clients to design and implement scalable data lakes.

Snapshot

Snapshot Strategy Dashboards Data Lake

Outdated business apps can cloud your AI vision

CIO Business Intelligence

FEBRUARY 20, 2025

According to IDC’s 2023 CIO Sentiment Survey, organizations were spending an average of 12.8% […] “The data retention issue is a big challenge because internally collected data drives many AI initiatives,” Klingbeil says. CIOs should also use data lakes to aggregate information from multiple sources, he adds.

Insurance

Insurance Cost-Benefit Unstructured Data Data Lake

Everything-as-a-Service: Huawei Brings the Cloud Ecosystem Within Reach at MWC 2023

CIO Business Intelligence

FEBRUARY 28, 2023

GSMA’s Mobile World Congress (MWC) 2023 in Barcelona—the largest and most influential event for connectivity—is expected to attract over 80,000 attendees from 200 countries and over 2,000 exhibitors. Experts tout 2023 to be the year when new AI-powered tools and services make their presence felt across industries.

Internet of Things

Internet of Things Digital Transformation Data Lake Enterprise

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

These features allow efficient data corrections, gap-filling in time series, and historical data updates without disrupting ongoing analyses or compromising data integrity. Unlike direct Amazon S3 access, Iceberg supports these operations on petabyte-scale data lakes without requiring complex custom code.

Metadata

Metadata Snapshot Cost-Benefit Optimization

Real-time streaming data top picks you cannot miss at AWS re:Invent 2023

AWS Big Data

NOVEMBER 8, 2023

Save the date: AWS re:Invent 2023 is happening from November 27 to December 1 in Las Vegas, and you cannot miss it. In today’s data-driven landscape, the quality of data is the foundation upon which the success of organizations and innovations stands. Reserve your seat now! Register now to secure your spot!

Data-driven

Data-driven Machine Learning Data Lake Cost-Benefit

Top Opportunities for SAP Partners in 2023

Timo Elliott

NOVEMBER 30, 2022

My role was to talk about the trends and opportunities for 2023, for customers, SAP, and our partners. Because of technology limitations, we have always had to start by ripping information from the business systems and moving it to a different platform—a data warehouse, data lake, data lakehouse, data cloud.

Recreation/Entertainment

Recreation/Entertainment Metadata Data Warehouse Cost-Benefit

Lessons from the field: How Generative AI is shaping software development in 2023

CIO Business Intelligence

SEPTEMBER 6, 2023

For example, litigation has surfaced against companies for training AI tools using data lakes with thousands of unlicensed works. Some companies have already seen severe penalties around AI tools being used for research and code, so acting quickly is necessary.

Software

Software Risk Experimentation Data Lake

Key finding from Forrester’s latest BI research including The Forrester Wave™: Augmented Business Intelligence Platforms, Q2 2023

Boris Evelson

JUNE 14, 2023

No matter what technology foundation you’re using – a data lake, a data warehouse, data fabric, data mesh, etc. – BI applications are where business users consume data and turn it into actionable insights and decisions. The BI market has […]

Business Intelligence

Business Intelligence Data Lake Data Warehouse Data-driven

What’s cooking with Amazon Redshift at AWS re:Invent 2023

AWS Big Data

NOVEMBER 15, 2023

Connect with experts, meet with book authors on data warehousing and analytics (at the Meet the Authors event on November 29 and 30, 3:00 PM – 4:00 PM), win prizes, and learn all about the latest innovations from our AWS Analytics services. A shapeshifting guardian and protector of data like Data Lynx?

Data Lake

Data Lake Data Warehouse B2B Deep Learning

5 Key Takeaways from Flink Forward 2023

Cloudera

NOVEMBER 27, 2023

Earlier this month (November 6 through 8, 2023) a few hundred Apache Flink enthusiasts descended upon a Hyatt Regency Lake near Seattle for the annual Flink Forward conference. Sign up for a free trial of Cloudera’s NiFi-based DataFlow and walk through use cases like stream filtering and cloud data warehouse ingest.

Advertising

Advertising Data Lake Data Warehouse ROI

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

AWS Big Data

NOVEMBER 20, 2023

Use case: A typical workload for AWS Glue for Apache Spark jobs is to load data from a relational database to a data lake with SQL-based transformations. […] The following is a visual representation of an example job where the number of workers is 10. When the example job ran, the workerUtilization metrics showed the following trend.

Metrics

Metrics Data Lake Cost-Benefit Dashboards

CarMax drives business value with GPT-3.5

CIO Business Intelligence

MAY 5, 2023

[…] x for business value even before ChatGPT became a household name. That is why the omnichannel used-car retailer earned a coveted spot on the 2023 CIO 100 Award list: for its early, innovative use of a nascent AI technology that led to a spike in page views as well as higher SEO ranking and placement that drove substantial business growth.

Digital Transformation

Digital Transformation Cost-Benefit Business Driver Machine Learning

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 2: AWS Glue Studio Visual Editor

AWS Big Data

MARCH 20, 2023

In the first post of this series, we described how AWS Glue for Apache Spark works with Apache Hudi, Linux Foundation Delta Lake, and Apache Iceberg dataset tables using the native support of those data lake formats. Even without prior experience using Hudi, Delta Lake, or Iceberg, you can easily achieve typical use cases.

Visualization

Visualization Data Lake Snapshot Big Data

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

JANUARY 24, 2024

AWS-powered data lakes, supported by the unmatched availability of Amazon Simple Storage Service (Amazon S3), can handle the scale, agility, and flexibility required to combine different data and analytics approaches. The output will give a count of the number of data and metadata files deleted.

Snapshot

Snapshot Data Lake Metadata Optimization

KDnuggets News, January 18: 7 Best Platforms to Practice SQL • Explainable AI: 10 Python Libraries for Demystifying Your Model’s Decisions

KDnuggets

JANUARY 18, 2023

7 Best Platforms to Practice SQL • Explainable AI: 10 Python Libraries for Demystifying Your Model's Decisions • ChatGPT: Everything You Need to Know • Data Lakes and SQL: A Match Made in Data Heaven • Google Data Analytics Certification Review for 2023

Data Lake

Data Lake Modeling Data Analytics Analytics

Data Quality Power Moves: Scorecards & Data Checks for Organizational Impact

DataKitchen

SEPTEMBER 18, 2024

Key statistics highlight the severity of the issue: 57% of respondents in a 2024 dbt Labs survey rated data quality as one of the three most challenging aspects of data preparation (up from 41% in 2023). 73% of data practitioners do not trust their data (IDC).

Scorecard

Scorecard Data Quality Measurement Testing

Wolverine hits pause for cloud success

CIO Business Intelligence

JULY 8, 2022

Wolverine, which Slater says relies on SAP and Microsoft for its core infrastructure, is now “well along the journey in supply chain data” using SAP SAC analytics but has yet to embark on other aspects of its digital transformation, such as building a data lake and embracing AI, she says. […] “We are not currently doing that.”

Data Lake

Data Lake Manufacturing Machine Learning Digital Transformation

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

AWS Big Data

NOVEMBER 10, 2023

The data sourcing problem: To ensure the reliability of PySpark data pipelines, it’s essential to have consistent record-level data from both dimensional and fact tables stored in the Enterprise Data Warehouse (EDW). These tables are then joined with tables from the Enterprise Data Lake (EDL) at runtime.

Data Processing

Data Processing Data Lake Data Warehouse Optimization

Telefónica España promotes smart destinations with a new innovation ‘hub’

CIO Business Intelligence

JUNE 14, 2024

In fact, this industry stands out as the main engine of economic growth in Spain; in 2023, it accounted for 12.8% of GDP, according to the association Exceltur, and was responsible for 24.8% of the jobs created during the first quarter of 2024, according to data from the Economically Active Population Survey (Encuesta de Población Activa, EPA).

Data Lake

Data Lake IoT Big Data Software

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

AWS Big Data

OCTOBER 3, 2023

In our previous post, Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes, we discussed how you can implement solutions to improve operational efficiencies of your Amazon Simple Storage Service (Amazon S3) data lake that is using the Apache Iceberg open table format and running on the Amazon EMR big data platform.

Optimization

Optimization Snapshot Data Lake Metadata

Denodo Provides a Logical Approach to Data Management

David Menninger's Analyst Perspectives

OCTOBER 24, 2024

Data silos are a perennial data management problem for enterprises, with almost three-quarters (73%) of participants in ISG Research’s Data Governance Benchmark Research citing disparate data sources and systems as a data governance challenge.

Management

Management Data-driven Data Governance Data Lake

My Understanding of the Gartner® Hype Cycle™ for Finance Data and Analytics Governance, 2023

Data Virtualization

MARCH 28, 2024

As noted in the Gartner Hype Cycle for Finance Data and Analytics Governance, 2023, “Through […]

Finance

Finance Digital Transformation Analytics Data Integration
