Data Lake, Data-driven and Metadata

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Amazon Redshift , launched in 2013, has undergone significant evolution since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. Industry-leading price-performance Amazon Redshift offers up to three times better price-performance than alternative cloud data warehouses.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. Together, these capabilities enable terminal operators to enhance efficiency and competitiveness in an industry that is increasingly data driven.

IoT

IoT Machine Learning Metadata Data-driven

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

In this post, we focus on data management implementation options such as accessing data directly in Amazon Simple Storage Service (Amazon S3), using popular data formats like Parquet, or using open table formats like Iceberg. Data management is the foundation of quantitative research.

Metadata

Metadata Snapshot Cost-Benefit Optimization

Webinars

How to Achieve High-Accuracy Results When Using LLMs

MORE WEBINARS

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.

Metadata

Metadata Snapshot Data Lake Metrics

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

The landscape of big data management has been transformed by the rising popularity of open table formats such as Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake. These formats, designed to address the limitations of traditional data storage systems, have become essential in modern data architectures.

Metadata

Metadata Data Warehouse Big Data Data Lake

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

Enterprises and organizations across the globe want to harness the power of data to make better decisions by putting data at the center of every decision-making process. However, throughout history, data services have held dominion over their customers’ data.

Data Lake

Data Lake Metadata Snapshot Analytics

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

AWS Big Data

JULY 18, 2024

Over the years, organizations have invested in creating purpose-built, cloud-based data lakes that are siloed from one another. A major challenge is enabling cross-organization discovery and access to data across these multiple data lakes, each built on different technology stacks.

Data Lake

Data Lake Publishing Metadata Data-driven

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

Businesses are constantly evolving, and data leaders are challenged every day to meet new requirements. Customers are using AWS and Snowflake to develop purpose-built data architectures that provide the performance required for modern analytics and artificial intelligence (AI) use cases.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. In addition, organizations rely on an increasingly diverse array of digital systems, data fragmentation has become a significant challenge.

Data Integration

Data Integration Data Lake Statistics Data-driven

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open format files in Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries.

Data Lake

Data Lake Statistics Broadcasting Optimization

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

AWS Big Data

OCTOBER 23, 2024

The Airflow REST API facilitates a wide range of use cases, from centralizing and automating administrative tasks to building event-driven, data-aware data pipelines. Event-driven architectures – The enhanced API facilitates seamless integration with external events, enabling the triggering of Airflow DAGs based on these events.

Interactive

Interactive Testing Data-driven Data Lake

Data Lakes on Cloud & it’s Usage in Healthcare

BizAcuity

MARCH 29, 2019

Data lakes are centralized repositories that can store all structured and unstructured data at any desired scale. The power of the data lake lies in the fact that it often is a cost-effective way to store data. The power of the data lake lies in the fact that it often is a cost-effective way to store data.

Data Lake

Data Lake Unstructured Data Cost-Benefit Data Quality

Write queries faster with Amazon Q generative SQL for Amazon Redshift

AWS Big Data

NOVEMBER 7, 2024

Amazon Redshift is a fully managed, AI-powered cloud data warehouse that delivers the best price-performance for your analytics workloads at any scale. It provides a conversational interface where users can submit queries in natural language within the scope of their current data permissions. Your data is not shared across accounts.

Metadata

Metadata Sales Data Warehouse Optimization

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

DECEMBER 4, 2024

Open table formats are emerging in the rapidly evolving domain of big data management, fundamentally altering the landscape of data storage and analysis. By providing a standardized framework for data representation, open table formats break down data silos, enhance data quality, and accelerate analytics at scale.

Snapshot

Snapshot Metadata Data Lake Optimization

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. DataOps helps the data mesh deliver greater business agility by enabling decentralized domains to work in concert. . But first, let’s define the data mesh design pattern.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. In this post, we discuss a common use case in relation to operational data processing and the solution we built using Apache Hudi and AWS Glue.

Data Lake

Data Lake Data Processing Metadata Snapshot

Enrich your serverless data lake with Amazon Bedrock

AWS Big Data

SEPTEMBER 26, 2024

Organizations are collecting and storing vast amounts of structured and unstructured data like reports, whitepapers, and research documents. By consolidating this information, analysts can discover and integrate data from across the organization, creating valuable data products based on a unified dataset.

Data Lake

Data Lake Cost-Benefit Unstructured Data Modeling

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.

Analytics

Analytics Data Lake Metadata Data Warehouse

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Informatica’s new data management clouds target health, finance services

CIO Business Intelligence

MAY 24, 2022

Just after launching a focused data management platform for retail customers in March, enterprise data management vendor Informatica has now released two more industry-specific versions of its Intelligent Data Management Cloud (IDMC) — one for financial services, and the other for health and life sciences.

Finance

Finance Management Metadata Machine Learning

Read and write S3 Iceberg table using AWS Glue Iceberg Rest Catalog from Open Source Apache Spark

AWS Big Data

DECEMBER 4, 2024

In today’s data-driven world , organizations are constantly seeking efficient ways to process and analyze vast amounts of information across data lakes and warehouses. SageMaker Lakehouse gives you the flexibility to access and query your data in-place with all Apache Iceberg compatible tools and engines.

Data Lake

Data Lake Metadata Insurance Data-driven

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts.

Data Lake

Data Lake Data Warehouse Marketing Management

Doing Cloud Migration and Data Governance Right the First Time

erwin

OCTOBER 8, 2020

So if you’re going to move from your data from on-premise legacy data stores and warehouse systems to the cloud, you should do it right the first time. And as you make this transition, you need to understand what data you have, know where it is located, and govern it along the way. Then you must bulk load the legacy data.

Data Governance

Data Governance Metadata Testing Data Lake

Regeneron turns to IT to accelerate drug discovery

CIO Business Intelligence

NOVEMBER 4, 2022

Rigid requirements to ensure the accuracy of data and veracity of scientific formulas as well as machine learning algorithms and data tools are common in modern laboratories. When Bob McCowan was promoted to CIO at Regeneron Pharmaceuticals in 2018, he had previously run the data center infrastructure for the $81.5

Data Lake

Data Lake IT Experimentation Data-driven

Denodo Provides a Logical Approach to Data Management

David Menninger's Analyst Perspectives

OCTOBER 24, 2024

Although the terms data fabric and data mesh are often used interchangeably, I previously explained that they are distinct but complementary. The popularity of data fabric and data mesh has highlighted the importance of software providers, such as Denodo, that utilize data virtualization to enable logical data management.

Management

Management Data-driven Data Governance Data Lake

Data Swamp, Data Lake, Data Lakehouse: What to Know

Alation

OCTOBER 21, 2021

Data Swamp vs Data Lake. When you imagine a lake, it’s likely an idyllic image of a tree-ringed body of reflective water amid singing birds and dabbling ducks. I’ll take the lake, thank you very much. I’ll take the lake, thank you very much. And so will your data. Benefits of a Data Lake.

Data Lake

Data Lake Metadata Data Warehouse Data Governance

Access Amazon Redshift data from Salesforce Data Cloud with Zero Copy Data Federation

AWS Big Data

JUNE 25, 2024

This post is co-authored by Vijay Gopalakrishnan, Director of Product, Salesforce Data Cloud. In today’s data-driven business landscape, organizations collect a wealth of data across various touch points and unify it in a central data warehouse or a data lake to deliver business insights.

Data Lake

Data Lake Cost-Benefit Data-driven Data Warehouse

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Quality

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

Organizations often need to manage a high volume of data that is growing at an extraordinary rate. At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. We think of this concept as inside-out data movement. Example Corp.

Data Lake

Data Lake Analytics Dashboards Metrics

Top 15 data management platforms

CIO Business Intelligence

JUNE 9, 2022

A data management platform (DMP) is a group of tools designed to help organizations collect and manage data from a wide array of sources and to create reports that help explain what is happening in those data streams. Deploying a DMP can be a great way for companies to navigate a business world dominated by data.

Management

Management Advertising Data Lake Sales

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. Amazon DataZone allows you to simply and securely govern end-to-end data assets stored in your Amazon Redshift data warehouses or data lakes cataloged with the AWS Glue data catalog.

Metadata

Metadata Data Lake Data Processing Data-driven

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift , the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse

Data Warehouse Analytics Data Lake Machine Learning

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

The need to integrate diverse data sources has grown exponentially, but there are several common challenges when integrating and analyzing data from multiple sources, services, and applications. First, you need to create and maintain independent connections to the same data source for different services.

Visualization

Visualization Data Processing Testing Publishing

Integrating Data Governance and Enterprise Architecture

erwin

SEPTEMBER 3, 2020

Why should you integrate data governance (DG) and enterprise architecture (EA)? Data governance provides time-sensitive, current-state architecture information with a high level of quality. Data governance provides time-sensitive, current-state architecture information with a high level of quality.

Data Governance

Data Governance Enterprise Risk Data Lake

The Future of the Data Lakehouse – Open

CIO Business Intelligence

JUNE 23, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes. Iterations of the lakehouse.

Data Lake

Data Lake Data Warehouse Machine Learning Data-driven

Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics – Part 2

AWS Big Data

SEPTEMBER 12, 2024

In the era of digital transformation and data-driven decision making, organizations must rapidly harness insights from their data to deliver exceptional customer experiences and gain competitive advantage. Solution overview Salesforce Data Cloud provides a point-and-click experience to share data with a customer’s AWS account.

Data Lake

Data Lake Analytics Data-driven Data Strategy

Unlock data across organizational boundaries using Amazon DataZone – now generally available

AWS Big Data

OCTOBER 4, 2023

Amazon DataZone enables customers to discover, access, share, and govern data at scale across organizational boundaries, reducing the undifferentiated heavy lifting of making data and analytics tools accessible to everyone in the organization. This is challenging because access to data is managed differently by each of the tools.

Metadata

Metadata Data Lake Publishing Data Governance

erwin, Microsoft and the Power of the Common Data Model

erwin

DECEMBER 17, 2020

What is Microsoft’s Common Data Model (CDM), and why is it so powerful? The same is true for data, with a number of vendors creating data models by vertical industry (financial services, healthcare, etc.) and making them commercially available to improve how organizations understand and work with their data assets.

Modeling

Modeling Metadata Data-driven Data Lake

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes. Iterations of the lakehouse.

Data Lake

Data Lake Data Warehouse Machine Learning Data-driven

Apache Ozone and Dense Data Nodes

Cloudera

APRIL 22, 2021

Today’s enterprise data analytics teams are constantly looking to get the best out of their platforms. Storage plays one of the most important roles in the data platforms strategy, it provides the basis for all compute engines and applications to be built on top of it. Separates control and data plane enabling high performance.

Data Lake

Data Lake Cost-Benefit Metadata Big Data

Data Governance Makes Data Security Less Scary

erwin

OCTOBER 31, 2019

Do you know where your data is? What data you have? Add to the mix the potential for a data breach followed by non-compliance, reputational damage and financial penalties and a real horror story could unfold. s Information Commissioner’s Office had levied against both Facebook and Equifax for their data breaches.

Data Governance

Data Governance Metadata Risk Data Lake

Cross-account data collaboration with Amazon DataZone and AWS analytical tools

AWS Big Data

MARCH 5, 2025

Data sharing has become a crucial aspect of driving innovation, contributing to growth, and fostering collaboration across industries. According to this Gartner study , organizations promoting data sharing outperform their peers on most business value metrics. Data publishers : Users in producer AWS accounts.

Analytics

Analytics Publishing Metadata Sales

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

CIO Business Intelligence

APRIL 29, 2022

This first article emphasizes data as the ‘foundation-stone’ of AI-based initiatives. Establishing a Data Foundation. Software development, once solely the domain of human programmers, is now increasingly the by-product of data being carefully selected, ingested, and analysed by machine learning (ML) systems in a recurrent cycle.

Data Governance

Data Governance IT Risk Data Lake

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

We are excited to announce the general availability of Apache Iceberg in Cloudera Data Platform (CDP). These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. Why integrate Apache Iceberg with Cloudera Data Platform?

Data Lake

Data Lake Data Warehouse Data Architecture Metadata

Recap of Amazon Redshift key product announcements in 2024

How EUROGATE established a data mesh architecture using Amazon DataZone

Webinars

Trending Sources

Build a high-performance quant research platform with Apache Iceberg

Webinars

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

Data Lakes on Cloud & it’s Usage in Healthcare

Write queries faster with Amazon Q generative SQL for Amazon Redshift

Use open table format libraries on AWS Glue 5.0 for Apache Spark

What is a Data Mesh?

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Enrich your serverless data lake with Amazon Bedrock

Top analytics announcements of AWS re:Invent 2024

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Informatica’s new data management clouds target health, finance services

Read and write S3 Iceberg table using AWS Glue Iceberg Rest Catalog from Open Source Apache Spark

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Doing Cloud Migration and Data Governance Right the First Time

Regeneron turns to IT to accelerate drug discovery

Denodo Provides a Logical Approach to Data Management

Data Swamp, Data Lake, Data Lakehouse: What to Know

Access Amazon Redshift data from Salesforce Data Cloud with Zero Copy Data Federation

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Top 15 data management platforms

Governing data in relational databases using Amazon DataZone

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

Integrating Data Governance and Enterprise Architecture

The Future of the Data Lakehouse – Open

Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics – Part 2

Unlock data across organizational boundaries using Amazon DataZone – now generally available

erwin, Microsoft and the Power of the Common Data Model

The Future of the Data Lakehouse – Open

Apache Ozone and Dense Data Nodes

Data Governance Makes Data Security Less Scary

Cross-account data collaboration with Amazon DataZone and AWS analytical tools

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Stay Connected