A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure it, and then run different types of analytics on it for better business insights.
How will organizations wield AI to seize greater opportunities, engage employees, and drive secure access without compromising data integrity and compliance? While it may sound simplistic, the first step towards managing high-quality data and right-sizing AI is defining the GenAI use cases for your business.
A modern data strategy redefines and enables sharing data across the enterprise and allows for both reading and writing of a singular instance of the data using an open table format.
Amazon Redshift enables you to directly access data stored in Amazon Simple Storage Service (Amazon S3) using SQL queries and join data across your data warehouse and data lake. With Amazon Redshift, you can query the data in your S3 data lake using a central AWS Glue metastore from your Redshift data warehouse.
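As a rough sketch of that pattern, the example below uses the Amazon Redshift Data API (via boto3) to run a query that joins a local warehouse table with an external table registered in the AWS Glue Data Catalog. The workgroup, database, schema, and table names are placeholders for illustration, not values from the article, and the external schema is assumed to have been created beforehand with CREATE EXTERNAL SCHEMA ... FROM DATA CATALOG.

```python
import time
import boto3

# Placeholder names -- assumptions for illustration, not from the article.
WORKGROUP = "analytics-wg"   # Amazon Redshift Serverless workgroup
DATABASE = "dev"

# Join a local warehouse table with an external (S3) table that is registered
# in the AWS Glue Data Catalog and exposed through an external schema.
SQL = """
SELECT o.customer_id,
       SUM(o.order_total) AS warehouse_revenue,
       COUNT(c.event_id)  AS lake_click_events
FROM   sales.orders o                   -- local Redshift table
JOIN   spectrum_schema.clickstream c    -- external table on Amazon S3
ON     o.customer_id = c.customer_id
GROUP  BY o.customer_id;
"""

client = boto3.client("redshift-data")

# The Data API runs statements asynchronously; submit, then poll for completion.
statement = client.execute_statement(
    WorkgroupName=WORKGROUP,
    Database=DATABASE,
    Sql=SQL,
)

while True:
    desc = client.describe_statement(Id=statement["Id"])
    if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(2)

if desc["Status"] == "FINISHED":
    rows = client.get_statement_result(Id=statement["Id"])["Records"]
    print(f"Fetched {len(rows)} rows joining warehouse and data lake tables")
else:
    print("Query did not finish:", desc.get("Error"))
```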
Organizations are increasingly using a multi-cloud strategy to run their production workloads. We often see requests from customers who started their data journey by building data lakes on Microsoft Azure and now want to extend access to that data to AWS services.
Beyond breaking down silos, modern data architectures need to provide interfaces that make it easy for users to consume data using tools fit for their jobs, and data must be able to move freely to and from data warehouses, data lakes, and data marts.
Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.
Unified access to your data is provided by Amazon SageMaker Lakehouse, a unified, open, and secure data lakehouse built on Apache Iceberg open standards. To identify the most promising opportunities, the team develops a segmentation strategy. The data analyst then discovers it and creates a comprehensive view of their market.
Effective data analytics relies on seamlessly integrating data from disparate systems through identifying, gathering, cleansing, and combining relevant data into a unified format. This solution also allows you to update certain fields of the account object in the data lake and push it back to Salesforce.
In our previous post, Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg, we showed how to use Apache Iceberg in the context of strategy backtesting. Our analysis shows that Iceberg can accelerate query performance by up to 52%, reduce operational costs, and significantly improve data management at scale.
Even the most sophisticated models and platforms can be undone by a single point of failure: poor data quality. This challenge remains deceptively overlooked despite its profound impact on strategy and execution. I aim to outline pragmatic strategies to elevate data quality into an enterprise-wide capability.
Reading Time: 6 minutes Data lakes, by combining the flexibility of object storage with the scalability and agility of cloud platforms, are becoming an increasingly popular choice as an enterprise data repository. Whether you are on Amazon Web Services (AWS) and leverage AWS S3.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
Disaster recovery is vital for organizations, offering a proactive strategy to mitigate the impact of unforeseen events like system failures, natural disasters, or cyberattacks. In Disaster Recovery (DR) Architecture on AWS, Part I: Strategies for Recovery in the Cloud, we introduced four major strategies for disaster recovery (DR) on AWS.
Having a clearly defined digital transformation strategy is an essential best practice for successful digital transformation. But what makes a viable digital transformation strategy? Constructing A Digital Transformation Strategy: Data Enablement. With automation, data quality is systemically assured.
In the era of big data, datalakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.
The infrastructure provides an analytics experience to hundreds of in-house analysts, data scientists, and student-facing frontend specialists. The data engineering team is on a mission to modernize its data integration platform to be agile, adaptive, and straightforward to use.
The Perilous State of Today’s Data Environments: Data teams often navigate a labyrinth of chaos within their databases. Extrinsic Control Deficit: Many of these changes stem from tools and processes beyond the immediate control of the data team.
However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to addressing these challenges and fostering a data-driven culture. To achieve this, they plan to use machine learning (ML) models to extract insights from data.
The original proof of concept was to have one data repository ingesting data from 11 sources, including flat files and data stored via APIs on premises and in the cloud, Pruitt says. “There are a lot of variables that determine what should go into the data lake and what will probably stay on premises,” Pruitt says.
As your company transforms, you need a data foundation for business applications, new technical innovations, and data-driven business initiatives. With new capabilities for self-service and simpler builder experiences, you can democratize data access for line-of-business users, analysts, scientists, and engineers.
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.
Ingestion: Data lake batch, micro-batch, and streaming. Many organizations land their source data in their data lake in various ways, including batch, micro-batch, and streaming jobs. Amazon AppFlow can be used to transfer data from different SaaS applications to a data lake.
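As a minimal sketch of the Amazon AppFlow piece, assuming a flow (here called salesforce-to-s3-daily, a hypothetical name) has already been configured with a SaaS application such as Salesforce as the source and an S3 data lake prefix as the destination, an on-demand run could be triggered and checked from Python like this:

```python
import boto3

appflow = boto3.client("appflow")

# Hypothetical, pre-configured flow that copies SaaS (e.g., Salesforce) records
# into an Amazon S3 prefix used by the data lake.
FLOW_NAME = "salesforce-to-s3-daily"

# Trigger an on-demand (batch-style) run of the flow.
run = appflow.start_flow(flowName=FLOW_NAME)
print("Started execution:", run.get("executionId"))

# Inspect recent executions to confirm the load landed in the data lake.
history = appflow.describe_flow_execution_records(flowName=FLOW_NAME)
for record in history.get("flowExecutions", [])[:5]:
    print(record["executionId"], record["executionStatus"])
```

Schedule- or event-triggered flows cover the micro-batch case, while streaming sources typically land in the lake through services such as Amazon Kinesis rather than AppFlow.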
The data lakehouse is a relatively new data architecture concept, first championed by Cloudera, which offers both storage and analytics capabilities as part of the same solution. This contrasts with the data lake and the data warehouse, which store data in its native format and structured (often SQL) data, respectively.
A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of cross-functional governance structure for customer data.
We have seen strong customer demand to expand its scope to cloud-based data lakes, because data lakes are increasingly the enterprise solution for large-scale data initiatives due to their power and capabilities. Let’s say that this company is located in Europe and the data product must comply with the GDPR.
Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP.
The desire to modernize technology, over time, leads to acquiring many different systems with various data entry points and transformation rules for data as it moves into and across the organization. The metadata-driven suite automatically finds, models, ingests, catalogs and governs cloud data assets.
Zero-ETL integration also enables you to load and analyze data from multiple operational database clusters in a new or existing Amazon Redshift instance to derive holistic insights across many applications. Learn more about the zero-ETL integrations, data lake performance enhancements, and other announcements below.
As organizations increasingly rely on data stored across various platforms, such as Snowflake, Amazon Simple Storage Service (Amazon S3), and software as a service (SaaS) applications, the challenge of bringing these disparate data sources together has never been more pressing.
Next-generation big data platforms and long-running batch jobs operated by a central team of data engineers have often led to data lake swamps. Data architecture is therefore a foundational element of your business strategy for higher data quality. Practice proper data hygiene across interfaces.
This would be a straightforward task were it not for the fact that, in the digital era, there has been an explosion of data, collected and stored everywhere, much of it poorly governed, ill-understood, and irrelevant. Further, data management activities don’t end once the AI model has been developed. Addressing the Challenge.
To companies entrenched in decades-old business and IT processes, data fiefdoms, and legacy systems, the task may seem insurmountable. Develop a strategy to liberate data. Which type(s) of storage consolidation you use depends on the data you generate and collect. Set up unified data governance rules and processes.
Selling the value of data transformation: Iyengar and his team are 18 months into a three- to five-year journey that started by building out the data layer, corralling data sources such as ERP, CRM, and legacy databases into data warehouses for structured data and data lakes for unstructured data.
Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).
It requires taking data from equipment sensors, applying advanced analytics to derive descriptive and predictive insights, and automating corrective actions. The end-to-end process requires several steps, including data integration and algorithm development, training, and deployment.
Reading Time: 11 minutes The post Data Strategies for Getting Greater Business Value from Distributed Data appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information.
Cloudera will benefit from the operating capabilities, capital support and expertise of Clayton, Dubilier & Rice (CD&R) and KKR, two of the most experienced and successful global investment firms in the world, recognized for supporting the growth strategies of the businesses they back. Our strategy.
In my last post, I covered some of the latest best practices for enhancing data management capabilities in the cloud. Despite the increasing popularity of cloud services, enterprises continue to struggle with creating and implementing a comprehensive cloud strategy that.
Get a closer look at how scaling for data warehousing works in AWS with the latest introduction of AI-driven scaling and optimizations in Amazon Redshift Serverless to enable better price-performance for your workloads. Discover how you can use Amazon Redshift to build a data mesh architecture to analyze your data.
The term “data management platform” can be confusing because, while it sounds like a generalized product that works with all forms of data as part of generalized data management strategies, the term has been more narrowly defined of late as one targeted to marketing departments’ needs. Of course, marketing also works.
A data and analytics capability cannot emerge from an IT or business strategy alone; that strategy is doomed to fail. With both technology and the business organization deeply involved in the what, why, and how of data, companies need to create cross-functional data teams to get the most out of it. What are the layers?
The growing amount and increasingly varied sources of data that every organization generates make digital transformation a daunting prospect. At Sisense, we’re dedicated to making this complex task simple, putting power in the hands of the builders of business data and strategy, and providing insights for everyone.