Data Integration, Data Lake and Strategy

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Build Write-Audit-Publish pattern with Apache Iceberg branching and AWS Glue Data Quality

AWS Big Data

DECEMBER 9, 2024

The importance of publishing only high-quality data can’t be overstated: it’s the foundation for accurate analytics, reliable machine learning (ML) models, and sound decision-making. AWS Glue is a serverless data integration service that you can use to effectively monitor and manage data quality through AWS Glue Data Quality.

Data Quality

Data Quality Publishing Snapshot Data Lake

The success of GenAI models lies in your data management strategy

CIO Business Intelligence

OCTOBER 9, 2024

How will organizations wield AI to seize greater opportunities, engage employees, and drive secure access without compromising data integrity and compliance? While it may sound simplistic, the first step towards managing high-quality data and right-sizing AI is defining the GenAI use cases for your business.

Strategy

Strategy Modeling Management Data Lake

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

A modern data strategy redefines and enables sharing data across the enterprise and allows for both reading and writing of a singular instance of the data using an open table format.

Data Lake

Data Lake Metadata Snapshot Analytics

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Amazon Redshift enables you to directly access data stored in Amazon Simple Storage Service (Amazon S3) using SQL queries and join data across your data warehouse and data lake. With Amazon Redshift, you can query the data in your S3 data lake using a central AWS Glue metastore from your Redshift data warehouse.

Data Lake

Data Lake Data Warehouse Metadata Data Architecture

Accelerate data integration with Salesforce and AWS using AWS Glue

AWS Big Data

SEPTEMBER 4, 2024

Effective data analytics relies on seamlessly integrating data from disparate systems through identifying, gathering, cleansing, and combining relevant data into a unified format. This solution also allows you to update certain fields of the account object in the data lake and push it back to Salesforce.

Data Integration

Data Integration Data Lake Data-driven Cost-Benefit

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

AWS Big Data

SEPTEMBER 10, 2024

Organizations are increasingly using a multi-cloud strategy to run their production workloads. We often see requests from customers who have started their data journey by building data lakes on Microsoft Azure, to extend access to the data to AWS services.

Data Lake

Data Lake Metadata Management Software

Accelerate analytics and AI innovation with the next generation of Amazon SageMaker

AWS Big Data

MARCH 13, 2025

Unified access to your data is provided by Amazon SageMaker Lakehouse, a unified, open, and secure data lakehouse built on Apache Iceberg open standards. To identify the most promising opportunities, the team develops a segmentation strategy. The data analyst then discovers it and creates a comprehensive view of their market.

Analytics

Analytics Data Lake Data Warehouse Data-driven

What is data architecture? A framework to manage data

CIO Business Intelligence

DECEMBER 20, 2024

Beyond breaking down silos, modern data architectures need to provide interfaces that make it easy for users to consume data using tools fit for their jobs. Data must be able to freely move to and from data warehouses, data lakes, and data marts, and interfaces must make it easy for users to consume that data.

Data Architecture

Data Architecture Management Consulting Internet of Things

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

In our previous post, Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg, we showed how to use Apache Iceberg in the context of strategy backtesting. Our analysis shows that Iceberg can accelerate query performance by up to 52%, reduce operational costs, and significantly improve data management at scale.

Metadata

Metadata Snapshot Cost-Benefit Optimization

Achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore

AWS Big Data

NOVEMBER 11, 2024

Disaster recovery is vital for organizations, offering a proactive strategy to mitigate the impact of unforeseen events like system failures, natural disasters, or cyberattacks. In Disaster Recovery (DR) Architecture on AWS, Part I: Strategies for Recovery in the Cloud, we introduced four major strategies for disaster recovery (DR) on AWS.

Snapshot

Snapshot Strategy Dashboards Data Lake

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

The Key Components of a Successful Data Lake Strategy

Data Virtualization

MARCH 16, 2023

Data lakes, by combining the flexibility of object storage with the scalability and agility of cloud platforms, are becoming an increasingly popular choice as an enterprise data repository. Whether you are on Amazon Web Services (AWS) and leverage AWS S3.

Data Lake

Data Lake Strategy Data Integration Enterprise

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.

Metadata

Metadata Snapshot Data Lake Metrics

How Kaplan, Inc. implemented modern data pipelines using Amazon MWAA and Amazon AppFlow with Amazon Redshift as a data warehouse

AWS Big Data

AUGUST 22, 2024

The infrastructure provides an analytics experience to hundreds of in-house analysts, data scientists, and student-facing frontend specialists. The data engineering team is on a mission to modernize its data integration platform to be agile, adaptive, and straightforward to use.

Data Warehouse

Data Warehouse Data Lake Data Integration Management

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of cross-functional governance structure for customer data.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Navigating the Chaos of Unruly Data: Solutions for Data Teams

DataKitchen

NOVEMBER 10, 2023

The Perilous State of Today’s Data Environments: Data teams often navigate a labyrinth of chaos within their databases. Extrinsic Control Deficit: Many of these changes stem from tools and processes beyond the immediate control of the data team.

Data Quality

Data Quality Testing Data Lake Data Integration

Avoid generative AI malaise to innovate and build business value

CIO Business Intelligence

APRIL 1, 2024

The research cited a lack of talent and skills to work with the technology (62%), unclear AI and GenAI investment priorities (47%), and the absence of a strategy for responsible AI (41%) as the top three obstacles. Reach consensus on strategy. GenAI requires high-quality data. But how do you get there? This playbook can help.

Data Lake

Data Lake Uncertainty Consulting Risk

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

AWS Big Data

JANUARY 12, 2024

Ingestion: Data lake batch, micro-batch, and streaming. Many organizations land their source data into their data lake in various ways, including batch, micro-batch, and streaming jobs. Amazon AppFlow can be used to transfer data from different SaaS applications to a data lake.

Data Lake

Data Lake Cost-Benefit Visualization Structured Data

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.

Analytics

Analytics Data Lake Metadata Data Warehouse

Data transformation takes flight at Atlanta’s Hartsfield-Jackson airport

CIO Business Intelligence

AUGUST 9, 2024

The original proof of concept was to have one data repository ingesting data from 11 sources, including flat files and data stored via APIs on premises and in the cloud, Pruitt says. “There are a lot of variables that determine what should go into the data lake and what will probably stay on premise,” Pruitt says.

Data Transformation

Data Transformation Machine Learning Data Lake Dashboards

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

Zero-ETL integration also enables you to load and analyze data from multiple operational database clusters in a new or existing Amazon Redshift instance to derive holistic insights across many applications. Learn more about the zero-ETL integrations, data lake performance enhancements, and other announcements below.

Data Warehouse

Data Warehouse Analytics Data Lake Machine Learning

Databricks’ new data lakehouse aims at media, entertainment sector

CIO Business Intelligence

APRIL 25, 2022

The data lakehouse is a relatively new data architecture concept, first championed by Cloudera, which offers both storage and analytics capabilities as part of the same solution, in contrast to the concepts for data lake and data warehouse which, respectively, store data in native format, and structured data, often in SQL format.

Recreation/Entertainment

Recreation/Entertainment Data Lake Data Warehouse Unstructured Data

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

AWS Big Data

AUGUST 19, 2024

As organizations increasingly rely on data stored across various platforms, such as Snowflake, Amazon Simple Storage Service (Amazon S3), and various software as a service (SaaS) applications, the challenge of bringing these disparate data sources together has never been more pressing.

Analytics

Analytics Data-driven Data Integration Data Lake

Doing Cloud Migration and Data Governance Right the First Time

erwin

OCTOBER 8, 2020

The desire to modernize technology, over time, leads to acquiring many different systems with various data entry points and transformation rules for data as it moves into and across the organization. The metadata-driven suite automatically finds, models, ingests, catalogs and governs cloud data assets.

Data Governance

Data Governance Metadata Testing Data Lake

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Next generation of big data platforms and long running batch jobs operated by a central team of data engineers have often led to data lake swamps. Meaning, data architecture is a foundational element of your business strategy for higher data quality. Practice proper data hygiene across interfaces.

Data Architecture

Data Architecture Data Quality Strategy Data Lake

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

CIO Business Intelligence

APRIL 29, 2022

This would be a straightforward task were it not for the fact that, during the digital era, there has been an explosion of data – collected and stored everywhere – much of it poorly governed, ill-understood, and irrelevant. Further, data management activities don’t end once the AI model has been developed. Addressing the Challenge.

Data Governance

Data Governance IT Data Lake Risk

Your 5-Step Journey from Analytics to AI

CIO Business Intelligence

MARCH 22, 2022

To companies entrenched in decades-old business and IT processes, data fiefdoms, and legacy systems, the task may seem insurmountable. Develop a strategy to liberate data. Which type(s) of storage consolidation you use depends on the data you generate and collect. Set up unified data governance rules and processes.

Analytics

Analytics Key Performance Indicator Data Warehouse Data-driven

Straumann Group is transforming dentistry with data, AI

CIO Business Intelligence

FEBRUARY 16, 2023

Selling the value of data transformation. Iyengar and his team are 18 months into a three- to five-year journey that started by building out the data layer, corralling data sources such as ERP, CRM, and legacy databases into data warehouses for structured data and data lakes for unstructured data.

Unstructured Data

Unstructured Data Data Lake Prescriptive Analytics Data Warehouse

It’s not your data. It’s how you use it. Unlock the power of data & build foundations of a data driven organisation

CIO Business Intelligence

MAY 24, 2022

Australian research and advisory firm Adapt identifies an organisation’s ability to execute a data-driven strategy as one of 12 core competencies, identified from 30,000 conversations spanning three years with leading IT and businesses. This is the first post in a series of three on data-driven organisations. Oil and Gas.

Data-driven

Data-driven Data Lake Data Warehouse Machine Learning

P&G turns to AI to create digital manufacturing of the future

CIO Business Intelligence

OCTOBER 1, 2022

It requires taking data from equipment sensors, applying advanced analytics to derive descriptive and predictive insights, and automating corrective actions. The end-to-end process requires several steps, including data integration and algorithm development, training, and deployment.

Manufacturing

Manufacturing Digital Transformation IoT Internet of Things

Data Strategies for Getting Greater Business Value from Distributed Data

Data Virtualization

MAY 19, 2023

Data Strategy

Data Strategy Strategy Data Integration Management

Turning the page

Cloudera

JUNE 1, 2021

Cloudera will benefit from the operating capabilities, capital support and expertise of Clayton, Dubilier & Rice (CD&R) and KKR – two of the most experienced and successful global investment firms in the world recognized for supporting the growth strategies of the businesses they back. Our strategy.

Uncertainty

Uncertainty Cost-Benefit Risk Strategy

Accelerate Cloud Data Integration with Data Virtualization in the Cloud

Data Virtualization

JULY 8, 2020

In my last post, I covered some of the latest best practices for enhancing data management capabilities in the cloud. Despite the increasing popularity of cloud services, enterprises continue to struggle with creating and implementing a comprehensive cloud strategy that.

Data Integration

Data Integration Strategy Enterprise Management

Top 15 data management platforms available today

CIO Business Intelligence

SEPTEMBER 22, 2023

The term “data management platform” can be confusing because, while it sounds like a generalized product that works with all forms of data as part of generalized data management strategies, the term has been more narrowly defined of late as one targeted to marketing departments’ needs. Of course, marketing also works.

Management

Management Advertising Data Lake Sales

CIO Ryan Snyder on the benefits of interpreting data as a layer cake

CIO Business Intelligence

AUGUST 2, 2023

A data and analytics capability cannot emerge from an IT or business strategy alone. With both technology and business organization deeply involved in the what, why, and how of data, companies need to create cross-functional data teams to get the most out of it. That strategy is doomed to fail. What are the layers?

Manufacturing

Manufacturing Data Architecture Data Strategy Strategy

The Data Journey: From Raw Data to Insights

Sisense

JULY 22, 2020

The growing amount and increasingly varied sources of data that every organization generates make digital transformation a daunting prospect. At Sisense, we’re dedicated to making this complex task simple, putting power in the hands of the builders of business data and strategy, and providing insights for everyone.

Slice and Dice

Slice and Dice Digital Transformation Data Warehouse Data Lake

Differentiate generative AI applications with your data using AWS analytics and managed databases

AWS Big Data

SEPTEMBER 12, 2024

The application gets prompt templates from an S3 data lake and creates the engineered prompt. The user interaction is stored in a data lake for downstream usage and BI analysis. The application sends the prompt to Amazon Bedrock and retrieves the LLM output.

Management

Management Analytics Data Lake Interactive

ESG software: 6 tips for selecting the best fit for your business

CIO Business Intelligence

FEBRUARY 22, 2024

Increasing pressures around environment, social, and governance (ESG) concerns have organizations across industries turning to their CIOs to revamp their strategies for ESG reporting. To date, many companies have merely repurposed existing technology solutions for their ESG reporting needs.

Software

Software Reporting KPI Enterprise

A hybrid approach in healthcare data warehousing with Amazon Redshift

AWS Big Data

FEBRUARY 21, 2023

Loading complex multi-point datasets into a dimensional model, identifying issues, and validating data integrity of the aggregated and merged data points are the biggest challenges that clinical quality management systems face. What is a dimensional data model? It optimizes the database for faster data retrieval.

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Metadata

Improving Multi-tenancy with Virtual Private Clusters

Cloudera

JUNE 6, 2019

While this approach provides isolation, it creates another significant challenge: duplication of data, metadata, and security policies, or ‘split-brain’ data lake. Now the admins need to synchronize multiple copies of the data and metadata and ensure that users across the many clusters are not viewing stale information.

Metadata

Metadata Data Lake Optimization Strategy
