Data Lake and Data Strategy - Data Leaders Brief

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake ( Apache Iceberg ) using AWS Glue. Delete the bucket.

Data Lake

Data Lake Data Processing Optimization Machine Learning

The data flywheel: A better way to think about your data strategy

CIO Business Intelligence

OCTOBER 25, 2022

Every day, it helps countless organizations do everything from measure their ESG impact to create new streams of revenue, and consequently, companies without strong data cultures or concrete plans to build one are feeling the pressure. Some are our clients—and more of them are asking our help with their data strategy.

Data Strategy

Data Strategy Strategy Data Lake Data-driven

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone now launched authentication supports through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more. Lionel Pulickal is Sr.

Visualization

Visualization Data Lake Testing Data Governance

The Unexpected Cost of Data Copies

Fortunately, a next-gen data architecture enabled by the Dremio data lake service removes the need for replicated data, helping organizations to minimize complexity, boost efficiency and dramatically reduce costs. Read this whitepaper to learn: Why organizations frequently end up with unnecessary data copies.

Data Lake

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

A modern data strategy redefines and enables sharing data across the enterprise and allows for both reading and writing of a singular instance of the data using an open table format. It enables organizations to quickly construct robust, high-performance data lakes that support ACID transactions and analytics workloads.

Data Lake

Data Lake Metadata Snapshot Analytics

Differences Between Data Lake and Data Warehouses

TDAN

SEPTEMBER 14, 2021

Data lake is a newer IT term created for a new category of data store. But just what is a data lake? According to IBM, “a data lake is a storage repository that holds an enormous amount of raw or refined data in native format until it is accessed.” That makes sense. I think the […].

Data Lake

Data Lake Data Warehouse IT Data Strategy

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.

Data Lake

Data Lake Data Processing Metadata Snapshot

How BMW streamlined data access using AWS Lake Formation fine-grained access control

AWS Big Data

OCTOBER 29, 2024

This led to inefficiencies in data governance and access control. AWS Lake Formation is a service that streamlines and centralizes the data lake creation and management process. The Solution: How BMW CDH solved data duplication The CDH is a company-wide data lake built on Amazon Simple Storage Service (Amazon S3).

Data Lake

Data Lake Sales Metadata Machine Learning

Deriving Value from Data Lakes with AI

Sisense

DECEMBER 23, 2019

AI and ML are the only ways to derive value from massive data lakes, cloud-native data warehouses, and other huge stores of information. Once your data is prepared for analysis, the next question is: how else can AI help you?

Data Lake

Data Lake Machine Learning Data Warehouse Data Science

Steps taken to build Sevita’s first enterprise data platform

CIO Business Intelligence

OCTOBER 23, 2024

But because of the infrastructure, employees spent hours on manual data analysis and spreadsheet jockeying. We had plenty of reporting, but very little data insight, and no real semblance of a data strategy. Once they were identified, we had to determine we had the right data.

Enterprise

Enterprise Dashboards KPI Data Lake

Architecture for the Data Lake

TDAN

JANUARY 3, 2023

For a while now, vendors have been advocating that people put their data in a data lake when they put their data in the cloud. The Data Lake The idea is that you put your data into a data lake. Then, at a later point in time, the end user analyst can come along and […].

Data Lake

Data Lake Data Architecture Data Warehouse Data Strategy

OCBC Bank Accelerates Its Data Strategy with Cloudera

Cloudera

DECEMBER 14, 2022

OCBC also won a Cloudera Data Impact Award 2022 in the Transformation category for the project. Real-time data analysis for better business and customer solutions. Ultimately, Cloudera’s support and platform will be helpful for us to accelerate our data strategy and allow us to continue innovating and grow with efficiency.” .

Data Strategy

Data Strategy Strategy IT Contextual Data

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

AWS Big Data

JUNE 23, 2023

Events and many other security data types are stored in Imperva’s Threat Research Multi-Region data lake. Imperva harnesses data to improve their business outcomes. As part of their solution, they are using Amazon QuickSight to unlock insights from their data.

Data Lake

Data Lake Dashboards Cost-Benefit Data Warehouse

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.

Data Lake

Data Lake Data Governance Machine Learning Cost-Benefit

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

AWS Big Data

OCTOBER 30, 2024

With this integration, you can now seamlessly query your governed data lake assets in Amazon DataZone using popular business intelligence (BI) and analytics tools, including partner solutions like Tableau. Joel has led data transformation projects on fraud analytics, claims automation, and Master Data Management.

Analytics

Analytics Visualization Data Governance Data-driven

The rise of the data lakehouse: A new era of data value

CIO Business Intelligence

AUGUST 18, 2022

Previously, Walgreens was attempting to perform that task with its data lake but faced two significant obstacles: cost and time. Those challenges are well-known to many organizations as they have sought to obtain analytical knowledge from their vast amounts of data. Lakehouses redeem the failures of some data lakes.

Data Lake

Data Lake Data Warehouse Unstructured Data Business Intelligence

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

DECEMBER 4, 2024

Open table formats are emerging in the rapidly evolving domain of big data management, fundamentally altering the landscape of data storage and analysis. By providing a standardized framework for data representation, open table formats break down data silos, enhance data quality, and accelerate analytics at scale.

Snapshot

Snapshot Metadata Data Lake Optimization

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of cross-functional governance structure for customer data.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

AWS Big Data

JANUARY 12, 2024

Ingestion: Data lake batch, micro-batch, and streaming Many organizations land their source data into their data lake in various ways, including batch, micro-batch, and streaming jobs. Amazon AppFlow can be used to transfer data from different SaaS applications to a data lake.

Data Lake

Data Lake Cost-Benefit Visualization Structured Data

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.

Analytics

Analytics Data Lake Metadata Data Warehouse

Creating Data Value With a Decentralized Data Strategy

CIO Business Intelligence

APRIL 6, 2022

For decades organizations chased the Holy Grail of a centralized data warehouse/lake strategy to support business intelligence and advanced analytics. That’s not to say that a decentralized data strategy wholly replaces the more traditional centralized data initiative — Maccaux emphasizes that there is a need for both.

Data Strategy

Data Strategy Strategy Internet of Things Data Warehouse

Why Modernizing the First Mile of the Data Pipeline Can Accelerate all Analytics

Cloudera

AUGUST 13, 2021

Every enterprise is trying to collect and analyze data to get better insights into their business. Whether it is consuming log files, sensor metrics, and other unstructured data, most enterprises manage and deliver data to the data lake and leverage various applications like ETL tools, search engines, and databases for analysis.

Analytics

Analytics Data Lake Unstructured Data Data Strategy

What you don’t know about data management could kill your business

CIO Business Intelligence

NOVEMBER 28, 2023

In reality MDM ( master data management ) means Major Data Mess at most large firms, the end result of 20-plus years of throwing data into data warehouses and data lakes without a comprehensive data strategy. Contributing to the general lack of data about data is complexity.

Management

Management Data Architecture Data Lake Data Strategy

Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics – Part 2

AWS Big Data

SEPTEMBER 12, 2024

In his current role at Salesforce, Sriram works on Zero Copy integration with major data lake partners and helps customers deliver value with their data strategies. Jason Berkowitz is a Senior Product Manager with AWS Lake Formation. He comes from a background in machine learning and data lake architectures.

Data Lake

Data Lake Analytics Data-driven Data Strategy

SoftBank Selects Cloudera Data Platform to Leverage Customer Intelligence While Ensuring Data Security

Cloudera

MAY 9, 2023

New Data Lakehouse Enables Stronger Data Governance SoftBank needed to reduce the number of workloads on its existing platform and decided to adopt Cloudera to build a data lake capable of managing data more effectively. We believe these new data analysis capabilities will boost what we can offer to our customers.”

Data Lake

Data Lake IoT Data Governance Data-driven

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

Note that the extra package ( delta-iceberg ) is required to create a UniForm table in AWS Glue Data Catalog. The extra package is also required to generate Iceberg metadata along with Delta Lake metadata for the UniForm table. He’s passionate about helping customers use Apache Iceberg for their data lakes on AWS.

Metadata

Metadata Data Warehouse Big Data Data Lake

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

This post explores how the shift to a data product mindset is being implemented, the challenges faced, and the early wins that are shaping the future of data management in the Institutional Division. About the Authors Leo Ramsamy is a Platform Architect specializing in data and analytics for ANZ’s Institutional division.

Metadata

Metadata Data Governance Data Quality Data-driven

Data platform, un impulso alla customer experience e ai progetti IA

CIO Business Intelligence

JUNE 17, 2024

Su questa data platform, infatti, convergono tutti i dati generati dagli utenti sui sistemi della società, ovviamente solo laddove abbiano dato il consenso previsto dalle norme per la privacy.

Data Lake

Data Lake Data Warehouse Data Strategy Strategy

A data strategy checklist for the journey to the data-driven enterprise

BI-Survey

DECEMBER 22, 2020

Managers see data as relevant in the context of digitalization, but often think of data-related problems as minor details that have little strategic importance. Thus, it is taken for granted that companies should have a data strategy. But what is the scope of an effective strategy and who is affected by it?

Data-driven

Data-driven Data Strategy Strategy Enterprise

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented in AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference. Starting with Amazon EMR release 6.7.0,

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

Data Strategies for Getting Greater Business Value from Distributed Data

Data Virtualization

MAY 19, 2023

Reading Time: 11 minutes The post Data Strategies for Getting Greater Business Value from Distributed Data appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information.

Data Strategy

Data Strategy Strategy Data Integration Management

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

11:30 AM – 12:30 PM (PDT) Ceasars Forum ANT318 | Accelerate innovation with end-to-end serverless data architecture. 4:30 PM – 5:30 PM (PDT) Wynn ANT207 | Understand your data with business context. 1:00 PM – 2:00 PM (PDT) Venetian ANT201 | Accelerate innovation with real-time data.

Analytics

Analytics Data Lake Data Warehouse Data-driven

Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics – Part 1

AWS Big Data

AUGUST 27, 2024

The Amazon Redshift service must be running in the same Region where the Salesforce Data Cloud is running. AWS admin roles for Lake Formation and Amazon Redshift: Lake Formation – A data lake admin for accepting the share and providing access to users. He helps customers become data-driven.

Data Lake

Data Lake Analytics Data-driven Management

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

AWS Big Data

FEBRUARY 27, 2024

The following are the key components of the Bluestone Data Platform: Data mesh architecture – Bluestone adopted a data mesh architecture, a paradigm that distributes data ownership across different business units. This enables data-driven decision-making across the organization.

Data-driven

Data-driven Data Lake Data Quality Data Governance

How Etihad taps data science to optimise airline operations

CIO Business Intelligence

MARCH 9, 2022

Despite the worldwide chaos, UAE national airline Etihad has managed to generate productivity gains and cost savings from insights using data science. Etihad began its data science journey with the Cloudera Data Platform and moved its data to the cloud to set up a data lake. A change was needed.

Data Science

Data Science Data Lake Cost-Benefit Digital Transformation

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

Comparison of modern data architectures : Architecture Definition Strengths Weaknesses Best used when Data warehouse Centralized, structured and curated data repository. Inflexible schema, poor for unstructured or real-time data. Data lake Raw storage for all types of structured and unstructured data.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

Build an Amazon Redshift data warehouse using an Amazon DynamoDB single-table design

AWS Big Data

JUNE 21, 2023

A typical ask for this data may be to identify sales trends as well as sales growth on a yearly, monthly, or even daily basis. A key pillar of AWS’s modern data strategy is the use of purpose-built data stores for specific use cases to achieve performance, cost, and scale. This is achieved by partitioning the data.

Data Warehouse

Data Warehouse Data Lake OLAP Cost-Benefit

Trends in Data Management and Analytics

TDAN

MARCH 19, 2019

Various databases, plus one or more data warehouses, have been the state-of-the art data management infrastructure in companies for years. The emergence of various new concepts, technologies, and applications such as Hadoop, Tableau, R, Power BI, or Data Lakes indicate that changes are under way.

Management

Management Data Warehouse Data Lake Analytics

Straumann Group is transforming dentistry with data, AI

CIO Business Intelligence

FEBRUARY 16, 2023

Selling the value of data transformation Iyengar and his team are 18 months into a three- to five-year journey that started by building out the data layer — corralling data sources such as ERP, CRM, and legacy databases into data warehouses for structured data and data lakes for unstructured data.

Unstructured Data

Unstructured Data Data Lake Prescriptive Analytics Data Warehouse

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Achieve your AI goals with an open data lakehouse approach

IBM Big Data Hub

OCTOBER 4, 2023

Artificial intelligence (AI) is now at the forefront of how enterprises work with data to help reinvent operations, improve customer experiences, and maintain a competitive advantage. It’s no longer a nice-to-have, but an integral part of a successful data strategy.

Data Lake

Data Lake Metadata Data Warehouse Cost-Benefit

Why Can’t we Advance Healthcare and Life Sciences this Fast all the time?

Cloudera

APRIL 4, 2022

While challenges exist in data interoperability, privacy controls, ongoing compliance initiatives, etc, the industry has proven speed is possible despite these obstacles. . The usage of data lakes and automation are helping facilitate the data sharing and collaboration across the healthcare ecosystem.

Data Lake

Data Lake Digital Transformation Manufacturing Sales

CIO Ryan Snyder on the benefits of interpreting data as a layer cake

CIO Business Intelligence

AUGUST 2, 2023

Ryan Snyder: For a long time, companies would just hire data scientists and point them at their data and expect amazing insights. That strategy is doomed to fail. The best way to start a data strategy is to establish some real value drivers that the business can get behind.

Manufacturing

Manufacturing Data Architecture Data Strategy Strategy

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

The data flywheel: A better way to think about your data strategy

Webinars

Trending Sources

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Webinars

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

The Unexpected Cost of Data Copies

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Differences Between Data Lake and Data Warehouses

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

How BMW streamlined data access using AWS Lake Formation fine-grained access control

Deriving Value from Data Lakes with AI

Steps taken to build Sevita’s first enterprise data platform

Architecture for the Data Lake

OCBC Bank Accelerates Its Data Strategy with Cloudera

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

The rise of the data lakehouse: A new era of data value

Use open table format libraries on AWS Glue 5.0 for Apache Spark

Create an end-to-end data strategy for Customer 360 on AWS

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

Top analytics announcements of AWS re:Invent 2024

Creating Data Value With a Decentralized Data Strategy

Why Modernizing the First Mile of the Data Pipeline Can Accelerate all Analytics

What you don’t know about data management could kill your business

Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics – Part 2

SoftBank Selects Cloudera Data Platform to Leverage Customer Intelligence While Ensuring Data Security

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

Data platform, un impulso alla customer experience e ai progetti IA

A data strategy checklist for the journey to the data-driven enterprise

AWS Lake Formation 2022 year in review

Data Strategies for Getting Greater Business Value from Distributed Data

Your guide to AWS Analytics at AWS re:Invent 2023

Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics – Part 1

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

How Etihad taps data science to optimise airline operations

Data’s dark secret: Why poor quality cripples AI and growth

Build an Amazon Redshift data warehouse using an Amazon DynamoDB single-table design

Trends in Data Management and Analytics

Straumann Group is transforming dentistry with data, AI

Data governance in the age of generative AI

Achieve your AI goals with an open data lakehouse approach

Why Can’t we Advance Healthcare and Life Sciences this Fast all the time?

CIO Ryan Snyder on the benefits of interpreting data as a layer cake

Stay Connected