Data Architecture, Data Lake and Data Strategy

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue.

Data Lake

Data Lake Data Processing Optimization Machine Learning

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

A modern data strategy redefines and enables sharing data across the enterprise and allows for both reading and writing of a singular instance of the data using an open table format. It enables organizations to quickly construct robust, high-performance data lakes that support ACID transactions and analytics workloads.

Data Lake

Data Lake Metadata Snapshot Analytics


Data Architecture and Strategy in the AI Era

Cloudera

MARCH 28, 2024

But, even with the backdrop of an AI-dominated future, many organizations still find themselves struggling with everything from managing data volumes and complexity to security concerns to rapidly proliferating data silos and governance challenges.

Data Architecture

Data Architecture Strategy Data Lake Data-driven

The Unexpected Cost of Data Copies

Unfortunately, data replication, transformation, and movement can result in longer time to insight, reduced efficiency, elevated costs, and increased security and compliance risk. Read this whitepaper to learn: Why organizations frequently end up with unnecessary data copies.

Data Lake

Modern Data Architecture for Telecommunications

Cloudera

SEPTEMBER 6, 2022

Data has continued to grow both in scale and in importance through this period, and today telecommunications companies are increasingly seeing data architecture as an independent organizational challenge, not merely an item on an IT checklist. Previously, there were three types of data structures in telco: […].

Data Architecture

Data Architecture Cost-Benefit Digital Transformation Business Driver

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. Of those tables, some are larger (such as in terms of record volume) than others, and some are updated more frequently than others.

Data Lake

Data Lake Data Processing Metadata Snapshot

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

We also examine how centralized, hybrid and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise’s core has never been more significant.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

Architecture for the Data Lake

TDAN

JANUARY 3, 2023

For a while now, vendors have been advocating that people put their data in a data lake when they put their data in the cloud. The Data Lake: The idea is that you put your data into a data lake. Then, at a later point in time, the end-user analyst can come along and […].

Data Lake

Data Lake Data Architecture Data Warehouse Data Strategy

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Several factors determine the quality of your enterprise data, such as accuracy, completeness, and consistency. But there’s another factor of data quality that doesn’t get the recognition it deserves: your data architecture. How the right data architecture improves data quality.

Data Architecture

Data Architecture Data Quality Strategy Data Lake

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.
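Handling data "at a record level" means each change captured from the source database (insert, update or delete) must be applied to the matching row by primary key, rather than simply appending new files. The sketch below shows that idea in plain Python under stated assumptions. The record shape, the `Op` flag values (`I`, `U`, `D`) and the `apply_cdc` helper are hypothetical illustrations, not the AWS DMS or Apache Iceberg API. Table formats such as Apache Iceberg expose the same logic at scale through operations like `MERGE INTO`.

```python
from typing import Dict, List

# Hypothetical CDC record: a row plus an "Op" flag of "I" (insert),
# "U" (update) or "D" (delete). This shape is an assumption for illustration.
Record = Dict[str, object]


def apply_cdc(table: Dict[object, Record], changes: List[Record],
              key: str = "id") -> Dict[object, Record]:
    """Apply CDC changes, in order, to a table keyed by primary key.

    Returns a new dict; the input table is left unmodified.
    """
    result = dict(table)
    for change in changes:
        op = change["Op"]
        # Strip the operation flag so only the row's columns remain.
        row = {k: v for k, v in change.items() if k != "Op"}
        pk = row[key]
        if op in ("I", "U"):
            # Upsert: an insert or update replaces the whole row for this key.
            result[pk] = row
        elif op == "D":
            # Delete: drop the row if it exists; ignore unknown keys.
            result.pop(pk, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result


if __name__ == "__main__":
    current = {1: {"id": 1, "name": "alice"}, 2: {"id": 2, "name": "bob"}}
    changes = [
        {"Op": "U", "id": 1, "name": "alice v2"},
        {"Op": "D", "id": 2},
        {"Op": "I", "id": 3, "name": "carol"},
    ]
    print(apply_cdc(current, changes))
```

Changes are applied strictly in arrival order, so a later update to the same key overwrites an earlier one. This ordering is the property that makes record-level CDC correct.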

Data Lake

Data Lake Data Governance Machine Learning Cost-Benefit

What you don’t know about data management could kill your business

CIO Business Intelligence

NOVEMBER 28, 2023

But at the other end of the attention spectrum is data management, which all too frequently is perceived as being boring, tedious, the work of clerks and admins, and ridiculously expensive. Still, to truly create lasting value with data, organizations must develop data management mastery. Seven individuals raised their hands.

Management

Management Data Architecture Data Lake Data Strategy

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architecture is a complex and varied field, and different organizations and industries have unique needs when it comes to their data architects. Solutions data architect: These individuals design and implement data solutions for specific business needs, including data warehouses, data marts, and data lakes.

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

The landscape of big data management has been transformed by the rising popularity of open table formats such as Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake. These formats, designed to address the limitations of traditional data storage systems, have become essential in modern data architectures.

Metadata

Metadata Data Warehouse Big Data Data Lake

Data democratization: How data architecture can drive business decisions and AI initiatives

IBM Big Data Hub

AUGUST 4, 2023

Today, the way businesses use data is much more fluid; data literate employees use data across hundreds of apps, analyze data for better decision-making, and access data from numerous locations. Then, it applies these insights to automate and orchestrate the data lifecycle.

Data Architecture

Data Architecture Data Lake Machine Learning Data Governance

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of cross-functional governance structure for customer data.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.

Analytics

Analytics Data Lake Metadata Data Warehouse

SoftBank Selects Cloudera Data Platform to Leverage Customer Intelligence While Ensuring Data Security

Cloudera

MAY 9, 2023

New Data Lakehouse Enables Stronger Data Governance. SoftBank needed to reduce the number of workloads on its existing platform and decided to adopt Cloudera to build a data lake capable of managing data more effectively. “We believe these new data analysis capabilities will boost what we can offer to our customers.”

Data Lake

Data Lake IoT Data Governance Data-driven

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

AWS Big Data

JANUARY 12, 2024

Ingestion: Data lake batch, micro-batch, and streaming. Many organizations land their source data into their data lake in various ways, including batch, micro-batch, and streaming jobs. Amazon AppFlow can be used to transfer data from different SaaS applications to a data lake.

Data Lake

Data Lake Cost-Benefit Visualization Structured Data

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented in AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference. Starting with Amazon EMR release 6.7.0,

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

Building a vision for real-time artificial intelligence

CIO Business Intelligence

APRIL 12, 2023

After walking his executive team through the data hops, flows, integrations, and processing across different ingestion software, databases, and analytical platforms, they were shocked by the complexity of their current data architecture and technology stack. It isn’t easy.

Machine Learning

Machine Learning Cost-Benefit Data-driven Strategy

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

AWS Big Data

FEBRUARY 27, 2024

The following are the key components of the Bluestone Data Platform: Data mesh architecture – Bluestone adopted a data mesh architecture, a paradigm that distributes data ownership across different business units. This enables data-driven decision-making across the organization.

Data-driven

Data-driven Data Lake Data Quality Data Governance

CIO Ryan Snyder on the benefits of interpreting data as a layer cake

CIO Business Intelligence

AUGUST 2, 2023

Martha Heller: What are the business drivers behind the data architecture ecosystem you’re building at Thermo Fisher Scientific? Ryan Snyder: For a long time, companies would just hire data scientists and point them at their data and expect amazing insights. That strategy is doomed to fail.

Manufacturing

Manufacturing Data Architecture Data Strategy Strategy

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

11:30 AM – 12:30 PM (PDT) Caesars Forum ANT318 | Accelerate innovation with end-to-end serverless data architecture. 4:30 PM – 5:30 PM (PDT) Wynn ANT207 | Understand your data with business context. 1:00 PM – 2:00 PM (PDT) Venetian ANT201 | Accelerate innovation with real-time data.

Analytics

Analytics Data Lake Data Warehouse Data-driven

A data strategy checklist for the journey to the data-driven enterprise

BI-Survey

DECEMBER 22, 2020

Managers see data as relevant in the context of digitalization, but often think of data-related problems as minor details that have little strategic importance. Thus, it is taken for granted that companies should have a data strategy. But what is the scope of an effective strategy and who is affected by it?

Data-driven

Data-driven Data Strategy Strategy Enterprise

Addressing the Elephant in the Room – Welcome to Today’s Cloudera

Cloudera

JUNE 13, 2024

After countless open-source innovations ushered in the Big Data era, including the first commercial distribution of HDFS (Apache Hadoop Distributed File System), commonly referred to as Hadoop, the two companies joined forces, giving birth to an entire ecosystem of technology and tech companies.

Big Data

Big Data Machine Learning Contextual Data Data Lake

Data Strategies for Getting Greater Business Value from Distributed Data

Data Virtualization

MAY 19, 2023

The post Data Strategies for Getting Greater Business Value from Distributed Data appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information.

Data Strategy

Data Strategy Strategy Data Integration Management

Data science vs data analytics: Unpacking the differences

IBM Big Data Hub

SEPTEMBER 19, 2023

How effectively and efficiently an organization can conduct data analytics is determined by its data strategy and data architecture, which allows an organization, its users and its applications to access different types of data regardless of where that data resides.

Data Science

Data Science Data Analytics Prescriptive Analytics Analytics

Unstructured data management and governance using AWS AI/ML and analytics services

AWS Big Data

OCTOBER 25, 2023

The following is a high-level architecture of the solution we can build to process the unstructured data, assuming the input data is being ingested to the raw input object store. The steps of the workflow are as follows: Integrated AI services extract data from the unstructured data.

Unstructured Data

Unstructured Data Metadata Management Analytics

Achieving Trusted AI in Manufacturing

Cloudera

JANUARY 30, 2024

Simply put, many organizations fail to realize the value of AI because they rely on AI tools and data science that is being applied to data which is faulty to begin with. Trusted AI begins with trusted data. What resolves the data challenge and fuels data-driven AI in manufacturing? Eliminate data silos.

Manufacturing

Manufacturing Contextual Data IoT Internet of Things

Supporting Transformation with an Integrated Data Platform. Three Common Questions Answered.

Cloudera

SEPTEMBER 8, 2021

CDOs are under increasing pressure to reduce costs by moving data and workloads to the cloud, similar to what has happened with business applications during the last decade. Our upcoming webinar is centered on how an integrated data platform supports the data strategy and goals of becoming a data-driven company.

Data Lake

Data Lake Enterprise Data-driven Data Strategy

Are Data Silos Undermining Digital Transformation?

BI-Survey

NOVEMBER 23, 2021

Thus, alternative data architecture concepts have emerged, such as the data lake and the data lakehouse. Which data architecture is right for the data-driven enterprise remains a subject of ongoing debate. Data black holes: the high cost of supposed flexibility.

Digital Transformation

Digital Transformation Data Warehouse Data Lake Data-driven

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

Success criteria alignment by all stakeholders (producers, consumers, operators, auditors) is key for successful transition to a new Amazon Redshift modern data architecture. The success criteria are the key performance indicators (KPIs) for each component of the data workflow.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

3 Major Trends at Strata New York 2017

DataRobot Blog

OCTOBER 3, 2017

This highlights the two companies’ shared vision on self-service data discovery with an emphasis on collaboration and data governance. 2) When data becomes information, many (incremental) use cases surface. Paxata booth visitors encompassed a broad range of roles, all with data responsibility in some shape or form.

Data Lake

Data Lake Data Architecture Advertising Insurance

HEMA accelerates their data governance journey with Amazon DataZone

AWS Big Data

DECEMBER 19, 2024

Delta tables’ technical metadata is stored in the Data Catalog, which is a native source for creating assets in the Amazon DataZone business catalog. Access control is enforced using AWS Lake Formation, which manages fine-grained access control and data sharing on data lake data.

Data Governance

Data Governance Publishing Data-driven Metadata

How Data Management and Big Data Analytics Speed Up Business Growth

BizAcuity

APRIL 14, 2022

Netflix uses big data to make decisions on new productions, casting and marketing and generate millions in revenue through successful and strategic bets. Data Management. Before building a big data ecosystem, the goals of the organization and the data strategy should be very clear. Unscalable data architecture.

Big Data

Big Data Data Analytics Management Unstructured Data

AWS re:Invent Recap: The Future of Cloud

Alation

DECEMBER 14, 2021

How do you provide access and connect the right people to the right data? AWS has created a way to manage policies and access, but this is only for data lake formation. What about other data sources? In summary, AWS powers next-generation analytics with the best of both data lakes and purpose-built data stores.

Data Lake

Data Lake Data Warehouse Machine Learning Cost-Benefit

A Simple Data Capability Framework

Peter James Thomas

MAY 3, 2019

Data Architecture / Infrastructure. When I first started focussing on the data arena, Data Warehouses were state of the art. More recently Big Data architectures, including things like Data Lakes, have appeared and – at least in some cases – begun to add significant value.

Strategy

Strategy Data Architecture Data Quality Data Strategy

A Retrospective of 2018’s Articles

Peter James Thomas

APRIL 9, 2019

How to Spot a Flawed Data Strategy. What alarm bells might alert you to problems with your Data Strategy; based on the author’s extensive experience of both developing Data Strategies and vetting existing ones. Analytics & Big Data. The Data and Analytics Dictionary. The Equation.

Data-driven

Data-driven Statistics Data Science Big Data

Data Warehouse Teams Adapt to Be Data Driven

TDAN

JUNE 16, 2020

When companies embark on a journey of becoming data-driven, usually, this goes hand in hand with using new technologies and concepts such as AI and data lakes or Hadoop and IoT. Suddenly, the data warehouse team and their software are not the only ones anymore that turn data […].

Data Warehouse

Data Warehouse Data-driven Data Lake IoT

The Advantages Of Live Data-Streaming In The Competitive Financial Services Sector (Part I)

Cloudera

AUGUST 21, 2020

Data-in-motion is predominantly about streaming data so enterprises typically have two different ways or binary ways of looking at data. That combination of MiNiFi, NiFi, Kafka, and Flink is what makes for a true data-in-motion platform and empowers companies with the ability to ingest, scale, and process data in real-time.

Enterprise

Enterprise Data Lake Strategy Metadata

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

AWS Big Data

JULY 14, 2023

These inputs reinforced the need for a unified data strategy across the FinOps teams. We decided to build a scalable data management product that is based on the best practices of modern data architecture. The global catalog: The basic building block of our business-focused solutions are data products.

Finance

Finance Metadata Big Data Recreation/Entertainment

This Structure has Novel Features which are of Considerable Business Interest

Peter James Thomas

APRIL 3, 2020

I have been very much focussing on the start of a data journey in a series of recent articles about Data Strategy [3]. The way that this consistency of figures is achieved is by all elements of the Structured Reporting Framework drawing their data from the same data repositories. Introduction.

Dashboards

Dashboards Reporting Sales Data Lake

Join the Alation MLDC World Tour!

Alation

FEBRUARY 20, 2020

The next stops on the MLDC World Tour include Data Transparency in Washington, Gartner Symposium/ITxpo in Orlando, Teradata Analytics Universe in Las Vegas, Tableau in New Orleans, Big Data LDN in London, TDWI in Orlando and Forrester Data Strategy & Insights in Orlando, again. Data Catalogs Are the New Black.

Machine Learning

Machine Learning Metadata Reporting Data-driven
