Big Data, Data Lake and Data Strategy

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue.

Data Lake

Data Lake Data Processing Optimization Machine Learning

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

A modern data strategy redefines and enables sharing data across the enterprise and allows for both reading and writing of a singular instance of the data using an open table format. Why Cloudinary chose Apache Iceberg: Apache Iceberg is a high-performance table format for huge analytic workloads.

Data Lake

Data Lake Metadata Snapshot Analytics

How BMW streamlined data access using AWS Lake Formation fine-grained access control

AWS Big Data

OCTOBER 29, 2024

This led to inefficiencies in data governance and access control. AWS Lake Formation is a service that streamlines and centralizes the data lake creation and management process. The Solution: How BMW CDH solved data duplication. The CDH is a company-wide data lake built on Amazon Simple Storage Service (Amazon S3).

Data Lake

Data Lake Sales Metadata Machine Learning

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.

Data Lake

Data Lake Data Processing Metadata Snapshot

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

DECEMBER 4, 2024

Open table formats are emerging in the rapidly evolving domain of big data management, fundamentally altering the landscape of data storage and analysis. Their ability to resolve critical issues such as data consistency, query efficiency, and governance renders them indispensable for data-driven organizations.

Snapshot

Snapshot Metadata Data Lake Optimization

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

AWS Big Data

JUNE 23, 2023

Events and many other security data types are stored in Imperva’s Threat Research Multi-Region data lake. Imperva harnesses data to improve their business outcomes. As part of their solution, they are using Amazon QuickSight to unlock insights from their data.

Data Lake

Data Lake Dashboards Cost-Benefit Data Warehouse

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

The landscape of big data management has been transformed by the rising popularity of open table formats such as Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake. These formats, designed to address the limitations of traditional data storage systems, have become essential in modern data architectures.

Metadata

Metadata Data Warehouse Big Data Data Lake

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.

Data Lake

Data Lake Data Governance Machine Learning Cost-Benefit

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of cross-functional governance structure for customer data.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Architecture for the Data Lake

TDAN

JANUARY 3, 2023

For a while now, vendors have been advocating that people put their data in a data lake when they put their data in the cloud. The Data Lake: The idea is that you put your data into a data lake. Then, at a later point in time, the end user analyst can come along and […].

Data Lake

Data Lake Data Architecture Data Warehouse Data Strategy

The rise of the data lakehouse: A new era of data value

CIO Business Intelligence

AUGUST 18, 2022

Previously, Walgreens was attempting to perform that task with its data lake but faced two significant obstacles: cost and time. Those challenges are well-known to many organizations as they have sought to obtain analytical knowledge from their vast amounts of data. Lakehouses redeem the failures of some data lakes.

Data Lake

Data Lake Data Warehouse Unstructured Data Business Intelligence

How Data Analytics Tools Eliminate Business Owner Headaches

Smart Data Collective

AUGUST 7, 2019

Big data has the power to transform any small business. One study found that 77% of small businesses don’t even have a big data strategy. If your company lacks a big data strategy, then you need to start developing one today. Using Big Data to Fix Your Biggest Problems as a Business Owner.

Data Analytics

Data Analytics Analytics Big Data Advertising

Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics – Part 2

AWS Big Data

SEPTEMBER 12, 2024

He has been building products for over 9 years using big data technologies. In his current role at Salesforce, Sriram works on Zero Copy integration with major data lake partners and helps customers deliver value with their data strategies. Jason Berkowitz is a Senior Product Manager with AWS Lake Formation.

Data Lake

Data Lake Analytics Data-driven Data Strategy

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.

Analytics

Analytics Data Lake Metadata Data Warehouse

What you don’t know about data management could kill your business

CIO Business Intelligence

NOVEMBER 28, 2023

In reality, MDM (master data management) means Major Data Mess at most large firms, the end result of 20-plus years of throwing data into data warehouses and data lakes without a comprehensive data strategy. Contributing to the general lack of data about data is complexity.

Management

Management Data Architecture Data Lake Data Strategy

How Data Management and Big Data Analytics Speed Up Business Growth

BizAcuity

APRIL 14, 2022

Big Data technology in today’s world. Did you know that the big data and business analytics market is valued at $198.08 […]? Or that the US economy loses up to $3 trillion per year due to poor data quality? […] quintillion bytes of data, which means an average person generates over 1.5 […]. Big Data Ecosystem.

Big Data

Big Data Data Analytics Management Analytics

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

AWS Big Data

JANUARY 12, 2024

Ingestion: Data lake batch, micro-batch, and streaming. Many organizations land their source data into their data lake in various ways, including batch, micro-batch, and streaming jobs. Amazon AppFlow can be used to transfer data from different SaaS applications to a data lake.

Data Lake

Data Lake Cost-Benefit Visualization Structured Data

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented in AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference. Starting with Amazon EMR release 6.7.0,

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

Addressing the Elephant in the Room – Welcome to Today’s Cloudera

Cloudera

JUNE 13, 2024

There were thousands of attendees at the event – lining up for book signings and meetings with recruiters to fill the endless job openings for developers experienced with MapReduce and managing Big Data. This was the gold rush of the 21st century, except the gold was data.

Big Data

Big Data Machine Learning Contextual Data Data Lake

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architect Armando Vázquez identifies eight common types of data architects: Enterprise data architect: These data architects oversee an organization’s overall data architecture, defining data architecture strategy and designing and implementing architectures.

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

Data Automation Has Become an Invaluable Part of Boosting Your Business

Smart Data Collective

NOVEMBER 30, 2020

They have found big data automation to provide an even higher ROI than traditional analog automation technology that became widely adopted in the mid-1900s. Could big data automation be a viable option for your company as well? Many companies have already taken advantage of data automation in their operations.

Big Data

Big Data Data Lake ROI Data-driven

Build an Amazon Redshift data warehouse using an Amazon DynamoDB single-table design

AWS Big Data

JUNE 21, 2023

A typical ask for this data may be to identify sales trends as well as sales growth on a yearly, monthly, or even daily basis. A key pillar of AWS’s modern data strategy is the use of purpose-built data stores for specific use cases to achieve performance, cost, and scale. This is achieved by partitioning the data.

Data Warehouse

Data Warehouse Data Lake OLAP Cost-Benefit

Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics – Part 1

AWS Big Data

AUGUST 27, 2024

The Amazon Redshift service must be running in the same Region where the Salesforce Data Cloud is running. AWS admin roles for Lake Formation and Amazon Redshift: Lake Formation – A data lake admin for accepting the share and providing access to users. He helps customers become data-driven.

Data Lake

Data Lake Analytics Data-driven Management

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Achieve your AI goals with an open data lakehouse approach

IBM Big Data Hub

OCTOBER 4, 2023

Artificial intelligence (AI) is now at the forefront of how enterprises work with data to help reinvent operations, improve customer experiences, and maintain a competitive advantage. It’s no longer a nice-to-have, but an integral part of a successful data strategy. How does an open data lakehouse architecture support AI?

Data Lake

Data Lake Metadata Data Warehouse Cost-Benefit

Unstructured data management and governance using AWS AI/ML and analytics services

AWS Big Data

OCTOBER 25, 2023

The following is a high-level architecture of the solution we can build to process the unstructured data, assuming the input data is being ingested to the raw input object store. The steps of the workflow are as follows: Integrated AI services extract data from the unstructured data.

Unstructured Data

Unstructured Data Metadata Management Analytics

It’s not your data. It’s how you use it. Unlock the power of data & build foundations of a data driven organisation

CIO Business Intelligence

MAY 24, 2022

To meet these demands many IT teams find themselves being systems integrators, having to find ways to access and manipulate large volumes of data for multiple business functions and use cases. Without a clear data strategy that’s aligned to their business requirements, being truly data-driven will be a challenge.

Data-driven

Data-driven Data Lake Data Warehouse Machine Learning

Building Better Data Models to Unlock Next-Level Intelligence

Sisense

MAY 11, 2021

The reasons for this are simple: Before you can start analyzing data, huge datasets like data lakes must be modeled or transformed to be usable. According to a recent survey conducted by IDC, 43% of respondents were drawing intelligence from 10 to 30 data sources in 2020, with a jump to 64% in 2021! Discover why.

Modeling

Modeling Big Data IoT Data Warehouse

Trends in Data Management and Analytics

TDAN

MARCH 19, 2019

Various databases, plus one or more data warehouses, have been the state-of-the-art data management infrastructure in companies for years. The emergence of various new concepts, technologies, and applications such as Hadoop, Tableau, R, Power BI, or Data Lakes indicates that changes are under way.

Management

Management Data Warehouse Data Lake Analytics

Differentiate generative AI applications with your data using AWS analytics and managed databases

AWS Big Data

SEPTEMBER 12, 2024

The application gets prompt templates from an S3 data lake and creates the engineered prompt. The user interaction is stored in a data lake for downstream usage and BI analysis. The application sends the prompt to Amazon Bedrock and retrieves the LLM output.

Management

Management Analytics Data Lake Interactive

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

We can determine the following are needed: An open data format ingestion architecture processing the source dataset and refining the data in the S3 data lake. This requires a dedicated team of 3–7 members building a serverless data lake for all data sources. Vijay Bagur is a Sr.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

The first generation of data architectures represented by enterprise data warehouse and business intelligence platforms were characterized by thousands of ETL jobs, tables, and reports that only a small group of specialized data engineers understood, resulting in an under-realized positive impact on the business.

Data Architecture

Data Architecture Data Quality Strategy Data Lake

Interact with Apache Iceberg tables using Amazon Athena and cross account fine-grained permissions using AWS Lake Formation

AWS Big Data

MARCH 23, 2023

Register the S3 path storing the table using Lake Formation. We register the S3 full path in Lake Formation: Navigate to the Lake Formation console. In the navigation pane, under Register and ingest, choose Data lake locations. Jack Ye is a software engineer of the Athena Data Lake and Storage team at AWS.

Interactive

Interactive Snapshot Data Lake Software

Data science vs data analytics: Unpacking the differences

IBM Big Data Hub

SEPTEMBER 19, 2023

To pursue a data science career, you need a deep understanding and expansive knowledge of machine learning and AI. And you should have experience working with big data platforms such as Hadoop or Apache Spark. Your skill set should include the ability to write in the programming languages Python, SAS, R and Scala.

Data Science

Data Science Data Analytics Prescriptive Analytics Analytics

Modern Data Architecture for Telecommunications

Cloudera

SEPTEMBER 6, 2022

Previously, there were three types of data structures in telco: Entity data sets, i.e. marketing data lakes. The result has been an extraordinary volume of data redundancy across the business, leading to disaggregated data strategy, unknown compliance exposures, and inconsistencies in data-based processes.

Data Architecture

Data Architecture Cost-Benefit Digital Transformation Business Driver

How AWS helped Altron Group accelerate their vision for optimized customer engagement

AWS Big Data

JULY 13, 2023

This allows for transparency, speed to action, and collaboration across the group while enabling the platform team to evangelize the use of data. Altron engaged with AWS to seek advice on their data strategy and cloud modernization to bring their vision to fruition.

Optimization

Optimization B2B Data Quality Sales

A Retrospective of 2018’s Articles

Peter James Thomas

APRIL 9, 2019

These are as follows: General Data Articles. Data Visualisation. Statistics & Data Science. Analytics & Big Data. How to Spot a Flawed Data Strategy. Many companies want to become data driven, but getting started on the journey towards this goal can be tough. Analytics & Big Data.

Data-driven

Data-driven Statistics Data Science Big Data

A Simple Data Capability Framework

Peter James Thomas

MAY 3, 2019

Data Architecture / Infrastructure. When I first started focussing on the data arena, Data Warehouses were state of the art. More recently Big Data architectures, including things like Data Lakes, have appeared and – at least in some cases – begun to add significant value. Data Strategy.

Strategy

Strategy Data Architecture Data Quality Data Strategy

Patterns for enterprise data sharing at scale

AWS Big Data

FEBRUARY 27, 2023

Data sharing is becoming an important element of an enterprise data strategy. AWS services like AWS Data Exchange provide an avenue for companies to share or monetize their value-added data with other companies.

Enterprise

Enterprise Publishing Data Lake Management

ChatGPT: the new challenges of data strategy in the era of generative AI

CIO Business Intelligence

MARCH 27, 2024

Italian companies are investing in infrastructure, software, and services for data management and analysis (+18% in 2023, equal to 2.85 billion euros, according to the Big Data & Business Analytics Observatory of the School of Management of the Politecnico di Milano), but how many have reached data maturity?

Data Governance

Data Governance Data Lake Data Strategy Data-driven

Breaking down Business Intelligence

BizAcuity

MAY 16, 2022

To stay relevant in the market and to increase brand awareness, organizations use big data analytics and business intelligence to navigate their way after getting a full understanding of their ideal customers and their behavior before and during the buying journey. So, make sure you have a data strategy in place.

Business Intelligence

Business Intelligence Data mining Visualization Data Lake

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

FEBRUARY 1, 2023

With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.

Optimization

Optimization Forecasting Data Lake Metadata

Enrich your customer data with geospatial insights using Amazon Redshift, AWS Data Exchange, and Amazon QuickSight

AWS Big Data

MARCH 18, 2024

Subscribe to demographic data from AWS Data Exchange: AWS Data Exchange is a data marketplace with more than 3,500 products from over 300 providers, delivered through files, APIs, or Amazon Redshift queries directly to the data lakes, applications, analytics, and machine learning models that use it.

Data Warehouse

Data Warehouse Visualization Snapshot Data-driven
