This is part two of a three-part series showing how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue.
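The load step described above can be sketched as follows. This is a minimal illustration, not the post's actual Glue job: the host, database, table, credential, and catalog names are all placeholders, and it assumes a SparkSession created with the Iceberg extensions and a Glue Data Catalog named `glue_catalog`.

```python
# Sketch: read a SQL Server table over JDBC with Spark (e.g., inside an AWS
# Glue job) and append it to an Iceberg table registered in the Glue Data
# Catalog. All names below are illustrative placeholders.

def sqlserver_jdbc_options(host, database, table, user, password):
    """Build the option map Spark's JDBC reader needs for SQL Server."""
    return {
        "url": f"jdbc:sqlserver://{host}:1433;databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }

def load_to_iceberg(spark, opts, target="glue_catalog.bronze.orders"):
    """Read the source table and append it to the Iceberg target.

    `spark` is a SparkSession created with the Iceberg Spark extensions
    enabled; `target` is a hypothetical bronze-layer table.
    """
    df = spark.read.format("jdbc").options(**opts).load()
    df.writeTo(target).using("iceberg").append()
```

A production job would add incremental filtering (e.g., a watermark column in the JDBC query) rather than a full-table append on every run.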
A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of a primary Region failure.
Note that the extra package (delta-iceberg) is required to create a UniForm table in the AWS Glue Data Catalog, and is also required to generate Iceberg metadata alongside the Delta Lake metadata for the UniForm table. Run the following cell and review the five records with Books as the product category.
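A UniForm table of this kind can be declared roughly as below. This is a hedged sketch, not the notebook's actual cell: the database, table, column, and S3 location names are made up, and it assumes a Spark session launched with the delta-spark and delta-iceberg packages on the classpath.

```python
# Sketch: DDL for a Delta table with UniForm enabled, so Iceberg metadata is
# generated alongside the Delta log (the part that needs the delta-iceberg
# package). Names and location are illustrative; run via spark.sql(ddl).

def uniform_table_ddl(database, table, location):
    """Build a CREATE TABLE statement with UniForm table properties."""
    return f"""
    CREATE TABLE {database}.{table} (
        product_id BIGINT,
        product_category STRING,
        price DOUBLE
    )
    USING delta
    LOCATION '{location}'
    TBLPROPERTIES (
        'delta.universalFormat.enabledFormats' = 'iceberg',
        'delta.enableIcebergCompatV2' = 'true'
    )
    """
```

Once created this way, the same files can be read as a Delta table by Spark and as an Iceberg table by engines that consume the generated Iceberg metadata.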
In reality, MDM (master data management) means Major Data Mess at most large firms: the end result of 20-plus years of throwing data into data warehouses and data lakes without a comprehensive data strategy. Contributing to the general lack of data about data is complexity.
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.
For those in the data world, this post provides a curated guide to all analytics sessions that you can use to quickly schedule and build your itinerary. Book your spot early for the sessions you do not want to miss. 11:30 AM – 12:30 PM (PDT) Caesars Forum ANT318 | Accelerate innovation with end-to-end serverless data architecture.
There were thousands of attendees at the event – lining up for book signings and meetings with recruiters to fill the endless job openings for developers experienced with MapReduce and managing Big Data. This was the gold rush of the 21st century, except the gold was data.
Connect with experts, meet with book authors on data warehousing and analytics (at the Meet the Authors event on November 29 and 30, 3:00 PM – 4:00 PM), win prizes, and learn all about the latest innovations from our AWS Analytics services.
The following is a high-level architecture of the solution we can build to process the unstructured data, assuming the input data is being ingested into the raw input object store. The steps of the workflow are as follows: integrated AI services extract content from the unstructured data.
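The extraction step could look something like the sketch below, using Amazon Textract as the AI service. The bucket and key names are placeholders, and this assumes documents small enough for Textract's synchronous API; the post's actual pipeline may use a different service or the asynchronous variant.

```python
# Sketch: run Amazon Textract text detection on an object in the raw input
# bucket, then keep only the detected text lines. Bucket/key names are
# placeholders; error handling is omitted for brevity.

def extract_lines(textract_response):
    """Pull the plain-text LINE blocks out of a Textract response dict."""
    return [
        block["Text"]
        for block in textract_response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]

def detect_text(bucket, key):
    """Run Textract's synchronous text detection on one S3 object."""
    import boto3  # imported here so extract_lines stays dependency-free
    textract = boto3.client("textract")
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return extract_lines(response)
```

Keeping the response parsing separate from the API call makes the parsing logic easy to unit-test against canned responses.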
This allows for transparency, speed to action, and collaboration across the group while enabling the platform team to evangelize the use of data. Altron engaged with AWS to seek advice on their data strategy and cloud modernization to bring their vision to fruition.
How to Spot a Flawed Data Strategy. What alarm bells might alert you to problems with your Data Strategy, based on the author's extensive experience of both developing Data Strategies and vetting existing ones. Analytics & Big Data. The Data and Analytics Dictionary. The Equation.
I have been very much focusing on the start of a data journey in a series of recent articles about Data Strategy [3]. In fact, it is the crucial final link between an organisation's data and the people who need to use it. In many ways, how people experience data capabilities will be determined by this final link.
Why is data analytics important for travel organizations? Having been in business for over 50 years, ARC had accumulated a massive amount of data that was stored in siloed, on-premises servers across its seven business domains. Using Alation, ARC automated the data curation and cataloging process.
Its distributed architecture empowers organizations to query massive datasets across databases, data lakes, and cloud platforms with speed and reliability. Optimizing connections to your data sources is equally important, as it directly impacts the speed and efficiency of data access.
With Simba drivers acting as a bridge between Trino and your BI or ETL tools, you can unlock enhanced data connectivity, streamline analytics, and drive real-time decision-making. Let's explore why this combination is a game-changer for data strategies and how it maximizes the value of Trino and Apache Iceberg for your business.
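The federated querying described above can be sketched with the open-source trino Python client (a Simba ODBC/JDBC driver plays the same bridging role for BI tools). The catalog, schema, table, and host names below are all invented for illustration.

```python
# Sketch: a cross-catalog Trino query joining an Iceberg table with a
# Postgres table. Trino resolves each catalog prefix to its configured data
# source. Catalog/schema/table names are illustrative placeholders.

def federated_query(iceberg_table, postgres_table):
    """Build a join across two Trino catalogs."""
    return (
        f"SELECT o.order_id, c.customer_name "
        f"FROM {iceberg_table} AS o "
        f"JOIN {postgres_table} AS c ON o.customer_id = c.customer_id"
    )

def run(host, user):
    """Execute the federated query against a running Trino coordinator."""
    import trino  # pip install trino; imported here to keep the builder pure
    conn = trino.dbapi.connect(
        host=host, port=8080, user=user, catalog="iceberg", schema="sales"
    )
    cur = conn.cursor()
    cur.execute(
        federated_query("iceberg.sales.orders", "postgresql.crm.customers")
    )
    return cur.fetchall()
```

The point of the sketch is that the join spans two storage systems, yet the SQL looks like a single-database query; Trino handles the federation.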
When migrating to the cloud, there are a variety of different approaches you can take to maintain your data strategy. Those options include Azure Data Lake Storage (ADLS), Microsoft's data lake solution, which provides unstructured data analytics through AI.
FGAC enables you to granularly control access to your data lake resources at the table, column, and row levels through Lake Formation permissions. This level of control is essential for organizations that need to comply with data governance and security regulations, or those that deal with sensitive data.
Use existing AWS Glue tables. This section has the following prerequisite: a data lake administrator user, created by following Create a data lake administrator. For detailed instructions, see Revoking permission using the Lake Formation console. He is also the author of the book Serverless ETL and Analytics with AWS Glue.
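A column-level FGAC grant of the kind described above can be sketched with boto3's Lake Formation client. The role ARN, database, table, and column names are placeholders; a real setup would also register the underlying S3 location with Lake Formation first.

```python
# Sketch: grant a principal SELECT on only two columns of a Glue table via
# Lake Formation permissions. All ARNs and names are placeholders.

def column_grant_request(principal_arn, database, table, columns):
    """Build the payload for lakeformation.grant_permissions()."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": columns,
            }
        },
        "Permissions": ["SELECT"],
    }

def grant(principal_arn, database, table, columns):
    """Apply the grant; requires data lake administrator credentials."""
    import boto3  # imported here so the request builder stays dependency-free
    lf = boto3.client("lakeformation")
    return lf.grant_permissions(
        **column_grant_request(principal_arn, database, table, columns)
    )
```

Row-level control works the same way, with a data cells filter in place of the `TableWithColumns` resource.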
This is the final part of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to process data with Amazon Redshift Spectrum and create the gold (consumption) layer. The following diagram illustrates the different layers of the data lake.
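The Spectrum setup for a gold layer can be sketched as below: map a Glue Data Catalog database into Redshift as an external schema, then query the data lake tables directly. The schema, database, IAM role, and column names are illustrative, not the post's actual ones; the generated SQL would be run in Redshift (for example, via the redshift-data API).

```python
# Sketch: DDL and a sample aggregation for querying gold-layer data lake
# tables with Redshift Spectrum. All names are placeholders.

def external_schema_ddl(schema, glue_database, iam_role_arn):
    """Map a Glue Data Catalog database into Redshift as an external schema."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema} "
        f"FROM DATA CATALOG DATABASE '{glue_database}' "
        f"IAM_ROLE '{iam_role_arn}'"
    )

def gold_layer_query(schema):
    """Aggregate directly over the external (data lake) table."""
    return (
        f"SELECT product_category, SUM(price) AS revenue "
        f"FROM {schema}.orders GROUP BY product_category"
    )
```

Because Spectrum reads the files in place, the gold layer stays in the data lake and Redshift only stores the schema mapping.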