This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue.
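As a rough sketch (not the post's exact job), a Glue PySpark script for such a load might look like the following, assuming a Glue 4.0 job created with the `--datalake-formats=iceberg` parameter and an Iceberg catalog named `glue_catalog` in the Spark configuration; all connection details and table names are placeholders:

```python
# Hypothetical AWS Glue (PySpark) job: load a SQL Server table into an
# Apache Iceberg table on Amazon S3. Assumes Iceberg support is enabled
# on the job and a "glue_catalog" Iceberg catalog is configured.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the legacy table over JDBC (credentials belong in Secrets Manager).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://legacy-host:1433;databaseName=sales")
    .option("dbtable", "dbo.orders")
    .option("user", "etl_user")
    .option("password", "...")
    .load()
)

# Append into an Iceberg table registered in the AWS Glue Data Catalog.
orders.writeTo("glue_catalog.lakehouse.orders").append()

job.commit()
```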
The CDH is used to create, discover, and consume data products through a central metadata catalog, while enforcing permission policies and tightly integrating data engineering, analytics, and machine learning services to streamline the user journey from data to insight.
At AWS re:Invent 2024, we announced the next generation of Amazon SageMaker , the center for all your data, analytics, and AI. Unified access to your data is provided by Amazon SageMaker Lakehouse , a unified, open, and secure data lakehouse built on Apache Iceberg open standards.
Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.
Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open-format files in an Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries.
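As a hedged sketch, such a query might be issued from Python with the redshift_connector driver; the cluster endpoint, credentials, and the spectrum_sales external schema below are hypothetical:

```python
# Query open-format S3 files from Redshift through an external (Spectrum)
# schema, using the redshift_connector Python driver. Endpoint, credentials,
# and the spectrum_sales schema are placeholders.
import redshift_connector

conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="...",
)
cursor = conn.cursor()
# spectrum_sales.orders maps to Parquet files in S3; Redshift scans them
# in place, with no load into local tables.
cursor.execute("""
    SELECT region, SUM(amount) AS revenue
    FROM spectrum_sales.orders
    WHERE sale_date >= '2024-01-01'
    GROUP BY region
    ORDER BY revenue DESC
""")
for row in cursor.fetchall():
    print(row)
conn.close()
```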
This amalgamation empowers vendors with authority over a diverse range of workloads by virtue of owning the data. This authority extends across realms such as business intelligence, data engineering, and machine learning, thus limiting the tools and capabilities that can be used.
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.
Machine learning is rewriting the rules of the gaming industry. One report showed that Caesars is investing $1 billion in big data, and other companies are following suit. I still remember playing my favorite games growing up, before machine learning was a thing or big data was a household word.
Organizations are increasingly using a multi-cloud strategy to run their production workloads. We often see requests from customers who have started their data journey by building datalakes on Microsoft Azure, to extend access to the data to AWS services. For this post, we use the Shared Key authentication method.
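A minimal sketch of that Shared Key method from Python, assuming the azure-storage-blob package; the account, container, and blob names are placeholders:

```python
# Shared Key authentication: the storage account access key is passed
# directly as the credential. Account, container, and blob names are
# placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://mydatalake.blob.core.windows.net",
    credential="<storage-account-access-key>",  # the Shared Key
)
blob = service.get_blob_client(container="raw", blob="events/2024/01/data.parquet")
data = blob.download_blob().readall()
print(f"downloaded {len(data)} bytes")
```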
There is an established body of practice around creating, managing, and accessing OLAP data (known as “cubes”). Data Lakes. There has been a lot of talk over the past year or two in the D365 F&SCM world about “data lakes.” Traditional databases and data warehouses do not lend themselves to that task.
Option 3: Azure Data Lakes. This leads us to Microsoft’s apparent long-term strategy for D365 F&SCM reporting: Azure Data Lakes. Azure Data Lakes are highly complex and designed with a different fundamental purpose in mind than financial and operational reporting. Azure Data Lakes are complicated.
In our previous post Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg , we showed how to use Apache Iceberg in the context of strategy backtesting. Our analysis shows that Iceberg can accelerate query performance by up to 52%, reduce operational costs, and significantly improve data management at scale.
Decades-old apps designed to retain a limited amount of data due to storage costs at the time are also unlikely to integrate easily with AI tools, says Brian Klingbeil, chief strategy officer at managed services provider Ensono. The aim is to create integration pipelines that seamlessly connect different systems and data sources.
A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.
Artificial Intelligence and machine learning are the future of every industry, especially data and analytics. Let’s talk about AI and machine learning (ML). AI and ML are the only ways to derive value from massive data lakes, cloud-native data warehouses, and other huge stores of information.
Beyond breaking down silos, modern data architectures need to provide interfaces that make it easy for users to consume data using tools fit for their jobs. Data must be able to freely move to and from data warehouses, data lakes, and data marts, and interfaces must make it easy for users to consume that data.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
Even the most sophisticated models and platforms can be undone by a single point of failure: poor data quality. This challenge remains deceptively overlooked despite its profound impact on strategy and execution. I aim to outline pragmatic strategies to elevate data quality into an enterprise-wide capability.
MongoDB continues to position its document database product as a developer data platform, used primarily to support the development and deployment of net-new applications rather than as a direct replacement for relational databases. The recent launch of MongoDB 8.0
Imperva harnesses data to improve their business outcomes. Events and many other security data types are stored in Imperva’s Threat Research Multi-Region data lake. As part of their solution, they are using Amazon QuickSight to unlock insights from their data.
Some of the work is very foundational, such as building an enterprise data lake and migrating it to the cloud, which enables other more direct value-added activities such as self-service. Newer methods can work with large amounts of data and are able to unearth latent interactions.
Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.
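One common way to handle such record-level changes in an Apache Iceberg table is MERGE INTO; the sketch below assumes an existing SparkSession ("spark") configured with an Iceberg catalog named glue_catalog, and the table, S3 path, and "op" change flag are illustrative:

```python
# Apply a staged batch of CDC records (inserts, updates, deletes) to an
# Iceberg table at the record level with MERGE INTO.
cdc_batch = spark.read.parquet("s3://my-bucket/cdc/customers/")  # staged changes
cdc_batch.createOrReplaceTempView("updates")

spark.sql("""
    MERGE INTO glue_catalog.lakehouse.customers AS t
    USING updates AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED AND s.op = 'D' THEN DELETE   -- rows flagged as deletes
    WHEN MATCHED THEN UPDATE SET *            -- in-place updates
    WHEN NOT MATCHED AND s.op <> 'D' THEN INSERT *
""")
```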
At Atlanta’s Hartsfield-Jackson International Airport, an IT pilot has led to a wholesale data journey destined to transform operations at the world’s busiest airport, fueled by machine learning and generative AI. Data integrity presented a major challenge for the team, as there were many instances of duplicate data.
Real-time AI involves processing data for making decisions within a given time frame. Real-time AI brings together streaming data and machine learning algorithms to make fast and automated decisions; examples include recommendations, fraud detection, security monitoring, and chatbots. It isn’t easy.
That’s why Rocket Mortgage has been a vigorous implementer of machine learning and AI technologies — and why CIO Brian Woodring emphasizes a “human in the loop” AI strategy that will not be pinned down to any one generative AI model. It’s a powerful strategy.” The rest are on premises.
The company recently migrated to Cloudera Data Platform (CDP) and CDP Machine Learning to power a number of solutions that have increased operational efficiency, enabled new revenue streams and improved risk management. OCBC also won a Cloudera Data Impact Award 2022 in the Transformation category for the project.
However, they do contain effective data management, organization, and integrity capabilities. As a result, users can easily find what they need, and organizations avoid the operational and cost burdens of storing unneeded or duplicate data copies. Warehouse, data lake convergence. Meet the data lakehouse.
Previously, Walgreens was attempting to perform that task with its data lake but faced two significant obstacles: cost and time. Those challenges are well-known to many organizations as they have sought to obtain analytical knowledge from their vast amounts of data. Lakehouses redeem the failures of some data lakes.
In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.
Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on. Subsequently, we’ll explore strategies for overcoming these challenges.
However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to address these challenges and foster a data-driven culture. To achieve this, they plan to use machine learning (ML) models to extract insights from data.
Taking the broadest possible interpretation of data analytics, Azure offers more than a dozen services — and that’s before you include Power BI, with its AI-powered analysis and new datamart option, or governance-oriented approaches such as Microsoft Purview. Azure Data Lake Analytics. Azure Analysis Services.
This post explores how the shift to a data product mindset is being implemented, the challenges faced, and the early wins that are shaping the future of data management in the Institutional Division. Amazon DataZone plays an essential role in facilitating data product management for the domain teams.
This cloud service was a significant leap from the traditional data warehousing solutions, which were expensive, not elastic, and required significant expertise to tune and operate. Use one click to access your data lake tables using auto-mounted AWS Glue data catalogs on Amazon Redshift for a simplified experience.
It is important to note that data analytics relies on computer tools and software to collect and analyze data so that sound business decisions can be made. Data analytics is widely used in business because it allows organizations to better understand their customers and improve their advertising strategies.
As part of that transformation, Agusti has plans to integrate a data lake into the company’s data architecture and expects two AI proofs of concept (POCs) to be ready to move into production within the quarter. Like many CIOs, Carhartt’s top digital leader is aware that data is the key to making advanced technologies work.
One of the core features of AWS Lake Formation is the delegation of permissions on a subset of resources such as databases, tables, and columns in the AWS Glue Data Catalog to data stewards, empowering them to make decisions regarding who should get access to their resources and helping you decentralize the permissions management of your data lakes.
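Illustratively, such a delegation could be expressed with boto3; the account ID, role ARN, and database name below are placeholders:

```python
# Delegate Lake Formation permissions to a data steward, including the
# ability to re-grant, so permissions management can be decentralized.
import boto3

lakeformation = boto3.client("lakeformation")
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/data-steward"
    },
    Resource={"Database": {"CatalogId": "111122223333", "Name": "sales_db"}},
    Permissions=["CREATE_TABLE", "DESCRIBE"],
    PermissionsWithGrantOption=["DESCRIBE"],  # the steward may re-grant DESCRIBE
)
```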
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.
A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of cross-functional governance structure for customer data.
This integration empowers organizations to break down data silos, accelerate analytics, and drive more agile customer-centric strategies. In his current role at Salesforce, Sriram works on Zero Copy integration with major data lake partners and helps customers deliver value with their data strategies.
Chipotle IT’s secret sauce Garner credits Chipotle’s wholly owned business model for enabling him to deploy advanced technologies such as the cloud, analytics, data lakes, and AI uniformly to all restaurants because they are all based on the same digital backbone. Chipotle’s digital business in 2022 was $3.5
As enterprises collect increasing amounts of data from various sources, the structure and organization of that data often need to change over time to meet evolving analytical needs. Schema evolution enables adding, deleting, renaming, or modifying columns without needing to rewrite existing data.
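In Apache Iceberg, for example, these are metadata-only operations, so existing data files are not rewritten; the sketch below assumes an existing SparkSession ("spark") with an Iceberg catalog, and the table and column names are illustrative:

```python
# In-place schema evolution on an Iceberg table: add, rename, and drop
# columns without rewriting the underlying data files.
spark.sql("ALTER TABLE glue_catalog.lakehouse.orders ADD COLUMN discount decimal(10,2)")
spark.sql("ALTER TABLE glue_catalog.lakehouse.orders RENAME COLUMN qty TO quantity")
spark.sql("ALTER TABLE glue_catalog.lakehouse.orders DROP COLUMN legacy_flag")
```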
With its business rapidly growing and customer expectations rising, Thermo Fisher Scientific is turning to machine learning and robotic process automation (RPA) to transform the customer experience. The team also built a centralized data lake on AWS, Databricks, and Power BI. Catalyzing change.