Data Analytics, Data Lake and Machine Learning

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake ( Apache Iceberg ) using AWS Glue. Delete the bucket.

Data Lake

Data Lake Data Processing Optimization Machine Learning

Incremental refresh for Amazon Redshift materialized views on data lake tables

AWS Big Data

NOVEMBER 8, 2024

Amazon Redshift is a fast, fully managed cloud data warehouse that makes it cost-effective to analyze your data using standard SQL and business intelligence tools. Customers use data lake tables to achieve cost effective storage and interoperability with other tools.

Data Lake

Data Lake Data Warehouse Optimization Testing

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

This week on the keynote stages at AWS re:Invent 2024, you heard from Matt Garman, CEO, AWS, and Swami Sivasubramanian, VP of AI and Data, AWS, speak about the next generation of Amazon SageMaker , the center for all of your data, analytics, and AI. The relationship between analytics and AI is rapidly evolving.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

Webinars

How to Achieve High-Accuracy Results When Using LLMs

MORE WEBINARS

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

The following requirements were essential to decide for adopting a modern data mesh architecture: Domain-oriented ownership and data-as-a-product : EUROGATE aims to: Enable scalable and straightforward data sharing across organizational boundaries. Eliminate centralized bottlenecks and complex data pipelines.

IoT

IoT Machine Learning Metadata Data-driven

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Accelerate analytics and AI innovation with the next generation of Amazon SageMaker

AWS Big Data

MARCH 13, 2025

At AWS re:Invent 2024, we announced the next generation of Amazon SageMaker , the center for all your data, analytics, and AI. Unified access to your data is provided by Amazon SageMaker Lakehouse , a unified, open, and secure data lakehouse built on Apache Iceberg open standards.

Analytics

Analytics Data Lake Data Warehouse Data-driven

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in-place with all Apache Iceberg compatible tools and engines.

Data Integration

Data Integration Data Lake Statistics Data-driven

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Statistics Optimization

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open format files in Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries.

Data Lake

Data Lake Statistics Broadcasting Optimization

Using AWS AppSync and AWS Lake Formation to access a secure data lake through a GraphQL API

AWS Big Data

OCTOBER 9, 2023

Data lakes have been gaining popularity for storing vast amounts of data from diverse sources in a scalable and cost-effective way. As the number of data consumers grows, data lake administrators often need to implement fine-grained access controls for different user profiles.

Data Lake

Data Lake Testing Big Data Management

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

This amalgamation empowers vendors with authority over a diverse range of workloads by virtue of owning the data. This authority extends across realms such as business intelligence, data engineering, and machine learning thus limiting the tools and capabilities that can be used. 5 seconds $0.08 8 seconds $0.07

Data Lake

Data Lake Metadata Snapshot Analytics

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

AWS Big Data

SEPTEMBER 10, 2024

We often see requests from customers who have started their data journey by building data lakes on Microsoft Azure, to extend access to the data to AWS services. In such scenarios, data engineers face challenges in connecting and extracting data from storage containers on Microsoft Azure.

Data Lake

Data Lake Metadata Management Software

Monitor data pipelines in a serverless data lake

AWS Big Data

AUGUST 9, 2023

The combination of a data lake in a serverless paradigm brings significant cost and performance benefits. By monitoring application logs, you can gain insights into job execution, troubleshoot issues promptly to ensure the overall health and reliability of data pipelines.

Data Lake

Data Lake Metrics Testing Cost-Benefit

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt. Second-generation – gigantic, complex data lake maintained by a specialized team drowning in technical debt. See the pattern? The problem is not “you.”

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Complexity Drives Costs: A Look Inside BYOD and Azure Data Lakes

Jet Global

NOVEMBER 5, 2020

Option 3: Azure Data Lakes. This leads us to Microsoft’s apparent long-term strategy for D365 F&SCM reporting: Azure Data Lakes. Azure Data Lakes are highly complex and designed with a different fundamental purpose in mind than financial and operational reporting. Data lakes are not a mature technology.

Data Lake

Data Lake OLAP Data Warehouse Unstructured Data

Azure Data Sources for Data Science and Machine Learning

Jen Stirrup

MAY 5, 2020

Organizations need to recast storing their data. It is more than just some giant USB stick in the sky that’s going to store all of the data. It has a lot of services that you can use, such as Big Data analytics. You can also use Azure Data Lake storage as well, which is optimized for high-performance analytics.

Machine Learning

Machine Learning Data Science Data Lake Big Data

Eight Top DataOps Trends for 2022

DataKitchen

NOVEMBER 29, 2021

If data analytics is like a factory, the DataOps Engineer owns the assembly line used to build a data and analytic product. Most organizations run the data factory using manual labor. For example, a Hub-Spoke architecture could integrate data from a multitude of sources into a data lake.

Testing

Testing Data Lake Data Architecture Manufacturing

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

Use cases for Hive metastore federation for Amazon EMR Hive metastore federation for Amazon EMR is applicable to the following use cases: Governance of Amazon EMR-based data lakes – Producers generate data within their AWS accounts using an Amazon EMR-based data lake supported by EMRFS on Amazon Simple Storage Service (Amazon S3)and HBase.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

How Data Analytics Tools Eliminate Business Owner Headaches

Smart Data Collective

AUGUST 7, 2019

One study found that 77% of small businesses don’t even have a big data strategy. If your company lacks a big data strategy, then you need to start developing one today. The best thing that you can do is find some data analytics tools to solve your most pressing challenges. There are many benefits to data analytics.

Data Analytics

Data Analytics Analytics Big Data Advertising

Interview with: Sankar Narayanan, Chief Practice Officer at Fractal Analytics

Corinium

JUNE 6, 2019

For instance, for a variety of reasons, in the short term, CDAOS are challenged with quantifying the benefits of analytics’ investments. Some of the work is very foundational, such as building an enterprise data lake and migrating it to the cloud, which enables other more direct value-added activities such as self-service.

Insurance

Insurance Analytics Forecasting Deep Learning

Foundational blocks of Amazon SageMaker Unified Studio: An admin’s guide to implement unified access to all your data, analytics, and AI

AWS Big Data

FEBRUARY 13, 2025

Amazon SageMaker Unified Studio (preview) provides a unified experience for using data, analytics, and AI capabilities. You can use familiar AWS services for model development, generative AI, data processing, and analyticsall within a single, governed environment.

Data Analytics

Data Analytics Analytics Modeling Management

Building a Beautiful Data Lakehouse

CIO Business Intelligence

MARCH 9, 2022

Applying artificial intelligence (AI) to data analytics for deeper, better insights and automation is a growing enterprise IT priority. But the data repository options that have been around for a while tend to fall short in their ability to serve as the foundation for big data analytics powered by AI.

Data Lake

Data Lake Unstructured Data Data Warehouse Big Data

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.

Data Lake

Data Lake Data Governance Machine Learning Cost-Benefit

5 Best Practices for Extracting, Analyzing, and Visualizing Data

Smart Data Collective

DECEMBER 13, 2022

However, computerization in the digital age creates massive volumes of data, which has resulted in the formation of several industries, all of which rely on data and its ever-increasing relevance. Data analytics and visualization help with many such use cases. It is the time of big data. What Is Data Analytics?

Visualization

Visualization Key Performance Indicator Sales Advertising

7 key Microsoft Azure analytics services (plus one extra)

CIO Business Intelligence

JUNE 29, 2022

Taking the broadest possible interpretation of data analytics , Azure offers more than a dozen services — and that’s before you include Power BI, with its AI-powered analysis and new datamart option , or governance-oriented approaches such as Microsoft Purview. Azure Data Explorer. Azure Data Lake Analytics.

Data Lake

Data Lake Analytics Data Warehouse Machine Learning

Data science vs data analytics: Unpacking the differences

IBM Big Data Hub

SEPTEMBER 19, 2023

Though you may encounter the terms “data science” and “data analytics” being used interchangeably in conversations or online, they refer to two distinctly different concepts. Meanwhile, data analytics is the act of examining datasets to extract value and find answers to specific questions.

Data Science

Data Science Data Analytics Prescriptive Analytics Analytics

Regeneron turns to IT to accelerate drug discovery

CIO Business Intelligence

NOVEMBER 4, 2022

Rigid requirements to ensure the accuracy of data and veracity of scientific formulas as well as machine learning algorithms and data tools are common in modern laboratories. When Bob McCowan was promoted to CIO at Regeneron Pharmaceuticals in 2018, he had previously run the data center infrastructure for the $81.5

Data Lake

Data Lake IT Experimentation Data-driven

Carhartt turns to data under new CIO

CIO Business Intelligence

NOVEMBER 25, 2022

Carhartt’s signature workwear is near ubiquitous, and its continuing presence on factory floors and at skate parks alike is fueled in part thanks to an ongoing digital transformation that is advancing the 133-year-old Midwest company’s operations to make the most of advanced digital technologies, including the cloud, data analytics, and AI.

Data Lake

Data Lake Data Warehouse Unstructured Data Data Architecture

How Salesforce optimized their detection and response platform using AWS managed services

AWS Big Data

APRIL 18, 2024

The Salesforce Trust Intelligence Platform (TIP) log platform team is responsible for data pipeline and data lake infrastructure, providing log ingestion, normalization, persistence, search, and detection capability to ensure Salesforce is safe from threat actors. This is the bronze layer of the TIP data lake.

Optimization

Optimization Data Lake Management Key Performance Indicator

Accelerate data science feature engineering on transactional data lakes using Amazon Athena with Apache Iceberg

AWS Big Data

JUNE 20, 2023

Apache Iceberg is an open table format for very large analytic datasets. It manages large collections of files as tables, and it supports modern analytical data lake operations such as record-level insert, update, delete, and time travel queries. Mikhail specializes in data analytics services.

Data Lake

Data Lake Data Science Recreation/Entertainment Experimentation

Read and write S3 Iceberg table using AWS Glue Iceberg Rest Catalog from Open Source Apache Spark

AWS Big Data

DECEMBER 4, 2024

In today’s data-driven world , organizations are constantly seeking efficient ways to process and analyze vast amounts of information across data lakes and warehouses. This post will showcase how this data can also be queried by other data teams using Amazon Athena. Verify that you have Python version 3.7

Data Lake

Data Lake Metadata Insurance Data-driven

The Future of the Data Lakehouse – Open

CIO Business Intelligence

JUNE 23, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.

Data Lake

Data Lake Data Warehouse Machine Learning Data-driven

An AI Chat Bot Wrote This Blog Post …

DataKitchen

DECEMBER 9, 2022

ChatGPT> DataOps, or data operations, is a set of practices and technologies that organizations use to improve the speed, quality, and reliability of their data analytics processes. Overall, DataOps is an essential component of modern data-driven organizations. Query> DataOps. Query> Write an essay on DataOps.

Machine Learning

Machine Learning Data-driven Optimization Data Analytics

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.

Data Lake

Data Lake Data Warehouse Machine Learning Data-driven

Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics

AWS Big Data

MARCH 27, 2024

Amazon Redshift integrates with AWS HealthLake and data lakes through Redshift Spectrum and Amazon S3 auto-copy features, enabling you to query data directly from files on Amazon S3. This means you no longer have to create an external schema in Amazon Redshift to use the data lake tables cataloged in the Data Catalog.

Data Analytics

Data Analytics Analytics Data Warehouse Data Lake

It’s not your data. It’s how you use it. Unlock the power of data & build foundations of a data driven organisation

CIO Business Intelligence

MAY 24, 2022

Organisations have to contend with legacy data and increasing volumes of data spread across multiple silos. To meet these demands many IT teams find themselves being systems integrators, having to find ways to access and manipulate large volumes of data for multiple business functions and use cases. Pharmaceutical research.

Data-driven

Data-driven Data Lake Data Warehouse Machine Learning

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

DataKitchen

JULY 27, 2023

Let’s go through the ten Azure data pipeline tools Azure Data Factory : This cloud-based data integration service allows you to create data-driven workflows for orchestrating and automating data movement and transformation. You can use it for big data analytics and machine learning workloads.

Machine Learning

Machine Learning Cost-Benefit Data Transformation Testing

A CIO’s first rule for automation: Have a clear business case

CIO Business Intelligence

MARCH 2, 2023

A catalyst to make this happen will be the ongoing improvements in AI-enabled data capture. Fast and accurate data extraction will speed up transactions and automation capabilities, and be the foundational technology within any business intelligence or data analytics platform, enabling better collaboration and B2B communications, he says.

Data Lake

Data Lake Forecasting B2B Optimization

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. This zero-ETL integration reduces the complexity and operational burden of data replication to let you focus on deriving insights from your data.

Analytics

Analytics Data Lake Metadata Data Warehouse

The New Normal for FP&A: Data Analytics

Jedox

OCTOBER 22, 2020

The term “data analytics” refers to the process of examining datasets to draw conclusions about the information they contain. Data analysis techniques enhance the ability to take raw data and uncover patterns to extract valuable insights from it. Data analytics is not new.

Data Analytics

Data Analytics Analytics Unstructured Data Data mining

TransUnion transforms its business model with IT

CIO Business Intelligence

APRIL 26, 2024

billion acquisition of data and analytics company Neustar in 2021, TransUnion has expanded into other services such as marketing, fraud detection and prevention, and robust analytical services. We’re modernizing existing products to get to this entire data analytics value chain.” But following its $3.1

Modeling

Modeling IT Machine Learning Data Governance

10 Things AWS Can Do for Your SaaS Company

Smart Data Collective

FEBRUARY 20, 2022

Data storage databases. Your SaaS company can store and protect any amount of data using Amazon Simple Storage Service (S3), which is ideal for data lakes, cloud-native applications, and mobile apps. AWS also offers developers the technology to develop smart apps using machine learning and complex algorithms.

Cost-Benefit

Cost-Benefit Data Lake Software Machine Learning

Write queries faster with Amazon Q generative SQL for Amazon Redshift

AWS Big Data

NOVEMBER 7, 2024

Sushmita Barthakur is a Senior Data Solutions Architect at Amazon Web Services (AWS), supporting Enterprise customers architect their data workloads on AWS. Xiao Qin is a senior applied scientist with the Learned Systems Group (LSG) at Amazon Web Services (AWS).

Metadata

Metadata Sales Data Warehouse Optimization

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

Incremental refresh for Amazon Redshift materialized views on data lake tables

Webinars

Trending Sources

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

Webinars

How EUROGATE established a data mesh architecture using Amazon DataZone

Recap of Amazon Redshift key product announcements in 2024

Accelerate analytics and AI innovation with the next generation of Amazon SageMaker

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Choosing an open table format for your transactional data lake on AWS

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Using AWS AppSync and AWS Lake Formation to access a secure data lake through a GraphQL API

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

Monitor data pipelines in a serverless data lake

What is a Data Mesh?

Complexity Drives Costs: A Look Inside BYOD and Azure Data Lakes

Azure Data Sources for Data Science and Machine Learning

Eight Top DataOps Trends for 2022

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

How Data Analytics Tools Eliminate Business Owner Headaches

Interview with: Sankar Narayanan, Chief Practice Officer at Fractal Analytics

Foundational blocks of Amazon SageMaker Unified Studio: An admin’s guide to implement unified access to all your data, analytics, and AI

Building a Beautiful Data Lakehouse

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

5 Best Practices for Extracting, Analyzing, and Visualizing Data

7 key Microsoft Azure analytics services (plus one extra)

Data science vs data analytics: Unpacking the differences

Regeneron turns to IT to accelerate drug discovery

Carhartt turns to data under new CIO

How Salesforce optimized their detection and response platform using AWS managed services

Accelerate data science feature engineering on transactional data lakes using Amazon Athena with Apache Iceberg

Read and write S3 Iceberg table using AWS Glue Iceberg Rest Catalog from Open Source Apache Spark

The Future of the Data Lakehouse – Open

An AI Chat Bot Wrote This Blog Post …

The Future of the Data Lakehouse – Open

Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics

It’s not your data. It’s how you use it. Unlock the power of data & build foundations of a data driven organisation

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

A CIO’s first rule for automation: Have a clear business case

Top analytics announcements of AWS re:Invent 2024

The New Normal for FP&A: Data Analytics

TransUnion transforms its business model with IT

10 Things AWS Can Do for Your SaaS Company

Write queries faster with Amazon Q generative SQL for Amazon Redshift

Stay Connected