This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue. One of the job parameters is source_s3_bucket, the raw S3 bucket name.
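The excerpt above originally ended with a truncated Spark fragment (`S3FileIO").getOrCreate()`). Below is a minimal sketch of the session configuration that fragment likely comes from, assuming an Iceberg catalog registered against the AWS Glue Data Catalog; the bucket and catalog names are placeholders, not values from the original post.

```python
# Reconstructed session setup; bucket and catalog names are
# placeholders, not values from the original post.
from pyspark.sql import SparkSession

source_s3_bucket = "my-raw-bucket"  # the raw S3 bucket job parameter

spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse",
            f"s3://{source_s3_bucket}/iceberg/")
    .config("spark.sql.catalog.glue_catalog.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)
```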
Amazon Redshift is a fast, fully managed cloud data warehouse that makes it cost-effective to analyze your data using standard SQL and business intelligence tools. Customers use data lake tables to achieve cost-effective storage and interoperability with other tools. The sample files are '|'-delimited text files.
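A hedged illustration of loading such '|'-delimited files with the Redshift COPY command, driven from Python via the redshift_connector library; the cluster endpoint, credentials, table, bucket, and IAM role are all assumptions, not values from the original post.

```python
# Cluster endpoint, credentials, table, bucket, and IAM role are
# placeholders for illustration only.
import redshift_connector

conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="<password>",
)
cur = conn.cursor()
# COPY parses the '|'-delimited sample files directly from S3.
cur.execute("""
    COPY sales
    FROM 's3://my-sample-bucket/load/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    DELIMITER '|'
""")
conn.commit()
```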
At AWS, we are committed to empowering organizations with tools that streamline data analytics and transformation processes. This integration enables data teams to efficiently transform and manage data using Athena with dbt Cloud's robust features, enhancing the overall data workflow experience.
This week on the keynote stages at AWS re:Invent 2024, you heard Matt Garman, CEO, AWS, and Swami Sivasubramanian, VP of AI and Data, AWS, speak about the next generation of Amazon SageMaker, the center for all of your data, analytics, and AI. The relationship between analytics and AI is rapidly evolving.
Speaker: Javier Ramírez, Senior AWS Developer Advocate, AWS
Will the data lake scale when you have twice as much data? Is your data secure? In this session, we address common pitfalls of building data lakes and show how AWS can help you manage data and analytics more efficiently, with a live demo of AWS Lake Formation.
The company focused on delivering small increments of customer value (data sets, reports, and other items) as their guiding principle. Small, manageable increments marked the project's delivery cadence. They opted for Snowflake, a cloud-native data platform ideal for SQL-based analysis.
Organizations are accelerating their digital transformation and looking for innovative ways to engage with customers in this new digital era of data management. The challenge is to ensure that processes, applications and data can still be integrated across cloud and on-premises systems.
It's impossible to deny the importance of data in several industries, but that data can become overwhelming if it isn't properly managed. The problem is that managing and extracting valuable insights from all this data requires exceptional data collection, which makes data ingestion vital.
Azure Data Lake Storage Gen2 is based on Azure Blob Storage and offers a suite of big data analytics features. If you don't understand the concept, you might want to check out our previous article on the difference between data lakes and data warehouses. Determine your preparedness.
Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.
Amazon Redshift has established itself as a highly scalable, fully managed cloud data warehouse trusted by tens of thousands of customers for its superior price-performance and advanced data analytics capabilities. This allows you to maintain a comprehensive view of your data while optimizing for cost-efficiency.
A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale. This post uses Apache Iceberg 1.2.0 and Delta Lake 2.3.0.
Many organizations operate data lakes spanning multiple cloud data stores. In these cases, you may want an integrated query layer to seamlessly run analytical queries across these diverse cloud stores and streamline your data analytics processes. This user can only query data from ADLS.
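As a sketch of what such an integrated query layer can look like, the following assumes an Athena federated data source, hypothetically named adls, has already been registered for the Azure store via an Athena data source connector; all names and the results bucket are placeholders.

```python
# Assumes a federated data source named "adls" is already registered
# in Athena; catalog, database, table, and bucket names are placeholders.
import boto3

athena = boto3.client("athena", region_name="us-east-1")
resp = athena.start_query_execution(
    QueryString='SELECT * FROM "adls"."sales_db"."orders" LIMIT 10',
    QueryExecutionContext={"Catalog": "adls", "Database": "sales_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(resp["QueryExecutionId"])  # poll get_query_execution for status
```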
Data lakes have been gaining popularity for storing vast amounts of data from diverse sources in a scalable and cost-effective way. As the number of data consumers grows, data lake administrators often need to implement fine-grained access controls for different user profiles.
Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. Apache Iceberg integration is supported by AWS analytics services including Amazon EMR, Amazon Athena, and AWS Glue (3.0 and later).
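For illustration, a minimal sketch of the time travel and rollback features mentioned above, using Spark SQL and assuming a session configured with an Iceberg Glue catalog named glue_catalog (as in the earlier session sketch); the table name, timestamp, and snapshot ID are made up.

```python
# Assumes a SparkSession configured with the Iceberg Glue catalog
# "glue_catalog" (see the earlier session sketch). Table name,
# timestamp, and snapshot ID below are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Time travel: read the table as of a past point in time.
spark.sql("""
    SELECT * FROM glue_catalog.db.orders
    TIMESTAMP AS OF '2024-01-01 00:00:00'
""").show()

# Rollback: restore the table to an earlier snapshot.
spark.sql(
    "CALL glue_catalog.system.rollback_to_snapshot('db.orders', 1234567890123)"
)
```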
We often see requests from customers who have started their data journey by building data lakes on Microsoft Azure and want to extend access to that data to AWS services. In such scenarios, data engineers face challenges in connecting to and extracting data from storage containers on Microsoft Azure.
For many organizations, this centralized data store follows a data lake architecture. Although data lakes provide a centralized repository, making sense of this data and extracting valuable insights can be challenging. This is where Amazon Bedrock comes into play.
Cloud computing has made it much easier to integrate data sets, but that's only the beginning. Creating a data lake has become much easier, but that's only ten percent of the job of delivering analytics to users. It often takes months to progress from a data lake to the final delivery of insights.
With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in-place with all Apache Iceberg compatible tools and engines.
Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open-format files in your Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries.
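A hedged sketch of what extending Redshift SQL to the data lake looks like in practice: mapping a Glue Data Catalog database into Redshift as an external (Spectrum) schema, then querying it in place via the Redshift Data API. The cluster, schema, table, and role names are assumptions.

```python
# Cluster, database, schema, table, and role names are placeholders.
# The Redshift Data API is asynchronous; statements run in the background.
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

# Map a Glue Data Catalog database into Redshift as an external schema.
rsd.execute_statement(
    ClusterIdentifier="my-cluster", Database="dev", DbUser="awsuser",
    Sql="""
        CREATE EXTERNAL SCHEMA IF NOT EXISTS lake
        FROM DATA CATALOG DATABASE 'lake_db'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
    """,
)

# Query the lake files in place, without loading them into Redshift.
rsd.execute_statement(
    ClusterIdentifier="my-cluster", Database="dev", DbUser="awsuser",
    Sql="SELECT channel, SUM(amount) FROM lake.sales GROUP BY channel",
)
```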
The open table format accelerates companies' adoption of a modern data strategy because it allows them to use various tools on top of a single copy of the data. A solution based on Apache Iceberg encompasses complete data management, featuring simple built-in table optimization capabilities within an existing storage solution.
The combination of a data lake and a serverless paradigm brings significant cost and performance benefits. By monitoring application logs, you can gain insights into job execution and troubleshoot issues promptly, ensuring the overall health and reliability of data pipelines.
At AWS re:Invent 2024, we announced the next generation of Amazon SageMaker , the center for all your data, analytics, and AI. Unified access to your data is provided by Amazon SageMaker Lakehouse , a unified, open, and secure data lakehouse built on Apache Iceberg open standards.
The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt.
Data Gets Meshier. 2022 will bring further momentum behind modular enterprise architectures like data mesh. The data mesh addresses the problems characteristic of large, complex, monolithic data architectures by dividing the system into discrete domains managed by smaller, cross-functional teams.
If your company is using Microsoft Dynamics AX, you'll be aware of the company's shift to Microsoft Dynamics 365 Finance and Supply Chain Management (D365 F&SCM). Option 3: Azure Data Lakes. This leads us to Microsoft's apparent long-term strategy for D365 F&SCM reporting: Azure Data Lakes.
Their terminal operations rely heavily on seamless data flows and the management of vast volumes of data. With the addition of these technologies alongside existing systems like terminal operating systems (TOS) and SAP, the number of data producers has grown substantially.
Amazon SageMaker Unified Studio (preview) provides a unified experience for using data, analytics, and AI capabilities. You can use familiar AWS services for model development, generative AI, data processing, and analytics, all within a single, governed environment.
In this post, we delve into the key aspects of using Amazon EMR for modern data management, covering topics such as data governance, data mesh deployment, and streamlined data discovery. Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca's journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS analytics services.
Below is our fourth post (4 of 5) on combining data mesh with DataOps to foster innovation while addressing the challenges of a decentralized architecture. We’ve covered the basic ideas behind data mesh and some of the difficulties that must be managed. Another challenge is how to manage ordered data dependencies.
Organizations have chosen to build data lakes on top of Amazon Simple Storage Service (Amazon S3) for many years. A data lake is the most popular choice for organizations to store all the data generated by different teams, across business domains, in many different formats, and even over its full history.
When internal resources fall short, companies outsource data engineering and analytics. There's no shortage of consultants who will promise to manage the end-to-end lifecycle of data from integration to transformation to visualization. The challenge is that data engineering and analytics are incredibly complex.
Many security operations centers (SOCs) are finding themselves overwhelmed by telemetry data to correlate, a proliferation of tools, expanding attack surfaces that are challenging to monitor and secure, and data silos across security and IT products, security information and event management (SIEM) systems, enterprise data, and threat intelligence.
Amazon OpenSearch Service is a fully managed service offered by AWS that enables you to deploy, operate, and scale OpenSearch domains effortlessly. OpenSearch is an open-source distributed search and analytics engine. This ensures that only authorized entities can create, manage, or restore snapshots.
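As a sketch of the snapshot workflow this touches on, the following registers an S3 snapshot repository on a domain using SigV4-signed requests with opensearch-py; the domain endpoint, bucket, and snapshot role are hypothetical, and the role must be one the domain is permitted to assume.

```python
# Domain endpoint, bucket, and role ARN below are hypothetical.
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

region = "us-east-1"
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, region)

client = OpenSearch(
    hosts=[{"host": "search-my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# Register an S3 bucket as a manual-snapshot repository; the role is
# what the domain assumes to write snapshot files to the bucket.
client.snapshot.create_repository(
    repository="my-snapshot-repo",
    body={
        "type": "s3",
        "settings": {
            "bucket": "my-snapshot-bucket",
            "region": region,
            "role_arn": "arn:aws:iam::123456789012:role/SnapshotRole",
        },
    },
)
```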
In the world of software engineering and development, organizations use project management tools like Atlassian Jira Cloud. Managing projects with Jira leads to rich datasets, which can provide historical and predictive insights about project and development efforts. An AWS account and a login with access to the AWS Management Console.
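For a feel of where such datasets come from, a hedged sketch of pulling recent issues from the Jira Cloud REST API for downstream analysis; the site URL, credentials, and JQL are placeholders.

```python
# Site URL, e-mail, API token, and JQL below are placeholders.
import requests

resp = requests.get(
    "https://your-site.atlassian.net/rest/api/3/search",
    params={"jql": "project = DEMO ORDER BY created DESC", "maxResults": 50},
    auth=("you@example.com", "<api-token>"),  # basic auth with an API token
    headers={"Accept": "application/json"},
)
resp.raise_for_status()
issues = resp.json()["issues"]
print(f"fetched {len(issues)} issues")
```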
Their business unit colleagues ask an endless stream of urgent questions that require analytic insights. Business analysts must rapidly deliver value and simultaneously manage fragile and error-prone analytics production pipelines. Analytics Hub and Spoke. Teams under the CDO and CAO are sometimes separate from the CIO.
This post features a cloud-based customer relationship management (CRM) software company building artificial intelligence (AI)-powered business applications that allow businesses to connect with their customers in new and personalized ways. The data lake consumers then use Apache Presto running on an Amazon EMR cluster to perform one-time queries.
Implementing a data mesh does not require you to throw away your existing architecture and start over. The data industry has a wide variety of approaches and philosophies for managing data: the Inmon data factory, the Kimball methodology, the star schema, or the data vault pattern (which can be a great way to store and organize raw data), and more.
Applying artificial intelligence (AI) to data analytics for deeper, better insights and automation is a growing enterprise IT priority. But the data repository options that have been around for a while tend to fall short in their ability to serve as the foundation for big data analytics powered by AI.
With the rapid growth of technology, more and more data volume is coming in many different formats: structured, semi-structured, and unstructured. Data analytics on operational data in near-real time is becoming a common need. Then we can query the data with Amazon Athena and visualize it in Amazon QuickSight.
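As a small illustration of that flow, a hedged sketch using awswrangler (the AWS SDK for pandas) to run an Athena query over an assumed operational_events table before pointing QuickSight at the same data; all names are placeholders.

```python
# Database, table, and column names are placeholders; awswrangler runs
# the query through Athena and returns a pandas DataFrame.
import awswrangler as wr

df = wr.athena.read_sql_query(
    sql="""
        SELECT date_trunc('minute', event_time) AS minute,
               COUNT(*) AS events
        FROM operational_events
        GROUP BY 1
        ORDER BY 1
    """,
    database="lake_db",
)
print(df.tail())
```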
Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.
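A minimal sketch of the record-level handling the excerpt refers to: applying a staged batch of CDC changes to an Iceberg table with MERGE INTO. It assumes a Spark session with an Iceberg Glue catalog (as sketched earlier), a staging view cdc_staging, and an op column marking deletes; these are assumptions for illustration, not the post's actual schema.

```python
# Assumes: an Iceberg Glue catalog named "glue_catalog", a temp view
# "cdc_staging" holding the staged CDC rows, and an "op" column where
# 'D' marks deletes. Columns customer_id/name/email are hypothetical.
spark.sql("""
    MERGE INTO glue_catalog.db.customers AS t
    USING cdc_staging AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED AND s.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET t.name = s.name, t.email = s.email
    WHEN NOT MATCHED AND s.op <> 'D' THEN
        INSERT (customer_id, name, email)
        VALUES (s.customer_id, s.name, s.email)
""")
```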
This means you can refine your ETL jobs through natural follow-up questions, starting with a basic data pipeline and progressively adding transformations, filters, and business logic through conversation. The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios.
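To ground the DynamicFrame/DataFrame distinction, a minimal sketch of crossing that boundary inside a Glue job; the catalog database and table names are hypothetical.

```python
# Runs inside an AWS Glue job; database and table names are assumptions.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_ctx = GlueContext(SparkContext.getOrCreate())

# Read through the Glue Data Catalog as a DynamicFrame...
dyf = glue_ctx.create_dynamic_frame.from_catalog(
    database="lake_db", table_name="orders"
)

# ...then drop down to a plain Spark DataFrame for broader transformations.
df = dyf.toDF()
df.filter("amount > 100").show()
```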
In the book, author Zhamak Dehghani reveals that, despite the time, money, and effort poured into them, data warehouses and data lakes fail when applied at the scale and speed of today's organizations. A distributed data mesh is a better choice for everyone involved (the data scientist, the engineer, and the operations engineer).