Data Integration, Data Lake and Data-driven

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. In addition, organizations rely on an increasingly diverse array of digital systems, data fragmentation has become a significant challenge.

Data Integration

Data Integration Data Lake Statistics Data-driven

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Amazon Redshift , launched in 2013, has undergone significant evolution since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. Industry-leading price-performance Amazon Redshift offers up to three times better price-performance than alternative cloud data warehouses.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Accelerate analytics and AI innovation with the next generation of Amazon SageMaker

AWS Big Data

MARCH 13, 2025

At AWS re:Invent 2024, we announced the next generation of Amazon SageMaker , the center for all your data, analytics, and AI. It enables teams to securely find, prepare, and collaborate on data assets and build analytics and AI applications through a single experience, accelerating the path from data to value.

Analytics

Analytics Data Lake Data Warehouse Data-driven

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

Businesses are constantly evolving, and data leaders are challenged every day to meet new requirements. Customers are using AWS and Snowflake to develop purpose-built data architectures that provide the performance required for modern analytics and artificial intelligence (AI) use cases.

Data Lake

Data Lake Snapshot Metadata Data Architecture

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. Together, these capabilities enable terminal operators to enhance efficiency and competitiveness in an industry that is increasingly data driven.

IoT

IoT Machine Learning Metadata Data-driven

Accelerate data integration with Salesforce and AWS using AWS Glue

AWS Big Data

SEPTEMBER 4, 2024

The rapid adoption of software as a service (SaaS) solutions has led to data silos across various platforms, presenting challenges in consolidating insights from diverse sources. Introducing the Salesforce connector for AWS Glue To meet the demands of diverse data integration use cases, AWS Glue now supports SaaS connectivity for Salesforce.

Data Integration

Data Integration Data Lake Data-driven Cost-Benefit

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

Enterprises and organizations across the globe want to harness the power of data to make better decisions by putting data at the center of every decision-making process. However, throughout history, data services have held dominion over their customers’ data.

Data Lake

Data Lake Metadata Snapshot Analytics

Fire Your Super-Smart Data Consultants with DataOps

DataKitchen

JANUARY 25, 2022

Analytics are prone to frequent data errors and deployment of analytics is slow and laborious. When internal resources fall short, companies outsource data engineering and analytics. There’s no shortage of consultants who will promise to manage the end-to-end lifecycle of data from integration to transformation to visualization. .

Consulting

Consulting Testing Data Lake Data Quality

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

Data is the foundation of innovation, agility and competitive advantage in todays digital economy. As technology and business leaders, your strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Data quality is no longer a back-office concern.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

AWS Big Data

OCTOBER 21, 2024

Data is the most significant asset of any organization. However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to address these challenges and foster a data-driven culture.

Sales

Sales Data-driven Data Processing Key Performance Indicator

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

erwin

AUGUST 31, 2023

Q: Is data modeling cool again? In today’s fast-paced digital landscape, data reigns supreme. The data-driven enterprise relies on accurate, accessible, and actionable information to make strategic decisions and drive innovation. A: It always was and is getting cooler!!

Data-driven

Data-driven Modeling Enterprise Structured Data

Scaling RISE with SAP data and AWS Glue

AWS Big Data

NOVEMBER 29, 2024

Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP.

Visualization

Visualization Data Processing Data-driven Cost-Benefit

Introducing generative AI upgrades for Apache Spark in AWS Glue (preview)

AWS Big Data

NOVEMBER 22, 2024

Organizations run millions of Apache Spark applications each month on AWS, moving, processing, and preparing data for analytics and machine learning. Data practitioners need to upgrade to the latest Spark releases to benefit from performance improvements, new features, bug fixes, and security enhancements. Original code (Glue 2.0)

Cost-Benefit

Cost-Benefit Data-driven Software Testing

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

In this post, we focus on data management implementation options such as accessing data directly in Amazon Simple Storage Service (Amazon S3), using popular data formats like Parquet, or using open table formats like Iceberg. Data management is the foundation of quantitative research.

Metadata

Metadata Snapshot Cost-Benefit Optimization

Access Amazon Redshift data from Salesforce Data Cloud with Zero Copy Data Federation

AWS Big Data

JUNE 25, 2024

This post is co-authored by Vijay Gopalakrishnan, Director of Product, Salesforce Data Cloud. In today’s data-driven business landscape, organizations collect a wealth of data across various touch points and unify it in a central data warehouse or a data lake to deliver business insights.

Data Lake

Data Lake Cost-Benefit Data-driven Data Warehouse

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

The need to integrate diverse data sources has grown exponentially, but there are several common challenges when integrating and analyzing data from multiple sources, services, and applications. First, you need to create and maintain independent connections to the same data source for different services.

Visualization

Visualization Data Processing Testing Publishing

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.

Metadata

Metadata Snapshot Data Lake Metrics

Denodo Provides a Logical Approach to Data Management

David Menninger's Analyst Perspectives

OCTOBER 24, 2024

Although the terms data fabric and data mesh are often used interchangeably, I previously explained that they are distinct but complementary. The popularity of data fabric and data mesh has highlighted the importance of software providers, such as Denodo, that utilize data virtualization to enable logical data management.

Management

Management Data-driven Data Governance Data Lake

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.

Analytics

Analytics Data Lake Metadata Data Warehouse

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift , the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse

Data Warehouse Analytics Data Lake Machine Learning

The New Data Integration Requirements

In(tegrate) the Clouds

JUNE 23, 2016

This week SnapLogic posted a presentation of the 10 Modern Data Integration Platform Requirements on the company’s blog. They are: Application integration is done primarily through REST & SOAP services. Large-volume data integration is available to Hadoop-based data lakes or cloud-based data warehouses.

Data Integration

Data Integration Data Lake Data Warehouse Data-driven

Compose your ETL jobs for MongoDB Atlas with AWS Glue

AWS Big Data

MAY 3, 2023

In today’s data-driven business environment, organizations face the challenge of efficiently preparing and transforming large amounts of data for analytics and data science purposes. Businesses need to build data warehouses and data lakes based on operational data.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Data transformation takes flight at Atlanta’s Hartsfield-Jackson airport

CIO Business Intelligence

AUGUST 9, 2024

At Atlanta’s Hartsfield-Jackson International Airport, an IT pilot has led to a wholesale data journey destined to transform operations at the world’s busiest airport, fueled by machine learning and generative AI. He is a very visual person, so our proof of concept collects different data sets and ingests them into our Azure data house.

Data Transformation

Data Transformation Machine Learning Data Lake Dashboards

Top 15 data management platforms

CIO Business Intelligence

JUNE 9, 2022

A data management platform (DMP) is a group of tools designed to help organizations collect and manage data from a wide array of sources and to create reports that help explain what is happening in those data streams. Deploying a DMP can be a great way for companies to navigate a business world dominated by data.

Management

Management Advertising Data Lake Sales

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

AWS Big Data

AUGUST 19, 2024

In today’s data-driven world, the ability to seamlessly integrate and utilize diverse data sources is critical for gaining actionable insights and driving innovation. The company stores vast amounts of transactional data, customer information, and product catalogs in Snowflake.

Analytics

Analytics Data-driven Data Integration Data Lake

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

For those in the data world, this post provides a curated guide for all analytics sessions that you can use to quickly schedule and build your itinerary. A shapeshifting guardian and protector of data like Data Lynx? Or a digitally clairvoyant master of data insights like Cloud Sight?

Analytics

Analytics Data Lake Data Warehouse Data-driven

Moving Enterprise Data From Anywhere to Any System Made Easy

Cloudera

JUNE 2, 2022

Since 2015, the Cloudera DataFlow team has been helping the largest enterprise organizations in the world adopt Apache NiFi as their enterprise standard data movement tool. This need has generated a market opportunity for a universal data distribution service. Why does every organization need it when using a modern data stack?

Enterprise

Enterprise Data Lake Data Collection Data-driven

Doing Cloud Migration and Data Governance Right the First Time

erwin

OCTOBER 8, 2020

So if you’re going to move from your data from on-premise legacy data stores and warehouse systems to the cloud, you should do it right the first time. And as you make this transition, you need to understand what data you have, know where it is located, and govern it along the way. Then you must bulk load the legacy data.

Data Governance

Data Governance Metadata Testing Data Lake

AWS re:Invent 2023 Amazon Redshift Sessions Recap

AWS Big Data

DECEMBER 18, 2023

Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads. Learn more about the AWS zero-ETL future with newly launched AWS databases integrations with Amazon Redshift.

Data Warehouse

Data Warehouse Machine Learning Data-driven Data Lake

An AI Chat Bot Wrote This Blog Post …

DataKitchen

DECEMBER 9, 2022

ChatGPT> DataOps, or data operations, is a set of practices and technologies that organizations use to improve the speed, quality, and reliability of their data analytics processes. The goal of DataOps is to help organizations make better use of their data to drive business decisions and improve outcomes.

Machine Learning

Machine Learning Data-driven Optimization Data Analytics

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

AWS Big Data

NOVEMBER 20, 2023

For any modern data-driven company, having smooth data integration pipelines is crucial. These pipelines pull data from various sources, transform it, and load it into destination systems for analytics and reporting. Undetected errors result in bad data and impact downstream analysis.

Metrics

Metrics Data Lake Cost-Benefit Dashboards

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

CIO Business Intelligence

APRIL 29, 2022

This first article emphasizes data as the ‘foundation-stone’ of AI-based initiatives. Establishing a Data Foundation. Software development, once solely the domain of human programmers, is now increasingly the by-product of data being carefully selected, ingested, and analysed by machine learning (ML) systems in a recurrent cycle.

Data Governance

Data Governance IT Risk Data Lake

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Quality

Constructing A Digital Transformation Strategy: Putting the Data in Digital Transformation

erwin

JULY 17, 2019

Once you’ve determined what part(s) of your business you’ll be innovating — the next step in a digital transformation strategy is using data to get there. Constructing A Digital Transformation Strategy: Data Enablement. Many organizations prioritize data collection as part of their digital transformation strategy.

Digital Transformation

Digital Transformation Strategy Metadata Data-driven

Modernizing the Data Warehouse: Challenges and Benefits

BI-Survey

AUGUST 21, 2020

Data warehousing is getting on in years. However, data warehousing and BI applications are only considered moderately successful. Advanced analytics and new ways of working with data also create new requirements that surpass the traditional concepts. Can the basic nature of the data be proactively improved?

Data Warehouse

Data Warehouse Data Lake Data Governance Data Architecture

Snowflake: Data Ingestion Using Snowpipe and AWS Glue

BizAcuity

NOVEMBER 22, 2022

In today’s world that is largely data-driven, organizations depend on data for their success and survival, and therefore need robust, scalable data architecture to handle their data needs. This typically requires a data warehouse for analytics needs that is able to ingest and handle real time data of huge volumes.

Data Warehouse

Data Warehouse Cost-Benefit Data Lake Internet of Things

P&G turns to AI to create digital manufacturing of the future

CIO Business Intelligence

OCTOBER 1, 2022

The partners say they will create the future of digital manufacturing by leveraging the industrial internet of things (IIoT), digital twin , data, and AI to bring products to consumers faster and increase customer satisfaction, all while improving productivity and reducing costs. Data and AI as digital fundamentals.

Manufacturing

Manufacturing Digital Transformation IoT Internet of Things

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics: Part 2

AWS Big Data

FEBRUARY 13, 2024

Monitoring data pipelines in real time is critical for catching issues early and minimizing disruptions. AWS Glue has made this more straightforward with the launch of AWS Glue job observability metrics , which provide valuable insights into your data integration pipelines built on AWS Glue. Choose Add new data source.

Metrics

Metrics Dashboards Visualization Key Performance Indicator

Unlock scalable analytics with AWS Glue and Google BigQuery

AWS Big Data

OCTOBER 27, 2023

Data integration is the foundation of robust data analytics. It encompasses the discovery, preparation, and composition of data from diverse sources. In the modern data landscape, accessing, integrating, and transforming data from diverse sources is a vital process for data-driven decision-making.

Analytics

Analytics Visualization Data Integration Cost-Benefit

With a zero-ETL approach, AWS is helping builders realize near-real-time analytics

AWS Big Data

JUNE 28, 2023

Data is at the center of every application, process, and business decision. When data is used to improve customer experiences and drive innovation, it can lead to business growth. According to Forrester , advanced insights-driven businesses are 8.5 Firstly, it invariably requires data engineers to create custom code.

Analytics

Analytics Data Warehouse Data Lake Data-driven

Moving Enterprise Data From Anywhere to Any System Made Easy

CIO Business Intelligence

JULY 13, 2022

Since 2015, the Cloudera DataFlow team has been helping the largest enterprise organizations in the world adopt Apache NiFi as their enterprise standard data movement tool. This need has generated a market opportunity for a universal data distribution service. Why does every organization need it when using a modern data stack?

Enterprise

Enterprise Data Lake Data Collection Data-driven

Your 5-Step Journey from Analytics to AI

CIO Business Intelligence

MARCH 22, 2022

Most organizations have come to understand the importance of being data-driven. To compete in a digital economy, it’s essential to base decisions and actions on accurate data, both real-time and historical. But the sheer volume of the world’s data is expected to nearly triple between 2020 and 2025 to a whopping 180 zettabytes.

Analytics

Analytics Key Performance Indicator Data Warehouse Data-driven

Use AWS Glue to streamline SFTP data processing

AWS Big Data

AUGUST 13, 2024

In today’s data-driven world, seamless integration and transformation of data across diverse sources into actionable insights is paramount. With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog.

Data Processing

Data Processing Visualization Data Lake Data Processing

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

Recap of Amazon Redshift key product announcements in 2024

Webinars

Trending Sources

Accelerate analytics and AI innovation with the next generation of Amazon SageMaker

Webinars

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

How EUROGATE established a data mesh architecture using Amazon DataZone

Accelerate data integration with Salesforce and AWS using AWS Glue

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Fire Your Super-Smart Data Consultants with DataOps

Data’s dark secret: Why poor quality cripples AI and growth

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

Scaling RISE with SAP data and AWS Glue

Introducing generative AI upgrades for Apache Spark in AWS Glue (preview)

Build a high-performance quant research platform with Apache Iceberg

Access Amazon Redshift data from Salesforce Data Cloud with Zero Copy Data Federation

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

Denodo Provides a Logical Approach to Data Management

Top analytics announcements of AWS re:Invent 2024

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

The New Data Integration Requirements

Compose your ETL jobs for MongoDB Atlas with AWS Glue

Data transformation takes flight at Atlanta’s Hartsfield-Jackson airport

Top 15 data management platforms

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

Your guide to AWS Analytics at AWS re:Invent 2023

Moving Enterprise Data From Anywhere to Any System Made Easy

Doing Cloud Migration and Data Governance Right the First Time

AWS re:Invent 2023 Amazon Redshift Sessions Recap

An AI Chat Bot Wrote This Blog Post …

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Constructing A Digital Transformation Strategy: Putting the Data in Digital Transformation

Modernizing the Data Warehouse: Challenges and Benefits

Snowflake: Data Ingestion Using Snowpipe and AWS Glue

P&G turns to AI to create digital manufacturing of the future

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics: Part 2

Unlock scalable analytics with AWS Glue and Google BigQuery

With a zero-ETL approach, AWS is helping builders realize near-real-time analytics

Moving Enterprise Data From Anywhere to Any System Made Easy

Your 5-Step Journey from Analytics to AI

Use AWS Glue to streamline SFTP data processing

Stay Connected