Data Integration, Data Lake and Software

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

AWS Big Data

DECEMBER 20, 2024

Amazon Q data integration, introduced in January 2024, allows you to use natural language to author extract, transform, load (ETL) jobs and operations in DynamicFrame, the AWS Glue-specific data abstraction. In this post, we discuss how Amazon Q data integration transforms ETL workflow development.

Data Integration

Data Integration Visualization Data Processing Big Data

Amazon Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools

AWS Big Data

FEBRUARY 26, 2025

Amazon Web Services (AWS) has been recognized as a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools. This recognition, we feel, reflects our ongoing commitment to innovation and excellence in data integration, demonstrating our continued progress in providing comprehensive data management solutions.

Data Integration

Data Integration Data Lake Data Warehouse Unstructured Data

Oracle Wants to Be the Database for AI

David Menninger's Analyst Perspectives

MAY 15, 2025

Founded as Software Development Laboratories in 1977, Oracle is a behemoth in the software industry, generating more than $50 billion in revenue in its fiscal year 2024. Originally focused solely on the relational database market, the software provider operated as Relational Systems, Inc.

Data Lake

Data Lake Data Warehouse Machine Learning Software

Introducing Amazon Q data integration in AWS Glue

AWS Big Data

APRIL 30, 2024

Today, we’re excited to announce general availability of Amazon Q data integration in AWS Glue. Amazon Q data integration, a new generative AI-powered capability of Amazon Q Developer, enables you to build data integration pipelines using natural language.

Data Integration

Data Integration Data Lake Data Warehouse Software

Load data incrementally from transactional data lakes to data warehouses

AWS Big Data

OCTOBER 19, 2023

Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure.

Data Lake

Data Lake Data Warehouse Visualization Snapshot

Talend Data Fabric Simplifies Data Life Cycle Management

David Menninger's Analyst Perspectives

NOVEMBER 16, 2021

Talend is a data integration and management software company that offers applications for cloud computing, big data integration, application integration, data quality and master data management.

Management

Management Data Warehouse Data Quality Data Integration

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

licensed, 100% open-source data table format that helps simplify data processing on large datasets stored in data lakes. Data engineers use Apache Iceberg because it’s fast, efficient, and reliable at any scale and keeps records of how datasets change over time.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Accelerate data integration with Salesforce and AWS using AWS Glue

AWS Big Data

SEPTEMBER 4, 2024

The rapid adoption of software as a service (SaaS) solutions has led to data silos across various platforms, presenting challenges in consolidating insights from diverse sources. This solution also allows you to update certain fields of the account object in the data lake and push it back to Salesforce.

Data Integration

Data Integration Data Lake Data-driven Cost-Benefit

Accelerate analytics and AI innovation with the next generation of Amazon SageMaker

AWS Big Data

MARCH 13, 2025

Unified access to your data is provided by Amazon SageMaker Lakehouse, a unified, open, and secure data lakehouse built on Apache Iceberg open standards. Now, they’re able to build and collaborate with their data and tools available in one experience, dramatically reducing time-to-value.

Analytics

Analytics Data Lake Data Warehouse Data-driven

Bridging the gap between mainframe data and hybrid cloud environments

CIO Business Intelligence

FEBRUARY 27, 2025

A high hurdle many enterprises have yet to overcome is accessing mainframe data via the cloud. Giving the mobile workforce access to this data via the cloud allows them to be productive from anywhere, fosters collaboration, and improves overall strategic decision-making.

Metadata

Metadata Data Lake Cost-Benefit Forecasting

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

AWS Big Data

SEPTEMBER 10, 2024

We often see requests from customers who have started their data journey by building data lakes on Microsoft Azure, to extend access to the data to AWS services. In such scenarios, data engineers face challenges in connecting and extracting data from storage containers on Microsoft Azure.

Data Lake

Data Lake Metadata Management Software

What is data architecture? A framework to manage data

CIO Business Intelligence

DECEMBER 20, 2024

Beyond breaking down silos, modern data architectures need to provide interfaces that make it easy for users to consume data using tools fit for their jobs. Data must be able to freely move to and from data warehouses, data lakes, and data marts, and interfaces must make it easy for users to consume that data.

Data Architecture

Data Architecture Management Consulting Internet of Things

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

Collaborate and build faster using familiar AWS tools for model development, generative AI, data processing, and SQL analytics with Amazon Q Developer, the most capable generative AI assistant for software development, helping you along the way. The tools to transform your business are here.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

Solving the small file problem and improving query performance: In modern data architectures, stream processing engines such as Amazon EMR are often used to ingest continuous streams of data into data lakes using Apache Iceberg. Iceberg provides several maintenance operations to keep your tables in good shape.

Data Lake

Data Lake Metadata Snapshot Analytics

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

Third, some services require you to set up and manage compute resources used for federated connectivity, and capabilities like connection testing and data preview aren’t available in all services. To solve these challenges, we launched Amazon SageMaker Lakehouse unified data connectivity.

Visualization

Visualization Data Processing Testing Publishing

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 1: Getting Started

AWS Big Data

JANUARY 26, 2023

AWS Glue is a serverless, scalable data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources. AWS Glue provides an extensible architecture that enables users with different data processing use cases.

Data Lake

Data Lake Big Data Software Interactive

Author data integration jobs with an interactive data preparation experience with AWS Glue visual ETL

AWS Big Data

JULY 10, 2024

Now you can author data preparation transformations and edit them with the AWS Glue Studio visual editor. The AWS Glue Studio visual editor is a graphical interface that enables you to create, run, and monitor data integration jobs in AWS Glue. She is passionate about helping customers build data lakes using ETL workloads.

Interactive

Interactive Visualization Data Integration Statistics

The success of GenAI models lies in your data management strategy

CIO Business Intelligence

OCTOBER 9, 2024

How will organizations wield AI to seize greater opportunities, engage employees, and drive secure access without compromising data integrity and compliance? While it may sound simplistic, the first step towards managing high-quality data and right-sizing AI is defining the GenAI use cases for your business.

Strategy

Strategy Modeling Management Data Lake

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

AWS Big Data

OCTOBER 21, 2024

However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to address these challenges and foster a data-driven culture. He has around 20 years of software development and architecture experience.

Sales

Sales Data-driven Data Processing Key Performance Indicator

Introducing generative AI upgrades for Apache Spark in AWS Glue (preview)

AWS Big Data

NOVEMBER 22, 2024

About the Authors: Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is responsible for building software artifacts to help customers. Pradeep Patel is a Software Development Manager on the AWS Glue team. Chuhan Liu is a Software Engineer at AWS Glue.

Cost-Benefit

Cost-Benefit Data-driven Software Testing

ESG software: 6 tips for selecting the best fit for your business

CIO Business Intelligence

FEBRUARY 22, 2024

“Ultimately, CIOs may increasingly be held accountable for the veracity of the reporting, the third-party assurance of the data, and ensuring their organizations’ compliant disclosures align with their corporate ESG goals.” “That’s where the single source of truth comes into perspective and increases performance,” Karcher says.

Software

Software Reporting KPI Enterprise

Scaling RISE with SAP data and AWS Glue

AWS Big Data

NOVEMBER 29, 2024

Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP.

Visualization

Visualization Data Processing Data-driven Cost-Benefit

Migrate data from Azure Blob Storage to Amazon S3 using AWS Glue

AWS Big Data

OCTOBER 20, 2023

Today, we are pleased to announce new AWS Glue connectors for Azure Blob Storage and Azure Data Lake Storage that allow you to move data bi-directionally between Azure Blob Storage, Azure Data Lake Storage, and Amazon Simple Storage Service (Amazon S3). option("header","true").load("wasbs://yourblob@youraccountname.blob.core.windows.net/loadingtest-input/100mb")

Data Lake

Data Lake Big Data Data Warehouse Consulting

Denodo Provides a Logical Approach to Data Management

David Menninger's Analyst Perspectives

OCTOBER 24, 2024

Data fabric and data mesh are also both related to logical data management, which is the approach of providing virtualized access to data across an enterprise without the requirement to first extract and load it into a central repository.

Management

Management Data-driven Data Governance Data Lake

Top 15 data management platforms

CIO Business Intelligence

JUNE 9, 2022

All this data arrives by the terabyte, and a data management platform can help marketers make sense of it all. Marketing-focused or not, DMPs excel at negotiating with a wide array of databases, data lakes, or data warehouses, ingesting their streams of data and then cleaning, sorting, and unifying the information therein.

Management

Management Advertising Data Lake Sales

How Kaplan, Inc. implemented modern data pipelines using Amazon MWAA and Amazon AppFlow with Amazon Redshift as a data warehouse

AWS Big Data

AUGUST 22, 2024

The infrastructure provides an analytics experience to hundreds of in-house analysts, data scientists, and student-facing frontend specialists. The data engineering team is on a mission to modernize its data integration platform to be agile, adaptive, and straightforward to use.

Data Warehouse

Data Warehouse Data Lake Data Integration Management

What CEOs really need from today’s CIOs

CIO Business Intelligence

AUGUST 3, 2022

In today’s data economy, in which software and analytics have emerged as the key drivers of business, CEOs must rethink the silos and hierarchies that fueled the businesses of the past. They can no longer have “technology people” who work independently from “data people” who work independently from “sales” people or from “finance.”

Finance

Finance IoT Digital Transformation Sales

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

AWS Big Data

JANUARY 12, 2024

The architecture is comprised of a number of components. Source data: Data may be coming from many tens to hundreds of sources, including databases, file transfers, logs, software as a service (SaaS) applications, and more. Amazon AppFlow can be used to transfer data from different SaaS applications to a data lake.

Data Lake

Data Lake Cost-Benefit Visualization Structured Data

Introducing Apache Hudi support with AWS Glue crawlers

AWS Big Data

NOVEMBER 22, 2023

Apache Hudi is an open table format that brings database and data warehouse capabilities to data lakes. Apache Hudi helps data engineers manage complex challenges, such as managing continuously evolving datasets with transactions while maintaining query performance. Under Administration, choose Data catalog settings.

Data Lake

Data Lake Snapshot Metadata Optimization

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

Zero-ETL integration also enables you to load and analyze data from multiple operational database clusters in a new or existing Amazon Redshift instance to derive holistic insights across many applications. Learn more about the zero-ETL integrations, data lake performance enhancements, and other announcements below.

Data Warehouse

Data Warehouse Analytics Data Lake Machine Learning

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

CIO Business Intelligence

APRIL 29, 2022

This first article emphasizes data as the ‘foundation-stone’ of AI-based initiatives. Establishing a Data Foundation. The shift away from ‘Software 1.0’ where applications have been based on hard-coded rules has begun and the ‘Software 2.0’ era is upon us. Addressing the Challenge.

Data Governance

Data Governance IT Risk Data Lake

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

It enables data engineers, data scientists, and analytics engineers to define the business logic with SQL select statements and eliminates the need to write boilerplate data manipulation language (DML) and data definition language (DDL) expressions.

Data Lake

Data Lake Management Metrics Data Warehouse

Doing Cloud Migration and Data Governance Right the First Time

erwin

OCTOBER 8, 2020

The desire to modernize technology, over time, leads to acquiring many different systems with various data entry points and transformation rules for data as it moves into and across the organization. Distribute cloud data: erwin DI’s Business User Portal provides self-service access to cloud data asset discovery and reporting tools.

Data Governance

Data Governance Metadata Testing Data Lake

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

AWS Big Data

NOVEMBER 20, 2023

For any modern data-driven company, having smooth data integration pipelines is crucial. These pipelines pull data from various sources, transform it, and load it into destination systems for analytics and reporting. About the Authors: Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team.

Metrics

Metrics Data Lake Cost-Benefit Dashboards

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics: Part 2

AWS Big Data

FEBRUARY 13, 2024

Monitoring data pipelines in real time is critical for catching issues early and minimizing disruptions. AWS Glue has made this more straightforward with the launch of AWS Glue job observability metrics, which provide valuable insights into your data integration pipelines built on AWS Glue.

Metrics

Metrics Dashboards Visualization Key Performance Indicator

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

Comparison of modern data architectures (columns: Architecture, Definition, Strengths, Weaknesses, Best used when). Data warehouse: Centralized, structured and curated data repository. Weaknesses: Inflexible schema, poor for unstructured or real-time data. Data lake: Raw storage for all types of structured and unstructured data.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

Scale your AWS Glue for Apache Spark jobs with new larger worker types G.4X and G.8X

AWS Big Data

MAY 9, 2023

Hundreds of thousands of customers use AWS Glue, a serverless data integration service, to discover, prepare, and combine data for analytics, machine learning (ML), and application development. AWS Glue for Apache Spark jobs work with your code and configuration of the number of data processing units (DPU).

Data Lake

Data Lake Cost-Benefit Data Integration Data Transformation

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 2: AWS Glue Studio Visual Editor

AWS Big Data

MARCH 20, 2023

In the first post of this series, we described how AWS Glue for Apache Spark works with Apache Hudi, Linux Foundation Delta Lake, and Apache Iceberg tables using the native support of those data lake formats. Even without prior experience using Hudi, Delta Lake or Iceberg, you can easily achieve typical use cases.

Visualization

Visualization Data Lake Snapshot Big Data

Modernizing the Data Warehouse: Challenges and Benefits

BI-Survey

AUGUST 21, 2020

It is noteworthy that business users in particular consider the inability to provide required data and the lack of user acceptance as even more important than enhanced self-service. In particular executives (31 percent) and business intelligence/analytics teams (30 percent) agree that software licenses are too expensive in general.

Data Warehouse

Data Warehouse Data Lake Data Governance Data Architecture

Snowflake: Data Ingestion Using Snowpipe and AWS Glue

BizAcuity

NOVEMBER 22, 2022

This typically requires a data warehouse for analytics needs that is able to ingest and handle real time data of huge volumes. Snowflake is a cloud-native platform that eliminates the need for separate data warehouses, data lakes, and data marts allowing secure data sharing across the organization.

Data Warehouse

Data Warehouse Cost-Benefit Data Lake Internet of Things

What is an Information Steward, and Why You Should Care

Grooper

MARCH 5, 2020

If your organization has any kind of data and analytics initiative, then chances are you have people – maybe even an entire department – dedicated to managing and integrating data for (and between) software applications to achieve some sort of business outcome. Is a Power-User or a Data Scientist an Information Steward?

Data Lake

Data Lake Metadata Data Quality Software

5 financial planning software capabilities that drive business value

Jedox

JANUARY 13, 2023

Now finance teams are looking for more efficient and flexible planning that encourages a “total company mindset,” according to the Gartner 2022 Critical Capabilities for Financial Planning Software report. This reflects Jedox’s ability to adjust data models to incorporate operational planning changes,” according to Gartner analysts.

Software

Software Finance Forecasting Data Lake

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

AWS Big Data

MARCH 3, 2023

Since the State of DevOps 2019 DORA metrics were published, it has been widely reported that with DevOps, companies can deploy software 208 times more often and 106 times faster, recover from incidents 2,604 times faster, and release 7 times fewer defects. For users that require a unified view of software quality, this is unacceptable.

Software

Software Data Lake Testing Cost-Benefit
