Data Integration, Data Lake and Data Transformation

Data Integration

Data Lake

Data Transformation

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

AWS Big Data

DECEMBER 20, 2024

Amazon Q data integration , introduced in January 2024, allows you to use natural language to author extract, transform, load (ETL) jobs and operations in AWS Glue specific data abstraction DynamicFrame. In this post, we discuss how Amazon Q data integration transforms ETL workflow development.

Data Integration

Data Integration Visualization Data Processing Data Lake

Introducing Amazon Q data integration in AWS Glue

AWS Big Data

APRIL 30, 2024

Today, we’re excited to announce general availability of Amazon Q data integration in AWS Glue. Amazon Q data integration, a new generative AI-powered capability of Amazon Q Developer , enables you to build data integration pipelines using natural language.

Data Integration

Data Integration Data Lake Data Warehouse Software

Join 42,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

Bridging the gap between mainframe data and hybrid cloud environments

CIO Business Intelligence

FEBRUARY 27, 2025

Additionally, integrating mainframe data with the cloud enables enterprises to feed information into data lakes and data lake houses, which is ideal for authorized data professionals to easily leverage the best and most modern tools for analytics and forecasting.

Metadata

Metadata Data Lake Cost-Benefit Forecasting

Webinars

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Data transformation takes flight at Atlanta’s Hartsfield-Jackson airport

CIO Business Intelligence

AUGUST 9, 2024

The original proof of concept was to have one data repository ingesting data from 11 sources, including flat files and data stored via APIs on premises and in the cloud, Pruitt says. There are a lot of variables that determine what should go into the data lake and what will probably stay on premise,” Pruitt says.

Data Transformation

Data Transformation Machine Learning Data Lake Dashboards

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

cycle_end"', "sagemakedatalakeenvironment_sub_db", ctas_approach=False) A similar approach is used to connect to shared data from Amazon Redshift, which is also shared using Amazon DataZone. While real-time data is processed by other applications, this setup maintains high-performance analytics without the expense of continuous processing.

IoT

IoT Machine Learning Metadata Data-driven

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Author data integration jobs with an interactive data preparation experience with AWS Glue visual ETL

AWS Big Data

JULY 10, 2024

Now you can author data preparation transformations and edit them with the AWS Glue Studio visual editor. The AWS Glue Studio visual editor is a graphical interface that enables you to create, run, and monitor data integration jobs in AWS Glue. You can configure all these steps in the visual editor in AWS Glue Studio.

Interactive

Interactive Data Integration Visualization Statistics

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

Third, some services require you to set up and manage compute resources used for federated connectivity, and capabilities like connection testing and data preview arent available in all services. To solve for these challenges, we launched Amazon SageMaker Lakehouse unified data connectivity.

Visualization

Visualization Data Processing Testing Publishing

Navigating the Chaos of Unruly Data: Solutions for Data Teams

DataKitchen

NOVEMBER 10, 2023

The Perilous State of Today’s Data Environments Data teams often navigate a labyrinth of chaos within their databases. Extrinsic Control Deficit: Many of these changes stem from tools and processes beyond the immediate control of the data team.

Data Quality

Data Quality Testing Data Lake Data Integration

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Quality

Scale your AWS Glue for Apache Spark jobs with new larger worker types G.4X and G.8X

AWS Big Data

MAY 9, 2023

Hundreds of thousands of customers use AWS Glue , a serverless data integration service, to discover, prepare, and combine data for analytics, machine learning (ML), and application development. AWS Glue for Apache Spark jobs work with your code and configuration of the number of data processing units (DPU).

Data Lake

Data Lake Cost-Benefit Data Integration Data Transformation

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouses (such as Amazon Redshift ) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

Comparison of modern data architectures : Architecture Definition Strengths Weaknesses Best used when Data warehouse Centralized, structured and curated data repository. Inflexible schema, poor for unstructured or real-time data. Data lake Raw storage for all types of structured and unstructured data.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

AWS Big Data

NOVEMBER 13, 2023

Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. Additionally, data is extracted from vendor APIs that includes data related to product, marketing, and customer experience.

Data Warehouse

Data Warehouse Analytics Data Lake Data Science

Straumann Group is transforming dentistry with data, AI

CIO Business Intelligence

FEBRUARY 16, 2023

“Digitizing was our first stake at the table in our data journey,” he says. That step, primarily undertaken by developers and data architects, established data governance and data integration. That step, primarily undertaken by developers and data architects, established data governance and data integration.

Unstructured Data

Unstructured Data Data Lake Prescriptive Analytics Data Warehouse

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

DataKitchen

JULY 27, 2023

Let’s go through the ten Azure data pipeline tools Azure Data Factory : This cloud-based data integration service allows you to create data-driven workflows for orchestrating and automating data movement and transformation. Azure Blob Storage serves as the data lake to store raw data.

Machine Learning

Machine Learning Cost-Benefit Data Transformation Testing

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

AWS Big Data

AUGUST 19, 2024

As organizations increasingly rely on data stored across various platforms, such as Snowflake , Amazon Simple Storage Service (Amazon S3), and various software as a service (SaaS) applications, the challenge of bringing these disparate data sources together has never been more pressing.

Analytics

Analytics Data-driven Data Integration Data Lake

Accelerate analytics on Amazon OpenSearch Service with AWS Glue through its native connector

AWS Big Data

DECEMBER 21, 2023

As the volume and complexity of analytics workloads continue to grow, customers are looking for more efficient and cost-effective ways to ingest and analyse data. AWS Glue provides both visual and code-based interfaces to make data integration effortless. Reduce the waiting period to 7 days and schedule the deletion.

Analytics

Analytics IT Data Lake Visualization

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Big Data

JANUARY 6, 2025

With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you chooseon a schedule, in response to a business event, or on demand. You can configure data transformation capabilities such as filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps.

Analytics

Analytics Data Warehouse Big Data Metrics

Use AWS Glue to streamline SFTP data processing

AWS Big Data

AUGUST 13, 2024

In today’s data-driven world, seamless integration and transformation of data across diverse sources into actionable insights is paramount. With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog. Kamen Sharlandjiev is a Sr.

Data Processing

Data Processing Visualization Data Lake Data Processing

Unlock scalable analytics with AWS Glue and Google BigQuery

AWS Big Data

OCTOBER 27, 2023

Data integration is the foundation of robust data analytics. It encompasses the discovery, preparation, and composition of data from diverse sources. In the modern data landscape, accessing, integrating, and transforming data from diverse sources is a vital process for data-driven decision-making.

Analytics

Analytics Visualization Data Integration Cost-Benefit

Connect your data for faster decisions with AWS

AWS Big Data

NOVEMBER 7, 2023

For these, AWS Glue provides fast, scalable data transformation. Third, AWS continues adding support for more data sources including connections to software as a service (SaaS) applications, on-premises applications, and other clouds so organizations can act on their data. Visit Data integration with AWS to learn more.

Dashboards

Dashboards Data-driven Data Integration Data Lake

Turning the page

Cloudera

JUNE 1, 2021

These acquisitions usher in a new era of “ self-service ” by automating complex operations so customers can focus on building great data-driven apps instead of managing infrastructure. Datacoral powers fast and easy data transformations for any type of data via a robust multi-tenant SaaS architecture that runs in AWS.

Uncertainty

Uncertainty Cost-Benefit Risk Strategy

Self-Serve Data Prep CAN Be Easy AND Sophisticated!

Smarten

AUGUST 4, 2023

If your team has easy-to-use tools and features, you are much more likely to experience the user adoption you want and to improve data literacy and data democratization across the organization. Sophisticated Functionality – Don’t sacrifice functionality to get ease-of-use.

Data Lake

Data Lake Machine Learning Data Integration Optimization

An AI Chat Bot Wrote This Blog Post …

DataKitchen

DECEMBER 9, 2022

Observability in DataOps refers to the ability to monitor and understand the performance and behavior of data-related systems and processes, and to use that information to improve the quality and speed of data-driven decision making.

Machine Learning

Machine Learning Data-driven Optimization Data Analytics

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

AWS Big Data

MARCH 3, 2023

Additionally, the scale is significant because the multi-tenant data sources provide a continuous stream of testing activity, and our users require quick data refreshes as well as historical context for up to a decade due to compliance and regulatory demands. Finally, data integrity is of paramount importance.

Software

Software Data Lake Testing Cost-Benefit

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

In addition, data pipelines include more and more stages, thus making it difficult for data engineers to compile, manage, and troubleshoot those analytical workloads. As a result, alternative data integration technologies (e.g., CRM platforms).

Data Processing

Data Processing Data Warehouse Enterprise Visualization

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

erwin

JANUARY 11, 2019

Organizations have spent a lot of time and money trying to harmonize data across diverse platforms , including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on. Creating a High-Quality Data Pipeline.

Data Governance

Data Governance Risk Metadata Management

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Alation

AUGUST 11, 2022

Today, the brightest minds in our industry are targeting the massive proliferation of data volumes and the accompanying but hard-to-find value locked within all that data. Everybody’s trying to solve this same problem (of leveraging mountains of data), but they’re going about it in slightly different ways.

Metadata

Metadata Data Warehouse Data Quality Data Lake

CIO 100 Award winners drive business results with IT

CIO Business Intelligence

AUGUST 7, 2024

The company started its New Analytics Era initiative by migrating its data from outdated SQL servers to a modern AWS data lake. It then built a cutting-edge cloud-based analytics platform, designed with an innovative data architecture.

IT Insurance Cost-Benefit Testing

What is a Data Pipeline?

Jet Global

MAY 9, 2024

The key components of a data pipeline are typically: Data Sources : The origin of the data, such as a relational database , data warehouse, data lake , file, API, or other data store. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.

Data Lake

Data Lake Data Warehouse Business Intelligence Machine Learning

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

Data mapping is essential for integration, migration, and transformation of different data sets; it allows you to improve your data quality by preventing duplications and redundancies in your data fields. Data mapping is important for several reasons.

Data Warehouse

Data Warehouse Reporting Data Transformation Visualization

Introducing the HubSpot connector for AWS Glue

AWS Big Data

DECEMBER 2, 2024

More companies have realized there is an opportunity to integrate, enhance, and present this SaaS data to improve internal operations and gain valuable insights on their data. From there, they can perform meaningful analytics, gain valuable insights, and optionally push enriched data back to external SaaS platforms.

Data Lake

Data Lake Testing Data Integration Metadata

Hybrid big data analytics with Amazon EMR on AWS Outposts

AWS Big Data

JANUARY 29, 2025

This configuration allows you to augment your sensitive on-premises data with cloud data while making sure all data processing and compute runs on-premises in AWS Outposts Racks. Additionally, we show you how to submit batch jobs to Amazon EMR using EMR steps for automated, scheduled data processing.

Big Data

Big Data Data Analytics Analytics Interactive

Data Leaders Brief

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

Introducing Amazon Q data integration in AWS Glue

Webinars

Trending Sources

Bridging the gap between mainframe data and hybrid cloud environments

Webinars

Data transformation takes flight at Atlanta’s Hartsfield-Jackson airport

How EUROGATE established a data mesh architecture using Amazon DataZone

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Author data integration jobs with an interactive data preparation experience with AWS Glue visual ETL

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

Navigating the Chaos of Unruly Data: Solutions for Data Teams

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Scale your AWS Glue for Apache Spark jobs with new larger worker types G.4X and G.8X

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

Data’s dark secret: Why poor quality cripples AI and growth

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

Straumann Group is transforming dentistry with data, AI

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

Accelerate analytics on Amazon OpenSearch Service with AWS Glue through its native connector

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

Use AWS Glue to streamline SFTP data processing

Unlock scalable analytics with AWS Glue and Google BigQuery

Connect your data for faster decisions with AWS

Turning the page

Self-Serve Data Prep CAN Be Easy AND Sophisticated!

An AI Chat Bot Wrote This Blog Post …

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

Addressing the Three Scalability Challenges in Modern Data Platforms

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

CIO 100 Award winners drive business results with IT

What is a Data Pipeline?

What is Data Mapping?

Introducing the HubSpot connector for AWS Glue

Hybrid big data analytics with Amazon EMR on AWS Outposts

Stay Connected