Data Integration, Data Lake and Data Quality

Data Integration

Data Lake

Data Quality

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. SageMaker Lakehouse gives you the flexibility to access and query your data in-place with all Apache Iceberg compatible tools and engines.

Data Integration

Data Integration Data Lake Statistics Data-driven

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

Amazon SageMaker Lakehouse , now generally available, unifies all your data across Amazon Simple Storage Service (Amazon S3) data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data. Having confidence in your data is key.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

Join 42,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

Talend Data Fabric Simplifies Data Life Cycle Management

David Menninger's Analyst Perspectives

NOVEMBER 16, 2021

Talend is a data integration and management software company that offers applications for cloud computing, big data integration, application integration, data quality and master data management.

Management

Management Data Warehouse Data Quality Data Integration

Webinars

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

As technology and business leaders, your strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Yet, despite growing investments in advanced analytics and AI, organizations continue to grapple with a persistent and often underestimated challenge: poor data quality.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

Data Management on Display at Informatica World 2019

David Menninger's Analyst Perspectives

JUNE 12, 2019

Under that focus, Informatica's conference emphasized capabilities across six areas (all strong areas for Informatica): data integration, data management, data quality & governance, Master Data Management (MDM), data cataloging, and data security.

Management

Management Data Quality Data Integration Data Lake

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

AWS Big Data

AUGUST 15, 2024

Unlocking the true value of data often gets impeded by siloed information. Traditional data management—wherein each business unit ingests raw data in separate data lakes or warehouses—hinders visibility and cross-functional analysis. Amazon DataZone natively supports data sharing for Amazon Redshift data assets.

Data Lake

Data Lake Data Warehouse Publishing Data Governance

Get started with AWS Glue Data Quality dynamic rules for ETL pipelines

AWS Big Data

MAY 23, 2024

Hundreds of thousands of organizations build data integration pipelines to extract and transform data. They establish data quality rules to ensure the extracted data is of high quality for accurate business decisions. We also show how to take action based on the data quality results.

Data Quality

Data Quality Metrics Sales Data Lake

AWS Glue Data Quality is Generally Available

AWS Big Data

JUNE 6, 2023

We are excited to announce the General Availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning. It takes days for data engineers to identify and implement data quality rules.

Data Quality

Data Quality Statistics Data Lake Visualization

Fire Your Super-Smart Data Consultants with DataOps

DataKitchen

JANUARY 25, 2022

Ensuring that data is available, secure, correct, and fit for purpose is neither simple nor cheap. Companies end up paying outside consultants enormous fees while still having to suffer the effects of poor data quality and lengthy cycle time. . For example, DataOps can be used to automate data integration.

Consulting

Consulting Testing Data Lake Data Quality

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Getting started with AWS Glue Data Quality from the AWS Glue Data Catalog

AWS Big Data

JUNE 6, 2023

AWS Glue is a serverless data integration service that makes it simple to discover, prepare, and combine data for analytics, machine learning (ML), and application development. Hundreds of thousands of customers use data lakes for analytics and ML to make data-driven business decisions.

Data Quality

Data Quality Data-driven Data Lake Metrics

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

cycle_end"', "sagemakedatalakeenvironment_sub_db", ctas_approach=False) A similar approach is used to connect to shared data from Amazon Redshift, which is also shared using Amazon DataZone. The data science and AI teams are able to explore and use new data sources as they become available through Amazon DataZone.

IoT

IoT Machine Learning Metadata Data-driven

Navigating the Chaos of Unruly Data: Solutions for Data Teams

DataKitchen

NOVEMBER 10, 2023

The core issue plaguing many organizations is the presence of out-of-control databases or data lakes characterized by: Unrestrained Data Changes: Numerous users and tools incessantly alter data, leading to a tumultuous environment. Monitor freshness, schema changes, volume, and column health are standard.

Data Quality

Data Quality Testing Data Lake Data Integration

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.

Metadata

Metadata Snapshot Data Lake Metrics

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Poor data quality is one of the top barriers faced by organizations aspiring to be more data-driven. Ill-timed business decisions and misinformed business processes, missed revenue opportunities, failed business initiatives and complex data systems can all stem from data quality issues.

Data Architecture

Data Architecture Data Quality Strategy Data Lake

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Transformation

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Working with large language models (LLMs) for enterprise use cases requires the implementation of quality and privacy considerations to drive responsible AI. However, enterprise data generated from siloed sources combined with the lack of a data integration strategy creates challenges for provisioning the data for generative AI applications.

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

CIO Business Intelligence

APRIL 29, 2022

This would be straightforward task were it not for the fact that, during the digital-era, there has been an explosion of data – collected and stored everywhere – much of it poorly governed, ill-understood, and irrelevant. Further, data management activities don’t end once the AI model has been developed. Addressing the Challenge.

Data Governance

Data Governance IT Risk Data Lake

What is an Information Steward, and Why You Should Care

Grooper

MARCH 5, 2020

These stewards monitor the input and output of data integrations and workflows to ensure data quality. Their focus is on master data management , data lakes / warehouses, and ensuring the trackability of data using audit trails and metadata. How to Get Started with Information Stewardship.

Data Lake

Data Lake Metadata Data Quality Software

Accelerate analytics on Amazon OpenSearch Service with AWS Glue through its native connector

AWS Big Data

DECEMBER 21, 2023

As the volume and complexity of analytics workloads continue to grow, customers are looking for more efficient and cost-effective ways to ingest and analyse data. AWS Glue provides both visual and code-based interfaces to make data integration effortless.

Analytics

Analytics IT Data Lake Visualization

Constructing A Digital Transformation Strategy: Putting the Data in Digital Transformation

erwin

JULY 17, 2019

Outsourcing these data management efforts to professional services firms only delays schedules and increases costs. With automation, data quality is systemically assured. The data pipeline is seamlessly governed and operationalized to the benefit of all stakeholders. Digital Transformation Strategy: Smarter Data.

Digital Transformation

Digital Transformation Strategy Metadata Data-driven

Straumann Group is transforming dentistry with data, AI

CIO Business Intelligence

FEBRUARY 16, 2023

Selling the value of data transformation Iyengar and his team are 18 months into a three- to five-year journey that started by building out the data layer — corralling data sources such as ERP, CRM, and legacy databases into data warehouses for structured data and data lakes for unstructured data.

Unstructured Data

Unstructured Data Data Lake Prescriptive Analytics Data Warehouse

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of cross-functional governance structure for customer data.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Your 5-Step Journey from Analytics to AI

CIO Business Intelligence

MARCH 22, 2022

Which type(s) of storage consolidation you use depends on the data you generate and collect. . One option is a data lake—on-premises or in the cloud—that stores unprocessed data in any type of format, structured or unstructured, and can be queried in aggregate. Set up unified data governance rules and processes.

Analytics

Analytics Key Performance Indicator Data Warehouse Data-driven

How AWS helped Altron Group accelerate their vision for optimized customer engagement

AWS Big Data

JULY 13, 2023

Data quality for account and customer data – Altron wanted to enable data quality and data governance best practices. Goals – Lay the foundation for a data platform that can be used in the future by internal and external stakeholders.

Optimization

Optimization B2B Data Quality Sales

Differentiate generative AI applications with your data using AWS analytics and managed databases

AWS Big Data

SEPTEMBER 12, 2024

The application gets prompt templates from an S3 data lake and creates the engineered prompt. The user interaction is stored in a data lake for downstream usage and BI analysis. The application sends the prompt to Amazon Bedrock and retrieves the LLM output.

Management

Management Analytics Data Lake Interactive

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

The data fabric architectural approach can simplify data access in an organization and facilitate self-service data consumption at scale. Read: The first capability of a data fabric is a semantic knowledge data catalog, but what are the other 5 core capabilities of a data fabric? What’s a data mesh?

Management

Management Metadata Data Architecture Data Lake

Handle UPSERT data operations using open-source Delta Lake and AWS Glue

AWS Big Data

JANUARY 30, 2023

Many customers need an ACID transaction (atomic, consistent, isolated, durable) data lake that can log change data capture (CDC) from operational data sources. There is also demand for merging real-time data into batch data. Delta Lake framework provides these two capabilities. option("header",True).schema(schema).load("s3://"+

Insurance

Insurance Data Lake Data-driven Management

O’Reilly Releases First Chapters of a New Book about Logical Data Management

Data Virtualization

JANUARY 21, 2025

The post OReilly Releases First Chapters of a New Book about Logical Data Management appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information. Gartner predicts that by the end of this year, 30%.

Management

Management Data Integration Technology Data Warehouse

Migrate workloads from AWS Data Pipeline

AWS Big Data

JULY 25, 2024

Migrating workloads to AWS Glue AWS Glue is a serverless data integration service that helps analytics users to discover, prepare, move, and integrate data from multiple sources. You can visually create, run, and monitor ETL pipelines to load data into your data lakes.

Visualization

Visualization Management Data Integration Testing

How Knowledge Graphs Power Data Mesh and Data Fabric

Ontotext

APRIL 10, 2024

Bad data tax is rampant in most organizations. Currently, every organization is blindly chasing the GenAI race, often forgetting that data quality and semantics is one of the fundamentals to achieving AI success. Sadly, data quality is losing to data quantity, resulting in “ Infobesity ”. “Any

Metadata

Metadata Data Lake Data Warehouse Data Quality

Data Mesh vs. Data Fabric: A Love Story

Alation

JANUARY 13, 2022

Thoughtworks says data mesh is key to moving beyond a monolithic data lake. Spoiler alert: data fabric and data mesh are independent design concepts that are, in fact, quite complementary. Thoughtworks says data mesh is key to moving beyond a monolithic data lake 2. Gartner on Data Fabric.

Data Lake

Data Lake Metadata Data-driven Data Governance

Modeling, Modernization and Automation

BI-Survey

APRIL 27, 2023

While most continue to struggle with data quality issues and cumbersome manual processes, best-in-class companies are making improvements with commercial automation tools. The data vault has strong adherents among best-in-class companies, even though its usage lags the alternative approaches of third-normal-form and star schema.

Modeling

Modeling Data Warehouse Data Quality Business Driver

An AI Chat Bot Wrote This Blog Post …

DataKitchen

DECEMBER 9, 2022

Observability in DataOps refers to the ability to monitor and understand the performance and behavior of data-related systems and processes, and to use that information to improve the quality and speed of data-driven decision making. By using DataOps, organizations can improve. Query> When do DataOps?

Machine Learning

Machine Learning Data-driven Optimization Data Analytics

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

erwin

AUGUST 31, 2023

Improved Decision Making : Well-modeled data provides insights that drive informed decision-making across various business domains, resulting in enhanced strategic planning. Reduced Data Redundancy : By eliminating data duplication, it optimizes storage and enhances data quality, reducing errors and discrepancies.

Data-driven

Data-driven Modeling Enterprise Structured Data

Building Robust Data Pipelines: 9 Fundamentals and Best Practices to Follow

Alation

MAY 16, 2023

Data Pipeline Use Cases Here are just a few examples of the goals you can achieve with a robust data pipeline: Data Prep for Visualization Data pipelines can facilitate easier data visualization by gathering and transforming the necessary data into a usable state.

Data Lake

Data Lake Data Governance Data Warehouse Data Processing

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

FEBRUARY 1, 2023

With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.

Optimization

Optimization Forecasting Data Lake Metadata

Data democratization: How data architecture can drive business decisions and AI initiatives

IBM Big Data Hub

AUGUST 4, 2023

By leveraging data services and APIs, a data fabric can also pull together data from legacy systems, data lakes, data warehouses and SQL databases, providing a holistic view into business performance. Then, it applies these insights to automate and orchestrate the data lifecycle.

Data Architecture

Data Architecture Data Lake Machine Learning Data Governance

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

AWS Big Data

MARCH 3, 2023

Additionally, the scale is significant because the multi-tenant data sources provide a continuous stream of testing activity, and our users require quick data refreshes as well as historical context for up to a decade due to compliance and regulatory demands. Finally, data integrity is of paramount importance.

Software

Software Data Lake Testing Dashboards

Self-Serve Data Prep CAN Be Easy AND Sophisticated!

Smarten

AUGUST 4, 2023

If your team has easy-to-use tools and features, you are much more likely to experience the user adoption you want and to improve data literacy and data democratization across the organization. Sophisticated Functionality – Don’t sacrifice functionality to get ease-of-use.

Data Lake

Data Lake Machine Learning Data Integration Optimization

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance. It uses metadata and data management tools to organize all data assets within your organization. Ensuring data quality is made easier as a result.

Metadata

Metadata Data Quality Data-driven Data Governance

Usability and Connecting Threads: How Data Fabric Makes Sense Out of Disparate Data

Ontotext

AUGUST 4, 2023

As a design concept, data fabric requires a combination of existing and emergent data management technologies beyond just metadata. Data fabric does not replace data warehouses, data lakes, or data lakehouses. This applies policies based on consumer profiles to automate policy enforcements.

Metadata

Metadata Data-driven Data Architecture Data Quality

Build Write-Audit-Publish pattern with Apache Iceberg branching and AWS Glue Data Quality

AWS Big Data

DECEMBER 9, 2024

Equally crucial is the ability to segregate and audit problematic data, not just for maintaining data integrity, but also for regulatory compliance, error analysis, and potential data recovery. One of its key features is the ability to manage data using branches.

Data Quality

Data Quality Publishing Snapshot Data Lake

How Data Governance Supports Analytics

Alation

JANUARY 27, 2022

Creating a single view of any data, however, requires the integration of data from disparate sources. Data integration is valuable for businesses of all sizes due to the many benefits of analyzing data from different sources. But data integration is not trivial. Establishes Trust in Data.

Data Governance

Data Governance Analytics Cost-Benefit Data-driven

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

Webinars

Trending Sources

Talend Data Fabric Simplifies Data Life Cycle Management

Webinars

Data’s dark secret: Why poor quality cripples AI and growth

Data Management on Display at Informatica World 2019

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

Get started with AWS Glue Data Quality dynamic rules for ETL pipelines

AWS Glue Data Quality is Generally Available

Fire Your Super-Smart Data Consultants with DataOps

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Getting started with AWS Glue Data Quality from the AWS Glue Data Catalog

How EUROGATE established a data mesh architecture using Amazon DataZone

Navigating the Chaos of Unruly Data: Solutions for Data Teams

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

Data architecture strategy for data quality

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Data governance in the age of generative AI

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

What is an Information Steward, and Why You Should Care

Accelerate analytics on Amazon OpenSearch Service with AWS Glue through its native connector

Constructing A Digital Transformation Strategy: Putting the Data in Digital Transformation

Straumann Group is transforming dentistry with data, AI

Create an end-to-end data strategy for Customer 360 on AWS

Your 5-Step Journey from Analytics to AI

How AWS helped Altron Group accelerate their vision for optimized customer engagement

Differentiate generative AI applications with your data using AWS analytics and managed databases

Augmented data management: Data fabric versus data mesh

Handle UPSERT data operations using open-source Delta Lake and AWS Glue

O’Reilly Releases First Chapters of a New Book about Logical Data Management

Migrate workloads from AWS Data Pipeline

How Knowledge Graphs Power Data Mesh and Data Fabric

Data Mesh vs. Data Fabric: A Love Story

Modeling, Modernization and Automation

An AI Chat Bot Wrote This Blog Post …

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

Building Robust Data Pipelines: 9 Fundamentals and Best Practices to Follow

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

Data democratization: How data architecture can drive business decisions and AI initiatives

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

Self-Serve Data Prep CAN Be Easy AND Sophisticated!

Five benefits of a data catalog

Usability and Connecting Threads: How Data Fabric Makes Sense Out of Disparate Data

Build Write-Audit-Publish pattern with Apache Iceberg branching and AWS Glue Data Quality

How Data Governance Supports Analytics

Stay Connected