The need for streamlined data transformations: as organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. Using Athena and the dbt adapter, you can transform raw data in Amazon S3 into well-structured tables suitable for analytics.
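A rough sketch of the transformation pattern described above, using the AWS SDK for pandas (awswrangler) to run an Athena CTAS statement directly rather than through dbt; the database, table, and bucket names are placeholders, not taken from the post.

import awswrangler as wr

# Materialize a curated Parquet table from raw data already registered
# in the Glue Data Catalog (all names below are illustrative).
wr.athena.start_query_execution(
    sql="""
        CREATE TABLE analytics_db.orders_curated
        WITH (format = 'PARQUET',
              external_location = 's3://example-bucket/curated/orders/') AS
        SELECT order_id, customer_id, CAST(amount AS double) AS amount
        FROM raw_db.orders_raw
    """,
    database="analytics_db",
    wait=True,
)

In a dbt-athena project, the SELECT above would live in a model file and dbt would generate the surrounding CTAS for you.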
Data engineers delivered over 100 lines of code and 1.5 data quality tests every day to support a cast of analysts and customers. They opted for Snowflake, a cloud-native data platform well suited to SQL-based analysis. It takes more than a data lake and a database.
As technology and business leaders, your strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Yet, despite growing investments in advanced analytics and AI, organizations continue to grapple with a persistent and often underestimated challenge: poor data quality.
With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in place with all Apache Iceberg-compatible tools and engines.
Data governance is the process of ensuring the integrity, availability, usability, and security of an organization’s data. Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.
AWS Glue Data Quality allows you to measure and monitor the quality of data in your data repositories. It’s important for business users to be able to see quality scores and metrics to make confident business decisions and debug data quality issues. An AWS Glue crawler crawls the results.
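For context, a minimal boto3 sketch of defining and running a Glue Data Quality ruleset; the DQDL rules, table, and IAM role below are assumptions for illustration, not taken from the post.

import boto3

glue = boto3.client("glue")

# Register a small DQDL ruleset against a cataloged table (names are placeholders).
glue.create_data_quality_ruleset(
    Name="orders_basic_checks",
    Ruleset='Rules = [ IsComplete "order_id", ColumnValues "amount" > 0 ]',
    TargetTable={"DatabaseName": "analytics_db", "TableName": "orders_curated"},
)

# Start an evaluation run; the resulting scores are what business users
# would review before trusting the data.
glue.start_data_quality_ruleset_evaluation_run(
    DataSource={"GlueTable": {"DatabaseName": "analytics_db", "TableName": "orders_curated"}},
    Role="arn:aws:iam::123456789012:role/GlueDataQualityRole",
    RulesetNames=["orders_basic_checks"],
)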
You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores. Hundreds of thousands of customers use data lakes for analytics and ML to make data-driven business decisions.
To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. What’s in a Data Lake? Data warehouses do a great job of standardizing data from disparate sources for analysis. Taking a Dip.
In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. Concurrent write conflicts are particularly common in large-scale data cleanup operations.
These formats, exemplified by Apache Iceberg, Apache Hudi, and Delta Lake, address persistent challenges in traditional data lake structures by offering an advanced combination of flexibility, performance, and governance capabilities. Tags help address this by allowing you to point to specific snapshots with arbitrary names.
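To make the tagging idea concrete, here is a hedged PySpark sketch assuming a Spark session already configured with the Iceberg runtime and SQL extensions and a catalog named glue_catalog; the table and tag names are invented for illustration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Give the current snapshot a stable name so it can be queried later,
# for example to reproduce an end-of-quarter report.
spark.sql("ALTER TABLE glue_catalog.analytics_db.orders CREATE TAG `q4_close` RETAIN 90 DAYS")

# Read the table exactly as it existed at that tagged snapshot.
spark.sql("SELECT COUNT(*) FROM glue_catalog.analytics_db.orders VERSION AS OF 'q4_close'").show()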
Domain ownership recognizes that the teams generating the data have the deepest understanding of it and are therefore best suited to manage, govern, and share it effectively. This principle makes sure data accountability remains close to the source, fostering higher data quality and relevance.
cycle_end"', "sagemakedatalakeenvironment_sub_db", ctas_approach=False) A similar approach is used to connect to shared data from Amazon Redshift, which is also shared using Amazon DataZone. The data science and AI teams are able to explore and use new data sources as they become available through Amazon DataZone.
In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.
As he thinks through the various journeys that data take in his company, Jason sees that his dashboard idea would require extracting or testing for events along the way. So, the only way for a data journey to truly observe what’s happening is to get his tools and pipelines to auto-report events. Data and tool tests.
It covers how to use a conceptual, logical architecture for some of the most popular gaming industry use cases like event analysis, in-game purchase recommendations, measuring player satisfaction, telemetry data analysis, and more. A data hub contains data at multiple levels of granularity and is often not integrated.
It also makes it easier for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization to discover, use, and collaborate to derive data-driven insights. Note that a managed data asset is an asset for which Amazon DataZone can manage permissions.
Data governance is increasingly top-of-mind for customers as they recognize data as one of their most important assets. Effective data governance enables better decision-making by improving data quality, reducing data management costs, and ensuring secure access to data for stakeholders.
Unlike many other events, which involve multiple racing teams and manufacturers, Porsche Carrera Cup Brasil provides and maintains all 75 cars used in the race. “If I don’t do predictive maintenance, if I have to do corrective maintenance at events, a lot of money is wasted.”
The DataOps pipeline you have built has enough automated tests to catch errors, and error events are tied to some form of real-time alerts. (Figure 2: Example data pipeline with DataOps automation.) In this project, I automated data extraction from SFTP, public websites, and email attachments.
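As a minimal sketch of the SFTP leg of that extraction (host, credentials, and paths are placeholders; a real pipeline would pull them from configuration or a secrets manager):

import paramiko

# Connect to the SFTP server and download the latest export so the
# downstream DataOps tests can validate it before loading.
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="pipeline_user", password="change-me")
sftp = paramiko.SFTPClient.from_transport(transport)
sftp.get("/outbound/daily_export.csv", "/tmp/daily_export.csv")
sftp.close()
transport.close()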
In Foundry’s 2022 Data & Analytics Study, 88% of IT decision-makers agree that data collection and analysis have the potential to fundamentally change their business models over the next three years. The ability to pivot quickly to address rapidly changing customer or market demands is driving the need for real-time data.
Data: the foundation of your foundation model. Data quality matters. An AI model trained on biased or toxic data will naturally tend to produce biased or toxic outputs. When objectionable data is identified, we remove it, retrain the model, and repeat. Data curation is a task that’s never truly finished.
A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of a cross-functional governance structure for customer data. You need to process this to make it ready for analysis.
There were thousands of attendees at the event – lining up for book signings and meetings with recruiters to fill the endless job openings for developers experienced with MapReduce and managing Big Data. This was the gold rush of the 21st century, except the gold was data.
July brings summer vacations, holiday gatherings, and for the first time in two years, the return of the Massachusetts Institute of Technology (MIT) Chief Data Officer symposium as an in-person event. A key area of focus for the symposium this year was the design and deployment of modern data platforms.
As part of their cloud modernization initiative, they sought to migrate and modernize their legacy data platform. This process has been scheduled to run daily, ensuring a consistent batch of fresh data for analysis. AWS Glue – AWS Glue is used to load files into Amazon Redshift through the S3 data lake.
These models allow us to predict failures early, and we forecast a 20% reduction in furnace unplanned events, improving repair times by at least two days. We’ve built digital twins for several furnaces we operate across the globe, and we currently have 70 AI models running on those furnaces. So AI helps us have fewer emergencies.
With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.
Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x
Bad data tax is rampant in most organizations. Currently, every organization is blindly chasing the GenAI race, often forgetting that data quality and semantics are fundamental to achieving AI success. Sadly, data quality is losing to data quantity, resulting in “Infobesity.”
Additionally, the scale is significant because the multi-tenant data sources provide a continuous stream of testing activity, and our users require quick data refreshes as well as historical context for up to a decade due to compliance and regulatory demands. Finally, data integrity is of paramount importance.
“Breaches are resumé-generating events.” Dan Kirsch, analyst at Hurwitz Associates, agrees that CISOs must take responsibility: “Data protection is absolutely part of the CISO’s job.” These are essential to enabling a more rapid process of sensitive data discovery. It seems that way these days.
Unless, of course, the rest of their data also resides in the Google Cloud. In this post we showcase how we used AWS Glue to move siloed digital analytics data, with inconsistent arrival times, to Amazon S3 (our data lake) and our central data warehouse (DWH), Snowflake. It consists of full-day and intraday tables.
This data can come from a diverse range of sources, including Internet of Things (IoT) devices, user applications, and logging and telemetry information from applications, to name a few. By harnessing the power of streaming data, organizations are able to stay ahead of real-time events and make quick, informed decisions.
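A minimal example of getting such events onto a stream, assuming Amazon Kinesis Data Streams and boto3; the stream name and payload shape are invented for illustration.

import json
import boto3

kinesis = boto3.client("kinesis")

# A hypothetical IoT telemetry event; real producers would batch with put_records.
event = {"device_id": "sensor-42", "temperature_c": 21.7, "ts": "2024-01-31T12:00:00Z"}

kinesis.put_record(
    StreamName="telemetry-events",
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["device_id"],
)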
Observability in DataOps refers to the ability to monitor and understand the performance and behavior of data-related systems and processes, and to use that information to improve the quality and speed of data-driven decision-making. By using DataOps, organizations can improve.
In that sense, data modernization is synonymous with cloud migration. Modern data architectures, like cloud data warehouses and cloud data lakes, empower more people to leverage analytics for insights more efficiently. Efficient Data Processing. So what’s the appeal of this new infrastructure?
However, often the biggest stumbling block is a human one: getting people to buy into the idea that the care and attention they pay to data capture will pay dividends later in the process. These and other areas are covered in greater detail in an older article, Using BI to drive improvements in data quality.
Organizations use their data to extract valuable insights and drive informed business decisions. Amazon Redshift delivers up to five times better price performance than other cloud data warehouses out of the box and helps you keep costs predictable.
Solution overview: one of the common functionalities in data pipelines is extracting data from multiple data sources and exporting it to a data lake or synchronizing the data to another database. Run the workflow with the default input.
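The excerpt does not say which orchestrator runs the workflow; if it were an AWS Step Functions state machine, starting it with its default (empty) input might look like this sketch, with a placeholder ARN.

import boto3

sfn = boto3.client("stepfunctions")

# Kick off the extraction/sync workflow with its default input.
sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:data-sync-workflow",
    input="{}",
)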
Its distributed architecture empowers organizations to query massive datasets across databases, data lakes, and cloud platforms with speed and reliability. Optimizing connections to your data sources is equally important, as it directly impacts the speed and efficiency of data access.
The key components of a data pipeline are typically: Data Sources: The origin of the data, such as a relational database, data warehouse, data lake, file, API, or other data store. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.
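A toy pandas sketch of those processing tasks on an ingested file (the file and column names are invented): cleansing, filtering, standardization, and aggregation.

import pandas as pd

raw = pd.read_csv("orders.csv")                      # ingestion (placeholder source)

clean = raw.dropna(subset=["order_id", "amount"])    # cleansing: drop incomplete rows
clean = clean[clean["amount"] > 0]                   # filtering: keep valid amounts
clean["country"] = clean["country"].str.upper()      # standardization
daily_totals = clean.groupby("order_date", as_index=False)["amount"].sum()  # aggregation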
The quick and dirty definition of data mapping is the process of connecting different types of data from various data sources. Data mapping is a crucial step in data modeling and can help organizations achieve their business goals by enabling data integration, migration, transformation, and quality.
Data lakes were originally designed to store large volumes of raw, unstructured, or semi-structured data at a low cost, primarily serving big data and analytics use cases. Enabling automatic compaction on Iceberg tables reduces metadata overhead on your Iceberg tables and improves query performance.
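AWS exposes automatic compaction as a managed table optimizer, but the same effect can be approximated manually; a hedged PySpark sketch using Iceberg's rewrite_data_files procedure (catalog and table names are placeholders) is shown below.

from pyspark.sql import SparkSession

# Assumes an Iceberg-enabled Spark session and a catalog named glue_catalog.
spark = SparkSession.builder.getOrCreate()

# Rewrite many small data files into fewer, larger ones (~512 MB targets).
spark.sql("""
    CALL glue_catalog.system.rewrite_data_files(
        table => 'analytics_db.orders',
        options => map('target-file-size-bytes', '536870912')
    )
""")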
For example, AI can perform real-time data quality checks, flagging inconsistencies or missing values, while intelligent query optimization can boost database performance. Real-time data integration at scale: real-time data integration is crucial for businesses like e-commerce and finance, where speed is critical.
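As a rule-based stand-in for the kind of check described (an AI-driven variant would learn the rules instead of hard-coding them), with placeholder file and column names:

import pandas as pd

batch = pd.read_parquet("incoming_batch.parquet")   # placeholder incoming batch

issues = {
    "missing_order_id": int(batch["order_id"].isna().sum()),
    "negative_amount": int((batch["amount"] < 0).sum()),
}

if any(count > 0 for count in issues.values()):
    # A real pipeline would raise an alert or quarantine the batch here.
    print(f"Data quality issues detected: {issues}")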
The mega-vendor era: by 2020, the basis of competition for what are now referred to as mega-vendors was interoperability, automation, intra-ecosystem participation, and unlocking access to data to drive business capabilities and value and to manage risk. Learning systems pivot and adapt based on events and new training data.