With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. SageMaker Lakehouse gives you the flexibility to access and query your data in place with all Apache Iceberg-compatible tools and engines.
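To make the in-place querying concrete, here is a minimal PySpark sketch that registers an Iceberg catalog backed by the AWS Glue Data Catalog and runs SQL against a table where it sits in Amazon S3. It assumes the Iceberg Spark runtime and AWS bundle are on the classpath; the catalog, bucket, and table names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-in-place-query")
    # Register an Iceberg catalog backed by the AWS Glue Data Catalog.
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.lakehouse.warehouse", "s3://example-bucket/warehouse/")
    .getOrCreate()
)

# Query the table where it lives; no copies or loads are involved.
spark.sql("""
    SELECT region, SUM(amount) AS total_sales
    FROM lakehouse.sales.orders
    GROUP BY region
""").show()
```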
Amazon Web Services (AWS) has been recognized as a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools. This recognition, we feel, reflects our ongoing commitment to innovation and excellence in data integration, demonstrating our continued progress in providing comprehensive data management solutions.
Unlocking the true value of data often gets impeded by siloed information. Traditional data management—wherein each business unit ingests raw data in separate data lakes or warehouses—hinders visibility and cross-functional analysis. Amazon DataZone natively supports data sharing for Amazon Redshift data assets.
Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.
Plug-and-play integration: A seamless, plug-and-play integration between data producers and consumers should facilitate rapid use of new data sets and enable quick proofs of concept, such as in data science teams. As part of the required data, CHE data is shared using Amazon DataZone.
Amazon SageMaker Lakehouse, now generally available, unifies all your data across Amazon Simple Storage Service (Amazon S3) data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data. Having confidence in your data is key.
Use cases for Hive metastore federation for Amazon EMR: Hive metastore federation for Amazon EMR is applicable to the following use cases: Governance of Amazon EMR-based data lakes – Producers generate data within their AWS accounts using an Amazon EMR-based data lake supported by EMRFS on Amazon Simple Storage Service (Amazon S3) and HBase.
However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to addressing these challenges and fostering a data-driven culture.
Comparison of modern data architectures (architecture, definition, strengths, weaknesses, best used when): Data warehouse: a centralized, structured, and curated data repository; its weaknesses are an inflexible schema and a poor fit for unstructured or real-time data. Data lake: raw storage for all types of structured and unstructured data.
Data analytics on operational data at near-real time is becoming a common need. Due to the exponential growth of data volume, it has become common practice to replace read replicas with data lakes to have better scalability and performance. Apache Hudi connector for AWS Glue: For this post, we use AWS Glue 4.0.
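As a hedged sketch of that pattern, the snippet below upserts change records into an Apache Hudi table from an AWS Glue 4.0 (Spark) job; it assumes the job is started with Hudi support enabled, and the bucket, table, and key field names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-upsert").getOrCreate()

# CDC-style change records; in practice these would come from a stream
# or a read replica rather than an inline list.
changes_df = spark.createDataFrame(
    [(1, "shipped", "2024-01-02T10:00:00")],
    ["order_id", "status", "updated_at"],
)

hudi_options = {
    "hoodie.table.name": "orders_cdc",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",  # latest record wins
    "hoodie.datasource.write.operation": "upsert",
}

(changes_df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://example-bucket/lake/orders_cdc/"))
```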
Third, some services require you to set up and manage compute resources used for federated connectivity, and capabilities like connection testing and data preview aren't available in all services. To solve these challenges, we launched Amazon SageMaker Lakehouse unified data connectivity. Choose Run all.
All this data arrives by the terabyte, and a data management platform can help marketers make sense of it all. Marketing-focused or not, DMPs excel at negotiating with a wide array of databases, data lakes, or data warehouses, ingesting their streams of data and then cleaning, sorting, and unifying the information therein.
SnapLogic published Eight Data Management Requirements for the Enterprise Data Lake, among them Storage and Data Formats and Ingest and Delivery. The company also recently hosted a webinar on Democratizing the Data Lake with Constellation Research and published two whitepapers from Mark Madsen.
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.
We have seen a strong customer demand to expand its scope to cloud-based data lakes because data lakes are increasingly the enterprise solution for large-scale data initiatives due to their power and capabilities. Let’s say that this company is located in Europe and the data product must comply with the GDPR.
The data fabric architectural approach can simplify data access in an organization and facilitate self-service data consumption at scale. Read: The first capability of a data fabric is a semantic knowledge data catalog, but what are the other five core capabilities of a data fabric? (11 May 2021)
You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. With these insights, teams have the visibility to make data integration pipelines more efficient. Select Publish new dashboard as, and enter GlueObservabilityDashboard. Choose Publish dashboard.
Our services consuming this data inherit the same resilience from Amazon MSK. If our backend ingestion services face disruptions, no event is lost, because Kafka retains all published messages. Amazon MSK enables us to tailor the data retention duration to our specific requirements, ranging from seconds to unlimited duration.
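As a sketch of what "seconds to unlimited" retention looks like in practice, the kafka-python admin client can set retention per topic. The broker address and topic name below are hypothetical, and an MSK cluster would typically also need TLS or IAM auth settings that are omitted here.

```python
from kafka.admin import ConfigResource, ConfigResourceType, KafkaAdminClient

admin = KafkaAdminClient(
    bootstrap_servers="b-1.example.kafka.us-east-1.amazonaws.com:9092"
)

# retention.ms = -1 keeps every published message indefinitely;
# a small value such as 60000 would keep only the last minute.
admin.alter_configs([
    ConfigResource(ConfigResourceType.TOPIC, "ingestion-events",
                   configs={"retention.ms": "-1"}),
])
```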
In this blog, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a data integration and democratization fabric.
Migrating workloads to AWS Glue: AWS Glue is a serverless data integration service that helps analytics users to discover, prepare, move, and integrate data from multiple sources. You can visually create, run, and monitor ETL pipelines to load data into your data lakes.
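A minimal sketch of such a Glue PySpark job follows, reading a cataloged source table and landing it in a data lake path as Parquet; the database, table, and bucket names are hypothetical.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table from the Glue Data Catalog...
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

# ...and land it in the data lake as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/lake/orders/"},
    format="parquet",
)

job.commit()
```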
Reduced Data Redundancy: By eliminating data duplication, it optimizes storage and enhances data quality, reducing errors and discrepancies. Efficient Development: Accurate data models expedite database development, leading to efficient data integration, migration, and application development.
It has been well documented since the State of DevOps 2019 report published its DORA metrics that with DevOps, companies can deploy software 208 times more often and 106 times faster, recover from incidents 2,604 times faster, and release 7 times fewer defects. Finally, data integrity is of paramount importance.
Since its launch in 2006, Amazon Simple Storage Service (Amazon S3) has experienced major growth, supporting multiple use cases such as hosting websites, creating data lakes, serving as object storage for consumer applications, storing logs, and archiving data. For Report path prefix, enter cur-data/account-cur-daily.
Phase 1 implementation of this regulation, which went into effect on July 1, 2022, requires that payors publish machine-readable files publicly for each plan that they offer. Athena provides a simplified, flexible way to analyze petabytes of data. The second is a processing step, which prepares and publishes data for analysis.
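A hedged sketch of that analysis step with boto3: submit an Athena query over the published files and keep the execution ID to poll for results. The database, table, column names, and output bucket are hypothetical.

```python
import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="""
        SELECT plan_name, billing_code, AVG(negotiated_rate) AS avg_rate
        FROM pricing_db.in_network_rates
        GROUP BY plan_name, billing_code
    """,
    # Athena writes result files here; poll get_query_execution to track status.
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
print(resp["QueryExecutionId"])
```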
AWS Glue is a serverless data integration service that makes it simple to discover, prepare, and combine data for analytics, machine learning (ML), and application development. Hundreds of thousands of customers use data lakes for analytics and ML to make data-driven business decisions. Choose Save ruleset.
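The "Save ruleset" step has an API equivalent. Below is a hedged sketch that defines a small AWS Glue Data Quality ruleset in DQDL and attaches it to a cataloged table; the rule thresholds and table names are hypothetical.

```python
import boto3

# Three illustrative DQDL rules: completeness, near-uniqueness, and a value bound.
ruleset = """
Rules = [
    IsComplete "order_id",
    Uniqueness "order_id" > 0.99,
    ColumnValues "amount" >= 0
]
"""

glue = boto3.client("glue")
glue.create_data_quality_ruleset(
    Name="orders-quality-checks",
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "sales_db", "TableName": "raw_orders"},
)
```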
At Stitch Fix, we have been powered by data science since its foundation and rely on many modern data lake and data processing technologies. In our infrastructure, Apache Kafka has emerged as a powerful tool for managing event streams and facilitating real-time data processing.
Satori accelerates implementing data security controls on data warehouses like Amazon Redshift, is straightforward to integrate, and doesn’t require any changes to your Amazon Redshift data, schema, or how your users interact with data. Lisa Levy is a Content Specialist at Satori.
This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization. For this, Cargotec built an Amazon Simple Storage Service (Amazon S3) data lake and cataloged the data assets in AWS Glue Data Catalog.
Lakehouse (data warehouse and data lake working together): 8. Data Literacy, training, coordination, collaboration: 8. Data Management Infrastructure/Data Fabric: 5. Data Integration tactics: 4. Business Innovation with D&A: 6. Figure 3: The Data and Analytics (infrastructure) Continuum.
Given the importance of data in the world today, organizations face the dual challenges of managing large-scale, continuously incoming data while vetting its quality and reliability. AWS Glue is a serverless data integration service that you can use to effectively monitor and manage data quality through AWS Glue Data Quality.
Acting as a bridge between producer and consumer apps, it enforces the schema, reduces the data footprint in transit, and safeguards against malformed data. AWS Glue is an ideal solution for running stream consumer applications, discovering, extracting, transforming, loading, and integrating data from multiple sources.
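Schema enforcement of this kind starts with registering a schema contract. Here is a minimal boto3 sketch against the AWS Glue Schema Registry, with a hypothetical registry name and Avro schema:

```python
import json

import boto3

# The contract both producers and consumers are held to.
page_view_schema = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "url", "type": "string"},
        {"name": "viewed_at", "type": "long"},
    ],
}

glue = boto3.client("glue")
glue.create_schema(
    RegistryId={"RegistryName": "clickstream"},
    SchemaName="page-view",
    DataFormat="AVRO",
    Compatibility="BACKWARD",  # older consumers keep working as the schema evolves
    SchemaDefinition=json.dumps(page_view_schema),
)
```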
We show how to perform extract, load, and transform (ELT), an integration process focused on getting the raw data from a data lake into a staging layer to perform the modeling. Data can either be loaded when there is a new sale, or daily; this is where the inserted date or load date comes in handy.
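A hedged sketch of that staging load, stamping each row with a load date as it arrives, executed here through the Redshift Data API against a hypothetical serverless workgroup and tables:

```python
import boto3

sql = """
INSERT INTO staging.orders (order_id, customer_id, amount, load_date)
SELECT order_id, customer_id, amount, GETDATE()  -- load date stamped at insert time
FROM raw.orders_landing;
"""

client = boto3.client("redshift-data")
client.execute_statement(
    WorkgroupName="example-workgroup",  # Redshift Serverless workgroup
    Database="dev",
    Sql=sql,
)
```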
The longer answer is that in the context of machine learning use cases, strong assumptions about data integrity lead to brittle solutions overall. Data coming from machines tends to land (aka, data at rest) in durable stores such as Amazon S3, then gets consumed by Hadoop, Spark, etc.
Its distributed architecture empowers organizations to query massive datasets across databases, data lakes, and cloud platforms with speed and reliability. Optimizing connections to your data sources is equally important, as it directly impacts the speed and efficiency of data access.
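The excerpt does not name the engine, but assuming a Trino-style distributed engine, a single federated query can span a data lake catalog and an operational database. A sketch with the trino Python client follows; the host, catalogs, and table names are hypothetical.

```python
import trino

conn = trino.dbapi.connect(
    host="trino.example.com", port=8080, user="analyst",
    catalog="hive", schema="lake",
)
cur = conn.cursor()

# One query joins a data lake table with an operational database table;
# both catalogs must be configured on the coordinator.
cur.execute("""
    SELECT o.region, COUNT(*) AS order_count
    FROM hive.lake.orders o
    JOIN postgresql.public.customers c ON o.customer_id = c.id
    GROUP BY o.region
""")
print(cur.fetchall())
```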
The key components of a data pipeline are typically: Data Sources: the origin of the data, such as a relational database, data warehouse, data lake, file, API, or other data store. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.
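A toy sketch wiring those stages together, so the moving parts are concrete; the CSV source file and column names are hypothetical.

```python
import csv
from collections import defaultdict

def ingest(path):
    """Source: stream rows from a CSV file (stand-in for a database, API, etc.)."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def cleanse(rows):
    """Processing: drop incomplete records and standardize types."""
    for row in rows:
        if row["amount"]:
            row["amount"] = float(row["amount"])
            yield row

def aggregate(rows):
    """Processing: roll amounts up by region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

# Destination: printed here; in practice, written to a warehouse or lake.
print(aggregate(cleanse(ingest("sales.csv"))))
```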
Data mapping is essential for integration, migration, and transformation of different data sets; it allows you to improve your data quality by preventing duplications and redundancies in your data fields. Data mapping helps standardize, visualize, and understand data across different systems and applications.
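A minimal sketch of such a mapping as a declarative dictionary, with a guard against two source fields colliding on one target (a duplication bug); all field names are hypothetical.

```python
# Source field -> canonical target field.
FIELD_MAP = {
    "cust_nm":  "customer_name",
    "cust_eml": "email",
    "crtd_dt":  "created_at",
}

# Guard against two source fields mapped onto the same target field.
assert len(set(FIELD_MAP.values())) == len(FIELD_MAP), "duplicate target field"

def map_record(source: dict) -> dict:
    """Rename whichever mapped fields are present in the source record."""
    return {target: source[src] for src, target in FIELD_MAP.items() if src in source}

print(map_record({"cust_nm": "Ada", "cust_eml": "ada@example.com"}))
```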
Different Approaches to Migration: When migrating to the cloud, there are a variety of approaches you can take to maintain your data strategy. Those options include: a data lake, or Azure Data Lake Services (ADLS), Microsoft’s new data solution, which provides unstructured data analytics through AI.
What are the best practices for analyzing cloud ERP data? Data Management: How do we create a data warehouse or data lake in the cloud using our cloud ERP? How do I access the legacy data from my previous ERP? How can we rapidly build BI reports on cloud ERP data without any help from IT?
With the release of Jet Reports 22.1, Jet Reports now offers high-performance connectivity with options to connect to Synapse/Azure Data Lakes, BYOD, SQL, or your Cubes and Tabular models, so D365 F&SCM customers can enjoy all the reporting benefits that the GP, NAV, and BC community already have.
Indeed, the transition is not merely a trend but a reality rooted in the need for enhanced flexibility, scalability, and data integration capabilities not sufficiently provided by SAP BPC. This includes databases like Microsoft SQL Server, IBM DB2, etc., and data lakes & warehouses like Cloudera, Google BigQuery, etc.