While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to their flexibility, for common use cases such as replication and ingestion they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.
Uncomfortable truth incoming: most people in your organization don’t think about the quality of their data from intake to the production of insights. However, as a data team member, you know how important data integrity (and a whole host of other aspects of data management) is. What is data integrity?
This architecture is valuable for organizations dealing with large volumes of diverse data sources, where maintaining accuracy and accessibility at every stage is a priority. It sounds great, but how do you prove the data is correct at each layer? How do you ensure data quality in every layer?
As organizations continue to pursue increasingly time-sensitive use cases, including customer 360° views, supply-chain logistics, and healthcare monitoring, they need their supporting data infrastructures to be increasingly flexible, adaptable, and scalable.
This post describes how HPE Aruba automated their supply chain management pipeline and re-architected and deployed their data solution by adopting a modern data architecture on AWS. The new solution has helped Aruba integrate data from multiple sources while optimizing cost, performance, and scalability.
In your Google Cloud project, you’ve enabled the following APIs: Google Analytics API, Google Analytics Admin API, Google Analytics Data API, Google Sheets API, and Google Drive API. For more information, refer to Amazon AppFlow support for Google Sheets. Refer to the Amazon Redshift Database Developer Guide for more details.
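Once the connector profile and flow are configured, a flow run can be triggered programmatically. This is a minimal sketch, assuming a hypothetical AppFlow flow named "google-sheets-to-s3" has already been created with Google Sheets as the source:

```python
import boto3

# Trigger an on-demand Amazon AppFlow flow run.
# "google-sheets-to-s3" is a hypothetical flow assumed to exist already,
# with its source connector, destination, and field mappings configured.
appflow = boto3.client("appflow")
response = appflow.start_flow(flowName="google-sheets-to-s3")
print("Started flow run:", response.get("executionId"))
```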
When we talk about data integrity, we’re referring to the overarching completeness, accuracy, consistency, accessibility, and security of an organization’s data. Together, these factors determine the reliability of the organization’s data. In short, yes.
SageMaker still includes all the existing ML and AI capabilities you’ve come to know and love for data wrangling, human-in-the-loop data labeling with Amazon SageMaker Ground Truth, experiments, MLOps, Amazon SageMaker HyperPod managed distributed training, and more.
The Business Application Research Center (BARC) warns that data governance is a highly complex, ongoing program, not a “big bang initiative,” and it runs the risk of participants losing trust and interest over time. The program must introduce and support standardization of enterprise data.
In turn, they both must also have the data literacy skills to verify the data’s accuracy, ensure its security, and provide or follow guidance on when and how it should be used. Data democratization uses a fit-for-purpose data architecture that is designed for the way today’s businesses operate: in real time.
Operations data: data generated from a set of operations such as orders, online transactions, competitor analytics, sales data, point-of-sale data, pricing data, and so on. The enormous growth of structured, unstructured, and semi-structured data, such as videos and pictures, is referred to as big data.
They understand that a one-size-fits-all approach no longer works, and they recognize the value of adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture and accelerate the delivery of new solutions. Andries has over 20 years of experience in the field of data and analytics.
To upgrade your existing Athena engine to version 3 in your Athena workgroup, follow the instructions in Upgrade to Athena engine version 3 to increase query performance and access more analytics features or refer to Changing the engine version in the Athena console. For more details on Iceberg format versions, refer to Format Versioning.
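For teams that prefer scripting the upgrade over the console, here is a minimal sketch using the Athena API; the workgroup name "my-workgroup" is a hypothetical placeholder:

```python
import boto3

# Switch an existing Athena workgroup to engine version 3 via the API
# instead of the console. Replace "my-workgroup" with your workgroup name.
athena = boto3.client("athena")
athena.update_work_group(
    WorkGroup="my-workgroup",
    ConfigurationUpdates={
        "EngineVersion": {"SelectedEngineVersion": "Athena engine version 3"}
    },
)
```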
AWS Glue: A data integration service, AWS Glue consolidates major data integration capabilities into a single service. These include data discovery, modern ETL, cleansing, transforming, and centralized cataloging. It’s also serverless, which means there’s no infrastructure to manage.
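As a sketch of the data discovery and cataloging piece, the snippet below creates and starts a Glue crawler over an S3 prefix; the bucket, IAM role, crawler, and database names are hypothetical placeholders:

```python
import boto3

# Create a Glue crawler that catalogs raw files in S3, then start it.
# All names and ARNs below are illustrative placeholders.
glue = boto3.client("glue")
glue.create_crawler(
    Name="sales-raw-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="sales_raw",
    Targets={"S3Targets": [{"Path": "s3://example-bucket/raw/sales/"}]},
)
glue.start_crawler(Name="sales-raw-crawler")
```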
With this functionality, you’re empowered to focus on extracting valuable insights from your data while AWS Glue handles the infrastructure heavy lifting using a serverless compute model. To get started today, refer to Developing AWS Glue jobs with Notebooks and Interactive sessions. Zach Mitchell is a Sr. Big Data Architect.
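A minimal sketch of what a first cell in a Glue interactive sessions notebook might look like; in a real notebook the session magics appear uncommented at the top of the cell, and the S3 path is a hypothetical placeholder:

```python
# Session magics (shown as comments here; uncommented in an actual notebook cell):
# %glue_version 4.0
# %worker_type G.1X
# %number_of_workers 2
# %idle_timeout 30
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Explore a raw dataset interactively before writing the full job.
df = spark.read.parquet("s3://example-bucket/raw/orders/")
df.printSchema()
df.show(5)
```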
It’s even harder when your organization is dealing with silos that impede data access across different data stores. Seamless data integration is a key requirement in a modern data architecture to break down data silos. For more details, refer to the Spark Release 3.3.0 notes.
Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure. Various data stores are supported in AWS Glue; for example, AWS Glue 4.0
In this blog, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a data integration and democratization fabric. Components of a Data Mesh. How CDF enables successful Data Mesh Architectures.
This blog post presents an architecture solution that allows customers to extract key insights from Amazon S3 access logs at scale. We will partition and format the server access logs with Amazon Web Services (AWS) Glue, a serverless data integration service, to generate a catalog for the access logs and create dashboards for insights.
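As a rough, simplified sketch of the partitioning step only (not the post's full solution): read the raw access-log lines, pull out a couple of fields with regexes, and write Parquet back to S3 partitioned by date. Paths, column names, and the regexes are illustrative; real S3 server access log lines carry many more fields.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_extract

spark = SparkSession.builder.appName("access-log-partitioning").getOrCreate()

# Raw, unpartitioned access-log lines delivered by S3 server access logging.
logs = spark.read.text("s3://example-logging-bucket/access-logs/")

parsed = (
    logs
    # HTTP verb from the quoted Request-URI field, e.g. "GET /bucket/key HTTP/1.1".
    .withColumn("http_method", regexp_extract("value", r'"(\w+) ', 1))
    # Date portion of the bracketed timestamp, e.g. [06/Feb/2019:00:00:38 +0000].
    .withColumn("dt", regexp_extract("value", r"\[(\d{2}/\w{3}/\d{4})", 1))
)

# Write date-partitioned Parquet that a Glue crawler or table can sit on top of.
parsed.write.partitionBy("dt").parquet("s3://example-analytics-bucket/access-logs-parquet/")
```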
Vyaire developed a custom data integration platform, iDataHub, powered by AWS services such as AWS Glue, AWS Lambda, and Amazon API Gateway. In this post, we share how we extracted data from SAP ERP using AWS Glue and the SAP SDK. For more information, refer to Download and Installation of NW RFC SDK.
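A hypothetical sketch of what an extraction call through the NW RFC SDK's Python bindings (pyrfc) can look like, using the standard RFC_READ_TABLE function module; host, credentials, and the table name (VBAK, sales order headers) are placeholders and not taken from the post:

```python
from pyrfc import Connection  # Python bindings over the SAP NW RFC SDK

# Connection parameters are illustrative placeholders.
conn = Connection(ashost="sap.example.com", sysnr="00",
                  client="100", user="EXTRACT_USER", passwd="secret")

# Read up to 100 rows from a table via the generic RFC_READ_TABLE module.
result = conn.call("RFC_READ_TABLE", QUERY_TABLE="VBAK",
                   DELIMITER="|", ROWCOUNT=100)
rows = [line["WA"].split("|") for line in result["DATA"]]
print(f"Fetched {len(rows)} rows")
conn.close()
```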
In fact, we recently announced the integration with our cloud ecosystem, bringing the benefits of Iceberg to enterprises as they make their journey to the public cloud and as they adopt more converged architectures like the Lakehouse. 1: Multi-function analytics. The *Any*-house.
As Gameskraft’s portfolio of gaming products increased, it led to an approximately fivefold growth of its dedicated data analytics and data science teams. Consequently, there was a fivefold rise in data integrations and a fivefold increase in ad hoc queries submitted to the Redshift cluster.
Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads. Discover how you can use Amazon Redshift to build a data mesh architecture to analyze your data.
This solution is suitable for customers who don’t require real-time ingestion into OpenSearch Service and plan to use data integration tools that run on a schedule or are triggered through events. Before data records land on Amazon S3, we implement an ingestion layer to bring all data streams reliably and securely to the data lake.
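As a minimal sketch of such a scheduled batch loader: read newline-delimited JSON that has landed in S3 and bulk-index it into OpenSearch Service with opensearch-py. The endpoint, bucket, key, index name, and credentials handling are placeholders; a production job would use signed requests or a secrets store.

```python
import json
import boto3
from opensearchpy import OpenSearch, helpers

# Fetch a batch file that previously landed in the data lake.
s3 = boto3.client("s3")
body = s3.get_object(Bucket="example-bucket", Key="landing/events.json")["Body"].read()

# Connect to the OpenSearch Service domain (placeholder endpoint and auth).
client = OpenSearch(
    hosts=[{"host": "search-example-domain.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("admin", "secret"),
    use_ssl=True,
)

# Bulk-index one document per JSON line.
actions = (
    {"_index": "events", "_source": json.loads(line)}
    for line in body.decode("utf-8").splitlines() if line
)
helpers.bulk(client, actions)
```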
Whether you refer to the use of semantic technology as Linked Data technology or smart data management technology, these concepts boil down to connectivity: connecting data from different sources and assigning this data additional machine-readable meaning.
For detailed information on managing your Apache Hive metastore using Lake Formation permissions, refer to Query your Apache Hive metastore with AWS Lake Formation permissions. In this post, we present a methodology for deploying a data mesh consisting of multiple Hive data warehouses across EMR clusters.
Satori accelerates implementing data security controls on data warehouses like Amazon Redshift, is straightforward to integrate, and doesn’t require any changes to your Amazon Redshift data, schema, or how your users interact with data. Leave the rest of the tabs with their default settings and choose Save.
And each of these gains requires data integration across business lines and divisions. Limiting growth by (data integration) complexity: most operational IT systems in an enterprise have been developed to serve a single business function, and they use the simplest possible model for this. We call this the Bad Data Tax.
For more information about performance improvement capabilities, refer to the list of announcements below. Neeraja is a seasoned Product Management and GTM leader, bringing over 20 years of experience in product vision, strategy and leadership roles in data products and platforms.
In the current industry landscape, data lakes have become a cornerstone of modern data architecture, serving as repositories for vast amounts of structured and unstructured data.
Data ingestion: You have to build ingestion pipelines based on factors like the types of data sources (on-premises data stores, files, SaaS applications, third-party data) and the flow of data (unbounded streams or batch data). Data exploration: Data exploration helps unearth inconsistencies, outliers, or errors.
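A hypothetical sketch of the two flow styles, assuming placeholder stream and bucket names: an unbounded stream of events pushed to Kinesis record by record, versus a batch file landed in S3 for scheduled processing.

```python
import json
import boto3

# Unbounded stream: push one event at a time to a Kinesis data stream.
kinesis = boto3.client("kinesis")
kinesis.put_record(
    StreamName="clickstream-events",
    Data=json.dumps({"user_id": "u-123", "action": "page_view"}).encode("utf-8"),
    PartitionKey="u-123",
)

# Batch data: drop a daily extract into a landing prefix for later processing.
s3 = boto3.client("s3")
s3.upload_file("daily_orders.csv", "example-landing-bucket", "batch/orders/daily_orders.csv")
```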
Knowledge graphs and semantic metadata: Knowledge graphs (KGs) are the key to advanced data architecture and models like data fabric and data mesh, unified data access, and semantic data integration. These fundamental capabilities of KGs enable them to bridge the chasm between information and knowledge in the DIKW pyramid.
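A minimal rdflib sketch of the connectivity idea: facts about the same entity, sourced from two different systems, are linked in one graph and queried together with SPARQL. The namespace and the data are made up for illustration.

```python
from rdflib import Graph, Namespace, Literal, URIRef

EX = Namespace("http://example.com/ns#")
g = Graph()
customer = URIRef("http://example.com/customer/42")

# Two facts about the same customer, imagined as coming from different sources.
g.add((customer, EX.nameInCRM, Literal("Acme Corp.")))   # from the CRM
g.add((customer, EX.totalOrders, Literal(17)))           # from the order system

# One query sees both, because they hang off the same node.
results = g.query("""
    PREFIX ex: <http://example.com/ns#>
    SELECT ?name ?orders WHERE {
        ?c ex:nameInCRM ?name ;
           ex:totalOrders ?orders .
    }
""")
for name, orders in results:
    print(name, orders)
```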
However, what we usually don’t talk about when generating an asset is the huge invisible or unplanned cost that occurs at a later stage, when the data needs to be made available for analysis or secondary use. As a result, a big portion of the IT capacity in Pharma is bound up in data integration.
This is done to gain better visibility into operations and to capture data points of interest for clients. Reasons may vary from business to business, but integration is the cornerstone of customer success. With cloud data integration, it gets easier to create reports across departments, and data storage will never be an issue.
In a modern data architecture, unified analytics enables you to access the data you need, whether it’s stored in a data lake or a data warehouse. One of the most common use cases for data preparation on Amazon Redshift is to ingest and transform data from different data stores into an Amazon Redshift data warehouse.
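A minimal sketch of that ingestion step: load staged Parquet files from S3 into Redshift with a COPY statement issued through the Redshift Data API. The cluster, database, user, table, bucket, and IAM role names are hypothetical placeholders.

```python
import boto3

# Issue a COPY into Redshift without managing a JDBC connection,
# using the Redshift Data API. All identifiers below are placeholders.
redshift_data = boto3.client("redshift-data")
redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    DbUser="loader",
    Sql="""
        COPY sales.orders
        FROM 's3://example-bucket/staging/orders/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS PARQUET;
    """,
)
```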
The data catalog is a foundational layer of the data fabric. (This zoomed-in version has references to corresponding vendor markets removed.) Using this diagram as our guide, this blog will deep-dive into each layer of the data fabric, starting with the data catalog. But what does integration look like in action?
For this, Cargotec built an Amazon Simple Storage Service (Amazon S3) data lake and cataloged the data assets in the AWS Glue Data Catalog. They chose AWS Glue as their preferred data integration tool due to its serverless nature, low maintenance, ability to control compute resources in advance, and ability to scale when needed.
Most D&A concerns and activities are handled within EA, in the information/data architecture domains and phases. Much as the analytics world shifted to augmented analytics, the same is happening in data management. Here is a suggested note: Use Gartner’s Reference Model to Deliver Intelligent Composable Business Applications.
The Bad Data Tax and the Data Bill of Rights: So far, our discussion has been pretty theoretical, so we need a compelling business justification for moving in this direction. In the race to become data-driven, most efforts have resulted in a tangled web of data integrations and reconciliations across a sea of data silos.
that gathers data from many sources. Data environment: First off, the solutions you consider should be compatible with your current data architecture. We have outlined the requirements that most providers ask for. Data sources, strategic objective: Use native connectivity optimized for the data source.
More companies have realized there is an opportunity to integrate, enhance, and present this SaaS data to improve internal operations and gain valuable insights on their data. From there, they can perform meaningful analytics, gain valuable insights, and optionally push enriched data back to external SaaS platforms.
We cover batch ingestion methods, share practical examples, and discuss best practices to help you build optimized and scalable data pipelines on AWS. Overview of solution: AWS Glue is a serverless data integration service that simplifies data preparation and integration tasks for analytics, machine learning, and application development.
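As a minimal sketch of a batch Glue job (not the post's full pipeline): read a cataloged source table as a DynamicFrame and write it to the lake as Parquet. The database, table, and path names are placeholders, and job bookmarks and transforms are omitted.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job boilerplate: resolve the job name and initialize the job.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table registered in the Glue Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_raw", table_name="orders"
)

# Write the batch out to a curated S3 prefix as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet",
)

job.commit()
```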
This often leaves business insights and opportunities lost among a tangled complexity of meaningless, siloed data and content. Knowledge graphs help overcome these challenges by unifying data access, providing flexible data integration, and automating data management.