This architecture is valuable for organizations dealing with large volumes of diverse data sources, where maintaining accuracy and accessibility at every stage is a priority. It sounds great, but how do you prove the data is correct at each layer? How do you ensure data quality in every layer?
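To make the question concrete, here is a minimal sketch of what per-layer validation could look like. It assumes pandas DataFrames standing in for successive layer tables (the "bronze"/"silver" names and the thresholds are purely illustrative, not from the original post):

```python
import pandas as pd

def check_layer(df: pd.DataFrame, layer: str, key: str) -> dict:
    """Run a few generic quality checks against one layer of the pipeline."""
    result = {
        "layer": layer,
        "row_count": len(df),                               # layer is non-empty
        "duplicate_keys": int(df[key].duplicated().sum()),  # key uniqueness
        "null_rate": float(df.isna().mean().mean()),        # overall completeness
    }
    result["passed"] = (
        result["row_count"] > 0
        and result["duplicate_keys"] == 0
        and result["null_rate"] < 0.05                      # illustrative threshold
    )
    return result

# Toy frames standing in for successive layers of the same entity
bronze = pd.DataFrame({"order_id": [1, 2, 2], "amount": [10.0, None, 5.0]})
silver = bronze.drop_duplicates("order_id").dropna()

for name, frame in {"bronze": bronze, "silver": silver}.items():
    print(check_layer(frame, name, key="order_id"))
```

Running the same small battery of checks at every layer is one simple way to show, rather than assert, that quality holds from ingestion through to the curated tables.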
Uncomfortable truth incoming: Most people in your organization don’t think about the quality of their data from intake to production of insights. However, as a data team member, you know how important data integrity (and a whole host of other aspects of data management) is. What is data integrity?
We also examine how centralized, hybrid and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise’s core has never been more significant.
“The challenge is that these architectures are convoluted, requiring multiple, diverse models, sophisticated retrieval-augmented generation stacks, advanced data architectures, and niche expertise,” they said. They predicted more mature firms will seek help from AI service providers and systems integrators.
This post describes how HPE Aruba automated their supply chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS. The new solution has helped Aruba integrate data from multiple sources, while optimizing cost, performance, and scalability.
DataOps Engineers implement the continuous deployment of data analytics. They give data scientists tools to instantiate development sandboxes on demand. They automate the data operations pipeline and create platforms used to test and monitor data from ingestion to published charts and graphs.
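As a rough illustration of "test and monitor from ingestion to published charts," here is a minimal sketch of a stage wrapper that runs checks after each pipeline step and raises an alert on failure; the stage and check functions are hypothetical placeholders, not part of the original article:

```python
import logging

logging.basicConfig(level=logging.INFO)

def run_stage(name, stage_fn, checks):
    """Run one pipeline stage, then its tests, alerting and stopping on failure."""
    output = stage_fn()
    for check in checks:
        if not check(output):
            logging.error("ALERT: stage %s failed check %s", name, check.__name__)
            raise ValueError(f"{name} failed {check.__name__}")
    logging.info("stage %s passed all checks", name)
    return output

# Hypothetical ingestion stage and its checks
def ingest():
    return [{"id": 1, "clicks": 42}, {"id": 2, "clicks": 17}]

def not_empty(rows):
    return len(rows) > 0

def ids_unique(rows):
    return len({r["id"] for r in rows}) == len(rows)

rows = run_stage("ingestion", ingest, [not_empty, ids_unique])
```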
Deploying higher-quality data sources with the appropriate structural veracity: Automate and enforce data model design tasks to ensure data integrity. From regulatory compliance and business intelligence to target marketing, data modeling maintains an automated connection back to the source.
“Being locked into a data architecture that can’t evolve isn’t acceptable.” Aurora built a cloud testing environment on AWS to better understand the safety of its technology by seeing how it would react to scenarios too dangerous or rare to simulate in the real world.
Advanced analytics and new ways of working with data also create new requirements that surpass the traditional concepts. Many companies are therefore forced to put these concepts to the test. But what are the right measures to make the data warehouse and BI fit for the future? What role do technology and IT infrastructure play?
The Business Application Research Center (BARC) warns that data governance is a highly complex, ongoing program, not a “big bang initiative,” and it runs the risk of participants losing trust and interest over time.
It’s even harder when your organization is dealing with silos that impede data access across different data stores. Seamless data integration is a key requirement in a modern data architecture to break down data silos. AWS Glue released the version 4.0 runtime.
This blog post presents an architecture solution that allows customers to extract key insights from Amazon S3 access logs at scale. We will partition and format the server access logs with Amazon Web Services (AWS) Glue, a serverless data integration service, to generate a catalog for access logs and create dashboards for insights.
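Once the access logs are cataloged, querying them is straightforward. A minimal sketch using boto3 and Athena is shown below; the database, table, and results-bucket names are hypothetical placeholders rather than the names used in the post:

```python
import boto3

# Hypothetical names for the Glue-cataloged access-log table and Athena results location
DATABASE = "access_logs_db"
RESULTS = "s3://my-athena-results/access-logs/"

query = """
SELECT requester, operation, COUNT(*) AS requests
FROM s3_access_logs
GROUP BY requester, operation
ORDER BY requests DESC
LIMIT 20
"""

athena = boto3.client("athena")
response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": DATABASE},
    ResultConfiguration={"OutputLocation": RESULTS},
)
print("Started Athena query:", response["QueryExecutionId"])
```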
Vyaire developed a custom data integration platform, iDataHub, powered by AWS services such as AWS Glue, AWS Lambda, and Amazon API Gateway. In this post, we share how we extracted data from SAP ERP using AWS Glue and the SAP SDK: create the PyRFC wheel file, then test the connection with SAP using the wheel file.
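As a rough sketch of that connection-test step, the standard STFC_CONNECTION test RFC can be called with PyRFC as follows; the connection parameters here are placeholders, not values from the post:

```python
from pyrfc import Connection

# Placeholder connection parameters; real values come from the SAP ERP system
conn = Connection(
    ashost="sap.example.internal",
    sysnr="00",
    client="100",
    user="glue_user",
    passwd="********",
)

# STFC_CONNECTION is a standard test RFC that simply echoes its input
result = conn.call("STFC_CONNECTION", REQUTEXT="connectivity check from AWS Glue")
print(result["ECHOTEXT"], result["RESPTEXT"])
conn.close()
```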
Our approach The migration initiative consisted of two main parts: building the new architecture and migrating data pipelines from the existing tool to the new architecture. Often, we would work on both in parallel, testing one component of the architecture while developing another at the same time.
Test access to the producer’s cataloged Amazon S3 data from the consumer account using EMR Serverless, Athena queries, and SageMaker Studio. It is recommended to use test accounts. The catalog account will host Lake Formation and AWS Glue.
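One quick way to verify that the shared catalog is visible from the consumer account before running those workloads is a boto3 check like the minimal sketch below; the shared database name is hypothetical:

```python
import boto3

# Hypothetical name of the database shared to this consumer account via Lake Formation
SHARED_DB = "producer_shared_db"

glue = boto3.client("glue")
tables = glue.get_tables(DatabaseName=SHARED_DB)

# If access is configured correctly, the producer's tables and S3 locations are listed
for table in tables["TableList"]:
    print(table["Name"], table["StorageDescriptor"]["Location"])
```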
As Gameskraft’s portfolio of gaming products increased, it led to an approximately fivefold growth of the dedicated data analytics and data science teams. Consequently, there was a fivefold rise in data integrations and a fivefold increase in ad hoc queries submitted to the Redshift cluster.
Over the years, data lakes on Amazon Simple Storage Service (Amazon S3) have become the default repository for enterprise data and are a common choice for a large set of users who query data for a variety of analytics and machine learning use cases. Analytics use cases on data lakes are always evolving.
However, embedding ESG into an enterprise data strategy doesn’t have to start as a C-suite directive. Developers, data architects, and data engineers can initiate change at the grassroots level, from integrating sustainability metrics into data models to ensuring ESG data integrity and fostering collaboration with sustainability teams.
Many of the tests to check performance and volumes of data scanned have used Athena because it provides a simple-to-use, fully serverless, cost-effective interface without the need to set up infrastructure. This approach was deemed efficient and effectively mitigated Amazon S3 throttling problems.
It also provides timely refreshes of data in your data warehouse. The following architecture diagram highlights the end-to-end solution using AWS services. For Name, enter a name (for example, dms-test). Choose Create rule. Test the solution: run the task and wait for the workload to complete.
Satori accelerates implementing data security controls on data warehouses like Amazon Redshift, is straightforward to integrate, and doesn’t require any changes to your Amazon Redshift data, schema, or how your users interact with data. Then complete the following steps to connect to Amazon Redshift: Log in to Satori.
In this blog, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a Data Integration and Democratization fabric. Components of a Data Mesh. How CDF enables successful Data Mesh Architectures.
Amazon Redshift Serverless, generally available since 2021, allows you to run and scale analytics without having to provision and manage the data warehouse. Neeraja is a seasoned Product Management and GTM leader, bringing over 20 years of experience in product vision, strategy and leadership roles in data products and platforms.
To earn the Salesforce Data Architect certification, candidates should be able to design and implement data solutions within the Salesforce ecosystem, such as data modeling, data integration, and data governance. Prerequisites include earning the Salesforce Application Architect certification (see above).
Amazon SageMaker Lakehouse provides an open data architecture that reduces data silos and unifies data across Amazon Simple Storage Service (Amazon S3) data lakes, Redshift data warehouses, and third-party and federated data sources, with support for connection testing, metadata retrieval, and data preview.
Integrating third-party SaaS applications is often complicated and requires significant effort and development. Developers need to understand the application APIs, write implementation and test code, and maintain the code for future API changes. Amazon AppFlow , which is a low-code/no-code AWS service, addresses this challenge.
The gold standard in data modeling solutions for more than 30 years continues to evolve with its latest release, highlighted by PostgreSQL 16.x support. Migration and modernization: It enables seamless transitions between legacy systems and modern platforms, ensuring your data architecture evolves without disruption.
Perhaps the biggest challenge of all is that AI solutions—with their complex, opaque models, and their appetite for large, diverse, high-quality datasets—tend to complicate the oversight, management, and assurance processes integral to data management and governance. Formalize ethics and bias testing.
IaaS provides a platform for compute, data storage, and networking capabilities. IaaS is mainly used for developing software (testing and development, batch processing), hosting web applications, and data analysis. Try and test the platforms in accordance with your data strategy and governance.
A data fabric orchestrates various data sources across a hybrid and multicloud landscape to provide business-ready data in support of analytics, AI, and other applications. How IBM built its own data fabric: When I rejoined IBM in 2016, enterprise-level data and its use were having a pivotal moment.
Business analytics: Data and insights help knowledge workers make informed decisions and find new opportunities. While Big Data and artificial intelligence (AI) provide the numbers, knowledge workers are key to understanding them. The aim is to break down silos between departments with better data management and integration.
Gartner is explicit: Data catalogs play a foundational role in the data fabric. And leaders are recognizing the value of a strong data foundation. Indeed, the foundation of your dataarchitecture and strategy – and thus your business strategy – begins with choosing the best data catalog to support your business.
Through meticulous testing and research, we’ve curated a list of the ten best BI tools, ensuring accessibility and efficacy for businesses of all sizes. In essence, the core capabilities of the best BI tools revolve around four essential functions: data integration, data transformation, data visualization, and reporting.
The Project Kernel framework uses templates and AI augmentation to streamline coding processes, with the AI augmentation generating test cases from models trained on the organization’s data, use cases, and past test cases. This enabled the team to expose the technology to a small group of senior leaders for testing.
Data Environment: First off, the solutions you consider should be compatible with your current data architecture. We have outlined the requirements that most providers ask for. Data Sources (strategic objective): Use native connectivity optimized for the data source. Build your first set of reports.
More companies have realized there is an opportunity to integrate, enhance, and present this SaaS data to improve internal operations. From there, they can perform meaningful analytics, gain valuable insights, and optionally push enriched data back to external SaaS platforms.
In the digital world, data integrity faces similar threats, from unauthorized access to manipulation and corruption, requiring strict governance and validation mechanisms to ensure reliability and trust. Moreover, the very nature of supply and demand forced manufacturers to rethink how they produced and delivered goods.
We cover batch ingestion methods, share practical examples, and discuss best practices to help you build optimized and scalable data pipelines on AWS. Overview of solution: AWS Glue is a serverless data integration service that simplifies data preparation and integration tasks for analytics, machine learning, and application development.
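For orientation, here is a minimal sketch of a batch ingestion Glue job that reads raw JSON from S3 and writes partitioned Parquet; the bucket paths and partition key are illustrative placeholders, not values from the post:

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard AWS Glue job setup
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a batch of raw JSON files from S3 as a DynamicFrame (placeholder path)
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-raw-bucket/orders/"]},
    format="json",
)

# Write the batch out as partitioned Parquet for downstream analytics (placeholder path)
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={
        "path": "s3://my-curated-bucket/orders/",
        "partitionKeys": ["order_date"],
    },
    format="parquet",
)
job.commit()
```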
DataOps Observability includes monitoring and testing the data pipeline, data quality, data testing, and alerting. Data testing is an essential aspect of DataOps Observability; it helps to ensure that data is accurate, complete, and consistent with its specifications, documentation, and end-user requirements.
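As a concrete, if simplified, illustration of data testing, here are a few plain assertions over a toy table; the column names and allowed value set are hypothetical:

```python
import pandas as pd

# Toy dataset standing in for a pipeline output table
orders = pd.DataFrame({
    "order_id": [101, 102, 103],
    "amount": [25.0, 13.5, 99.9],
    "status": ["shipped", "pending", "shipped"],
})

# Accuracy, completeness, and consistency expressed as plain assertions
assert orders["order_id"].is_unique, "order_id must be unique"
assert orders["amount"].notna().all(), "amount must be complete"
assert (orders["amount"] > 0).all(), "amount must be positive"
assert orders["status"].isin({"pending", "shipped", "cancelled"}).all(), \
    "status must match the documented value set"
print("all data tests passed")
```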