Data architecture definition: Data architecture describes the structure of an organization's logical and physical data assets and data management resources, according to The Open Group Architecture Framework (TOGAF). An organization's data architecture is the purview of data architects.
While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to their flexibility, for common use cases such as replication and ingestion they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.
Need for a data mesh architecture: Because entities in the EUROGATE group generate vast amounts of data from various sources (across departments, locations, and technologies), the traditional centralized data architecture struggles to keep up with the demands for real-time insights, agility, and scalability.
We also examine how centralized, hybrid, and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management, and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant.
Our customers are telling us that they are seeing their analytics and AI workloads increasingly converge around a lot of the same data, and this is changing how they are using analytics tools with their data. They aren’t using analytics and AI tools in isolation.
This architecture is valuable for organizations dealing with large volumes of diverse data sources, where maintaining accuracy and accessibility at every stage is a priority. It sounds great, but how do you prove the data is correct at each layer? How do you ensure data quality in every layer?
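The excerpt leaves that question open, but one common answer is to run the same automated checks at every layer. Below is a minimal sketch in Python; the layer names, columns, and null-rate threshold are illustrative assumptions, not details from the original post:

```python
# A minimal sketch of per-layer data quality checks; layer names,
# columns, and the null-rate threshold are illustrative assumptions.
from dataclasses import dataclass

import pandas as pd


@dataclass
class QualityResult:
    layer: str
    check: str
    passed: bool


def check_layer(df: pd.DataFrame, layer: str,
                required_cols: list[str],
                max_null_rate: float = 0.01) -> list[QualityResult]:
    """Run the same completeness checks against one layer of the lake."""
    results = [QualityResult(layer, f"column:{col}", col in df.columns)
               for col in required_cols]
    present = [c for c in required_cols if c in df.columns]
    null_rate = df[present].isna().mean().max() if present and len(df) else 1.0
    results.append(QualityResult(layer, "null_rate",
                                 null_rate <= max_null_rate))
    return results


# Usage: apply identical checks at each layer (e.g., raw and curated).
raw = pd.DataFrame({"order_id": [1, 2, None], "amount": [10.0, 20.0, 30.0]})
for result in check_layer(raw, "raw", ["order_id", "amount"]):
    print(result)
```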
This post describes how HPE Aruba automated their supply chain management pipeline and re-architected and deployed their data solution by adopting a modern data architecture on AWS. The new solution has helped Aruba integrate data from multiple sources while optimizing cost, performance, and scalability.
Here, I’ll highlight the where and why of these important “data integration points” that are key determinants of success in an organization’s data and analytics strategy. Layering technology on the overall data architecture introduces more complexity. Data and cloud strategy must align.
When we talk about data integrity, we’re referring to the overarching completeness, accuracy, consistency, accessibility, and security of an organization’s data. Together, these factors determine the reliability of the organization’s data. In short, yes.
Today, the way businesses use data is much more fluid; data-literate employees use data across hundreds of apps, analyze data for better decision-making, and access data from numerous locations. Security: Data security is a high priority.
They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions. Andries has over 20 years of experience in the field of data and analytics.
While many organizations still struggle to get started, the most innovative organizations are using modern analytics to improve business outcomes, deliver personalized experiences, monetize data as an asset, and prepare for the unexpected. Being locked into a data architecture that can’t evolve isn’t acceptable.”
We think that by automating the undifferentiated parts, we can help our customers increase the pace of their data-driven innovation by breaking down data silos and simplifying data integration. This integration is currently in limited preview; use this link to request access.
Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads. Learn more about the AWS zero-ETL future with the newly launched AWS database integrations with Amazon Redshift.
AWS Step Functions: With AWS Step Functions, you can create workflows, also called state machines, to build distributed applications, automate processes, orchestrate microservices, and create data and machine learning pipelines. These include data discovery, modern ETL, cleansing, transforming, and centralized cataloging.
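As a sketch of what such a state machine looks like in practice, the following Python/boto3 snippet defines and runs a minimal two-state workflow; the Lambda ARN, IAM role, and resource names are placeholders, not values from the post:

```python
# A minimal sketch of defining and running a Step Functions state
# machine with boto3; the role ARN and Lambda ARN are placeholders.
import json

import boto3

sfn = boto3.client("stepfunctions", region_name="us-east-1")

# A one-task workflow: invoke an ETL Lambda function, then finish.
definition = {
    "StartAt": "RunEtl",
    "States": {
        "RunEtl": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:etl-step",
            "End": True,
        }
    },
}

machine = sfn.create_state_machine(
    name="example-data-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",
)

# Start one execution with a small input payload.
sfn.start_execution(
    stateMachineArn=machine["stateMachineArn"],
    input=json.dumps({"table": "orders"}),
)
```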
The primary modernization approach is data warehouse/ETL automation, which helps promote broad usage of the data warehouse but can only partially improve efficiency in data management processes. However, an automation approach alone is of limited usefulness when data management processes are inefficient.
The Business Application Research Center (BARC) warns that data governance is a highly complex, ongoing program, not a “big bang initiative,” and it runs the risk of participants losing trust and interest over time. IBM Data Governance: IBM Data Governance leverages machine learning to collect and curate data assets.
Since then, customer demands for better scale, higher throughput, and agility in handling a wide variety of changing but increasingly business-critical analytics and machine learning use cases have exploded, and we have been keeping pace. Here are a couple of highlights from this week; for the full list, see below.
The other 10% represents the effort of initial deployment, data loading, configuration, and the setup of administrative tasks and analysis that is specific to the customer, Henschen said. The joint solution with Labelbox is targeted toward media companies and is expected to help firms derive more value out of unstructured data.
Conclusion: In this post, we walked you through the process of using Amazon AppFlow to integrate data from Google Ads and Google Sheets. We demonstrated how the complexities of data integration are minimized so you can focus on deriving actionable insights from your data.
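The post configures the flows themselves (the Google Ads and Google Sheets connectors); as a sketch of triggering such a flow programmatically afterward, the boto3 snippet below assumes a hypothetical flow name:

```python
# A minimal sketch of running an existing Amazon AppFlow flow with
# boto3; the flow name "google-ads-to-s3" is a placeholder assumption.
import boto3

appflow = boto3.client("appflow", region_name="us-east-1")

# Kick off an on-demand run of a previously configured flow.
run = appflow.start_flow(flowName="google-ads-to-s3")
print("Started execution:", run["executionId"])

# Inspect the flow's configuration and current status.
flow = appflow.describe_flow(flowName="google-ads-to-s3")
print(flow["flowStatus"], flow["sourceFlowConfig"]["connectorType"])
```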
In fact, we recently announced the integration with our cloud ecosystem, bringing the benefits of Iceberg to enterprises as they make their journey to the public cloud and as they adopt more converged architectures like the Lakehouse. Figure 2: Apache Iceberg within Cloudera Data Platform. #5: Multi-function analytics.
Data fabric and data mesh are emerging data management concepts that are meant to address the organizational change and complexities of understanding, governing, and working with enterprise data in a hybrid multicloud ecosystem. The good news is that both data architecture concepts are complementary.
By consolidating and enriching data assets from disparate sources across the enterprise, these next-gen warehouses allow businesses to deploy advanced analytics – the autonomous (or semi-autonomous) examination of data using cutting-edge techniques such as machine learning and complex event processing.
This team has helped the company to align data across business areas; establish a data governance function to enable trust, privacy, and security of the data; and invest in the talent and technology needed to build a holistic data architecture across Lexmark, Gupta says.
Big data: Architecture and Patterns. The big data problem can be comprehended properly using a layered architecture. Big data architecture consists of different layers, and each layer performs a specific function. The architecture of big data has six layers.
He highlights innovations in data, infrastructure, and artificial intelligence and machine learning that are helping AWS customers achieve their goals faster, mine untapped potential, and create a better future. KEY003 | Swami Sivasubramanian (Vice President, Data and AI at AWS) | Nov.
Amazon Redshift enables you to use SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and machine learning (ML) to deliver the best price-performance at scale.
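As a sketch of what querying semi-structured data this way can look like, the snippet below uses the Redshift Data API from Python with PartiQL-style navigation into a SUPER column; the cluster, database, table, and column names are assumptions:

```python
# A minimal sketch of querying nested (SUPER) data in Amazon Redshift
# via the Data API; cluster, database, and schema names are placeholders.
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

# PartiQL-style dot navigation into a SUPER column holding nested JSON.
sql = """
SELECT o.order_id,
       o.payload.customer.region AS region,
       o.payload.total::DECIMAL(10, 2) AS total
FROM   orders o
WHERE  o.payload.total > 100;
"""

stmt = rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=sql,
)
# In real code, poll describe_statement until the query finishes.
result = rsd.get_statement_result(Id=stmt["Id"])
```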
In my last post, I covered some of the latest best practices for enhancing data management capabilities in the cloud. Despite the increasing popularity of cloud services, enterprises continue to struggle with creating and implementing a comprehensive cloud strategy that.
Vyaire developed a custom data integration platform, iDataHub, powered by AWS services such as AWS Glue, AWS Lambda, and Amazon API Gateway. In this post, we share how we extracted data from SAP ERP using AWS Glue and the SAP SDK. Prahalathan M is the Data Integration Architect at Vyaire Medical Inc.
So, KGF 2023 proved to be a breath of fresh air for anyone interested in topics like data mesh and data fabric, knowledge graphs, text analysis, large language model (LLM) integrations, retrieval augmented generation (RAG), chatbots, semantic data integration, and ontology building.
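The SAP-specific extraction relies on the SAP SDK and isn't reproduced here; as a generic sketch of the shape an AWS Glue job in such a pipeline takes, the PySpark skeleton below reads a cataloged table and writes Parquet to S3 (the database, table, and path names are assumptions):

```python
# A generic AWS Glue PySpark job skeleton (a sketch; the post's actual
# SAP extraction uses the SAP SDK, which is not reproduced here).
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME", "target_path"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a cataloged source table (placeholder database/table names).
frame = glue_context.create_dynamic_frame.from_catalog(
    database="erp_staging", table_name="sap_orders"
)

# Write curated output to the data lake in Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=frame,
    connection_type="s3",
    connection_options={"path": args["target_path"]},
    format="parquet",
)
job.commit()
```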
The construction of big data applications based on open source software has become increasingly straightforward since the advent of projects like Data on EKS, an open source project from AWS that provides blueprints for building data and machine learning (ML) applications on Amazon Elastic Kubernetes Service (Amazon EKS).
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. These robust capabilities ensure that data within the data lake remains accurate, consistent, and reliable.
And each of these gains requires data integration across business lines and divisions. Limiting growth by (data integration) complexity: Most operational IT systems in an enterprise have been developed to serve a single business function, and they use the simplest possible model for this. We call this the Bad Data Tax.
For consumer access, a centralized catalog is necessary where producers can publish their data assets. Cross-producer data access – Consumers may need to access data from multiple producers within the same catalog environment.
Reading Time: 2 minutes In recent years, there has been a growing interest in data architecture. One of the key considerations is how best to handle data, and this is where data mesh and data fabric come into play. But what are the key.
This amalgamation empowers vendors with authority over a diverse range of workloads by virtue of owning the data. This authority extends across realms such as business intelligence, data engineering, and machine learning, thus limiting the tools and capabilities that can be used.
A data fabric utilizes an integrated data layer over existing, discoverable, and inferenced metadata assets to support the design, deployment, and utilization of data across enterprises, including hybrid and multi-cloud platforms.
Processing terabytes or even petabytes of increasingly complex omics data generated by NGS platforms has necessitated the development of omics informatics. gene expression; microbiome data) and any tabular data (e.g., clinical) using a range of machine learning models.
Amazon SageMaker: Introducing the next generation of Amazon SageMaker. AWS announces the next generation of Amazon SageMaker, a unified platform for data, analytics, and AI. With AWS Glue 5.0, you can develop, run, and scale your data integration workloads and get insights faster.
Data ingestion: You have to build ingestion pipelines based on factors like types of data sources (on-premises data stores, files, SaaS applications, third-party data) and flow of data (unbounded streams or batch data). Data exploration: Data exploration helps unearth inconsistencies, outliers, or errors.
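As a sketch of how that factor-based routing can look in code, the Python snippet below sends unbounded sources to a stream and bounded sources to the lake's raw zone; the source names, stream name, and bucket are illustrative assumptions:

```python
# A minimal sketch of routing sources to streaming vs. batch ingestion;
# the source names, stream, and bucket below are illustrative assumptions.
import boto3

BATCH_SOURCES = {"erp_export", "partner_sftp_drop"}   # bounded, file-based
STREAM_SOURCES = {"clickstream", "iot_telemetry"}     # unbounded streams


def ingest(source: str, payload: bytes) -> None:
    if source in STREAM_SOURCES:
        # Unbounded data: push records onto a Kinesis stream.
        kinesis = boto3.client("kinesis")
        kinesis.put_record(StreamName="raw-events",
                           Data=payload, PartitionKey=source)
    elif source in BATCH_SOURCES:
        # Bounded data: land files in the raw zone of the data lake.
        s3 = boto3.client("s3")
        s3.put_object(Bucket="raw-zone",
                      Key=f"{source}/batch.json", Body=payload)
    else:
        raise ValueError(f"Unknown source: {source}")
```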
Perhaps the biggest challenge of all is that AI solutions—with their complex, opaque models, and their appetite for large, diverse, high-quality datasets—tend to complicate the oversight, management, and assurance processes integral to data management and governance. Find out more about CDP, modern data architectures and AI here.
This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization. For this, Cargotec built an Amazon Simple Storage Service (Amazon S3) data lake and cataloged the data assets in AWS Glue Data Catalog.
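As a sketch of that cataloging step, the boto3 snippet below registers an S3 path with a Glue crawler so downstream analytics can discover the tables; the crawler, role, bucket, and database names are placeholders:

```python
# A minimal sketch of cataloging an S3 data lake path with a Glue
# crawler; bucket, role, and database names are placeholders.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_crawler(
    Name="lake-raw-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="data_lake_raw",
    Targets={"S3Targets": [{"Path": "s3://example-data-lake/raw/"}]},
)
# Populate the Data Catalog so the data is queryable downstream.
glue.start_crawler(Name="lake-raw-crawler")
```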
Reading Time: 4 minutes Some say that data is the new “black gold,” but I believe that just like crude oil, data has little value until you extract it, refine it, and put it to use. In this post, I will share some of.