Data Architecture and Reference - Data Leaders Brief

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

The landscape of big data management has been transformed by the rising popularity of open table formats such as Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake. These formats, designed to address the limitations of traditional data storage systems, have become essential in modern data architectures.

Metadata

Metadata Data Warehouse Big Data Data Lake

Uplevel your data architecture with real-time streaming using Amazon Data Firehose and Snowflake

AWS Big Data

APRIL 12, 2024

Today’s fast-paced world demands timely insights and decisions, which is driving the importance of streaming data. Streaming data refers to data that is continuously generated from a variety of sources. For instructions, refer to Test Your Streaming Data Solution with the New Amazon Kinesis Data Generator.

Data Architecture

Data Architecture IoT Internet of Things Recreation/Entertainment

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. For more examples and references to other posts, refer to the following GitHub repository.

Metadata

Metadata Data Lake Snapshot Data Warehouse

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

AWS Big Data

SEPTEMBER 11, 2024

This post describes how HPE Aruba automated their Supply Chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS. The new solution has helped Aruba integrate data from multiple sources, along with optimizing their cost, performance, and scalability.

Data Architecture

Data Architecture Optimization Data Warehouse Metadata

Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge

AWS Big Data

NOVEMBER 7, 2024

For more details, refer to the BladeBridge Analyzer Demo. Refer to this BladeBridge documentation to get more details on SQL and expression conversion. If you encounter any challenges or have additional requirements, refer to the BladeBridge community support portal or reach out to the BladeBridge team for further assistance.

Data Warehouse

Data Warehouse Reporting Big Data Data Lake

Active Data Architecture: The Need of the Hour

Data Virtualization

OCTOBER 3, 2024

As organizations continue to pursue increasingly time-sensitive use cases including customer 360° views, supply-chain logistics, and healthcare monitoring, they need their supporting data infrastructures to be increasingly flexible, adaptable, and scalable.

Data Architecture

Data Architecture Data Integration Management Data Governance

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architecture is a complex and varied field and different organizations and industries have unique needs when it comes to their data architects. Solutions data architect: These individuals design and implement data solutions for specific business needs, including data warehouses, data marts, and data lakes.

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Big Data

JANUARY 24, 2023

This means that if data is moved from a bucket in the source Region to another bucket in the target Region, the data access permissions need to be reapplied in the target Region. AWS Glue Data Catalog The AWS Glue Data Catalog is a central repository of metadata about data stored in your data lake.

Data Architecture

Data Architecture Metadata Data Lake Snapshot

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Big Data

JANUARY 6, 2025

In your Google Cloud project, you've enabled the following APIs: Google Analytics API, Google Analytics Admin API, Google Analytics Data API, Google Sheets API, and Google Drive API. For more information, refer to Amazon AppFlow support for Google Sheets. Refer to the Amazon Redshift Database Developer Guide for more details.

Analytics

Analytics Data Warehouse Big Data Metrics

Simplify data ingestion from Amazon S3 to Amazon Redshift using auto-copy

AWS Big Data

OCTOBER 30, 2024

Automate ingestion from a single data source: With an auto-copy job, you can automate ingestion from a single data source by creating one job and specifying the path to the S3 objects that contain the data. The S3 object path can reference a set of folders that have the same key prefix.

Data Warehouse

Data Warehouse Sales Data Lake Recreation/Entertainment
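
The auto-copy excerpt above notes that one S3 object path can cover a set of folders sharing the same key prefix. As a rough illustration of what "same key prefix" means, the sketch below filters a list of object keys by prefix in plain Python. It does not call Amazon S3 or Amazon Redshift, and the function name `keys_under_prefix` and the example keys are hypothetical, chosen only for this illustration.

```python
from typing import Iterable, List


def keys_under_prefix(keys: Iterable[str], prefix: str) -> List[str]:
    """Return the object keys that start with the given key prefix."""
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical object keys, for illustration only.
example_keys = [
    "sales/2024/10/part-0001.csv",
    "sales/2024/11/part-0001.csv",
    "returns/2024/10/part-0001.csv",
]

# Both "sales/2024/..." folders share the prefix; the "returns/..." key does not.
matched = keys_under_prefix(example_keys, "sales/2024/")
```

Here `matched` contains only the two `sales/2024/` keys, which is the kind of grouping a single prefix-based path would be expected to cover. How the service itself resolves paths is described in its own documentation, not by this sketch.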

The future of data: A 5-pillar approach to modern data management

CIO Business Intelligence

DECEMBER 11, 2024

To succeed in today's landscape, every company, whether small, mid-sized, or large, must embrace a data-centric mindset. This article proposes a methodology for organizations to implement a modern data management function that can be tailored to meet their unique needs.

Management

Management Data Governance Data Science Reporting

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

SageMaker still includes all the existing ML and AI capabilities you’ve come to know and love for data wrangling, human-in-the-loop data labeling with Amazon SageMaker Ground Truth, experiments, MLOps, Amazon SageMaker HyperPod managed distributed training, and more.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

Data Architecture Movements in 2020

TDAN

DECEMBER 17, 2019

Data is commonly referred to as the new oil, a resource so immensely powerful that its true potential is yet to be discovered. We haven’t achieved enough with data research and other statistical modeling techniques to be able to see data for what it truly is and even our methods of accruing data are rudimentary […].

Data Architecture

Data Architecture Statistics Modeling IT

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

AWS Big Data

APRIL 8, 2025

In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. For more detailed configuration, refer to Write properties in the Iceberg documentation.

Snapshot

Snapshot Management Metadata Big Data

The Race For Data Quality in a Medallion Architecture

DataKitchen

NOVEMBER 5, 2024

This architecture is valuable for organizations dealing with large volumes of diverse data sources, where maintaining accuracy and accessibility at every stage is a priority. It sounds great, but how do you prove the data is correct at each layer? How do you ensure data quality in every layer?

Data Quality

Data Quality Testing Metrics Reporting

Data democratization: How data architecture can drive business decisions and AI initiatives

IBM Big Data Hub

AUGUST 4, 2023

In turn, they both must also have the data literacy skills to be able to verify the data’s accuracy, ensure its security, and provide or follow guidance on when and how it should be used. Data democratization uses a fit-for-purpose data architecture that is designed for the way today’s businesses operate, in real-time.

Data Architecture

Data Architecture Data Lake Machine Learning Data Governance

Combine transactional, streaming, and third-party data on Amazon Redshift for financial services

AWS Big Data

FEBRUARY 1, 2024

Amazon Redshift features like streaming ingestion, Amazon Aurora zero-ETL integration, and data sharing with AWS Data Exchange enable near-real-time processing for trade reporting, risk management, and trade optimization. This will be your OLTP data store for transactional data.

Data Warehouse

Data Warehouse Dashboards Risk Management Risk

Mastering Multi-Cloud with Cloudera: Strategic Data & AI Deployments Across Clouds

Cloudera

JANUARY 7, 2025

Here's a deep dive into why and how enterprises master multi-cloud deployments to enhance their data and AI initiatives. While multi-cloud generally refers to the use of multiple cloud providers, hybrid encompasses both cloud and on-premises integrations, as well as multi-cloud setups.

Cost-Benefit

Cost-Benefit Optimization Data-driven Strategy

Build a secure data visualization application using the Amazon Redshift Data API with AWS IAM Identity Center

AWS Big Data

MARCH 6, 2025

Refer to IAM Identity Center identity source tutorials for the IdP setup. For more details, refer to Creating a workgroup with a namespace. Refer to Authorization servers for more information about authorization servers in Okta. For more information, refer to the CreateTokenWithIAM API reference.

Visualization

Visualization Sales Data Warehouse Management

Processing large records with Amazon Kinesis Data Streams

AWS Big Data

OCTOBER 16, 2023

This service seamlessly integrates into your data architecture, allowing you to tap into the full potential of your data for informed decision-making. Data streaming technologies like Kinesis Data Streams are designed to efficiently process and manage continuous streams of data in real time at large scale.

Cost-Benefit

Cost-Benefit Testing Optimization Strategy

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

In this post, we are excited to summarize the features that the AWS Glue Data Catalog, AWS Glue crawler, and Lake Formation teams delivered in 2022. Whether you are a data platform builder, data engineer, data scientist, or any technology leader interested in data lake solutions, this post is for you.

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging. Example Corp.

Data Lake

Data Lake Analytics Dashboards Metrics

What is data governance? Best practices for managing data assets

CIO Business Intelligence

MARCH 24, 2023

Whereas data governance is about the roles, responsibilities, and processes for ensuring accountability for and ownership of data assets, DAMA defines data management as “an overarching term that describes the processes used to plan, specify, enable, create, acquire, maintain, use, archive, retrieve, control, and purge data.”

Data Governance

Data Governance Management Metadata Data Quality

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

To upgrade your existing Athena engine to version 3 in your Athena workgroup, follow the instructions in Upgrade to Athena engine version 3 to increase query performance and access more analytics features or refer to Changing the engine version in the Athena console. For more details on Iceberg format versions, refer to Format Versioning.

Data Lake

Data Lake Data Warehouse Metadata Data Architecture

Introducing blueprint discovery and other UI enhancements for Amazon OpenSearch Ingestion

AWS Big Data

MAY 22, 2024

Refer to Amazon OpenSearch Ingestion to learn about other capabilities provided by OpenSearch Ingestion to build scalable pipelines for your OpenSearch data ingestion needs. He is deeply passionate about Data Architecture and helps customers build analytics solutions at scale on AWS.

Data Architecture

Data Architecture Visualization Data Transformation Management

How Swisscom automated Amazon Redshift as part of their One Data Platform solution using AWS CDK – Part 1

AWS Big Data

JUNE 12, 2024

In a two-part series, we talk about Swisscom’s journey of automating Amazon Redshift provisioning as part of the Swisscom ODP solution using the AWS Cloud Development Kit (AWS CDK), and we provide code snippets and other useful references.

Data Architecture

Data Architecture Cost-Benefit Data-driven Experimentation

Why the Data Journey Manifesto?

DataKitchen

JUNE 12, 2023

Today we have had over 20,000 signatures, millions of page views, and copycat clones, and it is frequently used as a reference guide. For example, just a few weeks ago, Microsoft announced data fabric, and John Kerski used it to frame up the discussion of how Microsoft data fabric supports DataOps principles.

Testing

Testing Dashboards Data Science Data Lake

SAP unveils tools to help enterprises build their own gen AI apps

CIO Business Intelligence

NOVEMBER 1, 2023

It’s published two new resources for using BTP — a guidance framework with methodologies and reference architectures, and a developers’ guide including building blocks and step-by-step guides — and released an open-source SDK for building extensions on BTP.

Enterprise

Enterprise Cost-Benefit Unstructured Data Software

Unleash deeper insights with Amazon Redshift data sharing for data lake tables

AWS Big Data

OCTOBER 10, 2024

In this case, consumers can query the data lake tables directly or join them with their own local tables, allowing them to add their own conditional logic as needed. Create a view on the producer that references the data lake table that you created. For more information, see Creating datashares and adding objects (preview).

Data Lake

Data Lake Data Warehouse Recreation/Entertainment Data-driven

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions. Andries has over 20 years of experience in the field of data and analytics.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Apache Ozone – A Multi-Protocol Aware Storage System

Cloudera

NOVEMBER 7, 2023

Are you struggling to manage the ever-increasing volume and variety of data in today’s constantly evolving landscape of modern data architectures?

Unstructured Data

Unstructured Data Data Architecture Optimization Interactive

SAP and Nvidia expand partnership to aid customers with gen AI

CIO Business Intelligence

MARCH 18, 2024

RAG optimizes LLMs by giving them the ability to reference authoritative knowledge bases outside their training data. “There are tons of documents that are not residing in an SAP system,” Herzig said.

Digital Transformation

Digital Transformation Optimization Data Science Modeling

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to its flexibility, for common use cases such as replication and ingestion, they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.

Data Integration

Data Integration Data Lake Statistics Data-driven

Data-Driven Enterprise Architecture: Why Enterprise Architects Need to Look at Data First

erwin

MAY 31, 2019

At Avydium, we believe there’s an important middle ground where different architecture disciplines coexist, including enterprise, solution, application, data, metadata and technical architectures. Applications fail to work together, data is integrated incorrectly causing massive duplication, and worse.

Data-driven

Data-driven Enterprise Metadata Strategy

Build SAML identity federation for Amazon OpenSearch Service domains within a VPC

AWS Big Data

FEBRUARY 7, 2024

Refer to How can I access OpenSearch Dashboards from outside of a VPC using Amazon Cognito authentication for a detailed evaluation of the available options and the corresponding pros and cons. For more information, refer to the AWS CDK v2 Developer Guide. For instructions, refer to Creating a public hosted zone.

Dashboards

Dashboards Data Processing Metadata Consulting

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.

Data Lake

Data Lake Data Processing Metadata Snapshot

Visualize data quality scores and metrics generated by AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

These are six main steps in the data pipeline: Amazon EventBridge triggers an AWS Lambda function when the event pattern for AWS Glue Data Quality matches the defined rule. For more information, refer to Working with Query Results, Output Files, and Query History. For S3 path, enter the S3 path to your data source.

Data Quality

Data Quality Metrics Visualization Dashboards

Automate discovery of data relationships using ML and Amazon Neptune graph technology

AWS Big Data

APRIL 19, 2023

Independent data products often only have value if you can connect them, join them, and correlate them to create a higher order data product that creates additional insights. A modern data architecture is critical in order to become a data-driven organization.

Technology

Technology Data-driven Machine Learning Sales

Upgrade Journey: The Path from CDH to CDP Private Cloud

Cloudera

SEPTEMBER 28, 2020

This allowed them to enable a modern data architecture, enhance their streaming capabilities and prepare for the next phase of the CDP Journey. References: CDP Runtime release notes: CDP 7.1.3 Install references: Install references. Customer A was able to upgrade successfully from CDH 5.14.2 Release Notes.

Testing

Testing Metadata Risk Data Science

The latest edition of The Data & Analytics Dictionary is now out

Peter James Thomas

AUGUST 2, 2019

Data Architecture – Definition (2). Data Catalogue. Data Community. Data Domain (contributor: Taru Väre). Data Enrichment. Data Federation. Data Function. Data Model. Data Operating Model. Geospatial Data. Reference Data (contributor: George Firican).

Analytics

Analytics Data Analytics Data Architecture Statistics

Enterprise Data Management — Driving Large-Scale Change in Your Organization

Sisense

JULY 6, 2020

First off, this involves defining workflows for every business process within the enterprise: the what, how, why, who, when, and where aspects of data. These regulations, ultimately, ensure key business values: data consistency, quality, and trustworthiness.

Enterprise

Enterprise Management Data Architecture Data-driven

Large Language Models and Data Management

Ontotext

JULY 24, 2023

It was emphasized many times that LLMs are only as good as the data sources. A Few Cautions LLM references a huge amount of data to become truly functional, making it a quite expensive and time consuming effort to train the model. Another concern relates to the definition of ‘data constraints.’

Modeling

Modeling Management Structured Data Data Architecture

An Introduction to Disaster Recovery with the Cloudera Data Platform

Cloudera

AUGUST 9, 2022

The CDP Disaster Recovery Reference Architecture. Today we announce the official release of the CDP Disaster Recovery Reference Architecture (DRRA). The CDP Disaster Recovery Reference Architecture is available in our public documentation within the CDP Reference Architectures microsite.

Data Lake

Data Lake Data Warehouse Data-driven IoT

Explore visualizations with AWS Glue interactive sessions

AWS Big Data

SEPTEMBER 20, 2023

With this functionality, you’re empowered to focus on extracting valuable insights from your data, while AWS Glue handles the infrastructure heavy lifting using a serverless compute model. To get started today, refer to Developing AWS Glue jobs with Notebooks and Interactive sessions.

Interactive

Interactive Visualization Measurement Data Architecture
