Data Architecture, Data Lake and Metrics

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

AWS Big Data

NOVEMBER 22, 2024

The need for streamlined data transformations: As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. This enables you to extract insights from your data without the complexity of managing infrastructure.

Data Lake

Data Lake Data Warehouse Cost-Benefit Data Transformation

Eight Top DataOps Trends for 2022

DataKitchen

NOVEMBER 29, 2021

In 2022, data organizations will institute robust automated processes around their AI systems to make them more accountable to stakeholders. Quality test suites will enforce “equity,” like any other performance metric. Data Gets Meshier. 2022 will bring further momentum behind modular enterprise architectures like data mesh.

Testing

Testing Data Lake Data Architecture Manufacturing

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Statistics Optimization

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

Since Apache Iceberg is well supported by AWS data services and Cloudinary was already using Spark on Amazon EMR, they could integrate writing to Data Catalog and start an additional Spark cluster to handle data maintenance and compaction. For example, for certain queries, Athena runtime was 2x–4x faster than Snowflake.

Data Lake

Data Lake Metadata Snapshot Analytics

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

While traditional extract, transform, and load (ETL) processes have long been a staple of data integration due to their flexibility, for common use cases such as replication and ingestion they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.

Data Integration

Data Integration Data Lake Statistics Data-driven

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. Of those tables, some are larger (such as in terms of record volume) than others, and some are updated more frequently than others.

Data Lake

Data Lake Data Processing Metadata Snapshot

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

Need for a data mesh architecture: Because entities in the EUROGATE group generate vast amounts of data from various sources (across departments, locations, and technologies), the traditional centralized data architecture struggles to keep up with the demands for real-time insights, agility, and scalability.

IoT

IoT Machine Learning Metadata Data-driven

Simplify data ingestion from Amazon S3 to Amazon Redshift using auto-copy

AWS Big Data

OCTOBER 30, 2024

It shows the aggregate metrics of the files that have been processed by an auto-copy job. He has over 13 years of professional experience building and optimizing enterprise data warehouses and is passionate about enabling customers to realize the power of their data. Prior to AWS, he built data warehouse solutions at Amazon.com.

Data Warehouse

Data Warehouse Sales Data Lake Recreation/Entertainment

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

Visualize data quality scores and metrics generated by AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

AWS Glue Data Quality allows you to measure and monitor the quality of data in your data repositories. It’s important for business users to be able to see quality scores and metrics to make confident business decisions and debug data quality issues. An AWS Glue crawler crawls the results.

Data Quality

Data Quality Metrics Visualization Dashboards

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Big Data

JANUARY 24, 2023

This solution only replicates metadata in the Data Catalog, not the actual underlying data. To have a redundant data lake using Lake Formation and AWS Glue in an additional Region, we recommend replicating the Amazon S3-based storage using S3 replication, S3 sync, aws-s3-copy-sync-using-batch or S3 Batch replication process.

Data Architecture

Data Architecture Metadata Data Lake Snapshot

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Big Data

JANUARY 6, 2025

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you securely access your data in operational databases, data lakes, or third-party datasets with minimal movement or copying of data.

Analytics

Analytics Data Warehouse Big Data Metrics

Perform data parity at scale for data modernization programs using AWS Glue Data Quality

AWS Big Data

OCTOBER 9, 2024

Today, customers are embarking on data modernization programs by migrating on-premises data warehouses and data lakes to the AWS Cloud to take advantage of the scale and advanced analytical capabilities of the cloud. Compare ongoing data that is replicated from the source on-premises database to the target S3 data lake.

Data Quality

Data Quality Data Lake Data Warehouse Metrics

DataOps For Business Analytics Teams

DataKitchen

JANUARY 3, 2022

There’s a recent trend toward people creating data lake or data warehouse patterns and calling it data enablement or a data hub. DataOps expands upon this approach by focusing on the processes and workflows that create data enablement and business analytics. DataOps Process Hub.

Business Analytics

Business Analytics Analytics Testing Dashboards

Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge

AWS Big Data

NOVEMBER 7, 2024

Tens of thousands of customers use Amazon Redshift every day to run analytics, processing exabytes of data for business insights. times better price performance than other cloud data warehouses. He specializes in migrating enterprise data warehouses to AWS Modern Data Architecture.

Data Warehouse

Data Warehouse Reporting Big Data Data Lake

Building a vision for real-time artificial intelligence

CIO Business Intelligence

APRIL 12, 2023

After walking his executive team through the data hops, flows, integrations, and processing across different ingestion software, databases, and analytical platforms, they were shocked by the complexity of their current data architecture and technology stack. It isn’t easy.

Machine Learning

Machine Learning Cost-Benefit Data-driven Strategy

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.

Analytics

Analytics Data Lake Metadata Data Warehouse

A Day in the Life of a DataOps Engineer

DataKitchen

OCTOBER 11, 2021

First, you must understand the existing challenges of the data team, including the data architecture and end-to-end toolchain. Figure 2: Example data pipeline with DataOps automation. In this project, I automated data extraction from SFTP, the public websites, and the email attachments.

Testing

Testing Metadata Dashboards Statistics

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

AWS Big Data

OCTOBER 10, 2023

Data governance is the process of ensuring the integrity, availability, usability, and security of an organization’s data. Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.

Data Quality

Data Quality Data Governance Data Lake Testing

Why the Data Journey Manifesto?

DataKitchen

JUNE 12, 2023

We had been talking about “Agile Analytic Operations,” “DevOps for Data Teams,” and “Lean Manufacturing For Data,” but the concept was hard to get across and communicate. I spent much time de-categorizing DataOps: we are not discussing ETL, Data Lake, or Data Science.

Testing

Testing Dashboards Data Lake Data Science

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

Success criteria alignment by all stakeholders (producers, consumers, operators, auditors) is key for successful transition to a new Amazon Redshift modern data architecture. The success criteria are the key performance indicators (KPIs) for each component of the data workflow.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

Stream processing, however, can enable the chatbot to access real-time data and adapt to changes in availability and price, providing the best guidance to the customer and enhancing the customer experience. When the model finds an anomaly or abnormal metric value, it should immediately produce an alert and notify the operator.

Data Lake

Data Lake Unstructured Data Management Snapshot

Crossing the Data Divide: Metrics Stores Remind Me That Data Work Is Hard

TDAN

JULY 17, 2024

If you haven’t heard about metrics stores yet, they’re “newish,” so you likely will. They are interesting to an extent, but mostly, they feel like a late-night re-run and remind me that data work is hard. So, what is a metrics store? Most of the young vendors trying to create this category will tell you that […]

Metrics

Metrics OLAP Data Lake Data Architecture

You Can’t Hit What You Can’t See

Cloudera

DECEMBER 1, 2022

Full-stack observability is a critical requirement for effective modern data platforms to deliver the agile, flexible, and cost-effective environment organizations are looking for. RI is a global leader in the design and deployment of large-scale, production-level modern data platforms for the world’s largest enterprises.

Data Quality

Data Quality Metrics Data Lake Statistics

Deep dive into the AWS ProServe Hadoop Migration Delivery Kit TCO tool

AWS Big Data

FEBRUARY 6, 2023

Let’s look at some key metrics. After analyzing YARN logs by various metrics, you’re ready to design future EMR architectures. He also understands how to apply technologies to solve big data problems and build a well-designed data architecture. George Zhao is a Senior Data Architect at AWS ProServe.

Dashboards

Dashboards Optimization Data Lake Cost-Benefit

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Building data lakes from continuously changing transactional data of databases and keeping data lakes up to date is a complex task and can be an operational challenge. You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes.

Data Lake

Data Lake Dashboards Metrics Metadata

Get maximum value out of your cloud data warehouse with Amazon Redshift

AWS Big Data

APRIL 19, 2023

Building an optimal data system: As data grows at an extraordinary rate, data proliferation across your data stores, data warehouse, and data lakes can become a challenge. This performance innovation allows Nasdaq to have a multi-use data lake between teams.

Data Warehouse

Data Warehouse Data Lake Unstructured Data Optimization

Data science vs data analytics: Unpacking the differences

IBM Big Data Hub

SEPTEMBER 19, 2023

How effectively and efficiently an organization can conduct data analytics is determined by its data strategy and data architecture, which allows an organization, its users and its applications to access different types of data regardless of where that data resides.

Data Science

Data Science Data Analytics Prescriptive Analytics Analytics

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

Third-party APIs – These provide analytics and survey data related to ecommerce websites. This could include details like traffic metrics, user behavior, conversion rates, customer feedback, and more. Flat files – Other systems supply data in the form of flat files of different formats.

Data Warehouse

Data Warehouse Testing Data Quality Reporting

Extend your data mesh with Amazon Athena and federated views

AWS Big Data

JULY 28, 2023

In this post, we show how to create and query views on federated data sources in a data mesh architecture featuring data producers and consumers. The term data mesh refers to a data architecture with decentralized data ownership. The following diagram depicts our data architecture.

Big Data

Big Data Data Architecture Data Lake Interactive

Extract data from SAP ERP using AWS Glue and the SAP SDK

AWS Big Data

FEBRUARY 8, 2023

A major challenge with ServiceMax implementation is building a data pipeline between ERP and the ServiceMax application, precisely integrating pricing, orders, and primary data (product, customer) from SAP ERP to ServiceMax using Vyaire’s custom-built integration platform iDataHub.

Testing

Testing Data Integration Data Lake Enterprise

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

The following figure shows some of the metrics derived from the study. The AWS modern data architecture shows a way to build a purpose-built, secure, and scalable data platform in the cloud. Learn from this to build querying capabilities across your data lake and the data warehouse.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Design a data mesh on AWS that reflects the envisioned organization

AWS Big Data

JANUARY 22, 2024

Parameters of success: Acast succeeded in bootstrapping and scaling a new team- and domain-oriented data product and its corresponding infrastructure and setup, resulting in less friction in gathering insights and happier users and consumers.

Data-driven

Data-driven Advertising Metadata Data Architecture

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

AWS Big Data

JANUARY 8, 2024

Kinesis Data Streams has native integrations with other AWS services such as AWS Glue and Amazon EventBridge to build real-time streaming applications on AWS. Refer to Amazon Kinesis Data Streams integrations for additional details. Lambda is good for event-based and stateless processing.

Analytics

Analytics IoT Data-driven Snapshot

Amazon Redshift data ingestion options

AWS Big Data

SEPTEMBER 5, 2024

Amazon Redshift, a warehousing service, offers a variety of options for ingesting data from diverse sources into its high-performance, scalable environment. He has over 14 years of experience in data and analytics, and helps customers design and build scalable and high-performant analytics solutions. Sudipta Bagchi is a Sr.

IoT

IoT Data Warehouse Cost-Benefit Reporting

Introducing the AWS ProServe Hadoop Migration Delivery Kit TCO tool

AWS Big Data

FEBRUARY 6, 2023

Refactoring coupled compute and storage to a decoupling architecture is a modern data solution. It enables compute such as EMR instances and storage such as Amazon Simple Storage Service (Amazon S3) data lakes to scale. George Zhao is a Senior Data Architect at AWS ProServe.

Cost-Benefit

Cost-Benefit Data Lake Dashboards Big Data

Go Fast Using Data Virtualization

Data Virtualization

JANUARY 14, 2022

During a recent house move I discovered an old notebook with metrics from when I was in the role of a Data Warehouse Project Manager and used to estimate data delivery projects. For the delivery a single data mart with.

Data Warehouse

Data Warehouse Metrics Data Integration Management

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To bring their customers the best deals and user experience, smava follows the modern data architecture principles with a data lake as a scalable, durable data store and purpose-built data stores for analytical processing and data consumption.

Data Lake

Data Lake Data Warehouse Data-driven B2B

The essential check list for effective data democratization

CIO Business Intelligence

JANUARY 20, 2023

Truly data-driven companies see significantly better business outcomes than those that aren’t. According to a recent IDC whitepaper, leaders saw on average two and a half times better results than other organizations in many business metrics. Most organizations don’t end up with data lakes, says Orlandini.

Data Lake

Data Lake Data-driven Finance Data Architecture
