Over the last year, Amazon Redshift added several performance optimizations for data lake queries across multiple areas of the query engine, such as query rewriting, planning, scan execution, and the consumption of AWS Glue Data Catalog column statistics.
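As a rough sketch of that last piece, here is one way to trigger column statistics generation for a Glue Data Catalog table from Python with boto3. The database, table, and role names are placeholders, and the call assumes the Glue column statistics task API is available in your account and region.

import boto3

glue = boto3.client("glue")

# Placeholder database, table, and IAM role; replace with your own.
response = glue.start_column_statistics_task_run(
    DatabaseName="sales_db",
    TableName="orders",
    Role="arn:aws:iam::123456789012:role/GlueColumnStatsRole",
)
print("Started statistics task run:", response["ColumnStatisticsTaskRunId"])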
While traditional extract, transform, and load (ETL) processes have long been a staple of data integration because of their flexibility, for common use cases such as replication and ingestion they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures.
Data architecture is a complex and varied field, and different organizations and industries have unique needs when it comes to their data architects. Solutions data architect: these individuals design and implement data solutions for specific business needs, including data warehouses, data marts, and data lakes.
Amazon SageMaker Lakehouse provides an open data architecture that reduces data silos and unifies data across Amazon Simple Storage Service (Amazon S3) data lakes, Redshift data warehouses, and third-party and federated data sources. With AWS Glue 5.0, …
Each of these trends claims to be a complete model for a data architecture that solves the “everything everywhere all at once” problem. Data teams are confused as to whether they should get on the bandwagon of just one of these trends or pick a combination. First, we describe how data mesh and data fabric could be related.
First, you must understand the existing challenges of the data team, including the data architecture and end-to-end toolchain. Historic Balance – compares current data to previous or expected values. Statistical Process Control – applies statistical methods to control a process.
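As an illustration of the first of those checks, here is a minimal Historic Balance sketch in Python; the row counts and the 10% tolerance are illustrative assumptions, not part of any particular toolchain.

# Historic Balance check: flag the current value if it deviates from the
# average of recent history by more than a tolerance. Numbers are illustrative.
def historic_balance_check(history, current, tolerance=0.10):
    baseline = sum(history) / len(history)
    deviation = abs(current - baseline) / baseline
    return deviation <= tolerance, deviation

daily_row_counts = [10_120, 9_980, 10_240, 10_060]  # previous daily loads
ok, dev = historic_balance_check(daily_row_counts, current=7_450)
print(f"within tolerance: {ok}, deviation: {dev:.1%}")  # flags the ~26% drop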
They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions.
The complex challenge here is to have the lineage be intelligently updated as the data landscape and processing shift and change daily across an enterprise. Active metadata will play a critical role in automating such updates as they arise.
A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.
The business end-users were given a tool to discover data assets produced within the mesh and seamlessly self-serve on their data sharing needs. The integration of Databricks Delta tables into Amazon DataZone is done using the AWS Glue Data Catalog. Oghosa Omorisiagbon is a Senior Data Engineer at HEMA.
The Iceberg catalog stores a pointer to the latest metadata file for each table. When a SELECT query reads an Iceberg table, the query engine first goes to the Iceberg catalog, then retrieves the location of the latest metadata file, as shown in the following diagram.
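To make that lookup concrete, here is a short sketch using the pyiceberg library against a Glue-backed catalog; the catalog name, configuration, and table identifier are assumptions, and the same flow applies to any Iceberg catalog implementation.

from pyiceberg.catalog import load_catalog

# Assumed: a Glue-backed Iceberg catalog and an existing analytics.events table.
catalog = load_catalog("demo", **{"type": "glue"})

table = catalog.load_table("analytics.events")  # the catalog lookup happens here
print(table.metadata_location)  # location of the latest metadata file

# Scans then plan against the snapshot referenced by that metadata file.
batch = table.scan(limit=10).to_arrow()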
Use one click to access your data lake tables using the auto-mounted AWS Glue Data Catalog on Amazon Redshift for a simplified experience. Learn more about the zero-ETL integrations, data lake performance enhancements, and other announcements below.
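As a hedged sketch of what that simplified experience looks like from a SQL client, assuming the Glue Data Catalog is auto-mounted under Redshift's default awsdatacatalog name (the endpoint, credentials, database, and table below are placeholders):

import redshift_connector

# Placeholder connection details.
conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="...",
)
cur = conn.cursor()
# Query a data lake table through the auto-mounted Glue Data Catalog.
cur.execute('SELECT * FROM "awsdatacatalog"."sales_db"."orders" LIMIT 10;')
print(cur.fetchall())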
The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. The File Manager Lambda function consumes those messages, parses the metadata, and inserts the metadata into the DynamoDB table odpf_file_tracker.
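A minimal sketch of what that consumer might look like, assuming an SQS-triggered Lambda function whose message bodies carry the file metadata as JSON; aside from the odpf_file_tracker table name, the field names here are assumptions.

import json
import boto3

dynamodb = boto3.resource("dynamodb")
tracker = dynamodb.Table("odpf_file_tracker")  # table name from the architecture above

def handler(event, context):
    # Assumed SQS event shape: each record body is a JSON metadata document.
    for record in event["Records"]:
        metadata = json.loads(record["body"])
        tracker.put_item(Item={
            "file_id": metadata["file_id"],  # assumed partition key
            "s3_path": metadata["s3_path"],  # assumed attribute
            "status": "RECEIVED",
        })
    return {"processed": len(event["Records"])}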
The consumption of the data should be supported through an elastic delivery layer that aligns with demand, but also provides the flexibility to present the data in a physical format that aligns with the analytic application, ranging from the more traditional data warehouse view to a graph view in support of relationship analysis.
In 2013 I joined American Family Insurance as a metadata analyst. I had always been fascinated by how people find, organize, and access information, so a metadata management role after school was a natural choice. The use cases for metadata are boundless, offering opportunities for innovation in every sector.
As a reminder, here’s Gartner’s definition of data fabric: “A design concept that serves as an integrated layer (fabric) of data and connecting processes.” In this blog, we will focus on the “integrated layer” part of this definition by examining each of the key layers of a comprehensive data fabric in more detail.
If the asset has AWS Glue Data Quality enabled, you can now quickly visualize the data quality score directly in the catalog search pane. By selecting the corresponding asset, you can understand its content through the readme, glossary terms, and technical and business metadata.
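For programmatic access to the same scores, a hedged boto3 sketch follows; it assumes AWS Glue Data Quality has already been evaluated against the table, and the database and table names are placeholders.

import boto3

glue = boto3.client("glue")

# List recent data quality results for an assumed table, then fetch each score.
results = glue.list_data_quality_results(
    Filter={"DataSource": {"GlueTable": {"DatabaseName": "sales_db", "TableName": "orders"}}}
)
for summary in results["Results"]:
    detail = glue.get_data_quality_result(ResultId=summary["ResultId"])
    print(summary["ResultId"], detail.get("Score"))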
Modeling Your Data for Performance. Data architecture. The data landscape has changed significantly over the last two decades. The volume of data being created has increased, and the storage and computational resources needed to store and analyze that data have become cheaper and more widely available.
But whatever your industry, perfecting your processes for making important decisions about how to handle data is crucial. Whether you deal in customer contact information, website traffic statistics, sales data, or some other type of valuable information, you’ll need to put a framework of policies in place to manage your data seamlessly.
We found anecdotal data suggesting that a) CDOs with a business, more than a technical, background tend to be more effective or successful; b) CDOs most often came from a business background; and c) those who were successful had a good chance of becoming CEO or some other CXO (but not really CIO).
Even back then, these were used for activities such as Analytics, Dashboards, Statistical Modelling, Data Mining and Advanced Visualisation. Next, rather than just being the province of Data Scientists, there were moves to use Data Lakes to support general Data Discovery and even business Reporting and Analytics as well.
Some cloud applications can even provide new benchmarks based on customer data. Advanced Analytics: some apps provide a unique value proposition through the development of advanced (and often proprietary) statistical models, offering that benefit directly in the app.
Knowledge graphs, while not as well-known as other data management offerings, are a proven dynamic and scalable solution for addressing enterprise data management requirements across several verticals. The RDF-star extension makes it easy to model provenance and other structured metadata.
Data testing is an essential aspect of DataOps Observability; it helps to ensure that data is accurate, complete, and consistent with its specifications, documentation, and end-user requirements. Data testing can be done through various methods, such as data profiling, Statistical Process Control, and quality checks.
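As one concrete instance of Statistical Process Control applied to data testing, here is a small sketch that flags a metric falling outside three-sigma control limits; the null-rate history is illustrative.

import statistics

# SPC-style test: a new observation is "out of control" if it falls outside
# mean +/- 3 standard deviations of the historical window.
def out_of_control(history, observation, sigmas=3.0):
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return abs(observation - mean) > sigmas * stdev

null_rates = [0.011, 0.012, 0.010, 0.013, 0.011, 0.012]  # illustrative history
print(out_of_control(null_rates, 0.045))  # True: the spike breaches the limits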