Interactive, Metadata and Testing

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

AWS Big Data

OCTOBER 23, 2024

This improvement streamlines the ability to access and manage your Airflow environments and their integration with external systems, and allows you to interact with your workflows programmatically. Airflow REST API: The Airflow REST API is a programmatic interface that allows you to interact with Airflow’s core functionalities.

Interactive

Interactive Testing Data-driven Data Lake

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Amazon Athena provides an interactive analytics service for analyzing the data in Amazon Simple Storage Service (Amazon S3). Amazon EMR provides a big data environment for data processing, interactive analysis, and machine learning using open source frameworks such as Apache Spark, Apache Hive, and Presto.

Metadata

Metadata Data Lake Modeling Data Warehouse

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

Iceberg offers distinct advantages through its metadata layer over Parquet, such as improved data management, performance optimization, and integration with various query engines. Iceberg’s table format separates data files from metadata files, enabling efficient data modifications without full dataset rewrites.

Metadata

Metadata Snapshot Cost-Benefit Optimization

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

Central to a transactional data lake are open table formats (OTFs) such as Apache Hudi, Apache Iceberg, and Delta Lake, which act as a metadata layer over columnar formats. XTable isn’t a new table format but provides abstractions and tools to translate the metadata associated with existing formats.

Metadata

Metadata Data Lake Snapshot Data Warehouse

7 Benefits of Metadata Management

erwin

FEBRUARY 19, 2021

Metadata management is key to wringing all the value possible from data assets. What Is Metadata? Analyst firm Gartner defines metadata as “information that describes various facets of an information asset to improve its usability throughout its life cycle. It is metadata that turns information into an asset.”

Metadata

Metadata Management Data Quality Cost-Benefit

Manage access controls in generative AI-powered search applications using Amazon OpenSearch Service and Amazon Cognito

AWS Big Data

NOVEMBER 19, 2024

Solution overview: By combining the powerful vector search capabilities of OpenSearch Service with the access control features provided by Amazon Cognito, this solution enables organizations to manage access controls based on custom user attributes and document metadata. If you don’t already have an AWS account, you can create one.

Management

Management Metadata Manufacturing Testing

Write queries faster with Amazon Q generative SQL for Amazon Redshift

AWS Big Data

NOVEMBER 7, 2024

Amazon Q generative SQL for Amazon Redshift uses generative AI to analyze user intent, query patterns, and schema metadata to identify common SQL query patterns directly within Amazon Redshift, accelerating the query authoring process for users and reducing the time required to derive actionable data insights.

Metadata

Metadata Sales Data Warehouse Optimization

Empower financial analytics by creating structured knowledge bases using Amazon Bedrock and Amazon Redshift

AWS Big Data

MAY 20, 2025

It reads metadata from your structured data store to generate SQL queries. Under Default storage metadata, select Amazon Redshift databases and for Database, choose dev. For this demo, we use a native testing interface on the Amazon Bedrock Knowledge Bases console. Choose Test. Choose your Redshift workgroup.

Structured Data

Structured Data Data Warehouse Analytics Finance

How AppsFlyer modernized their interactive workload by moving to Amazon Athena and saved 80% of costs

AWS Big Data

AUGUST 8, 2024

AppsFlyer empowers digital marketers to precisely identify and allocate credit to the various consumer interactions that lead up to an app installation, utilizing in-depth analytics. Additionally, we discuss the thorough testing, monitoring, and rollout process that resulted in a successful transition to the new Athena architecture.

Interactive

Interactive Metadata Optimization Testing

How Eightfold AI implemented metadata security in a multi-tenant data analytics environment with Amazon Redshift

AWS Big Data

NOVEMBER 29, 2023

The Eightfold Talent Intelligence Platform integrates with Amazon Redshift metadata security to implement visibility of data catalog listing of names of databases, schemas, tables, views, stored procedures, and functions in Amazon Redshift. This post discusses restricting listing of data catalog metadata as per the granted permissions.

Metadata

Metadata Data Warehouse Analytics Data Analytics

The New O’Reilly Answers: The R in “RAG” Stands for “Royalties”

O'Reilly on Data

JUNE 14, 2024

It offers a wealth of books, on-demand courses, live events, short-form posts, interactive labs, expert playlists, and more—formed from the proprietary content of thousands of independent authors, industry experts, and several of the largest education publishers in the world.

Metadata

Metadata Publishing Data-driven Modeling

Integrate custom applications with AWS Lake Formation – Part 2

AWS Big Data

NOVEMBER 19, 2024

Install and configure the AWS CLI: The AWS Command Line Interface (AWS CLI) is an open source tool that enables you to interact with AWS services using commands in your command line shell. When you’re logged in, you can start interacting with the application. Make sure the function is already deployed and working in your account.

Data Processing

Data Processing Metadata Publishing Testing

How REA Group approaches Amazon MSK cluster capacity planning

AWS Big Data

DECEMBER 5, 2024

To address this, we used the AWS performance testing framework for Apache Kafka to evaluate the theoretical performance limits. We conducted performance and capacity tests on the test MSK clusters that had the same cluster configurations as our development and production clusters.

Metrics

Metrics Dashboards Testing Optimization

Federate to Amazon Redshift Query Editor v2 with Microsoft Entra ID

AWS Big Data

DECEMBER 10, 2024

To interact with and analyze data stored in Amazon Redshift, AWS provides the Amazon Redshift Query Editor V2, a web-based tool that allows you to explore, analyze, and share data using SQL. Save the federation metadata XML file: You use the federation metadata file to configure the IAM IdP in a later step. Choose Add provider.

Sales

Sales Metadata Enterprise Testing

Amazon OpenSearch Service launches flow builder to empower rapid AI search innovation

AWS Big Data

MAY 2, 2025

Applications are increasingly using AI and search to reinvent and improve user interactions, content discovery, and automation to uplift business outcomes. We will use generative multimodal AI to modernize image search, eliminating the need for labor to maintain image tags and other metadata. that can operate on text and images.

Machine Learning

Machine Learning Visualization Dashboards Metadata

What is SCOR? A model to improve supply chain management

CIO Business Intelligence

MAY 20, 2025

The updated version includes more emerging drivers of supply chain success, covering topics such as omnichannel, metadata, and blockchain, according to the ASCM. SCOR’s six primary processes: As a framework, SCOR focuses on all customer interactions from the moment an order is placed until the invoice is paid.

Modeling

Modeling Management Metrics Measurement

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

We have enhanced data sharing performance with improved metadata handling, resulting in data sharing first query execution that is up to four times faster when the data sharing producer’s data is being updated. In internal tests, AI-driven scaling and optimizations showcased up to 10 times price-performance improvements for variable workloads.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Manage Amazon OpenSearch Service Visualizations, Alerts, and More with GitHub and Jenkins

AWS Big Data

OCTOBER 24, 2024

It allows organizations to secure data, perform searches, analyze logs, monitor applications in real time, and explore interactive log analytics. es.amazonaws.com' # e.g. my-test-domain.us-east-1.es.amazonaws.com, Amazon OpenSearch Service is a fully managed service for search and analytics. Leave the settings as default.

Visualization

Visualization Management Data Processing Testing

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

A five to nine-person team owns the dev, test, deployment, monitoring and maintenance of a domain. Discoverable – users have access to a catalog or metadata management tool which renders the domain discoverable and accessible. Clear accountability – users interact with a responsive, dedicated team that is accountable to them.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Bridging the Gap: How ‘Data in Place’ and ‘Data in Use’ Define Complete Data Observability

DataKitchen

SEPTEMBER 21, 2023

In the context of Data in Place, validating data quality automatically with Business Domain Tests is imperative for ensuring the trustworthiness of your data assets. Running these automated tests as part of your DataOps and Data Observability strategy allows for early detection of discrepancies or errors.

Testing

Testing Data Quality Predictive Modeling Metrics

Becoming a machine learning company means investing in foundational technologies

O'Reilly on Data

MAY 21, 2019

A catalog or a database that lists models, including when they were tested, trained, and deployed. Metadata and artifacts needed for audits. In particular, auditing and testing machine learning systems will rely on many of the tools I’ve described above. There are real, not just theoretical, risks and considerations.

Machine Learning

Machine Learning Technology Deep Learning Data Science

Visualize Amazon DynamoDB insights in Amazon QuickSight using the Amazon Athena DynamoDB connector and AWS Glue

AWS Big Data

NOVEMBER 17, 2023

These include internet-scale web and mobile applications, low-latency metadata stores, high-traffic retail websites, Internet of Things (IoT) and time series data, online gaming, and more. Athena is a serverless, interactive service that allows you to query data from a variety of sources in heterogeneous formats, with no provisioning effort.

Visualization

Visualization Metadata Testing Internet of Things

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

In this post, we show you how you can convert existing data in an Amazon S3 data lake in Apache Parquet format to Apache Iceberg format to support transactions on the data using Jupyter Notebook based interactive sessions over AWS Glue 4.0. AWS Command Line Interface (AWS CLI) configured to interact with AWS Services.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

OCTOBER 11, 2023

We introduce you to Amazon Managed Service for Apache Flink Studio and get started querying streaming data interactively using Amazon Kinesis Data Streams. The second streaming data source constitutes metadata information about the call center organization and agents that gets refreshed throughout the day.

Management

Management Metadata Analytics Dashboards

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

S3 Tables integration with the AWS Glue Data Catalog is in preview, allowing you to stream, query, and visualize data, including Amazon S3 Metadata tables, using AWS analytics services such as Amazon Data Firehose, Amazon Athena, Amazon Redshift, Amazon EMR, and Amazon QuickSight. connection testing, metadata retrieval, and data preview.

Analytics

Analytics Data Lake Metadata Data Warehouse

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

It involves: reviewing data in detail; comparing and contrasting the data to its own metadata; running statistical models; data quality reports. from the business interactions), but if not available, then through confirmation techniques of an independent nature. Your Chance: Want to test a professional analytics software?

Data Quality

Data Quality Metrics Data-driven Management

Run Trino queries 2.7 times faster with Amazon EMR 6.15.0

AWS Big Data

MARCH 22, 2024

Trino is an open source distributed SQL query engine designed for interactive analytic workloads. Benchmark setup: In our testing, we used the 3 TB dataset stored in Amazon S3 in compressed Parquet format, with metadata for databases and tables stored in the AWS Glue Data Catalog. With Amazon EMR 6.10.0

Metadata

Metadata Statistics Broadcasting Optimization

The Lean Analytics Cycle: Metrics > Hypothesis > Experiment > Act

Occam's Razor

APRIL 8, 2013

Sometimes, we escape the clutches of this suboptimal existence and do pick good metrics or engage in simple A/B testing. Testing out a new feature. Identify, hypothesize, test, react. But at the same time, they had to have a real test of an actual feature. You don’t need a beautiful beast to go out and test.

Metrics

Metrics KPI Analytics Key Performance Indicator

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

AWS Big Data

JULY 18, 2024

This populates the technical metadata in the business data catalog for each data asset. The business metadata can be added by business users to provide business context, tags, and data classification for the datasets. Producers control what to share, for how long, and how consumers interact with it.

Data Lake

Data Lake Publishing Metadata Data-driven

Accelerate HiveQL with Oozie to Spark SQL migration on Amazon EMR

AWS Big Data

APRIL 19, 2023

We split the solution into two primary components: generating Spark job metadata and running the SQL on Amazon EMR. The first component (metadata setup) consumes existing Hive job configurations and generates metadata such as number of parameters, number of actions (steps), and file formats. X Python 3.8 Amazon EMR 6.1

Metadata

Metadata Data Lake Testing Consulting

Introducing Cloudera DataFlow Designer: Self-service, No-Code Dataflow Design

Cloudera

DECEMBER 9, 2022

They value NiFi’s visual, no-code, drag-and-drop UI, the 450+ out-of-the-box processors and connectors, as well as the ability to interactively explore data by starting individual processors in the flow and immediately seeing the impact as data streams through the flow. Interactivity when needed while saving costs.

Testing

Testing Cost-Benefit Interactive Visualization

Introducing Apache Iceberg in Cloudera Data Platform

Cloudera

FEBRUARY 22, 2022

Efficient metadata management: Unlike Hive Metastore (HMS), which needs to track all Hive table partitions (partition key-value pairs, data location and other metadata), the Iceberg partitions store the data in the Iceberg metadata files on the file system. Multi-function analytics. What’s Next.

Snapshot

Snapshot Metadata Cost-Benefit Data Architecture

Federating access to Amazon DataZone with AWS IAM Identity Center and Okta

AWS Big Data

JULY 30, 2024

Under SAML Signing Certificates, select Actions, and then select View IdP Metadata. Under Configure external identity provider, do the following: Under Service provider metadata, choose Download metadata file to download the IAM Identity Center metadata file and save it on your system. Choose the Sign On tab.

Metadata

Metadata Dashboards Data-driven Management

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Data and Metadata: Data inputs and data outputs produced based on the application logic. Also included are business and technical metadata, related to both data inputs and data outputs, that enable data discovery and cross-organizational consensus on the definitions of data assets.

Metadata

Metadata Cost-Benefit Enterprise Interactive

6 Case Studies on The Benefits of Business Intelligence And Analytics

datapine

JANUARY 31, 2022

Everything is being tested, and then the campaigns that succeed get more money put into them, while the others aren’t repeated. This methodology of “test, look at the data, adjust” is at the heart and soul of business intelligence. Your Chance: Want to try a professional BI analytics software?

Business Intelligence

Business Intelligence Analytics Cost-Benefit ROI

Keeping Small Queries Fast – Short query optimizations in Apache Impala

Cloudera

NOVEMBER 13, 2020

Metadata Caching. If you have ever interacted with Impala, you will have encountered the Catalog Cache Service. As Impala’s adoption grew, the catalog service started to experience growing pains, so we recently introduced two new features to alleviate the stress: On-demand Metadata and Zero Touch Metadata.

Optimization

Optimization Metadata Statistics Cost-Benefit

Top 5 Data Catalog Benefits: Understanding Your Organization’s Data Lineage

erwin

AUGUST 7, 2019

With the right data catalog tool, organizations can automate enterprise metadata management – including data cataloging, data mapping, data quality and code generation for faster time to value and greater accuracy for data movement and/or deployment projects. A data catalog benefits organizations in a myriad of ways.

Metadata

Metadata Data Governance Data Quality Data Warehouse

Integrate custom applications with AWS Lake Formation – Part 1

AWS Big Data

NOVEMBER 19, 2024

With Lake Formation, you can centralize data security and governance using the AWS Glue Data Catalog, letting you manage metadata and data permissions in one place with familiar database-style features. glue:GetUnfilteredTableMetadata – Allows a third-party analytical engine to retrieve unfiltered table metadata from the Data Catalog.

Data Lake

Data Lake Metadata Testing Data Processing

Orchestrate an end-to-end ETL pipeline using Amazon S3, AWS Glue, and Amazon Redshift Serverless with Amazon MWAA

AWS Big Data

APRIL 25, 2024

VPC endpoints are created for Amazon S3 and Secrets Manager to interact with other resources. The policies attached to the Amazon MWAA role have full access and must only be used for testing purposes in a secure test environment. Otherwise, it will check the metadata database for the value and return that instead.

Metadata

Metadata Data Processing Management Testing

Behind the scenes: The daily impact of genAI at Hamburg’s largest gaming company

CIO Business Intelligence

DECEMBER 10, 2024

For example, AI-supported chat tools help our game designers to brainstorm ideas, test complex game mechanics, and generate dialogs. They act as digital sparring partners that open up new perspectives and accelerate the creative process. billion data records in real-time every day, based on player interactions with its games.

Data-driven

Data-driven Metadata Interactive KPI

Build streaming data pipelines with Amazon MSK Serverless and IAM authentication

AWS Big Data

SEPTEMBER 6, 2023

For testing, this post includes a sample AWS Cloud Development Kit (AWS CDK) application. The following sections take you through the steps to deploy, test, and observe the example application. or higher Appropriate AWS credentials for interacting with resources in your AWS account. or higher Apache Maven version 3.8.4

Testing

Testing Metadata Cost-Benefit Internet of Things

Speed up queries with the cost-based optimizer in Amazon Athena

AWS Big Data

NOVEMBER 17, 2023

Amazon Athena is a serverless, interactive analytics service built on open source frameworks, supporting open table file formats. Starting today, the Athena SQL engine uses a cost-based optimizer (CBO), a new feature that uses table and column statistics stored in the AWS Glue Data Catalog as part of the table’s metadata.

Optimization

Optimization Statistics Metadata Data Lake

GraphDB in Action: Putting the Most Reliable RDF Database to Work for Better Human-machine Interaction

Ontotext

JANUARY 26, 2023

In today’s world, we increasingly interact with the environment around us through data. The catalog stores the asset’s metadata in RDF. This allows keeping a well-defined representation of the metadata of each asset and enables using a SPARQL endpoint to query it. Researchers used GraphDB to store semantic metadata.

Interactive

Interactive Metadata Data Integration Data-driven

Introducing Amazon MWAA support for the Airflow REST API and web server auto scaling

AWS Big Data

MAY 16, 2024

First, the Airflow REST API support enables programmatic interaction with Airflow resources like connections, Directed Acyclic Graphs (DAGs), DAGRuns, and Task instances. Furthermore, the user’s permissions for interacting with the REST API are determined by the Airflow role assigned to them within Amazon MWAA. small instance class.

Testing

Testing Metrics Interactive Management
