Data Transformation, Interactive and Metadata

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

AWS Big Data

OCTOBER 23, 2024

This improvement streamlines the ability to access and manage your Airflow environments and their integration with external systems, and allows you to interact with your workflows programmatically. Airflow REST API: The Airflow REST API is a programmatic interface that allows you to interact with Airflow’s core functionalities.

Interactive

Interactive Testing Data-driven Data Lake

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Amazon Athena provides an interactive analytics service for analyzing the data in Amazon Simple Storage Service (Amazon S3). Amazon Redshift is used to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes. Table metadata is fetched from AWS Glue.

Metadata

Metadata Data Lake Modeling Data Warehouse

Ensuring Data Transformation Quality with dbt Core

Wayne Yaddow

MARCH 14, 2025

How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. Introduction: dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.

Data Transformation

Data Transformation Testing Unstructured Data Data Quality

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

This person (or group of individuals) ensures that the theory behind data quality is communicated to the development team. 2 – Data profiling. Data profiling is an essential process in the DQM lifecycle. from the business interactions), but if not available, then through confirmation techniques of an independent nature.

Data Quality

Data Quality Metrics Data-driven Management

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

OCTOBER 11, 2023

We introduce you to Amazon Managed Service for Apache Flink Studio and get started querying streaming data interactively using Amazon Kinesis Data Streams. Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources.

Management

Management Metadata Analytics Dashboards

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

AWS Big Data

NOVEMBER 15, 2023

It seamlessly consolidates data from various data sources within AWS, including AWS Cost Explorer (and forecasting with Cost Explorer), AWS Trusted Advisor, and AWS Compute Optimizer. Data providers and consumers are the two fundamental users of a CDH dataset. You might notice that this differs slightly from traditional ETL.

Dashboards

Dashboards Analytics Metadata Data Warehouse

How HR&A uses Amazon Redshift spatial analytics on Amazon Redshift Serverless to measure digital equity in states across the US

AWS Big Data

DECEMBER 5, 2023

To fill in the gaps in existing data, HR&A creates digital equity surveys to build a more complete picture before developing digital equity plans. HR&A has used Amazon Redshift Serverless and CARTO to process survey findings more efficiently and create custom interactive dashboards to facilitate understanding of the results.

Measurement

Measurement Dashboards Data Warehouse Analytics

Enhance your analytics embedding experience with the new Amazon QuickSight JavaScript SDK

AWS Big Data

MARCH 9, 2023

Amazon QuickSight is a fully managed, cloud-native business intelligence (BI) service that makes it easy to connect to your data, create interactive dashboards and reports, and share these with tens of thousands of users, either within QuickSight or embedded in your application or website. SDK Feature overview The QuickSight SDK v2.0

Slice and Dice

Slice and Dice Dashboards Analytics Interactive

Introducing Cloudera DataFlow Designer: Self-service, No-Code Dataflow Design

Cloudera

DECEMBER 9, 2022

Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow. Interactivity when needed while saving costs. With NiFi you can configure your source processor and run it independently of any other processors to retrieve data.

Testing

Testing Cost-Benefit Interactive Visualization

How healthcare organizations can analyze and create insights using price transparency data

AWS Big Data

OCTOBER 11, 2023

Due to this low complexity, the solution uses AWS serverless services to ingest the data, transform it, and make it available for analytics. The Data Catalog contains the table definition, which contains metadata about the data in the machine-readable file.

Visualization

Visualization Dashboards Data-driven Gap analysis

Run Apache Hive workloads using Spark SQL with Amazon EMR on EKS

AWS Big Data

OCTOBER 18, 2023

FINRA centralizes all its data in Amazon Simple Storage Service (Amazon S3) with a remote Hive metastore on Amazon Relational Database Service (Amazon RDS) to manage its metadata information. Note: Your query environment must have the Hive Client tool installed and a connection to your Hive metastore or AWS Glue Data Catalog.

Big Data

Big Data Data Processing Interactive Testing

Cloudera DataFlow Designer: The Key to Agile Data Pipeline Development

Cloudera

MARCH 14, 2023

Once a draft has been created or opened, developers use the visual Designer to build their data flow logic and validate it using interactive test sessions. In the DataFlow Designer, you can create Test Sessions to turn the canvas into an interactive interface that gives you all the feedback you need to quickly iterate your flow design.

Testing

Testing Publishing Metadata Interactive

Gain insights from historical location data using Amazon Location Service and AWS analytics services

AWS Big Data

MARCH 13, 2024

You can also use the data transformation feature of Data Firehose to invoke a Lambda function to perform data transformation in batches. Athena is used to run geospatial queries on the location data stored in the S3 buckets. Choose Run. You’re now ready to query the tables using Athena.

Analytics

Analytics IoT Metadata Internet of Things

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

AWS Big Data

NOVEMBER 16, 2023

Data Vault 2.0 allows for the following: agile data warehouse development; parallel data ingestion; a scalable approach to handle multiple data sources even on the same entity; a high level of automation; historization; and full lineage support. However, Data Vault 2.0

Enterprise

Enterprise Data Warehouse Data Lake Optimization

How to use foundation models and trusted governance to manage AI workflow risk

IBM Big Data Hub

OCTOBER 16, 2023

It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. They are used in everything from robotics to tools that reason and interact with humans. Capture and document model metadata for report generation. Track models and drive transparent processes.

Risk

Risk Modeling Management Metadata

5 best open source data flow lineage tools

Octopai

AUGUST 11, 2024

By reverse-engineering, parsing, and converting scripts, Octopai seamlessly connects all data points within and across organizational systems. While open-source tools such as Apache Atlas, Open Metadata, Egeria, Spline, and OpenLineage offer valuable capabilities, they come with their own sets of pros and cons.

Metadata

Metadata Visualization Data Quality Data Governance

Improve observability across Amazon MWAA tasks

AWS Big Data

FEBRUARY 6, 2023

For data pipeline orchestration, the Apache Airflow UI is a user-friendly tool that provides detailed views into your data pipeline. When it comes to pipeline health management, each service that your tasks are interacting with could be storing or publishing logs to different locations, such as an S3 bucket or Amazon CloudWatch logs.

Management

Management Interactive Publishing Metadata

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

This adds an additional ETL step, making the data even more stale. Data lakehouse was created to solve these problems. The data warehouse storage layer is removed from lakehouse architectures. Instead, continuous data transformation is performed within the BLOB storage. Data fabric promotes data discoverability.

Data Lake

Data Lake Data Warehouse Data-driven Metadata

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

While they require task-specific labeled data for fine tuning, they also offer clients the best cost performance trade-off for non-generative use cases. offers a Prompt Lab, where users can interact with different prompts using prompt engineering on generative AI models for both zero-shot prompting and few-shot prompting.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

Incremental query refers to a query strategy that focuses on processing and analyzing only the new or updated data within a data lake since the last query. The key idea behind incremental queries is to use metadata or change tracking mechanisms to identify the new or modified data since the last query.
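The incremental query idea described above can be illustrated with a minimal Python sketch. It is not how any particular open table format or Amazon EMR implements incremental reads; the `Record` type, its `commit_ts` field, and the `incremental_query` helper are hypothetical names introduced here, and the "change tracking mechanism" is assumed to be a simple commit timestamp used as a watermark.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Record:
    key: str
    value: int
    commit_ts: int  # hypothetical change-tracking metadata (commit timestamp)


def incremental_query(records: List[Record], last_seen_ts: int) -> Tuple[List[Record], int]:
    """Return only records committed after last_seen_ts, plus the new watermark."""
    changed = [r for r in records if r.commit_ts > last_seen_ts]
    # If nothing changed, keep the previous watermark.
    new_watermark = max((r.commit_ts for r in changed), default=last_seen_ts)
    return changed, new_watermark


# A toy "table": each write carries the timestamp of its commit.
table = [Record("a", 1, 100), Record("b", 2, 105), Record("a", 3, 110)]

# The first query starts from watermark 0 and therefore sees every record.
first, watermark = incremental_query(table, 0)

# A new commit arrives; the next query processes only that record.
table.append(Record("c", 4, 120))
second, watermark = incremental_query(table, watermark)
```

Each call processes only what arrived since the previous watermark, which is the property the excerpt attributes to incremental queries; a real data lake would read the watermark from table metadata rather than from an in-memory list.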

Data Lake

Data Lake Snapshot Big Data Data-driven

The Modern Data Stack Explained: What The Future Holds

Alation

JANUARY 17, 2023

These help data analysts visualize key insights that can help you make better data-backed decisions. ELT Data Transformation Tools: ELT data transformation tools are used to extract, load, and transform your data. Examples of data transformation tools include dbt and dataform.

Data Warehouse

Data Warehouse Cost-Benefit Data Science Data Transformation

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse (such as Amazon Redshift) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. The AWS Glue Data Catalog provides a uniform repository where disparate systems can store and find metadata to keep track of data in data silos. Refer to Catalogs for more information.

Data Lake

Data Lake Metadata Business Analysis Data-driven

Lay the groundwork now for advanced analytics and AI

CIO Business Intelligence

AUGUST 3, 2023

In the past, First Service Credit Union’s chief data officer Ty Robbins struggled to integrate data from the legacy, non-relational, and often proprietary tabular databases on which many credit unions run. Start early: The time to standardize everything from data modeling to its security is when the data is acquired. “We

Analytics

Analytics Data Lake Metadata Cost-Benefit

Accelerate Your Data Mesh in the Cloud with Cloudera Data Engineering and Modak Nabu™

Cloudera

OCTOBER 11, 2021

The platform converges data cataloging, data ingestion, data profiling, data tagging, data discovery, and data exploration into a unified platform, driven by metadata. Modak Nabu automates repetitive tasks in the data preparation process and thus accelerates the data preparation by 4x.

Data Lake

Data Lake Cost-Benefit Data-driven Dashboards

How to modernize data lakes with a data lakehouse architecture

IBM Big Data Hub

JULY 5, 2023

This was, without question, a significant departure from traditional analytic environments, which often meant vendor lock-in and the inability to work with data at scale. Another unexpected challenge was the introduction of Spark as a processing framework for big data. Comprehensive data security and data governance (i.e.

Data Lake

Data Lake Metadata Cost-Benefit Data Warehouse

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

After the data lands in Amazon S3, smava uses the AWS Glue Data Catalog and crawlers to automatically catalog the available data, capture the metadata, and provide an interface that allows querying all data assets. The data products from the Business Vault and Data Mart stages are now available for consumers.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Tableau further democratizes analytics with AI-fueled features

CIO Business Intelligence

APRIL 30, 2024

“But to us, it’s more than just having a data strategy; it’s also about building a great foundation of a data culture.” That’s where Tableau sees Pulse and Einstein Copilot for Tableau — a generative AI assistant that gives users the ability to interact with Tableau using natural language — coming in.

Analytics

Analytics Metrics Visualization Dashboards

Hybrid big data analytics with Amazon EMR on AWS Outposts

AWS Big Data

JANUARY 29, 2025

Amazon EMR has long been the leading solution for processing big data in the cloud. Amazon EMR is the industry-leading big data solution for petabyte-scale data processing, interactive analytics, and machine learning using over 20 open source frameworks such as Apache Hadoop, Hive, and Apache Spark.

Big Data

Big Data Data Analytics Analytics Interactive

Introducing the HubSpot connector for AWS Glue

AWS Big Data

DECEMBER 2, 2024

AWS Glue establishes a secure connection to HubSpot using OAuth for authorization and TLS for data encryption in transit. AWS Glue also supports the ability to apply complex data transformations, enabling efficient data integration and preparation to meet your needs.

Data Lake

Data Lake Testing Data Integration Metadata

What Is Embedded Analytics?

Jet Global

MAY 1, 2023

This is in contrast to traditional BI, which extracts insight from data outside of the app. As rich, data-driven user experiences are increasingly intertwined with our daily lives, end users are demanding new standards for how they interact with their business data. Yes—but basic dashboards won’t be enough.

Analytics

Analytics Cost-Benefit Visualization Dashboards

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why Data Mapping is Important Data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.

Data Warehouse

Data Warehouse Reporting Data Transformation Visualization

Partners in Innovation: Voice of the Customer Enhancements to Logi Symphony

Jet Global

JULY 19, 2023

View mode must respect interactivity, responsive layout and limit operations with dashboard. New Interactive Legends for all Visuals simplifies report navigation for non-technical users. Context Menu for Non-Grouped Data provides further self-service user empowerment with our new context menu for ungrouped data.

Dashboards

Dashboards Visualization Reporting Interactive
