2024, Data Transformation and Metadata

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

These data processing and analytical services support Structured Query Language (SQL) to interact with the data. Writing SQL queries requires not only remembering SQL syntax rules but also knowing the tables' metadata: the table schemas, the relationships among the tables, and the possible column values.
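As a rough illustration of why this metadata matters for text-to-SQL, the sketch below renders table schemas, relationships, and sample column values into a prompt string. It is not the method from the linked article; the `Column`, `Table`, `render_metadata`, and `build_prompt` names and the example tables are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    dtype: str
    description: str = ""
    sample_values: list = field(default_factory=list)


@dataclass
class Table:
    name: str
    columns: list
    # Maps a local column to the "other_table.column" it references.
    foreign_keys: dict = field(default_factory=dict)


def render_metadata(tables):
    """Render schemas, relationships, and sample values as plain text."""
    lines = []
    for table in tables:
        lines.append(f"Table {table.name}:")
        for col in table.columns:
            line = f"  - {col.name} ({col.dtype})"
            if col.description:
                line += f": {col.description}"
            if col.sample_values:
                line += f" [e.g. {', '.join(col.sample_values)}]"
            lines.append(line)
        for col_name, ref in table.foreign_keys.items():
            lines.append(f"  - {col_name} references {ref}")
    return "\n".join(lines)


def build_prompt(question, tables):
    """Combine the metadata and the user's question into one prompt."""
    return (
        "You write SQL for Amazon Athena.\n"
        "Schema:\n"
        f"{render_metadata(tables)}\n\n"
        f"Question: {question}\nSQL:"
    )


# Hypothetical example tables.
orders = Table(
    name="orders",
    columns=[
        Column("order_id", "bigint", "Unique order identifier"),
        Column("status", "string", sample_values=["shipped", "pending"]),
        Column("customer_id", "bigint"),
    ],
    foreign_keys={"customer_id": "customers.customer_id"},
)
prompt = build_prompt("How many orders are pending?", [orders])
```

Listing sample values such as `pending` gives the model the exact literals to filter on, which a bare schema would not provide.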

Metadata Data Lake Modeling Data Warehouse

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

AWS Big Data

OCTOBER 23, 2024

123} ▶ Pre task execution logs [2024-10-21, 16:56:12 UTC] {subprocess.py:63} INFO - Tmp dir root location: /tmp [2024-10-21, 16:56:12 UTC] {subprocess.py:75}

Interactive Testing Data-driven Data Lake

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

Making OT-IT integration a reality with new data architectures and generative AI

CIO Business Intelligence

FEBRUARY 20, 2024

The data transformation imperative: What Denso and other industry leaders realise is that for IT-OT convergence to be achieved, and the benefits of AI unlocked, data transformation is vital. Avanade is attending Hannover Messe 2024. Generative AI, Innovation

Data Architecture Unstructured Data Manufacturing IT

Tableau further democratizes analytics with AI-fueled features

CIO Business Intelligence

APRIL 30, 2024

At Tableau Conference 2024 in San Diego today, Tableau announced new AI features for Tableau Pulse and Einstein Copilot for Tableau, along with several platform improvements aimed at democratizing data insights. This feature can automate a data transformation pipeline with step-by-step suggestions for preparing data for analysis.

Analytics Metrics Visualization Dashboards

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

AWS Big Data

APRIL 25, 2024

Alternatively, you can use AWS Glue for Apache Spark, which provides built-in support for bucketing configurations during the data transformation process. noaa_remote_original" ; Your data should look like the following screenshot. There are two folders: data and metadata. Drill down to data.
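To make the bucketing idea concrete, here is a minimal sketch that builds an Athena CREATE TABLE AS SELECT statement with the `bucketed_by` and `bucket_count` table properties. The helper name `bucketed_ctas` and the database, table, column, and S3 names are hypothetical and are not taken from the article.

```python
def bucketed_ctas(source, target, location, bucket_column, bucket_count):
    """Build an Athena CTAS statement that writes bucketed Parquet data."""
    if bucket_count <= 0:
        raise ValueError("bucket_count must be a positive integer")
    return (
        f"CREATE TABLE {target}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{location}',\n"
        f"  bucketed_by = ARRAY['{bucket_column}'],\n"
        f"  bucket_count = {bucket_count}\n"
        ") AS\n"
        f"SELECT * FROM {source}"
    )


# Hypothetical names, for illustration only.
statement = bucketed_ctas(
    source="example_db.readings_raw",
    target="example_db.readings_bucketed",
    location="s3://example-bucket/readings_bucketed/",
    bucket_column="station_id",
    bucket_count=16,
)
print(statement)
```

Bucketing tends to pay off when downstream queries filter on a high-cardinality column, since the engine can then read only the files for the matching bucket.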

Optimization Data Lake Cost-Benefit Reporting

Stream real-time data into Apache Iceberg tables in Amazon S3 using Amazon Data Firehose

AWS Big Data

NOVEMBER 6, 2024

To learn more about how to process Firehose records using Lambda, see Transform source data in Amazon Data Firehose. After executing your Lambda function, Firehose looks for routing information and operations in the metadata fields (in the following format) provided by your Lambda function. b64decode(record['data']).decode('utf-8')
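The decode step quoted above usually sits inside a transformation Lambda like the sketch below. The `recordId`, `result`, and `data` fields are the standard Firehose transformation response. The `otfMetadata` routing block reflects my reading of the Firehose documentation for Iceberg destinations and should be verified against the current docs. The database and table names, and the `processed` flag, are hypothetical.

```python
import base64
import json

# Hypothetical routing targets; replace with your own database and table.
DESTINATION_DATABASE = "example_db"
DESTINATION_TABLE = "example_table"


def lambda_handler(event, context):
    """Decode each Firehose record, tag it, and return it with routing metadata."""
    output = []
    for record in event["records"]:
        # Firehose delivers the payload base64-encoded.
        payload = json.loads(base64.b64decode(record["data"]).decode("utf-8"))
        payload["processed"] = True  # placeholder transformation

        encoded = base64.b64encode(
            json.dumps(payload).encode("utf-8")
        ).decode("utf-8")

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": encoded,
            # Assumed routing format; confirm the key names in the AWS docs.
            "metadata": {
                "otfMetadata": {
                    "destinationDatabaseName": DESTINATION_DATABASE,
                    "destinationTableName": DESTINATION_TABLE,
                    "operation": "insert",
                }
            },
        })
    return {"records": output}
```

Every `recordId` received must be returned, otherwise Firehose treats the missing records as failed transformations.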

Metadata Data Lake Management Internet of Things

Melting the ice — How Natural Intelligence simplified a data lake migration to Apache Iceberg

AWS Big Data

APRIL 28, 2025

The data is stored in Apache Parquet format, with AWS Glue Data Catalog providing metadata management. While this architecture supported NI's analytical needs, it lacked the flexibility required for a truly open and adaptable data platform. The gold layer was coupled only with query engines that supported Hive and AWS Glue Data Catalog.

Data Lake Metadata Cost-Benefit Snapshot

Streamline AWS WAF log analysis with Apache Iceberg and Amazon Data Firehose

AWS Big Data

FEBRUARY 18, 2025

These include managing complex extract, transform, and load (ETL) processes, handling schema validation, providing reliable delivery, and maintaining custom code for data transformations. Firehose delivers streaming data with configurable buffering options that can be optimized for near-zero latency. worker_type G.1X

Snapshot Optimization Data Lake Metadata

Data Leaders Brief
