2022, Data Transformation and Metadata

Biggest Trends in Data Visualization Taking Shape in 2022

Smart Data Collective

OCTOBER 13, 2021

There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. How does data virtualization complement data warehousing and service-oriented architecture (SOA)?

Visualization Cost-Benefit Big Data Prescriptive Analytics

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

AWS Big Data

OCTOBER 23, 2024

The Airflow REST API is a set of HTTP endpoints for operations such as invoking Directed Acyclic Graphs (DAGs), checking task statuses, retrieving workflow metadata, managing connections and variables, and even initiating dataset-related events, all without directly accessing the Airflow web interface or command-line tools.
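As a rough illustration of what "a set of HTTP endpoints" means in practice, here is a minimal sketch that builds a request to trigger a DAG run. It assumes the Airflow 2 stable REST API path `POST /api/v1/dags/{dag_id}/dagRuns` and bearer-token authentication; the token scheme is an assumption (authentication depends on your deployment, and the simplified Amazon MWAA interaction described in the post goes through AWS APIs instead, which this sketch does not cover). The helper names `build_trigger_request` and `send`, the base URL, and the DAG ID are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


def build_trigger_request(
    base_url: str,
    dag_id: str,
    conf: Optional[Dict[str, Any]] = None,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request that triggers a new DAG run."""
    # Airflow 2 stable REST API endpoint for creating a DAG run.
    path = "/api/v1/dags/" + urllib.parse.quote(dag_id, safe="") + "/dagRuns"
    # "conf" is passed through to the DAG run as its run configuration.
    body = json.dumps({"conf": conf or {}}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumption: this deployment accepts a bearer token.
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=body, headers=headers, method="POST"
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_trigger_request("https://airflow.example.com", "daily_sales_etl"))` would POST an empty `conf` to that hypothetical host. Keeping request construction separate from sending makes the URL and payload easy to inspect before any network call.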

Interactive Testing Data-driven Data Lake

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Trending Sources

Databricks’ Data+AI Summit 2022: A Show of Partner “Unity”

Alation

JULY 18, 2022

Are your data users overwhelmed by silos and frustrated by untrusted data? That was the message (delivered a little more elegantly than that) at Databricks’ Data+AI Summit 2022.

The Power of Partnership to Accelerate Data Transformation

Tell them to grab a catalog … and go jump in a lake.

ROI Metadata Data Lake Digital Transformation

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

AWS Big Data

NOVEMBER 15, 2023

Bayerische Motoren Werke AG (BMW) is a motor vehicle manufacturer headquartered in Germany with 149,475 employees worldwide; its profit before tax for the 2022 financial year was €23.5 … Data providers and consumers are the two fundamental users of a CDH dataset. The difference lies in when and where data transformation takes place.

Analytics Dashboards Metadata Data Warehouse

Amazon EMR on EKS widens the performance gap: Run Apache Spark workloads 5.37 times faster and at 4.3 times lower cost

AWS Big Data

APRIL 12, 2023

We have been continually improving Spark performance in each Amazon EMR release to further shorten job runtimes and optimize users’ spending on their Amazon EMR big data workloads. As of the Amazon EMR 6.5 release in January 2022, the optimized Spark runtime was 3.5 times faster than … Amazon EMR 6.5 was our first release of 2022.

Testing Big Data Metadata Optimization

Why Data Lineage is Key to the LIBOR Transition

Octopai

NOVEMBER 23, 2020

As you may have heard, the London Interbank Offered Rate (LIBOR) will be retired on January 1, 2022. Naturally, this will produce a massive increase in data management needs. In fact, the LIBOR transition program is one of the largest data transformation challenges ever seen in financial services.

Metadata Enterprise Business Intelligence Data Governance

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.

Data Lake Data Warehouse Data Architecture Metadata

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open-source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by customers of data warehouses (such as Amazon Redshift) who want to keep their data transform logic separate from storage and engine.

Data Lake Management Metrics Data Warehouse

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Alation

AUGUST 11, 2022

He’s a true expert in the field, having worked at Oracle, Scient, BearingPoint, and Booz Allen Hamilton, and on data-focused projects with companies like LVMH, Major League Baseball, Toyota, American Express, Freddie Mac, and many, many others. I recently had the opportunity to connect with Mohan at Snowflake Summit 2022 in Las Vegas.

Metadata Data Warehouse Data Quality Data Lake

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To ingest the data, smava uses a set of popular third-party customer data platforms complemented by custom scripts. After the data lands in Amazon S3, smava uses the AWS Glue Data Catalog and crawlers to automatically catalog the available data, capture the metadata, and provide an interface that allows querying all data assets.

Data Lake Data Warehouse Data-driven B2B

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Data ingestion: Steps 1 and 2 use AWS DMS, which connects to the source database and moves full-load and incremental (CDC) data to Amazon S3 in Parquet format. Let’s refer to this S3 bucket as the raw layer.
Data transformation: Steps 3 and 4 represent an EMR Serverless Spark application (Amazon EMR 6.9 …
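The core of an incremental pipeline like this is replaying ordered change records onto the current state of a table. The post does this with Delta Lake on EMR Serverless; the sketch below is only a plain-Python illustration of the upsert/delete semantics, not Delta Lake’s MERGE API. It assumes each change row carries an `Op` column of `I`, `U`, or `D` (how AWS DMS marks CDC records written to Amazon S3) and a hypothetical primary-key column named `id`; `apply_cdc` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, Mapping


def apply_cdc(
    target: Dict[Any, dict],
    changes: Iterable[Mapping[str, Any]],
    key: str = "id",
) -> Dict[Any, dict]:
    """Replay ordered CDC records onto an in-memory table keyed by `key`.

    Each change has an "Op" field: "I" (insert), "U" (update), "D" (delete).
    The input table is not modified; a new dict is returned.
    """
    result = dict(target)  # shallow copy so the caller's table is untouched
    for change in changes:
        op = change["Op"]
        row = {k: v for k, v in change.items() if k != "Op"}
        pk = row[key]
        if op in ("I", "U"):
            # Upsert: insert a new key or overwrite the existing row.
            result[pk] = row
        elif op == "D":
            # Delete if present; a delete for an unknown key is a no-op.
            result.pop(pk, None)
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return result
```

Order matters: an update followed by a delete of the same key must leave the key absent, which is why changes are applied strictly in sequence rather than deduplicated first.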

Data Lake Dashboards Metrics Metadata

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

AWS Big Data

APRIL 25, 2024

Alternatively, you can use AWS Glue for Apache Spark, which provides built-in support for bucketing configurations during the data transformation process. AWS Glue allows you to define bucketing parameters, such as the number of buckets and the columns to bucket on, providing an optimized data layout for efficient querying with Athena.
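To see why bucketing speeds up downstream queries, here is a plain-Python sketch of the idea: each row is assigned to bucket `hash(column) mod num_buckets`, so every row with a given key lands in the same bucket and a point lookup only needs to read that one bucket. `zlib.crc32` stands in for the hash function here; Spark and Hive use their own hash functions, so these bucket numbers will not match those of a real table. The function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def bucket_id(value: Any, num_buckets: int) -> int:
    """Map a column value to a bucket using a stable hash (crc32 as a stand-in)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def bucketize(rows: Iterable[Row], column: str, num_buckets: int) -> Dict[int, List[Row]]:
    """Group rows into buckets by the hash of one column."""
    buckets: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[column], num_buckets)].append(row)
    return dict(buckets)


def lookup(buckets: Dict[int, List[Row]], column: str, key: Any, num_buckets: int) -> List[Row]:
    """Answer `WHERE column = key` by scanning only the bucket that can hold `key`."""
    candidate = buckets.get(bucket_id(key, num_buckets), [])
    return [row for row in candidate if row[column] == key]
```

The bucket count is the main design choice: too few buckets leaves each file large, and too many produces many small files. On the Spark side, the corresponding write setting is `DataFrameWriter.bucketBy(numBuckets, col)`, which applies only when writing with `saveAsTable`.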

Optimization Data Lake Cost-Benefit Reporting

Ingest telemetry messages in near real time with Amazon API Gateway, Amazon Data Firehose, and Amazon Location Service

AWS Big Data

NOVEMBER 14, 2024

We use built-in features of Data Firehose, including its AWS Lambda integration for the necessary data transformation, along with Amazon Simple Notification Service (Amazon SNS) for near real-time alerts. AWS Glue: the AWS Glue Data Catalog is your persistent technical metadata store in the AWS Cloud.
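Lambda-based transformation in Firehose follows a fixed record contract. The function receives base64-encoded `records`, each with a `recordId`, and must return every `recordId` with a `result` (`Ok`, `Dropped`, or `ProcessingFailed`) and base64-encoded `data`. Below is a minimal sketch of such a handler, assuming a hypothetical JSON telemetry payload with a `speed_mps` field that is converted to km/h. The field names and the conversion are illustrative and are not taken from the post.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform Firehose records: decode JSON, add a derived field, re-encode."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical schema: speed reported in metres per second.
            payload["speed_kmh"] = round(payload["speed_mps"] * 3.6, 2)
            # Newline-delimit records so objects written downstream stay line-oriented.
            encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": encoded.decode("utf-8"),
            })
        except (ValueError, KeyError, TypeError):
            # Bad base64/JSON or a missing field: return the original data as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Marking malformed records as `ProcessingFailed` rather than raising an exception keeps a single bad message from failing the whole batch. The remaining records still pass through.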

Data Lake Metadata Testing Data-driven

Data Leaders Brief
