These data processing and analytical services support Structured Query Language (SQL) to interact with the data. Writing SQL queries requires not just remembering the SQL syntax rules, but also knowledge of the table metadata: data about table schemas, relationships among the tables, and possible column values.
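As a rough illustration of why that metadata matters, the sketch below uses a throwaway SQLite database with made-up orders and customers tables: it inspects the table schemas first, and only then writes the join query.

```python
import sqlite3

# Hypothetical in-memory database used only to illustrate the idea:
# look at the table metadata first, then write the SQL query against it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, status TEXT, amount REAL)")
conn.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")

# Table metadata: column names, types, and (implicitly) the join key.
for table in ("orders", "customers"):
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    print(table, [(name, col_type) for _, name, col_type, *_ in columns])

# With the schema and the orders.customer_id -> customers.customer_id
# relationship known, the query itself is straightforward to write.
query = """
    SELECT c.region, SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE o.status = 'shipped'
    GROUP BY c.region
"""
print(conn.execute(query).fetchall())
```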
This person (or group of individuals) ensures that the theory behind data quality is communicated to the development team. 2 – Data profiling. Data profiling is an essential process in the DQM lifecycle. This means there are no unintended data errors, and it corresponds to its appropriate designation (e.g.,
Publish data assets – As the data producer from the retail team, you must ingest individual data assets into Amazon DataZone. For this use case, create a data source and import the technical metadata of six data assets (customers, order_items, orders, products, reviews, and shipments) from the AWS Glue Data Catalog.
Data analysts and engineers use dbt to transform, test, and document data in the cloud data warehouse. Yet every dbt transformation contains vital metadata that is not captured – until now. Data Transformation in the Modern Data Stack. How did the data transform, exactly?
There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. This is something that you can learn more about in just about any technology blog.
Business terms and data policies should be implemented through standardized and documented business rules. Compliance with these business rules can be tracked through data lineage, incorporating auditability and validation controls across data transformations and pipelines to generate alerts when there are non-compliant data instances.
An understanding of the data’s origins and history helps answer questions about the origin of data in Key Performance Indicator (KPI) reports, including: How are the report tables and columns defined in the metadata? Who are the data owners? What are the transformation rules? Data Governance.
With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of.
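A minimal sketch of what browsing that metadata can look like programmatically, assuming the technical metadata lives in the AWS Glue Data Catalog and using a placeholder database name:

```python
import boto3

# Sketch: browse technical metadata in the AWS Glue Data Catalog to see
# which tables exist and how they are structured before querying anything.
# The database name "retail" is a placeholder.
glue = boto3.client("glue")

paginator = glue.get_paginator("get_tables")
for page in paginator.paginate(DatabaseName="retail"):
    for table in page["TableList"]:
        columns = [
            (c["Name"], c.get("Type", ""))
            for c in table.get("StorageDescriptor", {}).get("Columns", [])
        ]
        print(table["Name"], columns)
```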
Building a Data Culture Within a Finance Department. Our finance users tell us that their first exposure to the Alation Data Catalog often comes soon after the launch of organization-wide data transformation efforts. After all, finance is one of the greatest consumers of data within a business.
We’re excited to announce the general availability of the open source adapters for dbt for all the engines in CDP — Apache Hive, Apache Impala, and Apache Spark, with added support for Apache Livy and Cloudera Data Engineering. The Open Data Lakehouse. Cloudera builds dbt adapters for all engines in the open data lakehouse.
Through this series of blog posts, we’ll discuss how to best scale and branch out an analytics solution using a knowledge graph technology stack. For the use case that this blog will explore, we have picked a perfect blend of the exciting and the fairly boring – building compliance. How to make sense of all that? But with robots.
These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.
The platform converges data cataloging, data ingestion, data profiling, data tagging, data discovery, and data exploration into a unified platform, driven by metadata. Modak Nabu automates repetitive tasks in the data preparation process and thus accelerates the data preparation by 4x.
In addition to drivers like digital transformation and compliance, it’s really important to look at the effect of poor data on enterprise efficiency/productivity. Then it is accessible and understandable via role-based, contextual views so stakeholders can make strategic decisions based on accurate insights.
The data mesh approach distributes data ownership and decentralizes data architecture, paving the way for enhanced agility and scalability. With distributed ownership there is a need for effective governance to ensure the success of any data initiative. Business Glossaries – what is the business meaning of our data?
The entire generative AI pipeline hinges on the data pipelines that empower it, making it imperative to take the correct precautions. 4 key components to ensure reliable data ingestion Data quality and governance: Data quality means ensuring the security of data sources, maintaining holistic data and providing clear metadata.
Now, joint users will get an enhanced view into cloud and data transformations, with valuable context to guide smarter usage. Integrating helpful metadata into user workflows gives all people, from data scientists to analysts, the context they need to use data more effectively.
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse customers (such as Amazon Redshift customers) who are looking to keep their data transform logic separate from storage and engine.
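As a hedged illustration of the Python side, the sketch below shows what a dbt Python model might look like on a Spark-based adapter that supports them; the upstream model name and columns are invented for the example.

```python
# models/orders_enriched.py -- a minimal dbt Python model sketch.
# "stg_orders" and the column names are hypothetical; on Spark-based adapters
# dbt.ref() returns a Spark DataFrame, so standard DataFrame operations apply.

def model(dbt, session):
    dbt.config(materialized="table")

    orders = dbt.ref("stg_orders")  # upstream model, tracked in dbt's lineage

    # keep only completed orders and derive a simple revenue column
    completed = orders.filter(orders.status == "completed")
    return completed.withColumn("revenue", completed.quantity * completed.unit_price)
```

Because the transform lives in the dbt project rather than in the warehouse itself, it is versioned, testable, and documented alongside the rest of the models.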
We just announced the general availability of Cloudera DataFlow Designer , bringing self-service data flow development to all CDP Public Cloud customers. In our previous DataFlow Designer blog post , we introduced you to the new user interface and highlighted its key capabilities.
We chatted about industry trends, why decentralization has become a hot topic in the data world, and how metadata drives many data-centric use cases. But, through it all, Mohan says it’s critical to view everything through the same lens: gaining business value from data. Data fabric is a technology architecture.
It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. A data store lets a business connect existing data with new data and discover new insights with real-time analytics and business intelligence. Track models and drive transparent processes.
This was, without question, a significant departure from traditional analytic environments, which often meant vendor lock-in and the inability to work with data at scale. Another unexpected challenge was the introduction of Spark as a processing framework for big data. Comprehensive data security and data governance (i.e.
Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow. With NiFi you can configure your source processor and run it independently of any other processors to retrieve data. Enabling self-service for developers.
Complete the following steps: Sign in to your AWS account. Click Launch Stack. For Stack name, enter a name for the stack (the default is aws-blog-jira-datalake-with-AppFlow). For GlueDatabaseName, enter a unique name for the Data Catalog database to hold the Jira data table metadata (the default is jiralake).
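For readers who prefer the API over the console, a rough boto3 equivalent of that stack launch might look like the following; the template URL is a placeholder, and only the GlueDatabaseName parameter from the post is shown.

```python
import boto3

# Sketch of the same stack launch driven through the API instead of the console.
# The template URL is a placeholder; the parameter name mirrors the one in the post.
cfn = boto3.client("cloudformation")

cfn.create_stack(
    StackName="aws-blog-jira-datalake-with-AppFlow",
    TemplateURL="https://example-bucket.s3.amazonaws.com/jira-datalake.yaml",  # placeholder
    Parameters=[
        {"ParameterKey": "GlueDatabaseName", "ParameterValue": "jiralake"},
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],  # often required when the template creates IAM roles
)
```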
More specifically, IDF has been integrated with Alation at an API level; this means that all generated pipeline code, metadata attributes, configuration files, and lineage are automatically synced (representing a huge time savings). They can better understand data transformations, checks, and normalization. Transparency is key.
Organizations have spent a lot of time and money trying to harmonize data across diverse platforms, including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on. And there’s control of that landscape to facilitate insight and collaboration and limit risk.
In addition, more data is becoming available for processing and enrichment of existing and new use cases; for example, we have recently experienced rapid growth in data collection at the edge and an increase in the availability of frameworks for processing that data. As a result, alternative data integration technologies (e.g.,
Specifically, the system uses Amazon SageMaker Processing jobs to process the data stored in the data lake, employing the AWS SDK for pandas (previously known as AWS Data Wrangler) for various data transformation operations, including cleaning, normalization, and feature engineering.
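A minimal sketch of such a transformation step with the AWS SDK for pandas is shown below; the S3 paths and column names are assumptions, not taken from the original post.

```python
import awswrangler as wr
import pandas as pd

# Sketch of the kind of work a SageMaker Processing job might run with the
# AWS SDK for pandas; bucket paths and columns are placeholders.
raw = wr.s3.read_parquet(path="s3://example-data-lake/raw/orders/")

# cleaning and normalization
clean = raw.dropna(subset=["order_id"])
clean["amount"] = clean["amount"].astype(float)

# simple feature engineering: derive a partition-friendly date column
clean["order_date"] = pd.to_datetime(clean["order_ts"]).dt.date

wr.s3.to_parquet(
    df=clean,
    path="s3://example-data-lake/processed/orders/",
    dataset=True,
    mode="overwrite_partitions",
    partition_cols=["order_date"],
)
```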
This involves unifying and sharing a single copy of data and metadata across IBM® watsonx.data ™, IBM® Db2 ®, IBM® Db2® Warehouse and IBM® Netezza ®, using native integrations and supporting open formats, all without the need for migration or recataloging.
Before you implement a data governance framework, you need to know the data you already have. This means you need to: Inventory data: Know all information resources and relevant metadata. Classify data: Organize structured and unstructured data into relevant categories. Reuse metadata productively.
In this blog, I will cover: What is watsonx.ai? Capabilities within the Prompt Lab include: Summarize: Transform text with domain-specific content into personalized overviews and capture key points (e.g., foundation models to help users discover, augment, and enrich data with natural language. What is watsonx.data?
For example, GPS, social media, and cell phone handoffs are modeled as graphs, while data catalogs, data lineage and MDM tools leverage knowledge graphs for linking metadata with semantics. RDF is used extensively for data publishing and data interchange and is based on W3C and other industry standards.
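A tiny RDF sketch using the rdflib library hints at how metadata and semantics get linked in such a graph; the namespace and resource names are made up for illustration.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Minimal knowledge-graph sketch: a dataset's metadata expressed as RDF triples,
# including an ownership fact and a lineage edge. All names are hypothetical.
EX = Namespace("http://example.org/catalog/")
g = Graph()

g.add((EX.orders, RDF.type, EX.Dataset))
g.add((EX.orders, EX.owner, Literal("retail-team")))
g.add((EX.orders, EX.derivedFrom, EX.raw_orders))  # lineage edge

print(g.serialize(format="turtle"))
```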
This adds an additional ETL step, making the data even more stale. The data lakehouse was created to solve these problems. The data warehouse storage layer is removed from lakehouse architectures. Instead, continuous data transformation is performed within the BLOB storage. Data fabric promotes data discoverability.
Incremental query refers to a query strategy that focuses on processing and analyzing only the new or updated data within a data lake since the last query. The key idea behind incremental queries is to use metadata or change tracking mechanisms to identify the new or modified data since the last query.
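One generic way to sketch this idea, not tied to any particular table format, is a watermark stored in a small metadata file, as below; the file name and the commit_ts column are illustrative assumptions.

```python
import json
import pathlib
import pandas as pd

# Watermark-based sketch of an incremental query: remember the last processed
# commit timestamp in a small metadata file, and on the next run read only rows
# newer than it. File name and column names are illustrative.
WATERMARK_FILE = pathlib.Path("last_query_watermark.json")

def load_watermark() -> str:
    if WATERMARK_FILE.exists():
        return json.loads(WATERMARK_FILE.read_text())["last_commit_ts"]
    return "1970-01-01T00:00:00"

def save_watermark(ts: str) -> None:
    WATERMARK_FILE.write_text(json.dumps({"last_commit_ts": ts}))

def incremental_read(path: str) -> pd.DataFrame:
    watermark = load_watermark()
    df = pd.read_parquet(path)
    new_rows = df[df["commit_ts"] > watermark]  # only data added since the last run
    if not new_rows.empty:
        save_watermark(str(new_rows["commit_ts"].max()))
    return new_rows
```

Table formats such as Apache Hudi, Iceberg, and Delta Lake maintain this change-tracking metadata for you, so the same pattern applies without hand-rolled watermark files.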
Data ingestion – Steps 1 and 2 use AWS DMS, which connects to the source database and moves full and incremental data (CDC) to Amazon S3 in Parquet format. Data transformation – Steps 3 and 4 represent an EMR Serverless Spark application (Amazon EMR 6.9). For Name, enter emr-delta-blog. For Type, choose Spark.
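A simplified PySpark sketch of what such a transformation step could do with the DMS output is shown below; the bucket paths, key column, and the DMS operation/timestamp column names are assumptions, and plain Parquet is written here for simplicity even though the post's stack name suggests Delta Lake.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

# Read the full-load and CDC Parquet files that AWS DMS landed in S3 and keep
# only the latest version of each row. All names below are placeholders.
spark = SparkSession.builder.appName("emr-delta-blog").getOrCreate()

cdc = spark.read.parquet("s3://example-bucket/raw/orders/")

latest = (
    cdc.withColumn("rn", F.row_number().over(
        Window.partitionBy("order_id").orderBy(F.col("dms_timestamp").desc())))
    .filter("rn = 1")
    .filter(F.col("Op") != "D")  # drop rows whose latest change was a delete
    .drop("rn")
)

latest.write.mode("overwrite").parquet("s3://example-bucket/curated/orders/")
```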
These help data analysts visualize key insights that can help you make better data-backed decisions. ELT Data Transformation Tools: ELT data transformation tools are used to extract, load, and transform your data. Examples of data transformation tools include dbt and Dataform.
For many organizations, a centralized data platform will fall short as it gives data teams much less autonomy over managing increasingly diverse and voluminous datasets. Netflix implemented this without domain users knowing the underlying technologies and complexity.
AWS Glue establishes a secure connection to HubSpot using OAuth for authorization and TLS for data encryption in transit. AWS Glue also supports the ability to apply complex data transformations, enabling efficient data integration and preparation to meet your needs. For Secret type, select Other type of secret.
To capture a more complete picture of the data’s journey, it is important to have a DataOps Observability system in place. Data lineage is static and often lags by weeks or months. Data lineage is often considered static because it is typically based on snapshots of data and metadata taken at a specific time.