But even though technologies like Building Information Modelling (BIM) have finally introduced symbolic representation, in many ways, AECO still clings to outdated, analog practices and documents. Here, one of the challenges involves digitizing the national specifics of regulatory documents and building codes in multiple languages.
The need for streamlined data transformations: as organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. Such tooling helps ensure your data models are well-documented, versioned, and straightforward to manage within a collaborative environment.
This middleware consists of custom code that runs data flows to stitch together data transformations, search queries, and AI enrichments in varying combinations tailored to use cases, datasets, and requirements. Ingest flows are created to enrich data as it is added to an index. An index is then constructed from the processed documents.
Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments.
Data collections are the ones and zeroes that encode the actionable insights (patterns, trends, relationships) that we seek to extract from our data through machine learning and data science. Datasphere is a data discovery tool with essential functionalities: recommendations, data marketplace, and business content (i.e.,
Get started with our technical documentation. Joel Farvault is Principal Specialist SA Analytics for AWS with 25 years’ experience working on enterprise architecture, data governance and analytics, mainly in the financial services industry.
How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. Introduction: dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.
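As a minimal sketch of how a team might wire dbt Core tests into a Python workflow, assuming dbt-core 1.5+ (which ships the programmatic dbtRunner API) and an existing dbt project with tests already defined; the project directory and model selector are hypothetical:

```python
# Minimal sketch: run dbt tests programmatically and fail the pipeline on errors.
# Assumes dbt-core 1.5+ and an existing dbt project with schema tests defined.
# Paths and selectors below are illustrative.
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()

# Equivalent to `dbt test --select orders` on the command line.
res: dbtRunnerResult = dbt.invoke(
    ["test", "--select", "orders", "--project-dir", "./my_dbt_project"]
)

if not res.success:
    raise RuntimeError("dbt tests failed; blocking downstream deployment")
```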
Great Expectations can be integrated directly into existing data pipelines to define, test, and document expectations about what transformed or converted data should look like. Rather than relying on ad-hoc scripts or manual checks, Great Expectations codifies data quality rules into structured Expectation Suites.
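As an illustration, a minimal pandas-based check might look like the following sketch. The column names are hypothetical, and the exact API varies across Great Expectations releases; recent versions favor a context/validator workflow over this classic dataset interface:

```python
# Minimal sketch using Great Expectations' classic pandas interface.
# Column names are hypothetical; newer releases favor the
# get_context()/validator workflow, so treat this as illustrative.
import great_expectations as ge
import pandas as pd

df = pd.DataFrame({"customer_id": [1, 2, 3], "amount": [10.0, 25.5, 7.25]})
batch = ge.from_pandas(df)

# Codified expectations instead of ad-hoc checks:
batch.expect_column_values_to_not_be_null("customer_id")
batch.expect_column_values_to_be_between("amount", min_value=0, max_value=10000)

results = batch.validate()  # returns a structured validation result
print(results.success)      # True only if every expectation passed
```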
Selecting the strategies and tools for validating data transformations and data conversions in your data pipelines. Introduction: data transformations and data conversions are crucial to ensuring that raw data is organized, processed, and ready for useful analysis.
Common challenges and practical mitigation strategies for reliable data transformations. Introduction: data transformations are important processes in data engineering, enabling organizations to structure, enrich, and integrate data for analytics, reporting, and operational decision-making.
For example, automatically importing mappings from developers' Excel sheets, flat files, Access databases, and ETL tools into a comprehensive mappings inventory, complete with auto-generated, meaningful documentation of the mappings, is a powerful way to support overall data governance. Data quality is crucial to every organization.
With CloudSearch, you can search large collections of data such as webpages, document files, forum posts, or product information. You send your documents to OpenSearch Serverless, which indexes them for search using the OpenSearch REST API. With OpenSearch Serverless, you get improved, out-of-the-box, hands-free operation.
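A minimal sketch of sending one document to an OpenSearch Serverless collection with the opensearch-py client; the endpoint, region, and index name are placeholders, and AWS credentials are resolved via boto3:

```python
# Minimal sketch: index one document into an OpenSearch Serverless collection.
# Endpoint, region, and index name are placeholders; requires the
# opensearch-py package and AWS credentials resolvable by boto3.
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

region = "us-east-1"
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, region, "aoss")  # "aoss" = OpenSearch Serverless

client = OpenSearch(
    hosts=[{"host": "my-collection-id.us-east-1.aoss.amazonaws.com", "port": 443}],
    http_auth=auth,
    use_ssl=True,
    connection_class=RequestsHttpConnection,
)

client.index(
    index="products",
    body={"title": "Example product", "category": "forum-post"},
)
```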
Adding data transformation details to metadata can be challenging because this information is dispersed across data processing pipelines, making it difficult to extract and incorporate into table-level metadata. The AWS Glue crawler will then populate the additional metadata in the AWS Glue Data Catalog.
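One hedged way to attach transformation details as custom table parameters after a crawler run, using boto3; the crawler, database, table, and parameter names here are hypothetical:

```python
# Sketch: run a Glue crawler, then record transformation details as custom
# table parameters in the Data Catalog. Crawler/database/table names and
# the parameter key are hypothetical.
import boto3

glue = boto3.client("glue")

glue.start_crawler(Name="sales_crawler")  # populates/updates catalog tables

# Later, once the crawler has finished:
table = glue.get_table(DatabaseName="sales_db", Name="orders")["Table"]

params = dict(table.get("Parameters", {}))
params["transformation_notes"] = "currency normalized to USD; PII columns hashed"

glue.update_table(
    DatabaseName="sales_db",
    TableInput={
        "Name": table["Name"],
        "StorageDescriptor": table["StorageDescriptor"],
        "Parameters": params,
    },
)
```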
Data processes that depended upon the previously defective data will likely need to be re-initiated, especially if their output was at risk or compromised by it. These processes could include reports, campaigns, or financial documentation. Accuracy should be measured through source documentation (i.e.,
These acquisitions usher in a new era of “self-service” by automating complex operations so customers can focus on building great data-driven apps instead of managing infrastructure. Datacoral powers fast and easy data transformations for any type of data via a robust multi-tenant SaaS architecture that runs in AWS.
Building a Data Culture Within a Finance Department. Our finance users tell us that their first exposure to the Alation Data Catalog often comes soon after the launch of organization-wide data transformation efforts. After all, finance is one of the greatest consumers of data within a business.
Build data validation rules directly into ingestion layers so that invalid data is stopped at the gate rather than detected after damage is done. Use lineage tooling to trace data from source to report. Understanding how data transforms and where it breaks is crucial for auditability and root-cause resolution.
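A toy illustration of the "stop it at the gate" idea in plain Python; the rules and record shape are invented for the example:

```python
# Toy sketch of a validation gate at the ingestion layer: records failing the
# rules are rejected before they ever reach storage. Rules and record shape
# are invented for illustration.
from typing import Callable

Rule = Callable[[dict], bool]

RULES: list[Rule] = [
    lambda r: r.get("order_id") is not None,                              # required key
    lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
]

def ingest(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into accepted and rejected at the gate."""
    accepted, rejected = [], []
    for rec in records:
        (accepted if all(rule(rec) for rule in RULES) else rejected).append(rec)
    return accepted, rejected

good, bad = ingest([{"order_id": 1, "amount": 9.99}, {"amount": -5}])
print(len(good), "accepted;", len(bad), "rejected at the gate")
```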
dbt is an open-source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by customers of data warehouses (such as Amazon Redshift) who are looking to keep their data transformation logic separate from storage and engine.
In this post, we delve into a case study for a retail use case, exploring how the Data Build Tool (dbt) was used effectively within an AWS environment to build a high-performing, efficient, and modern data platform. It does this by helping teams handle the T in ETL (extract, transform, and load) processes.
Business terms and data policies should be implemented through standardized and documented business rules. Compliance with these business rules can be tracked through data lineage, incorporating auditability and validation controls across data transformations and pipelines to generate alerts when there are non-compliant data instances.
We’re excited to announce the general availability of the open source adapters for dbt for all the engines in CDP (Apache Hive, Apache Impala, and Apache Spark), with added support for Apache Livy and Cloudera Data Engineering. Cloudera builds dbt adapters for all engines in the open data lakehouse.
Instead of invoking open-source scikit-learn or Keras calls to build models, your team now goes from Pandas data transforms straight to … the API calls for AWS AutoPilot or GCP Vertex AI. The modeling logic does not exist in the code. AutoML drives this point home. And it’s available to everyone. What if we go the other way?
OpenSearch is an open source, distributed search engine suitable for a wide array of use cases such as ecommerce search, enterprise search (content management search, document search, knowledge management search, and so on), site search, application search, and semantic search. OpenSearch also includes capabilities to ingest and analyze data.
The techniques for managing organisational data in a standardised approach that minimises inefficiency. Extract, Transform, Load (ETL): the extraction of raw data, its transformation into a suitable format for business needs, and its loading into a data warehouse. Data transformation.
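To make the ETL definition concrete, here is a toy pass in Python; the file name and schema are invented, and SQLite stands in for the warehouse:

```python
# Toy ETL sketch: extract raw CSV rows, transform them into a business-friendly
# shape, and load them into a warehouse table. SQLite stands in for the
# warehouse; file and column names are invented.
import csv
import sqlite3

# Extract
with open("raw_sales.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: normalize types and derive a business field
for r in rows:
    r["amount"] = float(r["amount"])
    r["is_large_order"] = r["amount"] > 1000

# Load
conn = sqlite3.connect("warehouse.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS sales (order_id TEXT, amount REAL, is_large_order INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (:order_id, :amount, :is_large_order)", rows
)
conn.commit()
```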
Given the importance of sharing information among diverse disciplines in the era of digital transformation, this concept is arguably as important as ever. The aim is to normalize, aggregate, and eventually make available to analysts across the organization data that originates in various pockets of the enterprise.
With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities such as filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps.
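For instance, the on-demand case can be triggered from Python with boto3; the flow itself (source, destination, filter/validation tasks) is assumed to have been configured already, and the flow name below is a placeholder:

```python
# Sketch: trigger an existing Amazon AppFlow flow on demand with boto3.
# The flow's source, destination, and filter/validation tasks are assumed
# to be configured already; the flow name is a placeholder.
import boto3

appflow = boto3.client("appflow")

response = appflow.start_flow(flowName="salesforce-to-s3-contacts")
print("Execution started:", response.get("executionId"))
```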
In recent years, driven by the commoditization of data storage and processing solutions, the industry has seen a growing number of systematic investment management firms switch to alternative data sources to drive their investment decisions. The bulk of our data scientists are heavy users of Jupyter Notebook.
Amazon Q Developer can now generate complex data integration jobs with multiple sources, destinations, and data transformations. Generated jobs can use a variety of data transformations, including filter, project, union, join, and custom user-supplied SQL.
SUPER data type columns in Amazon Redshift contain semi-structured data like JSON documents. Previously, data masking in Amazon Redshift only worked with regular table columns, but now you can apply masking policies specifically to elements within SUPER columns, so those elements are masked for users who should not see them.
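As a hedged sketch, attaching a masking policy to a path inside a SUPER column might look like this with the redshift_connector driver; every identifier here is invented, and the exact DDL should be verified against the current Amazon Redshift dynamic data masking documentation:

```python
# Hedged sketch: dynamic data masking on an element of a SUPER column in
# Amazon Redshift via the redshift_connector driver. All identifiers are
# invented; verify the DDL against the current Redshift DDM documentation.
import redshift_connector

conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="admin",
    password="...",  # elided; prefer IAM-based auth in practice
)
cur = conn.cursor()

# Policy that replaces the value with a fixed masked string.
cur.execute(
    "CREATE MASKING POLICY mask_ssn WITH (ssn VARCHAR(16)) "
    "USING ('***-**-****'::VARCHAR(16))"
)

# Attach the policy to a path within a SUPER column for a given role.
cur.execute("ATTACH MASKING POLICY mask_ssn ON customers(profile.ssn) TO ROLE analyst")
conn.commit()
```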
This integration empowers developers and data scientists alike with advanced capabilities for code completion, generation, and troubleshooting. Whether you’re tackling data transformation challenges or refining intricate machine learning models, our Copilot is designed to be your reliable partner in innovation.
Take Grammarly as an example: this popular program checks the grammar, tone, and style of documents. Getting this AI properly trained required a huge learning dataset with countless documents that were tagged according to specific criteria. Accurately prepared data is the foundation of AI. What will it take to build your MVP?
Today’s healthcare providers use a wide variety of applications and data across a broad ecosystem of partners to manage their daily workflows. Integrating these applications and data is critical to their success, allowing them to deliver patient care efficiently and effectively.
ELT tools such as IBM® DataStage® facilitate fast and secure transformations through parallel processing engines. In 2023, the average enterprise receives hundreds of disparate data streams, making efficient and accurate data transformations crucial for traditional and new AI model development.
However, you might face significant challenges when planning for a large-scale data warehouse migration. As part of the success criteria for operational service levels, you need to document the expected service levels for the new Amazon Redshift data warehouse environment. Platform architects define a well-architected platform.
Increased data variety, balancing structured, semi-structured and unstructured data, as well as data originating from a widening array of external sources. Reducing the IT bottleneck that creates barriers to data accessibility. Hybrid on-premises/cloud environments that complicate data integration and preparation.
By leveraging Hive to apply Ranger FGAC, Spark obtains secure access to the data in a protected staging area. Since Spark has direct access to the staged data, any Spark APIs can be used, from complex data transformations to data science and machine learning. So stay tuned!
Few actors in the modern data stack have inspired as much enthusiasm and fervent support as dbt. This data transformation tool enables data analysts and engineers to transform, test, and document data in the cloud data warehouse. But what does this mean from a practitioner’s perspective?
Solution overview: The solution uses AWS Glue as an ETL engine to extract data from the source Amazon RDS database. Built-in data transformations then scrub columns containing PII using pre-defined masking functions. PII detection and scrubbing.
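In spirit, the scrubbing step resembles this standalone PySpark sketch; the column names and masking rule are invented, and in the actual solution the masking functions are AWS Glue built-ins rather than hand-rolled expressions:

```python
# Standalone PySpark sketch of PII scrubbing: redact the local part of an
# email address. Column names and the masking rule are invented; the original
# solution uses AWS Glue's built-in transforms instead.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pii-scrub-sketch").getOrCreate()

df = spark.createDataFrame(
    [("Alice", "alice@example.com"), ("Bob", "bob@example.com")],
    ["name", "email"],
)

masked = df.withColumn("email", F.regexp_replace("email", r"^[^@]+", "***"))
masked.show()  # emails now appear as ***@example.com
```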
It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. A data store lets a business connect existing data with new data and discover new insights with real-time analytics and business intelligence. Track models and drive transparent processes.
Note that Lambda is a general-purpose serverless engine; it has not been specifically designed for heavy data transformation tasks. You also use AWS Glue to consolidate the files produced by the parallel tasks. Additionally, check out the official documentation of AWS Glue, Lambda, and Step Functions.
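One common pattern consistent with that note is to keep Lambda thin and hand the heavy consolidation work to a Glue job; the job name and argument key below are placeholders:

```python
# Sketch of a thin Lambda handler that delegates heavy transformation work to
# an AWS Glue job instead of doing it inside Lambda. Job name and the
# argument key are placeholders.
import boto3

glue = boto3.client("glue")

def handler(event, context):
    run = glue.start_job_run(
        JobName="consolidate-parallel-outputs",
        Arguments={"--input_prefix": event.get("prefix", "s3://bucket/staging/")},
    )
    return {"jobRunId": run["JobRunId"]}
```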
Detailed Data and Model Lineage Tracking: ensures comprehensive tracking and documentation of data transformations and model lifecycle events, enhancing reproducibility and auditability.
We will create a Glue Studio job, add events and venue data from the SFTP server, carry out data transformations, and load the transformed data to Amazon S3. For further details on the SFTP connector, see the SFTP Connector for Glue documentation. Select Visual ETL in the central pane.