But even though technologies like Building Information Modelling (BIM) have finally introduced symbolic representation, in many ways, AECO still clings to outdated, analog practices and documents. Here, one of the challenges involves digitizing the national specifics of regulatory documents and building codes in multiple languages.
The need for streamlined data transformations: As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. This saves time and effort, especially for teams looking to minimize infrastructure management and focus solely on data modeling.
This middleware consists of custom code that runs data flows to stitch data transformations, search queries, and AI enrichments in varying combinations tailored to use cases, datasets, and requirements. Ingest flows are created to enrich data as it's added to an index, and an index is constructed from the processed documents.
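That enrich-at-ingest pattern maps naturally onto a search engine's ingest pipelines. Here is a minimal sketch using the opensearch-py client against a hypothetical local cluster; the pipeline id, processor, and field names are illustrative, not taken from the original post:

```python
# Sketch: an "ingest flow" that enriches documents as they are added to an
# index, using an OpenSearch ingest pipeline. Host, pipeline id, and fields
# are hypothetical.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Define a pipeline that tags every incoming document at ingest time.
client.ingest.put_pipeline(
    id="enrich-docs",
    body={
        "description": "add a source tag at ingest time",
        "processors": [{"set": {"field": "source", "value": "crawler"}}],
    },
)

# Route a document through the pipeline on its way into the index.
client.index(
    index="docs",
    body={"title": "quarterly report"},
    pipeline="enrich-docs",
)
```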
Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments by letting them create and run dbt models in dbt Cloud.
Writing SQL queries requires not just remembering SQL syntax rules, but also knowledge of the tables' metadata: data about table schemas, relationships among the tables, and possible column values. Generative AI models can translate natural language questions into valid SQL queries, a capability known as text-to-SQL generation.
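As a rough illustration of the idea (not any particular product's implementation), a text-to-SQL call usually amounts to grounding the prompt in table metadata. In the sketch below, `call_llm` is a hypothetical stand-in for whatever text-generation API you use, and the schema is invented:

```python
# Sketch: translate a natural-language question into SQL by grounding the
# prompt in table metadata.

TABLE_METADATA = """
Table: orders(order_id INT, customer_id INT, order_date DATE, total NUMERIC)
Table: customers(customer_id INT, name TEXT, region TEXT)
orders.customer_id references customers.customer_id
"""

def call_llm(prompt: str) -> str:
    # Hypothetical: replace with a real model invocation
    # (Amazon Bedrock, OpenAI, a local model, ...).
    raise NotImplementedError("plug in your LLM client here")

def text_to_sql(question: str) -> str:
    prompt = (
        "You are a SQL assistant. Given the schema below, answer the user's "
        "question with a single valid SQL query and nothing else.\n\n"
        f"{TABLE_METADATA}\n\nQuestion: {question}\nSQL:"
    )
    return call_llm(prompt)

# Example: text_to_sql("Total order value per region in 2024?")
```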
Given that, what would you say is the job of a data scientist (or ML engineer, or any other such title)? Building Models. A common task for a data scientist is to build a predictive model. You know the drill: pull some data, carve it up into features, feed it into one of scikit-learn’s various algorithms.
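For anyone who doesn't know the drill, a minimal version of that loop with scikit-learn might look like this; the churn.csv dataset and its columns are hypothetical:

```python
# Sketch: pull data, carve out features, fit a scikit-learn model.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical dataset: numeric feature columns plus a binary "churned" label.
df = pd.read_csv("churn.csv")
X = df.drop(columns=["churned"])
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```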
Data collections are the ones and zeroes that encode the actionable insights (patterns, trends, relationships) that we seek to extract from our data through machine learning and data science. The insights are used to produce informative content for stakeholders (decision-makers, business users, and clients).
These strategies, such as investing in AI-powered cleansing tools and adopting federated governance models, not only address the current data quality challenges but also pave the way for improved decision-making, operational efficiency and customer satisfaction. When financial data is inconsistent, reporting becomes unreliable.
How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. Introduction: dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.
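As one concrete (if simplified) example of wiring dbt Core's testing into a pipeline: dbt-core 1.5+ exposes a programmatic runner, so a team can invoke tests from Python as sketched below. The model name stg_orders is hypothetical:

```python
# Sketch: running dbt tests programmatically (dbt-core >= 1.5).
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()

# Equivalent to `dbt test --select stg_orders` on the command line.
res: dbtRunnerResult = dbt.invoke(["test", "--select", "stg_orders"])

if res.success:
    print("all tests passed")
else:
    print("test failures detected:", res.exception or "see logs")
```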
Business/Data Analyst: The business analyst is all about the “meat and potatoes” of the business. These needs are then quantified into data models for acquisition and delivery. This person (or group of individuals) ensures that the theory behind data quality is communicated to the development team. 2 – Data profiling.
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse customers (such as Amazon Redshift) who are looking to keep their data transform logic separate from storage and engine.
This means we can double down on our strategy – continuing to win the Hybrid Data Cloud battle in the IT department AND building new, easy-to-use cloud solutions for the line of business. It also means we can complete our business transformation with the systems, processes, and people that support a new operating model.
It includes processes that trace and document the origin of data, models, and associated metadata and pipelines for audits. The power of curated datasets: Foundation models, often built on the transformer architecture, are modern, large-scale AI models trained on large amounts of raw, unlabeled data.
OpenSearch is an open source, distributed search engine suitable for a wide array of use cases such as ecommerce search, enterprise search (content management search, document search, knowledge management search, and so on), site search, application search, and semantic search. OpenSearch also includes capabilities to ingest and analyze data.
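A minimal sketch of both sides, indexing and querying, with the opensearch-py client; the host, index name, and document fields are all hypothetical:

```python
# Sketch: indexing and full-text querying with opensearch-py.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Index a document (the index is created on first write with default settings).
client.index(
    index="products",
    id="1",
    body={"name": "trail running shoe", "category": "footwear"},
)

# Full-text match query, the bread and butter of ecommerce/site search.
resp = client.search(
    index="products",
    body={"query": {"match": {"name": "running shoes"}}},
)
print(resp["hits"]["hits"])
```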
Given the importance of sharing information among diverse disciplines in the era of digital transformation, this concept is arguably as important as ever. The aim is to normalize and aggregate data that originates in various pockets of the enterprise, and eventually make it available to analysts across the organization.
In this post, we delve into a case study for a retail use case, exploring how the Data Build Tool (dbt) was used effectively within an AWS environment to build a high-performing, efficient, and modern data platform. It does this by helping teams handle the T in ETL (extract, transform, and load) processes.
In recent years, driven by the commoditization of data storage and processing solutions, the industry has seen a growing number of systematic investment management firms switch to alternative data sources to drive their investment decisions. The bulk of our data scientists are heavy users of Jupyter Notebook.
dbt allows data teams to produce trusted data sets for reporting, ML modeling, and operational workflows using SQL, with a simple workflow that follows software engineering best practices like modularity, portability, and continuous integration/continuous delivery (CI/CD).
Take Grammarly as an example: this popular program checks the grammar, tone, and style of documents. Getting this AI properly trained required a huge learning dataset with countless documents that were tagged according to specific criteria. Accurately prepared data is the foundation of AI. What will it take to build your MVP?
The complexities of modern data workflows often translate into countless hours spent coding, debugging, and optimizing models. Recognizing this pain point, we set out to redefine the data science experience with AI-driven innovation. This practical support speeds up project initiation and maintains consistent coding practices.
Companies still often accept the risk of using internal data when exploring large language models (LLMs) because this contextual data is what enables LLMs to change from general-purpose to domain-specific knowledge. In the generative AI or traditional AI development cycle, data ingestion serves as the entry point.
This service supports a range of optimized AI models, enabling seamless and scalable AI inference. By 2023, the focus shifted towards experimentation: enterprise developers began exploring proofs of concept (POCs) for generative AI applications, leveraging API services and open models such as Llama 2 and Mistral.
We all want to solve the interesting data challenges, build analytics, generate graph embeddings and train smart machine learning models over our knowledge graph data. This leads to lots of small data fetches to/from GraphDB over the network. Custom code also tends to over-fetch data that is not required.
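One common remedy is to replace many small per-entity lookups with a single batched query that selects only the fields the downstream job needs. Below is a sketch with SPARQLWrapper against a hypothetical GraphDB repository; the endpoint, vocabulary, and limit are invented:

```python
# Sketch: one batched SPARQL query instead of many small per-entity fetches,
# projecting only the columns the embedding job needs.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:7200/repositories/kg")  # GraphDB repo
sparql.setReturnFormat(JSON)

# One round trip for all entities, with no extra columns over-fetched.
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX ex:   <http://example.org/>
    SELECT ?item ?label ?category WHERE {
        ?item a ex:Product ;
              rdfs:label ?label ;
              ex:category ?category .
    }
    LIMIT 10000
""")

for row in sparql.queryAndConvert()["results"]["bindings"]:
    print(row["item"]["value"], row["label"]["value"], row["category"]["value"])
```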
Few actors in the modern data stack have inspired as much enthusiasm and fervent support as dbt. This data transformation tool enables data analysts and engineers to transform, test, and document data in the cloud data warehouse. Jason: How do you use these models?
However, you might face significant challenges when planning for a large-scale data warehouse migration. As part of the success criteria for operational service levels, you need to document the expected service levels for the new Amazon Redshift data warehouse environment. Platform architects define a well-architected platform.
The challenges of arbitrary code execution notwithstanding, there have been attempts to provide a stronger security model, but with mixed results. By leveraging Hive to apply Ranger FGAC, Spark obtains secure access to the data in a protected staging area. Learn more about how to use the feature in our public documentation.
Efficient data integration – AWS Glue simplifies the ETL process, providing a scalable and flexible solution for data integration between Snowflake and Amazon S3. Scalability and flexibility – The architecture supports scalable data transfers and can be extended to integrate additional data sources and destinations as needed.
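A skeletal Glue job showing the shape of such a transfer; the paths and options are hypothetical, and a real Snowflake target would go through a Glue connection rather than the S3 sink shown here:

```python
# Sketch of a minimal AWS Glue ETL job: read from S3, transform, write out.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read source data from S3 (Parquet); bucket and path are hypothetical.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/input/"]},
    format="parquet",
)

# ... transformations would go here ...

# Write to the destination; swap in a Glue connection for a Snowflake target.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/output/"},
    format="parquet",
)

job.commit()
```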
In this post, I’ll walk you through how to copy data from one Amazon Relational Database Service (Amazon RDS) for PostgreSQL database to another, while scrubbing PII along the way using AWS Glue. Built-in data transformations then scrub columns containing PII using pre-defined masking functions. PII detection and scrubbing.
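To show the scrubbing idea without the Glue bootstrap code, here is a plain PySpark sketch of column masking during a copy; the hosts, credentials, and column names are hypothetical and the masking rules are merely illustrative:

```python
# Sketch: mask PII columns while copying a table between two databases.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pii-scrub").getOrCreate()

# Hypothetical source: an RDS PostgreSQL "customers" table.
df = spark.read.jdbc(
    url="jdbc:postgresql://source-host/db",
    table="customers",
    properties={"user": "etl", "password": "..."},
)

# Redact email entirely; keep only the last 4 digits of the phone number.
scrubbed = (
    df.withColumn("email", F.lit("***@redacted"))
      .withColumn("phone", F.concat(F.lit("***-***-"), F.substring("phone", -4, 4)))
)

scrubbed.write.jdbc(
    url="jdbc:postgresql://target-host/db",
    table="customers",
    mode="overwrite",
    properties={"user": "etl", "password": "..."},
)
```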
You can modify the Lambda function to fetch additional vehicle information from a separate data store (for example, a DynamoDB table or a Customer Relationship Management system) to enrich the data, before storing the results in an S3 bucket. In this model, the Lambda function is invoked for each incoming event.
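A sketch of that enrichment pattern, assuming a hypothetical DynamoDB table, S3 bucket, and event shape:

```python
# Sketch: a Lambda handler that enriches each incoming vehicle event from
# DynamoDB, then writes the result to S3. All names are hypothetical.
import json

import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")
table = dynamodb.Table("vehicle-info")

def lambda_handler(event, context):
    vehicle_id = event["vehicle_id"]

    # Enrich the incoming event with static vehicle attributes.
    item = table.get_item(Key={"vehicle_id": vehicle_id}).get("Item", {})
    enriched = {**event, "make": item.get("make"), "model": item.get("model")}

    # Persist the enriched record; one object per invocation.
    s3.put_object(
        Bucket="enriched-events",
        Key=f"events/{vehicle_id}/{context.aws_request_id}.json",
        Body=json.dumps(enriched),
    )
    return {"status": "ok"}
```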
IBM watsonx.ai is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. With watsonx.ai, businesses can effectively train, validate, tune, and deploy AI models with confidence and at scale across their enterprise.
Organizations have spent a lot of time and money trying to harmonize data across diverse platforms, including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on.
This allows for a new way of thinking and new organizational elements—namely, a modern data community. However, today’s data mesh platform contains largely independent data products. Even with well-documented data products, knowing how to connect or join data products is a time-consuming job.
By enabling data scientists to rapidly iterate through model development, validation, and deployment, DataRobot provides the tools to blitz through steps four and five of the machine learning lifecycle with AutoML and Auto Time-Series capabilities. Train, Compare, Rank, Validate, and Select Models for Production.
Ronobijay: Sure, I think it would, you know, what used to be anathema till a few months back, you know, data transformation is real now, right? We would have to visit a branch possibly, you know, multiple locations, submit multiple documents. So earlier customers would spend a week or two, trying to open a bank account.
A metadata management framework combines organizational structure and a set of tools to create a data asset taxonomy. Document type: describes creation, storage, and use during business processes. Collaborate more effectively: Break down data silos for better understanding of data assets across all business units.
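To make the taxonomy concrete, here is one possible (purely illustrative) way to represent an asset entry in code; the fields and example values are invented:

```python
# Sketch: a data asset taxonomy entry as a plain data structure.
from dataclasses import dataclass, field

@dataclass
class DataAsset:
    name: str
    document_type: str          # how the document is created, stored, and used
    owner: str                  # business unit accountable for the asset
    source_system: str
    tags: list[str] = field(default_factory=list)

catalog = [
    DataAsset(name="customer_invoices", document_type="invoice",
              owner="finance", source_system="SAP", tags=["pii", "quarterly"]),
]
```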
In our last blog, we delved into the seven most prevalent data challenges that can be addressed with effective data governance. Today we will share our approach to developing a data governance program to drive data transformation and fuel a data-driven culture.
They invested heavily in data infrastructure and hired a talented team of data scientists and analysts. The goal was to develop sophisticated data products, such as predictive analytics models to forecast patient needs, patient care optimization tools, and operational efficiency dashboards.
Data analysts and engineers use dbt to transform, test, and document data in the cloud data warehouse. Yet every dbt transformation contains vital metadata that is not captured – until now. Data Transformation in the Modern Data Stack. Lineage between dbt sources, models, and metrics.
Redshift Serverless automatically provisions and intelligently scales data warehouse capacity to deliver fast performance for even the most demanding and unpredictable workloads, and you pay only for what you use. Solution overview The integration of Talend with Amazon Redshift adds new features and capabilities.
As we review data transformation and modernization strategies with our clients, we find many are investigating Snowflake as a data warehouse solution due to its ease of use, speed, and increased flexibility over a traditional data warehouse offering. Mapping a successful data migration can bring on rough weather.
We translate their documents, presentations, tables, etc. Milena Yankova: We help the BBC and the Financial Times to model the knowledge available in various documents so they can manage it.
Feature engineering is useful for data scientists when assessing tradeoff decisions regarding the impact of their ML models. It provides a framework for approaching ML as well as techniques for extracting features from raw data that can be used within the models. Feature Engineering Terminology and Motivation.
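A small illustration of the kind of feature extraction the excerpt alludes to, on an invented transactions table:

```python
# Sketch: two classic feature-engineering moves on a hypothetical dataset:
# datetime decomposition and a ratio feature.
import pandas as pd

df = pd.DataFrame({
    "amount": [120.0, 40.0, 310.0],
    "balance": [1000.0, 200.0, 5000.0],
    "timestamp": pd.to_datetime(["2024-01-05", "2024-02-14", "2024-03-01"]),
})

# Raw timestamps rarely help a model; their components often do.
df["day_of_week"] = df["timestamp"].dt.dayofweek
df["month"] = df["timestamp"].dt.month

# Ratios encode a tradeoff (spend relative to balance) as a single feature.
df["amount_to_balance"] = df["amount"] / df["balance"]
print(df)
```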
It’s for that reason that even as the first BCBS-239 implementation deadline came into effect a few years ago, McKinsey reported that one-third of Global Systemically Important Banks had focused on “documenting data lineage up to the level of provisioning data elements and including data transformation.”
As data science grows in popularity and importance, organizations that use it need to pay more attention to picking the right tools. An example of a data science tool is Dataiku. Business Intelligence Tools: Business intelligence (BI) tools are used to visualize your data.