As an essential part of ETL, as data is consolidated we notice that data from different sources is structured in different formats. It may be necessary to enhance, sanitize, and prepare the data so that it is fit for consumption by the SQL engine. What is a data transformation?
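The kind of sanitization step described above can be sketched in a few lines. This is a minimal illustration using only the standard library; the field names and the set of date formats are hypothetical, not any particular ETL tool's API:

```python
from datetime import datetime

def normalize_record(raw: dict) -> dict:
    """Sanitize one source record into the schema the SQL engine expects.

    Handles two common inconsistencies between sources: stray whitespace
    in text fields and dates arriving in different formats.
    """
    # Try each known date format until one parses.
    date_formats = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")
    raw_date = str(raw.get("order_date", "")).strip()
    parsed = None
    for fmt in date_formats:
        try:
            parsed = datetime.strptime(raw_date, fmt)
            break
        except ValueError:
            continue
    return {
        "customer": str(raw.get("customer", "")).strip().title(),
        "order_date": parsed.date().isoformat() if parsed else None,
        "amount": round(float(raw.get("amount", 0) or 0), 2),
    }

# Records from two sources, structured differently:
rows = [
    {"customer": "  alice smith ", "order_date": "2023-04-01", "amount": "19.5"},
    {"customer": "BOB JONES", "order_date": "01/04/2023", "amount": 7},
]
clean = [normalize_record(r) for r in rows]
```

After this pass, both records share one schema and one date format, so a single SQL load statement can consume them.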
The data scientists and IT professionals were starting to get frustrated, when suddenly, a magical fairy appeared out of nowhere. The fairy was carrying a DataOps wand, and she waved it over the messy data, transforming it into a clean and organized dataset.
You can now use your tool of choice, including Tableau, to quickly derive business insights from your data while using standardized definitions and decentralized ownership. Refer to the detailed blog post on how you can use this to connect through various other tools. Get started with our technical documentation.
Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments.
It's Essential: Verifying Data Transformations (Part 4). Uncovering the leading problems in data transformation workflows and practical ways to detect and prevent them. In Parts 1–3 of this series of blogs, categories of data transformations were identified as among the top causes of data quality defects in data pipeline workflows.
Common challenges and practical mitigation strategies for reliable data transformations. Introduction: Data transformations are important processes in data engineering, enabling organizations to structure, enrich, and integrate data for analytics, reporting, and operational decision-making.
Managing tests of complex data transformations when automated data testing tools lack important features? Introduction: Data transformations are at the core of modern business intelligence, blending and converting disparate datasets into coherent, reliable outputs.
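When tooling lacks the features you need, a lightweight fallback is to test a transformation against a hand-built fixture with a known expected result. This sketch assumes a hypothetical blend-and-convert function; the names and currency rates are illustrative only:

```python
def blend_sales(us: list[dict], eu: list[dict], eur_to_usd: float) -> list[dict]:
    """Convert EU rows to USD and merge them with US rows into one dataset."""
    merged = [{"region": "US", "usd": r["usd"]} for r in us]
    merged += [{"region": "EU", "usd": round(r["eur"] * eur_to_usd, 2)} for r in eu]
    return merged

# A tiny hand-built fixture acts as the test oracle.
us_rows = [{"usd": 100.0}]
eu_rows = [{"eur": 50.0}]
result = blend_sales(us_rows, eu_rows, eur_to_usd=1.10)

# Invariants: row count is preserved and conversions reconcile.
assert len(result) == len(us_rows) + len(eu_rows)
assert result[1]["usd"] == 55.0
```

The point is the pattern, not the function: plain assertions over a fixture give you regression coverage even where a testing tool offers none.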
When we announced the GA of Cloudera Data Engineering back in September of last year, a key vision we had was to simplify the automation of data transformation pipelines at scale. Typically, users need to ingest data, transform it into an optimal format with quality checks, and optimize querying of the data by visual analytics tools.
Building a Data Culture Within a Finance Department. Our finance users tell us that their first exposure to the Alation Data Catalog often comes soon after the launch of organization-wide data transformation efforts. After all, finance is one of the greatest consumers of data within a business.
This means there are no unintended data errors, and it corresponds to its appropriate designation (e.g., date, month, and year). Here, it all comes down to the data transformation error rate. Data time-to-value: evaluates how long it takes you to gain insights from a data set.
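Both metrics just described reduce to simple arithmetic. A minimal sketch, assuming row counts and timestamps are already available (the inputs here are made up for illustration):

```python
from datetime import datetime

def transformation_error_rate(total_rows: int, failed_rows: int) -> float:
    """Share of rows that failed transformation (0.0 means no defects)."""
    return failed_rows / total_rows if total_rows else 0.0

def time_to_value_hours(ingested_at: str, insight_at: str) -> float:
    """Hours from data landing to the first usable insight."""
    fmt = "%Y-%m-%d %H:%M"
    delta = datetime.strptime(insight_at, fmt) - datetime.strptime(ingested_at, fmt)
    return delta.total_seconds() / 3600

rate = transformation_error_rate(total_rows=10_000, failed_rows=25)
ttv = time_to_value_hours("2024-05-01 08:00", "2024-05-01 14:30")
```

Tracked over time, a rising error rate or a growing time-to-value is an early signal that a pipeline needs attention.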
Prerequisites: before you get started, make sure you have an AWS account, an IAM user with administrator access, and an S3 bucket. Solution architecture: to automate the complete process, we use the following architecture, which integrates Step Functions for orchestration and Amazon EMR Serverless for data transformations.
Use our 14-day free trial today and transform your supply chain! Welcome to the future of logistics. We’re on the cusp of big data transforming the nature of logistics. Big data in logistics can improve financial efficiency, provide transparency to the supply chain, and enable proactive strategic decision-making.
DataOps establishes a process hub that automates data production and analytics development workflows so that the data team is more efficient, more innovative, and less prone to error. In this blog, we’ll explore the role of the DataOps Engineer in driving the data organization to higher levels of productivity.
Once the connection is established with the success message, you can view your project’s subscribed data directly within Tableau and build dashboards. See the Amazon DataZone and Tableau blog post for step-by-step instructions. Joel has led data transformation projects on fraud analytics, claims automation, and Master Data Management.
This allows data analysts and data scientists to rapidly construct the necessary data preparation steps to meet their business needs. We use the new data preparation authoring capabilities to create recipes that meet our specific business needs for data transformations. For Data format, select Parquet.
With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of.
Open the secret blog-glue-snowflake-credentials. For AWS Secret, choose the secret blog-glue-snowflake-credentials. For IAM Role, choose a role that has access to the target S3 location the job writes to and the source location it loads the Snowflake data from, and that is also permitted to run the AWS Glue job.
Migration from Solr to OpenSearch is becoming a common pattern. This blog post dives into the strategic considerations and steps involved in migrating from Solr to OpenSearch. For the updateRequestProcessorChain, OpenSearch provides the ingest pipeline, allowing the enrichment or transformation of data before indexing.
Airflow has been adopted by many Cloudera Data Platform (CDP) customers in the public cloud as the next-generation orchestration service to set up and operationalize complex data pipelines. The post Introducing Self-Service, No-Code Airflow Authoring UI in Cloudera Data Engineering appeared first on Cloudera Blog.
Here are a few examples that we have seen of how this can be done: Batch ETL with Azure Data Factory and Azure Databricks: In this pattern, Azure Data Factory is used to orchestrate and schedule batch ETL processes. Azure Blob Storage serves as the data lake to store raw data.
This blog post is co-written with James Sun from Snowflake. Customers rely on data from different sources such as mobile applications, clickstream events from websites, historical data, and more to deduce meaningful patterns to optimize their products, services, and processes. Choose Airflow version 2.6.3. Choose Next.
There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. This is something that you can learn more about in just about any technology blog.
We’re excited to announce the general availability of the open source adapters for dbt for all the engines in CDP — Apache Hive, Apache Impala, and Apache Spark, with added support for Apache Livy and Cloudera Data Engineering. The Open Data Lakehouse. Cloudera builds dbt adapters for all engines in the open data lakehouse.
Through this series of blog posts, we’ll discuss how to best scale and branch out an analytics solution using a knowledge graph technology stack. For the use case that this blog will explore, we have picked a perfect blend of the exciting and the fairly boring: building compliance.
GSK’s DataOps journey paralleled their data transformation journey. GSK has been in the process of investing in and building out its data and analytics capabilities and shifting the R&D organization to a software engineering mindset.
Your data strategy, evolving amid lessons and advances, now helps underpin that command, accelerating your agency’s trajectory right through the inevitable speed bumps. Learn how we can help you power public sector data transformation.
In the second blog of the Universal Data Distribution blog series, we explored how Cloudera DataFlow for the Public Cloud (CDF-PC) can help you implement use cases like data lakehouse and data warehouse ingest, cybersecurity, and log optimization, as well as IoT and streaming data collection.
The DataOps Engineering skillset includes hybrid and cloud platforms, orchestration, data architecture, data integration, data transformation, CI/CD, real-time messaging, and containers. The capabilities unlocked by DataOps impact everyone who uses data analytics — all the way to the top levels of the organization.
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transformations in Python and SQL. dbt is predominantly used by data warehouse customers (such as Amazon Redshift users) who are looking to keep their data transformation logic separate from storage and engine.
If you have ever built your own custom GraphQL API layer, the code typically resolves each part of a GraphQL query as it traverses downwards as a separate isolated data fetching step. This leads to lots of small data fetches to/from GraphDB over the network. Custom code also tends to over-fetch data that is not required.
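The round-trip cost described above is easy to make concrete. This toy sketch (not a real GraphQL library — `CountingDB` is a stand-in that only counts network calls) contrasts per-field fetching with collecting keys and resolving them in one batch:

```python
class CountingDB:
    """Stand-in for a graph database that counts network round trips."""
    def __init__(self, people: dict):
        self.people = people
        self.calls = 0

    def get_one(self, pid):
        self.calls += 1          # one round trip per key
        return self.people[pid]

    def get_many(self, pids):
        self.calls += 1          # one round trip regardless of batch size
        return [self.people[p] for p in pids]

db = CountingDB({1: "Ada", 2: "Alan", 3: "Grace"})

# Naive resolver style: one isolated fetch per field as the query is traversed.
naive = [db.get_one(pid) for pid in (1, 2, 3)]
naive_calls = db.calls

db.calls = 0
# Batched style: collect the keys first, then resolve them in one request.
batched = db.get_many([1, 2, 3])
batched_calls = db.calls
```

The same idea underlies batching loaders in real GraphQL servers: defer each key lookup, then flush all pending keys as a single request.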
By leveraging Hive to apply Ranger FGAC, Spark obtains secure access to the data in a protected staging area. Since Spark has direct access to the staged data, any Spark APIs can be used, from complex data transformations to data science and machine learning. So stay tuned!
Data holds incredible untapped potential for Australian organisations across industries, regardless of individual business goals. All organisations are at different points in their data transformation journey, with some achieving success faster than others.
This will take at least two blog posts to cover, but I will summarize them here: Compare data load & transformations with CSV files vs a Fabric lakehouse using the SQL Server connector: Loading 20 million fact rows from CSV files vs a Fabric lakehouse, using Power Query. Same comparison with deployed Power BI model.
In our last three blogs, we covered how Dataiku’s visual flow can help enhance collaboration and visibility, differences in how you work with datasets, and one of the key tools to accelerate data transformations: recipes. Welcome back to part four of the Alteryx to Dataiku series!
Under Actions, choose Open Jupyter. Navigate to the Jupyter console, select New, and then choose Console. Then run:
cd /home/ec2-user/SageMaker
BASE_S3_PATH="s3://aws-blogs-artifacts-public/artifacts/BDB-4265"
aws s3 cp "${BASE_S3_PATH}/0_create_tables_with_metadata.ipynb" ./
aws s3 cp "${BASE_S3_PATH}/1_text_to_sql_for_athena.ipynb" ./
Predict – Data Engineering (Apache Spark). CDP Data Engineering (1) – a service purpose-built for data engineers focused on deploying and orchestrating data transformation using Spark at scale. 3) Data Visualization is in Tech Preview on AWS and Azure. New Services.
Continuing from my previous blog post about how awesome and easy it is to develop web-based applications backed by Cloudera Operational Database (COD), I started a small project to integrate COD with another CDP cloud experience, Cloudera Machine Learning (CML). b) Basic data transformation. Go to runner.py and run it.
This is a guest blog post co-written with SangSu Park and JaeHong Ahn from SOCAR. As companies continue to expand their digital footprint, the importance of real-time data processing and analysis cannot be overstated. The following diagram shows an example of data transformations in the handler component.
This enabled new use cases with customers that were using a mix of Spark and Hive to perform data transformations. Along with delivering the world’s first true hybrid data cloud, stay tuned for product announcements that will drive even more business value with innovative data ops and engineering capabilities.
This integration empowers developers and data scientists alike with advanced capabilities for code completion, generation, and troubleshooting. Whether you’re tackling datatransformation challenges or refining intricate machine learning models, our Copilot is designed to be your reliable partner in innovation.
We live in a world of data: There’s more of it than ever before, in a ceaselessly expanding array of forms and locations. Dealing with Data is your window into the ways data teams are tackling the challenges of this new world to help their companies and their customers thrive.
ELT tools such as IBM® DataStage® facilitate fast and secure transformations through parallel processing engines. In 2023, the average enterprise receives hundreds of disparate data streams, making efficient and accurate data transformations crucial for traditional and new AI model development.
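Parallel transformation engines like the one described can be approximated in miniature with a worker pool. This sketch uses Python's standard-library executor rather than any DataStage API, and the per-row transformation is a placeholder:

```python
from concurrent.futures import ThreadPoolExecutor

def transform(row: dict) -> dict:
    """Per-row transformation (placeholder for real business logic)."""
    return {"id": row["id"], "value": row["value"] * 2}

rows = [{"id": i, "value": i} for i in range(100)]

# Fan the rows out across workers; map preserves input order,
# so results line up one-to-one with the input rows.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(transform, rows))
```

For CPU-bound transformations, a ProcessPoolExecutor (or a distributed engine) would be the analogous choice; the partition-and-merge shape stays the same.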
Solutions to Rein in the Chaos. Implementing Data Observability Platforms: Tools like DataKitchen’s DataOps Observability provide an overarching view of the entire Data Journey. They enable continuous monitoring of data transformations and integrations, offering invaluable insights into data lineage and changes.
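At its simplest, the continuous monitoring such platforms automate is a set of checks run after each transformation step. A minimal sketch (step names and the drift threshold are hypothetical) that flags unexpected row-count drift:

```python
def check_transformation(step: str, rows_in: int, rows_out: int,
                         tolerance: float = 0.01) -> dict:
    """Flag a transformation step whose output row count drifts
    from its input by more than the allowed tolerance."""
    drift = abs(rows_out - rows_in) / rows_in if rows_in else 0.0
    status = "ok" if drift <= tolerance else "alert"
    return {"step": step, "drift": round(drift, 4), "status": status}

# Checks accumulated along one hypothetical data journey:
journey = [
    check_transformation("ingest->stage", rows_in=1000, rows_out=1000),
    check_transformation("stage->mart", rows_in=1000, rows_out=940),
]
```

Real observability tools add lineage, history, and alert routing on top, but the core signal is this kind of per-step invariant.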
Today’s healthcare providers use a wide variety of applications and data across a broad ecosystem of partners to manage their daily workflows. Integrating these applications and data is critical to their success, allowing them to deliver patient care efficiently and effectively.