The need for streamlined data transformations: As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. Managed transformation tools save time and effort, especially for teams looking to minimize infrastructure management and focus solely on data modeling.
Let’s start by considering the job of a non-ML software engineer: traditional software deals with well-defined, narrowly scoped inputs, which the engineer can exhaustively and cleanly model in code. Machine learning is different: not only is the data larger, but the models, deep learning models in particular, are much larger than before.
Expense optimization and clearly defined workload selection criteria will determine which workloads go to the public cloud and which to the private cloud, he says. Secure storage, together with data transformation, monitoring, auditing, and a compliance layer, increases the complexity of the system.
New advancements in GenAI technology are set to create more transformative opportunities for tech-savvy enterprises and organisations. These developments come as data shows that while the GenAI boom is real and optimism is high, not every organisation is generating tangible value so far.
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. In addition to real-time analytics and visualization, the data needs to be shared for long-term data analytics and machine learning applications.
Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments; you create dbt models in dbt Cloud and run them as scheduled jobs (see the sketch below).
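As a rough illustration of automating a dbt Cloud deployment, here is a minimal sketch that triggers a dbt Cloud job run over its v2 REST API. The account ID, job ID, and token are placeholders, and the endpoint path and payload should be verified against the current dbt Cloud API docs.

```python
# Minimal sketch: trigger a dbt Cloud job run via the v2 REST API.
# ACCOUNT_ID, JOB_ID, and the API token are placeholders you must supply.
import requests

ACCOUNT_ID = 12345                  # your dbt Cloud account ID (placeholder)
JOB_ID = 67890                      # job that runs your dbt models (placeholder)
API_TOKEN = "dbt-cloud-api-token"   # placeholder service token

resp = requests.post(
    f"https://cloud.getdbt.com/api/v2/accounts/{ACCOUNT_ID}/jobs/{JOB_ID}/run/",
    headers={"Authorization": f"Token {API_TOKEN}"},
    json={"cause": "Triggered from orchestration script"},
)
resp.raise_for_status()
print(resp.json()["data"]["id"])  # run ID, useful for polling job status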
Writing SQL queries requires not just remembering the SQL syntax rules, but also knowledge of table metadata: data about table schemas, relationships among the tables, and possible column values. Generative AI models can translate natural language questions into valid SQL queries, a capability known as text-to-SQL generation; a sketch follows.
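A minimal sketch of the text-to-SQL pattern: ground the model in table metadata, then ask it to produce a query. `llm_complete` is a hypothetical stand-in for whatever LLM client you use (e.g., Amazon Bedrock or OpenAI); it is not a real API.

```python
# Illustrative text-to-SQL sketch. `llm_complete` is hypothetical and
# stands in for your actual LLM client call.
def build_text_to_sql_prompt(question: str, schema_ddl: str) -> str:
    # Grounding the model in table metadata (schemas, relationships,
    # possible column values) is what makes the generated SQL valid.
    return (
        "Given the following table definitions:\n"
        f"{schema_ddl}\n"
        "Write a single valid SQL query answering this question:\n"
        f"{question}\nSQL:"
    )

schema_ddl = "CREATE TABLE orders (id INT, customer_id INT, total DECIMAL);"
prompt = build_text_to_sql_prompt("What is total revenue per customer?", schema_ddl)
# sql = llm_complete(prompt)  # hypothetical call to your model of choice
```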
Table of Contents: 1) Benefits of Big Data in Logistics; 2) 10 Big Data in Logistics Use Cases. Big data is revolutionizing many fields of business, and logistics analytics is no exception. The complex and ever-evolving nature of logistics makes it an essential use case for big data applications.
Amazon Redshift has launched a session reuse capability for the Data API that can significantly streamline multi-step, stateful workloads such as extract, transform, and load (ETL) pipelines, reporting processes, and other flows that involve sequential queries; a usage sketch follows.
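A rough sketch of what session reuse looks like through boto3. The workgroup and database names are placeholders, and parameter details such as `SessionKeepAliveSeconds` and `SessionId` should be verified against the current Redshift Data API reference.

```python
# Sketch of Data API session reuse (boto3). "etl-workgroup" and "dev"
# are placeholders; verify parameter names against the current docs.
import boto3

client = boto3.client("redshift-data")

# The first statement opens a session and keeps it alive for later steps.
first = client.execute_statement(
    WorkgroupName="etl-workgroup",
    Database="dev",
    Sql="CREATE TEMP TABLE stage_orders AS SELECT * FROM raw.orders;",
    SessionKeepAliveSeconds=300,
)

# Subsequent statements reuse the same session, so session state
# (like the temp table above) is still visible.
client.execute_statement(
    SessionId=first["SessionId"],
    Sql="INSERT INTO analytics.orders SELECT * FROM stage_orders;",
)
```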
AI is transforming how senior data engineers and data scientists validate data transformations and conversions. Artificial intelligence-based verification approaches aid in the detection of anomalies, the enforcement of data integrity, and the optimization of pipelines for improved efficiency.
You can’t talk about data analytics without talking about data modeling. The reasons for this are simple: before you can start analyzing data, huge datasets like data lakes must be modeled or transformed to be usable. Building the right data model is an important part of your data strategy.
When we announced the GA of Cloudera Data Engineering back in September of last year, a key vision we had was to simplify the automation of data transformation pipelines at scale. Typically, users need to ingest data, transform it into an optimal format with quality checks, and optimize querying of the data by visual analytics tools.
Let’s go through the ten Azure data pipeline tools. Azure Data Factory: this cloud-based data integration service allows you to create data-driven workflows for orchestrating and automating data movement and transformation. You can use it for big data analytics and machine learning workloads.
Business analytics is the practical application of statistical analysis and technologies on business data to identify and anticipate trends and predict business outcomes. Data analytics is used across disciplines to find trends and solve problems using data mining, data cleansing, data transformation, data modeling, and more.
There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. How is data virtualization performance optimized?
DataOps involves close collaboration between data scientists, IT professionals, and business stakeholders, and it often involves the use of automation and other technologies to streamline data-related tasks. One of the key benefits of DataOps is the ability to accelerate the development and deployment of data-driven solutions.
BMW Group uses 4,500 AWS Cloud accounts across the entire organization but is faced with the challenge of reducing unnecessary costs, optimizing spend, and having a central place to monitor costs. The ultimate goal is to raise awareness of cloud efficiency and optimize cloud utilization in a cost-effective and sustainable manner.
The exam covers everything from fundamental to advanced data science concepts such as big data best practices, business strategies for data, building cross-organizational support, machine learning, natural language processing, stochastic modeling, and more.
Accurately predicting demand for products allows businesses to optimize inventory levels, minimize stockouts, and reduce holding costs. Solution overview: In today’s highly competitive business landscape, it’s essential for retailers to optimize their inventory management processes to maximize profitability and improve customer satisfaction.
The main driving factors include lower total cost of ownership, scalability, stability, improved ingestion connectors (such as Data Prepper , Fluent Bit, and OpenSearch Ingestion), elimination of external cluster managers like Zookeeper, enhanced reporting, and rich visualizations with OpenSearch Dashboards.
Business/Data Analyst: The business analyst is all about the “meat and potatoes” of the business. Business needs are then quantified into data models for acquisition and delivery. This person (or group of individuals) ensures that the theory behind data quality is communicated to the development team.
It includes processes that trace and document the origin of data, models, and associated metadata, and pipelines for audits. Foundation models: the power of curated datasets. Foundation models, typically built on the transformer architecture, are modern, large-scale AI models trained on large amounts of raw, unlabeled data.
However, you might face significant challenges when planning for a large-scale data warehouse migration. This includes the ETL processes that capture source data, the functional refinement and creation of data products, the aggregation for business metrics, and the consumption from analytics, business intelligence (BI), and ML.
However, this partnership model cannot keep pace with an always-changing technology landscape in which skill gaps and resource shortages are increasing. The new models recognise this, prompting tech vendors to shift toward innovation-focused roles and become partners in the client’s success.
This service supports a range of optimized AI models, enabling seamless and scalable AI inference. Enterprise developers began exploring proofs of concept (POCs) for generative AI applications, leveraging API services and open models such as Llama 2 and Mistral. By 2023, the focus had shifted towards experimentation.
Data Warehouse: in addition to a number of performance optimizations, DW has added new features for better scalability, monitoring, and reliability to enable self-service access with security and performance. Predict – Data Engineering (Apache Spark).
For example, GPS, social media, and cell phone handoffs are modeled as graphs, while data catalogs, data lineage, and MDM tools leverage knowledge graphs for linking metadata with semantics. Knowledge graphs model knowledge of a domain as a graph with a network of entities and relationships, as the small sketch below illustrates.
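A tiny illustration of entities and relationships as a graph, using rdflib. The namespace and triples are made up for the example; the point is that nodes are entities and labeled, directed edges are relationships.

```python
# Minimal knowledge-graph sketch with rdflib; entities and relations
# are invented for illustration.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()

# Entities are nodes; relationships are labeled, directed edges.
g.add((EX.DataCatalog, EX.describes, EX.CustomerTable))
g.add((EX.CustomerTable, EX.derivedFrom, EX.CRMExport))  # lineage edge
g.add((EX.CRMExport, EX.ownedBy, EX.SalesTeam))

# Traversing edges answers questions like "where did this table come from?"
for subj, obj in g.subject_objects(EX.derivedFrom):
    print(f"{subj} derived from {obj}")
```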
“All they would have to do is just build their model and run with it,” he says. But to augment its various businesses with ML and AI, Iyengar’s team first had to break down data silos within the organization and transform the company’s data operations. For now, it operates under a centralized “hub and spokes” model.
As with all AWS services, Amazon Redshift is a customer-obsessed service that recognizes there isn’t a one-size-fits-all answer when it comes to data models, which is why Amazon Redshift supports multiple data models such as star schemas, snowflake schemas, and Data Vault 2.0.
If you can’t make sense of your business data, you’re effectively flying blind. Insights hidden in your data are essential for optimizing business operations, fine-tuning your customer experience, and developing new products, or new lines of business like predictive maintenance.
The difference is in using advanced modeling and data management to make faster scenario planning possible, driven by actionable key performance measures that enable faster, well-informed decision cycles. In tech speak, this means the semantic layer is optimized for the intended audience.
Auto-copy enhances the COPY command with jobs that ingest data automatically as new files arrive (see the sketch below). If storing operational data in a data warehouse is a requirement, synchronization of tables between operational data stores and Amazon Redshift tables is supported.
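For a sense of what this looks like, here is a sketch that creates a copy job so that new files landing under an S3 prefix are ingested automatically. The table, bucket, role, and workgroup names are placeholders, and the exact JOB clause syntax should be checked against the Redshift COPY documentation.

```python
# Sketch: create an auto-copy job via the Data API. All names are
# placeholders; verify the JOB clause syntax against the COPY docs.
import boto3

sql = """
COPY sales.orders
FROM 's3://my-ingest-bucket/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT CSV
JOB CREATE orders_auto_copy
AUTO ON;
"""

boto3.client("redshift-data").execute_statement(
    WorkgroupName="etl-workgroup",  # placeholder
    Database="dev",
    Sql=sql,
)
```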
Let’s look at a few ways that different industries take advantage of streaming data. Automotive: monitoring connected, autonomous cars in real time to optimize routes, avoid traffic, and diagnose mechanical issues.
We all want to solve the interesting data challenges, build analytics, generate graph embeddings, and train smart machine learning models over our knowledge graph data. They allow Ontotext to perform optimizations that are not easy or possible using an ORM embedded within a custom-built API.
Cloudera users can securely connect Rill to a source of event stream data, such as Cloudera DataFlow, model data into Rill’s cloud-based Druid service, and share live operational dashboards within minutes via Rill’s interactive metrics dashboard or any connected BI solution.
In this post, we explore how AWS Glue can serve as the data integration service to bring the data from Snowflake for your data integration strategy, enabling you to harness the power of your data ecosystem and drive meaningful outcomes across various use cases. Store the extracted and transformed data in Amazon S3; a job sketch follows.
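A rough sketch of a Glue for Spark job that reads a Snowflake table and lands it in S3 as Parquet. The connection name, database, schema, table, and path are placeholders, and the connection option names follow the Glue Snowflake connector docs but should be verified for your Glue version.

```python
# Sketch of a Glue for Spark job: read from Snowflake, write to S3.
# Connection name, database/schema/table, and paths are placeholders.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_ctx = GlueContext(SparkContext.getOrCreate())

orders = glue_ctx.create_dynamic_frame.from_options(
    connection_type="snowflake",
    connection_options={
        "connectionName": "snowflake-conn",  # Glue connection (placeholder)
        "sfDatabase": "SALES_DB",
        "sfSchema": "PUBLIC",
        "dbtable": "ORDERS",
    },
)

glue_ctx.write_dynamic_frame.from_options(
    frame=orders,
    connection_type="s3",
    connection_options={"path": "s3://my-datalake/curated/orders/"},
    format="parquet",
)
```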
Since the release of Cloudera Data Engineering (CDE) more than a year ago, our number one goal was operationalizing Spark pipelines at scale with first-class tooling designed to streamline automation and observability. This enabled new use cases with customers that were using a mix of Spark and Hive to perform data transformations.
At Vanguard, “data and analytics enable us to fulfill on our mission to provide investors with the best chance for investment success by enabling us to glean actionable insights to drive personalized client experiences, scale advice, optimize investment and business operations, and reduce risk,” Swann says.
This data is then used by various applications for streaming analytics, business intelligence, and reporting. Amazon SageMaker is used to build, train, and deploy a range of ML models. This ensures that the data is suitable for training purposes. Additionally, SageMaker training jobs are employed for training the models.
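For context, here is a minimal sketch of launching a SageMaker training job with the SageMaker Python SDK. The container image URI, role ARN, and S3 paths are placeholders you would replace with your own.

```python
# Sketch: launch a managed SageMaker training job. Image URI, role,
# and S3 paths are placeholders.
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training:latest",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/models/",
)

# fit() starts the training job against the prepared dataset channel.
estimator.fit({"train": "s3://my-bucket/training-data/"})
```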
The complexities of modern data workflows often translate into countless hours spent coding, debugging, and optimizing models. Recognizing this pain point, we set out to redefine the data science experience with AI-driven innovation.
A vast majority of clicks coming from search engines continue to be organic clicks (which is why I love and adore search engine optimization). Google Website Optimizer: here's a free guide, 26 pages, to using the Website Optimizer optimally. PDF Download: The Techie Guide to Google Website Optimizer.
Pattern 1: Data transformation, load, and unload. Several of our data pipelines included significant data transformation steps, which were primarily performed through SQL statements executed by Amazon Redshift. Diagram 2 shows this workflow; a sketch of the pattern follows.
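A minimal sketch of the transform-then-unload step: run a SQL transformation inside Redshift, then UNLOAD the result to S3. The statement text, bucket, role, and workgroup are illustrative placeholders, and a real pipeline would poll each statement for completion before moving on.

```python
# Sketch of transform-then-unload via the Data API. All names are
# placeholders; a real pipeline would poll describe_statement between steps.
import boto3

client = boto3.client("redshift-data")
for sql in [
    # Transformation performed inside the warehouse.
    "CREATE TABLE analytics.daily_sales AS "
    "SELECT sale_date, SUM(amount) AS total FROM raw.sales GROUP BY sale_date;",
    # Unload the derived table to the data lake as Parquet.
    "UNLOAD ('SELECT * FROM analytics.daily_sales') "
    "TO 's3://my-datalake/exports/daily_sales_' "
    "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole' PARQUET;",
]:
    client.execute_statement(
        WorkgroupName="etl-workgroup", Database="dev", Sql=sql  # placeholders
    )
```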
IBM watsonx.ai is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. With watsonx.ai, businesses can effectively train, validate, tune, and deploy AI models with confidence and at scale across their enterprise.
In some cases, they work to deploy data science models into production with an eye towards optimization, scalability, and maintainability. Data architects and data modelers who specialize in areas such as schema design, identifying query access patterns, and building and maintaining data warehouses.