Data Transformation, Data-driven and Events

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

AWS Big Data

OCTOBER 23, 2024

The Airflow REST API facilitates a wide range of use cases, from centralizing and automating administrative tasks to building event-driven, data-aware data pipelines. This supports the growing emphasis on event-driven data pipelines. When we announced support for version 2.9.2

Interactive

Interactive Testing Data-driven Data Lake

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

1) What Is Data Quality Management? 4) Data Quality Best Practices. 5) How Do You Measure Data Quality? 6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data. 10) Data Quality Solutions: Key Attributes.

Data Quality

Data Quality Metrics Data-driven Management

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

AWS Big Data

NOVEMBER 22, 2024

At AWS, we are committed to empowering organizations with tools that streamline data analytics and transformation processes. This integration enables data teams to efficiently transform and manage data using Athena with dbt Cloud’s robust features, enhancing the overall data workflow experience.

Data Lake

Data Lake Data Warehouse Cost-Benefit Data Transformation

SAP Datasphere Powers Business at the Speed of Data

Rocket-Powered Data Science

MARCH 20, 2023

We live in a data-rich, insights-rich, and content-rich world. Data collections are the ones and zeroes that encode the actionable insights (patterns, trends, relationships) that we seek to extract from our data through machine learning and data science. Plus, AI can also help find key insights encoded in data.

Data Warehouse

Data Warehouse Metadata Digital Transformation Machine Learning

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

Data is the foundation of innovation, agility and competitive advantage in today’s digital economy. As technology and business leaders, your strategic initiatives, from AI-powered decision-making to predictive insights and personalized experiences, are all fueled by data. Data quality is no longer a back-office concern.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. Together, these capabilities enable terminal operators to enhance efficiency and competitiveness in an industry that is increasingly data driven.

IoT

IoT Machine Learning Metadata Data-driven

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

AWS Big Data

NOVEMBER 27, 2024

While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis. or a later version) database.

Data Warehouse

Data Warehouse Analytics Testing Modeling

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

In today’s rapidly evolving financial landscape, data is the bedrock of innovation, enhancing customer and employee experiences and securing a competitive edge. Like many large financial institutions, ANZ Institutional Division operated with siloed data practices and centralized data management teams.

Metadata

Metadata Data Governance Data Quality Data-driven

Accelerate your data workflows with Amazon Redshift Data API persistent sessions

AWS Big Data

NOVEMBER 22, 2024

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that you can use to analyze your data at scale. Redshift Data API provides a secure HTTP endpoint and integration with AWS SDKs. Calls to the Data API are asynchronous.

Data Warehouse

Data Warehouse Recreation/Entertainment Cost-Benefit Data-driven

Ensuring Data Transformation Quality with dbt Core

Wayne Yaddow

MARCH 14, 2025

How dbt Core helps data teams test, validate, and monitor complex data transformations and conversions. dbt Core, an open-source framework for developing, testing, and documenting SQL-based data transformations, has become a must-have tool for modern data teams as the complexity of data pipelines grows.

Data Transformation

Data Transformation Testing Unstructured Data Data Quality

What is Data Lineage? Top 5 Benefits of Data Lineage

erwin

APRIL 29, 2020

Data lineage is the journey data takes from its creation through its transformations over time. Tracing the source of data is an arduous task. With all these diverse data sources, and if systems are integrated, it is difficult to understand the complicated data web they form, much less get a simple visual flow.

Metadata

Metadata Key Performance Indicator Data Governance Data Quality

DataOps Observability: Taming the Chaos (Part 2)

DataKitchen

OCTOBER 25, 2022

Part 2: Introducing Data Journeys. Observability is a methodology for providing visibility of every journey that data takes from source to customer value across every tool, environment, data store, team, and customer so that problems are detected and addressed immediately.

Testing

Testing Data-driven Visualization Dashboards

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

OCTOBER 11, 2023

Organizations with legacy, on-premises, near-real-time analytics solutions typically rely on self-managed relational databases as their data store for analytics workloads. Near-real-time streaming analytics captures the value of operational data and metrics to provide new insights to create business opportunities.

Management

Management Metadata Analytics Dashboards

An AI Chat Bot Wrote This Blog Post …

DataKitchen

DECEMBER 9, 2022

ChatGPT> DataOps, or data operations, is a set of practices and technologies that organizations use to improve the speed, quality, and reliability of their data analytics processes. The goal of DataOps is to help organizations make better use of their data to drive business decisions and improve outcomes.

Machine Learning

Machine Learning Data-driven Optimization Data Analytics

Reference guide to build inventory management and forecasting solutions on AWS

AWS Big Data

APRIL 11, 2023

Such a solution should use the latest technologies, including Internet of Things (IoT) sensors, cloud computing, and machine learning (ML), to provide accurate, timely, and actionable data. To take advantage of this data and build an effective inventory management and forecasting solution, retailers can use a range of AWS services.

Forecasting

Forecasting Management IoT Data-driven

Top 6 Benefits of Automating End-to-End Data Lineage

erwin

SEPTEMBER 17, 2020

Replace manual and recurring tasks for fast, reliable data lineage and overall data governance. It’s paramount that organizations understand the benefits of automating end-to-end data lineage. The importance of end-to-end data lineage is widely understood and ignoring it is risky business. Doing Data Lineage Right.

Cost-Benefit

Cost-Benefit Data Governance Metadata Reporting

What is DataOps? Collaborative, cross-functional analytics

CIO Business Intelligence

DECEMBER 22, 2022

DataOps (data operations) is an agile, process-oriented methodology for developing and delivering analytics. It brings together DevOps teams with data engineers and data scientists to provide the tools, processes, and organizational structures to support the data-focused enterprise. What is DataOps?

Analytics

Analytics Machine Learning Data mining Software

Use AWS Glue to streamline SFTP data processing

AWS Big Data

AUGUST 13, 2024

In today’s data-driven world, seamless integration and transformation of data across diverse sources into actionable insights is paramount. With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog.

Data Processing

Data Processing Visualization Data Lake Data Processing

The Best Data Management Tools For Small Businesses

Smart Data Collective

APRIL 29, 2020

As the world is gradually becoming more dependent on data, the services, tools and infrastructure are all the more important for businesses in every sector. Data management has become a fundamental business concern, and especially for businesses that are going through a digital transformation. What is data management?

Management

Management Data Warehouse Digital Transformation Dashboards

Amazon Redshift data ingestion options

AWS Big Data

SEPTEMBER 5, 2024

Amazon Redshift, a warehousing service, offers a variety of options for ingesting data from diverse sources into its high-performance, scalable environment. This native feature of Amazon Redshift uses massive parallel processing (MPP) to load objects directly from data sources into Redshift tables.

IoT

IoT Data Warehouse Cost-Benefit Reporting

A Planning Center of Excellence Delivers Performance Improvement

David Menninger's Analyst Perspectives

NOVEMBER 7, 2024

The difference is in using advanced modeling and data management to make faster scenario planning possible, driven by actionable key performance measures that enable faster, well-informed decision cycles. This may sound like FP&A’s mission today. Today, FP&A organizations perform much of this work manually.

Forecasting

Forecasting Machine Learning Finance Predictive Analytics

Set up alerts and orchestrate data quality rules with AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

Alerts and notifications play a crucial role in maintaining data quality because they facilitate prompt and efficient responses to any data quality issues that may arise within a dataset. It simplifies your experience of monitoring and evaluating the quality of your data.

Data Quality

Data Quality Metrics Data-driven Visualization

Turning the page

Cloudera

JUNE 1, 2021

This means we can double down on our strategy – continuing to win the Hybrid Data Cloud battle in the IT department AND building new, easy-to-use cloud solutions for the line of business. It also means we can complete our business transformation with the systems, processes and people that support a new operating model.

Uncertainty

Uncertainty Cost-Benefit Risk Strategy

Integrating healthcare apps and data with FHIR + HL7

IBM Big Data Hub

NOVEMBER 20, 2023

Today’s healthcare providers use a wide variety of applications and data across a broad ecosystem of partners to manage their daily workflows. Integrating these applications and data is critical to their success, allowing them to deliver patient care efficiently and effectively. What is HL7? What is the FHIR Standard?

Cost-Benefit

Cost-Benefit Data-driven Data Transformation Management

How healthcare organizations can analyze and create insights using price transparency data

AWS Big Data

OCTOBER 11, 2023

Under the Transparency in Coverage (TCR) rule, hospitals and payors are required to publish their pricing data in a machine-readable format. The data in the machine-readable files can provide valuable insights to understand the true cost of healthcare services and compare prices and quality across hospitals.

Visualization

Visualization Dashboards Data-driven Gap analysis

Improve power utility operational efficiency using smart sensor data and Amazon QuickSight

AWS Big Data

MAY 16, 2023

Different communication infrastructure types such as mesh network and cellular can be used to send load information on a pre-defined schedule or event data in real time to the backend servers residing in the utility UDN (Utility Data Network).

Dashboards

Dashboards Statistics Data Collection Business Intelligence

Data Integrity, the Basis for Reliable Insights

Sisense

AUGUST 28, 2020

We live in a world of data: There’s more of it than ever before, in a ceaselessly expanding array of forms and locations. Dealing with Data is your window into the ways data teams are tackling the challenges of this new world to help their companies and their customers thrive. What is data integrity? Data integrity risks.

Data Integration

Data Integration Testing Data Quality Data-driven

7 key Microsoft Azure analytics services (plus one extra)

CIO Business Intelligence

JUNE 29, 2022

If you can’t make sense of your business data, you’re effectively flying blind. Insights hidden in your data are essential for optimizing business operations, finetuning your customer experience, and developing new products — or new lines of business, like predictive maintenance. Azure Data Factory.

Data Lake

Data Lake Analytics Data Warehouse Machine Learning

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

This is a guest post co-written by Alex Naumov, Principal Data Architect at smava. smava believes in and takes advantage of data-driven decisions in order to become the market leader.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Cloudera DataFlow Designer: The Key to Agile Data Pipeline Development

Cloudera

MARCH 14, 2023

We just announced the general availability of Cloudera DataFlow Designer, bringing self-service data flow development to all CDP Public Cloud customers. In this blog post we will put these capabilities in context and dive deeper into how the built-in, end-to-end data flow life cycle enables self-service data pipeline development.

Testing

Testing Publishing Metadata Interactive

Simplify data transfer: Google BigQuery to Amazon S3 using Amazon AppFlow

AWS Big Data

OCTOBER 5, 2023

In today’s data-driven world, the ability to effortlessly move and analyze data across diverse platforms is essential. Amazon AppFlow, a fully managed data integration service, has been at the forefront of streamlining data transfer between AWS services, software as a service (SaaS) applications, and now Google BigQuery.

Data Warehouse

Data Warehouse Machine Learning Data Integration Data-driven

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

AUGUST 1, 2023

Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications.

Data Lake

Data Lake Data Transformation Data-driven Cost-Benefit

Orchestrate Amazon EMR Serverless jobs with AWS Step functions

AWS Big Data

OCTOBER 12, 2023

You can run analytics workloads at any scale with automatic scaling that resizes resources in seconds to meet changing data volumes and processing requirements. AWS Step Functions is a serverless orchestration service that enables developers to build visual workflows for applications as a series of event-driven steps.

Big Data

Big Data Data-driven Management Visualization

How Infomedia built a serverless data pipeline with change data capture using AWS Glue and Apache Hudi

AWS Big Data

MARCH 15, 2023

Infomedia Ltd (ASX:IFM) is a leading global provider of DaaS and SaaS solutions that empowers the data-driven automotive ecosystem. In this post, we share how Infomedia built a serverless data pipeline with change data capture (CDC) using AWS Glue and Apache Hudi.

Cost-Benefit

Cost-Benefit Data Processing Optimization Data-driven

Building Better Data Models to Unlock Next-Level Intelligence

Sisense

MAY 11, 2021

You can’t talk about data analytics without talking about data modeling. The reasons for this are simple: Before you can start analyzing data, huge datasets like data lakes must be modeled or transformed to be usable. Building the right data model is an important part of your data strategy.

Modeling

Modeling Big Data IoT Data Warehouse

How CFM built a well-governed and scalable data-engineering platform using Amazon EMR for financial features generation

AWS Big Data

SEPTEMBER 13, 2024

In recent years, driven by the commoditization of data storage and processing solutions, the industry has seen a growing number of systematic investment management firms switch to alternative data sources to drive their investment decisions. Each team is the sole owner of its AWS account.

Interactive

Interactive Strategy Cost-Benefit Data Governance

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

This allows you to simplify security and governance over transactional data lakes by providing access controls at table-, column-, and row-level permissions with your Apache Spark jobs. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.

Data Lake

Data Lake Snapshot Big Data Data-driven

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

AWS Big Data

MARCH 3, 2023

In this post, we share how the AWS Data Lab helped Tricentis to improve their software as a service (SaaS) Tricentis Analytics platform with insights powered by Amazon Redshift. Although Tricentis has amassed such data over a decade, the data remains untapped for valuable insights.

Software

Software Data Lake Testing Cost-Benefit

The Chief Marketing Officer and the CDO – A Modern Fable

Peter James Thomas

OCTOBER 30, 2018

Where they have, I have normally found the people holding these roles to be better informed about data matters than their peers. Prelude… I recently came across an article in Marketing Week with the clickbait-worthy headline of Why the rise of the chief data officer will be short-lived (their choice of capitalisation).

Marketing

Marketing Strategy Data Architecture Data Strategy

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

To build a data-driven business, it is important to democratize enterprise data assets in a data catalog. With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. GenericInMemoryCatalog stores the catalog data in memory.

Data Lake

Data Lake Metadata Business Analysis Data-driven

The Rising Need for Data Governance in Healthcare

Alation

OCTOBER 28, 2021

Healthcare is changing, and it all comes down to data. Data & analytics represents a major opportunity to tackle these challenges. Indeed, many healthcare organizations today are embracing digital transformation and using data to enhance operations. How can data help change how care is delivered?

Data Governance

Data Governance Measurement Data Quality Metrics

Manual Feature Engineering

Domino Data Lab

AUGUST 20, 2019

Many thanks to AWP Pearson for the permission to excerpt “Manual Feature Engineering: Manipulating Data for Fun and Profit” from the book, Machine Learning with Python for Everyone by Mark E. Feature engineering is useful for data scientists when assessing tradeoff decisions regarding the impact of their ML models.

Testing

Testing Modeling Interactive Measurement

CIO 100 Award winners drive business results with IT

CIO Business Intelligence

AUGUST 7, 2024

Its AI/ML-driven predictive analysis enhanced proactive threat hunting and phishing investigations as well as automated case management for swift threat identification. Options included hosting a secondary data center, outsourcing business continuity to a vendor, and establishing private cloud solutions.

IT

IT Insurance Cost-Benefit Testing

What is a Data Pipeline?

Jet Global

MAY 9, 2024

A data pipeline is a series of processes that move raw data from one or more sources to one or more destinations, often transforming and processing the data along the way. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.

Data Lake

Data Lake Data Warehouse Business Intelligence Machine Learning
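
The pipeline definition in the "What is a Data Pipeline?" entry above (ingestion, cleansing, filtering, aggregation, standardization) can be illustrated with a minimal sketch in plain Python. This is not taken from the linked article: the sample records, field names, and stage functions (`ingest`, `cleanse`, `standardize`, `keep_non_negative`, `aggregate`) are all hypothetical, and a real pipeline would read from and write to actual data stores.

```python
from collections import defaultdict

# Hypothetical raw records standing in for data read from a source system.
RAW_ROWS = [
    {"region": " east ", "amount": "10.5"},
    {"region": "West", "amount": "4"},
    {"region": "EAST", "amount": "n/a"},  # unparseable amount, dropped by cleanse()
    {"region": "west", "amount": "-2"},   # negative amount, dropped by the filter
    {"region": "East", "amount": "1.5"},
]


def ingest(rows):
    """Ingestion: yield records from the source one at a time."""
    yield from rows


def cleanse(rows):
    """Cleansing: parse amounts as floats and skip rows that cannot be parsed."""
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        yield {"region": row["region"], "amount": amount}


def standardize(rows):
    """Standardization: trim whitespace and lowercase the region name."""
    for row in rows:
        yield {**row, "region": row["region"].strip().lower()}


def keep_non_negative(rows):
    """Filtering: keep only rows with a non-negative amount."""
    return (row for row in rows if row["amount"] >= 0)


def aggregate(rows):
    """Aggregation: sum amounts per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def run_pipeline(rows):
    """Chain the stages from source to destination (here, a plain dict)."""
    return aggregate(keep_non_negative(standardize(cleanse(ingest(rows)))))


if __name__ == "__main__":
    print(run_pipeline(RAW_ROWS))
```

Each stage is a generator, so records stream through one at a time instead of being materialized between steps; that choice is only one reasonable way to structure such a sketch.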
