1) What Is Data Quality Management?
4) Data Quality Best Practices
5) How Do You Measure Data Quality?
6) Data Quality Metrics Examples
7) Data Quality Control: Use Case
8) The Consequences Of Bad Data Quality
9) 3 Sources Of Low-Quality Data
10) Data Quality Solutions: Key Attributes
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. Together, these capabilities enable terminal operators to enhance efficiency and competitiveness in an industry that is increasingly data-driven.
The need to integrate diverse data sources has grown exponentially, but there are several common challenges when integrating and analyzing data from multiple sources, services, and applications. First, you need to create and maintain independent connections to the same data source for different services.
An event-driven architecture is a software design pattern in which decoupled applications can asynchronously publish and subscribe to events via an event broker. Amazon Elastic Kubernetes Service (Amazon EKS) is becoming a popular choice among AWS customers to host long-running analytics and AI or machine learning (ML) workloads.
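A minimal sketch of the pattern, assuming nothing beyond the definition above (the EventBroker class and the "order.created" topic are illustrative, not any specific product's API):

```python
# Minimal in-memory event broker illustrating publish/subscribe decoupling.
# Real brokers deliver events asynchronously; this sketch is synchronous for brevity.
from collections import defaultdict
from typing import Callable

class EventBroker:
    def __init__(self):
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        # Subscribers register interest in a topic; they never reference publishers.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Publishers hand events to the broker; they never reference subscribers.
        for handler in self._subscribers[topic]:
            handler(event)

broker = EventBroker()
broker.subscribe("order.created", lambda e: print(f"billing saw {e}"))
broker.subscribe("order.created", lambda e: print(f"shipping saw {e}"))
broker.publish("order.created", {"order_id": 42})
```

The point of the pattern is visible in the sketch: neither handler knows the publisher exists, so either side can be replaced or scaled independently.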
Amazon OpenSearch Service allows organizations to secure data, perform searches, analyze logs, monitor applications in real time, and explore interactive log analytics. With its scalability, reliability, and ease of use, Amazon OpenSearch Service helps businesses optimize data-driven decisions and improve operational efficiency.
But there’s a host of new challenges when it comes to managing AI projects: more unknowns, non-deterministic outcomes, new infrastructures, new processes, and new tools. AI products are automated systems that collect and learn from data to make user-facing decisions. Why AI software development is different.
Organizational data is often fragmented across multiple lines of business, leading to inconsistent and sometimes duplicate datasets. This fragmentation can delay decision-making and erode trust in available data. This solution enhances governance and simplifies access to unstructured data assets across the organization.
Data-driven insights are only as good as your data. Imagine that each source of data in your organization—from spreadsheets to internet of things (IoT) sensor feeds—is a delegate set to attend a conference that will decide the future of your organization.
Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. Amazon DataZone allows you to simply and securely govern end-to-end data assets stored in your Amazon Redshift data warehouses or data lakes cataloged with the AWS Glue data catalog.
With this new instance family, OpenSearch Service uses OpenSearch innovation and AWS technologies to reimagine how data is indexed and stored in the cloud. Today, customers widely use OpenSearch Service for operational analytics because of its ability to ingest high volumes of data while also providing rich and interactive analytics.
Organizations with legacy, on-premises, near-real-time analytics solutions typically rely on self-managed relational databases as their data store for analytics workloads. Near-real-time streaming analytics captures the value of operational data and metrics to provide new insights to create business opportunities.
Enterprises that need to share and access large amounts of data across multiple domains and services need to build a cloud infrastructure that scales as needs change. To achieve this, the different technical products within the company regularly need to move data across domains and services efficiently and reliably.
Organizations with a solid understanding of data governance (DG) are better equipped to keep pace with the speed of modern business. In this post, the erwin Experts address: What Is Data Governance? Why Is Data Governance Important? What Is Good Data Governance? What Are the Key Benefits of Data Governance?
Data governance is best defined as the strategic, ongoing and collaborative processes involved in managing data’s access, availability, usability, quality and security in line with established internal policies and relevant data regulations. Data Governance Is Business Transformation. Enhanced: Data managed equally.
This is evident in the rigorous training required for providers, the stringent safety protocols for life sciences professionals, and the strict data and privacy requirements for healthcare analytics software. Concerns about data security, privacy, and accuracy have been at the forefront of these discussions.
Paco Nathan’s latest article covers program synthesis, AutoPandas, model-driven data queries, and more. In other words, using metadata about data science work to generate code. In this case, code gets generated for data preparation, where so much of the “time and labor” in data science work is concentrated.
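A toy sketch of that idea, with a hypothetical metadata schema (the column names and null-handling strategies are invented for illustration; real systems like AutoPandas search far richer program spaces):

```python
# Toy program synthesis for data prep: emit pandas code from column metadata.
# The metadata schema below is hypothetical, purely for illustration.
column_meta = {
    "age":  {"dtype": "int", "nulls": "median"},
    "city": {"dtype": "str", "nulls": "mode"},
}

def synthesize_prep(meta: dict) -> str:
    lines = ["import pandas as pd", "", "def prepare(df: pd.DataFrame) -> pd.DataFrame:"]
    for col, spec in meta.items():
        # Each metadata entry maps to one generated cleaning statement.
        if spec["nulls"] == "median":
            lines.append(f"    df['{col}'] = df['{col}'].fillna(df['{col}'].median())")
        elif spec["nulls"] == "mode":
            lines.append(f"    df['{col}'] = df['{col}'].fillna(df['{col}'].mode()[0])")
    lines.append("    return df")
    return "\n".join(lines)

print(synthesize_prep(column_meta))  # prints a runnable data-prep function
```

The sketch only shows the shape of the technique: structured metadata in, boilerplate preparation code out.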
In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service that would let non-technical business users visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
Advancements in big data technology have made the world of business even more competitive. The proper use of business intelligence and analytical data is what drives big brands in a competitive market. Business intelligence tools can include data warehousing, data visualizations, dashboards, and reporting.
Organizations are managing more data than ever. With more companies migrating their data to the cloud to ensure availability and scalability, the risks associated with data management and protection are also growing. Data Security Starts with Data Governance. Who is authorized to use it and how?
A data management platform (DMP) is a group of tools designed to help organizations collect and manage data from a wide array of sources and to create reports that help explain what is happening in those data streams. Deploying a DMP can be a great way for companies to navigate a business world dominated by data.
We use leading-edge analytics, data, and science to help clients make intelligent decisions. We developed and host several applications for our customers on Amazon Web Services (AWS). Data ingestion: The data ingestion layer is the first step of the proposed framework.
This encompasses tasks such as integrating diverse data from various sources with distinct formats and structures, optimizing the user experience for performance and security, providing multilingual support, and optimizing for cost, operations, and reliability.
QuickSight makes it straightforward for business users to visualize data in interactive dashboards and reports. You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. With these insights, teams have the visibility to make data integration pipelines more efficient.
Co-chair Paco Nathan provides highlights of Rev 2, a data science leaders summit. We held Rev 2 on May 23-24 in NYC, as the place where “data science leaders and their teams come to learn from each other.” Nick Elprin, CEO and co-founder of Domino Data Lab. First item on our checklist: did Rev 2 address how to lead data teams?
We just announced Cloudera DataFlow for the Public Cloud (CDF-PC), the first cloud-native runtime for Apache NiFi data flows. Apache NiFi is a powerful tool for building data movement pipelines using a visual flow designer. Implementing an automated scale-up and scale-down procedure for NiFi clusters is complex and time-consuming.
The company uses AWS Cloud services to build data-driven products and scale engineering best practices. To ensure a sustainable data platform amid growth and profitability phases, their tech teams adopted a decentralized data mesh architecture. The solution Acast implemented is a data mesh, architected on AWS.
In today’s data-driven world, organizations are continually confronted with the task of managing extensive volumes of data securely and efficiently. A common use case that we see amongst customers is to search and visualize data.
INSITE applications are, in general, data intensive. They ingest and transform large volumes of data in different formats and processing patterns (such as batch and near real time) from various sources internal and external to Amazon. To enable and meet these requirements, GTTS built its own data platform.
What Makes a Data Fabric? ‘Data Fabric’ has reached where ‘Cloud Computing’ and ‘Grid Computing’ once trod. Data Fabric hit the Gartner top ten in 2019. This multiplicity of data leads to the growth of silos, which in turn increases the cost of integration. It is a buzzword.
Organizations often need to manage a high volume of data that is growing at an extraordinary rate. At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and to do so with consistent performance. We think of this concept as inside-out data movement. Example Corp.
Data management platform definition A data management platform (DMP) is a suite of tools that helps organizations to collect and manage data from a wide array of first-, second-, and third-party sources and to create reports and build customer profiles as part of targeted personalization campaigns.
The foundation for ESG reporting, of course, is data. What companies need more than anything is good data for ESG reporting. That means ensuring ESG data is available, transparent, and actionable, says Ivneet Kaur, EVP and chief information technology officer at identity services provider Sterling.
Data teams have the impossible task of delivering everything (data and workloads) everywhere (on premises and in all clouds) all at once (with little to no latency). Each of these trends claims to be a complete model for data architectures that solves the “everything everywhere all at once” problem. Data mesh defined.
Today, we’re making available a new capability of AWS Glue Data Catalog that allows generating column-level statistics for AWS Glue tables. Data lakes are designed for storing vast amounts of raw, unstructured, or semi-structured data at a low cost, and organizations share those datasets across multiple departments and teams.
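A sketch of reading those statistics back once they have been generated, assuming the boto3 Glue client's get_column_statistics_for_table call; the region, database, table, and column names are placeholders:

```python
# Sketch: retrieve column-level statistics from the AWS Glue Data Catalog.
# Region, database, table, and column names below are placeholders.
import boto3

glue = boto3.client("glue", region_name="us-east-1")  # placeholder region

resp = glue.get_column_statistics_for_table(
    DatabaseName="sales_db",   # placeholder
    TableName="orders",        # placeholder
    ColumnNames=["order_total", "customer_id"],
)

# Each entry carries per-column stats (e.g., distinct counts, null counts)
# that query engines can use for cost-based optimization.
for stats in resp["ColumnStatisticsList"]:
    print(stats["ColumnName"], stats["StatisticsData"]["Type"])
```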
Swisscom’s Data, Analytics, and AI division is building a One Data Platform (ODP) solution that will enable every Swisscom employee, process, and product to benefit from the massive value of Swisscom’s data. The following high-level architecture diagram shows ODP with different layers of the modern data architecture.
Data and content are organized in a way that facilitates discoverability, insights, and decision making rather than being bound by the limitations of data formats and legacy systems. GraphQL has a number of advantages for developers, especially for data-centric applications. Content Enrichment and Metadata Management.
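A minimal sketch of one such advantage: the client names exactly the fields it needs and gets them in one round trip. The endpoint URL and the schema (document, title, tags, relatedDocuments) below are hypothetical:

```python
# Sketch: a GraphQL client requests only the fields it needs, in one call.
# The endpoint and schema are hypothetical, for illustration only.
import requests

query = """
query {
  document(id: "doc-123") {
    title
    tags
    relatedDocuments { title }
  }
}
"""

resp = requests.post("https://example.com/graphql", json={"query": query})
# The response mirrors the shape of the query, no over- or under-fetching.
print(resp.json()["data"]["document"]["title"])
```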
Apache Airflow is a popular platform for enterprises looking to orchestrate complex data pipelines and workflows. In this post, we’re excited to introduce two new features that address common customer challenges and unlock new possibilities for building robust, scalable, and flexible data orchestration solutions using Amazon MWAA.
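For readers new to Airflow, a minimal DAG sketch (assuming Airflow 2.x; the task names and schedule are illustrative, and the same file would simply be deployed to an Amazon MWAA environment's DAGs folder):

```python
# Minimal Airflow 2.x DAG: two dependent tasks forming a tiny pipeline.
# Task names and schedule are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling source data")

def load():
    print("writing to the warehouse")

with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # load runs only after extract succeeds
```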
Ever since Hippocrates founded his school of medicine in ancient Greece some 2,500 years ago, writes Hannah Fry in her book Hello World: Being Human in the Age of Algorithms, what has been fundamental to healthcare (as she calls it, “the fight to keep us healthy”) was observation, experimentation and the analysis of data.
It enriched their understanding of the full spectrum of knowledge graph business applications and the technology partner ecosystem needed to turn data into a competitive advantage. Content and data management solutions based on knowledge graphs are becoming increasingly important across enterprises.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
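As a rough sketch of the building blocks involved, assuming a Spark cluster with the Iceberg runtime installed and an S3-backed catalog configured (the catalog, bucket, and table names are placeholders, not Orca's actual setup):

```python
# Sketch: create and query an Apache Iceberg table on Amazon S3 with PySpark.
# Assumes the Iceberg Spark runtime jar is on the cluster; names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")  # a Glue catalog is another option
    .config("spark.sql.catalog.lake.warehouse", "s3://example-bucket/warehouse/")
    .getOrCreate()
)

spark.sql("CREATE TABLE IF NOT EXISTS lake.db.events (id BIGINT, ts TIMESTAMP) USING iceberg")
spark.sql("INSERT INTO lake.db.events VALUES (1, current_timestamp())")

# Transactional reads: each query sees a consistent snapshot of the table.
spark.sql("SELECT count(*) FROM lake.db.events").show()
```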
This view is used to identify patterns and trends in customer behavior, which can inform data-driven decisions to improve business outcomes. In this post, we discuss how you can use purpose-built AWS services to create an end-to-end data strategy for C360 that unifies and governs customer data and addresses these challenges.
2020 saw us hosting our first-ever fully digital Data Impact Awards ceremony, and it certainly was one of the highlights of our year. We saw a record number of entries and incredible examples of how customers were using Cloudera’s platform and services to unlock the power of data. DATA FOR ENTERPRISE AI.
Added data quality capability ready for an AI era: Data quality has never been more important than as we head into this next AI-focused era. erwin Data Quality is the data quality heart of erwin Data Intelligence.
Organizations are collecting and storing vast amounts of structured and unstructured data like reports, whitepapers, and research documents. By consolidating this information, analysts can discover and integrate data from across the organization, creating valuable data products based on a unified dataset.