In this post, we focus on data management implementation options such as accessing data directly in Amazon Simple Storage Service (Amazon S3), using popular data formats like Parquet, or using open table formats like Iceberg. Data management is the foundation of quantitative research.
According to Richard Kulkarni, Country Manager for Quest, a lack of clarity concerning governance and policy around AI means that employees and teams are finding workarounds to access the technology. Some senior technology leaders fear a Pandora's Box-type situation, with AI becoming impossible to control once unleashed.
Amazon Redshift is a fully managed, AI-powered cloud data warehouse that delivers the best price-performance for your analytics workloads at any scale. It enables you to get insights faster without extensive knowledge of your organization’s complex database schema and metadata. Within this feature, user data is secure and private.
The CDH is used to create, discover, and consume data products through a central metadata catalog, while enforcing permission policies and tightly integrating data engineering, analytics, and machine learning services to streamline the user journey from data to insight. This led to inefficiencies in data governance and access control.
The landscape of big data management has been transformed by the rising popularity of open table formats such as Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake. Both Delta Lake and Iceberg metadata files reference the same data files.
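As a toy illustration of how table-format metadata points at shared data files, the sketch below parses Delta Lake-style transaction-log entries (JSON lines whose `add` actions reference Parquet data files by path). The log lines and file names are fabricated for the example; a real log lives under `_delta_log/` next to the data files.

```python
import json

# Two illustrative Delta Lake transaction-log entries (JSON lines).
# Both Delta Lake and Iceberg metadata reference data files by path,
# which is what allows them to describe the same underlying Parquet.
log_lines = [
    '{"add": {"path": "part-00000.parquet", "size": 1024}}',
    '{"add": {"path": "part-00001.parquet", "size": 2048}}',
]

# Collect the data files that the log's "add" actions reference.
data_files = [
    json.loads(line)["add"]["path"]
    for line in log_lines
    if "add" in json.loads(line)
]
print(data_files)  # ['part-00000.parquet', 'part-00001.parquet']
```

The key point is that the metadata layer is separate from the data layer: the Parquet files stay put while each format's metadata describes them.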
What is zero-ETL? Zero-ETL is a set of fully managed integrations by AWS that minimizes the need to build ETL data pipelines. We take care of the ETL for you by automating the creation and management of data replication. Zero-ETL provides service-managed replication; Glue ETL offers customer-managed data ingestion.
Kinesis Data Streams is a fully managed, serverless data streaming service that stores and ingests various streaming data in real time at any scale. Solution overview: In this solution, we consider a common use case of centralized log aggregation for an organization. To create a Kinesis data stream, see Create a data stream.
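For the log-aggregation use case, records are routed to shards by partition key: Kinesis MD5-hashes the key into a 128-bit integer and places the record on the shard whose hash-key range contains it. The sketch below is a simplified local model of that routing, assuming the hash-key space is split evenly across shards (real streams can have uneven ranges after resharding).

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Simplified model of Kinesis shard routing: MD5-hash the
    partition key into a 128-bit integer, then find the shard whose
    (evenly split, for this sketch) hash-key range contains it."""
    hashed = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // num_shards
    return min(hashed // range_size, num_shards - 1)

# Records from the same log source carry the same partition key,
# so they land on the same shard and keep per-source ordering.
assert shard_for_key("web-server-01", 4) == shard_for_key("web-server-01", 4)
print(shard_for_key("web-server-01", 4))
```

This is why choosing a high-cardinality partition key (for example, a host or application ID) matters: it spreads load across shards while preserving ordering within each source.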
Let’s briefly describe the capabilities of the AWS services referred to above: AWS Glue is a fully managed, serverless, and scalable extract, transform, and load (ETL) service that simplifies the process of discovering, preparing, and loading data for analytics.
With this launch of JDBC connectivity, Amazon DataZone expands its support for data users, including analysts and scientists, allowing them to work in their preferred environments—whether it’s SQL Workbench, Domino, or Amazon-native solutions—while ensuring secure, governed access within Amazon DataZone.
When building custom stream processing applications, developers typically face challenges with managing the distributed computing at scale that is required to process high-throughput data in real time. It also reduces the Amazon DynamoDB cost associated with KCL by optimizing read operations on the DynamoDB table storing metadata.
As data-centric AI, automated metadata management, and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprise's core has never been more significant. Instead, organizations resort to manual workarounds, often managed by overburdened analysts or domain experts. Assign domain data stewards.
Amazon DataZone , a data management service, helps you catalog, discover, share, and govern data stored across AWS, on-premises systems, and third-party sources. This solution enhances governance and simplifies access to unstructured data assets across the organization. The solution architecture is shown in the following screenshot.
As organizations increasingly adopt cloud-based solutions and centralized identity management, the need for seamless and secure access to data warehouses like Amazon Redshift becomes crucial. This allows federated users to access the AWS Management Console; from there, the user can access the Redshift Query Editor V2.
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a fully managed service that builds upon Apache Airflow, offering its benefits while eliminating the need for you to set up, operate, and maintain the underlying infrastructure, reducing operational overhead while increasing security and resilience.
This is a good time to assess enterprise activities, as there are many indications that a number of companies are already beginning to use machine learning and managed services in the cloud. Not surprisingly, data integration and ETL were among the top responses, with 60% currently building or evaluating solutions in this area.
If you’re already a software product manager (PM), you have a head start on becoming a PM for artificial intelligence (AI) or machine learning (ML). But there’s a host of new challenges when it comes to managing AI projects: more unknowns, non-deterministic outcomes, new infrastructures, new processes and new tools.
A recent flourish of posts and papers has outlined the broader topic, listed attack vectors and vulnerabilities, started to propose defensive solutions, and provided the necessary framework for this post. Like many others, I’ve known for some time that machine learning models themselves could pose security risks. Data poisoning attacks.
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a fully managed orchestration service that makes it straightforward to run data processing workflows at scale. In this post, we dive deep into the implementation for both strategies and provide a deployable solution to realize the architectures in your own AWS account.
It encompasses the people, processes, and technologies required to manage and protect data assets. The Data Management Association (DAMA) International defines it as the “planning, oversight, and control over management of data and the use of data and data-related sources.”
What Is Metadata? Metadata is information about data. A clothing catalog and a dictionary are both examples of metadata repositories. Indeed, a popular online catalog like Amazon offers rich metadata around products to guide shoppers: ratings, reviews, and product details are all examples of metadata.
First, what active metadata management isn't: "Okay, you metadata!" Now, what active metadata management is (well, kind of): "Okay, you metadata!" Metadata are the details on those tools: what they are, what to use them for, what to use them with. That takes active metadata management.
Why it’s challenging to process and manage unstructured data: Unstructured data makes up a large proportion of the data in the enterprise that can’t be stored in a traditional relational database management system (RDBMS). You can integrate different technologies or tools to build a solution.
Organizations with legacy, on-premises, near-real-time analytics solutions typically rely on self-managed relational databases as their data store for analytics workloads. We introduce you to Amazon Managed Service for Apache Flink Studio and get started querying streaming data interactively using Amazon Kinesis Data Streams.
Amazon OpenSearch Serverless is a serverless version of Amazon OpenSearch Service, a fully managed open-source search and analytics platform. This improves the user experience and reduces the overhead of managing multiple credentials. aoss:UpdateSecurityConfig – Modify a given SAML provider configuration, including the XML metadata.
Organizations need to understand what the most critical operational activities are and the most impactful projects that need to proceed. Where crisis leads to vulnerability, data governance as an emergency service enables organization management to direct or redirect efforts to ensure activities continue and risks are mitigated.
However, with all good things come many challenges, and businesses often struggle with managing their information in the correct way. Enter data quality management. What Is Data Quality Management (DQM)?
Customer relationship management (CRM) platforms are very reliant on big data. In software development, technical debt is often defined as the cost of choosing an easy solution now instead of a better approach that might take longer. Complex Salesforce orgs can work just fine if they are properly managed.
They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions. Implementing these solutions requires data sharing between purpose-built data stores.
For organizations implementing critical workload orchestration using Amazon Managed Workflows for Apache Airflow (Amazon MWAA), it is crucial to have a disaster recovery (DR) plan in place to ensure business continuity. Within Airflow, the metadata database is a core component, storing configuration variables, roles, permissions, and DAG run histories.
Metadata is the pertinent, practical details about data assets: what they are, what to use them for, what to use them with. Without metadata, data is just a heap of numbers and letters collecting dust. Where does metadata come from? What is a metadata management tool? Metadata harvesting.
Oracle has announced the launch of Oracle Fusion Cloud Sustainability — an app that integrates data from Oracle Fusion Cloud ERP and Oracle Fusion Cloud SCM , enabling analysis and reporting within Oracle Fusion Cloud Enterprise Performance Management (EPM) and Oracle Fusion Data Intelligence.
As organizations deal with managing ever more data, the need to automate data management becomes clear. One piece of the research that stuck with me is that 70% of respondents spend 10 or more hours per week on data-related activities. That’s a lot of data to manage! It’s time to automate data management.
Last but not least, we looked at the amount of time spent on data activities. The great news is that most organizations spend more than 10 hours a week on data-related activities. Automating data operations adds a lot of value by making a solution more effective and more powerful. Data Automation Adds Value.
Every enterprise needs a data strategy that clearly defines the technologies, processes, people, and rules needed to safely and securely manage its information assets and practices. “Data is no longer just used by analysts and data scientists,” says Dinesh Nirmal, general manager of AI and automation at IBM Data.
Organizations face challenges such as complexity in managing cross-account permissions and difficulty discovering the right data when trying to share data products across AWS accounts. A straightforward data access and sharing mechanism is crucial for enabling effective data sharing across an organization.
What few of these groups know how to reckon with, though, is how to best manage data that’s no longer in use – particularly data from systems the organization has since retired. When data is no longer in active use, the best thing that healthcare systems can do is archive it. What’s the best way to handle this information?
Designing for high throughput with 11 9s of durability: OpenSearch Service manages tens of thousands of OpenSearch clusters. The following diagram illustrates the recovery flow in OR1 instances. OR1 instances persist not only the data, but also cluster metadata like index mappings, templates, and settings in Amazon S3.
Amazon Redshift is a widely used, fully managed, petabyte-scale cloud data warehouse. We use AWS Glue , a fully managed, serverless, ETL (extract, transform, and load) service, and the Google BigQuery Connector for AWS Glue (for more information, refer to Migrating data from Google BigQuery to Amazon S3 using AWS Glue custom connectors ).
Customers prefer to let the service manage its capacity automatically rather than having to manually provision capacity. Until now, customers have had to rely on using custom code or third-party solutions to move the data between provisioned OpenSearch Service domains and OpenSearch Serverless.
By 2024, 60% of the data used for the development of AI and analytics solutions will be synthetically generated. Predicts 2021: Artificial Intelligence in Enterprise Applications: By 2024, the degree of manual effort required for the contract review process will be halved in enterprises that adopt advanced contract analytics solutions.
Amazon DataZone has announced a set of new data governance capabilities—domain units and authorization policies—that enable you to create business unit-level or team-level organization and manage policies according to your business needs. Some examples of child domain units include drug discovery and clinical trials management.
This solution empowers businesses to access Redshift data within the Salesforce Data Cloud, breaking down data silos, gaining deeper insights, and creating unified customer profiles to deliver highly personalized experiences across various touchpoints. What is Amazon Redshift? What is Zero Copy Data Federation?
To address these growing data management challenges, AWS customers are using Amazon DataZone , a data management service that makes it fast and effortless to catalog, discover, share, and govern data stored across AWS, on-premises, and third-party sources. The overall structure can be represented in the following figure.
This platform is an advanced information retrieval system engineered to assist healthcare professionals and researchers in navigating vast repositories of medical documents, medical literature, research articles, clinical guidelines, protocol documents, activity logs, and more. Evidence generation is rife with knowledge management challenges.