Blog - Data Leaders Brief

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on premises, and from third-party sources.

Analytics

Analytics Visualization Data Governance Data-driven

Scaling RISE with SAP data and AWS Glue

AWS Big Data

NOVEMBER 29, 2024

Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP.

Visualization

Visualization Data Processing Data-driven Cost-Benefit

Scalable analytics and centralized governance for Apache Iceberg tables using Amazon S3 Tables and Amazon Redshift

AWS Big Data

MAY 22, 2025

Amazon Redshift supports querying data stored in Apache Iceberg tables managed by Amazon S3 Tables , which we previously covered as part of getting started blog post. Well also review an example with simultaneously using data that resides both in Amazon Redshift and Amazon S3 Tables, enabling a unified analytics experience.

Analytics

Analytics Data Lake Management Insurance

Webinars

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Addressing Data Mesh Technical Challenges with DataOps

DataKitchen

AUGUST 9, 2021

Below is our third post (3 of 5) on combining data mesh with DataOps to foster greater innovation while addressing the challenges of a decentralized architecture. We’ve talked about data mesh in organizational terms (see our first post, “ What is a Data Mesh? ”) and how team structure supports agility. Source: Thoughtworks.

Testing

Testing Data Lake Metadata Publishing

Amazon EMR 7.5 runtime for Apache Spark and Iceberg can run Spark workloads 3.6 times faster than Spark 3.5.3 and Iceberg 1.6.1

AWS Big Data

DECEMBER 27, 2024

4xlarge instances, providing observable gains for data processing tasks. To minimize the influence of external catalogs like AWS Glue and Hive, we used the Hadoop catalog for the Iceberg tables. This uses the underlying file system, specifically Amazon S3, as the catalog. with Iceberg 1.6.1 and Iceberg 1.5.2.

Cost-Benefit

Cost-Benefit Testing Metrics Optimization

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone now launched authentication supports through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.

Visualization

Visualization Data Lake Testing Data Governance

How DataOps is Transforming Commercial Pharma Analytics

DataKitchen

AUGUST 27, 2021

DataOps has become an essential methodology in pharmaceutical enterprise data organizations, especially for commercial operations. Companies that implement it well derive significant competitive advantage from their superior ability to manage and create value from data.

Analytics

Analytics Sales Testing Cost-Benefit

Centralize Your Data Processes With a DataOps Process Hub

DataKitchen

NOVEMBER 4, 2021

Data organizations often have a mix of centralized and decentralized activity. DataOps concerns itself with the complex flow of data across teams, data centers and organizational boundaries. It expands beyond tools and data architecture and views the data organization from the perspective of its processes and workflows.

Data Processing

Data Processing Data Lake Cost-Benefit Testing

Data Governance and Metadata Management: You Can’t Have One Without the Other

erwin

FEBRUARY 13, 2020

When an organization’s data governance and metadata management programs work in harmony, then everything is easier. Data governance is a complex but critical practice. There’s always more data to handle, much of it unstructured; more data sources, like IoT, more points of integration, and more regulatory compliance requirements.

Metadata

Metadata Data Governance Management Cost-Benefit

DataOps Facilitates Remote Work

DataKitchen

JANUARY 5, 2021

Remote working has revealed the inconsistency and fragility of workflow processes in many data organizations. The data teams share a common objective; to create analytics for the (internal or external) customer. Data Science Workflow – Kubeflow, Python, R. Data Engineering Workflow – Airflow, ETL.

Testing

Testing Data Governance Metadata Visualization

Data Intelligence and Its Role in Combating Covid-19

erwin

MARCH 30, 2020

Data intelligence has a critical role to play in the supercomputing battle against Covid-19. While leveraging supercomputing power is a tremendous asset in our fight to combat this global pandemic, in order to deliver life-saving insights, you really have to understand what data you have and where it came from.

Metadata

Metadata IT Data Governance Data Quality

Data Governance as an Emergency Service

erwin

MAY 20, 2020

Data governance (DG) as a an “emergency service” may be one critical lesson learned coming out of the COVID-19 crisis. Where crisis leads to vulnerability, data governance as an emergency service enables organization management to direct or redirect efforts to ensure activities continue and risks are mitigated.

Data Governance

Data Governance Metadata Risk Strategy

What’s the Current State of Data Governance and Automation?

erwin

JANUARY 30, 2020

I’m excited to share the results of our new study with Dataversity that examines how data governance attitudes and practices continue to evolve. Defining Data Governance: What Is Data Governance? . 1 reason to implement data governance. Constructing a Digital Transformation Strategy: How Data Drives Digital.

Data Governance

Data Governance Metadata Cost-Benefit Digital Transformation

Migrate workloads from AWS Data Pipeline

AWS Big Data

JULY 25, 2024

AWS Data Pipeline helps customers automate the movement and transformation of data. With Data Pipeline, customers can define data-driven workflows, so that tasks can be dependent on the successful completion of previous tasks. Some customers want a deeper level of control and specificity than possible using Data Pipeline.

Visualization

Visualization Management Data Integration Testing

A Reference Architecture for the Cloudera Private Cloud Base Data Platform

Cloudera

JULY 15, 2021

The release of Cloudera Data Platform (CDP) Private Cloud Base edition provides customers with a next generation hybrid cloud architecture. Traditional data clusters for workloads not ready for cloud. Introduction and Rationale. Private Cloud Base Overview. Further information and documentation [link] . Summary of major changes.

Data Processing

Data Processing Metadata Testing Management

How ATPCO enables governed self-service data access to accelerate innovation with Amazon DataZone

AWS Big Data

JULY 25, 2024

This blog post is co-written with Raj Samineni from ATPCO. In today’s data-driven world, companies across industries recognize the immense value of data in making decisions, driving innovation, and building new products to serve their customers.

Data Lake

Data Lake Metadata Sales Publishing

Top 10 Metadata Management Influencers, Sites, and Blogs You Must Follow in 2021

Octopai

APRIL 19, 2021

Aptly named, metadata management is the process in which BI and Analytics teams manage metadata, which is the data that describes other data. In other words, data is the context and metadata is the content. Without metadata, BI teams are unable to understand the data’s full story. Donna Burbank.

Metadata

Metadata Management Business Intelligence Data Governance

The Value of Catalog-Led Data Governance

Alation

NOVEMBER 4, 2021

This week I was talking to a data practitioner at a global systems integrator. The practitioner asked me to add something to a presentation for his organization: the value of data governance for things other than data compliance and data security. Now to be honest, I immediately jumped onto data quality.

Data Governance

Data Governance Metadata Data Quality Enterprise

Synchronize data lakes with CDC-based UPSERT using open table format, AWS Glue, and Amazon MSK

AWS Big Data

JULY 31, 2024

In the current industry landscape, data lakes have become a cornerstone of modern data architecture, serving as repositories for vast amounts of structured and unstructured data. However, efficiently managing and synchronizing data within these lakes presents a significant challenge.

Data Lake

Data Lake Marketing Data Processing Management

Very Meta … Unlocking Data’s Potential with Metadata Management Solutions

erwin

OCTOBER 24, 2019

Untapped data, if mined, represents tremendous potential for your organization. While there has been a lot of talk about big data over the years, the real hero in unlocking the value of enterprise data is metadata , or the data about the data. They don’t know exactly what data they have or even where some of it is.

Metadata

Metadata Management Data-driven Data Architecture

How CFM built a well-governed and scalable data-engineering platform using Amazon EMR for financial features generation

AWS Big Data

SEPTEMBER 13, 2024

Capital Fund Management ( CFM ) is an alternative investment management company based in Paris with staff in New York City and London. CFM assets under management are now $13 billion. Using social network data has also often been cited as a potential source of data to improve short-term investment decisions.

Interactive

Interactive Strategy Cost-Benefit Data Governance

How to Do Data Modeling the Right Way

erwin

MAY 27, 2020

Data modeling supports collaboration among business stakeholders – with different job roles and skills – to coordinate with business objectives. Data resides everywhere in a business , on-premise and in private or public clouds. A single source of data truth helps companies begin to leverage data as a strategic asset.

Modeling

Modeling Metadata Data Governance Visualization

RDF-Star: Metadata Complexity Simplified

Ontotext

JUNE 10, 2021

And yeah, the real-world relationships among the entities represented in the data had to be fudged a bit to fit in the counterintuitive model of tabular data, but, in trade, you get reliability and speed. Ironically, relational databases only imply relationships between data points by whatever row or column they exist in.

Metadata

Metadata Cost-Benefit OLAP Modeling

Multicloud data lake analytics with Amazon Athena

AWS Big Data

MARCH 18, 2024

Many organizations operate data lakes spanning multiple cloud data stores. In these cases, you may want an integrated query layer to seamlessly run analytical queries across these diverse cloud stores and streamline your data analytics processes. To achieve this, Oktank envisions a unified data query layer using Athena.

Data Lake

Data Lake Analytics Cost-Benefit Management

Data-Driven Enterprise Architecture: Why Enterprise Architects Need to Look at Data First

erwin

MAY 31, 2019

It’s time to consider data-driven enterprise architecture. The traditional approach to enterprise architecture – the analysis, design, planning and implementation of IT capabilities for the successful execution of enterprise strategy – seems to be missing something … data. Data-Driven Enterprise Architecture and Cloud Migration.

Data-driven

Data-driven Enterprise Metadata Strategy

Amazon EMR 7.1 runtime for Apache Spark and Iceberg can run Spark workloads 2.7 times faster than Apache Spark 3.5.1 and Iceberg 1.5.2

AWS Big Data

AUGUST 26, 2024

4xlarge instances, providing observable gains for data processing tasks. In this post, we explore the performance benefits of using the Amazon EMR runtime for Apache Spark and Apache Iceberg compared to running the same workloads with open source Spark 3.5.1 on Iceberg tables. Additionally, the cost efficiency improves by 2.2 workloads 4.5

Cost-Benefit

Cost-Benefit Testing Optimization Metrics

What Is a Data Catalog?

Alation

FEBRUARY 13, 2020

Data catalogs have quickly become a core component of modern data management. Organizations with successful data catalog implementations see remarkable changes in the speed and quality of data analysis, and in the engagement and enthusiasm of people who need to perform data analysis.

Metadata

Metadata Data Lake Recreation/Entertainment Big Data

How a data fabric overcomes data sprawls to reduce time to insights

IBM Big Data Hub

APRIL 28, 2022

Data agility, the ability to store and access your data from wherever makes the most sense, has become a priority for enterprises in an increasingly distributed and complex environment. That’s where the data fabric comes in. Data fabric in action: Retail supply chain example. enterprises to minimize their time to value.

Data Warehouse

Data Warehouse Metadata Forecasting Predictive Modeling

How ZS built a clinical knowledge repository for semantic search using Amazon OpenSearch Service and Amazon Neptune

AWS Big Data

SEPTEMBER 12, 2024

In this blog post, we will highlight how ZS Associates used multiple AWS services to build a highly scalable, highly performant, clinical document search platform. ZS is a management consulting and technology firm focused on transforming global healthcare.

Unstructured Data

Unstructured Data Metadata Machine Learning Consulting

Operational Database Security – Part 2

Cloudera

SEPTEMBER 23, 2020

We are going to talk about auditing, different security levels, security features of Data Catalog, and Client Considerations. Access audits are mastered centrally in Apache Ranger which provides comprehensive non-repudiable audit log for every access event to every resource with rich access event metadata such as: IP.

Data Lake

Data Lake Metadata IoT Enterprise

Implement column-level encryption to protect sensitive data in Amazon Redshift with AWS Glue and AWS Lambda user-defined functions

AWS Big Data

APRIL 5, 2023

Amazon Redshift is a massively parallel processing (MPP), fully managed petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using existing business intelligence tools. A sample 256-bit data encryption key is generated and securely stored using AWS Secrets Manager.

Data Warehouse

Data Warehouse Management Business Intelligence Testing

Data Governance Program: Ensuring a Successful Delivery

Alation

AUGUST 17, 2022

According to analysts, data governance programs have not shown a high success rate. According to CIOs , historical data governance programs were invasive and suffered from one of two defects: They were either forced on the rank and file — who grew to dislike IT as a result. The Risks of Early Data Governance Programs.

Data Governance

Data Governance Risk Data-driven Measurement

Data Catalog First, Master Data Management Second: Here’s Why

Alation

DECEMBER 21, 2022

Master Data Management (MDM) and data catalog growth are accelerating because organizations must integrate more systems, comply with privacy regulations, and address data quality concerns. What Is Master Data Management (MDM)? Data Catalog and Master Data Management.

Management

Management Data Quality Metadata Testing

Simplify data loading into Type 2 slowly changing dimensions in Amazon Redshift

AWS Big Data

MARCH 9, 2023

Thousands of customers rely on Amazon Redshift to build data warehouses to accelerate time to insights with fast, simple, and secure analytics at scale and analyze data from terabytes to petabytes by running complex analytical queries. The star schema is a popular data model for building data marts.

Slice and Dice

Slice and Dice Data Warehouse Metrics Metadata

Projects in SQL Stream Builder

Cloudera

MAY 1, 2023

Businesses everywhere have engaged in modernization projects with the goal of making their data and application infrastructure more nimble and dynamic. The newly introduced Environments feature allows you to export only the generic, reusable parts of code and resources, while managing environment-specific configuration separately.

Testing

Testing Data Processing Management IT

IBM named a leader in the 2022 Gartner® Magic Quadrant™ for Data Quality Solutions

IBM Big Data Hub

NOVEMBER 4, 2022

Data is the new oil and organizations of all stripes are tapping this resource to fuel growth. However, data quality and consistency are one of the top barriers faced by organizations in their quest to become more data-driven. Unlock quality data with IBM. and its leading data observability offerings.

Data Quality

Data Quality Metadata Data Governance Data-driven

Multi-Channel Attribution: Definitions, Models and a Reality Check

Occam's Razor

APRIL 2, 2012

Included in the post are recommendations for measurement and data analysis. While I'm using the term Store here, it encompasses sales (or leads or catalog requests) driven to a retail store or company call center, people driven to donate blood via online campaigns, or essentially any offline outcome driven by the online channel.

Modeling

Modeling Advertising Marketing Measurement

Alcion supports their multi-tenant platform with Amazon OpenSearch Serverless

AWS Big Data

JULY 25, 2023

This is a guest blog post co-written with Zack Rossman from Alcion. Alcion, a security-first, AI-driven backup-as-a-service (BaaS) platform, helps Microsoft 365 administrators quickly and intuitively protect data from cyber threats and accidental data loss. OpenSearch is an Apache-2.0-licensed, OpenSearch is an Apache-2.0-licensed,

Cost-Benefit

Cost-Benefit Modeling Management Optimization

Data fabric marketplace: The heart of data economy

IBM Big Data Hub

JULY 25, 2022

Modern-day enterprises face a similar situation regarding data assets. On one side there is a need for data. Businesses ask: “Do we have this kind of data in the enterprise?” “How do we get that data?” “Can Can I trust that data?” This discussion is more relevant with the advent of data fabric.

Metadata

Metadata Enterprise Consulting Publishing

Data Mesh vs. Data Fabric: A Love Story

Alation

JANUARY 13, 2022

And now, arguably the greatest rivalry the world (well, at least the data community) has ever witnessed: Data Fabric vs Data Mesh! Data fabric and data mesh are both having a moment. Gartner calls data fabric the Future of Data Management 1. Gartner on Data Fabric. Tyson vs Holyfield.

Data Lake

Data Lake Metadata Data-driven Data Governance

Migrate to CDP Private Cloud Base – A Step by Step Guide

Cloudera

SEPTEMBER 30, 2021

Our recent blog discussed the four paths to get from legacy platforms to CDP Private Cloud Base. In this blog and accompanying video, we will deep dive into the mechanics of running an in-place upgrade from CDH5 or CDH6 to CDP Private Cloud Base. Zookeeper data. HDFS Master Node data directories. Hue dependencies.

Metadata

Metadata Testing Management Data-driven

Data Catalogs: A Category of Their Own

Alation

FEBRUARY 20, 2020

Data catalogs are here to stay. This week, two independent analyst reports validated what we’ve known for years – data catalogs are critical for self-service analytics.[1]. The Forrester Wave : Machine Learning Data Catalogs, Q2 2018. This is Forrester’s inaugural Wave on data catalogs.

Machine Learning

Machine Learning Marketing Reporting Data-driven

Google Analytics Tutorial: 8 Valuable Tips To Hustle With Data!

Occam's Razor

JANUARY 30, 2012

Nourish yourself with the "info snacks" the tool's engineers and product managers cooked up. Ravaging data. Leverage Custom Alerts – Let Data Kick Your Butt Into Action. #3. In-Page Analytics – Re-imagine Traveling Through Data. #5. Exploit every possible button. Produce built-in visualization magic.

Analytics

Analytics Dashboards Metrics Key Performance Indicator

What Is a Data Fabric and How Does a Data Catalog Support It?

Alation

JANUARY 25, 2022

Data fabric is now on the minds of most data management leaders. In our previous blog, Data Mesh vs. Data Fabric: A Love Story , we defined data fabric and outlined its uses and motivations. The data catalog is a foundational layer of the data fabric.

Metadata

Metadata IT Data-driven Metrics

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

Scaling RISE with SAP data and AWS Glue

Webinars

Trending Sources

Scalable analytics and centralized governance for Apache Iceberg tables using Amazon S3 Tables and Amazon Redshift

Webinars

Addressing Data Mesh Technical Challenges with DataOps

Amazon EMR 7.5 runtime for Apache Spark and Iceberg can run Spark workloads 3.6 times faster than Spark 3.5.3 and Iceberg 1.6.1

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

How DataOps is Transforming Commercial Pharma Analytics

Centralize Your Data Processes With a DataOps Process Hub

Data Governance and Metadata Management: You Can’t Have One Without the Other

DataOps Facilitates Remote Work

Data Intelligence and Its Role in Combating Covid-19

Data Governance as an Emergency Service

What’s the Current State of Data Governance and Automation?

Migrate workloads from AWS Data Pipeline

A Reference Architecture for the Cloudera Private Cloud Base Data Platform

How ATPCO enables governed self-service data access to accelerate innovation with Amazon DataZone

Top 10 Metadata Management Influencers, Sites, and Blogs You Must Follow in 2021

The Value of Catalog-Led Data Governance

Synchronize data lakes with CDC-based UPSERT using open table format, AWS Glue, and Amazon MSK

Very Meta … Unlocking Data’s Potential with Metadata Management Solutions

How CFM built a well-governed and scalable data-engineering platform using Amazon EMR for financial features generation

How to Do Data Modeling the Right Way

RDF-Star: Metadata Complexity Simplified

Multicloud data lake analytics with Amazon Athena

Data-Driven Enterprise Architecture: Why Enterprise Architects Need to Look at Data First

Amazon EMR 7.1 runtime for Apache Spark and Iceberg can run Spark workloads 2.7 times faster than Apache Spark 3.5.1 and Iceberg 1.5.2

What Is a Data Catalog?

How a data fabric overcomes data sprawls to reduce time to insights

How ZS built a clinical knowledge repository for semantic search using Amazon OpenSearch Service and Amazon Neptune

Operational Database Security – Part 2

Implement column-level encryption to protect sensitive data in Amazon Redshift with AWS Glue and AWS Lambda user-defined functions

Data Governance Program: Ensuring a Successful Delivery

Data Catalog First, Master Data Management Second: Here’s Why

Simplify data loading into Type 2 slowly changing dimensions in Amazon Redshift

Projects in SQL Stream Builder

IBM named a leader in the 2022 Gartner® Magic Quadrant™ for Data Quality Solutions

Multi-Channel Attribution: Definitions, Models and a Reality Check

Alcion supports their multi-tenant platform with Amazon OpenSearch Serverless

Data fabric marketplace: The heart of data economy

Data Mesh vs. Data Fabric: A Love Story

Migrate to CDP Private Cloud Base – A Step by Step Guide

Data Catalogs: A Category of Their Own

Google Analytics Tutorial: 8 Valuable Tips To Hustle With Data!

What Is a Data Fabric and How Does a Data Catalog Support It?

Stay Connected