To achieve this, they aimed to break down data silos and centralize data from various business units and countries into the BMW Cloud Data Hub (CDH). However, the initial version of CDH supported only coarse-grained access control to entire data assets, and hence it was not possible to scope access to data asset subsets.
Amazon DataZone now supports authentication through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.
Over the years, organizations have invested in creating purpose-built, cloud-based data lakes that are siloed from one another. A major challenge is enabling cross-organization discovery and access to data across these multiple data lakes, each built on different technology stacks.
Customers of all sizes and industries use Amazon Simple Storage Service (Amazon S3) to store data globally for a variety of use cases. Customers want to know how their data is being accessed, when it is being accessed, and who is accessing it. With exponential growth in data volume, centralized monitoring becomes challenging.
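As a minimal sketch of one way to get this visibility (bucket names are hypothetical, and the log bucket's policy must grant the S3 logging service permission to write), S3 server access logging can be enabled with boto3:

```python
import boto3

# Hypothetical bucket names for illustration.
SOURCE_BUCKET = "my-data-bucket"
LOG_BUCKET = "my-access-log-bucket"

s3 = boto3.client("s3")

# Enable S3 server access logging so every request against the source
# bucket is recorded in the log bucket for later analysis.
s3.put_bucket_logging(
    Bucket=SOURCE_BUCKET,
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": LOG_BUCKET,
            "TargetPrefix": f"access-logs/{SOURCE_BUCKET}/",
        }
    },
)
```

The logs can then be queried centrally (for example, with Athena) to answer who accessed what and when.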
Today, we’re making available a new capability of the AWS Glue Data Catalog that allows generating column-level statistics for AWS Glue tables. Data lakes are designed for storing vast amounts of raw, unstructured, or semi-structured data at low cost, and organizations share those datasets across multiple departments and teams.
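As a hedged illustration of this capability (database, table, and role names are hypothetical), the statistics task can be started and its results read back with boto3:

```python
import boto3

glue = boto3.client("glue")

# Kick off a column-statistics task for a Data Catalog table; the role
# must allow Glue to read the table's underlying data.
glue.start_column_statistics_task_run(
    DatabaseName="sales_db",
    TableName="orders",
    Role="arn:aws:iam::123456789012:role/GlueStatsRole",
)

# Later, read back the computed statistics for specific columns.
stats = glue.get_column_statistics_for_table(
    DatabaseName="sales_db",
    TableName="orders",
    ColumnNames=["order_id", "order_total"],
)
print(stats["ColumnStatisticsList"])
```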
The Role of Catalog in Data Security. Recently, I dug in with CIOs on the topic of data security. What came as no surprise was the importance CIOs place on taking a broader approach to data protection. The Role of the CISO in Data Governance and Security.
Apache Hudi is an open table format that brings database and data warehouse capabilities to data lakes. Apache Hudi helps data engineers manage complex challenges, such as managing continuously evolving datasets with transactions while maintaining query performance.
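A minimal PySpark sketch of a Hudi upsert, assuming a Spark runtime with the Hudi bundle available (table name, field names, and the S3 path are hypothetical):

```python
from pyspark.sql import SparkSession

# Assumes a Spark runtime with the Hudi bundle on the classpath
# (for example, AWS Glue or Amazon EMR with Hudi enabled).
spark = SparkSession.builder.appName("hudi-upsert-sketch").getOrCreate()

df = spark.createDataFrame(
    [("o-1", "2024-01-01", 100.0)],
    ["order_id", "updated_at", "amount"],
)

# The record key identifies a row; the precombine field decides which
# version of a record wins when an upsert sees duplicates.
hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

df.write.format("hudi").options(**hudi_options).mode("append").save(
    "s3://my-bucket/lake/orders/"
)
```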
Apache Flink is a scalable, reliable, and efficient data processing framework that handles real-time streaming and batch workloads (but is most commonly used for real-time streaming). AWS recently announced that Apache Flink is generally available for Amazon EMR on Amazon Elastic Kubernetes Service (EKS).
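For flavor, here is a minimal PyFlink streaming job; on EMR on EKS, a job like this would be packaged and submitted through the Flink Kubernetes operator rather than run locally, and the toy collection source stands in for a real stream such as Kafka or Kinesis:

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
env.set_parallelism(1)

# Toy bounded source standing in for a real event stream.
ds = env.from_collection([(1, "click"), (2, "view"), (3, "click")])

# Keep only click events and tag them as processed.
clicks = ds.filter(lambda e: e[1] == "click").map(lambda e: (e[0], "processed"))
clicks.print()

env.execute("flink-sketch")
```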
This is part of our series of blog posts on recent enhancements to Impala. Apache Impala is synonymous with high-performance processing of extremely large datasets, but what if our data isn’t huge? It turns out that Apache Impala scales down with data just as well as it scales up.
The company uses AWS Cloud services to build data-driven products and scale engineering best practices. Acast found itself with diverse business units and a vast amount of data generated across the organization.
To help develop a data-driven culture, everyone inside an organization can use Amazon DataZone. This post guides you through the process of setting up Okta as an identity provider for signing in users to Amazon DataZone.
It’s time to consider data-driven enterprise architecture. The traditional approach to enterprise architecture (the analysis, design, planning, and implementation of IT capabilities for the successful execution of enterprise strategy) seems to be missing something: data. That’s right.
This blog post is co-written with Raj Samineni from ATPCO. In today’s data-driven world, companies across industries recognize the immense value of data in making decisions, driving innovation, and building new products to serve their customers.
Amazon DataZone enables customers to discover, access, share, and govern data at scale across organizational boundaries, reducing the undifferentiated heavy lifting of making data and analytics tools accessible to everyone in the organization. This is challenging because access to data is managed differently by each of the tools.
With this update, domain owners can define and enforce metadata requirements for data consumers when they request access to data assets. By making it mandatory for data consumers to provide specific metadata, domain owners can achieve compliance, meet organizational standards, and support audit and reporting needs.
AWS Data Pipeline helps customers automate the movement and transformation of data. With Data Pipeline, customers can define data-driven workflows, so that tasks can be dependent on the successful completion of previous tasks. Some customers want a deeper level of control and specificity than is possible with Data Pipeline.
In recent years, driven by the commoditization of data storage and processing solutions, the industry has seen a growing number of systematic investment management firms switch to alternative data sources to drive their investment decisions. CFM was first opened to investors in 1995, and its assets under management are now $13 billion.
Metadata management is key to wringing all the value possible from data assets. However, most organizations don’t use all the data at their disposal to reach deeper conclusions about how to drive revenue, achieve regulatory compliance or accomplish other strategic objectives. Quite simply, metadata is data about data.
We also delve into details on how to configure data sources and subscription targets for a project using a custom AWS service blueprint. New feature: Custom AWS service blueprints. Previously, Amazon DataZone provided default blueprints that created AWS resources required for data lake, data warehouse, and machine learning use cases.
In her current role as VP of UX, Design & Research at Sigma Computing, she deploys human-centric design to support data democratization and analysis. Less than 40 percent of Fortune 1000 companies are managing data as an asset and only 24 percent of executives consider their organization to be data-driven.
Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.
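One common way to handle record-level changes is a MERGE against an open table format. This hedged sketch assumes an Iceberg table registered in a Spark catalog named glue_catalog; table and column names are hypothetical:

```python
from pyspark.sql import SparkSession

# Assumes Spark is configured with an Iceberg catalog named "glue_catalog".
spark = SparkSession.builder.appName("cdc-merge-sketch").getOrCreate()

# A staged batch of change records: 'U' rows carry new values,
# 'D' rows are delete markers from the upstream database.
changes = spark.createDataFrame(
    [("c-1", "U", "alice@example.com"), ("c-2", "D", None)],
    ["customer_id", "op", "email"],
)
changes.createOrReplaceTempView("cdc_batch")

# Apply the batch at the record level in one atomic MERGE:
# delete tombstones, update matches, insert new rows.
spark.sql("""
    MERGE INTO glue_catalog.sales.customers AS t
    USING cdc_batch AS s
      ON t.customer_id = s.customer_id
    WHEN MATCHED AND s.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET t.email = s.email
    WHEN NOT MATCHED AND s.op <> 'D' THEN
      INSERT (customer_id, email) VALUES (s.customer_id, s.email)
""")
```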
Many organizations operate data lakes spanning multiple cloud data stores. In these cases, you may want an integrated query layer to seamlessly run analytical queries across these diverse cloud stores and streamline your data analytics processes. To achieve this, Oktank envisions a unified data query layer using Athena.
Companies are increasingly seeking ways to complement their data with external business partners’ data to build, maintain, and enrich their holistic view of their business at the consumer level. For the purpose of this blog, we will be focusing only on SQL queries.
When you build your transactional data lake using Apache Iceberg to solve your functional use cases, you also need to focus on operational use cases for your S3 data lake to optimize the production environment. For example, the catalog property s3.delete-enabled, which is set to true by default, can be turned off together with s3.delete.tags so that files are tagged rather than hard-deleted.
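A rough sketch of how these properties might be set on a Spark Iceberg catalog (the catalog name, warehouse path, and tag key/value are hypothetical):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-delete-tags-sketch")
    .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.my_catalog.warehouse", "s3://my-bucket/warehouse/")
    # Skip hard deletes (the property defaults to true)...
    .config("spark.sql.catalog.my_catalog.s3.delete-enabled", "false")
    # ...and tag the objects instead, so S3 lifecycle rules can expire them.
    .config("spark.sql.catalog.my_catalog.s3.delete.tags.deleted", "true")
    .getOrCreate()
)
```

Pairing the tag with an S3 lifecycle rule gives a soft-delete window before storage is actually reclaimed.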
In this blog post, we will highlight how ZS Associates used multiple AWS services to build a highly scalable, highly performant, clinical document search platform. We use leading-edge analytics, data, and science to help clients make intelligent decisions.
This is a guest blog post co-written with Sumesh M R from Cargotec and Tero Karttunen from Knowit Finland. Cargotec captures terabytes of IoT telemetry data from their machinery operated by numerous customers across the globe. Cargotec’s use cases also required them to create views that span tables and views across catalogs.
AWS Glue is a serverless data integration service that makes it straightforward to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. Furthermore, each node (driver or worker) in an AWS Glue job requires an IP address assigned from the subnet.
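Since each node consumes a subnet IP, a small pre-flight check helps avoid jobs failing to start. A hedged sketch with boto3 (the subnet ID and worker count are hypothetical):

```python
import boto3

# Hypothetical subnet ID and job sizing. Each Glue node (driver or
# worker) needs one free IP address in the connection's subnet.
SUBNET_ID = "subnet-0123456789abcdef0"
NUMBER_OF_WORKERS = 10

ec2 = boto3.client("ec2")
subnet = ec2.describe_subnets(SubnetIds=[SUBNET_ID])["Subnets"][0]
free_ips = subnet["AvailableIpAddressCount"]

if free_ips < NUMBER_OF_WORKERS:
    print(f"Only {free_ips} free IPs in {SUBNET_ID}; the job may fail to start.")
else:
    print(f"{free_ips} free IPs available; enough for {NUMBER_OF_WORKERS} workers.")
```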
With the rapid growth of technology, more and more data is arriving in many different formats: structured, semi-structured, and unstructured. Near-real-time analytics on operational data is becoming a common need, addressed by a new version of AWS Glue that accelerates data integration workloads in AWS.
Amazon Redshift and Tableau empower data analysis. Amazon Redshift is a cloud data warehouse that processes complex queries at scale and with speed. Tableau’s extensive capabilities and enterprise connectivity help analysts efficiently prepare, explore, and share data insights company-wide.
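As a hedged aside, queries can also be run against Redshift programmatically through the Redshift Data API, with no JDBC connection to manage (the workgroup, database, and table names are hypothetical):

```python
import boto3

rsd = boto3.client("redshift-data")

# Submit SQL to a Redshift Serverless workgroup; the call returns
# immediately with a statement ID that can be polled for results.
resp = rsd.execute_statement(
    WorkgroupName="analytics-wg",
    Database="dev",
    Sql="SELECT venue_name, SUM(sales) FROM ticket_sales GROUP BY venue_name",
)
print("Statement ID:", resp["Id"])
```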
In February 2024, we announced the release of the Data Solutions Framework (DSF), an opinionated open source framework for building data solutions on AWS. In this post, we demonstrate how to use the AWS CDK and DSF to create a multi-data warehouse platform based on Amazon Redshift Serverless.
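A minimal sketch of such a platform's core resources, using the CDK's low-level CfnNamespace and CfnWorkgroup constructs rather than DSF's higher-level ones (all names and the capacity value are hypothetical):

```python
from aws_cdk import App, Stack
from aws_cdk import aws_redshiftserverless as redshiftserverless
from constructs import Construct


class WarehouseStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # A namespace holds databases and users; a workgroup provides
        # the compute (sized in Redshift Processing Units, RPUs).
        namespace = redshiftserverless.CfnNamespace(
            self, "Namespace", namespace_name="analytics-ns"
        )
        redshiftserverless.CfnWorkgroup(
            self,
            "Workgroup",
            workgroup_name="analytics-wg",
            namespace_name=namespace.namespace_name,
            base_capacity=8,
        )


app = App()
WarehouseStack(app, "WarehouseStack")
app.synth()
```

Repeating the namespace/workgroup pair per team is one simple way to fan this out into a multi-warehouse platform.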
There was a time when most CIOs would never consider putting their crown jewels (that is, customer data and associated analytics) into the cloud. And what must organizations overcome to succeed at cloud data warehousing? What Are the Biggest Drivers of Cloud Data Warehousing? The cloud is no longer synonymous with risk.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
The legacy IT infrastructure that runs business operations, mainly data centers, has a deadline to shift to cloud-based services. The public cloud is increasingly becoming the preferred platform for hosting data analytics-related projects, such as business intelligence, machine learning (ML), and AI applications. Why FinOps?
Enterprise data architects, data engineers, and business leaders from around the globe gathered in New York last week for the 3-day Strata Data Conference, which featured new technologies, innovations, and many collaborative ideas. When data becomes information, many (incremental) use cases surface.
On Thursday, January 6th, I hosted Gartner’s 2022 Leadership Vision for Data and Analytics webinar. In the webinar and the Leadership Vision deck for Data and Analytics, we called out AI engineering as a big trend. I would take a look at our Top Trends for Data and Analytics 2021 for additional AI, ML, and related trends.
The opportunity to work with many clients on their data fabric journey continues to drive and inspire us to achieve even greater heights with our solutions. Key attributes of IBM’s approach to data fabric. Let’s take a look at IBM’s take on some of the specific strengths recognized in Forrester’s Wave below.
Lots of innovation is happening, with new technologies emerging in areas such as data and AI, payments, cybersecurity and risk management, to name a few.
Amazon SageMaker Lakehouse is a unified, open, and secure data lakehouse that now seamlessly integrates with Amazon S3 Tables, the first cloud object store with built-in Apache Iceberg support. You can then query, analyze, and join the data using Redshift, Amazon Athena, Amazon EMR, and AWS Glue.
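As a hedged sketch of the query side (the database, table, and results location are hypothetical), a query over a table surfaced through the catalog can be submitted to Athena with boto3:

```python
import boto3

athena = boto3.client("athena")

# Submit a query; Athena runs it asynchronously and writes results
# to the configured S3 output location.
resp = athena.start_query_execution(
    QueryString="SELECT COUNT(*) FROM sales.orders",
    QueryExecutionContext={"Database": "sales"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print("Query execution ID:", resp["QueryExecutionId"])
```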
They help our customers architect their modern data stack, tying in Snowflake or Fivetran where they’re needed, for example. They’re also instrumental in connecting us to the key decision-makers that need a data intelligence platform. Please tell us about your background. They can trust us.
Yesterday, we announced Amazon SageMaker Unified Studio (Preview), an integrated experience for all your data and AI, and Amazon SageMaker Lakehouse, which unifies data from Amazon Simple Storage Service (Amazon S3) to third-party sources such as Snowflake. First, end users often have to set up connections to data sources on their own.
Organizations are building data-driven applications to guide business decisions, improve agility, and drive innovation. Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services.
In our previous thought leadership blog post, Why a Cloud Operating Model, we defined a COE Framework and showed why MuleSoft implemented it and the benefits they received from it. We also implemented a layered approach that included collection, preparation, and enrichment, making it straightforward to identify areas that affect data accuracy.
To further enhance their B2B marketing capabilities, organizations are now looking to fully use their marketing data for more informed decision-making and strategy optimization. The agile, serverless nature of AWS Glue meets a range of data analytics needs while reducing costs.
This is a joint blog post co-authored with Martin Mikoleizig from Volkswagen Autoeuropa. Volkswagen Autoeuropa aims to become a data-driven factory and has been using cutting-edge technologies to enhance digitalization efforts. The lead time to access data was often from several days to weeks.