Data Governance, Data Lake and Reference

Unleash deeper insights with Amazon Redshift data sharing for data lake tables

AWS Big Data

OCTOBER 10, 2024

Over the years, this customer-centric approach has led to the introduction of groundbreaking features such as zero-ETL , data sharing , streaming ingestion , data lake integration , Amazon Redshift ML , Amazon Q generative SQL , and transactional data lake capabilities.

Data Lake

Data Lake Data Warehouse Recreation/Entertainment Data-driven

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

AWS Big Data

AUGUST 15, 2024

Unlocking the true value of data often gets impeded by siloed information. Traditional data management—wherein each business unit ingests raw data in separate data lakes or warehouses—hinders visibility and cross-functional analysis. Amazon DataZone natively supports data sharing for Amazon Redshift data assets.

Data Lake

Data Lake Data Warehouse Data Governance Publishing

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. Data governance is a critical building block across all these approaches, and we see two emerging areas of focus.

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

AWS Big Data

OCTOBER 30, 2024

With this integration, you can now seamlessly query your governed data lake assets in Amazon DataZone using popular business intelligence (BI) and analytics tools, including partner solutions like Tableau. Refer to the detailed blog post on how you can use this to connect through various other tools.

Analytics

Analytics Visualization Data Governance Data-driven

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Statistics Optimization

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune and dbt

AWS Big Data

DECEMBER 12, 2024

One-time and complex queries are two common scenarios in enterprise data analytics. Complex queries, on the other hand, refer to large-scale data processing and in-depth analysis based on petabyte-level data warehouses in massive data scenarios. file, enter the preprocessing code for the raw lineage data.

Snapshot

Snapshot Recreation/Entertainment Experimentation Data Lake

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

In this post, we delve into the key aspects of using Amazon EMR for modern data management, covering topics such as data governance, data mesh deployment, and streamlined data discovery. Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

Thank You Snowflake for Naming Alation the Data Governance Partner of the Year

Alation

JUNE 17, 2021

Leading companies like Cisco, Nielsen, and Finnair turn to Alation + Snowflake for data governance and analytics. By joining forces, we can build more potent, tailored solutions that leverage data governance as a competitive asset. Lastly, active data governance simplifies stewardship tasks of all kinds.

Data Governance

Data Governance Data Lake Insurance Enterprise

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

AWS Big Data

OCTOBER 10, 2023

Data governance is the process of ensuring the integrity, availability, usability, and security of an organization’s data. Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.

Data Quality

Data Quality Data Governance Data Lake Testing

Apply enterprise data governance and management using AWS Lake Formation and AWS IAM Identity Center

AWS Big Data

SEPTEMBER 26, 2024

In this blog post, there are three personas: Data Lake Administrator (with admin level access) User Silver from the Data Engineering group User Lead Auditor from the Auditor group. You will see how different personas in an organization can access the data without the need to modify their existing enterprise entitlements.

Data Governance

Data Governance Enterprise Management Data Lake

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts.

Data Lake

Data Lake Data Warehouse Marketing Management

Data Virtualization Brings Data Together Quickly and Easily

David Menninger's Analyst Perspectives

OCTOBER 7, 2021

In this post, I don’t want to debate the meanings and origins of different terms; rather, I’d like to highlight a technology weapon that you should have in your data management arsenal. We currently refer to this technology as data virtualization.

Technology

Technology Management Data Lake IT

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

Data governance is the collection of policies, processes, and systems that organizations use to ensure the quality and appropriate handling of their data throughout its lifecycle for the purpose of generating business value.

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

Denodo Provides a Logical Approach to Data Management

David Menninger's Analyst Perspectives

OCTOBER 24, 2024

Although the terms data fabric and data mesh are often used interchangeably, I previously explained that they are distinct but complementary. Denodo remains a specialist data management software provider and in September 2023 announced that it had received a $336 million investment from asset management firm TPG.

Management

Management Data-driven Data Governance Data Lake

AWS Lake Formation 2023 year in review

AWS Big Data

JANUARY 18, 2024

AWS Lake Formation and the AWS Glue Data Catalog form an integral part of a data governance solution for data lakes built on Amazon Simple Storage Service (Amazon S3) with multiple AWS analytics services integrating with them. To learn more about DataZone, refer to the User Guide. Crawlers, salut!

Data Lake

Data Lake Metadata Data Governance Statistics

Amazon DataZone announces custom blueprints for AWS services

AWS Big Data

JUNE 26, 2024

New feature: Custom AWS service blueprints Previously, Amazon DataZone provided default blueprints that created AWS resources required for data lake, data warehouse, and machine learning use cases. If you’re new to Amazon DataZone, refer to Getting started.

Data Lake

Data Lake Data Warehouse Unstructured Data Data Governance

Driving Business Value and ROI from a Hybrid Cloud Data Lake

Alation

FEBRUARY 20, 2020

For many enterprises, a hybrid cloud data lake is no longer a trend, but becoming reality. With an on-premise deployment, enterprises have full control over data security, data access, and data governance. Data that needs to be tightly controlled (e.g. The Problem with Hybrid Cloud Environments.

Data Lake

Data Lake ROI Metadata Cost-Benefit

Access Amazon Redshift data from Salesforce Data Cloud with Zero Copy Data Federation

AWS Big Data

JUNE 25, 2024

This post is co-authored by Vijay Gopalakrishnan, Director of Product, Salesforce Data Cloud. In today’s data-driven business landscape, organizations collect a wealth of data across various touch points and unify it in a central data warehouse or a data lake to deliver business insights.

Data Lake

Data Lake Cost-Benefit Data-driven Data Warehouse

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

AWS Big Data

OCTOBER 21, 2024

However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to address these challenges and foster a data-driven culture.

Sales

Sales Data-driven Data Processing Key Performance Indicator

Foundational blocks of Amazon SageMaker Unified Studio: An admin’s guide to implement unified access to all your data, analytics, and AI

AWS Big Data

FEBRUARY 13, 2025

Refer to the appendix at the end of this post for more details. To organize the data assets within the organization, the admin logs in to the SageMaker Unified Studio URL and creates domain units aligned with the business divisions. Refer to the appendix at the end of this post for more details.

Data Analytics

Data Analytics Analytics Modeling Management

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Solutions data architect: These individuals design and implement data solutions for specific business needs, including data warehouses, data marts, and data lakes. Application data architect: The application data architect designs and implements data models for specific software applications.

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

Constructing A Digital Transformation Strategy: Putting the Data in Digital Transformation

erwin

JULY 17, 2019

The solution is data intelligence. It improves IT and business data literacy and knowledge, supporting enterprise data governance and business enablement. Organizations need a real-time, accurate picture of the metadata landscape to: Discover data – Identify and interrogate metadata from various data management silos.

Digital Transformation

Digital Transformation Strategy Metadata Data-driven

Why the Data Journey Manifesto?

DataKitchen

JUNE 12, 2023

I spent much time de-categorizing DataOps: we are not discussing ETL, Data Lake, or Data Science. Today we have had over 20,000 signatures , millions of page views, and copycat clones, and it is frequently used as a reference guide. It’s Customer Journey for data analytic systems.

Testing

Testing Dashboards Data Lake Data Science

Migrate data from Azure Blob Storage to Amazon S3 using AWS Glue

AWS Big Data

OCTOBER 20, 2023

Today, we are pleased to announce new AWS Glue connectors for Azure Blob Storage and Azure Data Lake Storage that allow you to move data bi-directionally between Azure Blob Storage, Azure Data Lake Storage, and Amazon Simple Storage Service (Amazon S3). option("header","true").load("wasbs://yourblob@youraccountname.blob.core.windows.net/loadingtest-input/100mb")

Data Lake

Data Lake Big Data Data Warehouse Consulting

AWS Glue crawlers support cross-account crawling to support data mesh architecture

AWS Big Data

MARCH 27, 2023

Data lakes have come a long way, and there’s been tremendous innovation in this space. Today’s modern data lakes are cloud native, work with multiple data types, and make this data easily available to diverse stakeholders across the business. In the navigation pane, under Data catalog , choose Settings.

Data Lake

Data Lake Data-driven Management Data Architecture

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

Flexible and easy to use – The solutions should provide less restrictive, easy-to-access, and ready-to-use data. A data hub is a center of data exchange that constitutes a hub of data repositories and is supported by data engineering, data governance, security, and monitoring services.

Analytics

Analytics Data Warehouse Data Lake Metadata

5 Ways Data Engineers Can Support Data Governance

Alation

JANUARY 26, 2023

These data requirements could be satisfied with a strong data governance strategy. Governance can — and should — be the responsibility of every data user, though how that’s achieved will depend on the role within the organization. How can data engineers address these challenges directly?

Data Governance

Data Governance Strategy Data Quality Data Collection

News and Announcements from Tableau and TC18

David Menninger's Analyst Perspectives

NOVEMBER 21, 2018

Once again I attended Tableau's Users Conference, along with 17,000 other attendees, affectionately self-referred to as "data nerds".

Data Lake

Data Lake Data Integration Data Governance Software

Your 5-Step Journey from Analytics to AI

CIO Business Intelligence

MARCH 22, 2022

Which type(s) of storage consolidation you use depends on the data you generate and collect. . One option is a data lake—on-premises or in the cloud—that stores unprocessed data in any type of format, structured or unstructured, and can be queried in aggregate. Set up unified data governance rules and processes.

Analytics

Analytics Key Performance Indicator Data Warehouse Data-driven

How Novo Nordisk built distributed data governance and control at scale

AWS Big Data

APRIL 28, 2023

The first post of this series describes the overall architecture and how Novo Nordisk built a decentralized data mesh architecture, including Amazon Athena as the data query engine. The third post will show how end-users can consume data from their tool of choice, without compromising data governance.

Data Governance

Data Governance Management Data-driven Analytics

Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics

AWS Big Data

MARCH 27, 2024

The solution uses AWS services such as AWS HealthLake , Amazon Redshift , Amazon Kinesis Data Streams , and AWS Lake Formation to build a 360 view of patients. Simultaneously and as part of AWS HealthLake managed service, the nested JSON FHIR data undergoes an ETL process and is stored in Apache Iceberg open table format in Amazon S3.

Data Analytics

Data Analytics Analytics Data Warehouse Data Lake

Introducing AWS Glue crawler and create table support for Apache Iceberg format

AWS Big Data

AUGUST 16, 2023

Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. For more details, refer to Creating Apache Iceberg tables. If this is the first time accessing the Lake Formation console, add yourself as the data lake administrator.

Data Lake

Data Lake Metadata Snapshot Management

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. Amazon DataZone allows you to simply and securely govern end-to-end data assets stored in your Amazon Redshift data warehouses or data lakes cataloged with the AWS Glue data catalog.

Metadata

Metadata Data Lake Data Processing Data-driven

Simplify external object access in Amazon Redshift using automatic mounting of the AWS Glue Data Catalog

AWS Big Data

JULY 28, 2023

Amazon Redshift now makes it easier for you to run queries in AWS data lakes by automatically mounting the AWS Glue Data Catalog. You no longer have to create an external schema in Amazon Redshift to use the data lake tables cataloged in the Data Catalog.

Data Lake

Data Lake Data Governance Data Warehouse Modeling

Why Spreadsheets Are Your Secret Weapon for Efficient Data Governance

Alation

APRIL 6, 2023

Data governance is traditionally applied to structured data assets that are most often found in databases and information systems. This blog focuses on governing spreadsheets that contain data, information, and metadata, and must themselves be governed.

Data Governance

Data Governance Metadata Cost-Benefit Structured Data

Cross-account data collaboration with Amazon DataZone and AWS analytical tools

AWS Big Data

MARCH 5, 2025

In this solution (as shown in the preceding figure), the AWS account that contains the data assets is referred to as the producer account. The AWS account that needs to access or use the data from the producer account is referred to as the consumer account. You will then publish the data assets from these data sources.

Analytics

Analytics Publishing Metadata Sales

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

erwin

AUGUST 31, 2023

Agile Modeling : Agile methodologies introduced flexibility, allowing data models to evolve alongside changing business needs. Data Governance : Incorporating data governance principles enabled better data management, ensuring data accuracy, consistency, and compliance.

Data-driven

Data-driven Modeling Enterprise Structured Data

What is an open data lakehouse and why you should care?

IBM Big Data Hub

JANUARY 17, 2023

A data lakehouse is an emerging data management architecture that improves efficiency and converges data warehouse and data lake capabilities driven by a need to improve efficiency and obtain critical insights faster. Let’s start with why data lakehouses are becoming increasingly important.

Data Lake

Data Lake Metadata Data Warehouse Data Governance

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

AWS Big Data

AUGUST 19, 2024

Refer to Enabling AWS PrivateLink in the Snowflake documentation to verify the steps, required access level, and service level to set the configurations. For Data sources , search for and select Snowflake. To obtain the Snowflake PrivateLink account URL, refer to parameters obtained in the prerequisites. Choose Next.

Analytics

Analytics Data-driven Data Integration Data Lake

Amazon DataZone introduces OpenLineage-compatible data lineage visualization in preview

AWS Big Data

JULY 8, 2024

Introduction to OpenLineage compatible data lineage The need to capture data lineage consistently across various analytical services and combine them into a unified object model is key in uncovering insights from the lineage artifact. To learn more, refer to Creating inventory and published data in Amazon DataZone.

Visualization

Visualization Metadata Publishing Sales

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Big Data

JANUARY 24, 2023

AWS Lake Formation helps with enterprise data governance and is important for a data mesh architecture. It works with the AWS Glue Data Catalog to enforce data access and governance. This solution only replicates metadata in the Data Catalog, not the actual underlying data.

Data Architecture

Data Architecture Metadata Data Lake Snapshot

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data) then enterprise-wide data lakes versus smaller, typically BU-Specific, “data ponds”.

Metadata

Metadata Cost-Benefit Enterprise Interactive

Handle UPSERT data operations using open-source Delta Lake and AWS Glue

AWS Big Data

JANUARY 30, 2023

Many customers need an ACID transaction (atomic, consistent, isolated, durable) data lake that can log change data capture (CDC) from operational data sources. There is also demand for merging real-time data into batch data. Delta Lake framework provides these two capabilities.

Insurance

Insurance Data Lake Data-driven Management

Data democratization: How data architecture can drive business decisions and AI initiatives

IBM Big Data Hub

AUGUST 4, 2023

In turn, they both must also have the data literacy skills to be able to verify the data’s accuracy, ensure its security, and provide or follow guidance on when and how it should be used. Then, it applies these insights to automate and orchestrate the data lifecycle.

Data Architecture

Data Architecture Data Lake Machine Learning Data Governance

Unleash deeper insights with Amazon Redshift data sharing for data lake tables

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

Webinars

Trending Sources

Data governance in the age of generative AI

Webinars

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

Choosing an open table format for your transactional data lake on AWS

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune and dbt

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Thank You Snowflake for Naming Alation the Data Governance Partner of the Year

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

Apply enterprise data governance and management using AWS Lake Formation and AWS IAM Identity Center

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Data Virtualization Brings Data Together Quickly and Easily

AWS Lake Formation 2022 year in review

Denodo Provides a Logical Approach to Data Management

AWS Lake Formation 2023 year in review

Amazon DataZone announces custom blueprints for AWS services

Driving Business Value and ROI from a Hybrid Cloud Data Lake

Access Amazon Redshift data from Salesforce Data Cloud with Zero Copy Data Federation

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

Foundational blocks of Amazon SageMaker Unified Studio: An admin’s guide to implement unified access to all your data, analytics, and AI

What is a data architect? Skills, salaries, and how to become a data framework master

Constructing A Digital Transformation Strategy: Putting the Data in Digital Transformation

Why the Data Journey Manifesto?

Migrate data from Azure Blob Storage to Amazon S3 using AWS Glue

AWS Glue crawlers support cross-account crawling to support data mesh architecture

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

5 Ways Data Engineers Can Support Data Governance

News and Announcements from Tableau and TC18

Your 5-Step Journey from Analytics to AI

How Novo Nordisk built distributed data governance and control at scale

Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics

Introducing AWS Glue crawler and create table support for Apache Iceberg format

Governing data in relational databases using Amazon DataZone

Simplify external object access in Amazon Redshift using automatic mounting of the AWS Glue Data Catalog

Why Spreadsheets Are Your Secret Weapon for Efficient Data Governance

Cross-account data collaboration with Amazon DataZone and AWS analytical tools

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

What is an open data lakehouse and why you should care?

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

Amazon DataZone introduces OpenLineage-compatible data lineage visualization in preview

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Handle UPSERT data operations using open-source Delta Lake and AWS Glue

Data democratization: How data architecture can drive business decisions and AI initiatives

Stay Connected