Amazon EMR provides a big data environment for data processing, interactive analysis, and machine learning using open source frameworks such as Apache Spark, Apache Hive, and Presto. Although LLMs can generate syntactically correct SQL queries, they still need the table metadata to write accurate SQL queries.
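As a sketch of that point, table metadata can be injected into the prompt before asking a model for SQL. Everything below (table names, columns, prompt wording) is hypothetical:

```python
# Hypothetical sketch: assemble table metadata into an LLM prompt so the
# model can generate an accurate SQL query. Table and column names here
# are illustrative, not from any real catalog.

def build_sql_prompt(question: str, table_ddl: dict) -> str:
    """Render table metadata as CREATE TABLE statements and prepend it
    to the user's question."""
    schema_lines = []
    for table, columns in table_ddl.items():
        schema_lines.append(f"CREATE TABLE {table} ({', '.join(columns)});")
    schema = "\n".join(schema_lines)
    return (
        "Given the following tables:\n"
        f"{schema}\n"
        f"Write a SQL query that answers: {question}"
    )

prompt = build_sql_prompt(
    "How many orders were placed per customer?",
    {"orders": ["order_id INT", "customer_id INT", "order_date DATE"]},
)
print(prompt.splitlines()[1])
```

The schema text would be sent to the model alongside the question; without it, the model has to guess table and column names.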
In this context, the adoption of data lakes and the data mesh framework emerges as a powerful approach. This service supports consolidated billing and subscription management, offering you the flexibility to explore 1,000 free datasets and samples.
Open table formats are emerging in the rapidly evolving domain of big data management, fundamentally altering the landscape of data storage and analysis. By providing a standardized framework for data representation, open table formats break down data silos, enhance data quality, and accelerate analytics at scale.
To help you prepare for 2020, we’ve compiled some of the most popular data governance and metadata management blog posts from the erwin Experts from this year. The Best Data Governance and Metadata Management Blog Posts of 2019. Four Use Cases Proving the Benefits of Metadata-Driven Automation.
When an organization’s data governance and metadata management programs work in harmony, then everything is easier. Creating and sustaining an enterprise-wide view of and easy access to underlying metadata is also a tall order. Metadata Management Takes Time. Finding metadata, “the data about the data,” isn’t easy.
With all these diverse metadata sources, it is difficult to understand the complicated web they form much less get a simple visual flow of data lineage and impact analysis. The metadata-driven suite automatically finds, models, ingests, catalogs and governs cloud data assets. But let’s be honest – no one likes to move.
Monitoring and tracking issues in the data management lifecycle are essential for achieving operational excellence in data lakes. This is where Apache Iceberg comes into play, offering a new approach to data lake management. You will learn about an open-source solution that can collect important metrics from the Iceberg metadata layer.
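As a minimal illustration of the kind of metric collection described above, the snapshot `summary` map that Iceberg records in its table metadata can be aggregated directly. The snapshot values below are made up; only the summary field names (`added-data-files`, `added-records`) follow the Iceberg table spec:

```python
# Hypothetical sketch of pulling simple metrics out of Iceberg snapshot
# metadata. Each snapshot in the table metadata file carries a "summary"
# map of string-valued counters; the numbers here are illustrative.

snapshots = [
    {"snapshot-id": 1, "summary": {"added-data-files": "12", "added-records": "48000"}},
    {"snapshot-id": 2, "summary": {"added-data-files": "3", "added-records": "9000"}},
]

def total_metric(snapshots, key):
    """Sum a numeric summary field across snapshots (values are stored
    as strings in Iceberg metadata, so cast to int first)."""
    return sum(int(s["summary"].get(key, 0)) for s in snapshots)

print(total_metric(snapshots, "added-data-files"))  # 15
print(total_metric(snapshots, "added-records"))     # 57000
```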
In this post, we will explain the definition, connection, and differences between data warehousing and business intelligence, provide a BI architecture diagram that visually explains how these terms relate, and describe the framework on which they operate. BI Architecture Framework In Modern Business. Learn right here!
Enterprises are trying to manage data chaos. Then there’s unstructured data with no contextual framework to govern data flows across the enterprise, not to mention time-consuming manual data preparation and limited views of data lineage. They might have 300 applications, with 50 different databases and a different schema for each one.
However, more than 50 percent say they have deployed metadata management, data analytics, and data quality solutions. erwin Named a Leader in Gartner 2019 Metadata Management Magic Quadrant. Top Five: Benefits of An Automation Framework for Data Governance. Stop Wasting Your Time. appeared first on erwin, Inc.
Part Two of the Digital Transformation Journey … In our last blog on driving digital transformation , we explored how enterprise architecture (EA) and business process (BP) modeling are pivotal factors in a viable digital transformation strategy. Analyze metadata – Understand how data relates to the business and what attributes it has.
Amazon Redshift is a widely used, fully managed, petabyte-scale cloud data warehouse. We use AWS Glue, a fully managed, serverless ETL (extract, transform, and load) service, and the Google BigQuery Connector for AWS Glue (for more information, refer to Migrating data from Google BigQuery to Amazon S3 using AWS Glue custom connectors).
For each service, you need to learn the supported authorization and authentication methods, data access APIs, and framework to onboard and test data sources. Second, the data connectivity experience is inconsistent across different services. This approach simplifies your data journey and helps you meet your security requirements.
Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. and later supports the Apache Iceberg framework for data lakes. The Iceberg catalog stores the metadata pointer to the current table metadata file.
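A minimal sketch of that two-step resolution, with a made-up catalog entry and a made-up metadata file standing in for the real thing:

```python
import json

# Illustrative sketch of how an Iceberg catalog resolves a table: the
# catalog holds only a pointer to the current metadata file, and that
# file records the table state. Paths and snapshot IDs are invented.

catalog = {"db.events": "s3://bucket/warehouse/db/events/metadata/v3.metadata.json"}

metadata_json = json.dumps({
    "format-version": 2,
    "current-snapshot-id": 7712,
    "snapshots": [{"snapshot-id": 7711}, {"snapshot-id": 7712}],
})

pointer = catalog["db.events"]        # step 1: look up the metadata pointer
metadata = json.loads(metadata_json)  # step 2: load the metadata file contents
current = metadata["current-snapshot-id"]
print(pointer.endswith("v3.metadata.json"), current)
```

Because the catalog holds only the pointer, committing a new table version is an atomic swap of that pointer to a new metadata file.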
Organizations cannot hope to make the most of a data-driven strategy without at least some degree of metadata-driven automation. As such, traditional – and mostly manual – processes associated with data management and data governance have broken down. Metadata-Driven Automation in the BFSI Industry.
1) What Is Data Quality Management? However, with all good things come many challenges, and businesses often struggle with managing their information in the correct way. Enter data quality management. What Is Data Quality Management (DQM)? Why Do You Need Data Quality Management? Table of Contents.
This is part of our series of blog posts on recent enhancements to Impala. Metadata Caching. As with most caching systems, two common problems eventually arise: keeping the cache data up to date, and managing the size of the cache. See the performance results below for an example of how metadata caching helps reduce latency.
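To illustrate the cache-size half of that problem, here is a toy LRU cache that evicts the least-recently-used metadata entry once capacity is exceeded. This is a concept sketch, not Impala's actual implementation:

```python
from collections import OrderedDict

# Toy LRU cache: bounded size, evicts the least-recently-used entry.
# Keys and values below are invented metadata entries for illustration.

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the oldest entry

cache = LRUCache(2)
cache.put("tbl_a", {"cols": 3})
cache.put("tbl_b", {"cols": 5})
cache.get("tbl_a")               # touch tbl_a, so tbl_b is now oldest
cache.put("tbl_c", {"cols": 1})  # exceeds capacity, evicting tbl_b
print(sorted(cache.data))
```

The other half of the problem, keeping entries up to date, is usually handled by invalidating or refreshing entries when the underlying metadata changes.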
Almost 70 percent of CEOs say they expect their companies to change their business models in the next three years, and 62 percent report they have management initiatives or transformation programs underway to make their businesses more digital, according to Gartner. Just like with cars, more horsepower in DevOps translates to greater speed.
Metadata management is key to wringing all the value possible from data assets. What Is Metadata? Analyst firm Gartner defines metadata as “information that describes various facets of an information asset to improve its usability throughout its life cycle. It is metadata that turns information into an asset.”
We’re on a mission to automate all the tasks data stewards typically perform so they spend less time building and populating the data governance framework and more time using the framework to realize value and ROI. Automation also ensures that the data governance framework is always up to date and never stale.
In light of recent, high-profile data breaches, it’s past time we re-examined strategic data governance and its role in managing regulatory requirements. equivalent of GDPR] will not become effective until 2020, we believe that new developments in GDPR enforcement may influence the regulatory framework of the still fluid CCPA.”
erwin has once again been positioned as a Leader in the Gartner “2020 Magic Quadrant for Metadata Management Solutions.” The post erwin Positioned as a Leader in Gartner’s 2020 Magic Quadrant for Metadata Management Solutions for Second Year in a Row appeared first on erwin, Inc.
Related content: 2019 Gartner Magic Quadrant for Metadata Management Solutions. In an enterprise architecture team, each team member often will have some role-specific knowledge and then take the lead in managing that particular area. But changes to the way EA is applied require enterprise architects to change also.
It has never been a more important time to make sure that data and metadata remain protected, resident within local jurisdiction, compliant, under local control, and accessible yet portable. This framework is crafted to address the market-driven needs in data security, legislative compliance, and operational efficiency.
The clear benefit is that data stewards spend less time building and populating the data governance framework and more time realizing value and ROI from it. For data governance, automation ensures the framework is always accurate and up to date; otherwise the data governance initiative itself falls apart.
Use case overview AnyCompany Travel and Hospitality wanted to build a data processing framework to seamlessly ingest and process data coming from operational databases (used by reservation and booking systems) in a data lake before applying machine learning (ML) techniques to provide a personalized experience to its users.
A lack of resources, difficulties in proving the business case, and challenges in getting senior management to see the importance of such an effort rank among the biggest obstacles facing DG initiatives, according to a recent survey by UBM. As a foundational component of enterprise data management, DG would reside in such a group.
The typical notion is that enterprise architects and data (and metadata) architects are in opposite corners. Therefore, most frameworks fail to address the distance. So we created a set of methods, frameworks and reference architectures that address all these different disciplines, strata and domains.
In this blog, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a data integration and democratization fabric. In the Enterprise Data Management realm, such a data domain is called an Authoritative Data Domain (ADD).
Apache Flink is a scalable, reliable, and efficient data processing framework that handles real-time streaming and batch workloads (but is most commonly used for real-time streaming). You can enable monitoring of launched Flink jobs while using EMR on EKS with Apache Flink.
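Conceptually, a core Flink streaming primitive, the tumbling-window aggregation, can be sketched in plain Python. This is a concept illustration only, not the Flink API; the events and window size are made up:

```python
from collections import defaultdict

# Plain-Python sketch of a tumbling-window count, the kind of streaming
# aggregation Flink performs. Events are (timestamp_seconds, key) pairs;
# each event lands in exactly one fixed-size, non-overlapping window.

def tumbling_window_counts(events, window_size):
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_size) * window_size
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(1, "click"), (4, "click"), (11, "click"), (12, "view")]
print(tumbling_window_counts(events, 10))
```

In a real Flink job, the same idea would be expressed with keyed streams and window operators, with the runtime handling event time, watermarks, and state.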
Companies such as Adobe, Expedia, LinkedIn, Tencent, and Netflix have published blogs about their Apache Iceberg adoption for processing their large-scale analytics datasets. In CDP we enable Iceberg tables side-by-side with the Hive table types, both of which are part of our SDX metadata and security framework.
This is a graph of millions of edges and vertices – in enterprise data management terms it is a giant piece of master/reference data. Not Every Graph is a Knowledge Graph: Schemas and Semantic Metadata Matter (open-world vs. closed-world assumptions).
Teams need to urgently respond to everything from massive changes in workforce access and management to what-if planning for a variety of grim scenarios, in addition to building and documenting new applications and providing fast, accurate access to data for smart decision-making.
But increasingly at Cloudera, our clients are looking for a hybrid cloud architecture in order to manage compliance requirements. This is not just to implement specific governance rules — such as tagging, metadata management, access controls, or anonymization — but to prepare for the potential for rules to change in the future.
Cloudinary is a cloud-based media management platform that provides a comprehensive set of tools and services for managing, optimizing, and delivering images, videos, and other media assets on websites and mobile applications. This concept makes Iceberg extremely versatile. Here is where it can get complicated.
This blog post presents an architecture solution that allows customers to extract key insights from Amazon S3 access logs at scale. These logs can track activity, such as data access patterns, lifecycle and management activity, and security events. With exponential growth in data volume, centralized monitoring becomes challenging.
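As a simplified sketch of working with such logs, a few fields can be pulled out of an S3 server access log record with a regular expression. Real records carry many more fields; the sample line and pattern below cover only a handful for illustration:

```python
import re

# Simplified sketch of extracting fields from an S3 server access log
# line. The sample line is synthetic and truncated; real records include
# many more fields (bytes sent, latency, referrer, user agent, ...).

LINE = ('79a5 example-bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 79a5 '
        '3E57427F3EXAMPLE REST.GET.OBJECT photos/cat.jpg '
        '"GET /example-bucket/photos/cat.jpg HTTP/1.1" 200')

PATTERN = re.compile(
    r'\[(?P<time>[^\]]+)\] (?P<ip>\S+) \S+ \S+ '
    r'(?P<operation>\S+) (?P<key>\S+) "(?P<request>[^"]+)" (?P<status>\d+)'
)

m = PATTERN.search(LINE)
print(m.group("operation"), m.group("key"), m.group("status"))
```

At scale, this kind of parsing would typically be pushed into a query engine over the raw logs rather than done line by line in application code.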
The journey starts with having a multimodal data governance framework that is underpinned by a robust data architecture like data fabric. This framework can create a standard approach for meeting regulatory compliance while allowing for customization to address local regulations and being proactive when handling new regulations.
At the same time, governments around the world are continuously evaluating and implementing new AI guidelines and AI regulation frameworks. The post The importance of governance: What we’re learning from AI advances in 2022 appeared first on Journey to AI Blog. Advances across AI technology are happening quickly.
In this blog post, we share what we heard from our customers that led us to create Amazon DataZone and discuss specific customer use cases and quotes from customers who tried Amazon DataZone during our public preview. This is challenging because access to data is managed differently by each of the tools.
Collaborate more effectively with their partners in data (management and governance) for greater efficiency and higher quality outcomes. Data Context & Enrichment: Put data in business context and enable stakeholders to share best practices and build communities by tagging/commenting on data assets, enriching the metadata.
This has been a major architectural enhancement to how Apache Ozone manages data at scale in a data lake. Ozone provides an easy-to-use monitoring and management console called Recon, which collects and aggregates metadata from components and presents the cluster state. Metadata in the cluster is disjoint across components.
In this blog post, we will highlight how ZS Associates used multiple AWS services to build a highly scalable, highly performant, clinical document search platform. ZS is a management consulting and technology firm focused on transforming global healthcare. Evidence generation is rife with knowledge management challenges.
Another option — a more rewarding one — is to include centralized data management, security, and governance into data projects from the start. In the past year, the Bank of the West has begun using the Cloudera platform to establish a data governance and security framework to manage and protect its customers’ sensitive information.
According to Gartner, “54% of models are stuck in pre-production because there is not an automated process to manage these pipelines and there is a need to ensure the AI models can be trusted.” Challenges around managing risk. This includes capturing the metadata, tracking provenance, and documenting the model lifecycle.