With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. We take care of the ETL for you by automating the creation and management of data replication, while AWS Glue ETL offers customer-managed data ingestion.
Metadata management is key to wringing all the value possible from data assets. However, most organizations don’t use all the data at their disposal to reach deeper conclusions about how to drive revenue, achieve regulatory compliance or accomplish other strategic objectives. What Is Metadata? Harvest data.
It addresses many of the shortcomings of traditional data lakes by providing features such as ACID transactions, schema evolution, row-level updates and deletes, and time travel. In this blog post, we’ll discuss how the metadata layer of Apache Iceberg can be used to make data lakes more efficient.
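To make that concrete, here is a minimal PySpark sketch of two of those metadata-layer features, time travel and queryable snapshot history. The catalog name demo, the table db.orders, and the snapshot ID are all hypothetical, and the session assumes the Iceberg Spark runtime jar is available:

```python
# A minimal sketch of Iceberg's metadata-layer features in PySpark.
# Assumptions: the Iceberg Spark runtime jar is on the classpath; the
# catalog "demo", table "db.orders", and snapshot ID are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-metadata-demo")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# Time travel: read the table as it existed at an earlier snapshot.
spark.sql("SELECT * FROM demo.db.orders VERSION AS OF 4925150467859340549").show()

# The metadata layer itself is queryable, e.g. the snapshot history.
spark.sql("SELECT snapshot_id, committed_at FROM demo.db.orders.snapshots").show()
```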
Not Every Graph is a Knowledge Graph: Schemas and Semantic Metadata Matter. To be able to automate these operations and maintain sufficient data quality, enterprises have started implementing so-called data fabrics, which employ diverse metadata sourced from different systems. One such example is provenance (e.g.
Organizations cannot hope to make the most of a data-driven strategy without at least some degree of metadata-driven automation. The volume and variety of data has snowballed, and so has its velocity. As such, traditional – and mostly manual – processes associated with data management and data governance have broken down.
In most companies, an incredible amount of data flows from multiple sources in a variety of formats and is constantly being moved and federated across a changing system landscape. And this time, you guessed it – we’re focusing on data automation and how it could impact metadata management and data governance.
What you have just experienced is a plethora of heteronyms. If you include the title of this blog, you were just presented with 13 examples of heteronyms in the preceding paragraphs. Smart content includes labeled (tagged, annotated) metadata; this is accomplished through tags, annotations, and metadata (TAM).
Metadata is an important part of data governance, and as a result, most nascent data governance programs are rife with project plans for assessing and documenting metadata. But in many scenarios, it seems that the underlying driver of metadata collection projects is that it’s just something you do for data governance.
Reading Time: 3 minutes While cleaning up our archive recently, I found an old article published in 1976 about data dictionary/directory systems (DD/DS). Nowadays, we no longer use the term DD/DS, but “data catalog” or simply “metadata system”. It was written by L.
We have enhanced data sharing performance with improved metadata handling, resulting in first-query execution for data sharing that is up to four times faster when the data sharing producer's data is being updated. Industry-leading price-performance: Amazon Redshift launches RA3.large
We have identified the top ten sites, videos, or podcasts online that deal with data lineage: our list of Top 10 Data Lineage Podcasts, Blogs, and Websites To Follow in 2021. Among them is the Data Engineering Podcast, which centers around data management and investigates a different aspect of this field each week.
The only question is, how do you ensure effective ways of breaking down data silos and bringing data together for self-service access? It starts by modernizing your data integration capabilities – ensuring disparate data sources and cloud environments can come together to deliver data in real time and fuel AI initiatives.
Reading Time: 2 minutes As the volume, variety, and velocity of data continue to surge, organizations still struggle to gain meaningful insights. This is where active metadata comes in. What is Active Metadata? Listen to "Why is Active Metadata Management Essential?" on Spreaker.
These tools range from enterprise service bus (ESB) products and data integration tools to extract, transform and load (ETL) tools, procedural code, application program interfaces (APIs), file transfer protocol (FTP) processes, and even business intelligence (BI) reports that further aggregate and transform data.
That's because it's the best way to visualize metadata, and metadata is now the heart of enterprise data management and data governance/intelligence efforts. So here's why data modeling is so critical to data governance. erwin Data Modeler: Where the Magic Happens.
Let's briefly describe the capabilities of the AWS services we referred to above: AWS Glue is a fully managed, serverless, and scalable extract, transform, and load (ETL) service that simplifies the process of discovering, preparing, and loading data for analytics. Amazon Athena is used to query and explore the data.
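As an illustration of that query step, a hedged boto3 sketch of running an Athena query follows; the database, table, and S3 output location are placeholders:

```python
# Sketch: run an Athena query via boto3. The Glue database "analytics_db",
# the table "sales", and the results bucket are all placeholders.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT * FROM sales LIMIT 10",
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)

# Athena runs asynchronously; poll or fetch results with this ID.
print(response["QueryExecutionId"])
```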
And if it isn't changing, it's likely not being used within our organizations, so why would we use stagnant data to facilitate our use of AI? The key is understanding not IF, but HOW, our data fluctuates, and data observability can help us do just that.
When we talk about dataintegrity, we’re referring to the overarching completeness, accuracy, consistency, accessibility, and security of an organization’s data. Together, these factors determine the reliability of the organization’s data. In short, yes.
This is a guest blog post co-written with Sumesh M R from Cargotec and Tero Karttunen from Knowit Finland. For this, Cargotec built an Amazon Simple Storage Service (Amazon S3) data lake and cataloged the data assets in AWS Glue Data Catalog. An AWS Glue job (metadata exporter) runs daily on the source account.
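The excerpt doesn't show the exporter's internals, but a minimal boto3 sketch of reading table metadata out of the AWS Glue Data Catalog, the core of such a job, might look like this (the database name and region are placeholders):

```python
# Hypothetical sketch of a "metadata exporter": list table metadata from the
# AWS Glue Data Catalog. "datalake_db" and the region are placeholders.
import json
import boto3

glue = boto3.client("glue", region_name="eu-west-1")

tables = []
paginator = glue.get_paginator("get_tables")
for page in paginator.paginate(DatabaseName="datalake_db"):
    for table in page["TableList"]:
        tables.append({
            "name": table["Name"],
            "location": table.get("StorageDescriptor", {}).get("Location"),
            "columns": [c["Name"] for c in table.get("StorageDescriptor", {}).get("Columns", [])],
        })

# Export the collected metadata (here, just print it as JSON).
print(json.dumps(tables, indent=2))
```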
In-place data upgrade: In an in-place data migration strategy, existing datasets are upgraded to Apache Iceberg format without first reprocessing or restating the existing data. In this method, the metadata is recreated in an isolated environment and colocated with the existing data files.
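Iceberg ships Spark procedures for exactly this kind of in-place upgrade. A hedged sketch, assuming an existing SparkSession named spark with an Iceberg catalog named demo (the table identifiers are placeholders):

```python
# Sketch of Iceberg's in-place migration via Spark SQL procedures.
# Assumes "spark" is a SparkSession with an Iceberg catalog named "demo";
# the table identifiers are placeholders.

# Convert an existing Hive/Parquet table to Iceberg in place: data files
# stay put and only Iceberg metadata is written alongside them.
spark.sql("CALL demo.system.migrate('db.events')")

# Alternatively, create a snapshot first to validate against an isolated
# copy of the metadata without touching the source table.
spark.sql("CALL demo.system.snapshot('db.events', 'db.events_iceberg_test')")
```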
Part Two of the Digital Transformation Journey … In our last blog on driving digital transformation , we explored how enterprise architecture (EA) and business process (BP) modeling are pivotal factors in a viable digital transformation strategy. Analyze metadata – Understand how data relates to the business and what attributes it has.
The results of our new research show that organizations are still trying to master data governance, including adjusting their strategies to address changing priorities and overcoming challenges related to data discovery, preparation, quality and traceability. And close to 50 percent have deployed data catalogs and business glossaries.
This unified catalog enables engineers, data scientists, and analysts to securely discover and access approved data and models using semantic search with generative AI-created metadata. Having confidence in your data is key. We’re excited to see what you’ll build next!
A data fabric is an architectural approach that enables organizations to simplify data access and data governance across a hybrid multicloud landscape for better 360-degree views of the customer and enhanced MLOps and trustworthy AI. The post What is a data fabric architecture? appeared first on Journey to AI Blog.
It is also crucial to audit granular data access for security and compliance needs. This blog post presents an architecture solution that allows customers to extract key insights from Amazon S3 access logs at scale. Both the user data and logs buckets must be in the same AWS Region and owned by the same account.
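As a sketch of the kind of insight extraction the post describes, the following assumes the access logs are already cataloged as an Athena table named s3_access_logs_db.mybucket_logs (a placeholder) and summarizes requests per requester over the last week:

```python
# Sketch: extract a simple insight from S3 server access logs with Athena.
# Assumption: the logs are already cataloged as the placeholder table
# s3_access_logs_db.mybucket_logs; the output location is also a placeholder.
import boto3

athena = boto3.client("athena")

QUERY = """
SELECT requester, operation, COUNT(*) AS requests
FROM s3_access_logs_db.mybucket_logs
WHERE parse_datetime(requestdatetime, 'dd/MMM/yyyy:HH:mm:ss Z')
      > current_timestamp - interval '7' day
GROUP BY requester, operation
ORDER BY requests DESC
"""

athena.start_query_execution(
    QueryString=QUERY,
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
```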
Reading Time: 2 minutes In today’s data-driven landscape, the integration of raw source data into usable business objects is a pivotal step in ensuring that organizations can make informed decisions and maximize the value of their data assets. To achieve these goals, a well-structured.
Each of these components has its own purpose, which we will discuss in more detail while concentrating on data warehousing. A solid BI architecture framework consists of: collection of data, data integration, storage of data, data analysis, and distribution of data.
Data modeling is a process that enables organizations to discover, design, visualize, standardize and deploy high-quality data assets through an intuitive, graphical interface. Data models provide visualization, create additional metadata and standardize data design across the enterprise. SQL or NoSQL?
In this blog, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a data integration and democratization fabric. Data and Metadata: Data inputs and data outputs produced based on the application logic.
Example 2: The Data Engineering Team Has Many Small, Valuable Files Where They Need Individual Source File Tracking. In a typical data processing workflow, tracking individual files as they progress through various stages, from file delivery to data ingestion, is crucial.
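One common way to get that per-file tracking, sketched here in PySpark, is to stamp every ingested row with its source file; the S3 path is a placeholder, and this is an illustration rather than the post's actual pipeline:

```python
# Sketch of per-file lineage tracking during ingestion with PySpark:
# stamp each row with the file it came from. Paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name, current_timestamp

spark = SparkSession.builder.appName("file-tracking").getOrCreate()

df = (
    spark.read.json("s3://landing-bucket/deliveries/*.json")  # hypothetical path
    .withColumn("source_file", input_file_name())    # which file each row came from
    .withColumn("ingested_at", current_timestamp())  # when it was ingested
)

# Downstream stages can now audit or reconcile by source file.
df.groupBy("source_file").count().show()
```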
Third, some services require you to set up and manage compute resources used for federated connectivity, and capabilities like connection testing and data preview aren't available in all services. To solve these challenges, we launched Amazon SageMaker Lakehouse unified data connectivity.
What, then, should users look for in a data modeling product to support their governance/intelligence requirements in the data-driven enterprise? Nine Steps to Data Modeling. Provide metadata and schema visualization regardless of where data is stored, as well as naming and database standards, formatting options, and so on.
In this blog post, we dive into different data aspects and how Cloudinary addresses the two concerns of vendor lock-in and cost-efficient data analytics by using Apache Iceberg, Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon EMR, and AWS Glue. This concept makes Iceberg extremely versatile.
SparkActions.get().expireSnapshots(iceTable).expireOlderThan(TimeUnit.DAYS.toMillis(7)).execute()
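The trailing Java call above uses Iceberg's SparkActions API to expire snapshots older than seven days. The same maintenance can be expressed as a Spark SQL procedure; in this sketch, the catalog demo, the table db.assets, and the cutoff timestamp are placeholders, and spark is an existing Iceberg-configured SparkSession:

```python
# Sketch: snapshot expiration as an Iceberg Spark SQL procedure, equivalent
# in spirit to the SparkActions call above. Catalog, table, and the cutoff
# timestamp (standing in for "7 days ago") are placeholders.
spark.sql("""
    CALL demo.system.expire_snapshots(
        table => 'db.assets',
        older_than => TIMESTAMP '2024-01-01 00:00:00'
    )
""")
```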
Data integrity constraints: Many databases don't allow for strange or unrealistic combinations of input variables, and this could potentially thwart watermarking attacks. Applying data integrity constraints on live, incoming data streams could have the same benefits. Disparate impact analysis: see section 1.
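To illustrate the idea, here is a small, self-contained sketch using SQLite CHECK constraints to reject implausible input combinations; the schema and bounds are invented for the example:

```python
# Sketch: database-level integrity constraints that reject unrealistic
# combinations of inputs, illustrated with SQLite. Schema and bounds are
# invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE loan_applications (
        age INTEGER NOT NULL CHECK (age BETWEEN 18 AND 120),
        income REAL NOT NULL CHECK (income >= 0),
        years_employed INTEGER NOT NULL,
        -- Cross-column sanity check: employment can't exceed working life.
        CHECK (years_employed <= age - 16)
    )
""")

conn.execute("INSERT INTO loan_applications VALUES (35, 72000.0, 10)")  # passes

try:
    # An implausible combination, like one a watermarking attack might craft.
    conn.execute("INSERT INTO loan_applications VALUES (19, 72000.0, 30)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```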
The post My Reflections on the Gartner Hype Cycle for Data Management, 2024 appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information. Gartner Hype Cycle methodology provides a view of how.
KGs bring the Semantic Web paradigm to enterprises by introducing semantic metadata to drive data management and content management to new levels of efficiency, breaking silos to let them synergize with various forms of knowledge management. The RDF data model and the other standards in W3C's Semantic Web stack (e.g.,
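For readers new to RDF, a minimal rdflib sketch of the triple-based data model follows; the example.org namespace and entities are invented:

```python
# A minimal sketch of the RDF data model with rdflib: knowledge is expressed
# as subject-predicate-object triples over URIs. Namespaces are invented.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.AcmeCorp, RDF.type, EX.Company))
g.add((EX.AcmeCorp, RDFS.label, Literal("Acme Corp")))
g.add((EX.AcmeCorp, EX.hasSubsidiary, EX.AcmeLabs))

# Query the graph with SPARQL, the W3C standard query language for RDF.
for row in g.query("SELECT ?s WHERE { ?s <http://example.org/hasSubsidiary> ?o }"):
    print(row.s)
```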
And each of these gains requires data integration across business lines and divisions. Limiting growth by (data integration) complexity: Most operational IT systems in an enterprise have been developed to serve a single business function, and they use the simplest possible model for this. So, they become very data-driven.
Here are our eight recommendations for how to transition from manual to automated data management: 1) Put Data Quality First: Automating the matching of business terms with data assets and documenting lineage down to the column level are critical to good decision making.
With data privacy and security becoming an increased concern, sovereign cloud is turning from an optional nice-to-have to an essential requirement, especially for highly protected markets like Government, Healthcare, Financial Services, Legal, etc. This local presence is crucial for maintaining data integrity and security.
To better explain our vision for automating data governance, let’s look at some of the different aspects of how the erwin Data Intelligence Suite (erwin DI) incorporates automation. Data Cataloging: Catalog and sync metadata with data management and governance artifacts according to business requirements in real time.
The role of data modeling (DM) has expanded to support enterprise data management, including data governance and intelligence efforts. Metadata management is the key to managing and governing your data and drawing intelligence from it. Types of Data Models: Conceptual, Logical and Physical.
To understand how a data fabric helps maintain compliance to privacy regulations, it’s helpful to look at some essential elements of that single pane of glass. Build a foundation using a common catalog and metadata. It lets appropriate parties, such as the company’s chief data analyst, know what the data is and where it resides.
There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. This is something that you can learn more about in just about any technology blog. We would like to talk about data visualization and its role in the big data movement.