Metadata, Modeling and Structured Data

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Amazon Athena provides an interactive analytics service for analyzing data in Amazon Simple Storage Service (Amazon S3). Amazon Redshift is used to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes. […] foundation model (FM) in Amazon Bedrock as the LLM.

Metadata

Metadata Data Lake Modeling Data Warehouse

When is data too clean to be useful for enterprise AI?

CIO Business Intelligence

NOVEMBER 27, 2024

Good data governance has always involved dealing with errors and inconsistencies in datasets, as well as indexing and classifying that structured data by removing duplicates, correcting typos, standardizing and validating the format and type of data, and augmenting incomplete information or detecting unusual and impossible variations in the data.

Enterprise

Enterprise Data Quality Structured Data Modeling

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. The synchronization process in XTable works by translating table metadata using the existing APIs of these table formats.

Metadata

Metadata Data Lake Snapshot Data Warehouse

Very Meta … Unlocking Data’s Potential with Metadata Management Solutions

erwin

OCTOBER 24, 2019

Untapped data, if mined, represents tremendous potential for your organization. While there has been a lot of talk about big data over the years, the real hero in unlocking the value of enterprise data is metadata, or the data about the data. Metadata Is the Heart of Data Intelligence.

Metadata

Metadata Management Data-driven Data Architecture

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

In addition to real-time analytics and visualization, the data needs to be shared for long-term data analytics and machine learning applications. To achieve this, EUROGATE designed an architecture that uses Amazon DataZone to publish specific digital twin data sets, enabling access to them with SageMaker in a separate AWS account.

IoT

IoT Machine Learning Metadata Data-driven

Deep automation in machine learning

O'Reilly on Data

DECEMBER 19, 2018

We need to do more than automate model building with autoML; we need to automate tasks at every stage of the data pipeline. In a previous post, we talked about applications of machine learning (ML) to software development, which included a tour through sample tools in data science and for managing data infrastructure.

Machine Learning

Machine Learning Software Metadata Testing

Have we reached the end of ‘too expensive’ for enterprise software?

CIO Business Intelligence

JANUARY 9, 2025

Generative artificial intelligence (genAI) and in particular large language models (LLMs) are changing the way companies develop and deliver software. The commodity effect of LLMs over specialized ML models. One of the most notable transformations generative AI has brought to IT is the democratization of AI capabilities.

Software

Software Enterprise Key Performance Indicator Machine Learning

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

We have enhanced data sharing performance with improved metadata handling, resulting in data sharing first query execution that is up to four times faster when the data sharing producer’s data is being updated.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Alation and Salesforce partner on data governance for Data Cloud

CIO Business Intelligence

SEPTEMBER 19, 2024

It will do this, it said, with bidirectional integration between its platform and Salesforce’s to seamlessly deliver data governance and end-to-end lineage within Salesforce Data Cloud. Additional to that, we are also allowing the metadata inside of Alation to be read into these agents.”

Data Governance

Data Governance Metadata Unstructured Data Structured Data

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

erwin

AUGUST 31, 2023

Q: Is data modeling cool again? In today’s fast-paced digital landscape, data reigns supreme. The data-driven enterprise relies on accurate, accessible, and actionable information to make strategic decisions and drive innovation. A: It always was and is getting cooler!!

Data-driven

Data-driven Modeling Enterprise Structured Data

Unstructured data management and governance using AWS AI/ML and analytics services

AWS Big Data

OCTOBER 25, 2023

Unstructured data is information that doesn’t conform to a predefined schema or isn’t organized according to a preset data model. Unstructured information may have a little or a lot of structure but in ways that are unexpected or inconsistent. Text, images, audio, and videos are common examples of unstructured data.

Unstructured Data

Unstructured Data Metadata Management Analytics

Understanding the Differences Between Data Lakes and Data Warehouses

Smart Data Collective

AUGUST 28, 2021

Data Warehouses and Data Lakes in a Nutshell. A data warehouse is used as a central storage space for large amounts of structured data coming from various sources. On the other hand, data lakes are flexible storages used to store unstructured, semi-structured, or structured raw data.

Data Lake

Data Lake Data Warehouse Unstructured Data Structured Data

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. Data governance is a critical building block across all these approaches, and we see two emerging areas of focus.

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Salesforce debuts Zero Copy Partner Network to ease data integration

CIO Business Intelligence

APRIL 25, 2024

“The challenge that a lot of our customers have is that requires you to copy that data, store it in Salesforce; you have to create a place to store it; you have to create an object or field in which to store it; and then you have to maintain that pipeline of data synchronization and make sure that data is updated,” Carlson said.

Data Integration

Data Integration Data Lake Data Warehouse Metadata

What is a data scientist? A key data analytics role and a lucrative career

CIO Business Intelligence

MARCH 21, 2022

The data that data scientists analyze draws from many sources, including structured, unstructured, or semi-structured data. The more high-quality data available to data scientists, the more parameters they can include in a given model, and the more data they will have on hand for training their models.

Unstructured Data

Unstructured Data Data Analytics Analytics Data Science

AI recommendations for descriptions in Amazon DataZone for enhanced business data cataloging and discovery is now generally available

AWS Big Data

APRIL 2, 2024

In this post, we share what we heard from our customers that led us to add the AI-generated data descriptions and discuss specific customer use cases addressed by this capability. We also detail how the feature works and what criteria were applied for the model and prompt selection while building on Amazon Bedrock.

Metadata

Metadata Metrics Data-driven Contextual Data

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

S3 Tables integration with the AWS Glue Data Catalog is in preview, allowing you to stream, query, and visualize data, including Amazon S3 Metadata tables, using AWS analytics services such as Amazon Data Firehose, Amazon Athena, Amazon Redshift, Amazon EMR, and Amazon QuickSight. With AWS Glue 5.0,

Analytics

Analytics Data Lake Metadata Data Warehouse

What is data governance? Best practices for managing data assets

CIO Business Intelligence

MARCH 24, 2023

It encompasses the people, processes, and technologies required to manage and protect data assets. The Data Management Association (DAMA) International defines it as the “planning, oversight, and control over management of data and the use of data and data-related sources.”

Data Governance

Data Governance Management Metadata Data Quality

The Semantic Web: 20 Years And a Handful of Enterprise Knowledge Graphs Later

Ontotext

JULY 29, 2021

The Semantic Web started in the late 90’s as a fascinating vision for a web of data, which is easy to interpret by both humans and machines. One of its pillars are ontologies that represent explicit formal conceptual models, used to describe semantically both unstructured content and databases. Take this restaurant, for example.

Enterprise

Enterprise Metadata Knowledge Discovery Management

Top 10 Key Features of BI Tools in 2020

FineReport

FEBRUARY 5, 2020

Metadata management. Users can centrally manage metadata, including searching, extracting, processing, storing, sharing metadata, and publishing metadata externally. The metadata here is focused on the dimensions, indicators, hierarchies, measures and other data required for business analysis.

Metadata

Metadata Dashboards Informatics Visualization

Three Emerging Analytics Products Derived from Value-driven Data Innovation and Insights Discovery in the Enterprise

Rocket-Powered Data Science

JULY 19, 2023

The results showed that (among those surveyed) approximately 90% of enterprise analytics applications are being built on tabular data. The ease with which such structured data can be stored, understood, indexed, searched, accessed, and incorporated into business models could explain this high percentage.

Data-driven

Data-driven Enterprise Analytics Machine Learning

From charred scrolls to customer sentiment: How AI helps you monetize your unstructured data

CIO Business Intelligence

SEPTEMBER 12, 2024

One reason is that documents, medical records, emails, images, video, audio, and so on were almost impossible to prepare, manage, and use in AI applications before recent technological strides in areas such as AI, computer vision, and large language models such as those used in generative AI.

Unstructured Data

Unstructured Data Deep Learning Metadata Structured Data

Do Large Language Models Dream of Knowledge Graphs – Impressions from Day 2 At SEMANTiCS 2023

Ontotext

OCTOBER 12, 2023

Did you know that, if you add “take a deep breath” to a prompt, chances are you will get more accurate results from Large Language Models (LLMs)? Do Knowledge Graphs Dream of Large Language Models? I didn’t either. He shared the need for more research at the intersection of LLMs and knowledge graphs.

Modeling

Modeling Recreation/Entertainment Data Processing Metadata

Building a Beautiful Data Lakehouse

CIO Business Intelligence

MARCH 9, 2022

As a result, users can easily find what they need, and organizations avoid the operational and cost burdens of storing unneeded or duplicate data copies. Newer data lakes are highly scalable and can ingest structured and semi-structured data along with unstructured data like text, images, video, and audio.

Data Lake

Data Lake Unstructured Data Data Warehouse Big Data

How ZS built a clinical knowledge repository for semantic search using Amazon OpenSearch Service and Amazon Neptune

AWS Big Data

SEPTEMBER 12, 2024

ZS unlocked new value from unstructured data for evidence generation leads by applying large language models (LLMs) and generative artificial intelligence (AI) to power advanced semantic search on evidence protocols. In the pipeline, the data ingestion process takes shape through a thoughtfully structured sequence of steps.

Unstructured Data

Unstructured Data Metadata Machine Learning Consulting

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

The need for a decentralized data mesh architecture stems from the challenges organizations faced when implementing more centralized data management architectures – challenges that can be attributed to both technology (e.g., need to integrate multiple “point solutions” used in a data ecosystem) and organizational reasons (e.g.,

Metadata

Metadata Cost-Benefit Enterprise Interactive

Making OT-IT integration a reality with new data architectures and generative AI

CIO Business Intelligence

FEBRUARY 20, 2024

A number of industry leaders are already experimenting with advanced AI use cases, including Denso, a leading mobility supplier that develops advanced technology and components for nearly every vehicle make and model on the road today. Denso uses AI to verify the structuring of unstructured data from across its organisation.

Data Architecture

Data Architecture Unstructured Data Manufacturing IT

You Cannot Get to the Moon on a Bike!

Ontotext

JANUARY 10, 2024

In Computer Science, we are trained to use Occam’s razor – the simplest model of reality that can get the job done is the best one. Limiting growth by (data integration) complexity. Most operational IT systems in an enterprise have been developed to serve a single business function and they use the simplest possible model for this.

Metadata

Metadata Slice and Dice Data Integration Enterprise

Generative AI is pushing unstructured data to center stage

CIO Business Intelligence

DECEMBER 13, 2023

Applications such as financial forecasting and customer relationship management brought tremendous benefits to early adopters, even though capabilities were constrained by the structured nature of the data they processed. […] have encouraged the creation of unstructured data.

Unstructured Data

Unstructured Data IoT Metadata Manufacturing

Cloudera Named a Visionary in the Gartner MQ for Cloud DBMS

Cloudera

APRIL 1, 2024

We scored the highest in hybrid, intercloud, and multi-cloud capabilities because we are the only vendor in the market with a true hybrid data platform that can run on any cloud including private cloud to deliver a seamless, unified experience for all data, wherever it lies. Unlike software, ML models need continuous tuning.

Unstructured Data

Unstructured Data Cost-Benefit Metadata Machine Learning

The Role of AI and ML in Model Governance

Alation

JUNE 2, 2022

These include tracking, documenting, monitoring, versioning, and controlling access to AI/ML models. Currently, models are managed by modelers and by the software tools they use, which results in a patchwork of control, but not on an enterprise level. And until recently, such governance processes have been fragmented.

Modeling

Modeling Data Governance Statistics Unstructured Data

From Data Silos to Data Fabric with Knowledge Graphs

Ontotext

SEPTEMBER 15, 2020

The Data Fabric paradigm combines design principles and methodologies for building efficient, flexible and reliable data management ecosystems. Knowledge Graphs are the Warp and Weft of a Data Fabric. To implement any Data Fabric approach, it is essential to be able to understand the context of data.

Metadata

Metadata Knowledge Discovery Data Quality Strategy

A Guide to CCPA Compliance and How the California Consumer Privacy Act Compares to GDPR

erwin

APRIL 18, 2019

An effective data governance initiative should enable just that, by giving an organization the tools to: Discover data: Identify and interrogate metadata from various data management silos. Harvest data: Automate the collection of metadata from various data management silos and consolidate it into a single source.

Data Governance

Data Governance Metadata Data Collection Data-driven

Graphs on the Ground Part I: The Power of Knowledge Graphs within the Financial Industry

Ontotext

OCTOBER 14, 2021

For the purposes of this article, you just need to know the following: A graph is a method of storing and modeling data that uniquely captures the relationships between data. This creates a contextual layer that allows companies to rapidly retrieve and interpret the data stored in the graph whether structured or unstructured.

Reporting

Reporting Structured Data Data Warehouse Metadata

The Future Is Hybrid Data, Embrace It

Cloudera

JUNE 7, 2022

We live in a hybrid data world. In the past decade, the amount of structured data created, captured, copied, and consumed globally has grown from less than 1 ZB in 2011 to nearly 14 ZB in 2020. Impressive, but dwarfed by the amount of unstructured data, cloud data, and machine data – another 50 ZB.

IT

IT Data Architecture Unstructured Data Big Data

Data Cataloging in the Data Lake: Alation + Kylo

Alation

FEBRUARY 20, 2020

Architecturally the introduction of Hadoop, a file system designed to store massive amounts of data, radically affected the cost model of data. Organizationally the innovation of self-service analytics, pioneered by Tableau and Qlik, fundamentally transformed the user model for data analysis.

Data Lake

Data Lake Metadata Structured Data Big Data

A Flexible and Efficient Storage System for Diverse Workloads

Cloudera

SEPTEMBER 15, 2022

Today’s platform owners, business owners, data developers, analysts, and engineers create new apps on the Cloudera Data Platform and they must decide where and how to store that data. Structured data (such as name, date, ID, and so on) will be stored in regular SQL databases like Hive or Impala databases.

Metadata

Metadata Big Data Optimization Machine Learning

Five actionable steps to GDPR compliance (Right to be forgotten) with Amazon Redshift

AWS Big Data

JULY 28, 2023

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. It is designed for analyzing large volumes of data and performing complex queries on structured and semi-structured data. Tags provide metadata about resources at a glance.

Snapshot

Snapshot Metadata Measurement Data Warehouse

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

Sources. Data can be loaded from multiple sources, such as systems of record, data generated from applications, operational data stores, enterprise-wide reference data and metadata, data from vendors and partners, machine-generated data, social sources, and web sources.

Analytics

Analytics Data Warehouse Data Lake Metadata

The Future Is Hybrid Data, Embrace It

CIO Business Intelligence

JUNE 23, 2022

We live in a hybrid data world. In the past decade, the amount of structured data created, captured, copied, and consumed globally has grown from less than 1 ZB in 2011 to nearly 14 ZB in 2020. Impressive, but dwarfed by the amount of unstructured data, cloud data, and machine data – another 50 ZB.

IT

IT Data Architecture Unstructured Data Big Data

Shutterstock capitalizes on the cloud’s cutting edge

CIO Business Intelligence

MARCH 6, 2023

Advancements in analytics and AI as well as support for unstructured data in centralized data lakes are key benefits of doing business in the cloud, and Shutterstock is capitalizing on its cloud foundation, creating new revenue streams and business models using the cloud and data lakes as key components of its innovation platform.

Data Lake

Data Lake Cost-Benefit Recreation/Entertainment Unstructured Data

If Johnny Mnemonic Smuggled Linked Data

Ontotext

MAY 30, 2019

It won’t protect you from issues of data quality or from service failures. […] But Linked Data does provide you with new ways to manage these existing data-management challenges. 6 Linked Data, Structured Data on the Web. Linked Data and Information Retrieval.

Cost-Benefit

Cost-Benefit Big Data Technology Metadata

The Data Scientist’s Guide to the Data Catalog

Alation

JULY 19, 2022

Across the country, data scientists have an unemployment rate of 2% and command an average salary of nearly $100,000. As they attempt to put machine learning models into production, data science teams encounter many of the same hurdles that plagued data analytics teams in years past: Finding trusted, valuable data is time-consuming.

Metadata

Metadata Data Quality Statistics Data Science

Why Spreadsheets Are Your Secret Weapon for Efficient Data Governance

Alation

APRIL 6, 2023

Data governance is traditionally applied to structured data assets that are most often found in databases and information systems. This blog focuses on governing spreadsheets that contain data, information, and metadata, and must themselves be governed. Data catalogs and spreadsheets are related in many ways.

Data Governance

Data Governance Metadata Cost-Benefit Structured Data
