Amazon Athena provides an interactive analytics service for analyzing data in Amazon Simple Storage Service (Amazon S3). Amazon Redshift is used to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes. Table metadata is fetched from AWS Glue.
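As a rough illustration of that pattern, the sketch below issues an ad hoc Athena query over files in S3, with table metadata resolved through the AWS Glue Data Catalog. The database, table, and results-bucket names are hypothetical placeholders, not details from the original post.

```python
import boto3

# Minimal sketch: run an ad hoc Athena query against data in S3.
# The database ("sales_db"), table ("orders"), and results bucket are
# hypothetical placeholders.
athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT order_id, amount FROM orders WHERE order_date >= DATE '2024-01-01'",
    QueryExecutionContext={"Database": "sales_db"},  # table metadata comes from AWS Glue
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/queries/"},
)
print(response["QueryExecutionId"])
```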
This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. In later pipeline stages, data is converted to Apache Iceberg to benefit from its read performance.
But along with siloed data and compliance concerns, poor data quality is holding back enterprise AI projects. And while most executives generally trust their data, they also say less than two-thirds of it is usable. “At worst, it can go in and remove signal from your data, and actually be at cross purposes with what you need.”
While there has been a lot of talk about big data over the years, the real hero in unlocking the value of enterprise data is metadata, or the data about the data. Many organizations don’t know exactly what data they have or even where some of it is. Metadata Is the Heart of Data Intelligence.
Their terminal operations rely heavily on seamless data flows and the management of vast volumes of data. Recently, EUROGATE has developed a digital twin for its Container Terminal Hamburg (CTH), generating millions of data points every second from Internet of Things (IoT) devices attached to its container handling equipment (CHE).
In order to figure out why the numbers in the two reports didn’t match, Steve needed to understand everything about the data that made up those reports – when the report was created, who created it, any changes made to it, which system it was created in, etc. Enterprise data governance. Metadata in data governance.
Amazon Redshift, launched in 2013, has undergone significant evolution since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. Industry-leading price-performance: Amazon Redshift offers up to three times better price-performance than alternative cloud data warehouses.
If you’re serious about a data-driven strategy, you’re going to need a data catalog. Organizations need a data catalog because it enables them to create a seamless way for employees to access and consume data and business assets in an organized manner. Three Types of Metadata in a Data Catalog.
are only starting to exist; one big task over the next two years is developing the IDEs for machine learning, plus other tools for data management, pipeline management, data cleaning, data provenance, and data lineage. GitHub is an excellent tool for managing code, but we need to think about [code+data].
Data intelligence platform vendor Alation has partnered with Salesforce to deliver trusted, governed data across the enterprise. It will do this, it said, with bidirectional integration between its platform and Salesforce’s to seamlessly deliver data governance and end-to-end lineage within Salesforce Data Cloud.
Most companies produce and consume unstructured data such as documents, emails, web pages, engagement center phone calls, and social media. By some estimates, unstructured data can make up 80–90% of all new enterprise data and is growing many times faster than structured data.
Amazon DataZone, a data management service, helps you catalog, discover, share, and govern data stored across AWS, on-premises systems, and third-party sources. Create a new Amazon DataZone project (for this post, we name it unstructured-data-producer-project) for publishing the unstructured S3 data asset.
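A minimal sketch of creating that project programmatically with boto3 is shown below; the DataZone domain identifier is a hypothetical placeholder, and the console workflow described in the post achieves the same thing.

```python
import boto3

# Minimal sketch: create the DataZone project the excerpt refers to.
# The domain identifier below is a hypothetical placeholder.
datazone = boto3.client("datazone")

project = datazone.create_project(
    domainIdentifier="dzd_example1234",          # replace with your DataZone domain ID
    name="unstructured-data-producer-project",   # project name used in the post
    description="Publishes the unstructured S3 data asset",
)
print(project["id"])
```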
This required dedicated infrastructure and ideally a full MLOps pipeline (for model training, deployment and monitoring) to manage data collection, training and model updates. Content management systems: Content editors can search for assets or content using descriptive language without relying on extensive tagging or metadata.
Data Warehouses and Data Lakes in a Nutshell. A data warehouse is used as a central storage space for large amounts of structured data coming from various sources. On the other hand, data lakes are flexible storage used to store unstructured, semi-structured, or structured raw data.
We live in a hybrid data world. In the past decade, the amount of structured data created, captured, copied, and consumed globally has grown from less than 1 ZB in 2011 to nearly 14 ZB in 2020. Impressive, but dwarfed by the amount of unstructured data, cloud data, and machine data – another 50 ZB.
Today’s enterprises are increasingly daunted by the realization that more data doesn’t automatically equal deeper knowledge and better business decisions. The mere existence of the 175 zettabytes of data the International Data Corporation estimates the world will possess by 2025 doesn’t matter if organizations can’t leverage it effectively.
First, many LLM use cases rely on enterprise knowledge that needs to be drawn from unstructured data such as documents, transcripts, and images, in addition to structured data from data warehouses. Grant the user role permissions for sensitive information and compliance policies.
Manufacturers have long held a data-driven vision for the future of their industry. It’s one where near real-time data flows seamlessly between IT and operational technology (OT) systems. Legacy data management is holding back manufacturing transformation. Until now, however, this vision has remained out of reach.
At Salesforce World Tour NYC today, Salesforce unveiled a new global ecosystem of technology and solution providers geared to help its customers leverage third-party data via secure, bidirectional zero-copy integrations with Salesforce Data Cloud. “It works in Salesforce just like any other native Salesforce data,” Carlson said.
Data scientists are becoming increasingly important in business, as organizations rely more heavily on data analytics to drive decision-making and lean on automation and machine learning as core components of their IT strategies. Data scientist job description. Semi-structured data falls between the two.
S3 Tables integration with the AWS Glue Data Catalog is in preview, allowing you to stream, query, and visualize data, including Amazon S3 Metadata tables, using AWS analytics services such as Amazon Data Firehose, Amazon Athena, Amazon Redshift, Amazon EMR, and Amazon QuickSight.
Not Documenting End-to-End Data Lineage Is Risky Business – Understanding your data’s origins is key to successful data governance. Not everyone understands what end-to-end data lineage is or why it is important. Data Lineage Tells an Important Origin Story. Who are the data owners?
Data governance definition Data governance is a system for defining who within an organization has authority and control over data assets and how those data assets may be used. It encompasses the people, processes, and technologies required to manage and protect data assets.
Data consumers need detailed descriptions of the business context of a data asset and documentation about its recommended use cases to quickly identify the relevant data for their intended use case. This saves organizations time on catalog curation and speeds discovery of relevant use cases for the data.
While some businesses suffer from “data translation” issues, others lack discovery methods and still perform metadata discovery manually. Still others need to trace data history and understand its context to resolve an issue before it actually becomes one. The solution is a comprehensive automated metadata platform.
The Semantic Web started in the late 1990s as a fascinating vision for a web of data that is easy to interpret by both humans and machines. In this post you will discover the aspects of the Semantic Web that are key to enterprise data, knowledge, and content management. Source: tag.ontotext.com. Take this restaurant, for example.
The results showed that (among those surveyed) approximately 90% of enterprise analytics applications are being built on tabular data. The ease with which such structured data can be stored, understood, indexed, searched, accessed, and incorporated into business models could explain this high percentage.
Within the context of a data mesh architecture, I will present industry settings and use cases where this architecture is relevant and highlight the business value it delivers across business and technology areas. Introduction to the Data Mesh Architecture and its Required Capabilities.
Applying artificial intelligence (AI) to data analytics for deeper, better insights and automation is a growing enterprise IT priority. But the data repository options that have been around for a while tend to fall short in their ability to serve as the foundation for big data analytics powered by AI. Meet the data lakehouse.
Both the investment community and the IT circle are paying close attention to big data and business intelligence. Overall, as users’ data sources become more extensive, their preferences for BI are changing. They prefer self-service development, interactive dashboards, and self-service data exploration. Metadata management.
Now that AI can unravel the secrets inside a charred, brittle, ancient scroll buried under lava over 2,000 years ago, imagine what it can reveal in your unstructured data–and how that can reshape your work, thoughts, and actions. Unstructured data has been integral to human society for over 50,000 years. You’re not alone.
Q: Is data modeling cool again? In today’s fast-paced digital landscape, data reigns supreme. The data-driven enterprise relies on accurate, accessible, and actionable information to make strategic decisions and drive innovation. A: It always was and is getting cooler!!
Now, evidence generation leads (medical affairs, HEOR, and RWE) can have a natural-language, conversational exchange and return a list of evidence activities with high relevance considering both structured data and the details of the studies from unstructured sources. Evidence generation is rife with knowledge management challenges.
Amazon Redshift Streaming Ingestion allows you to connect to Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK) directly through materialized views, in real time and without the complexity associated with staging the data in Amazon Simple Storage Service (Amazon S3) and loading it into the cluster.
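A minimal sketch of what that setup can look like, issued through the Redshift Data API: an external schema mapped onto a Kinesis data stream and a materialized view that ingests from it. The workgroup, stream, and IAM role names are hypothetical placeholders, not values from the original post.

```python
import boto3

# Minimal sketch: set up Redshift streaming ingestion from Kinesis via the
# Redshift Data API. Workgroup, stream, and IAM role are placeholders.
rsd = boto3.client("redshift-data")

rsd.batch_execute_statement(
    WorkgroupName="analytics-wg",   # Redshift Serverless workgroup (placeholder)
    Database="dev",
    Sqls=[
        # Map a schema onto the Kinesis data stream
        """CREATE EXTERNAL SCHEMA kinesis_schema
           FROM KINESIS
           IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftStreamingRole'""",
        # Materialized view that ingests records directly from the stream
        """CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
           SELECT approximate_arrival_timestamp,
                  JSON_PARSE(FROM_VARBYTE(kinesis_data, 'utf-8')) AS payload
           FROM kinesis_schema."clickstream"
        """,
    ],
)
```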
Data quality is crucial in data pipelines because it directly impacts the validity of the business insights derived from the data. Today, many organizations use AWS Glue Data Quality to define and enforce data quality rules on their data at rest and in transit. In the post_dq_results_to_datazone.py
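As a hedged example of defining such rules, the sketch below registers an AWS Glue Data Quality ruleset (written in DQDL) against a catalog table; the database, table, and thresholds are hypothetical placeholders.

```python
import boto3

# Minimal sketch: define a Glue Data Quality ruleset for a table at rest.
# Database, table, and rule thresholds are hypothetical placeholders.
glue = boto3.client("glue")

glue.create_data_quality_ruleset(
    Name="orders-dq-ruleset",
    TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},
    Ruleset="""Rules = [
        IsComplete "order_id",
        Uniqueness "order_id" > 0.99,
        ColumnValues "amount" > 0
    ]""",
)
```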
It covers how to use a conceptual, logical architecture for some of the most popular gaming industry use cases like event analysis, in-game purchase recommendations, measuring player satisfaction, telemetry data analysis, and more. A data hub contains data at multiple levels of granularity and is often not integrated.
Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open-format files in your Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries.
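One way this typically looks in practice is an external schema backed by the Glue Data Catalog, queried with ordinary SQL; the sketch below is a rough example in which the schema, database, table, and IAM role names are hypothetical placeholders.

```python
import boto3

# Minimal sketch: query open-format files in the S3 data lake from Redshift
# without loading them, via an external schema. Names are placeholders.
rsd = boto3.client("redshift-data")

rsd.batch_execute_statement(
    WorkgroupName="analytics-wg",
    Database="dev",
    Sqls=[
        # External schema pointing at a Glue Data Catalog database
        """CREATE EXTERNAL SCHEMA IF NOT EXISTS lake
           FROM DATA CATALOG DATABASE 'datalake_db'
           IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole'""",
        # Analytical SQL runs directly over files in S3
        """SELECT event_type, COUNT(*) AS events
           FROM lake.click_events
           WHERE event_date = '2024-06-01'
           GROUP BY event_type""",
    ],
)
```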
This requires new tools and new systems, which results in diverse and siloed data. And each of these gains requires data integration across business lines and divisions. Any type of metadata or universal data model is likely to slow down development and increase costs, which will affect the time to market and profit.
What Makes a Data Fabric? ‘Data Fabric’ has reached where ‘Cloud Computing’ and ‘Grid Computing’ once trod. Data Fabric hit the Gartner top ten in 2019. This multiplicity of data leads to the growth of silos, which in turn increases the cost of integration. It is a buzzword.
California Consumer Privacy Act (CCPA) compliance shares many of the same requirements as the European Union’s General Data Protection Regulation (GDPR). Data governance, thankfully, provides a framework for compliance with either or both – in addition to other regulatory mandates your organization may be subject to.
We scored the highest in hybrid, intercloud, and multi-cloud capabilities because we are the only vendor in the market with a true hybrid data platform that can run on any cloud including private cloud to deliver a seamless, unified experience for all data, wherever it lies. Unlike software, ML models need continuous tuning.
For the purposes of this article, you just need to know the following: A graph is a method of storing and modeling data that uniquely captures the relationships between data. A knowledge graph uses this format to integrate data from different sources while enriching it with metadata that documents collective knowledge about the data.
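To make the idea concrete, here is a small sketch using the rdflib library: a few triples represent the data itself, a few more attach metadata about its origin, and a single query spans both. The namespace and property names are invented for illustration, not taken from the original article.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

# Hypothetical namespace for this illustration
EX = Namespace("http://example.org/")
g = Graph()

# Data from one source: an order and the customer it links to
g.add((EX.order42, RDF.type, EX.Order))
g.add((EX.order42, EX.placedBy, EX.customer7))

# Metadata enriching the data: where the record came from and when
g.add((EX.order42, EX.sourceSystem, Literal("billing-db")))
g.add((EX.order42, EX.ingestedOn, Literal("2024-06-01")))

# One query spans the data and the metadata documented about it
for row in g.query("SELECT ?o WHERE { ?o a <http://example.org/Order> }"):
    print(row.o)
```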
Apache Ozone is a distributed, scalable, and high-performance object store, available with Cloudera Data Platform (CDP), that can scale to billions of objects of varying sizes. Structured data (such as name, date, ID, and so on) will be stored in regular SQL databases like Hive or Impala databases. OBJECT_STORE Bucket (“OBS”).
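Because Ozone exposes an S3-compatible gateway, a standard S3 client can work against it; the sketch below assumes a hypothetical gateway endpoint and credentials rather than values from the original post.

```python
import boto3

# Minimal sketch: talk to Apache Ozone through its S3-compatible gateway.
# The endpoint URL and credentials are hypothetical placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="http://ozone-s3g.example.com:9878",
    aws_access_key_id="ozone-access-key",
    aws_secret_access_key="ozone-secret-key",
)

s3.create_bucket(Bucket="telemetry")
s3.put_object(Bucket="telemetry", Key="2024/06/01/readings.json", Body=b"{}")
print(s3.list_objects_v2(Bucket="telemetry")["KeyCount"])
```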