Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on. Table metadata is fetched from AWS Glue. The generated Athena SQL query is run.
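As a rough sketch of that flow using boto3 (the database, table, and results bucket names below are illustrative assumptions, not from the post):

```python
import boto3

glue = boto3.client("glue")
athena = boto3.client("athena")

# Fetch table metadata from the AWS Glue Data Catalog
# ("sales_db" and "orders" are hypothetical names).
table = glue.get_table(DatabaseName="sales_db", Name="orders")
columns = [c["Name"] for c in table["Table"]["StorageDescriptor"]["Columns"]]

# Run the generated SQL through Athena; results are written to S3
# ("s3://my-athena-results/" is a hypothetical bucket).
response = athena.start_query_execution(
    QueryString=f"SELECT {', '.join(columns)} FROM orders LIMIT 10",
    QueryExecutionContext={"Database": "sales_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])
```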
This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. In practice, open table formats (OTFs) are used in a broad range of analytical workloads, from business intelligence to machine learning.
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. Enhance agility by localizing changes within business domains and establishing clear data contracts. Eliminate centralized bottlenecks and complex data pipelines.
Amazon Redshift, launched in 2013, has undergone significant evolution since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. This allowed customers to scale read analytics workloads and offered isolation to help maintain SLAs for business-critical applications.
Untapped data, if mined, represents tremendous potential for your organization. While there has been a lot of talk about big data over the years, the real hero in unlocking the value of enterprise data is metadata, or the data about the data. Metadata Is the Heart of Data Intelligence.
In order to figure out why the numbers in the two reports didn’t match, Steve needed to understand everything about the data that made up those reports – when the report was created, who created it, any changes made to it, which system it was created in, etc. Enterprise data governance. Metadata in data governance.
Most companies produce and consume unstructured data such as documents, emails, web pages, engagement center phone calls, and social media. By some estimates, unstructured data can make up 80–90% of all new enterprise data, and it is growing many times faster than structured data.
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. This premier event showcased groundbreaking advancements, keynotes from AWS leadership, hands-on technical sessions, and exciting product launches.
I recently saw an informal online survey that asked users which types of data (tabular, text, images, or “other”) are being used in their organization’s analytics applications. The results showed that (among those surveyed) approximately 90% of enterprise analytics applications are being built on tabular data.
What is a data scientist? Data scientists are analytical data experts who use data science to discover insights from massive amounts of structured and unstructured data to help shape or meet specific business needs and goals. Semi-structured data falls between the two.
The data catalog is a searchable asset that enables all data – including even formerly siloed tribal knowledge – to be cataloged and more quickly exposed to users for analysis. Three Types of Metadata in a Data Catalog: Technical Metadata, Business Metadata, and Operational Metadata (for analysis and integration purposes).
First, many LLM use cases rely on enterprise knowledge that needs to be drawn from unstructured data such as documents, transcripts, and images, in addition to structured data from data warehouses. Data enrichment: additional metadata may need to be extracted from the objects.
Amazon DataZone, a data management service, helps you catalog, discover, share, and govern data stored across AWS, on-premises systems, and third-party sources. After you create the asset, you can add glossaries or metadata forms, but it's not necessary for this post. Enter a name for the asset.
Amazon DataZone enables you to discover, access, share, and govern data at scale across organizational boundaries, reducing the undifferentiated heavy lifting of making data and analytics tools accessible to everyone in the organization. Amazon DataZone uses the available metadata on the asset to generate the descriptions.
This post provides guidance on how to build scalable analytical solutions for gaming industry use cases using Amazon Redshift Serverless. Flexible and easy to use – The solutions should provide less restrictive, easy-to-access, and ready-to-use data. A data warehouse is one of the components in a data hub.
Data analytics – Business analysts gather operational insights from multiple data sources, including the location data collected from the vehicles. Athena is used to run geospatial queries on the location data stored in the S3 buckets. For Data source, choose AwsDataCatalog. Choose Run.
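As a hedged illustration of such a geospatial query, assuming a hypothetical vehicle_locations table, database, and depot coordinates:

```python
import boto3

athena = boto3.client("athena")

# Find vehicles within roughly 1 km of a depot using Athena's
# geospatial functions; all table and coordinate values are hypothetical.
query = """
SELECT vehicle_id, latitude, longitude
FROM vehicle_locations
WHERE ST_Distance(
        to_spherical_geography(ST_Point(longitude, latitude)),
        to_spherical_geography(ST_Point(-122.34, 47.62))
      ) < 1000
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Catalog": "AwsDataCatalog", "Database": "fleet_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
```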
The Business Application Research Center (BARC) warns that data governance is a highly complex, ongoing program, not a “big bang initiative,” and it runs the risk of participants losing trust and interest over time. The program must introduce and support standardization of enterprise data.
Applying artificial intelligence (AI) to data analytics for deeper, better insights and automation is a growing enterprise IT priority. But the data repository options that have been around for a while tend to fall short in their ability to serve as the foundation for big data analytics powered by AI.
Yet given this era of digital transformation and fierce competition, understanding what data you have, where it came from, how it’s changed since creation or acquisition, and whether it poses any risks is paramount to optimizing its value. Data Lineage Tells an Important Origin Story. Who are the data owners?
While some businesses suffer from “data translation” issues, others lack discovery methods and still perform metadata discovery manually. Still others need to trace data history and understand its context to resolve an issue before it actually becomes one. The solution is a comprehensive automated metadata platform.
Based on a study of the evaluation criteria in the Gartner Magic Quadrant for Analytics and Business Intelligence Platforms, I have summarized the top 10 key features of BI tools for your reference. Overall, as users’ data sources become more extensive, their preferences for BI are changing. Metadata management. Analytics dashboards.
However, there is a fundamental challenge standing in the way of being successful: data. By breaking down data silos and integrating log data from multiple sources, Cloudera empowers defenders with the real-time analytics to respond to threats swiftly.
KGs bring the Semantic Web paradigm to enterprises, introducing semantic metadata to drive data management and content management to new levels of efficiency, and breaking down silos to let them synergize with various forms of knowledge management. The RDF data model and the other standards in W3C’s Semantic Web stack (e.g.,
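For readers new to RDF, here is a minimal sketch using the rdflib Python library (the namespace and resource names are made up for illustration):

```python
from rdflib import Graph, Literal, Namespace, RDF

# Build a tiny knowledge graph carrying semantic metadata
# (the example.org namespace and resources are hypothetical).
EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.orders_table, RDF.type, EX.Dataset))
g.add((EX.orders_table, EX.owner, Literal("sales-team")))
g.add((EX.orders_table, EX.derivedFrom, EX.raw_orders))

# SPARQL queries the metadata like any other data.
for row in g.query(
    "SELECT ?ds ?owner WHERE { ?ds <http://example.org/owner> ?owner }"
):
    print(row.ds, row.owner)
```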
Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open-format files in an Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries.
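A minimal sketch of such a data lake query through the Redshift Data API, assuming an external (Spectrum) schema already exists; the workgroup, database, and table names are hypothetical:

```python
import boto3

rsd = boto3.client("redshift-data")

# Query Parquet files in S3 through an existing external schema;
# "my-workgroup", "dev", and "spectrum_schema.sales_parquet" are hypothetical.
rsd.execute_statement(
    WorkgroupName="my-workgroup",
    Database="dev",
    Sql="""
        SELECT product_id, SUM(amount) AS revenue
        FROM spectrum_schema.sales_parquet
        GROUP BY product_id
    """,
)
```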
We use leading-edge analytics, data, and science to help clients make intelligent decisions. AWS services such as Amazon Neptune and Amazon OpenSearch Service form part of their data and analytics pipelines, and AWS Batch is used for long-running data and machine learning (ML) processing tasks.
We live in a hybrid data world. In the past decade, the amount of structured data created, captured, copied, and consumed globally has grown from less than 1 ZB in 2011 to nearly 14 ZB in 2020. Impressive, but dwarfed by the amount of unstructured data, cloud data, and machine data – another 50 ZB.
Different technologies approach these tasks differently, but here we want to talk about text analytics combined with content enrichment. Taking Text Analytics to the Next Level with Content Enrichment. With text analytics and content enrichment, we can think of text as separate digital objects relating to each other.
For a deeper exploration of configuring and using streaming ingestion in Amazon Redshift, refer to Real-time analytics with Amazon Redshift streaming ingestion. For streams that contain raw binary data encoded in JSON format, Amazon Redshift provides a variety of tools for parsing and managing the data.
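A hedged sketch of that setup, again via the Redshift Data API, with a hypothetical Kinesis stream, IAM role, and view name:

```python
import boto3

rsd = boto3.client("redshift-data")

# Create an external schema over a Kinesis stream, then a materialized
# view that parses each record's JSON payload. All names are hypothetical.
rsd.batch_execute_statement(
    WorkgroupName="my-workgroup",
    Database="dev",
    Sqls=[
        """CREATE EXTERNAL SCHEMA kds
           FROM KINESIS
           IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftStreamingRole'""",
        """CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
           SELECT approximate_arrival_timestamp,
                  JSON_PARSE(kinesis_data) AS payload
           FROM kds."clickstream" """,
    ],
)
```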
Limiting growth by (data integration) complexity. Most operational IT systems in an enterprise have been developed to serve a single business function, and they use the simplest possible model for this. In order to integrate structured data, enterprises need to implement the data fabric pattern.
The Data Fabric paradigm combines design principles and methodologies for building efficient, flexible and reliable data management ecosystems. Knowledge Graphs are the Warp and Weft of a Data Fabric. To implement any Data Fabric approach, it is essential to be able to understand the context of data.
It was designed as a native object store to provide extreme scale, performance, and reliability to handle multiple analytics workloads using either the S3 API or the traditional Hadoop API. Structured data (such as name, date, ID, and so on) will be stored in regular SQL databases like Hive or Impala. LEGACY Bucket.
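Because the store exposes the S3 API, ordinary S3 tooling can be pointed at it; a minimal boto3 sketch with a hypothetical gateway endpoint, credentials, and bucket:

```python
import boto3

# Point the standard S3 client at the S3-compatible object store.
# Endpoint, credentials, bucket, and key are all hypothetical.
s3 = boto3.client(
    "s3",
    endpoint_url="http://ozone-gateway.example.com:9878",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.put_object(Bucket="analytics", Key="raw/events.json", Body=b'{"event": "login"}')
obj = s3.get_object(Bucket="analytics", Key="raw/events.json")
print(obj["Body"].read())
```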
Data lakes are centralized repositories that can store all structured and unstructured data at any desired scale. The power of the data lake lies in the fact that it often is a cost-effective way to store data. Deploying Data Lakes in the cloud. Possibilities with Data Lakes.
In this post, we discuss how you can use purpose-built AWS services to create an end-to-end data strategy for C360 to unify and govern customer data that address these challenges. We recommend building your data strategy around five pillars of C360, as shown in the following figure.
Business units can curate and expose their readily available domain-specific data products through Amazon DataZone, providing discoverability and controlled access. Thousands of customers use Amazon Redshift data sharing to enable instant, granular, and fast data access across Amazon Redshift provisioned clusters and serverless workgroups.
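A minimal sketch of the producer side of Redshift data sharing, with hypothetical share, schema, and consumer namespace values:

```python
import boto3

rsd = boto3.client("redshift-data")

# Create a datashare on the producer, add objects to it, and grant it
# to a consumer namespace. All identifiers are hypothetical.
rsd.batch_execute_statement(
    WorkgroupName="producer-workgroup",
    Database="dev",
    Sqls=[
        "CREATE DATASHARE sales_share",
        "ALTER DATASHARE sales_share ADD SCHEMA sales",
        "ALTER DATASHARE sales_share ADD TABLE sales.orders",
        "GRANT USAGE ON DATASHARE sales_share "
        "TO NAMESPACE '11111111-2222-3333-4444-555555555555'",
    ],
)
```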
Architecturally, the introduction of Hadoop, a file system designed to store massive amounts of data, radically affected the cost model of data. Organizationally, the innovation of self-service analytics, pioneered by Tableau and Qlik, fundamentally transformed the user model for data analysis. Disruptive Trend #1: Hadoop.
Data lakes are designed for storing vast amounts of raw, unstructured, or semi-structured data at a low cost, and organizations share those datasets across multiple departments and teams. The queries on these large datasets read vast amounts of data and can perform complex join operations on multiple datasets.
Modernizing your data warehousing experience with the cloud means moving from dedicated, on-premises hardware focused on traditional relational analytics on structured data to a modern platform. This will, of course, translate into more options to explore data, leading to faster and better business insights.
They classified the metrics and indicators in the following categories: Data usage – A clear understanding of who is consuming what data source, materialized with a mapping of consumers and producers. In this approach, teams responsible for generating data are referred to as producers.
As most organizations that have turned to Google Analytics (GA) as a digital analytics solution mature, they discover a more pressing need to integrate this data silo with the rest of their organization’s data to enable better analytics, along with the resulting product development and fraud detection.
The business intelligence and analytics environment becomes even more complex when you consider the multiple systems in place processing data (database (DWH), ETL, analysis, reporting) that are all likely housing different descriptions of the same data, creating ONE. Why Spend Hours & Days on Data Mapping?
This requirement arises because the data and analytics associated with a particular use case can sometimes involve hundreds of files. You can build projects and subscribe to both unstructured and structured data assets within the Amazon DataZone portal.
They were not able to quickly and easily query and analyze huge amounts of data as required. They also needed to combine text or other unstructured data with structured data and visualize the results in the same dashboards. Events or time-series data are served by our real-time events or time-series data store solutions.
Many Cloudera customers are making the transition from being completely on-premises to the cloud, either by backing up their data in the cloud or by running multi-functional analytics on CDP Public Cloud in AWS or Azure. The Replication Manager service facilitates both disaster recovery and data migration across different environments.