2023 and Metadata - Data Leaders Brief

It’s 2025. Are your data strategies strong enough to de-risk AI adoption?

CIO Business Intelligence

DECEMBER 11, 2024

If 2023 was the year of AI discovery and 2024 was that of AI experimentation, then 2025 will be the year that organisations seek to maximise AI-driven efficiencies and leverage AI for competitive advantage. Primary among these is the need to ensure the data that will power their AI strategies is fit for purpose.

Risk

Risk Data Strategy Strategy Data Governance

Build a high-performance quant research platform with Apache Iceberg

AWS Big Data

JANUARY 9, 2025

Iceberg offers distinct advantages through its metadata layer over Parquet, such as improved data management, performance optimization, and integration with various query engines. Iceberg’s table format separates data files from metadata files, enabling efficient data modifications without full dataset rewrites.
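To make the idea of keeping metadata apart from data files more concrete, here is a minimal conceptual sketch in Python. It is not Iceberg's implementation or API: the ToyTable and Snapshot classes and the file paths are hypothetical, and the model only illustrates how an append or delete can be recorded as a new snapshot that references files, without rewriting any existing data file.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One immutable table state: an id plus the data files it references."""
    snapshot_id: int
    data_files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table whose metadata lives apart from its data."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def current_files(self) -> Tuple[str, ...]:
        # The latest snapshot defines what the table currently contains.
        return self.snapshots[-1].data_files if self.snapshots else ()

    def append(self, new_files: List[str]) -> Snapshot:
        # Metadata-only change: the new snapshot lists the old files plus the new ones.
        snap = Snapshot(len(self.snapshots) + 1, self.current_files() + tuple(new_files))
        self.snapshots.append(snap)
        return snap

    def delete_file(self, path: str) -> Snapshot:
        # Dropping a file is also metadata-only; the other files are left untouched.
        remaining = tuple(f for f in self.current_files() if f != path)
        snap = Snapshot(len(self.snapshots) + 1, remaining)
        self.snapshots.append(snap)
        return snap


table = ToyTable()
table.append(["data/a.parquet", "data/b.parquet"])
table.append(["data/c.parquet"])
table.delete_file("data/a.parquet")
```

Because earlier snapshots are never modified in this toy model, the first snapshot still lists both of its original files even after the delete, which is the property that makes reading an older table state possible.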

Metadata

Metadata Snapshot Cost-Benefit Optimization

SAP Datasphere Powers Business at the Speed of Data

Rocket-Powered Data Science

MARCH 20, 2023

SAP Datasphere was launched and announced globally on March 8, 2023. Datasphere goes beyond the “big three” data usage end-user requirements (ease of discovery, access, and delivery) to include data orchestration (data ops and data transformations) and business data contextualization (semantics, metadata, catalog services).

Data Warehouse

Data Warehouse Metadata Digital Transformation Machine Learning

Enterprises can gain an edge with Metadata Management

CIO Business Intelligence

SEPTEMBER 6, 2024

Central to this is metadata management, a critical component for driving future success. AI and ML need large amounts of accurate data for companies to get the most out of the technology. Let’s dive into what that looks like, what workarounds some IT teams use today, and why metadata management is the key to success.

Metadata

Metadata Enterprise Management Cost-Benefit

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

Central to a transactional data lake are open table formats (OTFs) such as Apache Hudi, Apache Iceberg, and Delta Lake, which act as a metadata layer over columnar formats. Apache XTable was originally open sourced in November 2023 under the name OneTable, with contributions from Onehouse among others, and was licensed under Apache 2.0.

Metadata

Metadata Data Lake Snapshot Data Warehouse

AWS Lake Formation 2023 year in review

AWS Big Data

JANUARY 18, 2024

In this post, we are happy to summarize the results of our hard work in 2023 to improve and simplify data governance for customers. We announced our new features and capabilities during AWS re:Invent 2023, as is our custom every year. In 2023, we released several updates to AWS Glue crawlers. Welcome to DataZone!

Data Lake

Data Lake Metadata Data Governance Statistics

Write queries faster with Amazon Q generative SQL for Amazon Redshift

AWS Big Data

NOVEMBER 7, 2024

Amazon Q generative SQL for Amazon Redshift was launched in preview during AWS re:Invent 2023. It enables you to get insights faster without extensive knowledge of your organization’s complex database schema and metadata. It uses metadata from database schemas to improve the SQL query suggestions.

Metadata

Metadata Sales Data Warehouse Optimization

Amazon OpenSearch Service search enhancements: 2023 roundup

AWS Big Data

JANUARY 9, 2024

Now users seek methods that allow them to get even more relevant results through semantic understanding or even search through image visual similarities instead of textual search of metadata. We are excited about the OpenSearch Service features and enhancements we’ve added to that toolkit in 2023.

Visualization

Visualization Cost-Benefit Modeling Machine Learning

AWS re:Invent 2023 Amazon Redshift Sessions Recap

AWS Big Data

DECEMBER 18, 2023

ANT 352 | [NEW LAUNCH] Amazon Q generative SQL in Amazon Redshift Query Editor: SQL, the industry-standard language for data analytics, often requires users to spend a lot of time understanding an organization’s complex metadata in order to write and carry out complex SQL queries for data insights.

Data Warehouse

Data Warehouse Machine Learning Data-driven Data Lake

Top Opportunities for SAP Partners in 2023

Timo Elliott

NOVEMBER 30, 2022

My role was to talk about the trends and opportunities for 2023, for customers, SAP, and our partners. You lose the roots: the business context, the metadata, the connections, the hierarchies and security. This week I was in Dubai for the latest edition of the SAP Partner Innovation Meeting. Innovating Faster.

Recreation/Entertainment

Recreation/Entertainment Metadata Data Warehouse Cost-Benefit

Amazon OpenSearch Service H1 2023 in review

AWS Big Data

AUGUST 23, 2023

Since its release in January 2021, the OpenSearch project has released 14 versions through June 2023. In this post, we provide a review of all the exciting feature releases in OpenSearch Service in the first half of 2023. In July 2023, we previewed support for a third collection type: vector search.

Snapshot

Snapshot Dashboards Visualization Metrics

Can LLMs Become Knowledgeable – Impressions from Day 1 At SEMANTiCS 2023

Ontotext

OCTOBER 11, 2023

SEMANTiCS 2023 kicked off with a pre-conference day that offered an awesome lineup of business and academia talks. In his talk, Responsible AI and LLMs, Andreas Blumauer focused on how we can take the best of both worlds and work on responsible, explainable generative AI. Are LLMs Knowledgeable?

Metadata

Metadata Cost-Benefit Marketing Modeling

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

At AWS re:Invent 2023, we introduced more performance enhancements in query planning and execution such as enhanced bloom filters, query rewrites, and support for write operations in auto scaling. At AWS re:Invent 2023, we extended data sharing capabilities to launch multi-data warehouse writes in preview.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

Our Top Data and Analytics Predicts for 2021

Andrew White

JANUARY 12, 2021

Predicts 2021: Data and Analytics Leaders Are Poised for Success but Risk an Uncertain Future: By 2023, 50% of chief digital officers in enterprises without a chief data officer (CDO) will need to become the de facto CDO to succeed. By 2023, ERP data will be the basis for 30% of AI-generated predictive analyses and forecasts.

Analytics

Analytics Metadata Enterprise Data-driven

KGF 2023: Bikes To The Moon, Datastrophies, Abstract Art And A Knowledge Graph Forum To Embrace Them All

Ontotext

DECEMBER 1, 2023

So, KGF 2023 proved to be a breath of fresh air for anyone interested in topics like data mesh and data fabric, knowledge graphs, text analysis, large language model (LLM) integrations, retrieval augmented generation (RAG), chatbots, semantic data integration, and ontology building. Three presentations at KGF 2023 proved it.

Metadata

Metadata Sales Machine Learning Consulting

Business Strategies for Deploying Disruptive Tech: Generative AI and ChatGPT

Rocket-Powered Data Science

FEBRUARY 15, 2023

Generative AI is the biggest and hottest trend in AI (Artificial Intelligence) at the start of 2023. The latter is essential for Generative AI implementations. Love thy data: data are never perfect, but all the data may produce value, though not immediately.

Strategy

Strategy Experimentation Uncertainty Machine Learning

Maximize your data dividends with active metadata

IBM Big Data Hub

NOVEMBER 28, 2022

Metadata management performs a critical role within the modern data management stack. However, as data volumes continue to grow, manual approaches to metadata management are sub-optimal and can result in missed opportunities. This puts into perspective the role of active metadata management. What is Active Metadata management?

Metadata

Metadata Data Quality Data-driven Data Governance

Do Large Language Models Dream of Knowledge Graphs – Impressions from Day 2 At SEMANTiCS 2023

Ontotext

OCTOBER 12, 2023

I learned that fact from a comment in the audience on the second day of SEMANTiCS 2023 – the European conference series focused on semantic technologies ever since 2005. Aidan Hogan at SEMANTiCS 2023. I didn’t either. What If ChatGPT Is the Killer App for the Semantic Web?

Modeling

Modeling Recreation/Entertainment Data Processing Metadata

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

This means the data files in the data lake aren’t modified during the migration and all Apache Iceberg metadata files (manifest lists, manifest files, and table metadata files) are generated outside the purview of the data. In this method, the metadata is recreated in an isolated environment and colocated with the existing data files.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

A Data Prediction for 2025

DataKitchen

FEBRUARY 2, 2023

We’ve read many predictions for 2023 in the data field: they cover excellent topics like data mesh, observability, governance, lakehouses, LLMs, etc. Most data governance tools today start with the slow, waterfall building of metadata with data stewards and then hope to use that metadata to drive code that runs in production.

Metadata

Metadata Testing Data Science Risk

5 Alation Customers Sharing Data Successes at Snowflake Summit 2023

Alation

JUNE 1, 2023

Join this session to learn how DIRECTV partnered with Alation to map their new dataverse, which includes Snowflake data sources (hubs), glossaries, enhanced metadata for metadata objects, lineage, and quality. They also recognized that to become 100% data-driven, first they had to become 100% metadata-driven.

Metadata

Metadata Data Governance Data-driven Recreation/Entertainment

Data Trends to Watch in 2023

TDAN

DECEMBER 6, 2022

Your business doesn’t stay still, and neither does the data landscape. While the next 12 months will no doubt contain many surprises, twists, and turns, one thing is certain. Data will continue passing through the veins of business industries and economies.

Metadata

Metadata Data Governance Data Strategy Strategy

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Apache Iceberg addresses customer needs by capturing rich metadata information about the dataset at the time the individual data files are created.

Data Lake

Data Lake Data Processing Metadata Snapshot

Insights from Gartner Data & Analytics Summit Orlando 2023

Alation

MARCH 31, 2023

Ehtisham Zaidi, Gartner’s VP of data management, and Robert Thanaraj, Gartner’s director of data management, gave an update on the fabric versus mesh debate in light of what they call the “active metadata era” we’re currently in. The active metadata helix: Indeed, automation was on everyone’s minds. We couldn’t agree more.

Data Analytics

Data Analytics Analytics Metadata Data Governance

Preprocess and fine-tune LLMs quickly and cost-effectively using Amazon EMR Serverless and Amazon SageMaker

AWS Big Data

FEBRUARY 1, 2024

The Common Crawl corpus contains petabytes of data, regularly collected since 2008, and contains raw webpage data, metadata extracts, and text extracts. The Common Crawl raw dataset includes three types of data files: raw webpage data (WARC), metadata (WAT), and text extraction (WET).
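The three file types named above can be told apart by file name. The following Python sketch is only illustrative: it assumes the commonly seen suffix convention (.warc.gz, .warc.wat.gz, .warc.wet.gz), which the excerpt itself does not state, and the function name and example paths are hypothetical.

```python
def classify_crawl_file(name: str) -> str:
    """Return 'WARC', 'WAT', 'WET', or 'UNKNOWN' based on the file suffix.

    Assumption: files follow the suffix convention .warc.gz / .warc.wat.gz /
    .warc.wet.gz. The more specific suffixes are checked first because all
    three names end in '.gz' and contain '.warc'.
    """
    if name.endswith(".warc.wat.gz"):
        return "WAT"   # metadata extracts
    if name.endswith(".warc.wet.gz"):
        return "WET"   # plain-text extracts
    if name.endswith(".warc.gz"):
        return "WARC"  # raw webpage data
    return "UNKNOWN"


# Hypothetical example paths, used only to exercise the function.
examples = [
    "segment/warc/example-00000.warc.gz",
    "segment/wat/example-00000.warc.wat.gz",
    "segment/wet/example-00000.warc.wet.gz",
    "segment/notes.txt",
]
kinds = [classify_crawl_file(path) for path in examples]
```

A preprocessing job that only needs text could use a check like this to select WET files and skip the much larger WARC files.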

Metadata

Metadata Modeling Data Processing Unstructured Data

The Increasing Importance of Open Table Formats

David Menninger's Analyst Perspectives

OCTOBER 31, 2024

In 2023, Onehouse announced an initiative to provide interoperability across table formats. Initially called Onetable, the project became Apache XTable in September 2024 and provides a lightweight translation layer to translate metadata between table formats without the need to duplicate or modify the data.

Data Lake

Data Lake Unstructured Data Data Warehouse Software

Get started managing partitions for Amazon S3 tables backed by the AWS Glue Data Catalog

AWS Big Data

JUNE 22, 2023

Files corresponding to a single day’s worth of data are placed under a prefix such as s3://my_bucket/logs/year=2023/month=06/day=01/. If the partition isn’t loaded into a partitioned table, when the application downloads the partition metadata, the application will not be aware of the S3 path that needs to be queried.
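The prefix layout described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the year=/month=/day= key names and zero padding shown in the example path; the helper names partition_prefix and parse_partition are hypothetical and are not part of any AWS SDK.

```python
from datetime import date
from typing import Dict


def partition_prefix(bucket: str, table_path: str, day: date) -> str:
    """Build the Hive-style S3 prefix that holds one day's worth of data."""
    return (
        f"s3://{bucket}/{table_path}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


def parse_partition(prefix: str) -> Dict[str, str]:
    """Extract the key=value partition columns from a prefix."""
    segments = [seg for seg in prefix.split("/") if "=" in seg]
    return dict(seg.split("=", 1) for seg in segments)


prefix = partition_prefix("my_bucket", "logs", date(2023, 6, 1))
columns = parse_partition(prefix)
```

Here prefix reproduces the path from the text, and columns holds the partition values as strings. A query engine can only prune to that prefix if the partition has been registered in the catalog, which is the point the excerpt makes.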

Metadata

Metadata Management Recreation/Entertainment Optimization

My Understanding of the Gartner® Hype Cycle™ for Finance Data and Analytics Governance, 2023

Data Virtualization

MARCH 28, 2024

As noted in the Gartner Hype Cycle for Finance Data and Analytics Governance, 2023, “Through.

Finance

Finance Digital Transformation Analytics Data Integration

Secrets from Data Governance Leaders: DGIQ West 2023 (June 5 – 9)

Alation

MAY 31, 2023

DGIQ is June 5-9, 2023, at the Catamaran Resort Hotel and Spa in San Diego, just steps away from the Mission Bay beach. He’ll share how “metadata normalization” played a key role in the journey to automation, the steps required to automate data governance processes, and why a data catalog was critical to the project’s success.

Data Governance

Data Governance Insurance Metadata Data-driven

Copyright, AI, and Provenance

O'Reilly on Data

DECEMBER 12, 2023

Google, which invented Transformers, knows better than anyone that Transformer-based models destroy metadata, unless you do a lot of special engineering. We can’t say for certain that it was implemented with RAG, but it clearly follows the pattern. But Google has the best search engine in the world.

Modeling

Modeling Sales Software Statistics

Collibra Provides a Platform for Data Intelligence

David Menninger's Analyst Perspectives

OCTOBER 8, 2024

I assert that through 2027, three-quarters of enterprises will be engaged in data intelligence initiatives to increase trust in their data by leveraging metadata to understand how, when and where data is used in their organization, and by whom. Collibra also announced the acquisition of Husprey in 2023 for its SQL data notebook functionality.

Data Quality

Data Quality Data Governance Enterprise Visualization

Denodo Provides a Logical Approach to Data Management

David Menninger's Analyst Perspectives

OCTOBER 24, 2024

Denodo remains a specialist data management software provider and in September 2023 announced that it had received a $336 million investment from asset management firm TPG.

Management

Management Data-driven Data Governance Data Lake

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

JANUARY 24, 2024

In 2023, AWS announced general availability for Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake in Amazon Athena for Apache Spark, which removes the need to install a separate connector or associated dependencies and manage versions, and simplifies the configuration steps required to use these frameworks.

Snapshot

Snapshot Data Lake Metadata Optimization

How Far We Can Go with GenAI as an Information Extraction Tool

Ontotext

JANUARY 10, 2025

You can use the Ontotext Metadata Studio (OMDS) to integrate any NER model and apply it to your documents to extract the entities you are interested in. There is no silver bullet: LLMs still need human validation and there does not seem to be one best model, as we’ve seen that Llama-70b and GPT-4o perform differently on different tasks.

Informatics

Informatics Modeling Metadata Experimentation

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

AWS Big Data

OCTOBER 3, 2023

Iceberg tables store metadata in manifest files. As the number of data files increases, the amount of metadata stored in these manifest files also increases, leading to longer query planning time. The query runtime also increases because it’s proportional to the number of data or metadata file read operations.
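Since planning and read costs grow with the number of files, one common remedy is to combine many small files into fewer larger ones. The Python sketch below is a toy illustration of that idea only: it greedily groups file sizes into batches under a target size. The function name, the 128 MB target, and the sample sizes are assumptions made for the example; this is not the algorithm Iceberg or Amazon EMR uses.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[int], target_mb: int = 128) -> List[List[int]]:
    """Greedily group file sizes (in MB) into batches of at most target_mb.

    A single file larger than target_mb ends up alone in its own batch.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    # Visit the largest files first so big files do not get stranded at the end.
    for size in sorted(file_sizes_mb, reverse=True):
        if current and current_size + size > target_mb:
            # The current batch is full: close it and start a new one.
            batches.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical sizes for eight small files.
small_files = [4, 8, 16, 32, 64, 100, 2, 2]
batches = plan_compaction(small_files)
```

With these sample sizes, eight input files collapse into two batches, so a reader would open two files instead of eight after each batch is rewritten as a single file.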

Optimization

Optimization Snapshot Data Lake Metadata

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

MAY 24, 2023

RIO is really great",date("2023-04-06"),2023)""")

You can check that the new snapshot is created after this append operation by querying the Iceberg snapshot:

spark.sql("""SELECT * FROM dev.db.amazon_reviews_iceberg.snapshots""").show()

The metadata file location can be fetched from the metadata log entries metatable as illustrated earlier.

Data Lake

Data Lake Snapshot Metadata Optimization

What enterprise software vendors are doing with generative AI

CIO Business Intelligence

AUGUST 15, 2023

2023 has been a break-out year for generative AI technology, as tools such as ChatGPT graduated from lab curiosity to household name. July 2023: Microsoft adds Copilot abilities to Dynamics 365 suite. Microsoft will roll out its Copilot generative AI assistant across more of its products.

Software

Software Enterprise Sales Visualization

Cloudera Named a Visionary in the Gartner MQ for Cloud DBMS

Cloudera

APRIL 1, 2024

We’re excited to share that Gartner has recognized Cloudera as a Visionary among all vendors evaluated in the 2023 Gartner® Magic Quadrant for Cloud Database Management Systems. Download the complimentary 2023 Gartner Magic Quadrant for Cloud Database Management Systems report.

Unstructured Data

Unstructured Data Cost-Benefit Metadata Machine Learning

2023 Predictions: Data Trends That Will Dominate Business Agenda in APAC

Cloudera

JANUARY 5, 2023

As more industries mature digitally and widely adopt AI and machine learning technologies, 2023 will be a pivotal year for organizations looking to deploy emerging tech solutions company-wide to fulfill business objectives. These features provide businesses with a common metadata, security, and governance model across all their data.

Cost-Benefit

Cost-Benefit Business Objectives Machine Learning Data Architecture

How Fujitsu implemented a global data mesh architecture and democratized data

AWS Big Data

MAY 1, 2024

Currently, we have approximately 120,000 employees worldwide (as of March 2023), including group companies. As of November 2023, more than 200 projects and 37,000 users were onboarded. Provide and keep up to date with technical metadata for loaded data. Fujitsu Limited was established in Japan in 1935.

Dashboards

Dashboards Publishing Data-driven Cost-Benefit

You Cannot Get to the Moon on a Bike!

Ontotext

JANUARY 10, 2024

In this post, which is a matured version of my opening keynote at Ontotext’s Knowledge Graph Forum 2023, I will start with evidence about the impact of complexity on the growth and efficiency of big enterprises. In both cases, semantic metadata is the glue that turns knowledge graphs into hubs of data, metadata, and content.

Metadata

Metadata Slice and Dice Data Integration Enterprise

Manage Amazon OpenSearch Service Visualizations, Alerts, and More with GitHub and Jenkins

AWS Big Data

OCTOBER 24, 2024

Complete the following steps to set up an EC2 instance for installing Jenkins: Launch an EC2 instance with the latest Amazon Linux 2023 AMI. Note: Make sure to deploy the EC2 instance for hosting Jenkins in the same VPC as the OpenSearch domain.

Visualization

Visualization Management Data Processing Testing

How AI can deliver eye-opening insights for IT

CIO Business Intelligence

SEPTEMBER 26, 2023

But even as we remember 2023 as the year when generative AI went ballistic, AI and its ML (machine learning) sidekick have been quietly evolving over several years to yield eye-opening insights and problem-solving productivity for IT organizations. And rightly so.

IT

IT Key Performance Indicator Software Metadata

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

However, even the most powerful systems can experience performance degradation if they encounter anti-patterns like grossly inaccurate table statistics, such as the row count metadata. This can have a significant impact on overall query performance.

Data Lake

Data Lake Statistics Broadcasting Optimization
