Themes and Conferences per Pacoid, Episode 11

Domino Data Lab

In other words: using metadata about data science work to generate code. One of the longer-term trends that we’re seeing with Airflow, and so on, is to externalize graph-based metadata and leverage it beyond the lifecycle of a single SQL query, making our workflows smarter and more robust. BTW, videos for Rev2 are up: [link].
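To make "externalized graph-based metadata" concrete, here is a minimal sketch that is not Airflow's API or any tool's actual format. It assumes a hypothetical workflow described as plain data, where each task lists its dependencies and its SQL. The same metadata is used twice: once to generate an ordered script, and again, after the run is over, for impact analysis. It uses the standard-library `graphlib` module (Python 3.9+); all task names and SQL are illustrative.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow metadata, kept outside any single query or script:
# each task lists the tasks it depends on, plus the SQL it runs.
WORKFLOW = {
    "daily_metrics": {
        "deps": ["sessions", "users"],
        "sql": "SELECT s.day, COUNT(*) AS n FROM sessions s "
               "JOIN users u ON s.user_id = u.id GROUP BY s.day",
    },
    "sessions": {"deps": ["raw_events"], "sql": "SELECT user_id, DATE(ts) AS day FROM raw_events"},
    "raw_events": {"deps": [], "sql": "SELECT * FROM landing.events"},
    "users": {"deps": [], "sql": "SELECT * FROM landing.users"},
}


def generate_script(workflow):
    """Generate code from metadata: emit CREATE TABLE statements in dependency order."""
    graph = {name: spec["deps"] for name, spec in workflow.items()}  # node -> predecessors
    order = list(TopologicalSorter(graph).static_order())
    lines = [f"CREATE TABLE {name} AS {workflow[name]['sql']};" for name in order]
    return order, "\n".join(lines)


def downstream(workflow, changed):
    """Reuse the same metadata for impact analysis: every task that transitively depends on `changed`."""
    hit, frontier = set(), [changed]
    while frontier:
        current = frontier.pop()
        for name, spec in workflow.items():
            if current in spec["deps"] and name not in hit:
                hit.add(name)
                frontier.append(name)
    return hit


if __name__ == "__main__":
    order, script = generate_script(WORKFLOW)
    print(script)
    print("Affected by a change to raw_events:", sorted(downstream(WORKFLOW, "raw_events")))
```

Because the graph lives in data rather than inside one query, a scheduler, a code generator, and a lineage tool could all read it; that reuse is the point of externalizing it.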

Through the Looking Glass: Data Owners and Other Fallacies

TDAN

I vividly remember reading this passage from Bob Seiner’s TDAN.com article “Things I Think I Think about Data Governance”, from August 1, 2015: “If we were going to remove two words from the Data Governance vocabulary, I would choose the words ‘assign’ and ‘owner.’ When someone is designated as the ‘owner’ of data, that implies […].”

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

AWS Big Data

Hewlett-Packard acquired Aruba Networks in 2015, making it a wireless networking subsidiary with a wide range of next-generation network access solutions. Each file arrives as a pair with a tail metadata file in CSV format containing the size and name of the file. To achieve this, Aruba used Amazon S3 Event Notifications.
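The excerpt doesn't show Aruba's handler code. The following is a minimal sketch of how an S3 event handler might check a data file against its tail metadata file. It assumes, hypothetically, that the tail file is named `<data file>.tail.csv` and holds a single row `file_name,size_in_bytes`. The event dictionary follows the shape of S3 event notification records (`Records[].s3.object.key` and `size`); fetching the tail file from S3 is left out, so tails are passed in as a plain dict.

```python
import csv
import io
from urllib.parse import unquote_plus

TAIL_SUFFIX = ".tail.csv"  # hypothetical naming convention for tail metadata files


def parse_tail(tail_text):
    """Parse a tail metadata CSV with one row: file_name,size_in_bytes (assumed layout)."""
    row = next(csv.reader(io.StringIO(tail_text)))
    return row[0].strip(), int(row[1])


def check_records(event, tails):
    """Compare each data object in an S3 event with its tail metadata.

    `tails` maps data-file key -> tail CSV text (in a real pipeline, read from S3).
    Returns key -> True (name and size match), False (mismatch),
    or None (tail file not seen yet, so the pair is incomplete).
    """
    results = {}
    for record in event.get("Records", []):
        obj = record["s3"]["object"]
        key = unquote_plus(obj["key"])  # object keys in S3 events are URL-encoded
        if key.endswith(TAIL_SUFFIX):
            continue  # tail files are metadata, not data to validate
        tail_text = tails.get(key)
        if tail_text is None:
            results[key] = None
            continue
        name, size = parse_tail(tail_text)
        results[key] = name == key.rsplit("/", 1)[-1] and size == obj["size"]
    return results


if __name__ == "__main__":
    event = {"Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "inbound/orders_2023.csv", "size": 2048}}},
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "inbound/orders_2023.csv.tail.csv", "size": 21}}},
    ]}
    print(check_records(event, {"inbound/orders_2023.csv": "orders_2023.csv,2048\n"}))
```

Returning `None` for a missing tail, rather than `False`, keeps "not arrived yet" distinct from "arrived but inconsistent", which matters when the two halves of a pair can land in either order.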

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

More data files lead to more metadata stored in manifest files, and many small data files often produce an unnecessary amount of metadata, resulting in less efficient queries and higher Amazon S3 access costs. The output gives a count of the data and metadata files deleted.
resource('s3') bucket = s3.Bucket('
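Compaction addresses the small-files problem by rewriting many small data files into fewer, larger ones, so manifests track fewer entries. Apache Iceberg, for example, offers a bin-pack strategy for rewriting data files. The sketch below is a simplified first-fit-decreasing grouping written for illustration, not Iceberg's actual implementation; file sizes and the 128 MB target are hypothetical, and it assumes one manifest entry per data file.

```python
def bin_pack(sizes_mb, target_mb):
    """First-fit-decreasing: group file sizes into output files of at most target_mb.

    Returns a list of [total_size, [member_sizes]]. A single file larger than
    target_mb ends up alone in its own bin (it cannot be split here).
    """
    bins = []
    for size in sorted(sizes_mb, reverse=True):
        for b in bins:
            if b[0] + size <= target_mb:  # first existing bin with enough room
                b[0] += size
                b[1].append(size)
                break
        else:  # no bin had room: start a new output file
            bins.append([size, [size]])
    return bins


if __name__ == "__main__":
    small_files = [1] * 1000  # 1,000 hypothetical 1 MB files
    packed = bin_pack(small_files, target_mb=128)
    print(f"data files (and manifest entries): {len(small_files)} -> {len(packed)}")
```

Under these assumptions, 1,000 one-megabyte files collapse into 8 output files, so the manifest tracks 8 entries instead of 1,000 and a query planner has far fewer files to open.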

Illuminating the black box: why CIOs should consider publishing an annual IT report

CIO Business Intelligence

By 2015, the technical executives of at least one conglomerate, Intel, had figured out that they could improve how the firm perceived IT by showcasing how much that function contributes to business value. And don’t just rattle off project metadata. Such a report already has a legacy, if only a short one. What pains did it alleviate?

Speed up queries with the cost-based optimizer in Amazon Athena

AWS Big Data

Starting today, the Athena SQL engine uses a cost-based optimizer (CBO), a new feature that uses table and column statistics stored in the AWS Glue Data Catalog as part of the table’s metadata. By using these statistics, CBO improves query run plans and boosts the performance of queries run in Athena. Pathik Shah is a Sr.
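The excerpt doesn't describe Athena's cost model itself. As an illustration only, the sketch below uses the textbook uniformity estimate for an equi-join, |L| × |R| / max(NDV_L, NDV_R), to show how row counts and distinct-value counts (the kind of table and column statistics a catalog can store) let an optimizer prefer one join order over another. All table names and numbers are hypothetical, and cost is simplified to the sum of intermediate result sizes.

```python
from itertools import permutations

# Hypothetical statistics for a fact table and two (already filtered) dimensions.
FACT_ROWS = 1_000_000
FACT_NDV = {"customer_id": 50_000, "product_id": 10_000}
DIMS = {  # name -> (join key, row count, distinct values of the join key)
    "customers": ("customer_id", 1_000, 1_000),  # e.g. after a selective filter
    "products": ("product_id", 10_000, 10_000),
}


def join_rows(left_rows, left_ndv, right_rows, right_ndv):
    """Textbook equi-join cardinality estimate under uniformity/containment."""
    return left_rows * right_rows // max(left_ndv, right_ndv)


def plan_cost(order):
    """Cost of a left-deep plan starting from the fact table: sum of intermediate sizes."""
    rows, cost = FACT_ROWS, 0
    for name in order:
        key, dim_rows, dim_ndv = DIMS[name]
        left_ndv = min(FACT_NDV[key], rows)  # NDV can't exceed the intermediate's row count
        rows = join_rows(rows, left_ndv, dim_rows, dim_ndv)
        cost += rows
    return cost, rows


def best_order():
    """Pick the join order with the lowest estimated cost."""
    return min(permutations(DIMS), key=lambda order: plan_cost(order)[0])


if __name__ == "__main__":
    for order in permutations(DIMS):
        print(order, plan_cost(order))
    print("chosen:", best_order())
```

Both orders produce the same estimated final row count, but joining the filtered `customers` table first keeps the intermediate result at about 20,000 rows instead of 1,000,000; without statistics, an engine has no basis for making that choice.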

10 Years Later: Who’s the GOAT of Data Catalogs?

Alation

January 2015: Alation acquires its first customer.
March 2015: Alation emerges from stealth mode to launch the first official data catalog to empower people in enterprises to easily find, understand, govern and use data for informed decision making that supports the business.
June 2017: Yahoo Japan Corp.