2023, Blog and Snapshot - Data Leaders Brief

Chart Snapshot: Barcode Plot

The Data Visualisation Catalogue

MARCH 28, 2024

— VizWiz; ‘Avengers’ characters’ appearances over time: How the ‘Avengers’ Line-up Has Changed Over the Years — Wall Street Journal; Multiple Income Households, Flowingdata / Nathan Yau; The Corruption Perceptions Index; 2023 Week 35 | Power BI: Create a Faceted Instance Chart — Workout Wednesday / Meagan Longoria. The post Chart (..)

Snapshot, Visualization

Amazon OpenSearch Service H1 2023 in review

AWS Big Data

AUGUST 23, 2023

Since its release in January 2021, the OpenSearch project has released 14 versions through June 2023. In this post, we provide a review of all the exciting feature releases in OpenSearch Service in the first half of 2023. In July 2023, we previewed support for a third collection type: vector search.

Snapshot, Dashboards, Visualization, Metrics

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

MAY 24, 2023

Update your-iceberg-storage-blog in the following configuration with the bucket that you created to test this example. S3FileIO", "spark.sql.catalog.dev.warehouse":"s3://<your-iceberg-storage-blog>/iceberg/", "spark.sql.catalog.dev.s3.write.tags.write-tag-name":"created", write.tags.write-tag-name and s3.delete.tags.delete-tag-name
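The excerpt shows only fragments of the configuration, so the following is a minimal sketch of how such settings might be assembled, not the original post's full configuration. The function name, the bucket argument, and the delete-tag value "deleted" are assumptions made for illustration; the `io-impl` key is the usual Iceberg catalog property for selecting S3FileIO and appears only partially in the excerpt.

```python
def iceberg_s3_tag_conf(bucket, catalog="dev",
                        write_tag_value="created",
                        delete_tag_value="deleted"):
    """Return Spark settings for an Iceberg catalog that tags S3 objects.

    `bucket` replaces the <your-iceberg-storage-blog> placeholder.
    `delete_tag_value` is an assumed value; the excerpt does not show it.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Use Iceberg's S3FileIO so the s3.* tag properties take effect.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": f"s3://{bucket}/iceberg/",
        # Tag applied to objects when they are written.
        f"{prefix}.s3.write.tags.write-tag-name": write_tag_value,
        # Tag applied to objects before they are deleted.
        f"{prefix}.s3.delete.tags.delete-tag-name": delete_tag_value,
    }
```

Each key/value pair could then be passed to a Spark session builder through its `config` method.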

Data Lake, Snapshot, Metadata, Optimization

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

An in-place migration can be performed in either of two ways: Using add_files: This procedure adds existing data files to an existing Iceberg table with a new snapshot that includes the files. Unlike migrate or snapshot, add_files can import files from a specific partition or partitions and doesn’t create a new Iceberg table.
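As a rough illustration of the add_files procedure described above, the sketch below builds the Spark SQL `CALL` statement as a string. The helper name and the catalog, table, and partition names are hypothetical; the statement would still need to be run through a Spark session with an Iceberg catalog configured.

```python
def build_add_files_call(catalog, table, source_table, partition_filter=None):
    """Build a Spark SQL CALL statement for Iceberg's add_files procedure.

    All names passed in are placeholders chosen by the caller.
    `partition_filter` optionally limits the import to specific partitions.
    """
    args = [f"table => '{table}'", f"source_table => '{source_table}'"]
    if partition_filter:
        # map('col', 'value', ...) restricts which partitions are imported.
        pairs = ", ".join(f"'{k}', '{v}'" for k, v in partition_filter.items())
        args.append(f"partition_filter => map({pairs})")
    joined = ", ".join(args)
    return f"CALL {catalog}.system.add_files({joined})"
```

For example, `build_add_files_call("dev", "db.sales", "db.sales_raw", {"region": "EU"})` yields a statement that imports only the files of one partition into the existing table.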

Data Lake, Metadata, Snapshot, Recreation/Entertainment

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Whenever there is an update to the Iceberg table, a new snapshot of the table is created, and the metadata pointer points to the current table metadata file. At the top of the hierarchy is the metadata file, which stores information about the table’s schema, partition information, and snapshots. Choose Advanced options.
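The snapshot mechanism described above can be pictured with a small toy model. This is a simplified sketch in plain Python, not Iceberg's actual file layout or API; the class and method names are invented for illustration, and manifest lists and manifests are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: Tuple[str, ...]  # data files visible in this snapshot


@dataclass(frozen=True)
class MetadataFile:
    version: int
    schema: Tuple[str, ...]
    snapshots: Tuple[Snapshot, ...]  # every snapshot recorded so far
    current_snapshot_id: Optional[int]


class ToyTable:
    """Toy table: each update writes a new snapshot and a new metadata file."""

    def __init__(self, schema):
        self.metadata_files = [MetadataFile(0, tuple(schema), (), None)]
        self.pointer = 0  # index of the current metadata file

    def current_metadata(self):
        return self.metadata_files[self.pointer]

    def append(self, new_files):
        meta = self.current_metadata()
        previous = meta.snapshots[-1].data_files if meta.snapshots else ()
        snap = Snapshot(len(meta.snapshots) + 1, previous + tuple(new_files))
        # Older metadata files stay untouched; a new one is written instead.
        self.metadata_files.append(
            MetadataFile(meta.version + 1, meta.schema,
                         meta.snapshots + (snap,), snap.snapshot_id)
        )
        # Move the pointer to the newest metadata file.
        self.pointer = len(self.metadata_files) - 1
        return snap

    def files_added_since(self, snapshot_id):
        """Return files in the current snapshot that an older one lacks."""
        meta = self.current_metadata()
        by_id = {s.snapshot_id: s for s in meta.snapshots}
        old = set(by_id[snapshot_id].data_files)
        current = by_id[meta.current_snapshot_id].data_files
        return [f for f in current if f not in old]
```

Comparing two snapshots in this way is the basic idea behind incremental processing: a downstream job only needs the files added after the snapshot it last read.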

Data Lake, Data Processing, Metadata, Snapshot

Smarten Augmented Analytics is Named as a Representative Vendor in Gartner® 2023 ‘Market Guide for Augmented Analytics’, Published October 2023!

Smarten

NOVEMBER 23, 2023

Smarten is pleased to announce that its Smarten Augmented Analytics solution is included as a Representative Vendor in the Market Guide for Augmented Analytics, published October 2, 2023 (ID G00780764). The Smarten Cloud Software-as-a-Service offering includes all of these features and is available for free evaluation.

Publishing, Marketing, Analytics, Predictive Modeling

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

AWS Big Data

OCTOBER 3, 2023

To activate the automatic compaction process, add a new record to the existing Iceberg table using a Spark insert: spark.sql(""" Insert into dev.db.sensor_data_iceberg_format values(999123, 86, 'PASS', timestamp'2023-07-26 12:50:25') """) Navigate to the Amazon EMR console to check the cluster steps. impl":"org.apache.iceberg.aws.s3.S3FileIO",

Optimization, Snapshot, Data Lake, Metadata

Materialized Views in Hive for Iceberg Table Format

Cloudera

FEBRUARY 8, 2024

Overview: This blog post describes support for materialized views for the Iceberg table format. Create Iceberg materialized view: For the examples in this blog, we will use three tables from the TPC-DS dataset as our base tables: store_sales, customer and date_dim. Both full and incremental rebuild of the materialized view are supported.

Snapshot, Metadata, Cost-Benefit, Data Warehouse

Interact with Apache Iceberg tables using Amazon Athena and cross account fine-grained permissions using AWS Lake Formation

AWS Big Data

MARCH 23, 2023

For this blog, our “primary” workgroup is using Athena engine version 3. Create an S3 bucket to store the table data: We create a new S3 bucket to save the data for the table. On the Amazon S3 console, create an S3 bucket with a unique name (for this post, we use iceberg-athena-lakeformation-blog). Choose Save.

Interactive, Snapshot, Data Lake, Software

Maximize the power of your lines of defense against cyber-attacks with IBM Storage FlashSystem and IBM Storage Defender

IBM Big Data Hub

APRIL 15, 2024

In 2023, the FBI received a record number of 880,418 complaints with potential losses exceeding USD 12.5 When a cyberattack strikes, the ransomware code gathers information about target networks and key resources such as databases, critical files, snapshots and backups. Today, cybercrime is good business.

Snapshot, Machine Learning, Interactive, Statistics

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

AWS Big Data

MARCH 28, 2023

This post is designed to be implemented for a real customer use case, where you get full snapshot data on a daily basis. Run the AWS Glue job: Confirm that you see the employee dataset in the path s3://scd-blog-landing/dataset/employee/. You can download the dataset and open it in a code editor such as VS Code.

Data Lake, Testing, Snapshot, Big Data

Unleashing the power of Presto: The Uber case study

IBM Big Data Hub

SEPTEMBER 25, 2023

This blog takes you on a journey into the world of Uber’s analytics and the critical role that Presto, the open source SQL query engine, plays in driving their success. They ingest data in snapshots from operational systems. The post Unleashing the power of Presto: The Uber case study appeared first on IBM Blog.

OLAP, Data Lake, Data-driven, Online Analytical Processing

How Amazon GTTS runs large-scale ETL jobs on AWS using Amazon MWAA

AWS Big Data

AUGUST 6, 2024

In 2023, AWS announced the upcoming deprecation of Data Pipeline, one of the core services used by Langley. However, it wasn’t created with multi-tenancy in mind and therefore it didn’t provide the robustness and the appropriate level of isolation to guard each tenant from impacting others on the shared platform.

Cost-Benefit, Snapshot, Metadata, Metrics

Empower Your Cyber Defenders with Real-Time Analytics

Cloudera

NOVEMBER 15, 2024

In fact, according to the Identity Theft Resource Center (ITRC) Annual Data Breach Report, there were 2,365 cyber attacks in 2023 with more than 300 million victims, and a 72% increase in data breaches since 2021. The post Empower Your Cyber Defenders with Real-Time Analytics appeared first on Cloudera Blog.

Analytics, Metadata, Snapshot, Data-driven

Unlock insights on Amazon RDS for MySQL data with zero-ETL integration to Amazon Redshift

AWS Big Data

MARCH 21, 2024

Amazon Relational Database Service (Amazon RDS) for MySQL zero-ETL integration with Amazon Redshift was announced in preview at AWS re:Invent 2023 for Amazon RDS for MySQL version 8.0.28 The following is an example command: aws s3 cp 's3://redshift-blogs/zero-etl-integration/data/tickit'.

Data Warehouse, Metrics, Statistics, Optimization

IBM’s enduring commitment to environmental leadership

IBM Big Data Hub

APRIL 11, 2023

Here is a snapshot of some current results: We continued making progress towards our goal of net-zero operational greenhouse gas (GHG) emissions by 2030, underscored by energy conservation; use of renewable energy; and GHG emissions reduction. Also through year-end 2021, we reduced operational GHG emissions by 61.6%

Snapshot, Reporting, Business Objectives, Software

Power your cybersecurity strategy with an integrated data security framework

Laminar Security

NOVEMBER 9, 2023

Malicious actors came out swinging at the start of 2023, and they aren’t slowing down any time soon. Efficiently identifying the most recent clean snapshot (the point just before the malware intrusion and data compromise). Data breaches increased by 156% between Q1 and Q2 alone. Take MGM Resorts and Caesars Entertainment as examples.

Strategy, Risk, Testing, Recreation/Entertainment

Laminar Scales Enterprise Data Security Platform With New Management Features

Laminar Security

APRIL 18, 2023

According to Laminar research, more than 75% of organizations experienced a cloud data breach in 2023, which speaks for itself. Yet, managing this diverse environment creates challenges for the security, privacy and governance teams charged with protecting data. Unfortunately, the evidence shows we’re not doing a good job!

Enterprise, Management, Dashboards, Snapshot

Empower Your Cyber Defenders with Real-Time Analytics Author: Carolyn Duby, Field CTO

Cloudera

NOVEMBER 15, 2024

In fact, according to the Identity Theft Resource Center (ITRC) Annual Data Breach Report, there were 2,365 cyber attacks in 2023 with more than 300 million victims, and a 72% increase in data breaches since 2021. Today, cyber defenders face an unprecedented set of challenges as they work to secure and protect their organizations.

Analytics, Metadata, Snapshot, Data-driven

Accelerate queries on Apache Iceberg tables through AWS Glue auto compaction

AWS Big Data

DECEMBER 19, 2024

Although this provides immediate consistency and simplifies reads (because readers only access the latest snapshot of the data), it can become costly and slow for write-heavy workloads due to the need for frequent rewrites. xlarge using Amazon Linux 2023 running on one of those private subnets where you will launch the data simulator.

Data Lake, IoT, Metadata, Testing

Talk to Your Graph Client for GraphDB

Ontotext

JANUARY 16, 2025

The first version of Talk to Your Graph (or TTYG for short) was released in 2023 and it was my baby. Introduction: Since I became the product manager of GraphDB, I was expected to stop writing code but I couldn’t help it. It’s certainly unorthodox but I strongly believe this makes the product better.

Metadata, Modeling, Snapshot, Interactive

A Summary Of Gartner’s Recent Innovation Insight Into Data Observability

DataKitchen

AUGUST 8, 2023

On 20 July 2023, Gartner released the article “Innovation Insight: Data Observability Enables Proactive Data Quality” by Melody Chien. Data Lineage, a form of static analysis, is like a snapshot or a historical record describing data assets at a specific time.

Data Quality, Testing, Snapshot, Reporting
