These include architectural optimizations that reduce memory usage and query times, along with more efficient batch processing to deliver better throughput, faster bulk writes, and accelerated concurrent writes during data replication. The release also extends MongoDB's Queryable Encryption capability, which was introduced in 2023.
Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open-format files in an Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries.
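As a rough illustration of that pattern, here is a minimal sketch that runs plain SQL against an S3-backed external table through the boto3 Redshift Data API; the workgroup, database, schema, and table names are all hypothetical:

```python
import boto3

# Hypothetical workgroup/database/schema/table names throughout.
client = boto3.client("redshift-data", region_name="us-east-1")

# Query an external (S3-backed) table with plain SQL; no data is loaded
# into Redshift tables first.
response = client.execute_statement(
    WorkgroupName="my-serverless-workgroup",
    Database="dev",
    Sql="""
        SELECT event_date, COUNT(*) AS events
        FROM spectrum_schema.clickstream_events  -- external table over S3 files
        GROUP BY event_date
        ORDER BY event_date;
    """,
)
# Poll describe_statement, then fetch rows with get_statement_result.
print(response["Id"])
```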
AWS Lake Formation and the AWS Glue Data Catalog form an integral part of a data governance solution for data lakes built on Amazon Simple Storage Service (Amazon S3), with multiple AWS analytics services integrating with them. In 2023, we released several updates to AWS Glue crawlers.
Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. AWS Glue 3.0 and later supports the Apache Iceberg framework for data lakes. The following diagram illustrates the solution architecture.
Iceberg offers distinct advantages through its metadata layer over Parquet, such as improved data management, performance optimization, and integration with various query engines. Unlike direct Amazon S3 access, Iceberg supports these operations on petabyte-scale data lakes without requiring complex custom code.
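A minimal PySpark sketch of what that metadata layer enables, assuming the Iceberg runtime jar is on the classpath and using hypothetical catalog, bucket, and table names:

```python
from pyspark.sql import SparkSession

# Register an Iceberg catalog backed by the AWS Glue Data Catalog.
# Catalog name, warehouse path, and table names are assumptions.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.warehouse", "s3://my-bucket/warehouse/")
    .getOrCreate()
)

# Current state of the table.
spark.sql("SELECT COUNT(*) FROM glue.db.orders").show()

# Time travel: read the same table as of an earlier point in time,
# resolved entirely from Iceberg snapshot metadata.
spark.sql(
    "SELECT COUNT(*) FROM glue.db.orders TIMESTAMP AS OF '2023-11-01 00:00:00'"
).show()
```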
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
In the current industry landscape, data lakes have become a cornerstone of modern data architecture, serving as repositories for vast amounts of structured and unstructured data. Maintaining data consistency and integrity across distributed data lakes is crucial for decision-making and analytics.
In our previous post, Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes, we discussed how you can implement solutions to improve operational efficiencies of your Amazon Simple Storage Service (Amazon S3) data lake that is using the Apache Iceberg open table format and running on the Amazon EMR big data platform.
We are excited to introduce the 2023 AWS Analytics Superheroes at this year's re:Invent conference! Are you a shapeshifting guardian and protector of data like Data Lynx? 11:30 AM – 12:30 PM (PDT), Caesars Forum, ANT318 | Accelerate innovation with end-to-end serverless data architecture.
When you build your transactional data lake using Apache Iceberg to solve your functional use cases, you need to focus on operational use cases for your S3 data lake to optimize the production environment.
Get a closer look at how scaling for data warehousing works in AWS with the latest introduction of AI-driven scaling and optimizations in Amazon Redshift Serverless to enable better price-performance for your workloads.
These announcements drive forward the AWS Zero-ETL vision to unify all your data, enabling you to better maximize the value of your data with comprehensive analytics and ML capabilities, and innovate faster with secure data collaboration within and across organizations.
Enterprise use of AI tools will only grow, with industries like manufacturing leading the charge. Our research shows that, mirroring the broader AI trend, enterprises across industry verticals sharply increased their use of AI from May 2023 to June 2023, with sustained growth through August 2023.
Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.
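One common way to handle record-level CDC on an Iceberg table is a MERGE from a staged batch. A sketch, assuming a Spark session configured for Iceberg as in the earlier example; the CDC path and the table/column names (order_id, op) are hypothetical:

```python
from pyspark.sql import SparkSession

# Assumes the Iceberg catalog configuration shown earlier is already applied.
spark = SparkSession.builder.getOrCreate()

# Stage the latest CDC batch (e.g., change records landed on S3) as a temp view.
cdc_batch = spark.read.parquet("s3://my-bucket/cdc/orders/")
cdc_batch.createOrReplaceTempView("updates_batch")

# Apply inserts, updates, and deletes at the record level in one MERGE.
spark.sql("""
    MERGE INTO glue.db.orders t
    USING updates_batch s
    ON t.order_id = s.order_id
    WHEN MATCHED AND s.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```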
Save the date: AWS re:Invent 2023 is happening from November 27 to December 1 in Las Vegas, and you cannot miss it. In today’s data-driven landscape, the quality of data is the foundation upon which the success of organizations and innovations stands. High-quality data is not just about accuracy; it’s also about timeliness.
My role was to talk about the trends and opportunities for 2023, for customers, SAP, and our partners. And it’s not just a technology vision — it’s also about how organizations have to rethink how they optimize business processes, business capabilities, and the business ecosystem. Business Process Optimization.
Amazon Q generative SQL for Amazon Redshift was launched in preview during AWS re:Invent 2023. With Amazon Q, you can spend less time worrying about the nuances of SQL syntax and optimizations, allowing you to concentrate your efforts on extracting invaluable business insights from your data.
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.
Earlier this month (November 6 through 8, 2023), a few hundred Apache Flink enthusiasts descended upon the Hyatt Regency Lake Washington near Seattle for the annual Flink Forward conference. This will help accelerate deployment across environments and optimize performance and resource utilization on an ongoing basis.
Your applications can seamlessly read from and write to your Amazon Redshift data warehouse while maintaining optimal performance and transactional consistency. Additionally, you'll benefit from performance improvements through pushdown optimizations, further enhancing the efficiency of your operations.
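A hedged sketch of that read path using the community Spark-Redshift connector (the format string is the commonly published one; the connection settings are assumptions). Passing a query rather than a table name lets filters and projections be pushed down to Redshift:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical endpoint, credentials, staging bucket, and IAM role.
read_config = {
    "url": "jdbc:redshift://my-cluster.example.com:5439/dev?user=admin&password=***",
    "tempdir": "s3://my-bucket/redshift-temp/",
    "aws_iam_role": "arn:aws:iam::123456789012:role/redshift-s3-access",
}

# The WHERE clause and column list run inside Redshift (pushdown),
# so only the needed rows and columns are unloaded and read by Spark.
df = (
    spark.read.format("io.github.spark_redshift_community.spark.redshift")
    .options(**read_config)
    .option("query", "SELECT order_id, amount FROM sales WHERE sale_date >= '2023-01-01'")
    .load()
)
df.show()
```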
AWS-powered data lakes, supported by the unmatched availability of Amazon Simple Storage Service (Amazon S3), can handle the scale, agility, and flexibility required to combine different data and analytics approaches. The output will give a count of the number of data and metadata files deleted.
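Counts of deleted data and metadata files are what Iceberg's snapshot-expiration maintenance reports; a sketch using the expire_snapshots procedure, under the same hypothetical catalog and table names as the earlier examples (the Iceberg SQL extensions must be enabled):

```python
from pyspark.sql import SparkSession

# Assumes the Iceberg catalog and SQL extensions configured earlier.
spark = SparkSession.builder.getOrCreate()

# Expire snapshots older than a cutoff; Iceberg removes the data and
# metadata files no longer reachable from any retained snapshot.
result = spark.sql("""
    CALL glue.system.expire_snapshots(
        table => 'db.orders',
        older_than => TIMESTAMP '2023-11-01 00:00:00'
    )
""")

# The returned row reports how many data and metadata files were deleted.
result.show(truncate=False)
```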
In the era of data, organizations are increasingly using data lakes to store and analyze vast amounts of structured and unstructured data. Data lakes provide a centralized repository for data from various sources, enabling organizations to unlock valuable insights and drive data-driven decision-making.
To optimize business on its re-opening, Wolverine IT built supply chain data models using Microsoft Power BI to prioritize which brands it should manufacture first once factories resumed operation. Wolverine relies on seven data centers, two of which are run by third-party partners.
Data silos are a perennial data management problem for enterprises, with almost three-quarters (73%) of participants in ISG Research’s Data Governance Benchmark Research citing disparate data sources and systems as a data governance challenge.
Use case: A typical workload for AWS Glue for Apache Spark jobs is to load data from a relational database to a data lake with SQL-based transformations. The end benefit for you is more effective and optimized AWS Glue for Apache Spark workloads. The metrics are available in all AWS Glue supported Regions. Check it out!
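A minimal sketch of that workload: extract over JDBC, transform with SQL, and load to the data lake. The JDBC driver availability, connection details, and paths are assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Extract: read the source table from a relational database over JDBC
# (assumes the PostgreSQL driver is on the classpath; settings are hypothetical).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db.example.com:5432/shop")
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Transform: express the business logic as SQL.
orders.createOrReplaceTempView("orders")
daily = spark.sql("""
    SELECT order_date, region, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date, region
""")

# Load: write partitioned, columnar files to the data lake.
daily.write.mode("overwrite").partitionBy("order_date") \
    .parquet("s3://my-bucket/lake/daily_revenue/")
```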
If anything, 2023 has proved to be a year of reckoning for businesses, and IT leaders in particular, as they attempt to come to grips with the disruptive potential of this technology — just as debates over the best path forward for AI have accelerated and regulatory uncertainty has cast a longer shadow over its outlook in the wake of these events.
In the annual Porsche Carrera Cup Brasil, data is essential to keep drivers safe and sustain optimal performance of race cars. Until recently, getting at and analyzing that essential data was a laborious affair that could take hours, and only once the race was over. The process took between 30 minutes and two hours.
With a few taps on a mobile device, riders request a ride; then, Uber's algorithms work to match them with the nearest available driver and calculate the optimal price. Uber's prowess as a transportation, logistics and analytics company hinges on its ability to leverage data effectively. But the simplicity ends there.
By mid-2023, Walldorf-based Gerresheimer had its IT strategy revised, and a central component of this was its cloud journey, for which CIO Zafer Nalbant and his team built a hybrid environment consisting of a public cloud part based on Microsoft Azure, and a private cloud part that runs in a data center completely managed by T-Systems.
Currently, we have approximately 120,000 employees worldwide (as of March 2023), including group companies. To provide a variety of products, services, and solutions that are better suited to customers and society in each region, we have built business processes and systems that are optimized for each region and its market.
Customers often use Apache Spark Web UI, a popular debugging tool that is part of open source Apache Spark, to help fix problems and optimize job performance. Once logs are parsed, you can use the built-in Spark UI to debug, troubleshoot, and optimize your jobs. Now it's time to run the job!
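For the Spark UI or history server to have logs to parse, event logging must be enabled. A sketch of the relevant settings with a hypothetical S3 log path; on AWS Glue, the equivalent is the --enable-spark-ui and --spark-event-logs-path job parameters:

```python
from pyspark.sql import SparkSession

# Write Spark event logs to S3 so the UI/history server can replay them
# (assumes S3 credentials and the hadoop-aws filesystem are available).
spark = (
    SparkSession.builder
    .config("spark.eventLog.enabled", "true")
    .config("spark.eventLog.dir", "s3a://my-bucket/spark-events/")
    .getOrCreate()
)
```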
Tens of thousands of customers use Amazon Redshift to gain business insights from their data. With Amazon Redshift, you can use standard SQL to query data across your data warehouse, operational data stores, and data lake.
Analyzing historical patterns allows you to optimize performance, identify issues proactively, and improve planning. Looking at the Skewness Job per Job visualization, there was a spike on November 1, 2023. We walk through ingesting CloudWatch metrics into QuickSight using a CloudWatch metric stream and QuickSight SPICE.
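A sketch of the metric-stream stage of that pipeline via boto3; the Firehose delivery stream (which would land the metrics in S3 for a QuickSight/SPICE dataset) and the IAM role are assumptions:

```python
import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")

# Continuously stream CloudWatch metrics to a Firehose delivery stream;
# Firehose then writes them to S3, where QuickSight can ingest them into SPICE.
cw.put_metric_stream(
    Name="glue-metrics-to-s3",
    FirehoseArn="arn:aws:firehose:us-east-1:123456789012:deliverystream/metrics-to-s3",
    RoleArn="arn:aws:iam::123456789012:role/metric-stream-role",
    OutputFormat="json",
    IncludeFilters=[{"Namespace": "Glue"}],  # stream only AWS Glue metrics
)
```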
One pulse sends 150 bytes of data. So, each band can send out 500KB to 750KB of data. To handle the huge volume of data thus generated, the company is in the process of deploying a data lake, data warehouse, and real-time analytical tools in a hybrid model. One band delivers multiple business benefits.
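A quick back-of-the-envelope check of those figures:

```python
# At 150 bytes per pulse, 500-750 KB per band works out to roughly
# 3,300-5,000 pulses (using 1 KB = 1,000 bytes).
PULSE_BYTES = 150
for kb in (500, 750):
    pulses = kb * 1000 // PULSE_BYTES
    print(f"{kb} KB ≈ {pulses:,} pulses")
# 500 KB ≈ 3,333 pulses
# 750 KB ≈ 5,000 pulses
```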
Improved Decision Making : Well-modeled data provides insights that drive informed decision-making across various business domains, resulting in enhanced strategic planning. Reduced Data Redundancy : By eliminating data duplication, it optimizes storage and enhances data quality, reducing errors and discrepancies.
Developers who build the real-time experiences that customers love need a tech stack that lets them access data quickly and easily. IDC predicts that by 2023 there will be more than 500 million new cloud native digital apps and services – more than the total created over the past 40 years. Converge data at rest and data in motion.
In November 2023, Broadcom finalized its acquisition of VMware for USD 69 billion, with an aim to enhance its multicloud strategy. IBM offers alternative solution options with optimal scalability, from scale-up to scale-out, enabling VMware transition and helping move to AWS.
A data store built on open lakehouse architecture, it runs both on premises and across multi-cloud environments. Optimized for all data, analytics and AI workloads, watsonx.data combines the flexibility of a data lake with the performance of a data warehouse, helping businesses to scale data analytics and AI anywhere their data resides.
Microsoft itself claims half of Fortune 500 companies use its Copilot tools and the number of daily users doubled in Q4 2023, although without saying how widely they’re deployed in those organizations. Organizations typically start with the most capable model for their workload, then optimize for speed and cost.
Queries containing joins, filters, projections, group-by, or aggregations without group-by can be transparently rewritten by the Hive optimizer to use one or more eligible materialized views. Materialized views can be partitioned on one or more columns. This can potentially lead to orders of magnitude improvement in performance.
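A sketch of that idea in HiveQL, driven here through PyHive; the connection details and tables are hypothetical, and Hive materialized views additionally require transactional (ACID) base tables:

```python
from pyhive import hive  # assumes a reachable HiveServer2

conn = hive.connect(host="hive.example.com", port=10000, database="sales")
cur = conn.cursor()

# A partitioned materialized view over a (transactional) base table.
cur.execute("""
    CREATE MATERIALIZED VIEW mv_daily_revenue
    PARTITIONED ON (order_date) AS
    SELECT region, SUM(amount) AS revenue, order_date
    FROM orders
    GROUP BY region, order_date
""")

# This query never names the view, but the Hive optimizer can transparently
# rewrite it to read mv_daily_revenue instead of scanning orders.
cur.execute("""
    SELECT region, SUM(amount)
    FROM orders
    WHERE order_date = '2023-11-01'
    GROUP BY region
""")
print(cur.fetchall())
```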
You can then run enhanced analysis on this DynamoDB data with the rich capabilities of Amazon Redshift, such as high-performance SQL, built-in machine learning (ML) and Spark integrations, materialized views (MV) with automatic and incremental refresh, data sharing, and the ability to join data across multiple data stores and data lakes.
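A sketch of the analysis side, creating an auto-refreshing materialized view over the replicated DynamoDB data through the Redshift Data API; the workgroup, schema, and table names are assumptions:

```python
import boto3

client = boto3.client("redshift-data", region_name="us-east-1")

# Aggregate the zero-ETL-replicated DynamoDB table with standard SQL and keep
# the result incrementally refreshed; all identifiers here are hypothetical.
client.execute_statement(
    WorkgroupName="my-serverless-workgroup",
    Database="dev",
    Sql="""
        CREATE MATERIALIZED VIEW mv_orders_by_region
        AUTO REFRESH YES AS
        SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
        FROM ddb_zeroetl.orders
        GROUP BY region;
    """,
)
```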
You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores. Hundreds of thousands of customers use data lakes for analytics and ML to make data-driven business decisions.
In fact, according to the Identity Theft Resource Center (ITRC) Annual Data Breach Report , there were 2,365 cyber attacks in 2023 with more than 300 million victims, and a 72% increase in data breaches since 2021. Whether they need to query data from today or from years past, the system scales up or down to meet their needs.
This involves unifying and sharing a single copy of data and metadata across IBM® watsonx.data™, IBM® Db2®, IBM® Db2® Warehouse and IBM® Netezza®, using native integrations and supporting open formats, all without the need for migration or recataloging.