2022, Data Lake and Optimization

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Statistics Optimization

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. In early 2022, AWS announced general availability of Athena ACID transactions, powered by Apache Iceberg. ... and later supports the Apache Iceberg framework for data lakes.

Data Lake

Data Lake Data Processing Metadata Snapshot

Steps taken to build Sevita’s first enterprise data platform

CIO Business Intelligence

OCTOBER 23, 2024

Here, CIO Patrick Piccininno provides a roadmap of his journey from data with no integration to meaningful dashboards, insights, and a data literate culture. You’re building an enterprise data platform for the first time in Sevita’s history. Once they were identified, we had to determine we had the right data.

Enterprise

Enterprise Dashboards KPI Data Lake

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

AWS Big Data

APRIL 27, 2023

Amazon Athena supports the MERGE command on Apache Iceberg tables, which allows you to perform inserts, updates, and deletes in your data lake at scale using familiar SQL statements that are compliant with ACID (Atomic, Consistent, Isolated, Durable).

Data Lake

Data Lake Snapshot Optimization Data Transformation

What I Learned At Gartner Data & Analytics 2022

Timo Elliott

MAY 27, 2022

But Gartner is calling for something more sophisticated — for example, what they call Decision Intelligence, where you go beyond just providing information, and actually help reengineer and optimize decision processes. They say you need data artists that create great questions to complement the data scientists that find great answers.

Data Analytics

Data Analytics Analytics Recreation/Entertainment Data Lake

Simplify data ingestion from Amazon S3 to Amazon Redshift using auto-copy

AWS Big Data

OCTOBER 30, 2024

The following are the recommended best practices when working with files using the auto-copy job: Use unique file names for each file in an auto-copy job (for example, 2022-10-15-batch-1.csv). He specializes in migrating enterprise data warehouses to AWS Modern Data Architecture. Do not overwrite existing files.

Data Warehouse

Data Warehouse Sales Data Lake Recreation/Entertainment

Leadership in 2022: Focus on Empathy

Cloudera

FEBRUARY 18, 2022

Empathy stands out as a core skill that must be alive and nurtured within our teams if we are to achieve our desired outcomes in 2022 and beyond. For example, data is helping both Cloudera and our customers to create better, healthier, and more open relationships with employees. At Cloudera we operate according to core values.

Uncertainty

Uncertainty Data Lake Dashboards Optimization

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

AWS Big Data

JUNE 23, 2023

Events and many other security data types are stored in Imperva’s Threat Research Multi-Region data lake. Imperva harnesses data to improve their business outcomes. As part of their solution, they are using Amazon QuickSight to unlock insights from their data.

Data Lake

Data Lake Dashboards Cost-Benefit Data Warehouse

Capital One Offers Cost Controls for Cloud Data Warehouses

David Menninger's Analyst Perspectives

NOVEMBER 7, 2024

The adoption of cloud environments for analytic workloads has been a key feature of the data platforms sector in recent years. For two-thirds (66%) of participants in ISG’s Data Lake Dynamic Insights Research, the primary data platform used for analytics is cloud based.

Data Warehouse

Data Warehouse Cost-Benefit Data Lake Software

The rise of the data lakehouse: A new era of data value

CIO Business Intelligence

AUGUST 18, 2022

Previously, Walgreens was attempting to perform that task with its data lake but faced two significant obstacles: cost and time. Those challenges are well-known to many organizations as they have sought to obtain analytical knowledge from their vast amounts of data. Lakehouses redeem the failures of some data lakes.

Data Lake

Data Lake Data Warehouse Unstructured Data Business Intelligence

Cloudera Named a Leader in the 2022 Gartner® Magic Quadrant™ for Cloud Database Management Systems (DBMS)

Cloudera

DECEMBER 16, 2022

We are pleased to announce that Cloudera has been named a Leader in the 2022 Gartner® Magic Quadrant for Cloud Database Management Systems. Notably, these same services simplify repatriating data workloads back to private clouds, to save on cloud infrastructure expenses. 2-A truly open data lakehouse.

Management

Management Metadata Machine Learning Data Lake

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.

Data Lake

Data Lake Data Governance Machine Learning Cost-Benefit

Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg

AWS Big Data

JULY 3, 2023

Backtesting is a process used in quantitative finance to evaluate trading strategies using historical data. This helps traders determine the potential profitability of a strategy and identify any risks associated with it, enabling them to optimize it for better performance.

Snapshot

Snapshot Data Lake Testing Strategy

AWS Lake Formation 2023 year in review

AWS Big Data

JANUARY 18, 2024

AWS Lake Formation and the AWS Glue Data Catalog form an integral part of a data governance solution for data lakes built on Amazon Simple Storage Service (Amazon S3) with multiple AWS analytics services integrating with them. In 2022, we talked about the enhancements we had done to these services.

Data Lake

Data Lake Metadata Data Governance Statistics

Chipotle’s recipe for digital transformation: Cloud plus AI

CIO Business Intelligence

OCTOBER 21, 2022

Chipotle’s digital business in 2022 was $3.5 Chipotle IT’s secret sauce Garner credits Chipotle’s wholly owned business model for enabling him to deploy advanced technologies such as the cloud, analytics, data lake, and AI uniformly to all restaurants because they are all based on the same digital backbone.

Digital Transformation

Digital Transformation Data Lake Forecasting Technology

The Future of the Data Lakehouse – Open

CIO Business Intelligence

JUNE 23, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.

Data Lake

Data Lake Data Warehouse Machine Learning Data-driven

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

AWS Big Data

APRIL 25, 2024

In the era of data, organizations are increasingly using data lakes to store and analyze vast amounts of structured and unstructured data. Data lakes provide a centralized repository for data from various sources, enabling organizations to unlock valuable insights and drive data-driven decision-making.

Optimization

Optimization Data Lake Cost-Benefit Reporting

Run Spark SQL on Amazon Athena Spark

AWS Big Data

OCTOBER 23, 2023

At AWS re:Invent 2022, Amazon Athena launched support for Apache Spark. Before you run these workloads, most customers run SQL queries to interactively extract, filter, join, and aggregate data into a shape that can be used for decision-making, model training, or inference. An Athena Spark workgroup configured for use.

Data Lake

Data Lake Visualization Optimization Interactive

Migrate Amazon Redshift from DC2 to RA3 to accommodate increasing data volumes and analytics demands

AWS Big Data

AUGUST 9, 2024

These processes retrieve data from around 90 different data sources, resulting in updating roughly 2,000 tables in the data warehouse and 3,000 external tables in Parquet format, accessed through Amazon Redshift Spectrum and a data lake on Amazon Simple Storage Service (Amazon S3). We started with 115 dc2.large

Data Lake

Data Lake Analytics Data Warehouse Data-driven

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.

Data Lake

Data Lake Data Warehouse Machine Learning Data-driven

P&G turns to AI to create digital manufacturing of the future

CIO Business Intelligence

OCTOBER 1, 2022

In summer 2022, P&G sealed a multiyear partnership with Microsoft to transform P&G’s digital manufacturing platform. Cretella says P&G will make manufacturing smarter by enabling scalable predictive quality, predictive maintenance, controlled release, touchless operations, and manufacturing sustainability optimization.

Manufacturing

Manufacturing Digital Transformation IoT Internet of Things

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To create and manage the data products, smava uses Amazon Redshift, a cloud data warehouse. In this post, we show how smava optimized their data platform by using Amazon Redshift Serverless and Amazon Redshift data sharing to overcome right-sizing challenges for unpredictable workloads and further improve price-performance.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Real estate CIOs drive deals with data

CIO Business Intelligence

JULY 26, 2023

“The only thing we have on premise, I believe, is a data server with a bunch of unstructured data on it for our legal team,” says Grady Ligon, who was named Re/Max’s first CIO in October 2022. billion in 2022, resource industries $82.1 billion in 2022, and personal and consumer services at $82.6 billion in 2022.

Data Lake

Data Lake Digital Transformation Machine Learning Data Architecture

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architect Armando Vázquez identifies eight common types of data architects: Enterprise data architect: These data architects oversee an organization’s overall data architecture, defining data architecture strategy and designing and implementing architectures. Are data architects in demand?

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

How AWS helped Altron Group accelerate their vision for optimized customer engagement

AWS Big Data

JULY 13, 2023

Data-Driven Everything engagement Altron has provided information technology services since 1965 across South Africa, the Middle East, and Australia. Foundations for a data lake with data governance controls and data quality checks. A set of QuickSight dashboards to be consumed via browser and mobile.

Optimization

Optimization B2B Data Quality Sales

Make SASE your cybersecurity armor – but don’t go it alone

CIO Business Intelligence

SEPTEMBER 7, 2023

Nearly 95% of organizations say hybrid work has led them to invest more in data protection and security, according to NTT’s 2022–23 Global Network Report. You can use AI and machine learning across security, networking and user experience management, all in the same data lake. The solution?

IT

IT Data Lake Cost-Benefit Digital Transformation

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

We have seen a strong customer demand to expand its scope to cloud-based data lakes because data lakes are increasingly the enterprise solution for large-scale data initiatives due to their power and capabilities. The team uses dbt-glue to build a transformed gold model optimized for business intelligence (BI).

Data Lake

Data Lake Management Metrics Data Warehouse

Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB

AWS Big Data

MAY 30, 2023

Customers have been using data warehousing solutions to perform their traditional analytics tasks. Recently, data lakes have gained a lot of traction to become the foundation for analytical solutions, because they come with benefits such as scalability, fault tolerance, and support for structured, semi-structured, and unstructured datasets.

Data Lake

Data Lake Data Analytics Analytics Data Processing

Achieve your AI goals with an open data lakehouse approach

IBM Big Data Hub

OCTOBER 4, 2023

Why does AI need an open data lakehouse architecture? from 2022 to 2026. Another IDC study showed that while 2/3 of respondents reported using AI-driven data analytics, most reported that less than half of the data under management is available for this type of analytics.

Data Lake

Data Lake Metadata Data Warehouse Cost-Benefit

5 ways to maximize your cloud investment

CIO Business Intelligence

JANUARY 10, 2024

Optimizing cloud investments requires close collaboration with the rest of the business to understand current and future needs, building effective FinOps teams, partnering with providers, and ongoing monitoring of key performance metrics. “You worry you don’t have enough capacity, so you overprovision,” he says.

Cost-Benefit

Cost-Benefit Measurement Optimization Metrics

How Fujitsu implemented a global data mesh architecture and democratized data

AWS Big Data

MAY 1, 2024

To provide a variety of products, services, and solutions that are better suited to customers and society in each region, we have built business processes and systems that are optimized for each region and its market. The platform consists of approximately 370 dashboards, 360 tables registered in the data catalog, and 40 linked systems.

Dashboards

Dashboards Publishing Data-driven Cost-Benefit

Build an Amazon Redshift data warehouse using an Amazon DynamoDB single-table design

AWS Big Data

JUNE 21, 2023

Built on highly curated structured data, it provides the flexibility and speed to run aggregations across an entire dataset to derive insights. To house our data, we need to define a data model. An optimal design choice is to use a dimensional model. This is achieved by partitioning the data.

Data Warehouse

Data Warehouse Data Lake OLAP Cost-Benefit

Does Cost Reduction Play a Role in Digital Transformation?

Cloudera

OCTOBER 6, 2022

Gartner: “Digital transformation can refer to anything from IT modernization (for example, cloud computing), to digital optimization, to the invention of new digital business models.” For example, we have some customers using their data platform originally established for compliance initiatives to drive new use cases.

Digital Transformation

Digital Transformation Cost-Benefit Data Lake Machine Learning

OCBC Bank Accelerates Its Data Strategy with Cloudera

Cloudera

DECEMBER 14, 2022

OCBC Bank optimizes customer experience & risk management with multi-phased data initiative. The company recently migrated to Cloudera Data Platform (CDP) and CDP Machine Learning to power a number of solutions that have increased operational efficiency, enabled new revenue streams and improved risk management.

Data Strategy

Data Strategy Strategy IT Contextual Data

Why Business Intelligence is Top of Mind for CFOs for 2022

Jet Global

DECEMBER 3, 2021

It is able to draw from a broader array of data stores, including traditional relational databases, robust data warehouses, and cloud-based data lakes. Discover Meaning Amid All That Data. This will ensure that you have the information you need to optimize your marketing spend. Why business intelligence?

Business Intelligence

Business Intelligence OLAP Sales Data Warehouse

Better, faster decisions: Why businesses thrive on real-time data

CIO Business Intelligence

SEPTEMBER 8, 2022

Most organizations understand the profound impact that data is having on modern business. In Foundry’s 2022 Data & Analytics Study, 88% of IT decision-makers agree that data collection and analysis have the potential to fundamentally change their business models over the next three years.

Cost-Benefit

Cost-Benefit Internet of Things Data-driven Data Lake

CIOs press ahead for gen AI edge — despite misgivings

CIO Business Intelligence

OCTOBER 18, 2023

OpenAI’s November 2022 announcement of ChatGPT and its subsequent $10 billion in funding from Microsoft were the “shots heard ’round the world” when it comes to the promise of generative AI. in concert with Microsoft’s AI-optimized Azure platform. John Spottiswood, COO of Jerry, a Palo Alto, Calif.-based

Risk

Risk Manufacturing Enterprise Technology

Accelerate HiveQL with Oozie to Spark SQL migration on Amazon EMR

AWS Big Data

APRIL 19, 2023

Customers now want to migrate their Apache Hive workloads to Apache Spark in the cloud to get the benefits of optimized runtime, cost reduction through transient clusters, better scalability by decoupling the storage and compute, and flexibility. He is passionate about big data and data analytics.

Metadata

Metadata Data Lake Testing Consulting

The Modern Data Lakehouse: An Architectural Innovation

Cloudera

SEPTEMBER 9, 2022

This is the promise of the modern data lakehouse architecture. analyst Sumit Pal, in “Exploring Lakehouse Architecture and Use Cases,” published January 11, 2022: “Data lakehouses integrate and unify the capabilities of data warehouses and data lakes, aiming to support AI, BI, ML, and data engineering on a single platform.”

Metadata

Metadata Machine Learning Unstructured Data Data Lake

Turning Streams Into Data Products

Cloudera

JUNE 16, 2022

Organizations are increasingly building low-latency, data-driven applications, automations, and intelligence from real-time data streams. Cloudera Stream Processing (CSP) enables customers to turn streams into data products by providing capabilities to analyze streaming data for complex patterns and gain actionable intel.

Data Lake

Data Lake Manufacturing Metadata Dashboards

Wonderla Holidays goes digital to enhance business and customer fun

CIO Business Intelligence

OCTOBER 18, 2022

One pulse sends 150 bytes of data. So, each band can send out 500KB to 750KB of data. To handle the huge volume of data thus generated, the company is in the process of deploying a data lake, data warehouse, and real-time analytical tools in a hybrid model. Digital Transformation, RFID

Data Lake

Data Lake Data Warehouse Cost-Benefit Digital Transformation

Get maximum value out of your cloud data warehouse with Amazon Redshift

AWS Big Data

APRIL 19, 2023

Every day, customers are challenged with how to manage their growing data volumes and operational costs to unlock the value of data for timely insights and innovation, while maintaining consistent performance. As data workloads grow, costs to scale and manage data usage with the right governance typically increase as well.

Data Warehouse

Data Warehouse Data Lake Unstructured Data Optimization

Get started with Amazon DynamoDB zero-ETL integration with Amazon Redshift

AWS Big Data

OCTOBER 17, 2024

You can then run enhanced analysis on this DynamoDB data with the rich capabilities of Amazon Redshift, such as high-performance SQL, built-in machine learning (ML) and Spark integrations, materialized views (MV) with automatic and incremental refresh, data sharing, and the ability to join data across multiple data stores and data lakes.

Metrics

Metrics Dashboards Data Warehouse Statistics

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Building data lakes from continuously changing transactional data of databases and keeping data lakes up to date is a complex task and can be an operational challenge. You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes.

Data Lake

Data Lake Dashboards Metrics Metadata
