This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue. To start the job, choose Run. (The excerpt ends with a truncated Spark catalog configuration fragment; a hedged reconstruction is sketched below.)
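The stray `format(dbname)).config("spark.sql.catalog.glue_catalog.catalog-impl",` fragment looks like the tail of a SparkSession builder chain. A minimal sketch of what that configuration typically looks like for Iceberg on the AWS Glue Data Catalog, with a hypothetical database name and S3 bucket rather than the post's actual values:

```python
# Minimal sketch: SparkSession configured for Apache Iceberg backed by the
# AWS Glue Data Catalog. Bucket and database names are placeholders.
from pyspark.sql import SparkSession

dbname = "sqlserver_replica"  # hypothetical target database

spark = (
    SparkSession.builder.appName("sqlserver-to-iceberg")
    .config("spark.sql.warehouse.dir", "s3://my-bucket/{}/".format(dbname))
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-bucket/warehouse/")
    .config("spark.sql.catalog.glue_catalog.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)
```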
Data & Analytics is delivering on its promise. Every day, it helps countless organizations do everything from measure their ESG impact to create new streams of revenue, and consequently, companies without strong data cultures or concrete plans to build one are feeling the pressure. We discourage that thinking.
Amazon DataZone now supports authentication through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.
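For programmatic access outside of BI tools, the same subscribed assets can be queried from Python. The sketch below uses PyAthena, a Python driver swapped in here for the JDBC driver the post describes; the staging bucket, Region, and table name are hypothetical:

```python
# Hypothetical query against a subscribed data lake asset through Amazon
# Athena, using PyAthena in place of the JDBC driver mentioned above.
from pyathena import connect

cursor = connect(
    s3_staging_dir="s3://my-athena-results/",  # placeholder results bucket
    region_name="us-east-1",
).cursor()

cursor.execute("SELECT * FROM subscribed_db.orders LIMIT 10")
for row in cursor.fetchall():
    print(row)
```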
A modern data strategy redefines and enables sharing data across the enterprise, and allows for both reading and writing of a singular instance of the data using an open table format. Why Cloudinary chose Apache Iceberg: Apache Iceberg is a high-performance table format for huge analytic workloads.
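To make "open table format" concrete, here is a minimal sketch of creating and querying an Iceberg table from Spark SQL; the catalog, schema, and column names are hypothetical placeholders, and the session is assumed to be configured as in the earlier sketch:

```python
# Hypothetical Iceberg table: ACID writes and consistent snapshot reads
# over files in object storage.
spark.sql("""
    CREATE TABLE IF NOT EXISTS glue_catalog.assets.photos (
        id BIGINT,
        uploaded_at TIMESTAMP,
        size_bytes BIGINT
    )
    USING iceberg
    PARTITIONED BY (days(uploaded_at))
""")

spark.sql("INSERT INTO glue_catalog.assets.photos VALUES (1, current_timestamp(), 1024)")
spark.sql("SELECT count(*) FROM glue_catalog.assets.photos").show()
```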
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
"Data lake" is a newer IT term created for a new category of data store. But just what is a data lake? According to IBM, "a data lake is a storage repository that holds an enormous amount of raw or refined data in native format until it is accessed." That makes sense. I think the […].
You're building an enterprise data platform for the first time in Sevita's history. Our legacy architecture consisted of multiple standalone, on-prem data marts intended to integrate transactional data from roughly 30 electronic health record systems to deliver a reporting capability. What's driving this investment?
With over 10 PB of data across 1,500 data assets, 1,000 data use cases, and more than 9,000 users, the BMW CDH has become a resounding success since BMW decided to build it in a strategic collaboration with Amazon Web Services (AWS) in 2020. This led to inefficiencies in data governance and access control.
A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.
The company recently migrated to Cloudera Data Platform (CDP) and CDP Machine Learning to power a number of solutions that have increased operational efficiency, enabled new revenue streams, and improved risk management. OCBC also won a Cloudera Data Impact Award 2022 in the Transformation category for the project.
Data is growing at a phenomenal rate and that's not going to stop anytime soon. AI and ML are the only ways to derive value from massive data lakes, cloud-native data warehouses, and other huge stores of information. Once your data is prepared for analysis, the next question is: how else can AI help you?
Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on premises, and from third-party sources. You can now view your project's subscribed data directly within Tableau and build dashboards.
Imperva harnesses data to improve their business outcomes. Events and many other security data types are stored in Imperva's Threat Research multi-Region data lake. As part of their solution, they are using Amazon QuickSight to unlock insights from their data.
Previously, Walgreens was attempting to perform that task with its data lake but faced two significant obstacles: cost and time. Those challenges are well-known to many organizations as they have sought to obtain analytical knowledge from their vast amounts of data. Lakehouses redeem the failures of some data lakes.
Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.
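Record-level handling is where table formats like Iceberg earn their keep: CDC events can be applied as one atomic upsert instead of rewriting whole partitions. A hedged sketch, assuming a hypothetical `orders` table and a `changes_df` DataFrame of captured changes with an `op` column ('I'/'U'/'D'):

```python
# Hypothetical CDC merge: apply inserts, updates, and deletes captured from
# an upstream relational database to an Iceberg table on Amazon S3.
changes_df.createOrReplaceTempView("changes")

spark.sql("""
    MERGE INTO glue_catalog.sales.orders AS t
    USING changes AS c
    ON t.order_id = c.order_id
    WHEN MATCHED AND c.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED AND c.op <> 'D' THEN INSERT *
""")
```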
Open table formats are emerging in the rapidly evolving domain of big data management, fundamentally altering the landscape of data storage and analysis. By providing a standardized framework for data representation, open table formats break down data silos, enhance data quality, and accelerate analytics at scale.
This is because the majority of IT departments find it near impossible to just ‘ramp up’ data use, and even more difficult to do so at scale. Data Champions find the common ground that successfully meets the requirements of both business AND IT. How do you balance the business and IT needs around data access in your organization?
Data quality is no longer a back-office concern. Even the most sophisticated models and platforms can be undone by a single point of failure: poor data quality. As a leader, your commitment to data quality sets the tone for the entire organization, inspiring others to prioritize this crucial aspect of digital transformation.
A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of cross-functional governance structure for customer data. You need to process this to make it ready for analysis.
Delta Lake UniForm is an open table format extension designed to provide a universal data representation that can be efficiently read by different processing engines. This interoperability is crucial for enabling seamless data access, reducing data silos, and fostering a more flexible and efficient data ecosystem.
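As a concrete illustration, UniForm is switched on per table through Delta table properties, so Iceberg-compatible metadata is generated alongside the Delta log. A minimal sketch with hypothetical table and column names:

```python
# Hypothetical Delta table with UniForm enabled so Iceberg readers can
# consume it; property names follow the Delta Lake documentation.
spark.sql("""
    CREATE TABLE sales.events (
        event_id BIGINT,
        ts TIMESTAMP
    )
    USING DELTA
    TBLPROPERTIES (
        'delta.enableIcebergCompatV2' = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
```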
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.
In this post, we walk through a high-level architecture and a specific use case that demonstrates how you can continue to scale your organization’s data platform without needing to spend large amounts of development time to address data privacy concerns. The data will be consumed by downstream analytical processes.
In today’s rapidly evolving financial landscape, data is the bedrock of innovation, enhancing customer and employee experiences and securing a competitive edge. Like many large financial institutions, ANZ Institutional Division operated with siloed data practices and centralized data management teams.
IT leaders take note: At your likely current trajectory, your organization is the Titanic and its data is the iceberg. To avoid the inevitable, CIOs must get serious about data management. Data, of course, has been all the rage the past decade, having been declared the “new oil” of the digital economy.
Every enterprise is trying to collect and analyze data to get better insights into their business. Whether it is consuming log files, sensor metrics, or other unstructured data, most enterprises manage and deliver data to the data lake and leverage various applications like ETL tools, search engines, and databases for analysis.
Supporting data access to achieve data-driven innovation: due to the spread of COVID-19, demand for digital services has increased at SoftBank. Cloudera Data Platform (CDP) will enable SoftBank to flexibly scale resources as needed to meet business needs.
For decades organizations chased the Holy Grail of a centralized data warehouse/lake strategy to support business intelligence and advanced analytics. Thinking about that intelligence as having millions of loosely connected decision points at the edge requires a different strategy, and you can't micromanage it.
Some of the important considerations for Zero-Copy data sharing include: data sharing is supported for all provisioned RA3 instance types (ra3.16xlarge, ra3.4xlarge, and ra3.xlplus); for cross-account and cross-Region data sharing, both the producer and consumer clusters and serverless namespaces must be encrypted.
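On the producer side, a datashare is created and granted with plain SQL. A hedged sketch using the open-source redshift_connector driver; the cluster endpoint, credentials, schema, and consumer account ID are all placeholders:

```python
# Hypothetical producer-side setup for Amazon Redshift data sharing.
import redshift_connector

conn = redshift_connector.connect(
    host="producer-cluster.example.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="...",  # placeholder
)
conn.autocommit = True
cur = conn.cursor()

cur.execute("CREATE DATASHARE sales_share")
cur.execute("ALTER DATASHARE sales_share ADD SCHEMA public")
cur.execute("ALTER DATASHARE sales_share ADD ALL TABLES IN SCHEMA public")
# Cross-account sharing requires both sides to be encrypted (see above).
cur.execute("GRANT USAGE ON DATASHARE sales_share TO ACCOUNT '123456789012'")
```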
For Grendele, the 100% cloud data platform is the foundation of the digital transformation program: "It guarantees that we can use data with the frequency and refresh speed we need, unlike what would happen with a data warehouse," the IT Director emphasizes.
Data governance is the collection of policies, processes, and systems that organizations use to ensure the quality and appropriate handling of their data throughout its lifecycle for the purpose of generating business value. In November 2022, Lake Formation introduced version 3 of its cross-account sharing feature.
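Cross-account sharing in Lake Formation comes down to granting permissions on catalog resources to another account. A rough sketch via boto3, with hypothetical account IDs, database, and table names:

```python
# Hypothetical cross-account grant with AWS Lake Formation via boto3.
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "123456789012"},  # consumer account
    Resource={
        "Table": {
            "CatalogId": "111122223333",  # producer (owner) account
            "DatabaseName": "sales",
            "Name": "orders",
        }
    },
    Permissions=["SELECT"],
    PermissionsWithGrantOption=["SELECT"],
)
```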
Digitalization is on the agenda of almost every company, and data is the foundation of digitalization. Data management is unfortunately considered to be a thankless task. The problem is that data is abstract and therefore difficult for non-experts to understand. Why is it so difficult to create added value from data?
In the ever-evolving world of finance and lending, the need for real-time, reliable, and centralized data has become paramount. Bluestone , a leading financial institution, embarked on a transformative journey to modernize its data infrastructure and transition to a data-driven organization.
For those in the data world, this post provides a curated guide to all analytics sessions that you can use to quickly schedule and build your itinerary. 11:30 AM – 12:30 PM (PDT), Caesars Forum: ANT318 | Accelerate innovation with end-to-end serverless data architecture. Book your spot early for the sessions you do not want to miss.
Despite the worldwide chaos, UAE national airline Etihad has managed to generate productivity gains and cost savings from insights using data science. A change was needed. Etihad began its data science journey with the Cloudera Data Platform and moved its data to the cloud to set up a data lake.
The company's orthodontics business, for instance, makes heavy use of image processing, to the point that unstructured data is growing at a pace of roughly 20% to 25% per month. For example, imaging data can be used to show patients how an aligner will change their appearance over time. The offensive side?
This unified view helps your sales, service, and marketing teams build personalized customer experiences, invoke data-driven actions and workflows, and safely drive AI across all Salesforce applications. Instead, you simply connect and use the data in place, unlocking its value immediately with on-demand access to the most recent data.
This approach comes with a heavy computational cost in terms of processing and distributing the data across multiple tables while ensuring the system is ACID-compliant at all times; this is called index overloading, and it can negatively impact performance and scalability. These types of queries are better suited to a data warehouse.
He had been trying to gather new data insights but was frustrated at how long it was taking. (Sound familiar?) Data is a key component when it comes to making accurate and timely recommendations and decisions in real time, particularly when organizations try to implement real-time artificial intelligence. It isn't easy.
A data and analytics capability cannot emerge from an IT or business strategy alone; that strategy is doomed to fail. With both technology and business organizations deeply involved in the what, why, and how of data, companies need to create cross-functional data teams to get the most out of it. What are the layers?
At a time when AI is exploding in popularity and finding its way into nearly every facet of business operations, data has arguably never been more valuable. In fact, two thirds of respondents agreed that data lakehouses were crucial to reducing pipeline complexity.
Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).
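"Treating" objects for data privacy usually means masking or redacting PII before it lands in the lake. A simple, hypothetical sketch; the regexes are illustrative only, and production systems would use a managed service or a vetted library instead:

```python
import re

# Illustrative patterns only: real PII detection is harder than two regexes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_pii(text: str) -> str:
    """Replace email addresses and US SSNs with fixed tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)

print(redact_pii("Contact jane.doe@example.com, SSN 123-45-6789."))
# -> Contact [EMAIL], SSN [SSN].
```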
Artificial intelligence (AI) is now at the forefront of how enterprises work with data to help reinvent operations, improve customer experiences, and maintain a competitive advantage. It's no longer a nice-to-have, but an integral part of a successful data strategy.
There were thousands of attendees at the event – lining up for book signings and meetings with recruiters to fill the endless job openings for developers experienced with MapReduce and managing Big Data. This was the gold rush of the 21st century, except the gold was data. That is the key to our open data lakehouse architecture.
Inability to get player-level data from the operators. It does not make sense for most casino suppliers to opt for integrated data solutions like data warehouses or data lakes, which are expensive to build and maintain. They do not have a single view of their data, which affects them. The Data Strategy.