Data Lake and Experimentation - Data Leaders Brief

Load data incrementally from transactional data lakes to data warehouses

AWS Big Data

OCTOBER 19, 2023

Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure.

Data Lake

Data Lake Data Warehouse Visualization Snapshot
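
As a rough illustration of the incremental loading named in the title above, here is a minimal sketch of a high-watermark load. It assumes every lake row carries a monotonically increasing `updated_at` value; the `Row` type, the `incremental_load` helper, and the dictionary standing in for a warehouse table are all hypothetical and are not taken from the linked post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    id: int
    value: str
    updated_at: int  # hypothetical monotonically increasing commit timestamp


def incremental_load(lake_rows, warehouse, watermark):
    """Upsert rows newer than `watermark` into `warehouse`; return the new watermark."""
    new_watermark = watermark
    for row in lake_rows:
        if row.updated_at > watermark:
            warehouse[row.id] = row  # upsert keyed on id
            new_watermark = max(new_watermark, row.updated_at)
    return new_watermark


lake = [Row(1, "a", 10), Row(2, "b", 20)]
warehouse = {}  # stand-in for a warehouse table, keyed on id
wm = incremental_load(lake, warehouse, watermark=0)

# A later run only picks up rows committed after the stored watermark.
lake.append(Row(1, "a2", 30))
wm = incremental_load(lake, warehouse, wm)
```

Persisting the returned watermark between runs is what keeps each load proportional to the changed rows rather than to the full table.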

United Airlines sets its flight plan for gen AI success

CIO Business Intelligence

DECEMBER 20, 2024

With the core architectural backbone of the airline’s gen AI roadmap in place, including United Data Hub and an AI and ML platform dubbed Mars, Birnbaum has released a handful of models into production use for employees and customers alike.

IT

IT Unstructured Data Experimentation Data Lake

MLOps and DevOps: Why Data Makes It Different

O'Reilly on Data

OCTOBER 19, 2021

It has far-reaching implications as to how such applications should be developed and by whom: ML applications are directly exposed to the constantly changing real world through data, whereas traditional software operates in a simplified, static, abstract world which is directly constructed by the developer. This approach is not novel.

IT

IT Testing Experimentation Software

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

MAY 24, 2023

When you build your transactional data lake using Apache Iceberg to solve your functional use cases, you need to focus on operational use cases for your S3 data lake to optimize the production environment. You can use either the AWS Glue Data Catalog (recommended) or a Hive catalog for Iceberg tables.

Data Lake

Data Lake Snapshot Metadata Optimization

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune and dbt

AWS Big Data

DECEMBER 12, 2024

In the context of comprehensive data governance, Amazon DataZone offers organization-wide data lineage visualization using Amazon Web Services (AWS) services, while dbt provides project-level lineage through model analysis and supports cross-project integration between data lakes and warehouses.

Snapshot

Snapshot Recreation/Entertainment Experimentation Data Lake

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

cycle_end"', "sagemakedatalakeenvironment_sub_db", ctas_approach=False) A similar approach is used to connect to shared data from Amazon Redshift, which is also shared using Amazon DataZone. datazone_env_twinsimsilverdata"."cycle_end";') She can be reached via LinkedIn. Siamak Nariman is a Senior Product Manager at AWS.

IoT

IoT Machine Learning Metadata Data-driven

Regeneron turns to IT to accelerate drug discovery

CIO Business Intelligence

NOVEMBER 4, 2022

The company’s multicloud infrastructure has since expanded to include Microsoft Azure for business applications and Google Cloud Platform to provide its scientists with a greater array of options for experimentation. Much of Regeneron’s data, of course, is confidential. “That’s hard to do when you have 30 years of data.”

Data Lake

Data Lake IT Experimentation Data-driven

Accelerate data science feature engineering on transactional data lakes using Amazon Athena with Apache Iceberg

AWS Big Data

JUNE 20, 2023

It manages large collections of files as tables, and it supports modern analytical data lake operations such as record-level insert, update, delete, and time travel queries. About the Authors: Vivek Gautam is a Data Architect with specialization in data lakes at AWS Professional Services.

Data Lake

Data Lake Data Science Recreation/Entertainment Data-driven
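
To make the record-level operations and time travel queries mentioned in the excerpt above more concrete, the following is a toy sketch in plain Python. It is not Apache Iceberg's implementation: Iceberg tracks snapshots through metadata and manifest files, whereas this hypothetical `SnapshotTable` simply copies the whole table state on each commit.

```python
class SnapshotTable:
    """Toy table that keeps an immutable snapshot after every commit."""

    def __init__(self):
        self._snapshots = [{}]  # snapshot 0 is the empty table

    def commit(self, upserts=None, deletes=()):
        """Apply record-level changes and return the new snapshot id."""
        state = dict(self._snapshots[-1])  # copy, so older snapshots stay intact
        state.update(upserts or {})
        for key in deletes:
            state.pop(key, None)
        self._snapshots.append(state)
        return len(self._snapshots) - 1

    def read(self, snapshot_id=None):
        """Read the latest snapshot, or an earlier one for time travel."""
        if snapshot_id is None:
            snapshot_id = len(self._snapshots) - 1
        return dict(self._snapshots[snapshot_id])


table = SnapshotTable()
s1 = table.commit(upserts={"u1": 5, "u2": 7})  # insert two records
s2 = table.commit(upserts={"u1": 6})           # update one record
s3 = table.commit(deletes=["u2"])              # delete one record
```

Calling `table.read(s1)` returns the table as it stood after the first commit, while `table.read()` returns the current state.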

Einstein Studio 1: What it is and what to expect

CIO Business Intelligence

JULY 31, 2024

With this platform, Salesforce seeks to help organizations apply the cleverness of LLMs to the customer data they have squirreled away in Salesforce data lakes in the hopes of selling more. Salesforce is pushing the idea that Einstein 1 is a vehicle for experimentation and iteration. The data is there.

Data Lake

Data Lake IT Sales Experimentation

Your New Cloud for AI May Be Inside a Colo

CIO Business Intelligence

MAY 23, 2022

Many companies whose AI model training infrastructure is not proximal to their data lake incur steeper costs as the data sets grow larger and AI models become more complex. The cloud is great for experimentation when data sets are smaller and model complexity is light.

Experimentation

Experimentation Cost-Benefit Data Lake Data Science

Interview with: Sankar Narayanan, Chief Practice Officer at Fractal Analytics

Corinium

JUNE 6, 2019

Some of the work is very foundational, such as building an enterprise data lake and migrating it to the cloud, which enables other more direct value-added activities such as self-service. It is also important to have a strong test and learn culture to encourage rapid experimentation.

Insurance

Insurance Analytics Forecasting Deep Learning

Lessons from the field: How Generative AI is shaping software development in 2023

CIO Business Intelligence

SEPTEMBER 6, 2023

The use of AI-generated code is still in an experimental phase for many organizations due to numerous uncertainties such as its impact on security, data privacy, copyright, and more. For example, litigation has surfaced against companies for training AI tools using data lakes with thousands of unlicensed works.

Software

Software Risk Experimentation Data Lake

Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg

AWS Big Data

JULY 3, 2023

Terminology: Let’s first discuss some of the terminology used in this post: Research data lake on Amazon S3 – A data lake is a large, centralized repository that allows you to manage all your structured and unstructured data at any scale. This is where the tagging feature in Apache Iceberg comes in handy.

Snapshot

Snapshot Data Lake Testing Strategy

Make Better AI Infrastructure Decisions: Why Hybrid Cloud is a Solid Fit

CIO Business Intelligence

MAY 23, 2022

For many nascent AI projects in the prototyping and experimentation phase, the cloud works just fine. But companies often discover that as data sets grow in volume and AI model complexity increases, the escalating cost of compute cycles, data movement, and storage can spiral out of control.

Cost-Benefit

Cost-Benefit Experimentation Data Lake Deep Learning

P&G turns to AI to create digital manufacturing of the future

CIO Business Intelligence

OCTOBER 1, 2022

“Accessing this level of data, at scale, is rare within the consumer goods industry,” Cretella says. Data and AI as digital fundamentals. It has moved past what Cretella calls the “experimentation phase” with scaled solutions and increasingly sophisticated AI applications.

Manufacturing

Manufacturing Digital Transformation IoT Internet of Things

Top 8 predictive analytics tools compared

CIO Business Intelligence

MAY 12, 2022

A free plan allows experimentation. A generous free tier makes it possible to experiment. Anyone who works in manufacturing knows SAP software. Its databases track our goods at all stages along the supply chain. Basic plans start at $36 per user, per month. More capable plans with more automation and integration are available from the sales team.

Predictive Analytics

Predictive Analytics Analytics Statistics Machine Learning

Large Pharma Achieves 5X Productivity Gain With DataOps Process Hub

DataKitchen

JANUARY 17, 2022

If data is sequestered in access-controlled data islands, the process hub can enable access. Operational systems may be configured with live orchestrated feeds flowing into a data lake under the control of business analysts and other self-service users. Data is not static. Figure 1: A DataOps Process Hub.

Experimentation

Experimentation Data Lake Predictive Modeling Marketing

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Big Data

JANUARY 24, 2023

The utility for cloning and experimentation is available in the open-sourced GitHub repository. This solution only replicates metadata in the Data Catalog, not the actual underlying data. This ensures that the data lake will still be functional in another Region if Lake Formation has an availability issue.

Data Architecture

Data Architecture Metadata Data Lake Snapshot

DS Smith sets a single-cloud agenda for sustainability

CIO Business Intelligence

DECEMBER 6, 2023

“We collect lots of sensor data on machine performance, vibration data, temperature data, chemical data, and we like to have performative combinations of those datasets,” Dickson says.

Manufacturing

Manufacturing Data Lake Machine Learning Digital Transformation

Shutterstock capitalizes on the cloud’s cutting edge

CIO Business Intelligence

MARCH 6, 2023

Advancements in analytics and AI as well as support for unstructured data in centralized data lakes are key benefits of doing business in the cloud, and Shutterstock is capitalizing on its cloud foundation, creating new revenue streams and business models using the cloud and data lakes as key components of its innovation platform.

Data Lake

Data Lake Cost-Benefit Recreation/Entertainment Unstructured Data

How Gupshup built their multi-tenant messaging analytics platform on Amazon Redshift

AWS Big Data

FEBRUARY 12, 2024

It makes it fast, simple, and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Moreover, no separate effort is required to process historical data versus live streaming data. Apart from incremental analytics, Redshift simplifies a lot of operational aspects.

Analytics

Analytics Data Warehouse Snapshot Cost-Benefit

Why enterprise CIOs need to plan for Microsoft gen AI

CIO Business Intelligence

AUGUST 14, 2024

Start where your data is: Using your own enterprise data is the major differentiator from open access gen AI chat tools, so it makes sense to start with the provider already hosting your enterprise data. Organizations with experience building enterprise data lakes connecting to many different data sources have AI advantages.

Enterprise

Enterprise Cost-Benefit Experimentation Modeling

How Agencies Can Gain the Cyber Edge with Smart Data Solutions

Cloudera

DECEMBER 13, 2022

Workflows become so cumbersome that projects never make it past pilot and, most importantly, data scientists’ ML models rarely emerge from experimentation to operation. Operationalize ML with the Cloudera Data Platform. All with the integrated security and governance technologies required for compliance.

Machine Learning

Machine Learning Experimentation Data Lake Data Processing

Innovate What’s Next: How Living Labs Brings Ideas to Life

CIO Business Intelligence

APRIL 6, 2022

We are centered around co-creating with customers and promoting a systematic and scalable innovation approach to solve real-world customer problems, similar to Toyota leveraging Infosys Cobalt to modernize its vehicle data warehouse into a next-generation data lake on AWS.

Experimentation

Experimentation Data Lake Uncertainty Enterprise

Unleashing the power of Presto: The Uber case study

IBM Big Data Hub

SEPTEMBER 25, 2023

Uber understood that digital superiority required the capture of all their transactional data, not just a sampling. They stood up a file-based data lake alongside their analytical database. Because much of the work done on their data lake is exploratory in nature, many users want to execute untested queries on petabytes of data.

OLAP

OLAP Data Lake Data-driven Online Analytical Processing

Belcorp reimagines R&D with AI

CIO Business Intelligence

JUNE 28, 2023

As Belcorp considered the difficulties it faced, the R&D division noted it could significantly expedite time-to-market and increase productivity in its product development process if it could shorten the timeframes of the experimental and testing phases in the R&D labs. “This allowed us to derive insights more easily.”

Digital Transformation

Digital Transformation Cost-Benefit Informatics Data mining

CIOs press ahead for gen AI edge — despite misgivings

CIO Business Intelligence

OCTOBER 18, 2023

Yet, the intense focus on gen AI has only accelerated experimentation for CIOs and vendors, including Musk, whose xAI will reportedly enter the AI arms race. Lastly, we tapped into our data lake to enrich and tailor specific customer emails to drive the conviction of our products and ultimately increased sales.

Risk

Risk Manufacturing Enterprise Technology

Havmor’s VP IT Dhaval Mankad on ‘melting’ hurdles with a scoop of digital innovation

CIO Business Intelligence

JULY 17, 2023

Currently, we have not implemented any full-fledged AI solutions, but internal discussions with the management are underway to develop dashboard solutions with data analytics. How do you foster a culture of innovation and experimentation in your team to ensure consistent learning, and achievement of your digital transformation goals?

IT

IT Digital Transformation IoT Internet of Things

Unlock data across organizational boundaries using Amazon DataZone – now generally available

AWS Big Data

OCTOBER 4, 2023

An Amazon DataZone domain contains an associated business data catalog for search and discovery, a set of metadata definitions to decorate the data assets that are used for discovery purposes, and data projects with integrated analytics and ML tools for users and groups to consume and publish data assets.

Metadata

Metadata Data Lake Publishing Data Governance

Amazon Managed Service for Apache Flink now supports Apache Flink version 1.19

AWS Big Data

JULY 8, 2024

In every Apache Flink release, there are exciting new experimental features. Francisco collaborates closely with AWS customers to build scalable streaming data solutions and advanced streaming data lakes, ensuring seamless data processing and real-time insights. Connectors With the release of version 1.19.1,

Management

Management Snapshot Dashboards Consulting

Snowflake and Domino: Better Together

Domino Data Lab

JANUARY 11, 2021

Snowflake is a solution for data warehousing, data lakes, and data application development and specializes in securely sharing and consuming data. About Domino Data Lab. Domino Data Lab is the system-of-record for enterprise data science teams.

Data Science

Data Science Recreation/Entertainment Data Warehouse Publishing

Make Better Data-Driven Decisions with DataRobot AI Platform Single-Tenant SaaS on Microsoft Azure

DataRobot Blog

MARCH 7, 2023

DataRobot on Azure accelerates the machine learning lifecycle with advanced capabilities for rapid experimentation across new data sources and multiple problem types. Customers can build, run, and manage applications across multiple clouds, on-premises, and at the edge, with the tools of their choice.

Data-driven

Data-driven Machine Learning Experimentation Data Lake

Improving Multi-tenancy with Virtual Private Clusters

Cloudera

JUNE 6, 2019

While this approach provides isolation, it creates another significant challenge: duplication of data, metadata, and security policies, or ‘split-brain’ data lake. Now the admins need to synchronize multiple copies of the data and metadata and ensure that users across the many clusters are not viewing stale information.

Metadata

Metadata Data Lake Optimization Strategy

Enabling Self-Service Business Insights with Cloudera Data Warehouse

Cloudera

JANUARY 11, 2021

In a multi-tenant environment, many users need to access the same data sources. Experimental and production workloads access the same data without users impacting each other’s SLAs. Cloudera Data Warehouse has two high-performance, massively parallel processing (MPP) query engines — Impala and Hive LLAP. High performance.

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Machine Learning

Of Muffins and Machine Learning Models

Cloudera

FEBRUARY 16, 2022

In the case of CDP Public Cloud, this includes virtual networking constructs and the data lake as provided by a combination of a Cloudera Shared Data Experience (SDX) and the underlying cloud storage. Each project consists of a declarative series of steps or operations that define the data science workflow.

Machine Learning

Machine Learning Modeling Metadata Recreation/Entertainment

Amazon Kinesis Data Streams: celebrating a decade of real-time data innovation

AWS Big Data

NOVEMBER 14, 2023

Ten years ago, we launched Amazon Kinesis Data Streams, the first cloud-native serverless streaming data service, to serve as the backbone for companies, to move data across system boundaries, breaking data silos. Real-time streaming data technologies are essential for digital transformation.

IoT

IoT Data-driven Data Lake Data Strategy

Accelerating revenue growth with real-time analytics: Poshmark’s journey

AWS Big Data

MARCH 20, 2023

The data from the Kinesis data stream is consumed by two applications: A Spark streaming application on Amazon EMR is used to write data from the Kinesis data stream to a data lake hosted on Amazon Simple Storage Service (Amazon S3) in a partitioned way.

Analytics

Analytics Data Processing Slice and Dice Data Lake
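
The excerpt above mentions writing stream data to Amazon S3 "in a partitioned way". One common convention is Hive-style key prefixes derived from the event time. The sketch below assumes that convention; the `partition_key` helper, the `events` prefix, and the hourly granularity are hypothetical and are not described in the linked post.

```python
from datetime import datetime, timezone


def partition_key(prefix, event_time, filename):
    """Build a Hive-style partitioned object key (year=/month=/day=/hour=)."""
    t = event_time.astimezone(timezone.utc)  # normalize so partitions are in UTC
    return (
        f"{prefix}/year={t.year:04d}/month={t.month:02d}/"
        f"day={t.day:02d}/hour={t.hour:02d}/{filename}"
    )


key = partition_key(
    "events",  # hypothetical prefix
    datetime(2023, 3, 20, 14, 5, tzinfo=timezone.utc),
    "part-0000.parquet",
)
```

Query engines that understand this layout can skip whole prefixes when a query filters on the partition columns.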

Interview with Dominic Sartorio, Senior Vice President for Products & Development, Protegrity

Corinium

APRIL 25, 2019

I’ve found many IT as well as Business leaders have a mental model of data in that it is simply part of, or belongs to, a specific database or application, and thus they falsely conclude that just procuring a tool to protect that given environment will sufficiently protect that data. In data-driven organizations, data is flowing.

Insurance

Insurance Risk IoT Data-driven

Escorts Kubota enlists AI to reinvent railway, construction, and agriculture

CIO Business Intelligence

NOVEMBER 11, 2024

Kubota has projects across these pillars in various stages of maturity, with some already live and some still in experimentation. He points to data cleanliness as a major challenge in this workflow. Kakkar’s litmus test for pursuing a project depends on whether it has a clear purpose, goal, and measurable objectives.

IoT

IoT Experimentation Dashboards Data Lake

Accelerate lightweight analytics using PyIceberg with AWS Lambda and an AWS Glue Iceberg REST endpoint

AWS Big Data

MAY 9, 2025

As data use cases become more complex, data engineering teams require sophisticated tooling to handle versioning, increasing data volumes, and schema changes across multiple data sources and applications.

Snapshot

Snapshot Analytics Data-driven Data Processing

Your data’s wasted without predictive AI. Here’s how to fix that

CIO Business Intelligence

MAY 6, 2025

Customer data in Salesforce, product usage data in Snowflake and financials in Oracle, none integrated. Regional systems using different naming conventions and field formats. This fragmentation leads to inconsistent definitions, duplication of work and multiple versions of the truth. That’s a missed opportunity.

Prescriptive Analytics

Prescriptive Analytics Predictive Analytics Descriptive Analytics ROI
