For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. Two use cases illustrate how this can be applied for business intelligence (BI) and data science applications, using AWS services such as Amazon Redshift and Amazon SageMaker.
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
Near-real-time analytics on operational data is becoming a common need. Due to the exponential growth of data volume, it has become common practice to replace read replicas with data lakes for better scalability and performance. For more information, see Changing the default settings for your data lake.
Many companies whose AI model training infrastructure is not proximal to their data lake incur steeper costs as datasets grow larger and AI models become more complex. Companies such as Cyxtera, Digital Realty, and Equinix, among others, offer hosting, management, and operations services for AI infrastructure.
All this data arrives by the terabyte, and a data management platform can help marketers make sense of it all. Marketing-focused or not, DMPs excel at negotiating with a wide array of databases, data lakes, or data warehouses, ingesting their streams of data and then cleaning, sorting, and unifying the information therein.
This post was co-written with Rajiv Arora, Director of Data Science Platform at Gilead Sciences, Inc. This data volume is expected to increase monthly and is fully refreshed each month. Create a data lake external schema and table in Redshift Serverless.
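That last step, creating a data lake external schema in Redshift Serverless, can be scripted with the Redshift Data API. Below is a minimal sketch; the workgroup, Glue database, and IAM role names are hypothetical placeholders, not values from the original post.

```python
# Hypothetical sketch: create an external schema in Redshift Serverless that
# maps a Glue Data Catalog database, so data lake tables are queryable in place.
import boto3

client = boto3.client("redshift-data")

response = client.execute_statement(
    WorkgroupName="my-serverless-workgroup",  # assumption: your workgroup name
    Database="dev",
    Sql="""
        CREATE EXTERNAL SCHEMA IF NOT EXISTS datalake
        FROM DATA CATALOG
        DATABASE 'my_glue_db'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftSpectrumRole';
    """,
)
# The Data API is asynchronous; poll describe_statement() with this ID.
print(response["Id"])
```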
It unifies all data on a single platform, including data integration, engineering, and warehousing, where it can be used for data science, real-time analytics, and business intelligence – and accessed with natural language queries and the power of generative AI.
The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. It hosts over 150 big data analytics sandboxes across the region, with over 200 users relying on them for data discovery.
It also makes it easier for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization to discover, use, and collaborate to derive data-driven insights. Note that a managed data asset is an asset for which Amazon DataZone can manage permissions.
We also have a blended architecture of deep process capabilities in our SAP system and decision-making capabilities in our Microsoft tools, and a great base of information in our integrated data hub, or data lake, which is all Microsoft-based. That’s what we’re running our AI and our machine learning against.
At the lowest layer is the infrastructure, made up of databases and data lakes. These applications live on innumerable servers, yet some technology is hosted in the public cloud. One strategy, five keys: from a technological point of view, the brand’s strategic engine is divided into five investment areas.
This involves creating VPC endpoints in both the AWS and Snowflake VPCs, making sure data transfer remains within the AWS network. Use Amazon Route 53 to create a private hosted zone that resolves the Snowflake endpoint within your VPC. This unlocks scalable analytics while maintaining data governance, compliance, and access control.
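As a sketch of the Route 53 step described above (not the article's exact configuration): create a private hosted zone attached to the VPC, then point a record at the VPC endpoint. The domain, VPC ID, and endpoint DNS name below are hypothetical; the real PrivateLink hostnames come from your Snowflake account.

```python
# Hypothetical sketch: private hosted zone so the Snowflake endpoint resolves
# to a VPC endpoint inside the AWS network.
import uuid
import boto3

route53 = boto3.client("route53")

zone = route53.create_hosted_zone(
    Name="privatelink.snowflakecomputing.com",  # assumption: your PrivateLink domain
    VPC={"VPCRegion": "us-east-1", "VPCId": "vpc-0abc123def4567890"},
    CallerReference=str(uuid.uuid4()),
    HostedZoneConfig={"Comment": "Resolve Snowflake over PrivateLink", "PrivateZone": True},
)

route53.change_resource_record_sets(
    HostedZoneId=zone["HostedZone"]["Id"],
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "myaccount.privatelink.snowflakecomputing.com",
                "Type": "CNAME",
                "TTL": 300,
                # Hypothetical VPC endpoint DNS name
                "ResourceRecords": [{"Value": "vpce-0123abcd.vpce-svc-0123.us-east-1.vpce.amazonaws.com"}],
            },
        }]
    },
)
```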
Cloudera’s Data Warehouse service allows raw data to be stored in the cloud storage of your choice (S3, ADLSg2). It will be stored in your own namespace, and will not force you to move data into someone else’s proprietary file formats or hosted storage. Proprietary file formats mean no one else is invited in!
The attack targeted a host of public and private sector organizations (18,000 customers), including NASA, the Justice Department, and Homeland Security, and it is believed the attackers persisted on SolarWinds systems for 14 months prior to discovery.
Each service is hosted in a dedicated AWS account and is built and maintained by a product owner and a development team, as illustrated in the following figure. Delta tables’ technical metadata is stored in the Data Catalog, which is a native source for creating assets in the Amazon DataZone business catalog.
2020 saw us hosting our first ever fully digital Data Impact Awards ceremony, and it certainly was one of the highlights of our year. We saw a record number of entries and incredible examples of how customers were using Cloudera’s platform and services to unlock the power of data.
The AWS modern data architecture shows a way to build a purpose-built, secure, and scalable data platform in the cloud. Learn from this to build querying capabilities across your data lake and the data warehouse. Let’s find out what role each of these components plays in the context of C360.
The top three items are essentially “the devil you know” for firms which want to invest in data science: data platform, integration, data prep. Data governance shows up as the fourth-most-popular kind of solution that enterprise teams were adopting or evaluating during 2019. Rinse, lather, repeat.
Since its launch in 2006, Amazon Simple Storage Service (Amazon S3) has experienced major growth, supporting multiple use cases such as hosting websites, creating data lakes, serving as object storage for consumer applications, storing logs, and archiving data. This could be your data lake or application S3 bucket.
The workflow contains the following steps: Data is saved by the producer in their own Amazon Simple Storage Service (Amazon S3) buckets. Data source locations hosted by the producer are created within the producer’s AWS Glue Data Catalog. Data source locations are registered with Lake Formation.
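The "registered with Lake Formation" step in that workflow maps to a single API call. A minimal sketch, using a hypothetical producer bucket and the service-linked role:

```python
# Hypothetical sketch: register the producer's S3 location with Lake Formation
# so permissions on it can be governed centrally.
import boto3

lakeformation = boto3.client("lakeformation")

lakeformation.register_resource(
    ResourceArn="arn:aws:s3:::producer-data-bucket/sales/",  # hypothetical bucket/prefix
    UseServiceLinkedRole=True,
)
```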
IAM Identity Center now supports trusted identity propagation, a streamlined experience for users who require access to data with AWS analytics services. On the Lake Formation console, choose Data lake permissions under Permissions in the navigation pane. Select Named Data Catalog resources. Choose Grant.
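Those console clicks have a direct API equivalent. A minimal sketch with boto3, granting SELECT on a named Data Catalog table; the principal, database, and table names are hypothetical:

```python
# Hypothetical sketch: grant SELECT on a named Data Catalog table,
# the API counterpart of the console's "Grant" flow.
import boto3

lakeformation = boto3.client("lakeformation")

lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/AnalystRole"},
    Resource={
        "Table": {
            "CatalogId": "123456789012",
            "DatabaseName": "sales_db",
            "Name": "orders",
        }
    },
    Permissions=["SELECT"],
)
```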
Episode 4: Unlocking the Value of Enterprise AI with Data Engineering Capabilities. They discuss how the data engineering team is instrumental in easing collaboration between analysts, data scientists, and ML engineers to build enterprise AI solutions.
At Stitch Fix, we have been powered by data science since our founding and rely on many modern data lake and data processing technologies. In our infrastructure, Apache Kafka has emerged as a powerful tool for managing event streams and facilitating real-time data processing.
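As an illustration of that kind of event-stream consumption (a generic sketch, not Stitch Fix's actual code), here is a minimal consumer using the kafka-python client; the topic and broker address are hypothetical:

```python
# Hypothetical sketch: consume JSON events from a Kafka topic.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "machine-events",                      # hypothetical topic
    bootstrap_servers=["localhost:9092"],  # hypothetical broker
    group_id="realtime-processors",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    # Each record is one event; a real pipeline would transform and forward it.
    print(message.topic, message.offset, message.value)
```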
The DataRobot AI Platform seamlessly integrates with Azure cloud services, including Azure Machine Learning, Azure Data Lake Storage Gen2 (ADLS), Azure Synapse Analytics, and Azure SQL Database. Models trained in DataRobot can also be easily deployed to Azure Machine Learning, allowing users to host models more easily and securely.
There are now tens of thousands of instances of these Big Data platforms running in production around the world today, and the number is increasing every year. Many of them are increasingly deployed outside of traditional data centers in hosted, “cloud” environments.
By supporting open-source frameworks and tools for code-based, automated and visual data science capabilities — all in a secure, trusted studio environment — we’re already seeing excitement from companies ready to use both foundation models and machine learning to accomplish key tasks.
Cargotec captures terabytes of IoT telemetry data from their machinery operated by numerous customers across the globe. This data needs to be ingested into a datalake, transformed, and made available for analytics, machine learning (ML), and visualization. The job runs in the target account.
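A minimal PySpark sketch of such an ingest-and-transform job (not Cargotec's actual pipeline), assuming S3 access is already configured on the cluster and using hypothetical bucket names and columns:

```python
# Hypothetical sketch: read raw IoT telemetry JSON from S3, de-duplicate,
# and write date-partitioned Parquet into the data lake.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("telemetry-ingest").getOrCreate()

raw = spark.read.json("s3://raw-telemetry-bucket/machines/")

cleaned = (
    raw.withColumn("event_date", F.to_date("event_timestamp"))
       .dropDuplicates(["machine_id", "event_timestamp"])
)

(cleaned.write
    .mode("append")
    .partitionBy("event_date")
    .parquet("s3://datalake-bucket/telemetry/"))
```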
On January 4th I had the pleasure of hosting a webinar titled The Gartner 2021 Leadership Vision for Data & Analytics Leaders, aimed at the Chief Data Officer, or head of data and analytics. As such, a head of analytics, BI, and data science may emerge; CAO may well be a name for that role.
This past week, I had the pleasure of hosting Data Governance for Dummies author Jonathan Reichental for a fireside chat, along with Denise Swanson, Data Governance lead at Alation. Can you differentiate between governance of raw data and enhanced data (information)? Where do you govern?
But Barnett, who started work on a strategy in 2023, wanted to continue using Baptist Memorial’s on-premises data center for financial, security, and continuity reasons, so he and his team explored options that allowed for keeping that data center as part of the mix.
Solution overview: One of the common functionalities involved in data pipelines is extracting data from multiple data sources and exporting it to a data lake, or synchronizing the data to another database. There are multiple tables related to customer and order data in the RDS database.
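A minimal PySpark sketch of that extract-and-export pattern, with hypothetical connection details and table names (a MySQL JDBC driver must be on the classpath, and credentials would normally come from a secrets store):

```python
# Hypothetical sketch: pull customer/order tables from RDS over JDBC and
# land each one in the data lake as Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rds-to-datalake").getOrCreate()

jdbc_url = "jdbc:mysql://my-rds-instance.abc123.us-east-1.rds.amazonaws.com:3306/sales"

for table in ["customers", "orders", "order_items"]:
    df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", table)
        .option("user", "etl_user")      # hypothetical; use Secrets Manager in practice
        .option("password", "REDACTED")
        .load()
    )
    df.write.mode("overwrite").parquet(f"s3://datalake-bucket/raw/{table}/")
```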
Flexible deployment options for StarTree Cloud: StarTree offers multiple deployment options, including StarTree-hosted software as a service (SaaS) and customer-hosted SaaS. StarTree’s customer-hosted SaaS provides flexibility for customers interested in deploying the solution within their AWS environment or another platform of choice.
Refer to Data lake administrator permissions and Set up AWS Lake Formation. You can also refer to Simplify data access for your enterprise using Amazon SageMaker Lakehouse for the Lake Formation admin setup in your AWS account. An S3 bucket to host the sample Iceberg table data and metadata.
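The linked admin setup amounts to designating a Lake Formation data lake administrator, which can also be done through the API. A minimal sketch with a hypothetical role ARN; since put_data_lake_settings overwrites the whole settings object, the sketch fetches the current settings first and only changes the admins list:

```python
# Hypothetical sketch: designate a data lake administrator.
import boto3

lakeformation = boto3.client("lakeformation")

settings = lakeformation.get_data_lake_settings()["DataLakeSettings"]
settings["DataLakeAdmins"] = [
    {"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/LakeFormationAdmin"}
]
# Reuse the fetched settings so nothing else is clobbered.
lakeformation.put_data_lake_settings(DataLakeSettings=settings)
```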
The absence of known authoritative sources for something as fundamental as product data meant data fragmentation and data inaccuracies would be continually at odds with the quality of informed business decisions. A decision made with AI on bad data is still the same bad decision it would have been without AI.