This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue. To start the job, choose Run.
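The truncated configuration fragment in the original excerpt points at Iceberg's AWS Glue catalog settings. Below is a minimal sketch of how such a Glue PySpark job might configure the Spark session and copy a SQL Server table into an Iceberg table; the catalog alias, S3 warehouse path, JDBC endpoint, and table names are hypothetical placeholders, not values from the post.

```python
from pyspark.sql import SparkSession

# Register an Iceberg catalog ("glue_catalog") backed by the AWS Glue Data Catalog.
# The warehouse bucket and catalog alias are placeholders.
spark = (
    SparkSession.builder
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-warehouse-bucket/iceberg/")
    # Iceberg SQL extensions enable MERGE INTO, time travel, and similar statements.
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# Read a table from the legacy SQL Server over JDBC (hypothetical endpoint and credentials).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://legacy-db.example.com:1433;databaseName=sales")
    .option("dbtable", "dbo.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Write it out as an Iceberg table registered in the Glue Data Catalog.
orders.writeTo("glue_catalog.sales_db.orders").using("iceberg").createOrReplace()
```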
The CDH is used to create, discover, and consume data products through a central metadata catalog, while enforcing permission policies and tightly integrating data engineering, analytics, and machine learning services to streamline the user journey from data to insight.
The following requirements were essential to the decision to adopt a modern data mesh architecture. Domain-oriented ownership and data-as-a-product: EUROGATE aims to enable scalable and straightforward data sharing across organizational boundaries, and to eliminate centralized bottlenecks and complex data pipelines.
Use cases for Hive metastore federation for Amazon EMR
Hive metastore federation for Amazon EMR is applicable to the following use cases: Governance of Amazon EMR-based data lakes – Producers generate data within their AWS accounts using an Amazon EMR-based data lake supported by EMRFS on Amazon Simple Storage Service (Amazon S3) and HBase.
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
A domain has an important job and a dedicated team (five to nine members) who develop intimate knowledge of data sources, data consumers, and functional nuances, such as managing ordered data dependencies, inter-domain communication, shared infrastructure, and incoherent workflows.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca's journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to addressing these challenges and fostering a data-driven culture. To achieve this, they plan to use machine learning (ML) models to extract insights from data.
Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP.
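As a rough sketch of what reading from that connector looks like in a Glue PySpark script: the connection name and entity path below are hypothetical, and the exact connection_type string and option keys should be verified against the AWS Glue SAP OData documentation.

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read an SAP OData entity into a Glue DynamicFrame.
# "sapodata", "connectionName", and "ENTITY_NAME" are assumptions based on the
# connector's documented pattern; confirm them in the Glue docs for your version.
sap_orders = glue_context.create_dynamic_frame.from_options(
    connection_type="sapodata",
    connection_options={
        "connectionName": "my-sap-odata-connection",  # hypothetical Glue connection
        "ENTITY_NAME": "/sap/opu/odata/sap/API_SALES_ORDER_SRV/A_SalesOrder",  # hypothetical entity
    },
)
sap_orders.toDF().show(5)
```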
Third, some services require you to set up and manage compute resources used for federated connectivity, and capabilities like connection testing and data preview aren't available in all services. To solve these challenges, we launched Amazon SageMaker Lakehouse unified data connectivity. For Add data source, choose Add connection.
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and to do so with consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.
Data storage databases. Your SaaS company can store and protect any amount of data using Amazon Simple Storage Service (S3), which is ideal for data lakes, cloud-native applications, and mobile apps. AWS also offers developers the technology to develop smart apps using machine learning and complex algorithms.
Its digital transformation began with an application modernization phase, in which Dickson and her IT teams determined which applications should be hosted in the public cloud and which should remain on a private cloud. As for No. 2, machine learning/AI (31%), the packaging company has three use cases in proof of concept.
As a global company with more than 6,000 employees, BMC faces many of the same data challenges that other large enterprises face. The organization has 500 applications for business services, 80,000 VMs, 3,000 hosts, and more than 100,000 containers. Given the sheer volume of enterprise data, it’s impossible to do this manually.
Each data producer within the organization has its own data lake in Apache Hudi format, ensuring data sovereignty and autonomy. This enables data-driven decision-making across the organization.
It also makes it easier for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization to discover, use, and collaborate to derive data-driven insights. Note that a managed data asset is an asset for which Amazon DataZone can manage permissions.
All data is held in a lake-centric hub, and protected by a strong, universal security model, with data loss prevention and protection for sensitive data, and features for auditing and forensic investigation already built-in.
The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. It hosts over 150 big data analytics sandboxes across the region with over 200 users utilizing the sandbox for data discovery. In its first six months of operation, OVO UnCover has proven to be 7.9
Previously head of cybersecurity at Ingersoll-Rand, Melby started developing neural networks and machine learning models more than a decade ago. I was literally just waiting for commercial availability [of LLMs], but [services] like Azure Machine Learning made it so you could easily apply it to your data.
The Solution: CDP Private Cloud brings a next-generation hybrid architecture with cloud-native benefits to HBL's data platform. HBL started their data journey in 2019, when a data lake initiative was launched to consolidate complex data sources and enable the bank to use a single version of truth for decision-making.
Because Gilead is expanding into biologics and large molecule therapies, and has an ambitious goal of launching 10 innovative therapies by 2030, there is heavy emphasis on using data with AI and machine learning (ML) to accelerate the drug discovery pipeline. Create a data lake external schema and table in Redshift Serverless.
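A hedged sketch of that step, issued through the redshift_connector Python driver; the Serverless workgroup endpoint, Glue database, and IAM role below are placeholders, not the actual setup from the post.

```python
import redshift_connector

# Connect to a hypothetical Redshift Serverless workgroup endpoint.
conn = redshift_connector.connect(
    host="my-workgroup.123456789012.us-east-1.redshift-serverless.amazonaws.com",
    database="dev",
    user="admin",
    password="***",
)
cur = conn.cursor()

# Expose a Glue Data Catalog database as an external schema so Redshift
# can query the data lake tables in place.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS datalake
    FROM DATA CATALOG
    DATABASE 'my_glue_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';
""")
conn.commit()

# Tables registered in that Glue database are now queryable, for example:
cur.execute("SELECT COUNT(*) FROM datalake.my_table;")
print(cur.fetchone())
```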
By using AWS Glue to integrate data from Snowflake, Amazon S3, and SaaS applications, organizations can unlock new opportunities in generative artificial intelligence (AI), machine learning (ML), business intelligence (BI), and self-service analytics, or feed data to underlying applications.
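As an illustrative sketch of the Snowflake side of that integration in a Glue job: the connection name and option keys below follow the pattern of Glue's native Snowflake connectivity, but treat them as assumptions to check against the current AWS Glue documentation.

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read a Snowflake table through a preconfigured Glue connection.
# "connectionName" and the sf* keys are assumptions; names are placeholders.
snowflake_orders = glue_context.create_dynamic_frame.from_options(
    connection_type="snowflake",
    connection_options={
        "connectionName": "my-snowflake-connection",
        "sfDatabase": "SALES",
        "sfSchema": "PUBLIC",
        "dbtable": "ORDERS",
    },
)

# Read companion data already sitting in the S3 data lake.
s3_customers = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/customers/"]},
    format="parquet",
)

# From here the two frames can be joined, enriched, and fed to
# downstream BI, ML, or generative AI applications.
print(snowflake_orders.count(), s3_customers.count())
```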
Data lakes have come a long way, and there's been tremendous innovation in this space. Today's modern data lakes are cloud native, work with multiple data types, and make this data easily available to diverse stakeholders across the business. In the navigation pane, under Data catalog, choose Settings.
AWS Glue is a serverless data integration service that helps analytics users to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. Access to an SFTP server with permissions to upload and download data. Choose Store a new secret.
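"Store a new secret" refers to AWS Secrets Manager, where the SFTP credentials live so the Glue connection never embeds them in the script. A minimal sketch of creating that secret with boto3; the secret name, key layout, and host are hypothetical.

```python
import json
import boto3

secrets = boto3.client("secretsmanager", region_name="us-east-1")

# Store SFTP credentials as a JSON secret for the Glue connection to reference.
# Name and key names are placeholders; match whatever your connector expects.
secrets.create_secret(
    Name="glue/sftp/my-server",
    SecretString=json.dumps({
        "host": "sftp.example.com",
        "port": 22,
        "username": "etl_user",
        "password": "***",
    }),
)
```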
In modern enterprises, the exponential growth of data means organizational knowledge is distributed across multiple formats, ranging from structured data stores such as data warehouses to multi-format data stores like data lakes. This makes gathering information for decision making a challenge.
These nodes can implement analytical platforms like data lakehouses, data warehouses, or data marts, all united by producing data products. By treating the data as a product, the outcome is a reusable asset that outlives a project and meets the needs of the enterprise consumer.
At the core, digital at Dow is about changing how we work, which includes how we interact with systems, data, and each other to be more productive and to grow. Data is at the heart of everything we do today, from AI to machine learning or generative AI. That's what we're running our AI and our machine learning against.
Deploying new data types for machine learning
Mai-Lan Tomsen-Bukovec, vice president of foundational data services at AWS, sees the cloud giant's enterprise customers deploying more unstructured data, as well as wider varieties of data sets, to inform the accuracy and training of ML models of late.
For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users. BMS's EDLS platform hosts over 5,000 jobs and is growing at 15% year over year.
The attack targeted a host of public and private sector organizations (18,000 customers) including NASA, the Justice Department, and Homeland Security, and it is believed the attackers persisted on SolarWinds systems for 14 months prior to discovery. The world is awash in data. The benefits and challenges of ML operations.
Many organizations are building data lakes to store and analyze large volumes of structured, semi-structured, and unstructured data. In addition, many teams are moving towards a data mesh architecture, which requires them to expose their data sets as easily consumable data products.
In today's data-driven landscape, the quality of data is the foundation upon which the success of organizations and innovations stands. High-quality data is not just about accuracy; it's also about timeliness.
Cloudera's Data Warehouse service allows raw data to be stored in the cloud storage of your choice (S3, ADLSg2). It is stored in your own namespace and does not force you to move data into someone else's proprietary file formats or hosted storage. Proprietary file formats mean no one else is invited in! Separate compute.
Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you securely access your data in operational databases, data lakes, or third-party datasets with minimal movement or copying of data.
Modern applications store massive amounts of data on Amazon Simple Storage Service (Amazon S3) data lakes, providing cost-effective and highly durable storage, and allowing you to run analytics and machine learning (ML) from your data lake to generate insights on your data.
It comprises commodity cloud object storage, open data and open table formats, and high-performance open-source query engines. To help organizations scale AI workloads, we recently announced IBM watsonx.data, a data store built on an open data lakehouse architecture and part of the watsonx AI and data platform.
Additionally, lines of business (LOBs) can gain access to a shared data lake that is secured and governed by the use of Cloudera Shared Data Experience (SDX). According to 451 Research's Voice of the Enterprise: Cloud, Hosting & Managed Services study, 58% of enterprises are moving towards a hybrid IT environment.
In addition, data pipelines include more and more stages, thus making it difficult for data engineers to compile, manage, and troubleshoot those analytical workloads.
The challenge is to do it right, and a crucial way to achieve it is with decisions based on data and analysis that drive measurable business results. This was the key learning from the Sisense event heralding the launch of Periscope Data in Tel Aviv, Israel — the beating heart of the startup nation. What VCs want from startups.
Amazon Redshift, a warehousing service, offers a variety of options for ingesting data from diverse sources into its high-performance, scalable environment. This native feature of Amazon Redshift uses massive parallel processing (MPP) to load objects directly from data sources into Redshift tables.
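The native feature being described is the COPY command. A minimal sketch of invoking it through the Amazon Redshift Data API with boto3; the cluster identifier, table, bucket, and IAM role are placeholders.

```python
import boto3

redshift_data = boto3.client("redshift-data", region_name="us-east-1")

# COPY fans the load out across the cluster's slices (MPP), reading the
# S3 objects in parallel directly into the target table.
redshift_data.execute_statement(
    ClusterIdentifier="my-cluster",  # placeholder cluster
    Database="dev",
    DbUser="admin",
    Sql="""
        COPY sales
        FROM 's3://my-bucket/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
        FORMAT AS PARQUET;
    """,
)
```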
2020 saw us hosting our first ever fully digital Data Impact Awards ceremony, and it certainly was one of the highlights of our year. We saw a record number of entries and incredible examples of how customers were using Cloudera’s platform and services to unlock the power of data. DATA FOR ENTERPRISE AI.
Finally, make sure you understand your data, because no machine learning solution will work for you if you aren't working with the right data. Data lakes have a new consumer in AI. Many of our service-based offerings include hosting and executing our customers' omnichannel platforms.
That's a lot of priorities, especially when you group together closely related items such as data lineage and metadata management, which rank nearby. Plus, the more mature machine learning (ML) practices place greater emphasis on these kinds of solutions than the less experienced organizations.
Previously, there were three types of data structures in telco: entity data sets, i.e., marketing data lakes. It is an edge-to-AI suite of capabilities, including edge analytics, data staging, data quality control, data visualization tools, and machine learning.