Data Lake, Data Processing and IT

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake ( Apache Iceberg ) using AWS Glue. To start the job, choose Run. format(dbname)).config("spark.sql.catalog.glue_catalog.catalog-impl",

Data Lake

Data Lake Data Processing Optimization Machine Learning

Oracle Wants to Be the Database for AI

David Menninger's Analyst Perspectives

MAY 15, 2025

Oracle recently hosted its annual Database Analyst Summit, sharing the vision and strategy for its data platform. While much of the event was under non-disclosure as product plans and launch schedules are finalized, it still served as a useful recap of the broad portfolio of data platform capabilities that Oracle has to offer.

Data Lake

Data Lake Data Warehouse Machine Learning Software

Important Considerations When Migrating to a Data Lake

Smart Data Collective

MARCH 30, 2022

Azure Data Lake Storage Gen2 is based on Azure Blob storage and offers a suite of big data analytics features. If you don’t understand the concept, you might want to check out our previous article on the difference between data lakes and data warehouses. Migrate data, workloads, and applications.

Data Lake

Data Lake Cost-Benefit Data Warehouse Big Data

Webinars

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights. Open AWS Glue Studio. Choose ETL Jobs.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

How BMW streamlined data access using AWS Lake Formation fine-grained access control

AWS Big Data

OCTOBER 29, 2024

With over 10 PB of data across 1,500 data assets, 1,000 data use cases, and more than 9000 users, the BMW CDH has become a resounding success since BMW decided to build it in a strategic collaboration with Amazon Web Services (AWS) in 2020. This led to inefficiencies in data governance and access control.

Data Lake

Data Lake Sales Metadata Machine Learning

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

Use cases for Hive metastore federation for Amazon EMR Hive metastore federation for Amazon EMR is applicable to the following use cases: Governance of Amazon EMR-based data lakes – Producers generate data within their AWS accounts using an Amazon EMR-based data lake supported by EMRFS on Amazon Simple Storage Service (Amazon S3)and HBase.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

Enrich your serverless data lake with Amazon Bedrock

AWS Big Data

SEPTEMBER 26, 2024

For many organizations, this centralized data store follows a data lake architecture. Although data lakes provide a centralized repository, making sense of this data and extracting valuable insights can be challenging. Let’s walk through the architecture chronologically for a closer look at each step.

Data Lake

Data Lake Cost-Benefit Unstructured Data Modeling

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone now launched authentication supports through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.

Visualization

Visualization Data Lake Testing Data Governance

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

AWS Big Data

DECEMBER 20, 2024

The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios. With in-prompt context awareness, you can now include this information in your natural language query, and Amazon Q data integration will automatically extract and incorporate it into the workflow.

Data Integration

Data Integration Visualization Data Processing Big Data

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Eight Top DataOps Trends for 2022

DataKitchen

NOVEMBER 29, 2021

Many in the data industry recognize the serious impact of AI bias and seek to take active steps to mitigate it. The data industry realizes that AI bias is simply a quality problem, and AI systems should be subject to this same level of process control as an automobile rolling off an assembly line. Rise of the DataOps Engineer.

Testing

Testing Data Lake Data Architecture Manufacturing

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

AWS Big Data

OCTOBER 21, 2024

However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to address these challenges and foster a data-driven culture. These controls are designed to grant access with the right level of privileges and context.

Sales

Sales Data-driven Data Processing Key Performance Indicator

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

Their terminal operations rely heavily on seamless data flows and the management of vast volumes of data. Recently, EUROGATE has developed a digital twin for its container terminal Hamburg (CTH), generating millions of data points every second from Internet of Things (IoT)devices attached to its container handling equipment (CHE).

IoT

IoT Machine Learning Metadata Data-driven

Scaling RISE with SAP data and AWS Glue

AWS Big Data

NOVEMBER 29, 2024

Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP. On your Visual Editor canvas, select your SAP sources.

Visualization

Visualization Data Processing Data-driven Cost-Benefit

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

Now they have a new requirement to allow ad-hoc queries through SageMaker Unified Studio to enable data engineers, data analysts, sales representatives, and others to take advantage of its unified experience. For Host , enter your host name of your Aurora PostgreSQL database cluster. Choose Add data.

Visualization

Visualization Data Processing Testing Publishing

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

With the rapid growth of technology, more and more data volume is coming in many different formats—structured, semi-structured, and unstructured. Data analytics on operational data at near-real time is becoming a common need. Then we can query the data with Amazon Athena visualize it in Amazon QuickSight.

Data Lake

Data Lake Visualization Dashboards Insurance

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

To build a data-driven business, it is important to democratize enterprise data assets in a data catalog. With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. Verify all table metadata is stored in the AWS Glue Data Catalog.

Data Lake

Data Lake Metadata Business Analysis Data-driven

The success of GenAI models lies in your data management strategy

CIO Business Intelligence

OCTOBER 9, 2024

However, this enthusiasm may be tempered by a host of challenges and risks stemming from scaling GenAI. As the technology subsists on data, customer trust and their confidential information are at stake—and enterprises cannot afford to overlook its pitfalls. This is where data solutions like Dell AI-Ready Data Platform come in handy.

Strategy

Strategy Modeling Management Data Lake

BMC on BMC: How the company enables IT observability with BMC Helix and AIOps

CIO Business Intelligence

DECEMBER 7, 2023

The organization has 500 applications for business services, 80,000 VMs, 3,000 hosts, and more than 100,000 containers. BMC needed a solution to transform this large volume of data and enable observability to understand thousands of events as a single scenario. This is turn ensures availability, reliability, and performance.

IT

IT Data Lake Business Services Data Processing

Achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore

AWS Big Data

NOVEMBER 11, 2024

Snapshot and restore results in longer downtimes and greater loss of data between when the disaster event occurs and recovery. The workflow consists of the following initial steps: OpenSearch Service is hosted in the primary Region, and all the active traffic is routed to the OpenSearch Service domain in the primary Region.

Snapshot

Snapshot Strategy Dashboards Data Lake

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

JANUARY 21, 2021

While cloud-native, point-solution data warehouse services may serve your immediate business needs, there are dangers to the corporation as a whole when you do your own IT this way. And you also already know siloed data is costly, as that means it will be much tougher to derive novel insights from all of your data by joining data sets.

Data Warehouse

Data Warehouse Data Lake IT Analytics

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

Running both IT and digital at Alorica

CIO Business Intelligence

JUNE 1, 2022

Finally, make sure you understand your data, because no machine learning solution will work for you if you aren’t working with the right data. Data lakes have a new consumer in AI. Many of our service-based offerings include hosting and executing our customers’ omnichannel platforms.

IT

IT Interactive Marketing Consulting

Your New Cloud for AI May Be Inside a Colo

CIO Business Intelligence

MAY 23, 2022

Many companies whose AI model training infrastructure is not proximal to their data lake incur steeper costs as the data sets grow larger and AI models become more complex. Companies such as Cyxtera, Digital Realty and Equinix, among others, offer hosting, managing and operations services for AI infrastructure.

Experimentation

Experimentation Cost-Benefit Data Lake Data Science

Query your Apache Hive metastore with AWS Lake Formation permissions

AWS Big Data

JULY 20, 2023

The Hive metastore is a repository of metadata about the SQL tables, such as database names, table names, schema, serialization and deserialization information, data location, and partition details of each table. Therefore, organizations have come to host huge volumes of metadata of their structured datasets in the Hive metastore.

Data Lake

Data Lake Metadata Data Processing Big Data

Top 15 data management platforms

CIO Business Intelligence

JUNE 9, 2022

A data management platform (DMP) is a group of tools designed to help organizations collect and manage data from a wide array of sources and to create reports that help explain what is happening in those data streams. Deploying a DMP can be a great way for companies to navigate a business world dominated by data.

Management

Management Advertising Data Lake Sales

The essential check list for effective data democratization

CIO Business Intelligence

JANUARY 20, 2023

But to get maximum value out of data and analytics, companies need to have a data-driven culture permeating the entire organization, one in which every business unit gets full access to the data it needs in the way it needs it. This is called data democratization. They have data swamps,” he says.

Data Lake

Data Lake Data-driven Finance Data Architecture

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. Amazon DataZone allows you to simply and securely govern end-to-end data assets stored in your Amazon Redshift data warehouses or data lakes cataloged with the AWS Glue data catalog.

Metadata

Metadata Data Lake Data Processing Data-driven

Accomplish Agile Business Intelligence & Analytics For Your Business

datapine

APRIL 15, 2020

It’s necessary to say that these processes are recurrent and require continuous evolution of reports, online data visualization , dashboards, and new functionalities to adapt current processes and develop new ones. You need to determine if you are going with an on-premise or cloud-hosted strategy. Construction Iterations.

Business Intelligence

Business Intelligence Analytics Testing Dashboards

Configure cross-Region table access with the AWS Glue Catalog and AWS Lake Formation

AWS Big Data

AUGUST 3, 2023

Today’s modern data lakes span multiple accounts, AWS Regions, and lines of business in organizations. It’s important that their data solution gives them the ability to share and access data securely and safely across Regions. For example, we are using a data lake administrator role called LF-Admin.

Data Lake

Data Lake Metadata Management Data Processing

10 Things AWS Can Do for Your SaaS Company

Smart Data Collective

FEBRUARY 20, 2022

Whether it’s data management, analytics, or scalability, AWS can be the top-notch solution for any SaaS company. It has brought a lot of data to the cloud in recent years. Data storage databases. With the advancement of technology and more people accessing the internet, data security has become increasingly important.

Cost-Benefit

Cost-Benefit Data Lake Software Machine Learning

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

In today’s rapidly evolving financial landscape, data is the bedrock of innovation, enhancing customer and employee experiences and securing a competitive edge. Like many large financial institutions, ANZ Institutional Division operated with siloed data practices and centralized data management teams.

Metadata

Metadata Data Governance Data Quality Data-driven

Create your Private Data Warehousing Environment Using Azure Kubernetes Service

Cloudera

DECEMBER 2, 2021

Cloudera Data Warehouse (CDW) is a cloud native data warehouse service that runs Cloudera’s powerful query engines on a containerized architecture to do analytics on any type of data. It is part of the Cloudera Data Platform, or CDP , which runs on Azure and AWS, as well as in the private cloud.

Data Lake

Data Lake Data Warehouse Data Processing Interactive

AWS Glue crawlers support cross-account crawling to support data mesh architecture

AWS Big Data

MARCH 27, 2023

Data lakes have come a long way, and there’s been tremendous innovation in this space. Today’s modern data lakes are cloud native, work with multiple data types, and make this data easily available to diverse stakeholders across the business. In the navigation pane, under Data catalog , choose Settings.

Data Lake

Data Lake Data-driven Management Data Architecture

Preparing the foundations for Generative AI

CIO Business Intelligence

FEBRUARY 20, 2024

Generative AI tools are only as accurate and effective as the data they can access, and while government and public service agencies will have invested in their data infrastructure, it still may not be optimised for the demands of generative AI. This seems a formidable task. Microsoft Fabric could be one way to deal with it.

Cost-Benefit

Cost-Benefit Data Lake Data Warehouse Data Processing

How Gilead used Amazon Redshift to quickly and cost-effectively load third-party medical claims data

AWS Big Data

NOVEMBER 8, 2023

Because Gilead is expanding into biologics and large molecule therapies, and has an ambitious goal of launching 10 innovative therapies by 2030, there is heavy emphasis on using data with AI and machine learning (ML) to accelerate the drug discovery pipeline. This data volume is expected to increase monthly and is fully refreshed each month.

Data Lake

Data Lake Data Warehouse Cost-Benefit Optimization

DS Smith sets a single-cloud agenda for sustainability

CIO Business Intelligence

DECEMBER 6, 2023

Its digital transformation began with an application modernization phase, in which Dickson and her IT teams determined which applications should be hosted in the public cloud and which should remain on a private cloud. This enables the company to extract additional value from the data through real-time availability and contextualization.

Manufacturing

Manufacturing Data Lake Machine Learning Digital Transformation

Introducing AWS Glue crawler and create table support for Apache Iceberg format

AWS Big Data

AUGUST 16, 2023

Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. Iceberg provides several implementation options for the Iceberg catalog, including the AWS Glue Data Catalog, Hive Metastore, and JDBC catalogs. Choose Create.

Data Lake

Data Lake Metadata Snapshot Management

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users. For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users.

Metadata

Metadata Data Lake Visualization Data Quality

Habib Bank manages data at scale with Cloudera Data Platform

Cloudera

NOVEMBER 17, 2022

We needed a solution to manage our data at scale, to provide greater experiences to our customers. The Solution: CDP Private Cloud brings a next-generation hybrid architecture with cloud-native benefits to HBL’s data platform. HBL was the first Pakistani commercial bank to be established in Pakistan in 1947.

Management

Management Data Lake Consulting Unstructured Data

Use AWS Glue to streamline SFTP data processing

AWS Big Data

AUGUST 13, 2024

With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog. It enables you to visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. The data will be in the target S3 bucket.

Data Processing

Data Processing Visualization Data Lake Data Processing

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

AWS Big Data

FEBRUARY 27, 2024

In the ever-evolving world of finance and lending, the need for real-time, reliable, and centralized data has become paramount. Bluestone , a leading financial institution, embarked on a transformative journey to modernize its data infrastructure and transition to a data-driven organization.

Data-driven

Data-driven Data Lake Data Quality Data Governance

Announcing the 2020 Data Impact Award Winners

Cloudera

NOVEMBER 18, 2020

During the first-ever virtual broadcast of our annual Data Impact Awards (DIA) ceremony, we had the great pleasure of announcing this year’s finalists and winners. The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. Connect the Data Lifecycle .

Internet Publishing and Broadcasting

Internet Publishing and Broadcasting Data-driven Broadcasting Digital Transformation

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

Oracle Wants to Be the Database for AI

Webinars

Trending Sources

Important Considerations When Migrating to a Data Lake

Webinars

Migrate an existing data lake to a transactional data lake using Apache Iceberg

How BMW streamlined data access using AWS Lake Formation fine-grained access control

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Enrich your serverless data lake with Amazon Bedrock

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Eight Top DataOps Trends for 2022

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

How EUROGATE established a data mesh architecture using Amazon DataZone

Scaling RISE with SAP data and AWS Glue

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

Build a data lake with Apache Flink on Amazon EMR

The success of GenAI models lies in your data management strategy

BMC on BMC: How the company enables IT observability with BMC Helix and AIOps

Achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Running both IT and digital at Alorica

Your New Cloud for AI May Be Inside a Colo

Query your Apache Hive metastore with AWS Lake Formation permissions

Top 15 data management platforms

The essential check list for effective data democratization

Governing data in relational databases using Amazon DataZone

Accomplish Agile Business Intelligence & Analytics For Your Business

Configure cross-Region table access with the AWS Glue Catalog and AWS Lake Formation

10 Things AWS Can Do for Your SaaS Company

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

Create your Private Data Warehousing Environment Using Azure Kubernetes Service

AWS Glue crawlers support cross-account crawling to support data mesh architecture

Preparing the foundations for Generative AI

How Gilead used Amazon Redshift to quickly and cost-effectively load third-party medical claims data

DS Smith sets a single-cloud agenda for sustainability

Introducing AWS Glue crawler and create table support for Apache Iceberg format

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Habib Bank manages data at scale with Cloudera Data Platform

Use AWS Glue to streamline SFTP data processing

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

Announcing the 2020 Data Impact Award Winners

Stay Connected