Data Lake, Enterprise and Modeling

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue.

Data Lake

Data Lake Data Processing Optimization Machine Learning

Rapidminer Platform Supports Entire Data Science Lifecycle

David Menninger's Analyst Perspectives

SEPTEMBER 16, 2021

Rapidminer is a visual enterprise data science platform that includes data extraction, data mining, deep learning, artificial intelligence and machine learning (AI/ML) and predictive analytics. It can support AI/ML processes with data preparation, model validation, results visualization and model optimization.

Data Science

Data Science Data Lake Data mining Deep Learning

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Webinars

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

What is data architecture? A framework to manage data

CIO Business Intelligence

DECEMBER 20, 2024

Data architecture definition: Data architecture describes the structure of an organization's logical and physical data assets, and data management resources, according to The Open Group Architecture Framework (TOGAF). An organization's data architecture is the purview of data architects.

Data Architecture

Data Architecture Management Consulting Internet of Things

Oracle Wants to Be the Database for AI

David Menninger's Analyst Perspectives

MAY 15, 2025

For context, read this perspective by my colleague, Matt Aslett, on the importance of local data processing. Our research shows that more than half of enterprises (58%) have the majority of data platforms in the cloud, but a substantial portion is deployed on premises. Regards, David Menninger

Data Lake

Data Lake Data Warehouse Machine Learning Software

Accelerate analytics and AI innovation with the next generation of Amazon SageMaker

AWS Big Data

MARCH 13, 2025

From within the unified studio, you can discover data and AI assets from across your organization, then work together in projects to securely build and share analytics and AI artifacts, including data, models, and generative AI applications.

Analytics

Analytics Data Lake Data Warehouse Data-driven

The next generation of Amazon SageMaker: The center for all your data, analytics, and AI

AWS Big Data

DECEMBER 4, 2024

They’re taking data they’ve historically used for analytics or business reporting and putting it to work in machine learning (ML) models and AI-powered applications. Amazon SageMaker Unified Studio (Preview) solves this challenge by providing an integrated authoring experience to use all your data and tools for analytics and AI.

Data Analytics

Data Analytics Analytics Data Lake Data Quality

United Airlines sets its flight plan for gen AI success

CIO Business Intelligence

DECEMBER 20, 2024

With the core architectural backbone of the airline's gen AI roadmap in place, including United Data Hub and an AI and ML platform dubbed Mars, Birnbaum has released a handful of models into production use for employees and customers alike. That number has increased to 21% in just 18 months.

IT

IT Unstructured Data Experimentation Data Lake

Enrich your serverless data lake with Amazon Bedrock

AWS Big Data

SEPTEMBER 26, 2024

For many organizations, this centralized data store follows a data lake architecture. Although data lakes provide a centralized repository, making sense of this data and extracting valuable insights can be challenging. This is where Amazon Bedrock comes into play.

Data Lake

Data Lake Cost-Benefit Unstructured Data Modeling

The success of GenAI models lies in your data management strategy

CIO Business Intelligence

OCTOBER 9, 2024

The rise of generative AI (GenAI) felt like a watershed moment for enterprises looking to drive exponential growth with its transformative potential. As the technology subsists on data, customer trust and their confidential information are at stake—and enterprises cannot afford to overlook its pitfalls.

Strategy

Strategy Modeling Management Data Lake

MongoDB Enhances Developer Data Platform

David Menninger's Analyst Perspectives

JANUARY 21, 2025

While new and emerging capabilities might catch the eye, features that address data platform security, performance and availability remain some of the most significant deal-breakers when enterprises are considering potential data platform providers. This is especially true for mission-critical workloads. Regards, Matt Aslett

Data Lake

Data Lake IoT Cost-Benefit Enterprise

Bridging the gap between mainframe data and hybrid cloud environments

CIO Business Intelligence

FEBRUARY 27, 2025

A high hurdle many enterprises have yet to overcome is accessing mainframe data via the cloud. Data professionals need to access and work with this information for businesses to run efficiently, and to make strategic forecasting decisions through AI-powered data models.

Metadata

Metadata Data Lake Cost-Benefit Forecasting

Integrating Data Governance and Enterprise Architecture

erwin

SEPTEMBER 3, 2020

Why should you integrate data governance (DG) and enterprise architecture (EA)? Two of the biggest challenges in creating a successful enterprise architecture initiative are: collecting accurate information on application ecosystems and maintaining the information as application ecosystems change.

Data Governance

Data Governance Enterprise Risk Data Lake

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Eight Top DataOps Trends for 2022

DataKitchen

NOVEMBER 29, 2021

DataOps adoption continues to expand as a perfect storm of social, economic, and technological factors drive enterprises to invest in process-driven innovation. In 2022, data organizations will institute robust automated processes around their AI systems to make them more accountable to stakeholders. Data Gets Meshier.

Testing

Testing Data Lake Data Architecture Manufacturing

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

Use cases for Hive metastore federation for Amazon EMR: Hive metastore federation for Amazon EMR is applicable to the following use cases: Governance of Amazon EMR-based data lakes – Producers generate data within their AWS accounts using an Amazon EMR-based data lake supported by EMRFS on Amazon Simple Storage Service (Amazon S3) and HBase.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

Complexity Drives Costs: A Look Inside BYOD and Azure Data Lakes

Jet Global

NOVEMBER 5, 2020

Ostensibly, the new product represents Microsoft’s transition to a newer, more cloud-friendly ERP for midsized enterprises. BYOD stands for “bring your own database,” and it refers to a model in which core ERP data are replicated to a separate standalone database used exclusively for reporting. Option 3: Azure Data Lakes.

Data Lake

Data Lake OLAP Data Warehouse Unstructured Data

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on. foundation model (FM) in Amazon Bedrock as the LLM. The answer is yes.

Metadata

Metadata Data Lake Modeling Data Warehouse

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. But first, let’s define the data mesh design pattern. The past decades of enterprise data platform architectures can be summarized in 69 words.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Amazon Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools

AWS Big Data

FEBRUARY 26, 2025

This evaluation, we feel, critically examines vendors' capabilities to address key service needs, including data engineering, operational data integration, modern data architecture delivery, and enabling less-technical data integration across various deployment models.

Data Integration

Data Integration Data Lake Data Warehouse Unstructured Data

Implementing a Pharma Data Mesh using DataOps

DataKitchen

AUGUST 19, 2021

Data mesh and DataOps provide the organization, enterprise architecture, and workflow automation that together enable a relatively small data team to address the analytics needs of hundreds of active business users. Figure 1: Data requirements for phases of the drug product lifecycle. The new Recipes run, and BOOM!

Data Warehouse

Data Warehouse Data Lake Manufacturing Testing

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune and dbt

AWS Big Data

DECEMBER 12, 2024

One-time and complex queries are two common scenarios in enterprise data analytics. Complex queries, on the other hand, refer to large-scale data processing and in-depth analysis based on petabyte-level data warehouses in massive data scenarios. Here, data modeling uses dbt on Amazon Redshift.

Snapshot

Snapshot Recreation/Entertainment Experimentation Data Lake

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

AWS Big Data

OCTOBER 21, 2024

Data is the most significant asset of any organization. However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to address these challenges and foster a data-driven culture.

Sales

Sales Data-driven Data Processing Key Performance Indicator

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. Of those tables, some are larger (such as in terms of record volume) than others, and some are updated more frequently than others.

Data Lake

Data Lake Data Processing Metadata Snapshot

Why enterprise CIOs need to plan for Microsoft gen AI

CIO Business Intelligence

AUGUST 14, 2024

Between building gen AI features into almost every enterprise tool it offers, adding the most popular gen AI developer tool to GitHub (GitHub Copilot is already bigger than GitHub when Microsoft bought it), and running the cloud powering OpenAI, Microsoft has taken a commanding lead in enterprise gen AI. That’s risky.

Enterprise

Enterprise Cost-Benefit Experimentation Modeling

Introducing the technology behind watsonx.ai, IBM’s AI and data platform for enterprise

IBM Big Data Hub

MAY 9, 2023

Data must be laboriously collected, curated, and labeled with task-specific annotations to train AI models. Building a model requires specialized, hard-to-find skills, and each new task requires repeating the process. These large models have lowered the cost and labor involved in automation.

Enterprise

Enterprise Technology Modeling Cost-Benefit

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

erwin

AUGUST 31, 2023

Q: Is data modeling cool again? In today’s fast-paced digital landscape, data reigns supreme. The data-driven enterprise relies on accurate, accessible, and actionable information to make strategic decisions and drive innovation. A: It always was and is getting cooler!!

Data-driven

Data-driven Modeling Enterprise Structured Data

Outdated business apps can cloud your AI vision

CIO Business Intelligence

FEBRUARY 20, 2025

When building a machine-learning-powered tool to predict the maintenance needs of its customers, Ensono found that its customers used multiple old apps to collect incident tickets, but those apps stored incident data in very different formats, with inconsistent types of data collected, he says.

Insurance

Insurance Cost-Benefit Unstructured Data Data Lake

Empower financial analytics by creating structured knowledge bases using Amazon Bedrock and Amazon Redshift

AWS Big Data

MAY 20, 2025

Developers can rapidly implement sophisticated data querying features without complex coding: just connect to the API endpoints and let users explore financial data using plain English. Enable Amazon Bedrock large language model (LLM) access for Amazon Nova Pro. Choose Enable specific models. Choose Test.

Structured Data

Structured Data Data Warehouse Analytics Finance

MLOps and DevOps: Why Data Makes It Different

O'Reilly on Data

OCTOBER 19, 2021

This is both frustrating for companies that would prefer making ML an ordinary, fuss-free value-generating function like software engineering, as well as exciting for vendors who see the opportunity to create buzz around a new category of enterprise software. The new category is often called MLOps. Compute.

IT

IT Testing Experimentation Software

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

I aim to outline pragmatic strategies to elevate data quality into an enterprise-wide capability. Key recommendations include investing in AI-powered cleansing tools and adopting federated governance models that empower domains while ensuring enterprise alignment. The patterns are consistent across industries.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

2021 Gift Giving Guide for Data Nerds

DataKitchen

DECEMBER 7, 2021

In the book, author Zhamak Dehghani reveals that, despite the time, money, and effort poured into them, data warehouses and data lakes fail when applied at the scale and speed of today’s organizations. A distributed data mesh is a better choice. The book will be available from O’Reilly Media.

Data-driven

Data-driven Data Governance Big Data Data Science

The year’s top 10 enterprise AI trends — so far

CIO Business Intelligence

SEPTEMBER 21, 2023

AI is now a board-level priority: Last year, AI consisted of point solutions and niche applications that used ML to predict behaviors, find patterns, and spot anomalies in carefully curated data sets. Today’s foundational models are jacks-of-all-trades. All of PwC’s clients are having this discussion, he says. Gen AI took a few months.

Enterprise

Enterprise Consulting Modeling Cost-Benefit

TransUnion transforms its business model with IT

CIO Business Intelligence

APRIL 26, 2024

Count TransUnion among the rising tide of enterprises evolving their identities thanks to IT. The power of productizing data: TransUnion’s OneTru has been made possible by the company’s migration to AWS, dubbed “Project Rise,” which is slated for completion by year’s end.

Modeling

Modeling IT Machine Learning Data Governance

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Data Lake Optimization

Amazon SageMaker Lakehouse now supports attribute-based access control

AWS Big Data

APRIL 24, 2025

Scalability: ABAC systems are more scalable for larger enterprises because they can handle a large number of users and resources without requiring a large number of roles. Attribute-based access control overview: Previously, within SageMaker Lakehouse, Lake Formation granted access to resources based on the identity of a requesting user.

Sales

Sales Data Lake Management Data-driven

Generative AI: 5 enterprise predictions for AI and security — for 2023, 2024, and beyond

CIO Business Intelligence

OCTOBER 25, 2023

From IT, to finance, marketing, engineering, and more, AI advances are causing enterprises to re-evaluate their traditional approaches to unlock the transformative potential of AI. What can enterprises learn from these trends, and what future enterprise developments can we expect around generative AI?

Enterprise

Enterprise Manufacturing Risk Data-driven

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts.

Data Lake

Data Lake Data Warehouse Marketing Management

Accelerate data science feature engineering on transactional data lakes using Amazon Athena with Apache Iceberg

AWS Big Data

JUNE 20, 2023

It manages large collections of files as tables, and it supports modern analytical data lake operations such as record-level insert, update, delete, and time travel queries. Data labeling is required for various use cases, including forecasting, computer vision, natural language processing, and speech recognition.

Data Lake

Data Lake Data Science Recreation/Entertainment Data-driven

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

Under the federated mesh architecture, each divisional mesh functions as a node within the broader enterprise data mesh, maintaining a degree of autonomy in managing its data products. This model balances node or domain-level autonomy with enterprise-level oversight, creating a scalable and consistent framework across ANZ.

Metadata

Metadata Data Governance Data Quality Data-driven

Einstein Studio 1: What it is and what to expect

CIO Business Intelligence

JULY 31, 2024

This year, however, Salesforce has accelerated its agenda, integrating much of its recent work with large language models (LLMs) and machine learning into a low-code tool called Einstein 1 Studio. Einstein 1 Studio is a set of low-code tools to create, customize, and embed AI models in Salesforce workflows. What is Einstein 1 Studio?

Data Lake

Data Lake IT Sales Experimentation

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.

Data Lake

Data Lake Data Governance Machine Learning Cost-Benefit

Simplify data lake access control for your enterprise users with trusted identity propagation in AWS IAM Identity Center, AWS Lake Formation, and Amazon S3 Access Grants

AWS Big Data

MAY 29, 2024

Many organizations use external identity providers (IdPs) such as Okta or Microsoft Azure Active Directory to manage their enterprise user identities. Later in this post, we also briefly touch upon using CloudTrail Lake to query the data access events. In the following sections, we demonstrate how to build this architecture.

Data Lake

Data Lake Enterprise Management Business Intelligence

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit
