Amazon DataZone now supports authentication through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.
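A minimal sketch of that query path in Python, using boto3's Athena API rather than the JDBC driver itself (the driver gives BI tools the same query flow); the database, table, and results bucket below are hypothetical placeholders:

```python
# Hypothetical sketch: run an Athena query against a subscribed data lake asset.
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

query = athena.start_query_execution(
    QueryString="SELECT * FROM subscribed_sales_asset LIMIT 10",        # hypothetical table
    QueryExecutionContext={"Database": "datazone_sub_db"},              # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # hypothetical bucket
)
query_id = query["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if status in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if status == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```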
A high hurdle many enterprises have yet to overcome is accessing mainframe data via the cloud. Giving the mobile workforce access to this data via the cloud allows them to be productive from anywhere, fosters collaboration, and improves overall strategic decision-making. Four key challenges prevent them from doing so.
The combination of a data lake and a serverless paradigm brings significant cost and performance benefits. By monitoring application logs, you can gain insight into job execution and troubleshoot issues promptly, ensuring the overall health and reliability of data pipelines.
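As one way to put that log monitoring into practice, here is a minimal boto3 sketch that scans a pipeline's CloudWatch Logs group for error events; the log group name is a hypothetical placeholder:

```python
# Hypothetical sketch: surface recent ERROR events from a data pipeline's logs.
import boto3

logs = boto3.client("logs", region_name="us-east-1")

resp = logs.filter_log_events(
    logGroupName="/aws/glue/jobs",  # hypothetical log group for the pipeline
    filterPattern="ERROR",          # CloudWatch Logs filter pattern syntax
    limit=20,
)
for event in resp["events"]:
    print(event["timestamp"], event["message"].strip())
```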
The original proof of concept was to have one data repository ingesting data from 11 sources, including flat files and data stored via APIs on premises and in the cloud, Pruitt says. “There are a lot of variables that determine what should go into the data lake and what will probably stay on premises,” Pruitt says.
With this integration, you can now seamlessly query your governed data lake assets in Amazon DataZone using popular business intelligence (BI) and analytics tools, including partner solutions like Tableau.
In this article, I draw on firsthand experience working with CIOs, CDOs, CTOs, and transformation leaders across industries to outline pragmatic strategies for elevating data quality into an enterprise-wide capability. Inflexible schemas are poor for unstructured or real-time data.
Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world's largest corporations. Such data volumes are not easy to move, migrate, or modernize. The challenges of a monolithic data lake architecture: data lakes are, at a high level, single repositories of data at scale.
Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x
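For programmatic access to such a warehouse, a minimal sketch using the Redshift Data API via boto3, which avoids managing JDBC/ODBC connections; the cluster identifier, database, and secret ARN are hypothetical placeholders:

```python
# Hypothetical sketch: submit SQL to Redshift through the Data API.
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

stmt = rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster
    Database="dev",                         # hypothetical database
    SecretArn="arn:aws:secretsmanager:us-east-1:111122223333:secret:redshift-creds",  # hypothetical
    Sql="SELECT COUNT(*) FROM sales",
)
# The call is asynchronous; describe_statement reports progress and errors.
print(rsd.describe_statement(Id=stmt["Id"])["Status"])
```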
Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on. Maintaining lists of possible values for the columns requires continuous updates.
Analytics is the means for discovering those insights, and doing it well requires the right tools for ingesting and preparing data, enriching and tagging it, building and sharing reports, and managing and protecting your data and insights. For many enterprises, Microsoft Azure has become a central hub for analytics.
In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
To build a data-driven business, it is important to democratize enterprise data assets in a data catalog. With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. The Amazon EMR Flink CDC connector reads the binlog data and processes it.
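A minimal sketch of what such a catalog lookup can look like against the AWS Glue Data Catalog, which commonly backs these search experiences; the search text is a placeholder:

```python
# Hypothetical sketch: search a unified catalog for datasets and inspect schema.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

resp = glue.search_tables(SearchText="orders", MaxResults=10)  # placeholder search term
for table in resp["TableList"]:
    sd = table.get("StorageDescriptor", {})
    print(table["DatabaseName"], table["Name"], sd.get("Location", "n/a"))
    for col in sd.get("Columns", []):
        print("   ", col["Name"], col["Type"])  # schema and format discovery
```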
This is both frustrating for companies that would prefer making ML an ordinary, fuss-free, value-generating function like software engineering, and exciting for vendors who see the opportunity to create buzz around a new category of enterprise software. The new category is often called MLOps.
Under the federated mesh architecture, each divisional mesh functions as a node within the broader enterprise data mesh, maintaining a degree of autonomy in managing its data products. This model balances node or domain-level autonomy with enterprise-level oversight, creating a scalable and consistent framework across ANZ.
With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta Lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.
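A minimal sketch of the kind of fine-grained grant FGAC enforces, using the Lake Formation API to allow column-level SELECT; the role ARN, database, table, and column names are hypothetical placeholders:

```python
# Hypothetical sketch: grant column-level SELECT on a governed table.
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/analyst"  # hypothetical role
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "lakehouse_db",  # hypothetical database
            "Name": "customer_orders",       # hypothetical Iceberg/Hudi/Delta table
            "ColumnNames": ["order_id", "order_date", "total"],  # exclude sensitive columns
        }
    },
    Permissions=["SELECT"],
)
```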
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse customers (such as those on Amazon Redshift) who are looking to keep their data transformation logic separate from storage and engine.
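For teams that want to drive dbt from code rather than the CLI, a minimal sketch using dbt's programmatic runner (available in dbt-core 1.5+); the model name is a hypothetical placeholder:

```python
# Hypothetical sketch: run a single dbt model programmatically.
from dbt.cli.main import dbtRunner

runner = dbtRunner()
result = runner.invoke(["run", "--select", "stg_orders"])  # hypothetical model
print("success:", result.success)
```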
Customers can now seamlessly automate migration to Cloudera's hybrid data platform, Cloudera Data Platform (CDP), and dynamically auto-scale cloud services through the Cloudera Data Engineering (CDE) integration with Modak Nabu. Also, enterprises can tap into new technologies like Kubernetes.
It provides data prep, management, and enterprise data warehousing tools. It has a data pipeline tool, as well. Azure Logic Apps: This service helps you schedule, automate, and orchestrate tasks, business processes, and workflows when integrating apps, data, systems, and services across enterprises or organizations.
And at First Commerce Bank, EVP and COO Gregory Garcia hopes to leverage unified, real-time data to monitor risks such as worsening vacancy rates that could make it harder for commercial property owners to pay their mortgages. After moving its expensive, on-premises data lake to the cloud, Comcast created a three-tiered architecture.
With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities such as filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps.
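Triggering the on-demand case from code is a one-liner; a minimal boto3 sketch, assuming a flow (with its filtering and validation tasks) has already been configured, with a hypothetical flow name:

```python
# Hypothetical sketch: kick off a configured AppFlow flow on demand.
import boto3

appflow = boto3.client("appflow", region_name="us-east-1")

resp = appflow.start_flow(flowName="salesforce-to-s3-daily")  # hypothetical flow
print("Execution id:", resp.get("executionId"))
```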
In 2019, the BMW Group decided to re-architect and move its on-premises data lake to the AWS Cloud to enable data-driven innovation while scaling with the dynamic needs of the organization. To learn more about the Cloud Data Hub, refer to BMW Group Uses AWS-Based Data Lake to Unlock the Power of Data.
“Digitizing was our first stake at the table in our data journey,” he says. That step, primarily undertaken by developers and data architects, established data governance and data integration.
These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.
The goal, she explained, is to knock down data silos between those groups, using multiple data lakes supported by strong security and governance, to drive positive impact across the supply chain, manufacturing, and the clinical trials of new drugs. Four ways to improve data-driven business transformation.
On September 24, 2019, Cloudera launched CDP Public Cloud (CDP-PC) as the first step in delivering the industry's first Enterprise Data Cloud. CDP Machine Learning: a Kubernetes-based service that allows data scientists to deploy collaborative workspaces with secure, self-service access to enterprise data.
The 100 projects recognized this year come from a range of industries and implement a wide variety of technologies to solve intractable problems, open up new possibilities, and give enterprises a leg up on their competition. The framework has fostered innovation and collaboration through an enterprise-wide inner source initiative.
It does this by helping teams handle the T in ETL (extract, transform, and load) processes. It allows users to write data transformation code, run it, and test the output, all within the framework it provides. This process has been scheduled to run daily, ensuring a consistent batch of fresh data for analysis.
By collecting data from store sensors using AWS IoT Core , ingesting it using AWS Lambda to Amazon Aurora Serverless , and transforming it using AWS Glue from a database to an Amazon Simple Storage Service (Amazon S3) data lake, retailers can gain deep insights into their inventory and customer behavior.
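A minimal sketch of the Lambda stage in that pipeline: accept a sensor payload delivered by an AWS IoT Core rule and land it in S3 for Glue to transform later; the bucket and key layout are hypothetical, and the Aurora write step is omitted:

```python
# Hypothetical sketch: IoT Core -> Lambda -> S3 landing zone.
import json
import time

import boto3

s3 = boto3.client("s3")
BUCKET = "retail-sensor-landing"  # hypothetical bucket

def lambda_handler(event, context):
    # An IoT Core rule action delivers the device payload as the event itself.
    key = f"raw/{event.get('store_id', 'unknown')}/{int(time.time())}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(event).encode("utf-8"))
    return {"written": key}
```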
The proliferation of data silos also inhibits the unification and enrichment of data, which is essential to unlocking new insights. Moreover, increased regulatory requirements make it harder for enterprises to democratize data access and scale the adoption of analytics and artificial intelligence (AI).
Data platform architecture has an interesting history. Toward the turn of the millennium, enterprises started to realize that the reporting and business intelligence workload required a new solution, distinct from the transactional applications. A read-optimized platform that could integrate data from multiple applications emerged.
It also means we can complete our business transformation with the systems, processes, and people that support a new operating model. And the Enterprise Data Cloud category we invented is also growing. Said simply, Datacoral offers a fully managed service for worry-free data integrations.
With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog. It enables you to visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. Select Visual ETL in the central pane.
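Under the hood, a visually composed job generates a script along these lines; a minimal sketch that reads a cataloged table and writes it to the data lake as Parquet (database, table, and path are hypothetical, and the script runs inside a Glue job, not locally):

```python
# Hypothetical sketch: minimal Glue ETL script, catalog in, S3 Parquet out.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_ctx = GlueContext(SparkContext.getOrCreate())

source = glue_ctx.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"  # hypothetical catalog entries
)
glue_ctx.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://my-data-lake/curated/orders/"},  # hypothetical path
    format="parquet",
)
```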
In legacy analytical systems such as enterprise data warehouses, the scalability challenges of a system were primarily associated with computational scalability, i.e., the ability of a data platform to handle larger volumes of data in an agile and cost-efficient way.
However, you might face significant challenges when planning for a large-scale data warehouse migration. For an example, refer to How JPMorgan Chase built a data mesh architecture to drive significant value to enhance their enterprise data platform. Platform architects define a well-architected platform.
Using AWS Glue transformations is crucial when creating an AWS Glue job because they enable efficient data cleansing, enrichment, and restructuring, making sure the data is in the desired format and quality for downstream processes. Refer to Editing AWS Glue managed data transform nodes for more information.
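A minimal sketch of chaining a few built-in Glue transforms for cleansing and restructuring; the catalog entries and field names are hypothetical:

```python
# Hypothetical sketch: rename fields, drop nulls, and filter rows in Glue.
from awsglue.context import GlueContext
from awsglue.transforms import ApplyMapping, DropNullFields, Filter
from pyspark.context import SparkContext

glue_ctx = GlueContext(SparkContext.getOrCreate())
source = glue_ctx.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"  # hypothetical catalog entries
)

renamed = ApplyMapping.apply(
    frame=source,
    mappings=[  # (source field, source type, target field, target type)
        ("cust_id", "string", "customer_id", "string"),
        ("amt", "double", "amount", "double"),
    ],
)
cleaned = DropNullFields.apply(frame=renamed)
positive = Filter.apply(frame=cleaned, f=lambda row: row["amount"] > 0)
```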
Second, organizations still need transformations like cleansing, deduplication, and combining datasets for analysis and machine learning (ML). For these, AWS Glue provides fast, scalable data transformation.
As organizations become more data-driven, different use cases will always require different types of transformations, putting a heavy load on the centralized teams. For large enterprises, data mesh distributes data ownership and reduces dependencies between services by building data products with domain owners.
Have an AWS account with permission on AWS Lambda, QuickSight (Enterprise edition), and AWS CloudFormation. Install Python 3 on your local machine.
The Right Self-Serve Data Preparation Solution is Sophisticated, Easy-to-Use and Ensures User Adoption! When your enterprise decides to roll out analytics for business users, it is important to implement the right solution.
In other words, instead of training numerous models on labeled, task-specific data, it’s now possible to pre-train one big model built on a transformer and then, with additional fine-tuning, reuse it as needed. They offer an enterprise-ready dataset with trusted data that’s undergone negative and positive curation.
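A minimal sketch of that pre-train-then-fine-tune pattern with the Hugging Face Transformers library, fine-tuning a pretrained BERT checkpoint on a toy two-example classification set (the tiny inline dataset is purely illustrative):

```python
# Hypothetical sketch: reuse a pretrained checkpoint via fine-tuning.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Toy dataset standing in for real labeled, task-specific data.
train_ds = Dataset.from_dict(
    {"text": ["great product", "terrible service"], "label": [1, 0]}
).map(lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length", max_length=32))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=train_ds,
)
trainer.train()  # additional fine-tuning on the new task
```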
The solution: IBM databases on AWS. To address these challenges, IBM’s portfolio of SaaS database solutions on Amazon Web Services (AWS) enables enterprises to scale applications, analytics, and AI across the hybrid cloud landscape. This allows you to scale all analytics and AI workloads across the enterprise with trusted data.
IBM watsonx.ai is our enterprise-ready, next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. With watsonx.ai, businesses can effectively train, validate, tune, and deploy AI models with confidence and at scale across their enterprise.
Organizations have spent a lot of time and money trying to harmonize data across diverse platforms, including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on. Creating a High-Quality Data Pipeline.
Tricentis is the global leader in continuous testing for DevOps, cloud, and enterprise applications. From detailed design to a beta release, Tricentis had customers expecting to consume data from a data lake specific to only their own data, including all of the data that had been generated for over a decade.