Data Lake, Data Transformation and Information

Bridging the gap between mainframe data and hybrid cloud environments

CIO Business Intelligence

FEBRUARY 27, 2025

According to a study from Rocket Software and Foundry , 76% of IT decision-makers say challenges around accessing mainframe data and contextual metadata are a barrier to mainframe data usage, while 64% view integrating mainframe data with cloud data sources as the primary challenge.

Metadata

Metadata Data Lake Cost-Benefit Forecasting

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

AWS Big Data

APRIL 27, 2023

Amazon Athena supports the MERGE command on Apache Iceberg tables, which allows you to perform inserts, updates, and deletes in your data lake at scale using familiar SQL statements that are compliant with ACID (Atomic, Consistent, Isolated, Durable).

Data Lake

Data Lake Snapshot Optimization Data Transformation

Texas Rangers data transformation modernizes stadium operations

CIO Business Intelligence

OCTOBER 18, 2022

Resultant recommended a new, on-prem data infrastructure, complete with data lakes to provide stake holders with a better way to manage data reliability, accuracy, and timeliness. I think you have to toot your own horn that, yes, we have this information available.”. They want that information,” she says.

Data Transformation

Data Transformation Consulting Data Lake Reporting

Webinars

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Data transformation takes flight at Atlanta’s Hartsfield-Jackson airport

CIO Business Intelligence

AUGUST 9, 2024

The original proof of concept was to have one data repository ingesting data from 11 sources, including flat files and data stored via APIs on premises and in the cloud, Pruitt says. There are a lot of variables that determine what should go into the data lake and what will probably stay on premise,” Pruitt says.

Data Transformation

Data Transformation Machine Learning Data Lake Dashboards

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. In addition to real-time analytics and visualization, the data needs to be shared for long-term data analytics and machine learning applications.

IoT

IoT Machine Learning Metadata Data-driven

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

AWS Big Data

OCTOBER 30, 2024

With this integration, you can now seamlessly query your governed data lake assets in Amazon DataZone using popular business intelligence (BI) and analytics tools, including partner solutions like Tableau. Joel has led data transformation projects on fraud analytics, claims automation, and Master Data Management.

Analytics

Analytics Visualization Data Governance Data-driven

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

For example, the Flink FileSystem connector has FileSystemTableFactory to read/write data in Hadoop Distributed File System (HDFS) or Amazon Simple Storage Service (Amazon S3), the Flink HBase connector has HBase2DynamicTableFactory to read/write data in HBase, and the Flink Kafka connector has KafkaDynamicTableFactory to read/write data in Kafka.

Data Lake

Data Lake Metadata Business Analysis Data-driven

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena , Amazon Redshift , Amazon EMR , and so on. Can it also help write SQL queries? The answer is yes.

Metadata

Metadata Data Lake Modeling Data Warehouse

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

AWS Big Data

DECEMBER 20, 2024

The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios. Your generated jobs can use a variety of data transformations, including filters, projections, unions, joins, and aggregations, giving you the flexibility to handle complex data processing requirements.

Data Integration

Data Integration Visualization Data Processing Big Data

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

AWS Big Data

OCTOBER 23, 2024

Let’s expand the use case to run your data pipeline and perform extract, transform, and load (ETL) jobs when a new file lands in an Amazon Simple Storage Service (Amazon S3) bucket in your data lake. The following diagram illustrates one architectural approach.

Interactive

Interactive Testing Data-driven Data Lake

How the BMW Group analyses semiconductor demand with AWS Glue

AWS Big Data

APRIL 26, 2023

The manufacturers need to know BMW Group’s exact current and future semiconductor volume information, which will effectively help steer the available worldwide supply. To enable this use case, we used the BMW Group’s cloud-native data platform called the Cloud Data Hub.

Forecasting

Forecasting Manufacturing Data Lake Big Data

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Big Data

JANUARY 6, 2025

Amazon AppFlow bridges the gap between Google applications and Amazon Redshift, empowering organizations to unlock deeper insights and drive data-informed decisions. In this post, we show you how to establish the data ingestion pipeline between Google Analytics 4, Google Sheets, and an Amazon Redshift Serverless workgroup.

Analytics

Analytics Data Warehouse Big Data Metrics

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Transformation

MLOps and DevOps: Why Data Makes It Different

O'Reilly on Data

OCTOBER 19, 2021

ML use cases rarely dictate the master data management solution, so the ML stack needs to integrate with existing data warehouses. You want to carefully manage consistency of data between training and predictions, as well as make sure that there’s no leakage of information when models are being trained and tested with historical data.

IT

IT Testing Experimentation Software

Navigating the Chaos of Unruly Data: Solutions for Data Teams

DataKitchen

NOVEMBER 10, 2023

The Perilous State of Today’s Data Environments Data teams often navigate a labyrinth of chaos within their databases. Extrinsic Control Deficit: Many of these changes stem from tools and processes beyond the immediate control of the data team.

Data Quality

Data Quality Testing Data Lake Data Integration

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.

Data Lake

Data Lake Snapshot Big Data Data-driven

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouses (such as Amazon Redshift ) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

As data-centric AI, automated metadata management and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprises core has never been more significant. Data lives across siloed systems ERP, CRM, cloud platforms, spreadsheets with little integration or consistency.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

Migrate Amazon Redshift from DC2 to RA3 to accommodate increasing data volumes and analytics demands

AWS Big Data

AUGUST 9, 2024

As businesses strive to make informed decisions, the amount of data being generated and required for analysis is growing exponentially. This trend is no exception for Dafiti , an ecommerce company that recognizes the importance of using data to drive strategic decision-making processes. We started with 115 dc2.large

Data Lake

Data Lake Analytics Data Warehouse Data-driven

Straumann Group is transforming dentistry with data, AI

CIO Business Intelligence

FEBRUARY 16, 2023

“Digitizing was our first stake at the table in our data journey,” he says. That step, primarily undertaken by developers and data architects, established data governance and data integration. That step, primarily undertaken by developers and data architects, established data governance and data integration.

Unstructured Data

Unstructured Data Data Lake Prescriptive Analytics Data Warehouse

Lay the groundwork now for advanced analytics and AI

CIO Business Intelligence

AUGUST 3, 2023

But before consolidating the required data, Lenovo had to overcome concerns around sharing potentially sensitive information. Hoogar’s staff helped relieve such fears by educating employees that information included in the solution, such as notices of bug fixes or software updates, was already public.

Analytics

Analytics Data Lake Metadata Cost-Benefit

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

DataKitchen

JULY 27, 2023

Here are a few examples that we have seen of how this can be done: Batch ETL with Azure Data Factory and Azure Databricks: In this pattern, Azure Data Factory is used to orchestrate and schedule batch ETL processes. Azure Blob Storage serves as the data lake to store raw data. Azure Machine Learning). So go ahead.

Machine Learning

Machine Learning Cost-Benefit Data Transformation Testing

At AstraZeneca, data and AI are more than game changers – they are life changers

CIO Business Intelligence

OCTOBER 11, 2022

Unifying its data within a centralized architecture allows AstraZeneca’s researchers to easily tag, search, share, transform, analyze, and govern petabytes of information at a scale unthinkable a decade ago. . This allows for engineers and data scientists to go from idea to insight quickly, delivering meaningful impact.

Machine Learning

Machine Learning Data Science Data-driven Testing

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

AWS Big Data

NOVEMBER 13, 2023

Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. Additionally, data is extracted from vendor APIs that includes data related to product, marketing, and customer experience.

Data Warehouse

Data Warehouse Analytics Data Lake Data Science

Accelerate Your Data Mesh in the Cloud with Cloudera Data Engineering and Modak NabuTM

Cloudera

OCTOBER 11, 2021

Modak Nabu reliably curates datasets for any line of business and personas, from business analysts to data scientists. Customers using Modak Nabu with CDP today have deployed Data Lakes and. An executive dashboard shows the status of crawlers, pipelines, and data profiling. Modak Nabu TM and CDE’s Spark-on-Kubernetes.

Data Lake

Data Lake Cost-Benefit Data-driven Dashboards

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To bring their customers the best deals and user experience, smava follows the modern data architecture principles with a data lake as a scalable, durable data store and purpose-built data stores for analytical processing and data consumption.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

In this post, we delve into a case study for a retail use case, exploring how the Data Build Tool (dbt) was used effectively within an AWS environment to build a high-performing, efficient, and modern data platform. It does this by helping teams handle the T in ETL (extract, transform, and load) processes.

Data Warehouse

Data Warehouse Testing Data Quality Reporting

Connecting the Data Lifecycle

Cloudera

NOVEMBER 29, 2021

Data transforms businesses. That’s where the data lifecycle comes into play. Managing data and its flow, from the edge to the cloud, is one of the most important tasks in the process of gaining data intelligence. . The firm also worked on creating a solid pipeline from the data warehouse to the data lake.

Data Lake

Data Lake Data Warehouse Data Architecture Reporting

Reference guide to build inventory management and forecasting solutions on AWS

AWS Big Data

APRIL 11, 2023

By collecting data from store sensors using AWS IoT Core , ingesting it using AWS Lambda to Amazon Aurora Serverless , and transforming it using AWS Glue from a database to an Amazon Simple Storage Service (Amazon S3) data lake, retailers can gain deep insights into their inventory and customer behavior.

Forecasting

Forecasting Management IoT Data-driven

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

These nodes can implement analytical platforms like data lake houses, data warehouses, or data marts, all united by producing data products. Divisions decide how many domains to have within their node; some may have one, others many. Nodes and domains serve business needs and are not technology mandated.

Metadata

Metadata Data Governance Data Quality Data-driven

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

AWS Big Data

AUGUST 19, 2024

The company stores vast amounts of transactional data, customer information, and product catalogs in Snowflake. However, they also generate and collect data from various other sources, such as web logs stored in Amazon S3, social media platforms, and third-party data providers. Choose Save.

Analytics

Analytics Data-driven Data Integration Data Lake

Turning the page

Cloudera

JUNE 1, 2021

After all, we invented the whole idea of Big Data. Well, at Cloudera, we envision a world where everyone can quickly and easily access the data-powered information and insights they need – in just a few clicks. . The mission is to “Make data and analytics easy and accessible, for everyone.” 650-644-3900.

Uncertainty

Uncertainty Cost-Benefit Risk Strategy

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

However, you might face significant challenges when planning for a large-scale data warehouse migration. Discovery of workload and integrations Conducting discovery and assessment for migrating a large on-premises data warehouse to Amazon Redshift is a critical step in the migration process.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

Amazon Redshift data ingestion options

AWS Big Data

SEPTEMBER 5, 2024

Amazon Redshift , a warehousing service, offers a variety of options for ingesting data from diverse sources into its high-performance, scalable environment. If storing operational data in a data warehouse is a requirement, synchronization of tables between operational data stores and Amazon Redshift tables is supported.

IoT

IoT Data Warehouse Cost-Benefit Reporting

Accelerate analytics on Amazon OpenSearch Service with AWS Glue through its native connector

AWS Big Data

DECEMBER 21, 2023

As the volume and complexity of analytics workloads continue to grow, customers are looking for more efficient and cost-effective ways to ingest and analyse data. For more information, see Setting up networking for development for AWS Glue. This enables organizations to streamline data integration and analytics with OpenSearch Service.

Analytics

Analytics IT Data Lake Visualization

Happy Birthday, CDP Public Cloud

Cloudera

OCTOBER 13, 2020

CDP Data Hub: a VM/Instance-based service that allows IT and developers to build custom business applications for a diverse set of use cases with secure, self-service access to enterprise data. . Predict – Data Engineering (Apache Spark). This is Now. New Services.

Data Warehouse

Data Warehouse Machine Learning Visualization Data Lake

Use AWS Glue to streamline SFTP data processing

AWS Big Data

AUGUST 13, 2024

With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog. It enables you to visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. Select Visual ETL in the central pane.

Data Processing

Data Processing Visualization Data Lake Data Processing

Author data integration jobs with an interactive data preparation experience with AWS Glue visual ETL

AWS Big Data

JULY 10, 2024

This allows data analysts and data scientists to rapidly construct the necessary data preparation steps to meet their business needs. In this scenario, you’re a data analyst in this company. This provides useful insights about the data to inform transformation decisions.

Interactive

Interactive Visualization Data Integration Statistics

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Building data lakes from continuously changing transactional data of databases and keeping data lakes up to date is a complex task and can be an operational challenge. You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes. Choose Create stack.

Data Lake

Data Lake Dashboards Metrics Metadata

Deep dive into the AWS ProServe Hadoop Migration Delivery Kit TCO tool

AWS Big Data

FEBRUARY 6, 2023

Then it retrieves the job history information (YARN logs from application managers) by calling the YARN ResourceManager application API. For more information on how to use the YARN log organizer, refer to the yarn-log-organizer GitHub repo. This information helps you understand who submits an application to a queue.

Dashboards

Dashboards Optimization Data Lake Cost-Benefit

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

In another decade, the internet and mobile started the generate data of unforeseen volume, variety and velocity. It required a different data platform solution. Hence, Data Lake emerged, which handles unstructured and structured data with huge volume. Data lakehouse was created to solve these problems.

Data Lake

Data Lake Data Warehouse Data-driven Metadata

Automate alerting and reporting for AWS Glue job resource usage

AWS Big Data

MAY 25, 2023

Data transformation plays a pivotal role in providing the necessary data insights for businesses in any organization, small and large. To gain these insights, customers often perform ETL (extract, transform, and load) jobs from their source systems and output an enriched dataset.

Reporting

Reporting Metrics Optimization Data Lake

Connect your data for faster decisions with AWS

AWS Big Data

NOVEMBER 7, 2023

Second, organizations still need transformations like cleansing, deduplication, and combining datasets for analysis and machine learning (ML). For these, AWS Glue provides fast, scalable data transformation. Prior to his current role, he was VP of Analytics at AWS, where he worked across the entire AWS database portfolio.

Dashboards

Dashboards Data-driven Data Integration Data Lake

Bridging the gap between mainframe data and hybrid cloud environments

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

Webinars

Trending Sources

Texas Rangers data transformation modernizes stadium operations

Webinars

Data transformation takes flight at Atlanta’s Hartsfield-Jackson airport

How EUROGATE established a data mesh architecture using Amazon DataZone

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

Build a data lake with Apache Flink on Amazon EMR

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

How the BMW Group analyses semiconductor demand with AWS Glue

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

MLOps and DevOps: Why Data Makes It Different

Navigating the Chaos of Unruly Data: Solutions for Data Teams

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

Data’s dark secret: Why poor quality cripples AI and growth

Migrate Amazon Redshift from DC2 to RA3 to accommodate increasing data volumes and analytics demands

Straumann Group is transforming dentistry with data, AI

Lay the groundwork now for advanced analytics and AI

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

At AstraZeneca, data and AI are more than game changers – they are life changers

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

Accelerate Your Data Mesh in the Cloud with Cloudera Data Engineering and Modak NabuTM

How smava makes loans transparent and affordable using Amazon Redshift Serverless

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

Connecting the Data Lifecycle

Reference guide to build inventory management and forecasting solutions on AWS

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

Turning the page

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

Amazon Redshift data ingestion options

Accelerate analytics on Amazon OpenSearch Service with AWS Glue through its native connector

Happy Birthday, CDP Public Cloud

Use AWS Glue to streamline SFTP data processing

Author data integration jobs with an interactive data preparation experience with AWS Glue visual ETL

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

Deep dive into the AWS ProServe Hadoop Migration Delivery Kit TCO tool

Data platform trinity: Competitive or complementary?

Automate alerting and reporting for AWS Glue job resource usage

Connect your data for faster decisions with AWS

Stay Connected