This integration enables data teams to efficiently transform and manage data using Athena with dbt Cloud’s robust features, enhancing the overall data workflow experience. You can extract insights from your data without the complexity of managing infrastructure.
The combination of a data lake and a serverless paradigm brings significant cost and performance benefits. By monitoring application logs, you can gain insights into job execution and troubleshoot issues promptly, ensuring the overall health and reliability of data pipelines.
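As a small illustration of that monitoring step, a pipeline's CloudWatch log group can be polled for errors with a few lines of boto3. This is a minimal sketch; the log group name below is a placeholder, not one from the original post:

```python
import boto3

logs = boto3.client("logs")

# Log group name is a placeholder for whatever your jobs write to
response = logs.filter_log_events(
    logGroupName="/aws-glue/jobs/output",
    filterPattern="ERROR",
    limit=50,
)

# Surface recent error messages so failing jobs can be triaged quickly
for event in response["events"]:
    print(event["timestamp"], event["message"].strip())
```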
The dashboard now in production uses Databricks’ Azure data lake to ingest, clean, store, and analyze the data, and Microsoft’s Power BI to generate graphical analytics that present critical operational data in a single view, such as the number of flights coming into domestic and international terminals and average security wait times.
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a fully managed service that builds upon Apache Airflow, offering its benefits while eliminating the need for you to set up, operate, and maintain the underlying infrastructure, reducing operational overhead while increasing security and resilience.
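To make this concrete, here is a minimal Airflow DAG of the kind you would upload to the MWAA DAGs folder in S3; the DAG ID and task body are illustrative, not taken from the service docs:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder task body; a real pipeline would pull from a source system
    print("extracting...")

with DAG(
    dag_id="example_mwaa_pipeline",  # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="extract", python_callable=extract)
```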
With the new stadium on the horizon, the team needed to update existing IT systems and manual business and IT processes to handle the massive volumes of new data that would soon be at their fingertips. “Some of our systems were old. They want that information,” she says.
Additionally, integrating mainframe data with the cloud enables enterprises to feed information into data lakes and data lakehouses, making it easy for authorized data professionals to leverage the best and most modern tools for analytics and forecasting.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
Their terminal operations rely heavily on seamless data flows and the management of vast volumes of data. With the addition of these technologies alongside existing systems like terminal operating systems (TOS) and SAP, the number of data producers has grown substantially.
Amazon Athena supports the MERGE command on Apache Iceberg tables, which allows you to perform inserts, updates, and deletes in your data lake at scale using familiar SQL statements that are compliant with ACID (Atomic, Consistent, Isolated, Durable). Navigate to the Athena console and choose Query editor.
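Beyond the console, the same MERGE can be submitted programmatically with boto3; in this sketch the table, database, and bucket names are placeholders:

```python
import boto3

athena = boto3.client("athena")

# Upsert staged rows into an Iceberg table; every identifier here is illustrative
merge_sql = """
MERGE INTO orders_iceberg AS t
USING orders_staging AS s
  ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET status = s.status
WHEN NOT MATCHED THEN INSERT (order_id, status) VALUES (s.order_id, s.status)
"""

athena.start_query_execution(
    QueryString=merge_sql,
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
```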
This means you can refine your ETL jobs through natural follow-up questions, starting with a basic data pipeline and progressively adding transformations, filters, and business logic through conversation. The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios.
Since software engineers manage to build ordinary software without experiencing as much pain as their counterparts in the ML department, it raises the question: should we just treat ML projects as ordinary software engineering projects, perhaps educating ML practitioners about the existing best practices? Orchestration. Versioning.
Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on. Choose Manage model access. Change the AWS Region to US West (Oregon).
Inventory management is a critical function for any business that deals with physical products. The primary challenge businesses face with inventory management is balancing the cost of holding inventory with the need to ensure that products are available when customers demand them.
In this post, we show you how to establish the data ingestion pipeline between Google Analytics 4, Google Sheets, and an Amazon Redshift Serverless workgroup. With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand.
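As a small illustration of the on-demand trigger, an existing AppFlow flow can be started from boto3; the flow name below is a placeholder for one you would create in the AppFlow console:

```python
import boto3

appflow = boto3.client("appflow")

# Start an on-demand run of a flow defined earlier in the AppFlow console
response = appflow.start_flow(flowName="ga4-to-redshift-serverless")
print(response["flowStatus"], response["executionId"])
```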
Amazon Q Developer can now generate complex data integration jobs with multiple sources, destinations, and data transformations. Generated jobs can use a variety of data transformations, including filter, project, union, join, and custom user-supplied SQL. Matt Su is a Senior Product Manager on the AWS Glue team.
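For a sense of what such a job can look like, here is a hand-written sketch of an AWS Glue script applying a filter followed by a join on DynamicFrames; it only runs inside a Glue job environment, and the database, table, and column names are assumptions:

```python
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import Filter, Join

# GlueContext wraps the SparkContext that the Glue job runtime provides
glue_ctx = GlueContext(SparkContext.getOrCreate())

# Load two catalog tables as DynamicFrames (database/table names are placeholders)
orders = glue_ctx.create_dynamic_frame.from_catalog(database="sales", table_name="orders")
customers = glue_ctx.create_dynamic_frame.from_catalog(database="sales", table_name="customers")

# Filter transformation: keep only recent orders
recent = Filter.apply(frame=orders, f=lambda row: row["year"] >= 2023)

# Join transformation: attach customer attributes to each order
joined = Join.apply(recent, customers, "customer_id", "customer_id")
```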
In collaboration with AWS, BMS identified a business need to migrate and modernize their custom extract, transform, and load (ETL) platform to a native AWS solution to reduce complexities, resources, and investment to upgrade when new Spark, Python, or AWS Glue versions are released.
Analytics is the means for discovering those insights, and doing it well requires the right tools for ingesting and preparing data, enriching and tagging it, building and sharing reports, and managing and protecting your data and insights. Azure Data Factory. Azure Data Lake Analytics.
For each service, you need to learn the supported authorization and authentication methods, data access APIs, and frameworks needed to onboard and test data sources. This fragmented, repetitive, and error-prone experience for data connectivity is a significant obstacle to data integration, analysis, and machine learning (ML) initiatives.
Additionally, this forecasting system needs to provide data enrichment steps, including byproducts, serve as the master data for semiconductor management, and enable further use cases at the BMW Group. To enable this use case, we used the BMW Group’s cloud-native data platform called the Cloud Data Hub.
With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta Lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.
This trend is no exception for Dafiti , an ecommerce company that recognizes the importance of using data to drive strategic decision-making processes. Amazon Redshift is widely used for Dafiti’s data analytics, supporting approximately 100,000 daily queries from over 400 users across three countries. We started with 115 dc2.large
Azure Databricks Delta Live Tables: These provide a more straightforward way to build and manage data pipelines for the latest, high-quality data in Delta Lake. It provides data prep, management, and enterprise data warehousing tools. It has a data pipeline tool, as well. It does the job.
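A minimal sketch of a two-step Delta Live Tables pipeline in Python is below; it runs only inside a Databricks DLT pipeline (which supplies the spark session and the dlt module), and the source path and table names are illustrative:

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw events loaded from cloud storage")
def raw_events():
    # `spark` is injected by the DLT runtime; the path is a placeholder
    return spark.read.format("json").load("s3://my-bucket/raw-events/")

@dlt.table(comment="Events with a basic quality filter applied")
def clean_events():
    # dlt.read() wires the dependency between the two tables
    return dlt.read("raw_events").where(col("event_id").isNotNull())
```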
“My vision is that I can give the keys to my businesses to manage their data and run their data on their own, as opposed to the Data & Tech team being at the center and helping them out,” says Iyengar, director of Data & Tech at Straumann Group North America.
Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. These upstream data sources constitute the data producer components.
AWS Glue manages running Spark and adjusts workers to achieve the best price performance. For workloads such as data transforms, joins, and queries, you can use G.1X workers. Example: memory-intensive transformations. Data transformations are an essential step to preprocess and structure your data into an optimal form.
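The worker type is set when the job is defined; here is a hedged boto3 sketch, where the job name, IAM role, and script location are placeholders:

```python
import boto3

glue = boto3.client("glue")

# Job name, role ARN, and script location are placeholders
glue.create_job(
    Name="memory-intensive-transform",
    Role="arn:aws:iam::123456789012:role/GlueJobRole",
    Command={"Name": "glueetl", "ScriptLocation": "s3://my-bucket/scripts/transform.py"},
    GlueVersion="4.0",
    WorkerType="G.1X",  # 1 DPU per worker; G.2X suits heavier memory needs
    NumberOfWorkers=10,
)
```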
Data transforms businesses. That’s where the data lifecycle comes into play. Managing data and its flow, from the edge to the cloud, is one of the most important tasks in the process of gaining data intelligence. The company needed a modern data architecture to manage the growing traffic effectively.
These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.
As the volume and complexity of analytics workloads continue to grow, customers are looking for more efficient and cost-effective ways to ingest and analyse data. Solution overview The new native OpenSearch Service connector is a powerful tool that can help organizations unlock the full potential of their data.
It does this by helping teams handle the T in ETL (extract, transform, and load) processes. It allows users to write data transformation code, run it, and test the output, all within the framework it provides. This separation further simplifies data management and enhances the system’s overall performance.
AI governance refers to the practice of directing, managing and monitoring an organization’s AI activities. It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. It can be used with both on-premises and multi-cloud environments.
Amazon Redshift, a warehousing service, offers a variety of options for ingesting data from diverse sources into its high-performance, scalable environment. The Amazon Redshift integration for Apache Spark combined with AWS Glue or Amazon EMR performs transformations before loading data into Amazon Redshift. AWS Glue 4.0
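A rough sketch of that pattern with the Spark connector is below; it assumes a Glue 4.0 or EMR Spark session where the connector is preinstalled, and the JDBC URL, IAM role, bucket, and table names are all placeholders:

```python
# Assumes an existing SparkSession named `spark` (Glue and EMR provide one)
CONNECTOR = "io.github.spark_redshift_community.spark.redshift"
REDSHIFT_OPTS = {
    "url": "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev",
    "tempdir": "s3://my-bucket/redshift-temp/",
    "aws_iam_role": "arn:aws:iam::123456789012:role/RedshiftCopyRole",
}

# Read a table into Spark, transform it, and write the result back
df = (
    spark.read.format(CONNECTOR)
    .options(**REDSHIFT_OPTS)
    .option("dbtable", "public.sales")
    .load()
)
cleaned = df.filter("amount > 0")
(
    cleaned.write.format(CONNECTOR)
    .options(**REDSHIFT_OPTS)
    .option("dbtable", "public.sales_clean")
    .mode("append")
    .save()
)
```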
AWS Glue is a serverless data integration service that helps analytics users to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. The SFTP connector is used to manage the connection to the SFTP server. Create the gateway endpoint.
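An S3 gateway endpoint of the kind called for here can be created with boto3; in this sketch the VPC ID, route table ID, and the Region embedded in the service name are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# A gateway endpoint keeps S3 traffic on the AWS network; IDs are placeholders
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",  # match your Region
    RouteTableIds=["rtb-0123456789abcdef0"],
)
```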
CDP Data Hub: a VM/Instance-based service that allows IT and developers to build custom business applications for a diverse set of use cases with secure, self-service access to enterprise data. . Enrich – Data Engineering (Apache Spark and Apache Hive). Predict – Data Engineering (Apache Spark). This is Now.
The AWS Glue job uses the secure connection established by the VPC endpoints to access Snowflake data. Snowflake credentials are securely stored in AWS Secrets Manager. The AWS Glue job retrieves these credentials at runtime to authenticate and connect to Snowflake, providing secure access management.
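The credential retrieval step might look like the following boto3 sketch; the secret name and the key layout inside the secret are assumptions, not the post's actual values:

```python
import json
import boto3

# Secret name and key layout are assumptions for illustration
secrets = boto3.client("secretsmanager")
payload = secrets.get_secret_value(SecretId="snowflake/etl-user")
creds = json.loads(payload["SecretString"])

sf_options = {
    "sfURL": creds["url"],
    "sfUser": creds["user"],
    "sfPassword": creds["password"],
}
# sf_options can then be passed to the Spark Snowflake connector's .options()
```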
This is an important milestone in Cloudera’s history, as we move beyond big data and “self-managed” services. These acquisitions usher in a new era of “ self-service ” by automating complex operations so customers can focus on building great data-driven apps instead of managing infrastructure.
Using these adapters, Cloudera customers can use dbt to collaborate, test, deploy, and document their data transformation and analytic pipelines on CDP Public Cloud, CDP One, and CDP Private Cloud. The Open Data Lakehouse. This variety can result in a lack of standardization, leading to data duplication and inconsistency.
Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more, all while providing up to 7.9x
Large-scale data warehouse migration to the cloud is a complex and challenging endeavor that many organizations undertake to modernize their data infrastructure, enhance data management capabilities, and unlock new business opportunities.
This allows data analysts and data scientists to rapidly construct the necessary data preparation steps to meet their business needs. Prerequisites For this tutorial, you need an S3 bucket to store output from the AWS Glue ETL job and Athena queries, and a Data Catalog database to create new tables. Choose Create role.
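Both prerequisites can also be created programmatically; a minimal boto3 sketch, with placeholder bucket and database names, follows:

```python
import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")

# Names are placeholders; create_bucket as written assumes the us-east-1 Region
# (other Regions require a CreateBucketConfiguration argument)
s3.create_bucket(Bucket="my-glue-tutorial-output")
glue.create_database(DatabaseInput={"Name": "my_tutorial_db"})
```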
Building data lakes from continuously changing transactional data of databases and keeping data lakes up to date is a complex task and can be an operational challenge. You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes.
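A common shape for that upsert-and-delete step is a Delta Lake MERGE from PySpark; this is a sketch assuming a Spark session with Delta enabled, an updates_df DataFrame of change records carrying an op column, and a placeholder table path:

```python
from delta.tables import DeltaTable

# Assumes an active SparkSession `spark` with Delta enabled, plus a DataFrame
# `updates_df` of change records; the table path is a placeholder
target = DeltaTable.forPath(spark, "s3://my-bucket/delta/customers/")

(
    target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedDelete(condition="s.op = 'D'")  # deletes from the source DB
    .whenMatchedUpdateAll()                     # updates
    .whenNotMatchedInsertAll()                  # inserts
    .execute()
)
```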
AWS Glue eliminates complexities and costs, allowing organizations to perform data integration tasks in minutes, boosting efficiency. This blog post explores the newly announced managed connector for Google BigQuery and demonstrates how to build a modern ETL pipeline with AWS Glue Studio without writing code.
In this post, we dive deep into the tool, walking through all steps from log ingestion, transformation, visualization, and architecture design to calculate TCO. The tool provides a YARN log collector to connect Hadoop Resource Manager to collect YARN logs. About the authors Sungyoul Park is a Senior Practice Manager at AWS ProServe.
In another decade, the internet and mobile started to generate data of unforeseen volume, variety, and velocity. It required a different data platform solution. Hence, the data lake emerged, which handles unstructured and structured data with huge volume. Data lakehouse: a mostly new platform.
DataOps involves close collaboration between data scientists, IT professionals, and business stakeholders, and it often involves the use of automation and other technologies to streamline data-related tasks. One of the key benefits of DataOps is the ability to accelerate the development and deployment of data-driven solutions.