Data Lake, Data Transformation and Management

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

AWS Big Data

NOVEMBER 22, 2024

This integration enables data teams to efficiently transform and manage data using Athena with dbt Cloud’s robust features, enhancing the overall data workflow experience. This enables you to extract insights from your data without the complexity of managing infrastructure.

Data Lake

Data Lake Data Warehouse Cost-Benefit Data Transformation

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone now launched authentication supports through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.

Visualization

Visualization Data Lake Testing Data Governance

Monitor data pipelines in a serverless data lake

AWS Big Data

AUGUST 9, 2023

The combination of a data lake in a serverless paradigm brings significant cost and performance benefits. By monitoring application logs, you can gain insights into job execution, troubleshoot issues promptly to ensure the overall health and reliability of data pipelines.

Data Lake

Data Lake Metrics Testing Cost-Benefit

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Data transformation takes flight at Atlanta’s Hartsfield-Jackson airport

CIO Business Intelligence

AUGUST 9, 2024

The dashboard now in production uses Databricks’ Azure data lake to ingest, clean, store, and analyze the data, and Microsoft’s Power BI to generate graphical analytics that present critical operational data in a single view, such as the number of flights coming into domestic and international terminals and average security wait times.

Data Transformation

Data Transformation Machine Learning Data Lake Dashboards

Texas Rangers data transformation modernizes stadium operations

CIO Business Intelligence

OCTOBER 18, 2022

With the new stadium on the horizon, the team needed to update existing IT systems and manual business and IT processes to handle the massive volumes of new data that would soon be at their fingertips. “In Analytics, Data Management Some of our systems were old. They want that information,” she says.

Data Transformation

Data Transformation Consulting Data Lake Reporting

Bridging the gap between mainframe data and hybrid cloud environments

CIO Business Intelligence

FEBRUARY 27, 2025

Additionally, integrating mainframe data with the cloud enables enterprises to feed information into data lakes and data lake houses, which is ideal for authorized data professionals to easily leverage the best and most modern tools for analytics and forecasting.

Metadata

Metadata Data Lake Cost-Benefit Forecasting

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

AWS Big Data

OCTOBER 23, 2024

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a fully managed service that builds upon Apache Airflow, offering its benefits while eliminating the need for you to set up, operate, and maintain the underlying infrastructure, reducing operational overhead while increasing security and resilience.

Interactive

Interactive Testing Data-driven Data Lake

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on premises, and from third-party sources. In the JDBC parameters dialog box, select Using IDC auth and copy the JDBC URL. The following screenshot shows the dialog box.

Analytics

Analytics Visualization Data Governance Data-driven

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

AUGUST 1, 2023

In the world of software engineering and development, organizations use project management tools like Atlassian Jira Cloud. Managing projects with Jira leads to rich datasets, which can provide historical and predictive insights about project and development efforts. An AWS account and a login with access to the AWS Management Console.

Data Lake

Data Lake Data Transformation Data-driven Cost-Benefit

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

Their terminal operations rely heavily on seamless data flows and the management of vast volumes of data. With the addition of these technologies alongside existing systems like terminal operating systems (TOS) and SAP, the number of data producers has grown substantially. datazone_env_twinsimsilverdata"."cycle_end";')

IoT

IoT Machine Learning Metadata Data-driven

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

AWS Big Data

APRIL 27, 2023

Amazon Athena supports the MERGE command on Apache Iceberg tables, which allows you to perform inserts, updates, and deletes in your data lake at scale using familiar SQL statements that are compliant with ACID (Atomic, Consistent, Isolated, Durable). Navigate to the Athena console and choose Query editor.

Data Lake

Data Lake Snapshot Optimization Data Transformation

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

AWS Big Data

DECEMBER 20, 2024

This means you can refine your ETL jobs through natural follow-up questionsstarting with a basic data pipeline and progressively adding transformations, filters, and business logic through conversation. The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios.

Data Integration

Data Integration Visualization Data Processing Data Lake

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouses (such as Amazon Redshift ) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

MLOps and DevOps: Why Data Makes It Different

O'Reilly on Data

OCTOBER 19, 2021

Since software engineers manage to build ordinary software without experiencing as much pain as their counterparts in the ML department, it begs the question: should we just start treating ML projects as software engineering projects as usual, maybe educating ML practitioners about the existing best practices? Orchestration. Versioning.

IT

IT Testing Experimentation Software

How to modernize data lakes with a data lakehouse architecture

IBM Big Data Hub

JULY 5, 2023

Data Lakes have been around for well over a decade now, supporting the analytic operations of some of the largest world corporations. Such data volumes are not easy to move, migrate or modernize. The challenges of a monolithic data lake architecture Data lakes are, at a high level, single repositories of data at scale.

Data Lake

Data Lake Metadata Cost-Benefit Data Warehouse

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena , Amazon Redshift , Amazon EMR , and so on. Choose Manage model access. Change the AWS Region to US West (Oregon).

Metadata

Metadata Data Lake Modeling Data Warehouse

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

We also examine how centralized, hybrid and decentralized data architectures support scalable, trustworthy ecosystems. As data-centric AI, automated metadata management and privacy-aware data sharing mature, the opportunity to embed data quality into the enterprises core has never been more significant.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Big Data

JANUARY 6, 2025

In this post, we show you how to establish the data ingestion pipeline between Google Analytics 4, Google Sheets, and an Amazon Redshift Serverless workgroup. With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you chooseon a schedule, in response to a business event, or on demand.

Analytics

Analytics Data Warehouse Big Data Metrics

7 key Microsoft Azure analytics services (plus one extra)

CIO Business Intelligence

JUNE 29, 2022

Analytics is the means for discovering those insights, and doing it well requires the right tools for ingesting and preparing data, enriching and tagging it, building and sharing reports, and managing and protecting your data and insights. Azure Data Factory. Azure Data Lake Analytics.

Data Lake

Data Lake Analytics Data Warehouse Machine Learning

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

The AWS Glue Data Catalog provides a uniform repository where disparate systems can store and find metadata to keep track of data in data silos. Apache Flink is a widely used data processing engine for scalable streaming ETL, analytics, and event-driven applications. Apache Hudi also has its own catalog management.

Data Lake

Data Lake Metadata Business Analysis Data-driven

Reference guide to build inventory management and forecasting solutions on AWS

AWS Big Data

APRIL 11, 2023

Inventory management is a critical function for any business that deals with physical products. The primary challenge businesses face with inventory management is balancing the cost of holding inventory with the need to ensure that products are available when customers demand them.

Forecasting

Forecasting Management IoT Data-driven

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

Recognizing this paradigm shift, ANZ Institutional Division has embarked on a transformative journey to redefine its approach to data management, utilization, and extracting significant business value from data insights. This enables global discoverability and collaboration without centralizing ownership or operations.

Metadata

Metadata Data Governance Data Quality Data-driven

Introducing Amazon Q data integration in AWS Glue

AWS Big Data

APRIL 30, 2024

Amazon Q Developer can now generate complex data integration jobs with multiple sources, destinations, and data transformations. Generated jobs can use a variety of data transformations, including filter, project, union, join, and custom user-supplied SQL. Matt Su is a Senior Product Manager on the AWS Glue team.

Data Integration

Data Integration Data Lake Data Warehouse Software

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In collaboration with AWS, BMS identified a business need to migrate and modernize their custom extract, transform, and load (ETL) platform to a native AWS solution to reduce complexities, resources, and investment to upgrade when new Spark, Python, or AWS Glue versions are released.

Metadata

Metadata Data Lake Visualization Data Quality

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

For each service, you need to learn the supported authorization and authentication methods, data access APIs, and framework to onboard and test data sources. This fragmented, repetitive, and error-prone experience for data connectivity is a significant obstacle to data integration, analysis, and machine learning (ML) initiatives.

Visualization

Visualization Data Processing Testing Publishing

How the BMW Group analyses semiconductor demand with AWS Glue

AWS Big Data

APRIL 26, 2023

Additionally, this forecasting system needs to provide data enrichment steps including byproducts, serve as the master data around the semiconductor management, and enable further use cases at the BMW Group. To enable this use case, we used the BMW Group’s cloud-native data platform called the Cloud Data Hub.

Forecasting

Forecasting Manufacturing Data Lake Big Data

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.

Data Lake

Data Lake Snapshot Big Data Data-driven

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

DataKitchen

JULY 27, 2023

Azure Databricks Delta Live Table s: These provide a more straightforward way to build and manage Data Pipelines for the latest, high-quality data in Delta Lake. It provides data prep, management, and enterprise data warehousing tools. It has a data pipeline tool , as well. It does the job.

Machine Learning

Machine Learning Cost-Benefit Data Transformation Testing

Migrate Amazon Redshift from DC2 to RA3 to accommodate increasing data volumes and analytics demands

AWS Big Data

AUGUST 9, 2024

This trend is no exception for Dafiti , an ecommerce company that recognizes the importance of using data to drive strategic decision-making processes. Amazon Redshift is widely used for Dafiti’s data analytics, supporting approximately 100,000 daily queries from over 400 users across three countries. We started with 115 dc2.large

Data Lake

Data Lake Analytics Data Warehouse Data-driven

Straumann Group is transforming dentistry with data, AI

CIO Business Intelligence

FEBRUARY 16, 2023

My vision is that I can give the keys to my businesses to manage their data and run their data on their own, as opposed to the Data & Tech team being at the center and helping them out,” says Iyengar, director of Data & Tech at Straumann Group North America.

Unstructured Data

Unstructured Data Data Lake Prescriptive Analytics Data Warehouse

Lay the groundwork now for advanced analytics and AI

CIO Business Intelligence

AUGUST 3, 2023

It also helps him democratize credit union data so it can be used to improve customer service, automate the maintenance of such data by making various types of data easier to find, and provide chains of custody and audit controls to help meet regulatory needs.

Analytics

Analytics Data Lake Metadata Cost-Benefit

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

AWS Big Data

NOVEMBER 13, 2023

Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. These upstream data sources constitute the data producer components.

Data Warehouse

Data Warehouse Analytics Data Lake Data Science

Scale your AWS Glue for Apache Spark jobs with new larger worker types G.4X and G.8X

AWS Big Data

MAY 9, 2023

AWS Glue manages running Spark and adjusts workers to achieve the best price performance. For workloads such as data transforms, joins, and queries, you can use G.1X Example: Memory-intensive transformations Data transformations are an essential step to preprocess and structure your data into an optimal form.

Data Lake

Data Lake Cost-Benefit Data Integration Data Transformation

Accelerate Your Data Mesh in the Cloud with Cloudera Data Engineering and Modak NabuTM

Cloudera

OCTOBER 11, 2021

Modak Nabu reliably curates datasets for any line of business and personas, from business analysts to data scientists. Customers using Modak Nabu with CDP today have deployed Data Lakes and. Figure 1: CDE containerized service for operational management of spark workloads. Integrated security model .

Data Lake

Data Lake Cost-Benefit Data-driven Dashboards

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To speed up the self-service analytics and foster innovation based on data, a solution was needed to provide ways to allow any team to create data products on their own in a decentralized manner. To create and manage the data products, smava uses Amazon Redshift , a cloud data warehouse.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Connecting the Data Lifecycle

Cloudera

NOVEMBER 29, 2021

Data transforms businesses. That’s where the data lifecycle comes into play. Managing data and its flow, from the edge to the cloud, is one of the most important tasks in the process of gaining data intelligence. . The company needed a modern data architecture to manage the growing traffic effectively. .

Data Lake

Data Lake Data Warehouse Data Architecture Reporting

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.

Data Lake

Data Lake Data Warehouse Data Architecture Metadata

Accelerate analytics on Amazon OpenSearch Service with AWS Glue through its native connector

AWS Big Data

DECEMBER 21, 2023

As the volume and complexity of analytics workloads continue to grow, customers are looking for more efficient and cost-effective ways to ingest and analyse data. Solution overview The new native OpenSearch Service connector is a powerful tool that can help organizations unlock the full potential of their data.

Analytics

Analytics IT Data Lake Visualization

How to use foundation models and trusted governance to manage AI workflow risk

IBM Big Data Hub

OCTOBER 16, 2023

AI governance refers to the practice of directing, managing and monitoring an organization’s AI activities. It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. It can be used with both on-premise and multi-cloud environments.

Risk

Risk Modeling Management Metadata

At AstraZeneca, data and AI are more than game changers – they are life changers

CIO Business Intelligence

OCTOBER 11, 2022

This initiative alone has generated an explosion in the quantity and complexity of data the company collects, stores, and analyzes for insights. . “We Four ways to improve data-driven business transformation . Gurinder Kaur, Vice President of Operations IT, AstraZeneca AstraZeneca. Start small, think big, and scale fast. “You

Machine Learning

Machine Learning Data Science Data-driven Testing

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

It does this by helping teams handle the T in ETL (extract, transform, and load) processes. It allows users to write data transformation code, run it, and test the output, all within the framework it provides. This separation further simplifies data management and enhances the system’s overall performance.

Data Warehouse

Data Warehouse Testing Data Quality Reporting

Amazon Redshift data ingestion options

AWS Big Data

SEPTEMBER 5, 2024

Amazon Redshift , a warehousing service, offers a variety of options for ingesting data from diverse sources into its high-performance, scalable environment. The Amazon Redshift integration for Apache Spark combined with AWS Glue or Amazon EMR performs transformations before loading data into Amazon Redshift. AWS Glue 4.0

IoT

IoT Data Warehouse Cost-Benefit Reporting

Happy Birthday, CDP Public Cloud

Cloudera

OCTOBER 13, 2020

CDP Data Hub: a VM/Instance-based service that allows IT and developers to build custom business applications for a diverse set of use cases with secure, self-service access to enterprise data. . Enrich – Data Engineering (Apache Spark and Apache Hive). Predict – Data Engineering (Apache Spark). This is Now.

Data Warehouse

Data Warehouse Machine Learning Visualization Data Lake

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

Webinars

Trending Sources

Monitor data pipelines in a serverless data lake

Webinars

Data transformation takes flight at Atlanta’s Hartsfield-Jackson airport

Texas Rangers data transformation modernizes stadium operations

Bridging the gap between mainframe data and hybrid cloud environments

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

How EUROGATE established a data mesh architecture using Amazon DataZone

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

MLOps and DevOps: Why Data Makes It Different

How to modernize data lakes with a data lakehouse architecture

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

Data’s dark secret: Why poor quality cripples AI and growth

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

7 key Microsoft Azure analytics services (plus one extra)

Build a data lake with Apache Flink on Amazon EMR

Reference guide to build inventory management and forecasting solutions on AWS

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

Introducing Amazon Q data integration in AWS Glue

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

How the BMW Group analyses semiconductor demand with AWS Glue

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

Migrate Amazon Redshift from DC2 to RA3 to accommodate increasing data volumes and analytics demands

Straumann Group is transforming dentistry with data, AI

Lay the groundwork now for advanced analytics and AI

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

Scale your AWS Glue for Apache Spark jobs with new larger worker types G.4X and G.8X

Accelerate Your Data Mesh in the Cloud with Cloudera Data Engineering and Modak NabuTM

How smava makes loans transparent and affordable using Amazon Redshift Serverless

Connecting the Data Lifecycle

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Accelerate analytics on Amazon OpenSearch Service with AWS Glue through its native connector

How to use foundation models and trusted governance to manage AI workflow risk

At AstraZeneca, data and AI are more than game changers – they are life changers

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

Amazon Redshift data ingestion options

Happy Birthday, CDP Public Cloud

Stay Connected