Data Lake, Data Transformation and Interactive

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

AWS Big Data

OCTOBER 23, 2024

This improvement streamlines the ability to access and manage your Airflow environments and their integration with external systems, and allows you to interact with your workflows programmatically. Airflow REST API The Airflow REST API is a programmatic interface that allows you to interact with Airflow’s core functionalities.

Interactive

Interactive Testing Data-driven Data Lake

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

AWS Big Data

NOVEMBER 22, 2024

The need for streamlined data transformations As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. Using Athena and the dbt adapter, you can transform raw data in Amazon S3 into well-structured tables suitable for analytics.

Data Lake

Data Lake Data Warehouse Cost-Benefit Data Transformation

Author data integration jobs with an interactive data preparation experience with AWS Glue visual ETL

AWS Big Data

JULY 10, 2024

The AWS Glue Studio visual editor is a graphical interface that enables you to create, run, and monitor data integration jobs in AWS Glue. The new data preparation interface in AWS Glue Studio provides an intuitive, spreadsheet-style view for interactively working with tabular data. Choose Create policy.

Interactive

Interactive Visualization Data Integration Statistics

Webinars

How to Achieve High-Accuracy Results When Using LLMs

MORE WEBINARS

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena , Amazon Redshift , Amazon EMR , and so on. Maintaining lists of possible values for the columns requires continuous updates.

Metadata

Metadata Data Lake Modeling Data Warehouse

Introducing Amazon Q data integration in AWS Glue

AWS Big Data

APRIL 30, 2024

Amazon Q Developer can now generate complex data integration jobs with multiple sources, destinations, and data transformations. Generated jobs can use a variety of data transformations, including filter, project, union, join, and custom user-supplied SQL. Configure an IAM role to interact with Amazon Q.

Data Integration

Data Integration Data Lake Data Warehouse Software

7 key Microsoft Azure analytics services (plus one extra)

CIO Business Intelligence

JUNE 29, 2022

But the features in Power BI Premium are now more powerful than the functionality in Azure Analysis Services, so while the service isn’t going away, Microsoft will offer an automated migration tool in the second half of this year for customers who want to move their data models into Power BI instead. Azure Data Factory.

Data Lake

Data Lake Analytics Data Warehouse Machine Learning

Navigating the Chaos of Unruly Data: Solutions for Data Teams

DataKitchen

NOVEMBER 10, 2023

The Perilous State of Today’s Data Environments Data teams often navigate a labyrinth of chaos within their databases. Extrinsic Control Deficit: Many of these changes stem from tools and processes beyond the immediate control of the data team.

Data Quality

Data Quality Testing Data Lake Data Integration

Scale your AWS Glue for Apache Spark jobs with new larger worker types G.4X and G.8X

AWS Big Data

MAY 9, 2023

For workloads such as data transforms, joins, and queries, you can use G.1X With exponentially growing data sources and data lakes, customers want to run more data integration workloads, including their most demanding transforms, aggregations, joins, and queries. 1X (1 DPU) and G.2X You can enable G.4X

Data Lake

Data Lake Cost-Benefit Data Integration Data Transformation

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.

Data Lake

Data Lake Snapshot Big Data Data-driven

How the BMW Group analyses semiconductor demand with AWS Glue

AWS Big Data

APRIL 26, 2023

To enable this use case, we used the BMW Group’s cloud-native data platform called the Cloud Data Hub. In 2019, the BMW Group decided to re-architect and move its on-premises data lake to the AWS Cloud to enable data-driven innovation while scaling with the dynamic needs of the organization.

Forecasting

Forecasting Manufacturing Data Lake Big Data

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

AWS Big Data

AUGUST 19, 2024

This report is essential for understanding revenue streams, identifying opportunities for optimization, and making data-driven decisions regarding pricing and promotions. Refer to Editing AWS Glue managed data transform nodes for more information. Stop any AWS Glue interactive sessions.

Analytics

Analytics Data-driven Data Integration Data Lake

How HR&A uses Amazon Redshift spatial analytics on Amazon Redshift Serverless to measure digital equity in states across the US

AWS Big Data

DECEMBER 5, 2023

To fill in the gaps in existing data, HR&A creates digital equity surveys to build a more complete picture before developing digital equity plans. HR&A has used Amazon Redshift Serverless and CARTO to process survey findings more efficiently and create custom interactive dashboards to facilitate understanding of the results.

Measurement

Measurement Dashboards Data Warehouse Analytics

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Data Lake Optimization

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

In this post, we delve into a case study for a retail use case, exploring how the Data Build Tool (dbt) was used effectively within an AWS environment to build a high-performing, efficient, and modern data platform. It does this by helping teams handle the T in ETL (extract, transform, and load) processes.

Data Warehouse

Data Warehouse Testing Data Quality Reporting

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

In another decade, the internet and mobile started the generate data of unforeseen volume, variety and velocity. It required a different data platform solution. Hence, Data Lake emerged, which handles unstructured and structured data with huge volume. Data lakehouse was created to solve these problems.

Data Lake

Data Lake Data Warehouse Data-driven Metadata

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

However, you might face significant challenges when planning for a large-scale data warehouse migration. Data engineers are crucial for schema conversion and data transformation, and DBAs can handle cluster configuration and workload monitoring. Platform architects define a well-architected platform.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

Migrate from Amazon Kinesis Data Analytics for SQL Applications to Amazon Kinesis Data Analytics Studio

AWS Big Data

JUNE 29, 2023

Solution overview For our use case, we use several AWS services to stream, ingest, transform, and analyze sample automotive sensor data in real time using Kinesis Data Analytics Studio. Kinesis Data Analytics Studio allows us to create a notebook, which is a web-based development environment. View the stream data.

Data Analytics

Data Analytics Analytics IoT Data Lake

How to use foundation models and trusted governance to manage AI workflow risk

IBM Big Data Hub

OCTOBER 16, 2023

They are used in everything from robotics to tools that reason and interact with humans. How to scale AL and ML with built-in governance A fit-for-purpose data store built on an open lakehouse architecture allows you to scale AI and ML while providing built-in governance tools.

Risk

Risk Modeling Management Metadata

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

While they require task-specific labeled data for fine tuning, they also offer clients the best cost performance trade-off for non-generative use cases. offers a Prompt Lab, where users can interact with different prompts using prompt engineering on generative AI models for both zero-shot prompting and few-shot prompting.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

This post shows you how to integrate Apache Flink in Amazon EMR with the AWS Glue Data Catalog so that you can ingest streaming data in real time and access the data in near-real time for business analysis. For data read/write, Flink has the interface DynamicTableSourceFactory for read and DynamicTableSinkFactory for write.

Data Lake

Data Lake Metadata Business Analysis Data-driven

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouses (such as Amazon Redshift ) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

How to modernize data lakes with a data lakehouse architecture

IBM Big Data Hub

JULY 5, 2023

Data Lakes have been around for well over a decade now, supporting the analytic operations of some of the largest world corporations. Such data volumes are not easy to move, migrate or modernize. The challenges of a monolithic data lake architecture Data lakes are, at a high level, single repositories of data at scale.

Data Lake

Data Lake Metadata Cost-Benefit Data Warehouse

Lay the groundwork now for advanced analytics and AI

CIO Business Intelligence

AUGUST 3, 2023

In the past, First Service Credit Union’s Chief data officer Ty Robbins struggled to integrate data from the legacy, non-relational, and often proprietary tabular databases on which many credit unions run. After moving its expensive, on-premise data lake to the cloud, Comcast created a three-tiered architecture.

Analytics

Analytics Data Lake Metadata Cost-Benefit

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To bring their customers the best deals and user experience, smava follows the modern data architecture principles with a data lake as a scalable, durable data store and purpose-built data stores for analytical processing and data consumption.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Accelerate Your Data Mesh in the Cloud with Cloudera Data Engineering and Modak NabuTM

Cloudera

OCTOBER 11, 2021

Modak Nabu reliably curates datasets for any line of business and personas, from business analysts to data scientists. Customers using Modak Nabu with CDP today have deployed Data Lakes and. This is the scale and speed that cloud-native solutions can provide — and Modak Nabu with CDP has been delivering the same.

Data Lake

Data Lake Cost-Benefit Data-driven Dashboards

CIO 100 Award winners drive business results with IT

CIO Business Intelligence

AUGUST 7, 2024

The initiative has enhanced coordination, as automation APIs facilitate interaction with security tools as well as streamline coordination and enhance mitigation responses. This is a new way to interact with the web and search. It then built a cutting-edge cloud-based analytics platform, designed with an innovative data architecture.

IT

IT Insurance Cost-Benefit Testing

The disruptive potential of open data lakehouse architectures and IBM watsonx.data

IBM Big Data Hub

JUNE 15, 2023

It is comprised of commodity cloud object storage, open data and open table formats, and high-performance open-source query engines. To help organizations scale AI workloads, we recently announced IBM watsonx.data , a data store built on an open data lakehouse architecture and part of the watsonx AI and data platform.

Data Warehouse

Data Warehouse Data Lake Optimization Data-driven

Connect your data for faster decisions with AWS

AWS Big Data

NOVEMBER 7, 2023

Second, organizations still need transformations like cleansing, deduplication, and combining datasets for analysis and machine learning (ML). For these, AWS Glue provides fast, scalable data transformation. Business analysts use SageMaker Canvas to build ML models and generate predictions without needing to write code.

Dashboards

Dashboards Data-driven Data Integration Data Lake

Showpad accelerates data maturity to unlock innovation using Amazon QuickSight

AWS Big Data

APRIL 5, 2023

Showpad built new customer-facing embedded dashboards within Showpad eOSTM and migrated its legacy dashboards to Amazon QuickSight , a unified BI service providing modern interactive dashboards, natural language querying, paginated reports, machine learning (ML) insights, and embedded analytics at scale.

Dashboards

Dashboards Reporting Cost-Benefit Visualization

Introducing the HubSpot connector for AWS Glue

AWS Big Data

DECEMBER 2, 2024

Using AWS Glue , a serverless data integration service, companies can streamline this process, integrating data from internal and external sources into a centralized AWS data lake. From there, they can perform meaningful analytics, gain valuable insights, and optionally push enriched data back to external SaaS platforms.

Data Lake

Data Lake Testing Data Integration Metadata

Hybrid big data analytics with Amazon EMR on AWS Outposts

AWS Big Data

JANUARY 29, 2025

Amazon EMR has long been the leading solution for processing big data in the cloud. Amazon EMR is the industry-leading big data solution for petabyte-scale data processing, interactive analytics, and machine learning using over 20 open source frameworks such as Apache Hadoop , Hive, and Apache Spark.

Big Data

Big Data Data Analytics Analytics Interactive

What is a Data Pipeline?

Jet Global

MAY 9, 2024

The key components of a data pipeline are typically: Data Sources : The origin of the data, such as a relational database , data warehouse, data lake , file, API, or other data store. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.

Data Lake

Data Lake Data Warehouse Business Intelligence Machine Learning

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why Data Mapping is Important Data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.

Data Warehouse

Data Warehouse Reporting Data Transformation Visualization

Data Leaders Brief

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

Webinars

Trending Sources

Author data integration jobs with an interactive data preparation experience with AWS Glue visual ETL

Webinars

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

Introducing Amazon Q data integration in AWS Glue

7 key Microsoft Azure analytics services (plus one extra)

Navigating the Chaos of Unruly Data: Solutions for Data Teams

Scale your AWS Glue for Apache Spark jobs with new larger worker types G.4X and G.8X

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

How the BMW Group analyses semiconductor demand with AWS Glue

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

How HR&A uses Amazon Redshift spatial analytics on Amazon Redshift Serverless to measure digital equity in states across the US

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

Data platform trinity: Competitive or complementary?

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

Migrate from Amazon Kinesis Data Analytics for SQL Applications to Amazon Kinesis Data Analytics Studio

How to use foundation models and trusted governance to manage AI workflow risk

Exploring the AI and data capabilities of watsonx

Build a data lake with Apache Flink on Amazon EMR

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

How to modernize data lakes with a data lakehouse architecture

Lay the groundwork now for advanced analytics and AI

How smava makes loans transparent and affordable using Amazon Redshift Serverless

Accelerate Your Data Mesh in the Cloud with Cloudera Data Engineering and Modak NabuTM

CIO 100 Award winners drive business results with IT

The disruptive potential of open data lakehouse architectures and IBM watsonx.data

Connect your data for faster decisions with AWS

Showpad accelerates data maturity to unlock innovation using Amazon QuickSight

Introducing the HubSpot connector for AWS Glue

Hybrid big data analytics with Amazon EMR on AWS Outposts

What is a Data Pipeline?

What is Data Mapping?

Stay Connected