Data Lake, Data Transformation and Events

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

AWS Big Data

NOVEMBER 22, 2024

The need for streamlined data transformations As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. Using Athena and the dbt adapter, you can transform raw data in Amazon S3 into well-structured tables suitable for analytics.

Data Lake

Data Lake Data Warehouse Cost-Benefit Data Transformation

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

AWS Big Data

OCTOBER 23, 2024

The Airflow REST API facilitates a wide range of use cases, from centralizing and automating administrative tasks to building event-driven, data-aware data pipelines. Event-driven architectures – The enhanced API facilitates seamless integration with external events, enabling the triggering of Airflow DAGs based on these events.

Interactive

Interactive Testing Data-driven Data Lake

Monitor data pipelines in a serverless data lake

AWS Big Data

AUGUST 9, 2023

The combination of a data lake in a serverless paradigm brings significant cost and performance benefits. By monitoring application logs, you can gain insights into job execution, troubleshoot issues promptly to ensure the overall health and reliability of data pipelines.

Data Lake

Data Lake Metrics Testing Cost-Benefit

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

AWS Big Data

DECEMBER 20, 2024

The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios. Your generated jobs can use a variety of data transformations, including filters, projections, unions, joins, and aggregations, giving you the flexibility to handle complex data processing requirements.

Data Integration

Data Integration Visualization Data Processing Big Data

Texas Rangers data transformation modernizes stadium operations

CIO Business Intelligence

OCTOBER 18, 2022

Resultant recommended a new, on-prem data infrastructure, complete with data lakes to provide stake holders with a better way to manage data reliability, accuracy, and timeliness. The team can also use the data to enhance the fan experience. Now, Noel says, they want it for every game, every event, every concert.

Data Transformation

Data Transformation Consulting Data Lake Reporting

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

AUGUST 1, 2023

Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications. Search for the Jira Cloud connector.

Data Lake

Data Lake Data Transformation Data-driven Cost-Benefit

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

AWS Big Data

APRIL 27, 2023

Amazon Athena supports the MERGE command on Apache Iceberg tables, which allows you to perform inserts, updates, and deletes in your data lake at scale using familiar SQL statements that are compliant with ACID (Atomic, Consistent, Isolated, Durable).

Data Lake

Data Lake Snapshot Optimization Data Transformation

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

cycle_end"', "sagemakedatalakeenvironment_sub_db", ctas_approach=False) A similar approach is used to connect to shared data from Amazon Redshift, which is also shared using Amazon DataZone. With a unified catalog, enhanced analytics capabilities, and efficient data transformation processes, were laying the groundwork for future growth.

IoT

IoT Machine Learning Metadata Data-driven

7 key Microsoft Azure analytics services (plus one extra)

CIO Business Intelligence

JUNE 29, 2022

But the features in Power BI Premium are now more powerful than the functionality in Azure Analysis Services, so while the service isn’t going away, Microsoft will offer an automated migration tool in the second half of this year for customers who want to move their data models into Power BI instead. Azure Data Factory.

Data Lake

Data Lake Analytics Data Warehouse Machine Learning

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

The AWS Glue Data Catalog provides a uniform repository where disparate systems can store and find metadata to keep track of data in data silos. Apache Flink is a widely used data processing engine for scalable streaming ETL, analytics, and event-driven applications. Transformed data can be stored in Amazon S3.

Data Lake

Data Lake Metadata Business Analysis Data-driven

Data’s dark secret: Why poor quality cripples AI and growth

CIO Business Intelligence

APRIL 8, 2025

Comparison of modern data architectures : Architecture Definition Strengths Weaknesses Best used when Data warehouse Centralized, structured and curated data repository. Inflexible schema, poor for unstructured or real-time data. Data lake Raw storage for all types of structured and unstructured data.

Data Quality

Data Quality Data-driven Key Performance Indicator Metadata

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.

Data Lake

Data Lake Snapshot Big Data Data-driven

Use AWS Glue to streamline SFTP data processing

AWS Big Data

AUGUST 13, 2024

With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog. It enables you to visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. Choose the Job details tab.

Data Processing

Data Processing Visualization Data Lake Data Processing

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

These nodes can implement analytical platforms like data lake houses, data warehouses, or data marts, all united by producing data products. Education and enablement – Conduct learning interventions to upskill teams on understanding and using the data as a product approach.

Metadata

Metadata Data Governance Data Quality Data-driven

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

AWS Big Data

NOVEMBER 13, 2023

Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. Additionally, data is extracted from vendor APIs that includes data related to product, marketing, and customer experience.

Data Warehouse

Data Warehouse Analytics Data Lake Data Science

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Big Data

JANUARY 6, 2025

Amazon AppFlow is a fully managed integration service that you can use to securely transfer data from software as a service (SaaS) applications, such as Google BigQuery, Salesforce, SAP, HubSpot, and ServiceNow, to Amazon Web Services (AWS) services such as Amazon Simple Storage Service (Amazon S3) and Amazon Redshift, in just a few clicks.

Analytics

Analytics Data Warehouse Big Data Metrics

Amazon Redshift data ingestion options

AWS Big Data

SEPTEMBER 5, 2024

Amazon Redshift , a warehousing service, offers a variety of options for ingesting data from diverse sources into its high-performance, scalable environment. If storing operational data in a data warehouse is a requirement, synchronization of tables between operational data stores and Amazon Redshift tables is supported.

IoT

IoT Data Warehouse Cost-Benefit Reporting

Reference guide to build inventory management and forecasting solutions on AWS

AWS Big Data

APRIL 11, 2023

By collecting data from store sensors using AWS IoT Core , ingesting it using AWS Lambda to Amazon Aurora Serverless , and transforming it using AWS Glue from a database to an Amazon Simple Storage Service (Amazon S3) data lake, retailers can gain deep insights into their inventory and customer behavior.

Forecasting

Forecasting Management IoT Data-driven

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To bring their customers the best deals and user experience, smava follows the modern data architecture principles with a data lake as a scalable, durable data store and purpose-built data stores for analytical processing and data consumption.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Turning the page

Cloudera

JUNE 1, 2021

These acquisitions usher in a new era of “ self-service ” by automating complex operations so customers can focus on building great data-driven apps instead of managing infrastructure. Datacoral powers fast and easy data transformations for any type of data via a robust multi-tenant SaaS architecture that runs in AWS.

Uncertainty

Uncertainty Cost-Benefit Risk Strategy

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

In this post, we delve into a case study for a retail use case, exploring how the Data Build Tool (dbt) was used effectively within an AWS environment to build a high-performing, efficient, and modern data platform. It does this by helping teams handle the T in ETL (extract, transform, and load) processes.

Data Warehouse

Data Warehouse Testing Data Quality Reporting

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

AWS Big Data

MARCH 3, 2023

Additionally, the scale is significant because the multi-tenant data sources provide a continuous stream of testing activity, and our users require quick data refreshes as well as historical context for up to a decade due to compliance and regulatory demands. Finally, data integrity is of paramount importance.

Software

Software Data Lake Testing Cost-Benefit

Deep dive into the AWS ProServe Hadoop Migration Delivery Kit TCO tool

AWS Big Data

FEBRUARY 6, 2023

Transform the YARN job history logs from JSON to CSV After obtaining YARN logs, you run a YARN log organizer, yarn-log-organizer.py, which is a parser to transform JSON-based logs to CSV files. The parser also has other capabilities, including sorting events by time, removing dedicates, and merging multiple logs.

Dashboards

Dashboards Optimization Data Lake Cost-Benefit

Automate alerting and reporting for AWS Glue job resource usage

AWS Big Data

MAY 25, 2023

Data transformation plays a pivotal role in providing the necessary data insights for businesses in any organization, small and large. To gain these insights, customers often perform ETL (extract, transform, and load) jobs from their source systems and output an enriched dataset.

Reporting

Reporting Metrics Optimization Data Lake

An AI Chat Bot Wrote This Blog Post …

DataKitchen

DECEMBER 9, 2022

Observability in DataOps refers to the ability to monitor and understand the performance and behavior of data-related systems and processes, and to use that information to improve the quality and speed of data-driven decision making. The data scientists and IT professionals were amazed, and they couldn’t believe their eyes.

Machine Learning

Machine Learning Data-driven Optimization Data Analytics

How to use foundation models and trusted governance to manage AI workflow risk

IBM Big Data Hub

OCTOBER 16, 2023

Curated foundation models, such as those created by IBM or Microsoft, help enterprises scale and accelerate the use and impact of the most advanced AI capabilities using trusted data. In addition to natural language, models are trained on various modalities, such as code, time-series, tabular, geospatial and IT events data.

Risk

Risk Modeling Management Metadata

Migrate from Amazon Kinesis Data Analytics for SQL Applications to Amazon Kinesis Data Analytics Studio

AWS Big Data

JUNE 29, 2023

In our solution, we create a notebook to access automotive sensor data, enrich the data, and send the enriched output from the Kinesis Data Analytics Studio notebook to an Amazon Kinesis Data Firehose delivery stream for delivery to an Amazon Simple Storage Service (Amazon S3) data lake.

Data Analytics

Data Analytics Analytics IoT Data Lake

Building Better Data Models to Unlock Next-Level Intelligence

Sisense

MAY 11, 2021

The reasons for this are simple: Before you can start analyzing data, huge datasets like data lakes must be modeled or transformed to be usable. According to a recent survey conducted by IDC , 43% of respondents were drawing intelligence from 10 to 30 data sources in 2020, with a jump to 64% in 2021!

Modeling

Modeling Big Data IoT Data Warehouse

CIO 100 Award winners drive business results with IT

CIO Business Intelligence

AUGUST 7, 2024

The project’s primary objectives were to maintain 100% functionality of the EMR during planned failover events; achieving a recovery point objective of less than one minute; and meet a recovery time objective of two hours for critical services. It also crafted multiple machine learning and AI models to tackle business challenges. “We

IT

IT Insurance Cost-Benefit Testing

What is a Data Pipeline?

Jet Global

MAY 9, 2024

The key components of a data pipeline are typically: Data Sources : The origin of the data, such as a relational database , data warehouse, data lake , file, API, or other data store. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.

Data Lake

Data Lake Data Warehouse Business Intelligence Machine Learning

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why Data Mapping is Important Data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.

Data Warehouse

Data Warehouse Reporting Data Transformation Visualization

Unlocking Trino’s Full Potential With Simba Drivers for BI & ETL

Jet Global

OCTOBER 1, 2024

Trino allows users to run ad hoc queries across massive datasets, making real-time decision-making a reality without needing extensive data transformations. This is particularly valuable for teams that require instant answers from their data. Data Lake Analytics: Trino doesn’t just stop at databases.

Dashboards

Dashboards Data Lake Reporting Cost-Benefit

Stream real-time data into Apache Iceberg tables in Amazon S3 using Amazon Data Firehose

AWS Big Data

NOVEMBER 6, 2024

Second, because traditional data warehousing approaches are unable to keep up with the volume, velocity, and variety of data, engineering teams are building data lakes and adopting open data formats such as Parquet and Apache Iceberg to store their data. Choose Create Firehose stream.

Metadata

Metadata Data Lake Management Internet of Things

Streamline AWS WAF log analysis with Apache Iceberg and Amazon Data Firehose

AWS Big Data

FEBRUARY 18, 2025

To optimize their security operations, organizations are adopting modern approaches that combine real-time monitoring with scalable data analytics. They are using data lake architectures and Apache Iceberg to efficiently process large volumes of security data while minimizing operational overhead.

Snapshot

Snapshot Optimization Data Lake Metadata

Ingest telemetry messages in near real time with Amazon API Gateway, Amazon Data Firehose, and Amazon Location Service

AWS Big Data

NOVEMBER 14, 2024

Key services in the solution include Amazon API Gateway , Amazon Data Firehose , and Amazon Location Service. The challenge In the event of a disaster e.g. water flood, there is usually a lack of terrestrial data connectivity that prevents monitoring stations from taking actionable measures in real time.

Data Lake

Data Lake Metadata Testing Data-driven

Data Leaders Brief

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

Webinars

Trending Sources

Monitor data pipelines in a serverless data lake

Webinars

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

Texas Rangers data transformation modernizes stadium operations

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

How EUROGATE established a data mesh architecture using Amazon DataZone

7 key Microsoft Azure analytics services (plus one extra)

Build a data lake with Apache Flink on Amazon EMR

Data’s dark secret: Why poor quality cripples AI and growth

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

Use AWS Glue to streamline SFTP data processing

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

Amazon Redshift data ingestion options

Reference guide to build inventory management and forecasting solutions on AWS

How smava makes loans transparent and affordable using Amazon Redshift Serverless

Turning the page

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

Deep dive into the AWS ProServe Hadoop Migration Delivery Kit TCO tool

Automate alerting and reporting for AWS Glue job resource usage

An AI Chat Bot Wrote This Blog Post …

How to use foundation models and trusted governance to manage AI workflow risk

Migrate from Amazon Kinesis Data Analytics for SQL Applications to Amazon Kinesis Data Analytics Studio

Building Better Data Models to Unlock Next-Level Intelligence

CIO 100 Award winners drive business results with IT

What is a Data Pipeline?

What is Data Mapping?

Unlocking Trino’s Full Potential With Simba Drivers for BI & ETL

Stream real-time data into Apache Iceberg tables in Amazon S3 using Amazon Data Firehose

Streamline AWS WAF log analysis with Apache Iceberg and Amazon Data Firehose

Ingest telemetry messages in near real time with Amazon API Gateway, Amazon Data Firehose, and Amazon Location Service

Stay Connected