Data Lake, Interactive and Workshop

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

Data analytics on operational data at near-real time is becoming a common need. Due to the exponential growth of data volume, it has become common practice to replace read replicas with data lakes to have better scalability and performance. For more information, see Changing the default settings for your data lake.

Data Lake

Data Lake Visualization Dashboards Insurance

Moving Enterprise Data From Anywhere to Any System Made Easy

Cloudera

JUNE 2, 2022

This blog aims to answer two questions: What is a universal data distribution service? Why does every organization need it when using a modern data stack? Every organization on the hybrid cloud journey needs the ability to take control of their data flows from origination through all points of consumption.

Enterprise

Enterprise Data Lake Data Collection Data-driven

Federated Learning, Machine Learning, Decentralized Data

Cloudera

DECEMBER 8, 2020

Federated Learning is a paradigm in which machine learning models are trained on decentralized data. Instead of collecting data on a single server or data lake, it remains in place — on smartphones, industrial sensing equipment, and other edge devices — and models are trained on-device. The Turbofan Tycoon prototype.

Machine Learning

Machine Learning Data Lake Reporting Data Collection

Moving Enterprise Data From Anywhere to Any System Made Easy

CIO Business Intelligence

JULY 13, 2022

This blog aims to answer two questions: What is a universal data distribution service? Why does every organization need it when using a modern data stack? Every organization on the hybrid cloud journey needs the ability to take control of their data flows from origination through all points of consumption.

Enterprise

Enterprise Data Lake Data Collection Data-driven

Extend your data mesh with Amazon Athena and federated views

AWS Big Data

JULY 28, 2023

Amazon Athena is a serverless, interactive analytics service built on the Trino, PrestoDB, and Apache Spark open-source frameworks. Recently, Athena added support for creating and querying views on federated data sources to bring greater flexibility and ease of use to use cases such as interactive analysis and business intelligence reporting.

Big Data

Big Data Data Architecture Data Lake Interactive

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on.

Metadata

Metadata Data Lake Modeling Data Warehouse

Automate the archive and purge data process for Amazon RDS for PostgreSQL using pg_partman, Amazon S3, and AWS Glue

AWS Big Data

AUGUST 22, 2023

AWS Glue integrates seamlessly with AWS services like Amazon S3, Amazon Relational Database Service (Amazon RDS), Amazon Redshift, Amazon DynamoDB, Amazon Kinesis Data Streams, and Amazon DocumentDB (with MongoDB compatibility) to offer a robust, cloud-native data integration solution.

Data Processing

Data Processing Testing Data Lake Data Integration

Automate deployment of an Amazon QuickSight analysis connecting to an Amazon Redshift data warehouse with an AWS CloudFormation template

AWS Big Data

FEBRUARY 16, 2023

Prerequisites: Before setting up the CloudFormation stacks, you must have an AWS account and an AWS Identity and Access Management (IAM) user with sufficient permissions to interact with the AWS Management Console and the services listed in the architecture. About the author: Sandeep Bajwa is a Sr.

Data Warehouse

Data Warehouse Sales Visualization Data Processing

Introducing Amazon EMR on EKS job submission with Spark Operator and spark-submit

AWS Big Data

JUNE 6, 2023

In the following code, replace the EKS endpoint as well as the S3 bucket, then run the script: /bin/spark-submit --class ValueZones --master k8s://EKS-ENDPOINT --conf spark.kubernetes.namespace=data-team-a --conf spark.kubernetes.container.image=608033475327.dkr.ecr.us-west-1.amazonaws.com/spark/emr-6.10.0:latest
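The command in this excerpt is cut off before the application artifact, so the following is only a minimal Python sketch of how the same flags could be assembled with the two placeholders substituted. It assumes spark-submit is on the PATH; the function name build_spark_submit and the s3://EXAMPLE-BUCKET/app.jar path are hypothetical and are not taken from the original post.

```python
import shlex
from typing import List


def build_spark_submit(eks_endpoint: str, image: str, app_artifact: str,
                       namespace: str = "data-team-a",
                       main_class: str = "ValueZones") -> List[str]:
    """Assemble a spark-submit argument list using the flags from the excerpt."""
    return [
        "spark-submit",  # assumption: resolved via PATH
        "--class", main_class,
        "--master", f"k8s://{eks_endpoint}",
        "--conf", f"spark.kubernetes.namespace={namespace}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_artifact,  # hypothetical: the excerpt is truncated before this argument
    ]


args = build_spark_submit(
    eks_endpoint="EKS-ENDPOINT",  # placeholder, replace with your EKS endpoint
    image="608033475327.dkr.ecr.us-west-1.amazonaws.com/spark/emr-6.10.0:latest",
    app_artifact="s3://EXAMPLE-BUCKET/app.jar",  # hypothetical S3 location
)

# Print a shell-quoted command line for review before running it.
print(shlex.join(args))
```

Building the command as a list rather than a single string keeps each value a separate argument, so it can be passed to subprocess.run without shell quoting issues.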

Optimization

Optimization Data Lake Cost-Benefit Management

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented in AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference. Starting with Amazon EMR release 6.7.0,

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

This post shows you how to integrate Apache Flink in Amazon EMR with the AWS Glue Data Catalog so that you can ingest streaming data in real time and access the data in near-real time for business analysis. For data read/write, Flink has the interface DynamicTableSourceFactory for read and DynamicTableSinkFactory for write.

Data Lake

Data Lake Metadata Business Analysis Data-driven

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

11:30 AM – 12:30 PM (PDT) Caesars Forum ANT318 | Accelerate innovation with end-to-end serverless data architecture. 4:30 PM – 5:30 PM (PDT) Wynn ANT207 | Understand your data with business context. 1:00 PM – 2:00 PM (PDT) Venetian ANT201 | Accelerate innovation with real-time data.

Analytics

Analytics Data Lake Data Warehouse Data-driven

Real-time streaming data top picks you cannot miss at AWS re:Invent 2023

AWS Big Data

NOVEMBER 8, 2023

Putting your data to work with generative AI – Innovation Talk. Thursday, November 30 | 12:30 – 1:30 PM PST | The Venetian. Join Mai-Lan Tomsen Bukovec, Vice President, Technology at AWS, to learn how you can turn your data lake into a business advantage with generative AI. Reserve your seat now!

Data-driven

Data-driven Machine Learning Data Lake Cost-Benefit

The Gartner 2021 Leadership Vision for Data & Analytics Leaders Webinar Q&A

Andrew White

JANUARY 11, 2021

Will the data warehouse as a software tool play a role in the future of Data & Analytics strategy? You cannot get away from a formalized delivery capability focused on regular, scheduled, structured and reasonably governed data. Data lakes don’t offer this, nor should they. E.g. Data Lakes in Azure – as SaaS.

Data Analytics

Data Analytics Analytics Data-driven Finance

Turning Streams Into Data Products

Cloudera

JUNE 16, 2022

SSB provides a comprehensive interactive user interface for developers, data analysts, and data scientists to write streaming applications with industry standard SQL. By using SQL, the user can simply declare expressions that filter, aggregate, route, and mutate streams of data. “Without context, streaming data is useless.”

Data Lake

Data Lake Manufacturing Metadata Dashboards
