Data Lake and Workshop - Data Leaders Brief

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

AWS Big Data

APRIL 27, 2023

Amazon Athena supports the MERGE command on Apache Iceberg tables, which allows you to perform inserts, updates, and deletes in your data lake at scale using familiar SQL statements that are compliant with ACID (Atomic, Consistent, Isolated, Durable). The first task performs an initial copy of the full data into an S3 folder.
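
As a rough illustration of what a MERGE-style upsert does to a table, the following Python sketch applies a batch of change records to in-memory rows. It is not how Athena or Iceberg implement MERGE; the record layout, the `op` column ("I", "U", "D"), and the `merge_upsert` helper are all assumptions made for this example.

```python
def merge_upsert(target, changes, key="id"):
    """Apply insert/update/delete change records to target rows, matched on `key`."""
    # Index the existing rows by key so each change can find its match.
    merged = {row[key]: dict(row) for row in target}
    for change in changes:
        k = change[key]
        if change.get("op") == "D":
            # Matched row flagged for deletion: drop it (no-op if absent).
            merged.pop(k, None)
        else:
            # Matched row: update its columns. Unmatched row: insert it.
            columns = {c: v for c, v in change.items() if c != "op"}
            merged[k] = {**merged.get(k, {}), **columns}
    return sorted(merged.values(), key=lambda r: r[key])


# Hypothetical data: two existing rows and one change of each kind.
target = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}]
changes = [
    {"id": 2, "city": "Quito", "op": "U"},
    {"id": 3, "city": "Pune", "op": "I"},
    {"id": 1, "op": "D"},
]
result = merge_upsert(target, changes)
print(result)
```

In a real transactional data lake the whole batch would commit atomically as one new table snapshot; this sketch only shows the row-level matching logic.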

Data Lake

Data Lake Snapshot Optimization Data Transformation

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

Data analytics on operational data at near-real time is becoming a common need. Due to the exponential growth of data volume, it has become common practice to replace read replicas with data lakes to have better scalability and performance. For more information, see Changing the default settings for your data lake.

Data Lake

Data Lake Visualization Dashboards Insurance


Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on.

Metadata

Metadata Data Lake Modeling Data Warehouse

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

2:30 PM – 3:30 PM (PDT) Mandalay Bay ANT335 | Get the most out of your data warehousing workloads. 5:30 PM – 6:30 PM (PDT) Caesars Forum ANT349-R | Advanced real-time analytics and ML in your data warehouse [REPEAT].

Analytics

Analytics Data Lake Data Warehouse Data-driven

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

Verify all table metadata is stored in the AWS Glue Data Catalog. Consume data with Athena or Amazon EMR Trino for business analysis. Update and delete source records in Amazon RDS for MySQL and validate that the changes are reflected in the data lake tables. The Flink Table API/SQL can integrate with the AWS Glue Data Catalog.

Data Lake

Data Lake Metadata Business Analysis Data-driven

How AWS helped Altron Group accelerate their vision for optimized customer engagement

AWS Big Data

JULY 13, 2023

Altron is a pioneer of providing data-driven solutions for their customers by combining technical expertise with in-depth customer understanding to provide highly differentiated technology solutions. Data quality for account and customer data – Altron wanted to enable data quality and data governance best practices.

Optimization

Optimization B2B Data Quality Sales

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented in AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference. Starting with Amazon EMR release 6.7.0,

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

Moving Enterprise Data From Anywhere to Any System Made Easy

Cloudera

JUNE 2, 2022

This blog aims to answer two questions: What is a universal data distribution service? Why does every organization need it when using a modern data stack? Every organization on the hybrid cloud journey needs the ability to take control of their data flows from origination through all points of consumption.

Enterprise

Enterprise Data Lake Data Collection Data-driven

What’s cooking with Amazon Redshift at AWS re:Invent 2023

AWS Big Data

NOVEMBER 15, 2023

Sessions can be big room breakout sessions, usually with a customer speaker, or more intimate and technical chalk talks, workshops, or builder sessions. Take a look, plan your week, and soak in the learning!

Data Lake

Data Lake Data Warehouse B2B Deep Learning

How Novanta’s CIO mobilized its data-driven transformation

CIO Business Intelligence

MAY 10, 2023

On investing in capabilities: We’ve set up something called a BI Center of Excellence where we train and have workshops and seminars on a monthly basis that team members across Novanta can join to learn about how they could leverage data marts or data sources to build their own reporting.

Data-driven

Data-driven IT Digital Transformation Data Governance

Moving Enterprise Data From Anywhere to Any System Made Easy

CIO Business Intelligence

JULY 13, 2022

This blog aims to answer two questions: What is a universal data distribution service? Why does every organization need it when using a modern data stack? Every organization on the hybrid cloud journey needs the ability to take control of their data flows from origination through all points of consumption.

Enterprise

Enterprise Data Lake Data Collection Data-driven

Real-time streaming data top picks you cannot miss at AWS re:Invent 2023

AWS Big Data

NOVEMBER 8, 2023

Putting your data to work with generative AI – Innovation Talk. Thursday, November 30 | 12:30 – 1:30 PM PST | The Venetian. Join Mai-Lan Tomsen Bukovec, Vice President, Technology at AWS, to learn how you can turn your data lake into a business advantage with generative AI. Reserve your seat now!

Data-driven

Data-driven Machine Learning Data Lake Cost-Benefit

Federated Learning, Machine Learning, Decentralized Data

Cloudera

DECEMBER 8, 2020

Federated Learning is a paradigm in which machine learning models are trained on decentralized data. Instead of collecting data on a single server or data lake, it remains in place — on smartphones, industrial sensing equipment, and other edge devices — and models are trained on-device.
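
To make the idea concrete, here is a minimal sketch of federated averaging in Python. It is a toy under stated assumptions, not the method from the linked post: the model is a single weight `w` for `y = w * x`, each "device" takes one gradient step on its own data, and only the resulting weights (never the raw data) are sent back and averaged. The function names, learning rate, and sample data are all hypothetical.

```python
def local_update(w, data, lr):
    """One gradient-descent step on a device's private (x, y) pairs."""
    # Mean gradient of the squared error (w*x - y)^2 over this device's data.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def federated_round(global_w, devices, lr=0.05):
    """Each device trains locally; the server averages weights by sample count."""
    updates = [(local_update(global_w, data, lr), len(data)) for data in devices]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


# Hypothetical on-device datasets, all generated from y = 3 * x.
devices = [
    [(1.0, 3.0), (2.0, 6.0)],  # e.g. a smartphone
    [(3.0, 9.0)],              # e.g. an industrial sensor
]

w = 0.0
for _ in range(50):
    w = federated_round(w, devices)
print(round(w, 4))
```

Weighting each device's update by its number of samples keeps devices with more data from being under-represented in the shared model.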

Machine Learning

Machine Learning Data Lake Reporting Data Collection

Reference guide to build inventory management and forecasting solutions on AWS

AWS Big Data

APRIL 11, 2023

By collecting data from store sensors using AWS IoT Core, ingesting it using AWS Lambda to Amazon Aurora Serverless, and transforming it using AWS Glue from a database to an Amazon Simple Storage Service (Amazon S3) data lake, retailers can gain deep insights into their inventory and customer behavior.

Forecasting

Forecasting Management IoT Data-driven

Innovate What’s Next: How Living Labs Brings Ideas to Life

CIO Business Intelligence

APRIL 6, 2022

We are centered around co-creating with customers and promoting a systematic and scalable innovation approach to solve real-world customers' problems, similar to Toyota leveraging Infosys Cobalt to modernize its vehicle data warehouse into a next-generation data lake on AWS.

Experimentation

Experimentation Uncertainty Data Lake Enterprise

Extend your data mesh with Amazon Athena and federated views

AWS Big Data

JULY 28, 2023

Clean up To clean up the resources created for this post, complete the following steps: On the Amazon S3 console, empty the bucket athena-federation-workshop-. If you’re using the AWS CLI, delete the objects in the athena-federation-workshop- bucket with the following code. Let’s dive into the solution.

Big Data

Big Data Data Architecture Data Lake Interactive

Belcorp reimagines R&D with AI

CIO Business Intelligence

JUNE 28, 2023

“We transferred our lab data—including safety, sensory efficacy, toxicology tests, product formulas, ingredients composition, and skin, scalp, and body diagnosis and treatment images—to our AWS data lake,” Gopalan says. “This allowed us to derive insights more easily.”

Digital Transformation

Digital Transformation Cost-Benefit Informatics Data mining

Turning Streams Into Data Products

Cloudera

JUNE 16, 2022

During a customer workshop, Laila, as a seasoned former DBA, made the following commentary that we often hear from our customers: “Streaming data has little value unless I can easily integrate, join, and mesh those streams with the other data sources that I have in my warehouse, relational databases and data lake.

Data Lake

Data Lake Manufacturing Metadata Dashboards

How Aura from Unity revolutionized their big data pipeline with Amazon Redshift Serverless

AWS Big Data

APRIL 4, 2024

Amazon Redshift is a recommended service for online analytical processing (OLAP) workloads such as cloud data warehouses, data marts, and other analytical data stores. You can use simple SQL to analyze structured and semi-structured data, operational databases, and data lakes to deliver the best price/performance at any scale.

Big Data

Big Data Data Warehouse Advertising OLAP

Automate the archive and purge data process for Amazon RDS for PostgreSQL using pg_partman, Amazon S3, and AWS Glue

AWS Big Data

AUGUST 22, 2023

Gain a high-level understanding of AWS Glue and its components by using the following hands-on workshop. Vivek Shrivastava is a Principal Data Architect, Data Lake in AWS Professional Services. He is a big data enthusiast and holds 14 AWS Certifications.

Data Processing

Data Processing Testing Data Lake Data Integration

Automate deployment of an Amazon QuickSight analysis connecting to an Amazon Redshift data warehouse with an AWS CloudFormation template

AWS Big Data

FEBRUARY 16, 2023

For more information about automating dashboard deployment, customizing access to the QuickSight console, configuring for team collaboration, and implementing multi-tenancy and client user segregation, check out the videos Virtual Admin Workshop: Working with Amazon QuickSight APIs and Admin Level-Up Virtual Workshop, V2 on YouTube.

Data Warehouse

Data Warehouse Sales Visualization Data Processing

Introducing Amazon EMR on EKS job submission with Spark Operator and spark-submit

AWS Big Data

JUNE 6, 2023

In the following code, replace the EKS endpoint as well as the S3 bucket, then run the script: /bin/spark-submit --class ValueZones --master k8s://EKS-ENDPOINT --conf spark.kubernetes.namespace=data-team-a --conf spark.kubernetes.container.image=608033475327.dkr.ecr.us-west-1.amazonaws.com/spark/emr-6.10.0:latest

Optimization

Optimization Data Lake Cost-Benefit Management

Digital transformation: in wide-ranging projects, the key is change management

CIO Business Intelligence

FEBRUARY 12, 2024

Digital transformation involves a gradual move to the new data platform to collect and aggregate data from the data lake (with BIM systems, Business Information Modelling), then put it on dashboards and run analyses with business intelligence.

Digital Transformation

Digital Transformation Management Business Intelligence Internet of Things

Build a real-time analytics solution with Apache Pinot on AWS

AWS Big Data

AUGUST 6, 2024

git clone [link] cd near-realtime-apache-pinot-workshop npm i Deploy the AWS CDK stack to create the AWS Cloud infrastructure by running the following command and enter y when prompted. Enter the IP address that you want to use to access the Apache Pinot controller and broker in /32 subnet mask format.

OLAP

OLAP Analytics Visualization Dashboards

Week in the Life of an Analyst at Gartner US IT Symposium (virtual) 2021

Andrew White

OCTOBER 22, 2021

Analytics Tactics (known outcome/known data/BI/analytics v unknown outcome/unknown data/data science/ML): 11. Data Hub Strategy: 10. Lakehouse (data warehouse and data lake working together): 8. Data Literacy, training, coordination, collaboration: 8. Business Innovation with D&A: 6.

IT

IT Data Lake Data Science Strategy

The Gartner 2021 Leadership Vision for Data & Analytics Leaders Webinar Q&A

Andrew White

JANUARY 11, 2021

Will the data warehouse as a software tool play a role in the future of Data & Analytics strategy? You cannot get away from a formalized delivery capability focused on regular, scheduled, structured and reasonably governed data. Data lakes don’t offer this, nor should they. E.g. Data Lakes in Azure – as SaaS.

Data Analytics

Data Analytics Analytics Data-driven Finance

Stream real-time data into Apache Iceberg tables in Amazon S3 using Amazon Data Firehose

AWS Big Data

NOVEMBER 6, 2024

Second, because traditional data warehousing approaches are unable to keep up with the volume, velocity, and variety of data, engineering teams are building data lakes and adopting open data formats such as Parquet and Apache Iceberg to store their data.

Metadata

Metadata Data Lake Management Internet of Things

Streamline AWS WAF log analysis with Apache Iceberg and Amazon Data Firehose

AWS Big Data

FEBRUARY 18, 2025

To optimize their security operations, organizations are adopting modern approaches that combine real-time monitoring with scalable data analytics. They are using data lake architectures and Apache Iceberg to efficiently process large volumes of security data while minimizing operational overhead.

Snapshot

Snapshot Optimization Data Lake Metadata
