Data Leaders Brief

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

This post explores how to start using Delta Lake UniForm on Amazon Web Services (AWS). Note that the extra package (delta-iceberg) is required to create a UniForm table in AWS Glue Data Catalog. Amazon S3 and AWS Glue Data Catalog: These are used to manage the underlying files and the catalog of the Delta Lake UniForm table.

Metadata

Metadata Data Warehouse Big Data Data Lake

Incremental refresh for Amazon Redshift materialized views on data lake tables

AWS Big Data

NOVEMBER 8, 2024

You can use Amazon Redshift to analyze structured and semi-structured data and seamlessly query data lakes and operational databases, using AWS-designed hardware and automated machine learning (ML)-based tuning to deliver top-tier price performance at scale. Create a materialized view using the external schema.

Data Lake

Data Lake Data Warehouse Optimization Testing

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on. These examples use synthetic datasets created in AWS Glue and Amazon S3. Table metadata is fetched from AWS Glue.

Metadata

Metadata Data Lake Modeling Data Warehouse

Webinars

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Write queries faster with Amazon Q generative SQL for Amazon Redshift

AWS Big Data

NOVEMBER 7, 2024

Amazon Q generative SQL for Amazon Redshift was launched in preview during AWS re:Invent 2023. Your content processed by generative SQL is not stored or used by AWS for service improvement. Xiao Qin is a senior applied scientist with the Learned Systems Group (LSG) at Amazon Web Services (AWS).

Metadata

Metadata Sales Data Warehouse Optimization

Federate to Amazon Redshift Query Editor v2 with Microsoft Entra ID

AWS Big Data

DECEMBER 10, 2024

To interact with and analyze data stored in Amazon Redshift, AWS provides the Amazon Redshift Query Editor V2, a web-based tool that allows you to explore, analyze, and share data using SQL. The browser automatically submits this SAML assertion, sending an HTTP POST to the AWS SAML endpoint.

Sales

Sales Metadata Enterprise Testing

2021 Data/AI Salary Survey

O'Reilly on Data

SEPTEMBER 15, 2021

Cloud certifications, specifically in AWS and Microsoft Azure, were most strongly associated with salary increases. As we’ll see later, cloud certifications (specifically in AWS and Microsoft Azure) were the most popular and appeared to have the largest effect on salaries. The top certification was for AWS (3.9%

Machine Learning

Machine Learning Statistics Reporting Consulting

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

AWS Big Data

NOVEMBER 27, 2024

To implement this solution, complete the following steps: Set up Zero-ETL integration from the AWS Management Console for Amazon Relational Database Service (Amazon RDS). An AWS Identity and Access Management (IAM) user with sufficient permissions to interact with the AWS Management Console and related AWS services.

Data Warehouse

Data Warehouse Analytics Testing Sales

MLOps and DevOps: Why Data Makes It Different

O'Reilly on Data

OCTOBER 19, 2021

This is both frustrating for companies that would prefer making ML an ordinary, fuss-free value-generating function like software engineering, as well as exciting for vendors who see the opportunity to create buzz around a new category of enterprise software. The new category is often called MLOps. Software Development Layers.

IT

IT Testing Experimentation Software

Implement historical record lookup and Slowly Changing Dimensions Type-2 using Apache Iceberg

AWS Big Data

DECEMBER 9, 2024

Prerequisites: The following prerequisites are required for the use cases: An active AWS Account that provides access to AWS Glue, Amazon Simple Storage Service (Amazon S3) and AWS CloudFormation. Permissions to create and deploy AWS CloudFormation stacks. aws-bundle Jar. Open AWS Glue Studio console.

Snapshot

Snapshot Data Warehouse Data Lake Data Quality

The future of data: A 5-pillar approach to modern data management

CIO Business Intelligence

DECEMBER 11, 2024

Organizations must decide on their hosting provider, whether it be an on-prem setup, cloud solutions like AWS, GCP, Azure or specialized data platform providers such as Snowflake and Databricks. They must also select the data processing frameworks such as Spark, Beam or SQL-based processing and choose tools for ML.

Management

Management Data Governance Data Science Reporting

Build a secure data visualization application using the Amazon Redshift Data API with AWS IAM Identity Center

AWS Big Data

MARCH 6, 2025

In this post, we dive into the newly released feature of Amazon Redshift Data API support for SSO, Amazon Redshift RBAC for row-level security (RLS) and column-level security (CLS), and trusted identity propagation with AWS IAM Identity Center to let corporate identities connect to AWS services securely.

Visualization

Visualization Sales Data Warehouse Management

Empower financial analytics by creating structured knowledge bases using Amazon Bedrock and Amazon Redshift

AWS Big Data

MAY 20, 2025

Prerequisites: Make sure you meet the following prerequisites: Have an AWS account. Create an AWS Identity and Access Management (IAM) role. If you're creating an Amazon Bedrock knowledge base through the AWS Management Console, you can skip the service role setup mentioned in the previous section. Choose Test.

Structured Data

Structured Data Data Warehouse Analytics Finance

5 key areas for tech leaders to watch in 2020

O'Reilly on Data

FEBRUARY 18, 2020

Starting with data engineering, the backbone of all data work (the category includes titles covering data management, i.e., relational databases, Spark, Hadoop, SQL, NoSQL, etc.). This slowdown suggests that cloud as a category has achieved such a large share that (mathematically) any additional growth must occur at a slower rate.

Data-driven

Data-driven Software Statistics Marketing

Simplify your query performance diagnostics in Amazon Redshift with Query profiler

AWS Big Data

OCTOBER 23, 2024

The performance data you can use on the Amazon Redshift console falls into two categories: Amazon CloudWatch metrics – Helps you monitor the physical aspects of your cluster or serverless, such as resource utilization, latency, and throughput. Ekta Ahuja is an Amazon Redshift Specialist Solutions Architect at AWS.

Data Warehouse

Data Warehouse Metrics Broadcasting Dashboards

Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge

AWS Big Data

NOVEMBER 7, 2024

Amazon Redshift is a fully managed data warehouse service offered by Amazon Web Services (AWS). The conversion rules in BladeBridge’s configuration file fall into one of three categories: line substitution, block substitution, and function substitution. Every line ending with a ; is a statement.

Data Warehouse

Data Warehouse Reporting Big Data Data Lake

AWS revenue growth stabilizes with a boost from generative AI-led services

CIO Business Intelligence

OCTOBER 27, 2023

AWS posted a stable 12% revenue growth in the third quarter of 2023 buoyed by demand for generative AI-led services, despite customers trying to optimize their cloud spending. For the last few sequential quarters, revenue growth for AWS has been on a constant decline. AWS posted revenue of $23.06

Optimization

Optimization Reporting Management IT

Enhance data security with fine-grained access controls in Amazon DataZone

AWS Big Data

JULY 2, 2024

The row and column asset filters in Amazon DataZone enable you to control who can access what using a consistent, business user-friendly mechanism for all of your data across AWS data lakes and data warehouses. The customer has multiple product categories, each operated by different divisions of the company.

Sales

Sales Data Lake Publishing Data Warehouse

The DataOps Vendor Landscape, 2021

DataKitchen

APRIL 13, 2021

Because it is such a new category, both overly narrow and overly broad definitions of DataOps abound. AWS Code Deploy. AWS Code Pipeline. AWS Code Commit – A fully-managed source control service that hosts secure Git-based repositories. To date, we count over 100 companies in the DataOps ecosystem. Azure DevOps.

Testing

Testing Machine Learning Consulting Data Science

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

AWS Big Data

MARCH 4, 2024

This post demonstrates how you can harness Iceberg, Amazon Simple Storage Service (Amazon S3), AWS Glue, AWS Lake Formation, and AWS Identity and Access Management (IAM) to implement a transactional data lake supporting seamless evolution. Merge the data from the Dropzone location into Iceberg using AWS Glue.

Snapshot

Snapshot Data Lake Metadata Recreation/Entertainment

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

AWS Big Data

OCTOBER 10, 2023

In this post, we showcase how to use AWS Glue with AWS Glue Data Quality, sensitive data detection transforms, and AWS Lake Formation tag-based access control to automate data governance. We use AWS CloudFormation to provision the resources. This gets tedious and delays the data adoption across the enterprise.

Data Quality

Data Quality Data Governance Data Lake Testing

Access Amazon Redshift data from Salesforce Data Cloud with Zero Copy Data Federation

AWS Big Data

JUNE 25, 2024

Retrieve the Redshift endpoint by navigating to the Redshift Serverless or provisioned cluster in the AWS console. Configure the object, category, primary key, and fields: Set the object name and object API name. Set the category to specify the type of data to ingest. For more information, see Category.

Data Lake

Data Lake Cost-Benefit Data-driven Data Warehouse

Simplify data transfer: Google BigQuery to Amazon S3 using Amazon AppFlow

AWS Big Data

OCTOBER 5, 2023

Amazon AppFlow, a fully managed data integration service, has been at the forefront of streamlining data transfer between AWS services, software as a service (SaaS) applications, and now Google BigQuery. Next, provide AWS Glue Data Catalog settings to create a table for further analysis. Choose Create bucket.

Data Warehouse

Data Warehouse Machine Learning Data Integration Data-driven

How the BMW Group analyses semiconductor demand with AWS Glue

AWS Big Data

APRIL 26, 2023

In 2019, the BMW Group decided to re-architect and move its on-premises data lake to the AWS Cloud to enable data-driven innovation while scaling with the dynamic needs of the organization. To learn more about the Cloud Data Hub, refer to BMW Group Uses AWS-Based Data Lake to Unlock the Power of Data.

Forecasting

Forecasting Manufacturing Data Lake Big Data

Enhance data security and governance for Amazon Redshift Spectrum with VPC endpoints

AWS Big Data

FEBRUARY 16, 2024

Redshift Spectrum uses the AWS Glue Data Catalog as a Hive metastore. AWS Lake Formation offers a straightforward and centralized approach to access management for S3 data sources. Lake Formation uses the AWS Glue Data Catalog to provide access control for Amazon S3. Lake Formation interface endpoint. Amazon S3 gateway endpoint.

Data Lake

Data Lake Data Warehouse Testing Business Objectives

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

APRIL 24, 2023

In 2022, we announced that you can enforce fine-grained access control policies using AWS Lake Formation and query data stored in any supported file format using table formats such as Apache Iceberg, Apache Hudi, and more using Amazon Athena queries. An AWS Glue crawler is integrated on top of S3 buckets to automatically detect the schema.

Data Lake

Data Lake Data Governance Machine Learning Cost-Benefit

Automatically detect Personally Identifiable Information in Amazon Redshift using AWS Glue

AWS Big Data

DECEMBER 15, 2023

In this post, we provide an automated solution to detect PII data in Amazon Redshift using AWS Glue. AWS Glue is a serverless data integration service that makes it straightforward to discover, prepare, and combine data for analytics, ML, and application development. Run an AWS Glue job to detect the PII data.

Data Lake

Data Lake Data Warehouse Big Data Structured Data

Measure performance of AWS Glue Data Quality for ETL pipelines

AWS Big Data

MARCH 12, 2024

AWS Glue Data Quality reduces the effort required to validate data from days to hours, and provides computing recommendations, statistics, and insights about the resources required to run data validation. This post is Part 6 of a six-part series of posts to explain how AWS Glue Data Quality works.

Data Quality

Data Quality Measurement Testing Visualization

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

In Part 2 of this series, we discussed how to enable AWS Glue job observability metrics and integrate them with Grafana for real-time monitoring. In this post, we explore how to connect QuickSight to Amazon CloudWatch metrics and build graphs to uncover trends in AWS Glue job observability metrics.

Metrics

Metrics Visualization Dashboards Publishing

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Tracking data changes and rollback. Build your transactional data lake on AWS: You can build your modern data architecture with a scalable data lake that integrates seamlessly with an Amazon Redshift powered cloud warehouse. The Iceberg table is synced with the AWS Glue Data Catalog.

Data Lake

Data Lake Sales Data Warehouse Snapshot

The rise of the data lakehouse: A new era of data value

CIO Business Intelligence

AUGUST 18, 2022

The result is an emerging paradigm shift in how enterprises surface insights, one that sees them leaning on a new category of technology architected to help organizations maximize the value of their data. Moonfare selected Dremio in a proof-of-concept runoff with AWS Athena, an interactive query service that enables SQL queries on S3 data.

Data Lake

Data Lake Data Warehouse Unstructured Data Business Intelligence

Introducing Apache Hudi support with AWS Glue crawlers

AWS Big Data

NOVEMBER 22, 2023

Many AWS customers adopted Apache Hudi on their data lakes built on top of Amazon S3 using AWS Glue, a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.

Data Lake

Data Lake Snapshot Metadata Optimization

The ChatGPT Surge

O'Reilly on Data

AUGUST 8, 2023

At its peak, ChatGPT was in very exclusive company: it’s not quite on the level of Python, Kubernetes, and Java, but it’s in the mix with AWS and React, and significantly ahead of Docker. Although large language models clearly fall into the category of NLP, we suspect that most users associate NLP with older approaches to building chatbots.

Machine Learning

Machine Learning Modeling Software IT

Automate data loading from your database into Amazon Redshift using AWS Database Migration Service (DMS), AWS Step Functions, and the Redshift Data API

AWS Big Data

JULY 2, 2024

A full load is performed from SQL Server to Amazon Redshift using AWS Database Migration Service (AWS DMS). When Amazon EventBridge receives a full load completion notification from AWS DMS, ETL processes are run on Amazon Redshift to process data. AWS Step Functions is used to orchestrate this ETL pipeline.

Data Warehouse

Data Warehouse Sales Testing Big Data

AI for 3D Generative Design

Insight

MARCH 20, 2020

Specifically, I wanted to be able to generate objects from at least 10 different categories (the papers below capture only 2–3) and I wanted to develop the model architecture with the capacity to extend to unlabelled 3D shape data. From the 24 categories in PartNet I narrowed it down to 11 categories to use for my project.

Interactive

Interactive Modeling Machine Learning Testing

Ingest, transform, and deliver events published by Amazon Security Lake to Amazon OpenSearch Service

AWS Big Data

JUNE 19, 2023

With OCSF support, the service can normalize and combine security data from AWS and a broad range of enterprise security data sources. We also walk you through how to use a series of prebuilt visualizations to view events across multiple AWS data sources provided by Security Lake.

Publishing

Publishing Dashboards Visualization Management

How Zurich Insurance Group built a log management solution on AWS

AWS Big Data

JULY 16, 2024

In 2022, Zurich began a multi-year program to accelerate their digital transformation and innovation through the migration of 1,000 applications to AWS, including core insurance and SAP workloads. In this post, we discuss how Zurich built a hybrid architecture on AWS incorporating AWS services to satisfy their requirements.

Insurance

Insurance Management Cost-Benefit Optimization

Differentiate generative AI applications with your data using AWS analytics and managed databases

AWS Big Data

SEPTEMBER 12, 2024

The following figure summarizes the AWS services available to support the solution framework described so far. Application logic is currently implemented as a container, but it can be deployed with AWS Lambda as required. The catalog frontend application sends the user search to the generative AI application.

Management

Management Analytics Data Lake Interactive

Cost monitoring for Amazon EMR on Amazon EKS

AWS Big Data

JUNE 9, 2023

It also provides a wide variety of job submission methods, like an AWS API called StartJobRun, or through a declarative way with a Kubernetes controller through the AWS Controllers for Kubernetes for Amazon EMR on EKS. The supporting infrastructure for CUR is deployed as defined in Setting up Athena using AWS CloudFormation templates.

Reporting

Reporting Optimization Measurement Digital Transformation

Visualize Amazon DynamoDB insights in Amazon QuickSight using the Amazon Athena DynamoDB connector and AWS Glue

AWS Big Data

NOVEMBER 17, 2023

Table metadata, such as column names and data types, is stored using the AWS Glue Data Catalog. The Athena DynamoDB connector runs in a pre-built, serverless AWS Lambda function. AWS Glue provides supplemental metadata from the DynamoDB table. Solution overview The following diagram illustrates the solution architecture.

Visualization

Visualization Metadata Testing Internet of Things

Copy and mask PII between Amazon RDS databases using visual ETL jobs in AWS Glue Studio

AWS Big Data

AUGUST 26, 2024

You can use AWS Glue Studio to set up data replication and mask PII with no coding required. AWS Glue Studio visual editor provides a low-code graphic environment to build, run, and monitor extract, transform, and load (ETL) scripts. Behind the scenes, AWS Glue handles underlying resource provisioning, job monitoring, and retries.

Visualization

Visualization Metadata Data Transformation Testing

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

In this post, we delve into a case study for a retail use case, exploring how the Data Build Tool (dbt) was used effectively within an AWS environment to build a high-performing, efficient, and modern data platform. Use case The Enterprise Data Analytics group of a large jewelry retailer embarked on their cloud journey with AWS in 2021.

Data Warehouse

Data Warehouse Testing Data Quality Reporting

Advanced reporting and analytics for the Post Call Analytics (PCA) solution with Amazon QuickSight

AWS Big Data

JANUARY 27, 2023

The Post Call Analytics (PCA) solution uses AWS machine learning (ML) services like Amazon Transcribe and Amazon Comprehend to extract insights from contact center call audio recordings uploaded after the call, or from integration with our companion Live Call Analytics (LCA) solution. You can apply data and agent filters for targeted search.

Analytics

Analytics Reporting Dashboards Visualization

Cloud Data Science News – Beta 8

Data Science 101

DECEMBER 27, 2019

AWS Deep Learning Containers now support TensorFlow 2.0: AWS Deep Learning Containers are Docker images which are preconfigured for deep learning tasks. Build a custom classifier using AWS Comprehend: AWS Comprehend is a Natural Language Processing (NLP) service. Here are the few bits of information I could find.

Data Science

Data Science Deep Learning Data-driven IT

AWS adds machine learning capabilities to Amazon Connect

CIO Business Intelligence

NOVEMBER 29, 2022

In a bid to help enterprises offer better customer service and experience, Amazon Web Services (AWS) on Tuesday, at its annual re:Invent conference, said that it was adding new machine learning capabilities to its cloud-based contact center service, Amazon Connect.

Machine Learning

Machine Learning Forecasting Recreation/Entertainment Metrics
