When we talk about conversational AI, we’re referring to systems designed to hold a conversation, orchestrate workflows, and make decisions in real time. The prompt-and-pray approach is tempting because it’s quick to implement and demos well.
Table of Contents: What is Data Engineering? · Components of Data Engineering · Object Storage · MinIO Install · Data Lake with Buckets Demo · Data Lake Management · Conclusion · References. (Image source: GitHub.) Initially, we have the definition of Software […].
Someone hacks together a quick demo with ChatGPT and LlamaIndex. The system is inconsistent, slow, hallucinating, and that amazing demo starts collecting digital dust. Check out the graph below: see how excitement for traditional software builds steadily while GenAI starts with a flashy demo and then hits a wall of challenges?
For more information, refer to Amazon Redshift clusters. However, if you would like to implement this demo in your existing Amazon Redshift data warehouse, download the Redshift Query Editor v2 notebook and the Redshift Query Profiler demo, and refer to the Data Loading section later in this post. Run cell #12.
For more detailed configuration, refer to Write properties in the Iceberg documentation. Replace the database and table placeholders with your own database and table names, and amzn-s3-demo-bucket with your S3 bucket name. These conflicts are typically transient and can be automatically resolved through retries. The job registers the Iceberg catalog through spark.sql.catalog settings and keeps the latest record per key with groupBy("value").agg(max_("timestamp").alias("timestamp")); a sketch follows.
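A minimal PySpark sketch of that pattern, under stated assumptions: glue_catalog, demo_db, and demo_table are placeholder names, and the Iceberg runtime and AWS bundle jars are assumed to be on the classpath. Only the catalog config plus the deduplicating aggregation from the excerpt are shown.

from pyspark.sql import SparkSession
from pyspark.sql.functions import max as max_

catalog = "glue_catalog"  # hypothetical catalog name
spark = (
    SparkSession.builder
    # Register an Iceberg catalog backed by AWS Glue, with an S3 warehouse path.
    .config(f"spark.sql.catalog.{catalog}", "org.apache.iceberg.spark.SparkCatalog")
    .config(f"spark.sql.catalog.{catalog}.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config(f"spark.sql.catalog.{catalog}.warehouse", "s3://amzn-s3-demo-bucket/warehouse/")
    .getOrCreate()
)

# Keep only the latest record per key, as in the excerpt's aggregation.
df = spark.table(f"{catalog}.demo_db.demo_table")
latest = df.groupBy("value").agg(max_("timestamp").alias("timestamp"))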
Prerequisites: Complete the following prerequisites before setting up the solution: Create a bucket in Amazon S3 named zero-etl-demo-<account-id>-<region> (for example, zero-etl-demo-012345678901-us-east-1). Create an AWS Glue database, such as zero_etl_demo_db, and associate the S3 bucket as a location of the database.
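A hedged boto3 sketch of those two prerequisites; the account ID and Region are the excerpt's example values, and error handling is omitted.

import boto3

account_id = "012345678901"  # example values from the excerpt
region = "us-east-1"
bucket = f"zero-etl-demo-{account_id}-{region}"

# Create the bucket (us-east-1 needs no CreateBucketConfiguration).
boto3.client("s3", region_name=region).create_bucket(Bucket=bucket)

# Create the Glue database and associate the bucket as its location.
boto3.client("glue", region_name=region).create_database(
    DatabaseInput={"Name": "zero_etl_demo_db", "LocationUri": f"s3://{bucket}/"}
)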
During the development of Operational Database and Replication Manager, I kept telling folks across the team it has to be “so simple that a 10-year-old can demo it.” Watch this: Enterprise Software that is so easy a 10-year-old can demo it.
In the following steps, replace amzn-s3-demo-destination-bucket with the name of the S3 bucket. For Role name, enter a role name (for this post, GlueJobRole-demo). On the Job Details tab, under Basic properties, specify the IAM role that the job will use (GlueJobRole-demo). An AWS Glue Data Catalog database. Choose Next.
Contact BladeBridge through Request demo and obtain an Analyzer key for your organization. For more details, refer to the BladeBridge Analyzer Demo. Refer to the BladeBridge documentation for more details on SQL and expression conversion. This line ending can also be replaced with other breakers.
In your Google Cloud project, you’ve enabled the following APIs: Google Analytics API, Google Analytics Admin API, Google Analytics Data API, Google Sheets API, and Google Drive API. For more information, refer to Amazon AppFlow support for Google Sheets. Refer to the Amazon Redshift Database Developer Guide for more details.
The S3 object path can reference a set of folders that have the same key prefix. Automate ingestion from a single data source: with an auto-copy job, you can automate ingestion from a single data source by creating one job and specifying the path to the S3 objects that contain the data. You can drop an auto-copy job with the DROP COPY JOB command, as in the sketch below.
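A hedged sketch of both operations through the Redshift Data API; the table, prefix, IAM role, job, and workgroup names are illustrative, and the COPY ... JOB CREATE ... AUTO ON form should be confirmed against the current auto-copy documentation.

import boto3

rsd = boto3.client("redshift-data")

create_job = """
COPY public.sales
FROM 's3://amzn-s3-demo-bucket/sales/'
IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftCopyRole'
FORMAT AS CSV
JOB CREATE sales_autocopy_job AUTO ON;
"""
rsd.execute_statement(WorkgroupName="sandbox", Database="dev", Sql=create_job)

# Dropping the job stops future automatic ingestion from the prefix.
rsd.execute_statement(WorkgroupName="sandbox", Database="dev",
                      Sql="DROP COPY JOB sales_autocopy_job;")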
Solution demo: To demonstrate the exact match filter solution, we ingested an individual asset loaded from the TPC-DS tables and also created a data product bundling several assets. Refer to the product documentation to learn more about how to set up metadata rules for subscription and publishing workflows.
There may even be someone on your team who built a personalized video recommender before and can help scope and estimate the project requirements using that past experience as a point of reference. An AI pilot project, even one that sounds simple, probably won’t be something you can demo quickly. AI doesn’t fit that model.
We also avoid the implementation details and packaging process of our test data generation application, referred to as the producer. The agent picks up the producer’s .log files and publishes them to a Kinesis Data Firehose delivery stream called kinesis-agent-demo; its configuration sets "firehose.endpoint": "firehose.ap-southeast-2.amazonaws.com" and references the producer container image (…amazonaws.com/producer:latest).
In this last installment, we’ll discuss a demo application that uses PySpark.ML. For more context, this demo is based on concepts discussed in the blog post How to deploy ML models to production. In this demo, half of this training data is stored in HDFS and the other half is stored in an HBase table. Serving the Model.
AWS has invested in native service integration with Apache Hudi and published technical content to enable you to use Apache Hudi with AWS Glue (for example, refer to Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 1: Getting Started).
We use AWS Glue, a fully managed, serverless ETL (extract, transform, and load) service, and the Google BigQuery Connector for AWS Glue (for more information, refer to Migrating data from Google BigQuery to Amazon S3 using AWS Glue custom connectors). If you don’t have one, refer to Amazon Redshift Serverless. An S3 bucket.
Solution overview: In this section, we present the solution architecture for the demo and explain the workflow. The historical application logs are stored in an S3 bucket for reference and for querying purposes. For this demo, MFA is not enabled. For additional details, refer to the Tableau licensing information.
To create your namespace and workgroup, refer to Creating a data warehouse with Amazon Redshift Serverless. For this exercise, name your workgroup sandbox and your namespace adx-demo. To configure Query Editor v2 for your AWS account, refer to Data load made easy and secure in Amazon Redshift using Query Editor V2.
Add a view that references a data lake table to a Redshift datashare: when you create data lake tables that you intend to add to a datashare, the recommended and most common way to do this is to add a view to the datashare that references the data lake table or tables, as in the sketch below.
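A sketch of that pattern via the Redshift Data API, under assumptions: the datashare demo_share exists, the Data Catalog is mounted as awsdatacatalog, and all object names are placeholders. Views over external tables must be late-binding, hence WITH NO SCHEMA BINDING.

import boto3

rsd = boto3.client("redshift-data")
statements = [
    # Late-binding view over a data lake table in the mounted Data Catalog.
    """CREATE VIEW public.sales_lake_v AS
       SELECT * FROM awsdatacatalog.lakedb.sales
       WITH NO SCHEMA BINDING;""",
    "ALTER DATASHARE demo_share ADD SCHEMA public;",
    # Views are added to a datashare with the same ADD TABLE clause as tables.
    "ALTER DATASHARE demo_share ADD TABLE public.sales_lake_v;",
]
for sql in statements:
    rsd.execute_statement(WorkgroupName="sandbox", Database="dev", Sql=sql)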
Refer to IAM Identity Center identity source tutorials for the IdP setup. For more details, refer to Creating a workgroup with a namespace. Refer to Authorization servers for more information about authorization servers in Okta. For more information, refer to the CreateTokenWithIAM API reference.
For additional information about roles, refer to Requirements for roles used to register locations. Refer to Registering an encrypted Amazon S3 location for guidance. For Target database, enter lf-demo-db. In the Athena query editor, run the following SELECT query on the shared table: SELECT * FROM "lf-demo-db"."consumer_iceberg"
As shown in the following reference architecture, DynamoDB table data changes are streamed into Amazon Redshift through Kinesis Data Streams and Amazon Redshift streaming ingestion for near-real-time analytics dashboard visualization using Amazon QuickSight. For instructions, refer to Create a sample Amazon Redshift cluster.
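A hedged sketch of the streaming-ingestion half of that architecture; the stream name, IAM role, and workgroup are placeholders, and the exact payload-parsing expression is worth checking against the streaming ingestion documentation.

import boto3

rsd = boto3.client("redshift-data")
statements = [
    # Map the Kinesis data stream into Redshift via an external schema.
    """CREATE EXTERNAL SCHEMA kds
       FROM KINESIS
       IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftStreamingRole';""",
    # The auto-refreshing materialized view is what the QuickSight dashboard queries.
    """CREATE MATERIALIZED VIEW ddb_changes_mv AUTO REFRESH YES AS
       SELECT approximate_arrival_timestamp,
              json_parse(kinesis_data) AS payload
       FROM kds."ddb-cdc-stream";""",
]
for sql in statements:
    rsd.execute_statement(WorkgroupName="sandbox", Database="dev", Sql=sql)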
For more information, refer to Amazon Redshift adds new AI capabilities, including Amazon Q, to boost efficiency and productivity. Refer to Managing IAM roles created for a cluster using the console for instructions. Set up an AWS Identity and Access Management (IAM) role as the default IAM role.
Preprocessing refers to mitigation methods applied to the training dataset before a model is trained on it; altering weights on rows of the data to achieve greater parity in assigned outcomes is one example. In-processing refers to mitigation techniques incorporated into the model training process itself.
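As a toy illustration of that row-reweighing idea (not DataRobot's implementation), the sketch below computes per-row weights that make group membership and outcome look independent under the weighted distribution; the column names are hypothetical.

import pandas as pd

def reweigh(df: pd.DataFrame, group_col: str, label_col: str) -> pd.Series:
    # w(g, y) = P(g) * P(y) / P(g, y): cells rarer than independence predicts
    # get weights above 1, pushing outcome rates toward parity across groups.
    p_group = df[group_col].value_counts(normalize=True)
    p_label = df[label_col].value_counts(normalize=True)
    p_joint = df.groupby([group_col, label_col]).size() / len(df)
    return df.apply(
        lambda r: p_group[r[group_col]] * p_label[r[label_col]]
        / p_joint[(r[group_col], r[label_col])],
        axis=1,
    )

# Example: df["sample_weight"] = reweigh(df, "gender", "approved")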
Second, while OpenAI’s GPT-4 announcement last March demoed generating website code from a hand-drawn sketch, that capability wasn’t available until after the survey closed. Third, while roughing out the HTML and JavaScript for a simple website makes a great demo, that isn’t really the problem web designers need to solve.
Before proceeding with the demo, create a folder named custdata under the created S3 bucket. For Data stream name, enter demo-data-stream, then choose Create data stream. Wait for demo-data-stream to be created and reach Active status, then select it.
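The same console steps can be scripted; a minimal boto3 sketch, with the bucket name as a placeholder:

import boto3

kinesis = boto3.client("kinesis")
kinesis.create_stream(StreamName="demo-data-stream", ShardCount=1)

# Block until demo-data-stream reaches Active status.
kinesis.get_waiter("stream_exists").wait(StreamName="demo-data-stream")

# A zero-byte key with a trailing slash acts as the custdata "folder".
boto3.client("s3").put_object(Bucket="amzn-s3-demo-bucket", Key="custdata/")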
Streaming data refers to data that is continuously generated from a variety of sources. For instructions, refer to the following: generate the private key; generate a public key; store the private and public keys securely; assign the public key to a Snowflake user; verify the user’s public key fingerprint. You also need an S3 bucket for error logging.
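A hedged sketch of the key-generation steps using the cryptography package rather than the openssl CLI the instructions likely use; Snowflake expects a PKCS#8 private key and a SubjectPublicKeyInfo public key.

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

private_pem = key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    # Unencrypted for brevity; store it securely (prefer an encrypted key).
    encryption_algorithm=serialization.NoEncryption(),
)
public_pem = key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)
# The public key body (PEM minus header/footer) is what gets assigned, e.g.:
# ALTER USER demo_user SET RSA_PUBLIC_KEY='MIIBIjANBgkq...';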
Refer to Setting up roles and users in Amazon OpenSearch Ingestion to get more details on roles and permissions required to use OpenSearch Ingestion. In the demo, you use the AWS Cloud9 EC2 instance profile’s credentials to sign requests sent to OpenSearch Ingestion. For this post, select Public access under Network.
This effect is referred to as operational transparency. The research participants also reported more willingness to pay for the services, a perception of higher quality, and a greater likelihood to use the site again. The post Humans and AI: Should We Describe AI as Autonomous? appeared first on DataRobot.
Accuracy refers to a subset of model performance indicators that measure a model’s aggregated errors in different ways. Speed, in the context of model performance, refers to the time it takes to use a model to score a prediction. The post How to Build Trust in AI appeared first on DataRobot.
“Donna and other executive leaders were deeply involved in our strategy, vision, and execution, including participating in early product demos and weekly check-ins,” says Peterson, noting that their involvement was critical to maintaining focus and removing roadblocks to execution.
For an active-active setup, refer to Create an active-active setup using MSK Replicator. However, for the purpose of the demo, we are using console producer and consumers, so our clients are already stopped. For more information, refer to What is Amazon MSK Replicator?
Bard: Google’s code name for its chat-oriented search engine, based on their LaMDA model, and only demoed once in public. There’s a very important difference between these two almost identical sentences: in the first, “it” refers to the cup. In the second, “it” refers to the pitcher. These are questions we can’t not answer.
To learn more about semantic search and cross-modal search and experiment with a demo of the Compare Search Results tool, refer to Try semantic search with the Amazon OpenSearch Service vector engine. To learn more, refer to Byte-quantized vectors in OpenSearch.
If my explanation above is the correct interpretation of the high percentage, and if the statement refers to successfully deployed applications (i.e., […]). A similarly high percentage of tabular data usage among data scientists was mentioned here.
Traditional batch ingestion and processing pipelines that involve operations such as data cleaning and joining with reference data are straightforward to create and cost-efficient to maintain. Solution overview: For our example use case, streaming data is coming through Amazon Kinesis Data Streams, and reference data is managed in MySQL.
To learn more about auto-mounting of the Data Catalog in Amazon Redshift, refer to Querying the AWS Glue Data Catalog. For this post, we add full AWS Glue, Amazon Redshift, and Amazon S3 permissions for demo purposes. For more information, refer to Changing the default settings for your data lake.
If you don’t have one, refer to How do I create and activate a new AWS account? If you’re new to Amazon DataZone, refer to Getting started. To understand how to associate multiple accounts and consume the subscribed assets using Amazon Athena , refer to Working with associated accounts to publish and consume data.
Check every free trial and demo to make sure you have covered all your bases. This refers to everything from having a great SEO app to meta titles, descriptions, and social images. Then sign up and take free tours and demos to be sure your user experience matches your company vision. Some AI tools make it easier to develop them.
If you’re new to Amazon DataZone, refer to Getting started. Otherwise, refer to Create domains for instructions to set up a domain. For instructions, refer to Request association with other AWS accounts. If you don’t already have an existing AWS Glue database setup, refer to Create a database.
The challenge with this approach is that companies end up in what we refer to as the ‘digital trap.’ Although Young “talked to some people” before hiring the provider, he acknowledges that officials could have dug deeper and found people the company didn’t refer them to for references.
In recognition of the diverse workload that data scientists face, Cloudera’s library of Applied ML Prototypes (AMPs) provides data scientists with pre-built reference examples and end-to-end solutions, using some of the most cutting-edge ML methods, for a variety of common data science projects.
Data engineers define dbt models for their data representations. To learn more, refer to About dbt models, and to Materializations and Incremental models. Install dbt and the dbt CLI with the following command: $ pip3 install --no-cache-dir dbt-core For more information, refer to How to install dbt and What is dbt?
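As one way to picture an incremental materialization, here is a sketch in dbt's Python-model form (SQL models are the more common route); the source and column names are assumptions.

# models/orders_incr.py -- hypothetical dbt Python model
def model(dbt, session):
    dbt.config(materialized="incremental")

    orders = dbt.source("shop", "raw_orders")  # assumed source definition

    if dbt.is_incremental:
        # Only process rows newer than what this model's table already holds.
        max_ts = session.sql(f"select max(updated_at) from {dbt.this}").collect()[0][0]
        orders = orders.filter(orders.updated_at > max_ts)

    return orders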