Data Processing, Data Warehouse and Reference

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

AWS Big Data

NOVEMBER 27, 2024

While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis.

Data Warehouse

Data Warehouse Analytics Testing Modeling

Migrate a petabyte-scale data warehouse from Actian Vectorwise to Amazon Redshift

AWS Big Data

MAY 30, 2024

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. The system had an integration with legacy backend services that were all hosted on premises. The downside here is over-provisioning.

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Structured Data

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

AWS Big Data

DECEMBER 20, 2024

You can now generate data integration jobs for various data sources and destinations, including Amazon Simple Storage Service (Amazon S3) data lakes with popular file formats like CSV, JSON, and Parquet, as well as modern table formats such as Apache Hudi, Delta, and Apache Iceberg.

Data Integration

Data Integration Visualization Data Processing Data Lake

Webinars

How to Achieve High-Accuracy Results When Using LLMs

Accelerate your data warehouse migration to Amazon Redshift – Part 7

AWS Big Data

OCTOBER 17, 2023

With Amazon Redshift, you can use standard SQL to query data across your data warehouse, operational data stores, and data lake. Migrating a data warehouse can be complex. You have to migrate terabytes or petabytes of data from your legacy system while not disrupting your production workload.

Data Warehouse

Data Warehouse Data Processing Data Lake Management

Introduction To The Basic Business Intelligence Concepts

datapine

MAY 9, 2019

Business intelligence concepts refer to the usage of digital computing technologies in the form of data warehouses, analytics and visualization with the aim of identifying and analyzing essential business-based data to generate new, actionable corporate insights. The data warehouse. 1) The raw data.

Business Intelligence

Business Intelligence Dashboards Data Warehouse Visualization

Take Your SQL Skills To The Next Level With These Popular SQL Books

datapine

SEPTEMBER 27, 2022

With a MySQL dashboard builder, for example, you can connect all the data with a few clicks. A host of notable brands and retailers with colossal inventories and multiple site pages use SQL to enhance their site’s structure, functionality, and MySQL reporting processes. Would highly recommend for SQL experts.”

Business Intelligence

Business Intelligence Data Warehouse Data Processing Data mining

Automate deployment of an Amazon QuickSight analysis connecting to an Amazon Redshift data warehouse with an AWS CloudFormation template

AWS Big Data

FEBRUARY 16, 2023

Amazon Redshift is the most widely used data warehouse in the cloud, best suited for analyzing exabytes of data and running complex analytical queries. Amazon QuickSight is a fast business analytics service to build visualizations, perform ad hoc analysis, and quickly get business insights from your data.

Data Warehouse

Data Warehouse Sales Visualization Data Processing

How Will The Cloud Impact Data Warehousing Technologies?

Smart Data Collective

APRIL 8, 2020

Dating back to the 1970s, the data warehousing market emerged when computer scientist Bill Inmon first coined the term ‘data warehouse’. Created as on-premises servers, the early data warehouses were built to perform on just a gigabyte scale.

Technology

Technology Data Warehouse Big Data Machine Learning

Implement data warehousing solution using dbt on Amazon Redshift

AWS Big Data

NOVEMBER 17, 2023

For more information, refer to SQL models. Seeds – These are CSV files in your dbt project (typically in your seeds directory), which dbt can load into your data warehouse using the dbt seed command. During the run, dbt creates a Directed Acyclic Graph (DAG) based on the internal references between the dbt components.
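As a rough illustration of how a DAG built from internal references yields a run order, here is a minimal sketch using Python's standard-library graphlib. It is not dbt's implementation; the model names and the `references` mapping are hypothetical and chosen only for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical components, each mapped to the components it references.
references = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "daily_revenue": {"orders_enriched"},
}


def run_order(refs):
    """Return an order in which every component comes after its references."""
    # TopologicalSorter raises graphlib.CycleError if the references form a cycle.
    return list(TopologicalSorter(refs).static_order())


order = run_order(references)
print(order)
```

The two staging components have no references, so their relative position may vary. The only guarantee is that each component appears after everything it refers to.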

Snapshot

Snapshot Data Processing Testing Data Warehouse

Implement disaster recovery with Amazon Redshift

AWS Big Data

JUNE 27, 2024

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This enables you to use your data to acquire new insights for your business and customers. For additional details, refer to Automated snapshots.

Snapshot

Snapshot Data Warehouse Data Processing Strategy

Find the best Amazon Redshift configuration for your workload using Redshift Test Drive

AWS Big Data

JULY 27, 2023

Amazon Redshift is a widely used, fully managed, petabyte-scale cloud data warehouse. Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads. Amazon Redshift RA3 with managed storage is the newest instance type for Provisioned clusters.

Testing

Testing Data Warehouse Data Processing Snapshot

Amazon DocumentDB zero-ETL integration with Amazon OpenSearch Service is now available

AWS Big Data

MAY 16, 2024

This will be used temporarily to hold the data from Amazon DocumentDB for data synchronization. OpenSearch hosts – Provide the OpenSearch Service domain endpoint for the host and provide the preferred index name to store the data. He has worked with building databases and data warehouse solutions for over 15 years.

Data Processing

Data Processing Data Warehouse Management Dashboards

Build a secure data visualization application using the Amazon Redshift Data API with AWS IAM Identity Center

AWS Big Data

MARCH 6, 2025

Tens of thousands of customers use Amazon Redshift for modern data analytics at scale, delivering up to three times better price-performance and seven times better throughput than other cloud data warehouses. Refer to IAM Identity Center identity source tutorials for the IdP setup. IAM Identity Center enabled.

Visualization

Visualization Sales Data Warehouse Management

Resolve private DNS hostnames for Amazon MSK Connect

AWS Big Data

OCTOBER 20, 2023

The connectors were only able to reference hostnames in the connector configuration or plugin that are publicly resolvable and couldn’t resolve private hostnames defined in either a private hosted zone or use DNS servers in another customer network. For instructions, refer to create key-pair here.

Data Processing

Data Processing Snapshot Data Warehouse Management

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

One of the key challenges in modern big data management is facilitating efficient data sharing and access control across multiple EMR clusters. Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. The producer account will host the EMR cluster and S3 buckets.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

Migrate Microsoft Azure Synapse Analytics to Amazon Redshift using AWS SCT

AWS Big Data

OCTOBER 18, 2023

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that provides the flexibility to use provisioned or serverless compute for your analytical workloads. You can get faster insights without spending valuable time managing your data warehouse. Fault tolerance is built in.

Analytics

Analytics Data Warehouse Dashboards Testing

Orchestrate an end-to-end ETL pipeline using Amazon S3, AWS Glue, and Amazon Redshift Serverless with Amazon MWAA

AWS Big Data

APRIL 25, 2024

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that you can use to set up and operate data pipelines in the cloud at scale. Apache Airflow is an open source tool used to programmatically author, schedule, and monitor sequences of processes and tasks, referred to as workflows.

Metadata

Metadata Data Processing Management Testing

Break data silos and stream your CDC data with Amazon Redshift streaming and Amazon MSK

AWS Big Data

DECEMBER 13, 2023

A CDC-based approach captures the data changes and makes them available in data warehouses for further analytics in real-time. usually a data warehouse) needs to reflect those changes in near real-time. This post showcases how to use streaming ingestion to bring data to Amazon Redshift.
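To make the idea of replaying captured changes on a target concrete, here is a minimal in-memory sketch. It assumes a hypothetical event shape (`op`, `key`, `row`) and uses a plain dictionary as the target table; it does not use Amazon Redshift streaming ingestion or Amazon MSK.

```python
def apply_changes(table, events):
    """Apply ordered change events to a copy of `table`, keyed by primary key."""
    result = dict(table)
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Merge changed columns into the existing row, or create the row.
            result[key] = {**result.get(key, {}), **event["row"]}
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


# Hypothetical target table and captured changes.
target = {1: {"status": "new", "amount": 10}}
changes = [
    {"op": "update", "key": 1, "row": {"status": "shipped"}},
    {"op": "insert", "key": 2, "row": {"status": "new", "amount": 25}},
    {"op": "delete", "key": 1},
]
current = apply_changes(target, changes)
print(current)
```

Events must be applied in the order they were captured. Replaying them out of order could, for example, resurrect a deleted row.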

Data Warehouse

Data Warehouse Snapshot Data Processing Internet of Things

Amazon Redshift data ingestion options

AWS Big Data

SEPTEMBER 5, 2024

The currently available choices include: The Amazon Redshift COPY command can load data from Amazon Simple Storage Service (Amazon S3), Amazon EMR, Amazon DynamoDB, or remote hosts over SSH. This native feature of Amazon Redshift uses massive parallel processing (MPP) to load objects directly from data sources into Redshift tables.

IoT

IoT Data Warehouse Cost-Benefit Reporting

Evaluating sample Amazon Redshift data sharing architecture using Redshift Test Drive and advanced SQL analysis

AWS Big Data

SEPTEMBER 10, 2024

With the launch of Amazon Redshift Serverless and the various provisioned instance deployment options, customers are looking for tools that help them determine the optimal data warehouse configuration to support their Amazon Redshift workloads. For guidance, refer to Authoring and running notebooks.

Testing

Testing Snapshot Data Warehouse Metrics

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

Large-scale data warehouse migration to the cloud is a complex and challenging endeavor that many organizations undertake to modernize their data infrastructure, enhance data management capabilities, and unlock new business opportunities. This makes sure the new data platform can meet current and future business goals.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

Federate to Amazon Redshift Query Editor v2 with Microsoft Entra ID

AWS Big Data

DECEMBER 10, 2024

Amazon Redshift is a fast, petabyte-scale, cloud data warehouse that tens of thousands of customers rely on to power their analytics workloads. With its massively parallel processing (MPP) architecture and columnar data storage, Amazon Redshift delivers high price-performance for complex analytical queries against large datasets.

Sales

Sales Metadata Enterprise Testing

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

Reporting being part of an effective DQM, we will also go through some data quality metrics examples you can use to assess your efforts in the matter. But first, let’s define what data quality actually is. What is the definition of data quality? Why Do You Need Data Quality Management?

Data Quality

Data Quality Metrics Data-driven Management

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

It also makes it easier for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization to discover, use, and collaborate to derive data-driven insights. If you’d like to learn more about other workflows in this solution, please refer to the implementation guide.

Metadata

Metadata Data Lake Data Processing Data-driven

What is business intelligence? Transforming data into business insights

CIO Business Intelligence

JANUARY 20, 2023

BI tools access and analyze data sets and present analytical findings in reports, summaries, dashboards, graphs, charts, and maps to provide users with detailed intelligence about the state of the business. Benefits of BI BI helps business decision-makers get the information they need to make informed decisions.

Business Intelligence

Business Intelligence Dashboards Data mining OLAP

Perform secure database write-backs with Amazon QuickSight

AWS Big Data

MAY 10, 2023

A write-back is the ability to update a data mart, data warehouse, or any other database backend from within BI dashboards and analyze the updated data in near-real time within the dashboard itself. AnyCompany currently uses Amazon Redshift as their enterprise data warehouse platform and QuickSight as their BI solution.

Dashboards

Dashboards Data Warehouse Visualization Data Processing

Architectural Patterns for real-time analytics using Amazon Kinesis Data Streams, Part 2: AI Applications

AWS Big Data

MAY 28, 2024

The ingested data gets transformed and analyzed in near real time using Amazon Managed Service for Apache Flink. Stream data can further be enriched using lookup data hosted in a data warehouse such as Amazon Redshift. We will continue to add new architectural patterns in the future posts of this series.

IoT

IoT Analytics Dashboards Data-driven

Attribute Amazon EMR on EC2 costs to your end-users

AWS Big Data

AUGUST 27, 2024

Refer to How do I set up a NAT gateway for a private subnet in Amazon VPC? For more information, refer to Prerequisites. For more information, refer to Storing database credentials in AWS Secrets Manager. For instructions to set up AWS Cloud9, refer to Getting started: basic tutorials for AWS Cloud9. manylinux2014_x86_64.whl

Metrics

Metrics Dashboards Data Lake Optimization

Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB

AWS Big Data

MAY 30, 2023

Data lakes are not transactional by default; however, there are multiple open-source frameworks that enhance data lakes with ACID properties, providing a best of both worlds solution between transactional and non-transactional storage mechanisms. The reference data is continuously replicated from MySQL to DynamoDB through AWS DMS.

Data Lake

Data Lake Data Analytics Analytics Data Processing

Simplify data loading into Type 2 slowly changing dimensions in Amazon Redshift

AWS Big Data

MARCH 9, 2023

Thousands of customers rely on Amazon Redshift to build data warehouses to accelerate time to insights with fast, simple, and secure analytics at scale and analyze data from terabytes to petabytes by running complex analytical queries. Data loading is one of the key aspects of maintaining a data warehouse.

Slice and Dice

Slice and Dice Data Warehouse Metrics Metadata

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

A Guide To Starting A Career In Business Intelligence & The BI Skills You Need

datapine

MARCH 31, 2022

On the flip side, if you enjoy diving deep into the technical side of things, with the right mix of skills for business intelligence you can work a host of incredibly interesting problems that will keep you in flow for hours on end. This could involve anything from learning SQL to buying some textbooks on data warehouses.

Business Intelligence

Business Intelligence Statistics Visualization Data-driven

Enable data analytics with Talend and Amazon Redshift Serverless

AWS Big Data

JULY 25, 2023

The integration of Talend Cloud and Talend Stitch with Amazon Redshift Serverless can help you achieve successful business outcomes without data warehouse infrastructure management. In this post, we demonstrate how Talend easily integrates with Redshift Serverless to help you accelerate and scale data analytics with trusted data.

Data Analytics

Data Analytics Analytics Data Warehouse Data Processing

Cloudera Data Engineering – Integration steps to leverage spark on Kubernetes

Cloudera

APRIL 14, 2021

Refer to the following Cloudera blog to understand the full potential of Cloudera Data Engineering. For further details on the API, please refer to the following doc link here. New jobs are defined with references to the resource, which automatically downloads the custom runtime image to run the Spark drivers and executors.

Data Warehouse

Data Warehouse Data Processing Machine Learning Data Quality

Setting up and Getting Started with Cloudera’s New SQL AI Assistant

Cloudera

JANUARY 19, 2024

Please refer to the product documentation for more information about specific releases. Supported AI models and services The SQL AI Assistant is not bundled with a specific LLM; instead it supports various LLMs and hosting services. Log in to the Cloudera Data Warehouse service as DWAdmin. or higher on the public cloud.

Data Warehouse

Data Warehouse Data Processing Optimization Modeling

Integrate Tableau and Okta with Amazon Redshift using AWS IAM Identity Center

AWS Big Data

JUNE 3, 2024

Amazon Redshift is a fast, scalable cloud data warehouse built to serve workloads at any scale. This integration positions Amazon Redshift as an IAM Identity Center-managed application, enabling you to use database role-based access control on your data warehouse for enhanced security. Tableau Server 2023.3.4

Data Warehouse

Data Warehouse Reporting Testing Publishing

Improve your ETL performance using multiple Redshift warehouses for writes

AWS Big Data

FEBRUARY 19, 2024

Amazon Redshift is a fast, petabyte-scale, cloud data warehouse that tens of thousands of customers rely on to power their analytics workloads. Thousands of customers use Amazon Redshift read data sharing to enable instant, granular, and fast data access across Redshift provisioned clusters and serverless workgroups.

Cost-Benefit

Cost-Benefit Data Warehouse Marketing Interactive

Set up advanced rules to validate quality of multiple datasets with AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

It supports both data quality at rest and data quality in AWS Glue extract, transform, and load (ETL) pipelines. Data quality at rest focuses on validating the data stored in data lakes, databases, or data warehouses. It ensures that the data meets specific quality standards before it is consumed.

Data Quality

Data Quality Data Lake Visualization Data-driven

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

It is prudent to consolidate this data into a single customer view, serving as a primary reference for downstream applications, ranging from ecommerce platforms to CRM systems. This consolidated view acts as a liaison between the data platform and customer-centric applications.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Query your Apache Hive metastore with AWS Lake Formation permissions

AWS Big Data

JULY 20, 2023

Apache Hive is a SQL-based data warehouse system for processing highly distributed datasets on the Apache Hadoop platform. The Hive metastore is a repository of metadata about the SQL tables, such as database names, table names, schema, serialization and deserialization information, data location, and partition details of each table.

Data Lake

Data Lake Metadata Data Processing Big Data

Integrate Tableau and Microsoft Entra ID with Amazon Redshift using AWS IAM Identity Center

AWS Big Data

SEPTEMBER 3, 2024

Amazon Redshift and Tableau empower data analysis. Amazon Redshift is a cloud data warehouse that processes complex queries at scale and with speed. Tableau’s extensive capabilities and enterprise connectivity help analysts efficiently prepare, explore, and share data insights company-wide. Open Tableau Desktop.

Reporting

Reporting Publishing Data Warehouse Management

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

The Delta tables created by the EMR Serverless application are exposed through the AWS Glue Data Catalog and can be queried through Amazon Athena. Incremental data is generated in the PostgreSQL table by running custom SQL scripts. Let’s refer to this S3 bucket as the raw layer. with Apache Spark version 3.3.0)

Data Lake

Data Lake Dashboards Metrics Metadata


Top 20 most-asked questions about Amazon RDS for Db2 answered
