Data Analytics, Data Architecture, Reference and Testing

Uplevel your data architecture with real-time streaming using Amazon Data Firehose and Snowflake

AWS Big Data

APRIL 12, 2024

Today’s fast-paced world demands timely insights and decisions, which is driving the importance of streaming data. Streaming data refers to data that is continuously generated from a variety of sources. Create a Kinesis data stream. Query the Snowflake table to validate the data loaded into Snowflake.

Data Architecture, IoT, Internet of Things, Recreation/Entertainment

Data science vs data analytics: Unpacking the differences

IBM Big Data Hub

SEPTEMBER 19, 2023

Though you may encounter the terms “data science” and “data analytics” being used interchangeably in conversations or online, they refer to two distinctly different concepts. Meanwhile, data analytics is the act of examining datasets to extract value and find answers to specific questions.

Data Science, Data Analytics, Prescriptive Analytics, Analytics

Webinars

The AI Superhero Approach to Product Management

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Building Your BI Strategy: How to Choose a Solution That Scales and Delivers

Why the Data Journey Manifesto?

DataKitchen

JUNE 12, 2023

Today we have had over 20,000 signatures, millions of page views, and copycat clones, and it is frequently used as a reference guide. For example, just a few weeks ago, Microsoft announced data fabric, and John Kerski used it to frame up the discussion of how Microsoft data fabric supports DataOps principles.

Testing, Dashboards, Data Lake, Data Science

Generic orchestration framework for data warehousing workloads using Amazon Redshift RSQL

AWS Big Data

APRIL 3, 2023

To learn about new options for database scripting, refer to Accelerate your data warehouse migration to Amazon Redshift – Part 4. For more details, refer to Auto Scaling groups, the Amazon EFS User Guide, and Integrating CodeDeploy with Amazon EC2 Auto Scaling. For more information, refer to Prerequisites.

Data Warehouse, Testing, Data Lake, Data-driven

Upgrade Journey: The Path from CDH to CDP Private Cloud

Cloudera

SEPTEMBER 28, 2020

Cloudera has found that customers have spent many years investing in their big data assets and want to continue to build on that investment by moving towards a more modern architecture that helps leverage the multiple form factors. The customer leverages Cloudera’s multi-function analytics stack in CDP. Test and QA.

Testing, Metadata, Risk, Data Science

Dive deep into AWS Glue 4.0 for Apache Spark

AWS Big Data

MAY 18, 2023

It’s even harder when your organization is dealing with silos that impede data access across different data stores. Seamless data integration is a key requirement in a modern data architecture to break down data silos. For more details, refer to Spark Release 3.3.0. AWS Glue Data Catalog client 3.6.0.

Testing, Data Lake, Cost-Benefit, Data Integration

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

For detailed information on managing your Apache Hive metastore using Lake Formation permissions, refer to Query your Apache Hive metastore with AWS Lake Formation permissions. In this post, we present a methodology for deploying a data mesh consisting of multiple Hive data warehouses across EMR clusters.

Data Lake, Metadata, Data Warehouse, Data Processing

What is data governance? Best practices for managing data assets

CIO Business Intelligence

MARCH 24, 2023

Whereas data governance is about the roles, responsibilities, and processes for ensuring accountability for and ownership of data assets, DAMA defines data management as “an overarching term that describes the processes used to plan, specify, enable, create, acquire, maintain, use, archive, retrieve, control, and purge data.”

Data Governance, Management, Metadata, Data Quality

Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions

AWS Big Data

APRIL 3, 2024

For instructions, refer to Amazon DataZone quickstart with AWS Glue data. You also need to define and run a ruleset against your data, which is a set of data quality rules in AWS Glue Data Quality. The ruleset contains 27 individual rules (one of them failing), so the overall data quality score is 96%.
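
The 96% figure in the teaser above is consistent with a simple pass-rate calculation. The following is a minimal sketch, assuming the score is the share of passing rules with every rule weighted equally; the excerpt does not describe how AWS Glue Data Quality actually computes its score, and the function name `quality_score` is hypothetical.

```python
def quality_score(total_rules: int, failing_rules: int) -> float:
    """Return the percentage of rules that passed.

    Assumption: every rule carries equal weight. This is an illustrative
    helper, not an AWS Glue Data Quality API.
    """
    if total_rules <= 0:
        raise ValueError("total_rules must be positive")
    if not 0 <= failing_rules <= total_rules:
        raise ValueError("failing_rules must be between 0 and total_rules")
    return 100.0 * (total_rules - failing_rules) / total_rules


# Figures from the teaser: 27 rules, one of them failing.
score = quality_score(27, 1)
print(f"{score:.1f}%")  # 26 of 27 rules pass
```

Under that assumption, 26 passing rules out of 27 give roughly 96.3%, which rounds to the 96% quoted.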

Data Quality, Visualization, Metadata, Metrics

Visualize data quality scores and metrics generated by AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

These are six main steps in the data pipeline: Amazon EventBridge triggers an AWS Lambda function when the event pattern for AWS Glue Data Quality matches the defined rule. For more information, refer to Working with Query Results, Output Files, and Query History. For S3 path, enter the S3 path to your data source.

Data Quality, Metrics, Visualization, Dashboards

Centralize near-real-time governance through alerts on Amazon Redshift data warehouses for sensitive queries

AWS Big Data

JUNE 29, 2023

Tracking such user queries as part of the centralized governance of the data warehouse helps stakeholders understand potential risks and take prompt action to mitigate them following the operational excellence pillar of the AWS Data Analytics Lens. Test the filter by selecting the actual log stream.

Data Warehouse, Dashboards, Testing, Visualization

How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture

AWS Big Data

SEPTEMBER 11, 2024

This post describes how HPE Aruba automated their Supply Chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS. The new solution has helped Aruba integrate data from multiple sources, along with optimizing their cost, performance, and scalability.

Data Architecture, Optimization, Data Warehouse, Metadata

How Swisscom automated Amazon Redshift as part of their One Data Platform solution using AWS CDK – Part 1

AWS Big Data

JUNE 12, 2024

Swisscom’s Data, Analytics, and AI division is building a One Data Platform (ODP) solution that will enable every Swisscom employee, process, and product to benefit from the massive value of Swisscom’s data. The following high-level architecture diagram shows ODP with different layers of the modern data architecture.

Data Architecture, Cost-Benefit, Experimentation, Data-driven

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

Success criteria alignment by all stakeholders (producers, consumers, operators, auditors) is key for successful transition to a new Amazon Redshift modern data architecture. The success criteria are the key performance indicators (KPIs) for each component of the data workflow.

Data Warehouse, KPI, Optimization, Cost-Benefit

Build SAML identity federation for Amazon OpenSearch Service domains within a VPC

AWS Big Data

FEBRUARY 7, 2024

Refer to How can I access OpenSearch Dashboards from outside of a VPC using Amazon Cognito authentication for a detailed evaluation of the available options and the corresponding pros and cons. For more information, refer to the AWS CDK v2 Developer Guide. For instructions, refer to Creating a public hosted zone.

Dashboards, Data Processing, Metadata, Consulting

A Day in the Life of a DataOps Engineer

DataKitchen

OCTOBER 11, 2021

First, you must understand the existing challenges of the data team, including the data architecture and end-to-end toolchain. The biggest challenge is broken data pipelines due to highly manual processes. Figure 1 shows a manually executed data analytics pipeline.

Testing, Metadata, Dashboards, Statistics

Migrate Microsoft Azure Synapse Analytics to Amazon Redshift using AWS SCT

AWS Big Data

OCTOBER 18, 2023

Many customers migrate their data warehousing workloads to Amazon Redshift and benefit from the rich capabilities it offers, such as the following: Amazon Redshift seamlessly integrates with broader data, analytics, and AI or machine learning (ML) services on AWS, enabling you to choose the right tool for the right job.

Analytics, Data Warehouse, Dashboards, Testing

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

AWS Big Data

MARCH 28, 2023

Test SCD Type 2 implementation With the infrastructure in place, you’re ready to test out the overall solution design and query historical records from the employee dataset. This post is designed to be implemented for a real customer use case, where you get full snapshot data on a daily basis.

Data Lake, Testing, Snapshot, Sales

Migrate a petabyte-scale data warehouse from Actian Vectorwise to Amazon Redshift

AWS Big Data

MAY 30, 2024

It also helps you securely access your data in operational databases, data lakes, or third-party datasets with minimal movement or copying of data. Tens of thousands of customers use Amazon Redshift to process large amounts of data, modernize their data analytics workloads, and provide insights for their business users.

Data Warehouse, Data Lake, Cost-Benefit, Structured Data

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

AWS Big Data

NOVEMBER 13, 2023

The downstream consumers consist of business intelligence (BI) tools, with multiple data science and data analytics teams having their own WLM queues with appropriate priority values. Consequently, there was a fivefold rise in data integrations and a fivefold increase in ad hoc queries submitted to the Redshift cluster.

Data Warehouse, Data Lake, Analytics, Data Science

Extracting key insights from Amazon S3 access logs with AWS Glue for Ray

AWS Big Data

SEPTEMBER 7, 2023

They store attributes such as object size, total time, turn-around time, and HTTP referer for log records. In our test bed with 470 GB (1,544,692 objects) of access logs, large Spark drivers using AWS Glue’s G.8X Ray job run details: Please feel free to download the script and test this solution in your development environment.

Metadata, Dashboards, Metrics, Visualization

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale. This post is not intended to provide detailed technical guidance (e.g.

Data Lake, Metadata, Optimization, Statistics

“You Complete Me,” said Data Lineage to DataOps Observability.

DataKitchen

JANUARY 23, 2023

DataOps Observability includes monitoring and testing the data pipeline, data quality, data testing, and alerting. Data testing is an essential aspect of DataOps Observability; it helps to ensure that data is accurate, complete, and consistent with its specifications, documentation, and end-user requirements.

Testing, Data Governance, Data Quality, Data-driven

What Is Embedded Analytics?

Jet Global

MAY 1, 2023

that gathers data from many sources. Third-party data might include industry benchmarks, data feeds (such as weather and social media), and/or anonymized customer data. Four Approaches to Data Analytics The world of data analytics is constantly and quickly changing. It’s all about context.

Analytics, Cost-Benefit, Visualization, Dashboards

Data Leaders Brief