Big Data, Data Architecture and Structured Data

Big Data Ingestion: Parameters, Challenges, and Best Practices

datapine

AUGUST 20, 2019

Operations data: Data generated from a set of operations such as orders, online transactions, competitor analytics, sales data, point of sales data, pricing data, etc. The gigantic evolution of structured, unstructured, and semi-structured data is referred to as Big data.

Big Data

Big Data B2B Cost-Benefit Structured Data

Incremental refresh for Amazon Redshift materialized views on data lake tables

AWS Big Data

NOVEMBER 8, 2024

Amazon Redshift is a fast, fully managed cloud data warehouse that makes it cost-effective to analyze your data using standard SQL and business intelligence tools. He has worked on building data warehouses and big data solutions for more than 15 years. Tahir Aziz is an Analytics Solution Architect at AWS.

Data Lake

Data Lake Data Warehouse Optimization Testing

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Big Data

JANUARY 6, 2025

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. He has helped customers build scalable data warehousing and big data solutions for over 16 years.

Analytics

Analytics Data Warehouse Big Data Metrics

Run Apache XTable in AWS Lambda for background conversion of open table formats

AWS Big Data

NOVEMBER 26, 2024

This post was co-written with Dipankar Mazumdar, Staff Data Engineering Advocate with AWS Partner OneHouse. Data architecture has evolved significantly to handle growing data volumes and diverse workloads. For more examples and references to other posts on using XTable on AWS, refer to the following GitHub repository.

Metadata

Metadata Data Lake Snapshot Data Warehouse

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

Need for a data mesh architecture: Because entities in the EUROGATE group generate vast amounts of data from various sources (across departments, locations, and technologies), the traditional centralized data architecture struggles to keep up with the demands for real-time insights, agility, and scalability.

IoT

IoT Machine Learning Metadata Data-driven

Building a Beautiful Data Lakehouse

CIO Business Intelligence

MARCH 9, 2022

But the data repository options that have been around for a while tend to fall short in their ability to serve as the foundation for big data analytics powered by AI. Traditional data warehouses, for example, support datasets from multiple sources but require a consistent data structure.

Data Lake

Data Lake Unstructured Data Data Warehouse Big Data

The Future Is Hybrid Data, Embrace It

Cloudera

JUNE 7, 2022

We live in a hybrid data world. In the past decade, the amount of structured data created, captured, copied, and consumed globally has grown from less than 1 ZB in 2011 to nearly 14 ZB in 2020. Impressive, but dwarfed by the amount of unstructured data, cloud data, and machine data – another 50 ZB.

IT

IT Data Architecture Unstructured Data Big Data

The Future Is Hybrid Data, Embrace It

CIO Business Intelligence

JUNE 23, 2022

We live in a hybrid data world. In the past decade, the amount of structured data created, captured, copied, and consumed globally has grown from less than 1 ZB in 2011 to nearly 14 ZB in 2020. Impressive, but dwarfed by the amount of unstructured data, cloud data, and machine data – another 50 ZB.

IT

IT Data Architecture Unstructured Data Big Data

Unstructured data management and governance using AWS AI/ML and analytics services

AWS Big Data

OCTOBER 25, 2023

Most companies produce and consume unstructured data such as documents, emails, web pages, engagement center phone calls, and social media. By some estimates, unstructured data can make up to 80–90% of all new enterprise data and is growing many times faster than structured data.

Unstructured Data

Unstructured Data Metadata Management Analytics

Very Meta … Unlocking Data’s Potential with Metadata Management Solutions

erwin

OCTOBER 24, 2019

Untapped data, if mined, represents tremendous potential for your organization. While there has been a lot of talk about big data over the years, the real hero in unlocking the value of enterprise data is metadata, or the data about the data.

Metadata

Metadata Management Data-driven Data Architecture

What is data governance? Best practices for managing data assets

CIO Business Intelligence

MARCH 24, 2023

A framework for managing data 10 master data management certifications that will pay off Big Data, Data and Information Security, Data Integration, Data Management, Data Mining, Data Science, IT Governance, IT Governance Frameworks, Master Data Management

Data Governance

Data Governance Management Metadata Data Quality

If Johnny Mnemonic Smuggled Linked Data

Ontotext

MAY 30, 2019

It won’t protect you from issues of data quality or from service failures. […] But Linked Data does provide you with new ways to manage these existing data-management challenges. 6 Linked Data, Structured Data on the Web. Linked Data and Volume. Linked Data and Information Retrieval.

Cost-Benefit

Cost-Benefit Big Data Technology Metadata

Design a data mesh on AWS that reflects the envisioned organization

AWS Big Data

JANUARY 22, 2024

They classified the metrics and indicators in the following categories: Data usage – A clear understanding of who is consuming what data source, materialized with a mapping of consumers and producers. For other organizations, the desired data mesh might look different and the approach might have other learnings.

Data-driven

Data-driven Advertising Metadata Data Architecture

Migrate a petabyte-scale data warehouse from Actian Vectorwise to Amazon Redshift

AWS Big Data

MAY 30, 2024

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. Solution overview: Amazon Redshift is an industry-leading cloud data warehouse.

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Structured Data

Data science vs data analytics: Unpacking the differences

IBM Big Data Hub

SEPTEMBER 19, 2023

Overview: Data science vs data analytics. Think of data science as the overarching umbrella that covers a wide range of tasks performed to find patterns in large datasets, structure data for use, train machine learning models and develop artificial intelligence (AI) applications.

Data Science

Data Science Data Analytics Prescriptive Analytics Analytics

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

Overview of solution: As a data-driven company, smava relies on the AWS Cloud to power their analytics use cases. smava ingests data from various external and internal data sources into a landing stage on the data lake based on Amazon Simple Storage Service (Amazon S3).

Data Lake

Data Lake Data Warehouse Data-driven B2B

Leverage Data Virtualization to Build a Modern Data System

CDW Research Hub

OCTOBER 12, 2021

Business leaders need to quickly access data—and to trust the accuracy of that data—to make better decisions. As organizations grow and evolve, many find a need for more sophisticated analytics across an ever-increasing amount of digital and consumer data. Unreliable Data as a Service (DaaS) implementations.

Data Warehouse

Data Warehouse Big Data Data Architecture Cost-Benefit

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Amazon SageMaker Lakehouse provides an open data architecture that reduces data silos and unifies data across Amazon Simple Storage Service (Amazon S3) data lakes, Redshift data warehouses, and third-party and federated data sources.

Analytics

Analytics Data Lake Metadata Data Warehouse

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open format files in Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries.

Data Lake

Data Lake Statistics Broadcasting Optimization

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

AWS Big Data

JANUARY 12, 2024

Before data records land on Amazon S3, we implement an ingestion layer to bring all data streams reliably and securely to the data lake. Kinesis Data Streams is deployed as an ingestion layer for accelerated intake of structured and semi-structured data streams.

Data Lake

Data Lake Cost-Benefit Visualization Structured Data

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

Strategize based on how your teams explore data, run analyses, wrangle data for downstream requirements, and visualize data at different levels. The AWS modern data architecture shows a way to build a purpose-built, secure, and scalable data platform in the cloud.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

Data ingestion, whether real time or batch, forms the basis of any effective data analysis, enabling organizations to gather information from diverse sources and use it for insightful decision-making. It’s raw, unprocessed data straight from the source.

Data Warehouse

Data Warehouse Testing Data Quality Reporting

The hidden history of Db2

IBM Big Data Hub

JULY 5, 2022

In today’s world of complex data architectures and emerging technologies, databases can sometimes be undervalued and unrecognized. Take control of your data governance, security and compliance with Db2’s comprehensive, built-in auditing, access control, and data visibility capabilities.

Data Lake

Data Lake Data Warehouse Publishing Structured Data

Snowflake: A New Blueprint for the Modern Data Warehouse

CDW Research Hub

JULY 22, 2019

Snowflake’s cloud-built data warehouse enables the data-driven enterprise with instant elasticity, secure data sharing, and per-second pricing across multiple clouds. With Snowflake, you can store, transform and analyze structured and semi-structured data together.

Data Warehouse

Data Warehouse Business Intelligence Structured Data Data-driven

Take advantage of AI and use it to make your business better

IBM Big Data Hub

AUGUST 15, 2023

To that end, IBM is building a set of domain-specific foundation models that go beyond natural language learning models and are trained on multiple types of business data, including code, time-series data, tabular data, geospatial data, semi-structured data, and mixed-modality data such as text combined with images.

IT

IT Data Governance Modeling Cost-Benefit

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

AWS Big Data

MARCH 28, 2023

Conclusion: In this post, we demonstrated how to identify the changed data for a semi-structured data source and preserve the historical changes (SCD Type 2) on an S3 Delta Lake, when source systems are unable to provide the change data capture capability, with AWS Glue.

Data Lake

Data Lake Testing Snapshot Big Data

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

AWS Big Data

NOVEMBER 13, 2023

Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. Key considerations: Gameskraft embraces a modern data architecture, with the data lake residing in Amazon S3.

Data Warehouse

Data Warehouse Analytics Data Lake Data Science

Get maximum value out of your cloud data warehouse with Amazon Redshift

AWS Big Data

APRIL 19, 2023

Different departments within an organization can place data in a data lake or within their data warehouse depending on the type of data and usage patterns of that department. Nasdaq’s massive data growth meant they needed to evolve their data architecture to keep up.

Data Warehouse

Data Warehouse Data Lake Unstructured Data Optimization

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

Both engines provide native ingestion support from Kinesis Data Streams and Amazon MSK via a separate streaming pipeline to a data lake or data warehouse for analysis. Data streaming enables you to ingest data from a variety of databases across various systems.

Data Lake

Data Lake Unstructured Data Management Snapshot

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

In another decade, the internet and mobile started to generate data of unforeseen volume, variety and velocity. It required a different data platform solution. Hence, the Data Lake emerged, which handles unstructured and structured data at huge volume. Metadata plays a key role here in discovering the data assets.

Data Lake

Data Lake Data Warehouse Data-driven Metadata

Introducing Point in Time queries and SQL/PPL support in Amazon OpenSearch Serverless

AWS Big Data

NOVEMBER 19, 2024

Besides basic filtering and aggregation, OpenSearch SQL also supports complex queries, such as querying semi-structured data, set operations, sub-queries and limited JOINs. He is deeply passionate about Data Architecture and helps customers build analytics solutions at scale on AWS.

Internet of Things

Internet of Things Visualization Structured Data Data Architecture

Configure cross-account access of Amazon SageMaker Lakehouse multi-catalog tables using AWS Glue 5.0 Spark

AWS Big Data

MAY 9, 2025

Their infrastructure consists of a Redshift data warehouse for structured data and an S3 data lake for structured and semi-structured data. Subhasis Sarkar is a Senior Data Engineer with Amazon. Subhasis thrives on solving complex technological challenges with innovative solutions.

Data Lake

Data Lake Data Warehouse Marketing Management

Knowledge Graphs 101: The Story (and Benefits) Behind the Hype

Ontotext

NOVEMBER 11, 2024

The use of knowledge graphs has an enormous effect on various systems and processes, which is why Gartner predicts that by 2025, graph technologies will be used in 80% of data and analytics innovations, up from 10% in 2021, facilitating rapid decision-making across the enterprise.

Metadata

Metadata Knowledge Discovery Data Integration Management

Ingest telemetry messages in near real time with Amazon API Gateway, Amazon Data Firehose, and Amazon Location Service

AWS Big Data

NOVEMBER 14, 2024

Each AWS account has one Data Catalog per AWS Region. Each Data Catalog is a highly scalable collection of tables organized into databases. He has helped customers build scalable data warehousing and big data solutions for over 20 years. He is a big data enthusiast and holds 14 AWS Certifications.

Data Lake

Data Lake Metadata Testing Data-driven

Modernize your legacy databases with AWS data lakes, Part 3: Build a data lake processing layer

AWS Big Data

OCTOBER 30, 2024

This is the final part of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to process data with Amazon Redshift Spectrum and create the gold (consumption) layer. In our use case, we use Redshift Query Editor to create data marts using SQL code.

Data Lake

Data Lake Machine Learning Data Architecture Data-driven
