The landscape of big data management has been transformed by the rising popularity of open table formats such as Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake. These formats, designed to address the limitations of traditional data storage systems, have become essential in modern data architectures.
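To make that concrete, here is a minimal, hypothetical sketch of creating a table in one of these open formats (Apache Iceberg) from PySpark. The catalog name (demo), warehouse path, and table name are placeholders, and it assumes a Spark environment that already ships the Iceberg Spark runtime.

from pyspark.sql import SparkSession

# A minimal sketch, assuming the Iceberg runtime is on the Spark classpath.
# The catalog name, warehouse bucket, and table name below are illustrative.
spark = (
    SparkSession.builder
    .appName("iceberg-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "s3://example-bucket/warehouse/")
    .getOrCreate()
)

# Create an Iceberg table and insert a row; ACID semantics, schema evolution,
# and time travel come from the table format rather than the engine.
spark.sql("CREATE TABLE IF NOT EXISTS demo.db.events (id BIGINT, payload STRING) USING iceberg")
spark.sql("INSERT INTO demo.db.events VALUES (1, 'hello')")
spark.sql("SELECT * FROM demo.db.events").show()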
He has helped customers build scalable data warehousing and big data solutions for over 16 years. He has worked on building data warehouses and big data solutions for over 13 years. He specializes in migrating enterprise data warehouses to AWS Modern Data Architecture.
Users can begin ingesting data to Redshift from Amazon S3 with simple SQL commands and gain access to the most up-to-date data without the need for third-party tools or custom implementation. He has worked on building data warehouses and big data solutions for over 15 years.
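As a rough illustration of that ingestion path, the sketch below issues a single SQL COPY statement through the Redshift Data API with boto3. The cluster identifier, database, user, table, S3 bucket, and IAM role ARN are all placeholder values, not names from the original post, and it assumes a provisioned cluster.

import boto3

# A hedged sketch of loading S3 data into Redshift with one SQL COPY command
# via the Redshift Data API. All identifiers below are placeholders.
client = boto3.client("redshift-data")

copy_sql = """
    COPY sales
    FROM 's3://example-bucket/sales/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
    FORMAT AS PARQUET;
"""

response = client.execute_statement(
    ClusterIdentifier="example-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=copy_sql,
)
# The returned statement ID can be polled with describe_statement for status.
print(response["Id"])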
One of the most substantial big data workloads over the past fifteen years has been in the domain of telecom network analytics. The Dawn of Telco Big Data: 2007-2012. Suddenly, it was possible to build a data model of the network and create both a historical and predictive view of its behaviour.
They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions. Partner Solution Architect at AWS.
In this post, we delve into the key aspects of using Amazon EMR for modern data management, covering topics such as data governance, data mesh deployment, and streamlined data discovery. Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated.
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and do so with consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.
Cost and resource efficiency – This is an area where Acast observed a reduction in data duplication, and therefore a cost reduction (in some accounts, eliminating copies of data entirely), by reading data across accounts while enabling scaling. Srikant Das is an Acceleration Lab Solutions Architect at Amazon Web Services.
git clone [link]
cd automate-and-simplify-aws-glue-data-asset-publish-to-amazon-datazone

At the base of the repository folder, run the following commands to build and deploy resources to AWS. For guidance on establishing your organization’s data mesh with Amazon DataZone, contact your AWS team today.
He works with enterprise FSI customers and is primarily specialized in machine learning and data architectures. In his free time, Philipp spends time with his family and enjoys every geek hobby possible. Daniel Wessendorf is a Global Solutions Architect at AWS based in Munich.
You might be modernizing your data architecture using Amazon Redshift to enable access to your data lake and data in your data warehouse, and are looking for a centralized and scalable way to define and manage data access based on IdP identities. Leave your questions and feedback in the comments section.
He focuses on modern data architectures and helping customers accelerate their cloud journey with serverless technologies. To learn more about Amazon DataZone, refer to the Amazon DataZone User Guide. About the Authors Andrea Filippo is a Partner Solutions Architect at AWS supporting Public Sector partners and customers in Italy.
Additionally, you can extend this solution to include DDL commands used for Amazon Redshift data sharing across clusters. Operational excellence is a critical part of overall data governance when creating a modern data architecture, as it is a great enabler of our customers’ business.
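As a hedged sketch of what such an extension might look like, the snippet below runs the standard Redshift data sharing DDL (CREATE DATASHARE, ALTER DATASHARE, GRANT USAGE) through the same Data API pattern used above. The share name, schema, table, cluster identifier, and consumer namespace GUID are placeholders, not values from the original solution.

import boto3

# An illustrative sketch of Redshift data sharing DDL issued from the
# producer cluster via the Redshift Data API. All names are placeholders.
client = boto3.client("redshift-data")

datashare_ddl = [
    "CREATE DATASHARE sales_share;",
    "ALTER DATASHARE sales_share ADD SCHEMA public;",
    "ALTER DATASHARE sales_share ADD TABLE public.sales;",
    # Grant the share to a consumer cluster's namespace (placeholder GUID).
    "GRANT USAGE ON DATASHARE sales_share TO NAMESPACE '00000000-0000-0000-0000-000000000000';",
]

for statement in datashare_ddl:
    client.execute_statement(
        ClusterIdentifier="producer-cluster",
        Database="dev",
        DbUser="awsuser",
        Sql=statement,
    )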
When building a scalable data architecture on AWS, giving autonomy and ownership to the data domains is crucial for the success of the platform. Solution overview In the first post of this series, we explained how Novo Nordisk and AWS Professional Services built a modern data architecture based on data mesh tenets.
Even back then, these were used for activities such as Analytics, Dashboards, Statistical Modelling, Data Mining and Advanced Visualisation. Of course, some architectures featured both paradigms as well.
He is deeply passionate about data architecture and helps customers build analytics solutions at scale on AWS. For more details on the feature, refer to Using an OpenSearch Ingestion pipeline with AWS Lambda. About the Authors Jagadish Kumar (Jag) is a Senior Specialist Solutions Architect at AWS focused on Amazon OpenSearch Service.
Amazon SageMaker Lakehouse enables a unified, open, and secure lakehouse platform on your existing data lakes and warehouses. Its unified data architecture supports data analysis, business intelligence, machine learning, and generative AI applications, which can now take advantage of a single authoritative copy of data.
Whether you’re working with object storage, relational databases, NoSQL databases, or big data processing, this post can help you seamlessly incorporate your existing data infrastructure into your SageMaker Unified Studio workflows. Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team.
Jumia is a technology company founded in 2012 and present in 14 African countries, with its main headquarters in Lagos, Nigeria. Jumia is built around a marketplace, a logistics service, and a payment service.
She collaborates with the service team to enhance product features, works with AWS customers and partners to architect lakehouse solutions, and establishes best practices for data governance. Subhasis Sarkar is a Senior Data Engineer with Amazon. Subhasis thrives on solving complex technological challenges with innovative solutions.