2012, Data Architecture and Data Lake

2012

Data Architecture

Data Lake

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

Use cases for Hive metastore federation for Amazon EMR Hive metastore federation for Amazon EMR is applicable to the following use cases: Governance of Amazon EMR-based data lakes – Producers generate data within their AWS accounts using an Amazon EMR-based data lake supported by EMRFS on Amazon Simple Storage Service (Amazon S3)and HBase.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

Join 42,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

AWS Big Data

JULY 18, 2024

Over the years, organizations have invested in creating purpose-built, cloud-based data lakes that are siloed from one another. A major challenge is enabling cross-organization discovery and access to data across these multiple data lakes, each built on different technology stacks.

Data Lake

Data Lake Publishing Metadata Data-driven

Webinars

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

The landscape of big data management has been transformed by the rising popularity of open table formats such as Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake. These formats, designed to address the limitations of traditional data storage systems, have become essential in modern data architectures.

Metadata

Metadata Data Warehouse Big Data Data Lake

Simplify data ingestion from Amazon S3 to Amazon Redshift using auto-copy

AWS Big Data

OCTOBER 30, 2024

He has over 13 years of professional experience building and optimizing enterprise data warehouses and is passionate about enabling customers to realize the power of their data. He specializes in migrating enterprise data warehouses to AWS Modern Data Architecture.

Data Warehouse

Data Warehouse Sales Data Lake Recreation/Entertainment

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

AWS Big Data

JANUARY 6, 2025

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you securely access your data in operational databases, data lakes, or third-party datasets with minimal movement or copying of data.

Analytics

Analytics Data Warehouse Big Data Metrics

Simplify access management with Amazon Redshift and AWS Lake Formation for users in an External Identity Provider

AWS Big Data

FEBRUARY 15, 2024

You might be modernizing your data architecture using Amazon Redshift to enable access to your data lake and data in your data warehouse, and are looking for a centralized and scalable way to define and manage the data access based on IdP identities. Choose Register location.

Management

Management Data Lake Sales Data Warehouse

Design a data mesh on AWS that reflects the envisioned organization

AWS Big Data

JANUARY 22, 2024

Cost and resource efficiency – This is an area where Acast observed a reduction in data duplication, and therefore cost reduction (in some accounts, removing the copy of data 100%), by reading data across accounts while enabling scaling.

Data-driven

Data-driven Advertising Metadata Data Architecture

Why We Started the Data Intelligence Project

Alation

JULY 7, 2022

In the 2010s, the growing scope of the data landscape gave rise to a new profession: the data scientist. This new role, combined with the creation of data lakes and the increasing use of cloud services, created new employment opportunities in data analytics, data architecture, and data management.

Metadata

Metadata Data-driven Insurance Statistics

How Novo Nordisk built distributed data governance and control at scale

AWS Big Data

APRIL 28, 2023

This will include how to configure Okta, AWS Lake Formation , and a business intelligence tool to enable SAML-based federated use of Athena for an enterprise BI activity. When building a scalable data architecture on AWS, giving autonomy and ownership to the data domains are crucial for the success of the platform.

Data Governance

Data Governance Management Data-driven Analytics

Convergent Evolution

Peter James Thomas

AUGUST 18, 2018

That was the Science, here comes the Technology… A Brief Hydrology of Data Lakes. Overlapping with the above, from around 2012, I began to get involved in also designing and implementing Big Data Architectures; initially for narrow purposes and later Data Lakes spanning entire enterprises.

Data Lake

Data Lake Data Warehouse Data mining Statistics

Access Amazon S3 Iceberg tables from Databricks using AWS Glue Iceberg Rest Catalog in Amazon SageMaker Lakehouse

AWS Big Data

JANUARY 23, 2025

Amazon SageMaker Lakehouse enables a unified, open, and secure lakehouse platform on your existing data lakes and warehouses. Its unified data architecture supports data analysis, business intelligence, machine learning, and generative AI applications, which can now take advantage of a single authoritative copy of data.

Data Lake

Data Lake Data Warehouse Metadata Machine Learning

Jumia builds a next-generation data platform with metadata-driven specification frameworks

AWS Big Data

DECEMBER 20, 2024

Jumia is a technology company born in 2012, present in 14 African countries, with its main headquarters in Lagos, Nigeria. After the tables are created, the second task transfers HDFS data to a landing bucket in Amazon S3 using AWS DataSync to sync customer data. The following diagram illustrates the architecture.

Metadata

Metadata Data-driven Snapshot Data Lake

Configure cross-account access of Amazon SageMaker Lakehouse multi-catalog tables using AWS Glue 5.0 Spark

AWS Big Data

MAY 9, 2025

Many organizations build and operate enterprise-wide data mesh architectures using the AWS Glue Data Catalog and AWS Lake Formation for their Amazon Simple Storage Service (Amazon S3) based data lakes. AWS Glue is a serverless service that makes data integration simpler, faster, and cheaper.

Data Lake

Data Lake Data Warehouse Marketing Management

Petabyte-scale data migration made simple: AppsFlyer’s best practice journey with Amazon EMR Serverless

AWS Big Data

MAY 12, 2025

Streaming pipelines used Spark Streaming to ingest real-time data from Kafka, writing raw datasets to an Amazon Simple Storage Service (Amazon S3) data lake while simultaneously loading them into BigQuery and Google Cloud Storage to build logical data layers. Avichay Marciano is a Sr. Avichay Marciano is a Sr.

Metrics

Metrics Cost-Benefit Metadata Data Lake

Data Leaders Brief

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Webinars

Trending Sources

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

Webinars

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

Simplify data ingestion from Amazon S3 to Amazon Redshift using auto-copy

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Ingest data from Google Analytics 4 and Google Sheets to Amazon Redshift using Amazon AppFlow

Simplify access management with Amazon Redshift and AWS Lake Formation for users in an External Identity Provider

Design a data mesh on AWS that reflects the envisioned organization

Why We Started the Data Intelligence Project

How Novo Nordisk built distributed data governance and control at scale

Convergent Evolution

Access Amazon S3 Iceberg tables from Databricks using AWS Glue Iceberg Rest Catalog in Amazon SageMaker Lakehouse

Jumia builds a next-generation data platform with metadata-driven specification frameworks

Configure cross-account access of Amazon SageMaker Lakehouse multi-catalog tables using AWS Glue 5.0 Spark

Petabyte-scale data migration made simple: AppsFlyer’s best practice journey with Amazon EMR Serverless

Stay Connected