This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue, configuring the Spark session's glue_catalog for Iceberg. To start the job, choose Run.
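The truncated configuration fragment in the snippet suggests a Spark session wired to the AWS Glue Data Catalog as an Iceberg catalog. A minimal sketch of that setup follows, assuming PySpark with the Iceberg AWS bundle on the classpath; the warehouse bucket path is a hypothetical placeholder, not a value from the post.

from pyspark.sql import SparkSession

# Sketch: register the Glue Data Catalog as an Iceberg catalog named
# "glue_catalog". The warehouse bucket below is a hypothetical placeholder.
spark = (
    SparkSession.builder
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-datalake-bucket/warehouse/")
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)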
The world now runs on big data. Defined as information sets too large for traditional statistical analysis, big data represents a host of insights that businesses can apply toward better practices. But what exactly are the opportunities present in big data? In manufacturing, this means opportunity.
Data has continued to grow both in scale and in importance through this period, and today telecommunications companies are increasingly seeing data architecture as an independent organizational challenge, not merely an item on an IT checklist. Why telco should consider modern data architecture. The challenges.
Operations data: data generated from a set of operations such as orders, online transactions, competitor analytics, sales data, point-of-sale data, pricing data, etc. The gigantic evolution of structured, unstructured, and semi-structured data is referred to as big data. Big data ingestion.
Need for a data mesh architecture: Because entities in the EUROGATE group generate vast amounts of data from various sources across departments, locations, and technologies, the traditional centralized data architecture struggles to keep up with the demands for real-time insights, agility, and scalability.
Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments. Choose Create.
The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. It hosts over 150 big data analytics sandboxes across the region with over 200 users utilizing the sandbox for data discovery.
In this post, we delve into the key aspects of using Amazon EMR for modern data management, covering topics such as data governance, data mesh deployment, and streamlined data discovery. Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated.
Modern, real-time businesses require accelerated cycles of innovation that are expensive and difficult to maintain with legacy data platforms. The hybrid cloud’s premise—two data architectures fused together—gives companies options to leverage those solutions and to address decision-making criteria on a case-by-case basis.
Copy and save the client ID and client secret, which are needed later for the Streamlit application and the IAM Identity Center application to connect using the Redshift Data API. Generate the client secret and set the sign-in redirect URL and sign-out URL to [link] (we will host the Streamlit application locally on port 8501).
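As a rough illustration of the last step in that flow, here is a hedged sketch of querying Redshift through the Data API from Python, as a Streamlit app might; the workgroup name, database, and SQL are hypothetical placeholders, not values from the post.

import time
import boto3

# Sketch: run a query through the Redshift Data API and poll until it finishes.
# WorkgroupName targets Redshift Serverless; use ClusterIdentifier for a
# provisioned cluster. All names below are hypothetical.
client = boto3.client("redshift-data")

response = client.execute_statement(
    WorkgroupName="my-workgroup",
    Database="dev",
    Sql="SELECT event_name, starttime FROM event LIMIT 10;",
)

status = "SUBMITTED"
while status not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)
    status = client.describe_statement(Id=response["Id"])["Status"]

if status == "FINISHED":
    result = client.get_statement_result(Id=response["Id"])
    print(result["Records"])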
It is essential to process sensitive data only after acquiring a thorough understanding of the stream processing architecture. The data architecture assimilates and processes sizable volumes of streaming data from different data sources, ingesting data as soon as it is generated.
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
The workflow includes the following steps: The AaaS provider pulls data from customer data sources like operational databases, files, and APIs, and ingests it into the Redshift data warehouse hosted in their account. Data processing jobs enrich the data in Amazon Redshift.
In fact, according to Gartner, “60 percent of the data used for the development of AI and analytics projects will be synthetically generated.”[1] I had never heard about synthetic data until I listened to the AI Today podcast, hosted by Kathleen Walch […].
One Data Platform: The ODP architecture is based on the AWS Well-Architected Framework Analytics Lens and follows the pattern of having raw, standardized, conformed, and enriched layers, as described in Modern data architecture. See the following admin user code: admin_secret_kms_key_options = KmsKeyOptions(.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. By decoupling storage and compute, data lakes promote cost-effective storage and processing of big data. Why did Orca choose Apache Iceberg?
Cost and resource efficiency – This is an area where Acast observed a reduction in data duplication, and therefore in cost (in some accounts, eliminating duplicated copies of data entirely), by reading data across accounts while enabling scaling. Srikant Das is an Acceleration Lab Solutions Architect at Amazon Web Services.
Success criteria alignment by all stakeholders (producers, consumers, operators, auditors) is key for a successful transition to a new Amazon Redshift modern data architecture. The success criteria are the key performance indicators (KPIs) for each component of the data workflow.
With an extensive career in the financial and tech industries, she specializes in data management and has been involved in initiatives ranging from reporting to data architecture. She currently serves as the Global Head of Cyber Data Management at Zurich Group.
Prominent entities across a myriad of sectors are preparing for the digital revolution by integrating a host of technologies such as IoT, AI, big data, digital twins, and robotics into their processes, products, and workflows. The industrial landscape is undergoing a digital transformation at breakneck speed.
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and to do so with consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.
Modern analytics is much wider than SQL-based data warehousing. With Amazon Redshift, you can build lake house architectures and perform any kind of analytics, such as interactive analytics, operational analytics, big data processing, visual data preparation, predictive analytics, machine learning, and more.
Over the years, data lakes on Amazon Simple Storage Service (Amazon S3) have become the default repository for enterprise data and are a common choice for a large set of users who query data for a variety of analytics and machine learning use cases. Analytics use cases on data lakes are always evolving. Choose ETL Jobs.
An essential capability needed in such a data lake architecture is the ability to continuously understand changes in the data lakes of various other domains and make those changes available to data consumers. The data mesh producer account hosts the encrypted S3 bucket, which is shared with the central governance account.
The currently available choices include: The Amazon Redshift COPY command can load data from Amazon Simple Storage Service (Amazon S3), Amazon EMR, Amazon DynamoDB, or remote hosts over SSH. This native feature of Amazon Redshift uses massively parallel processing (MPP) to load objects directly from data sources into Redshift tables.
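For illustration, here are hedged sketches of two of the COPY forms mentioned above, loading from Amazon S3 and over SSH; the table, bucket, manifest, and IAM role ARN are hypothetical placeholders.

# Sketch: COPY from S3 (Parquet) and COPY over SSH via a manifest stored in S3.
# All identifiers below are hypothetical; run these through your SQL client or
# the Redshift Data API.
copy_from_s3 = """
COPY sales
FROM 's3://my-bucket/sales/'
IAM_ROLE 'arn:aws:iam::111122223333:role/MyRedshiftRole'
FORMAT AS PARQUET;
"""

copy_over_ssh = """
COPY sales
FROM 's3://my-bucket/ssh_manifest.json'
IAM_ROLE 'arn:aws:iam::111122223333:role/MyRedshiftRole'
SSH;
"""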
These approaches minimize data movement, latency, and egress fees by leveraging integration patterns alongside a remote runtime engine, enhancing pipeline performance and optimization while offering users flexibility in designing pipelines for their use case.
Strategize based on how your teams explore data, run analyses, wrangle data for downstream requirements, and visualize data at different levels. The AWS modern data architecture shows a way to build a purpose-built, secure, and scalable data platform in the cloud.
Overall, the current architecture didn’t support workload prioritization, so a physical model of resources was reserved for this purpose. The system had an integration with legacy backend services that were all hosted on premises. Solution overview: Amazon Redshift is an industry-leading cloud data warehouse.
You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes. Amazon EMR Serverless is a serverless option in Amazon EMR that makes it easy for data analysts and engineers to run open-source big data analytics frameworks without configuring, managing, and scaling clusters or servers.
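A minimal sketch of the insert/update/delete pattern described above, assuming a Spark job (for example on EMR Serverless) with the Delta Lake package available; the paths, table layout, and join key are hypothetical.

from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# Sketch: merge a batch of changes into a Delta table to apply upserts.
# The S3 paths and the order_id key below are hypothetical placeholders.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

updates = spark.read.parquet("s3://my-bucket/raw/orders/")
target = DeltaTable.forPath(spark, "s3://my-bucket/silver/orders/")

(
    target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)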
This approach has several benefits, such as streamlined migration of data from on-premises to the cloud, reduced query tuning requirements, and continuity in SRE tooling, automations, and personnel. This enabled data-driven analytics at scale across the organization.
Metadata exporter: This section provides details on the AWS Glue job that exports the AWS Glue Data Catalog into an S3 location. The source code for the application is hosted in the AWS Glue GitHub repository. He advises clients on architecting and adopting data architectures that best serve their data analytics and machine learning needs.
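A hedged sketch of what such a metadata exporter could look like, paginating table definitions out of the Data Catalog and writing them to S3; the database and bucket names are hypothetical, and this is not the post's actual source code.

import json
import boto3

# Sketch: dump all table definitions from one Data Catalog database to S3.
# Database and bucket names are hypothetical placeholders.
glue = boto3.client("glue")
s3 = boto3.client("s3")

tables = []
for page in glue.get_paginator("get_tables").paginate(DatabaseName="my_database"):
    tables.extend(page["TableList"])

s3.put_object(
    Bucket="my-export-bucket",
    Key="catalog-exports/my_database.json",
    Body=json.dumps(tables, default=str),  # default=str handles datetime fields
)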
VeloxCon 2024 , the premier developer conference that is dedicated to the Velox open-source project, brought together industry leaders, engineers, and enthusiasts to explore the latest advancements and collaborative efforts shaping the future of data management.
During configuration, an organization constructs its data architecture and defines user roles. It’s also useful, following initial deployment, to host a celebration to congratulate the team on all their hard work, and to get direct feedback from early system users.
When building a scalable data architecture on AWS, giving autonomy and ownership to the data domains is crucial for the success of the platform. Solution overview: In the first post of this series, we explained how Novo Nordisk and AWS Professional Services built a modern data architecture based on data mesh tenets.
Create an Amazon Route 53 public hosted zone such as mydomain.com to be used for routing internet traffic to your domain. For instructions, refer to Creating a public hosted zone. Request an AWS Certificate Manager (ACM) public certificate for the hosted zone. hosted_zone_id – The Route 53 public hosted zone ID.
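The same two steps can be scripted; here is a hedged sketch using boto3, where mydomain.com is the example domain from the text and everything else is a placeholder.

import uuid
import boto3

# Sketch: create the public hosted zone, then request a DNS-validated ACM
# certificate for it. The region for the certificate depends on the
# consuming service (some, such as CloudFront, require us-east-1).
route53 = boto3.client("route53")
acm = boto3.client("acm", region_name="us-east-1")

zone = route53.create_hosted_zone(
    Name="mydomain.com",
    CallerReference=str(uuid.uuid4()),  # any unique string
)
hosted_zone_id = zone["HostedZone"]["Id"]

certificate = acm.request_certificate(
    DomainName="mydomain.com",
    SubjectAlternativeNames=["*.mydomain.com"],
    ValidationMethod="DNS",
)
print(hosted_zone_id, certificate["CertificateArn"])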
Manish Limaye. Pillar #1: Data platform. The data platform pillar comprises the tools, frameworks, and processing and hosting technologies that enable an organization to process large volumes of data, both in batch and streaming modes. He is currently a technology advisor to multiple startups and mid-size companies.
These inputs reinforced the need for a unified data strategy across the FinOps teams. We decided to build a scalable data management product based on the best practices of modern data architecture. Our source system and domain teams were mapped as data producers, and they would have ownership of the datasets.
Four-layered data lake and data warehouse architecture – The architecture comprises four layers, including the analytical layer, which houses purpose-built fact and dimension datasets hosted in Amazon Redshift.
The Cloudera Data Platform (CDP) represents a paradigm shift in modern data architecture by addressing all existing and future analytical needs. Cloudera Data Catalog (part of SDX) replaces data governance tools to facilitate centralized data governance (data cataloging, data search and lineage, tracking of data issues, etc.).
HEMA has a bespoke enterprise architecture, built around the concept of services. Each service is hosted in a dedicated AWS account and is built and maintained by a product owner and a development team, as illustrated in the following figure. Tommaso is the Head of Data & Cloud Platforms at HEMA.
Overview of solution As a data-driven company, smava relies on the AWS Cloud to power their analytics use cases. smava ingests data from various external and internal data sources into a landing stage on the data lake based on Amazon Simple Storage Service (Amazon S3).
The Lambda function will invoke the Amazon Titan Text Embeddings Model hosted in Amazon Bedrock , allowing for efficient and scalable embedding creation. This architecture simplifies various use cases, including recommendation engines, personalized chatbots, and fraud detection systems.
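As a rough sketch of that invocation, assuming the Titan Text Embeddings G1 model; the handler name and event payload shape are illustrative assumptions, not details from the post.

import json
import boto3

# Sketch of a Lambda handler that embeds incoming text with Amazon Titan
# Text Embeddings via Amazon Bedrock. The event shape is hypothetical.
bedrock = boto3.client("bedrock-runtime")

def handler(event, context):
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": event["text"]}),
        contentType="application/json",
        accept="application/json",
    )
    embedding = json.loads(response["body"].read())["embedding"]
    # Downstream: store the vector for recommendations, chatbots, or fraud checks.
    return {"dimensions": len(embedding)}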
In modern data architectures, the need to manage and query vast datasets efficiently, consistently, and accurately is paramount. For organizations that deal with big data processing, managing metadata becomes a critical concern. Suvojit Dasgupta is a Principal Data Architect at AWS.
This is the final part of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to process data with Amazon Redshift Spectrum and create the gold (consumption) layer. His focus areas are MLOps, feature stores, data lakes, model hosting, and generative AI.
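As a hedged sketch of the Spectrum pattern, the silver data could be exposed as an external schema and aggregated into a gold table; the schema names, Data Catalog database, and IAM role ARN below are hypothetical placeholders.

# Sketch: map a Data Catalog database as an external schema, then build a
# gold-layer aggregate from it. All names are hypothetical; the gold schema
# is assumed to already exist in Redshift.
create_external_schema = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS silver
FROM DATA CATALOG
DATABASE 'silver_db'
IAM_ROLE 'arn:aws:iam::111122223333:role/MySpectrumRole';
"""

create_gold_table = """
CREATE TABLE gold.daily_sales AS
SELECT sale_date, SUM(amount) AS total_amount
FROM silver.sales
GROUP BY sale_date;
"""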