This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue.
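The post configures the Spark session in the Glue job to use the AWS Glue Data Catalog as an Iceberg catalog. A minimal sketch of that setup, assuming a placeholder warehouse bucket and catalog name:

```python
from pyspark.sql import SparkSession

# Minimal Iceberg-on-Glue session; the warehouse path is a placeholder.
spark = (
    SparkSession.builder
    # Enable Iceberg's SQL extensions (MERGE INTO, CALL procedures, etc.)
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Register a catalog named "glue_catalog" backed by the AWS Glue Data Catalog
    .config("spark.sql.catalog.glue_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue_catalog.warehouse",
            "s3://my-bucket/iceberg-warehouse/")
    .getOrCreate()
)
```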
Zero-ETL integration with Amazon Redshift reduces the need for custom pipelines, preserves resources for your transactional systems, and gives you access to powerful analytics. The data in Amazon Redshift is transactionally consistent, and updates are automatically and continuously propagated.
The proposed model illustrates the data management practice through five functional pillars: data platform, data engineering, analytics and reporting, data science and AI, and data governance. The analytics and reporting function, however, needs to drive how reports and self-service analytics are organized.
As a result, enterprises will examine their end-to-end data operations and analytics creation workflows. Instead of allowing technology to be a barrier to teamwork, leading data organizations in 2022 will further expand workflow automation to improve communication and coordination between groups.
Data has continued to grow both in scale and in importance through this period, and today telecommunications companies increasingly see data architecture as an independent organizational challenge, not merely an item on an IT checklist. Why telcos should consider modern data architecture, and the challenges involved.
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. Enhance agility by localizing changes within business domains and establishing clear data contracts. Eliminate centralized bottlenecks and complex data pipelines.
For this reason, organizations with significant data debt may find pursuing many gen AI opportunities more challenging and risky. What CIOs can do: Avoid and reduce data debt by incorporating data governance and analytics responsibilities in agile data teams, implementing data observability, and developing data quality metrics.
Analytics as a service (AaaS) is a business model that uses the cloud to deliver analytic capabilities on a subscription basis. This model provides organizations with a cost-effective, scalable, and flexible solution for building analytics.
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and to do so with consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that provides the flexibility to use provisioned or serverless compute for your analytical workloads. The decoupled compute and storage architecture of Amazon Redshift enables you to build highly scalable, resilient, and cost-effective workloads.
The data architecture assimilates and processes sizable volumes of streaming data from many different sources, ingesting data the moment it is generated. Real-time data streaming enables an organization to act in the moment, which ultimately helps it prosper.
The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. It hosts over 150 big data analytics sandboxes across the region, with over 200 users utilizing the sandboxes for data discovery. Data Security & Governance.
In today's data-driven world, securely accessing, visualizing, and analyzing data is essential for making informed business decisions. The Amazon Redshift Data API simplifies access to your Amazon Redshift data warehouse by removing the need to manage database drivers, connections, network configurations, data buffering, and more.
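As a rough sketch of the driver-free flow the Data API enables, here is a minimal boto3 example; the workgroup name, database, and query are placeholder assumptions, and the API is asynchronous, so results are polled:

```python
import time
import boto3

client = boto3.client("redshift-data")

# Submit a query; use ClusterIdentifier=... instead of WorkgroupName
# when targeting a provisioned cluster.
resp = client.execute_statement(
    WorkgroupName="my-serverless-workgroup",
    Database="dev",
    Sql="SELECT venuestate, COUNT(*) FROM venue GROUP BY venuestate;",
)

# Poll until the statement completes, then fetch the result set.
while True:
    desc = client.describe_statement(Id=resp["Id"])
    if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(1)

if desc["Status"] == "FINISHED" and desc.get("HasResultSet"):
    records = client.get_statement_result(Id=resp["Id"])["Records"]
```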
In particular, companies that were leaders at using data and analytics had three times higher improvement in revenues, were nearly three times more likely to report shorter times to market for new products and services, and were over twice as likely to report improvement in customer satisfaction, profits, and operational efficiency.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS analytics services.
The producer account will host the EMR cluster and S3 buckets. The catalog account will host Lake Formation and AWS Glue. The consumer account will host EMR Serverless, Athena, and SageMaker notebooks. By using Data Catalog metadata federation, organizations can construct a sophisticated data architecture.
Synthetic Data is, according to Gartner and other industry oracles, “hot, hot, hot.” In fact, according to Gartner, “60 percent of the data used for the development of AI and analytics projects will be synthetically generated.”[1]
Amazon OpenSearch Service is a fully managed search and analytics service powered by the Apache Lucene search library that can be operated within a virtual private cloud (VPC). Create an Amazon Route 53 public hosted zone such as mydomain.com to be used for routing internet traffic to your domain.
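Creating that hosted zone can be done in the console or programmatically; a sketch with boto3, assuming only the mydomain.com name from the excerpt:

```python
import time
import boto3

route53 = boto3.client("route53")

# CallerReference just needs to be unique per request; a timestamp
# string is enough for a sketch.
zone = route53.create_hosted_zone(
    Name="mydomain.com",
    CallerReference=str(time.time()),
)

# The name servers to configure at your domain registrar.
print(zone["DelegationSet"]["NameServers"])
```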
Swisscom’s Data, Analytics, and AI division is building a One Data Platform (ODP) solution that will enable every Swisscom employee, process, and product to benefit from the massive value of Swisscom’s data. The following high-level architecture diagram shows ODP with different layers of the modern data architecture.
SAP announced today a host of new AI copilot and AI governance features for SAP Datasphere and SAP Analytics Cloud (SAC). The combination enables SAP to offer a single data management system and advanced analytics for cross-organizational planning. Ventana Research’s Menninger agrees.
The telecommunications industry continues to develop hybrid data architectures to support data workload virtualization and cloud migration. Telco organizations are planning to move toward hybrid multi-cloud to manage data better and support their workforces in the near future. AI capability drives data monetization.
Enterprises across industries have been obsessed with real-time analytics for some time. The insights provided by analytics “in the moment” can uncover valuable information in customer interactions and alert users or trigger responses as events happen. Flexibility is built in with an open data stack.
However, many companies today still struggle to effectively harness and use their data due to challenges such as data silos, lack of discoverability, poor data quality, and a lack of data literacy and analytical capabilities to quickly access and use data across the organization.
The Cloudera Data Platform (CDP) represents a paradigm shift in modern data architecture by addressing all existing and future analytical needs. CDP helps clients reduce (or avoid entirely) costs for ancillary technology tools that are used in conjunction with competing analytical solutions.
Modern, real-time businesses require accelerated cycles of innovation that are expensive and difficult to maintain with legacy data platforms. The hybrid cloud’s premise—two data architectures fused together—gives companies options to leverage those solutions and to address decision-making criteria on a case-by-case basis.
The time has come for data leaders to move beyond traditional governance and analytics: sustainability is the next frontier for CDOs, and the opportunity to lead is now. If sustainability-related data projects fail to demonstrate a clear financial impact, they risk being deprioritized in favor of more immediate business concerns.
While navigating so many simultaneous data-dependent transformations, they must balance the need to level up their data management practices—accelerating the rate at which they ingest, manage, prepare, and analyze data—with that of governing this data.
To speed up self-service analytics and foster innovation based on data, a solution was needed that allows any team to create data products on their own in a decentralized manner. To create and manage the data products, smava uses Amazon Redshift, a cloud data warehouse.
The currently available choices include the Amazon Redshift COPY command, which can load data from Amazon Simple Storage Service (Amazon S3), Amazon EMR, Amazon DynamoDB, or remote hosts over SSH. This native feature of Amazon Redshift uses massively parallel processing (MPP) to load objects directly from data sources into Redshift tables.
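For the Amazon S3 case, a sketch of what such a COPY looks like, submitted here through the Data API; the table name, bucket, IAM role ARN, and cluster identifier are placeholder assumptions:

```python
import boto3

# Illustrative COPY from S3; Redshift fans the load out across slices via MPP.
copy_sql = """
    COPY analytics.page_views
    FROM 's3://my-bucket/page-views/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
    FORMAT AS PARQUET;
"""

boto3.client("redshift-data").execute_statement(
    ClusterIdentifier="my-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=copy_sql,
)
```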
Four-layered data lake and data warehouse architecture – The architecture comprises four layers, including the analytical layer, which houses purpose-built fact and dimension datasets hosted in Amazon Redshift. This enables data-driven decision-making across the organization.
You can store your data as-is, without having to first structure it, and run different types of analytics for better business insights. Analytics use cases on data lakes are always evolving. Launch the notebooks hosted under this link and unzip them on a local workstation. Open AWS Glue Studio.
The size of the data sets is limited by business concerns. Use renewable energy: hosting AI operations at a data center that uses renewable power is a straightforward path to reducing carbon emissions, but it’s not without tradeoffs. Data analytics lead Diego Cáceres urges caution about when to use AI.
While Cloudera CDH was already a success story at HBL, in 2022 HBL identified the need to move its customer data centre environment from Cloudera’s CDH to Cloudera Data Platform (CDP) Private Cloud to accommodate growing volumes of data; the environment primarily served regulatory reporting and internal analytics requirements.
Tracking data changes and rollback: build your transactional data lake on AWS. You can build your modern data architecture with a scalable data lake that integrates seamlessly with an Amazon Redshift powered cloud warehouse. Data can be organized into three different zones, as shown in the following figure.
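On the change-tracking and rollback point: with an Iceberg table (continuing the Spark session sketched earlier), snapshot history and rollback are exposed through metadata tables and stored procedures. The table name and snapshot ID below are placeholder assumptions:

```python
# One row per commit: snapshot IDs, timestamps, and lineage.
spark.sql("SELECT * FROM glue_catalog.db.orders.history").show()

# Time travel: query the table as of an earlier snapshot.
spark.sql(
    "SELECT * FROM glue_catalog.db.orders VERSION AS OF 4358109269898602304"
).show()

# Roll the table back to that snapshot.
spark.sql(
    "CALL glue_catalog.system.rollback_to_snapshot('db.orders', 4358109269898602304)"
)
```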
But information broadly, and the management of data specifically, is still “the” critical factor for situational awareness, streamlined operations, and a host of other use cases across today’s tech-driven battlefields: edge processing, transformation, and routing through to descriptive, prescriptive, and predictive analytics.
Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), then enterprise-wide data lakes versus smaller, typically BU-specific “data ponds”.
In this post, we discuss how Zurich built a hybrid architecture on AWS incorporating AWS services to satisfy their requirements. The solution was based on categorizing and prioritizing log data into priority levels 1 through 3, and routing logs to different destinations based on priority.
Consumer data: Data transmitted by customers, including banking records, banking data, stock market transactions, employee benefits, insurance claims, etc. Operations data: Data generated from a set of operations such as orders, online transactions, competitor analytics, sales data, point-of-sale data, pricing data, etc.
Data and API infrastructure “Data still matters,” says Bradley Shimmin, chief analyst for AI platforms, analytics, and data management at London-based independent analyst and consultancy Omdia. So by using the company’s data, a general-purpose language model becomes a useful business tool.
Cost and resource efficiency – This is an area where Acast observed a reduction in data duplication, and therefore a cost reduction (in some accounts, eliminating the copied data entirely), by reading data across accounts while enabling scaling. Some examples of Acast’s domains are presented in the following figure.
In this post, we discuss how the Amazon Finance Automation team used AWS Lake Formation and the AWS Glue Data Catalog to build a data mesh architecture that simplified data governance at scale and provided seamless data access for analytics, AI, and machine learning (ML) use cases.
As data volumes continue to grow exponentially, traditional data warehousing solutions may struggle to keep up with the increasing demands for scalability, performance, and advanced analytics. The success criteria are the key performance indicators (KPIs) for each component of the data workflow.
Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. Legacy architecture: the customer’s platform was the main source for one-time, batch, and content processing.
Sam Charrington, founder and host of the TWIML AI Podcast. As countries introduce privacy laws, similar to the European Union’s General Data Protection Regulation (GDPR), the way organizations obtain, store, and use data will be under increasing legal scrutiny.