This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue. To start the job, choose Run. The job's Spark session is configured with Iceberg catalog settings such as spark.sql.catalog.glue_catalog.catalog-impl.
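The truncated .config() fragment in this excerpt suggests a Spark session wired to a Glue-backed Iceberg catalog. Here is a minimal sketch of what that setup typically looks like; the config keys are standard Iceberg AWS integration settings, while the bucket and database names are illustrative placeholders, not values from the post:

```python
# Minimal sketch of a Glue/Iceberg Spark session; bucket and dbname are placeholders.
from pyspark.sql import SparkSession

dbname = "sqlserver_replica"  # hypothetical target database name
spark = (
    SparkSession.builder.appName("sqlserver-to-iceberg")
    # Register an Iceberg catalog named glue_catalog backed by the AWS Glue Data Catalog.
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-bucket/{}/".format(dbname))
    # Enable Iceberg SQL extensions (MERGE INTO, CALL procedures, and so on).
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)
```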
Data has continued to grow both in scale and in importance through this period, and today telecommunications companies are increasingly seeing data architecture as an independent organizational challenge, not merely an item on an IT checklist. Why telco should consider modern data architecture. The challenges.
Build up: Databases that have grown in size, complexity, and usage build up the need to rearchitect the model and architecture to support that growth over time. It also anonymizes all PII so the cloud-hosted chatbot can't be fed private information.
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. The applications are hosted in dedicated AWS accounts and require a BI dashboard and reporting services based on Tableau.
The telecommunications industry continues to develop hybrid data architectures to support data workload virtualization and cloud migration. Telco organizations are planning to move towards hybrid multi-cloud to manage data better and support their workforces in the near future. 2- AI capability drives data monetization.
Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments. Choose Create.
Integrating ESG into data decision-making CDOs should embed sustainability into data architecture, ensuring that systems are designed to optimize energy efficiency, minimize unnecessary data replication and promote ethical data use.
Modernizing a utility’s data architecture. “These capabilities allow us to reduce business risk as we move off of our monolithic, on-premises environments and provide cloud resiliency and scale,” the CIO says, noting National Grid also has a major data center consolidation under way as it moves more data to the cloud.
To create and manage the data products, smava uses Amazon Redshift , a cloud data warehouse. In this post, we show how smava optimized their data platform by using Amazon Redshift Serverless and Amazon Redshift data sharing to overcome right-sizing challenges for unpredictable workloads and further improve price-performance.
Success criteria alignment by all stakeholders (producers, consumers, operators, auditors) is key for successful transition to a new Amazon Redshift modern data architecture. The success criteria are the key performance indicators (KPIs) for each component of the data workflow.
The new approach would need to offer the flexibility to integrate new technologies such as machine learning (ML), scalability to handle long-term retention at forecasted growth levels, and provide options for cost optimization. Athena supports a variety of compression formats for reading and writing data.
The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. It hosts over 150 big data analytics sandboxes across the region with over 200 users utilizing the sandbox for data discovery.
These topics include federation with the Swisscom identity provider (IdP), JDBC connections, detective controls using AWS Config rules and remediation actions, cost optimization using the Redshift scheduler, and audit logging. The following high-level architecture diagram shows ODP with different layers of the modern data architecture.
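As one concrete example of the cost-optimization lever mentioned above, the Redshift scheduler can pause a provisioned cluster outside business hours. A hedged boto3 sketch follows; every name (action, cluster, role ARN, schedule) is a placeholder rather than anything from the Swisscom setup:

```python
# Sketch: schedule a nightly pause of a Redshift cluster. All identifiers are hypothetical.
import boto3

redshift = boto3.client("redshift")
redshift.create_scheduled_action(
    ScheduledActionName="pause-odp-nightly",                       # hypothetical action name
    TargetAction={"PauseCluster": {"ClusterIdentifier": "odp-cluster"}},  # hypothetical cluster
    Schedule="cron(0 20 * * ? *)",                                  # every day at 20:00 UTC
    IamRole="arn:aws:iam::123456789012:role/RedshiftScheduler",     # placeholder role ARN
    ScheduledActionDescription="Pause dev cluster overnight to cut cost",
)
```

A matching ResumeCluster action scheduled for the morning completes the pattern.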
The Cloudera Data Platform (CDP) represents a paradigm shift in modern data architecture by addressing all existing and future analytical needs. In particular, SDX enables clients to achieve business value acceleration (accelerate organic growth initiatives) and infrastructure cost optimization (reduce technology costs).
I did some research because I wanted to create a basic framework on the intersection between large language models (LLMs) and data management. But there are also a host of other issues (and cautions) to take into consideration. Another concern relates to the definition of ‘data constraints.’
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and do so with consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.
Data has become an invaluable asset for businesses, offering critical insights to drive strategic decision-making and operational optimization. HEMA has a bespoke enterprise architecture, built around the concept of services. Tommaso is the Head of Data & Cloud Platforms at HEMA.
But this glittering prize might cause some organizations to overlook something significantly more important: constructing the kind of event-driven data architecture that supports robust real-time analytics. An event-based, real-time data architecture is precisely how businesses today create the experiences that consumers expect.
The characteristics of the infrastructure itself (location, cost, performance) should be combined with workload profiles, including access controls and collaboration, workload optimization features (e.g., for machine learning), and other enterprise policies.
The size of the data sets is limited by business concerns. Use renewable energy Hosting AI operations at a data center that uses renewable power is a straightforward path to reduce carbon emissions, but it’s not without tradeoffs.
The currently available choices include: The Amazon Redshift COPY command can load data from Amazon Simple Storage Service (Amazon S3), Amazon EMR , Amazon DynamoDB , or remote hosts over SSH. This native feature of Amazon Redshift uses massively parallel processing (MPP) to load objects directly from data sources into Redshift tables.
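For the Amazon S3 path specifically, a COPY names an S3 prefix and an IAM role. Below is a small sketch that issues such a COPY through the Redshift Data API via boto3; the table, bucket, role ARN, cluster, and database user are all illustrative placeholders, and it assumes a provisioned cluster:

```python
# Sketch: load Parquet files from S3 into a Redshift table via the Data API.
# Table name, bucket, role ARN, and cluster are hypothetical.
import boto3

copy_sql = """
    COPY sales FROM 's3://my-bucket/sales/2024/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS PARQUET;
"""

client = boto3.client("redshift-data")
client.execute_statement(
    ClusterIdentifier="my-cluster",  # placeholder provisioned cluster
    Database="dev",
    DbUser="awsuser",                # temporary-credentials auth for the Data API
    Sql=copy_sql,
)
```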
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. This data is sent to Apache Kafka, which is hosted on Amazon Managed Streaming for Apache Kafka (Amazon MSK).
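To make the Kafka/MSK hop concrete, here is a hedged producer sketch using the kafka-python library; the broker address, topic, and payload are placeholders, not details from the post:

```python
# Sketch: publish a JSON event to a topic on Amazon MSK using kafka-python.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["b-1.msk.example.amazonaws.com:9094"],  # hypothetical MSK broker
    security_protocol="SSL",  # MSK brokers commonly require TLS
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("device-events", {"device_id": "sensor-42", "reading": 21.7})
producer.flush()  # block until the event is acknowledged
```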
Operations data: Data generated from a set of operations such as orders, online transactions, competitor analytics, sales data, point-of-sale data, pricing data, etc. The massive growth of structured, unstructured, and semi-structured data is referred to as big data. Self-Service.
Cisco has multiple reference architectures for running Ozone. The hardware certification includes high-density nodes with close to 500 TB per node optimized for performance and TCO. Data processing workloads tend to be more sensitive to the performance of transferring data between Datanodes and the various applications that process it.
Transformation styles like TETL (transform, extract, transform, load) and SQL Pushdown also synergize well with a remote engine runtime to capitalize on source/target resources and limit data movement, thus further reducing costs. With a multicloud data strategy, organizations need to optimize for data gravity and data locality.
Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data) then enterprise-wide data lakes versus smaller, typically BU-Specific, “data ponds”.
The main reason for this change is that this title better represents the move that our customers are making, away from merely acknowledging the ability to have data ‘anywhere’. It delivers the same data management capabilities across all of these disparate environments.
The architecture consists of many layers: Rules engine – The rules engine was responsible for intercepting every incoming request. Based on the nature of the request, it routed the request to the API cluster that could optimally process that specific request based on the response time requirement.
These inputs reinforced the need for a unified data strategy across the FinOps teams. We decided to build a scalable data management product that is based on the best practices of modern data architecture. Our source system and domain teams were mapped as data producers, and they would have ownership of the datasets.
The last two years have seen remarkable acceleration of digital transformation in a whole host of segments. The data spun off by its business is remarkable, allowing advanced analytics use cases such as marketing and sales optimization and pricing optimization. By 2025, Industry 4.0
Tracking data changes and rollback. Build your transactional data lake on AWS. You can build your modern data architecture with a scalable data lake that integrates seamlessly with an Amazon Redshift-powered cloud warehouse. Data can be organized into three different zones, as shown in the following figure.
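The "tracking data changes and rollback" capability maps to Iceberg's snapshot metadata tables and system procedures. A hedged Spark SQL sketch follows, reusing the glue_catalog session built in the earlier snippet; the database, table, and snapshot ID are placeholders:

```python
# Sketch: inspect snapshot history and roll back an Iceberg table.
# Assumes the `spark` session and glue_catalog from the earlier sketch.

# List the snapshots Iceberg has recorded for the table.
spark.sql(
    "SELECT committed_at, snapshot_id, operation "
    "FROM glue_catalog.sales_db.orders.snapshots"
).show()

# Roll the table back to a previous snapshot via Iceberg's system procedure.
spark.sql(
    "CALL glue_catalog.system.rollback_to_snapshot("
    "'sales_db.orders', 1234567890123456789)"  # placeholder snapshot ID
)
```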
Cost and resource efficiency – This is an area where Acast observed a reduction in data duplication, and therefore cost reduction (in some accounts, eliminating duplicate copies of data entirely), by reading data across accounts while enabling scaling.
Here’s what a few of our judges had to say after reviewing and scoring nominations: “The nominations showed highly creative, innovative ways of using data, analytics, data science and predictive methodologies to optimize processes and to provide more positive customer experiences.” – Cornelia Levy-Bencheton.
Amazon Redshift is straightforward to use with self-tuning and self-optimizing capabilities. You can get faster insights without spending valuable time managing your data warehouse. All data written to Amazon Redshift is automatically and continuously replicated to Amazon Simple Storage Service (Amazon S3).
Most organisations are missing this ability to connect all the data together. (from Q&A with Tim Berners-Lee) Finally, Sumit highlighted the importance of knowledge graphs to advance semantic data architecture models that allow unified data access and empower flexible data integration.
Migration and modernization: It enables seamless transitions between legacy systems and modern platforms, ensuring your data architecture evolves without disruption.
VeloxCon 2024 , the premier developer conference that is dedicated to the Velox open-source project, brought together industry leaders, engineers, and enthusiasts to explore the latest advancements and collaborative efforts shaping the future of data management.
3- Advanced AI Integration At this stage of adoption, financial institutions and insurance companies engage more intensively with AI and its capabilities, extracting more valuable insights from data. Push predictive analytics to optimize operations and enhance profitability. Even more training and upskilling.
Misconception 3: All data warehouse migrations are the same, irrespective of vendors. While migrating to the cloud, CTOs often feel the need to revamp and “modernize” their entire technology stack – including moving to a new cloud data warehouse vendor. This enabled data-driven analytics at scale across the organization.
Alation Connect previously synced metadata and query logs from data storage systems including the Hive Metastore on Hadoop and databases from Teradata, IBM, Oracle, SQL Server, Redshift, Vertica, SAP HANA, and Greenplum. In the release of Alation 4.0,
The Delta tables created by the EMR Serverless application are exposed through the AWS Glue Data Catalog and can be queried through Amazon Athena. Solution overview The following diagram shows the overall architecture of the solution that we implement in this post. Monjumi Sarma is a Data Lab Solutions Architect at AWS.
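Once the Delta tables are registered in the AWS Glue Data Catalog, they can be queried from Athena like any other catalog table. A small boto3 sketch follows; the database, table, and results bucket are assumed placeholder names:

```python
# Sketch: run an Athena query against a Delta table exposed via the Glue Data Catalog.
import boto3

athena = boto3.client("athena")
athena.start_query_execution(
    QueryString="SELECT * FROM my_delta_table LIMIT 10",   # hypothetical table
    QueryExecutionContext={"Database": "emr_delta_db"},     # hypothetical Glue database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder bucket
)
```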
Strategize based on how your teams explore data, run analyses, wrangle data for downstream requirements, and visualize data at different levels. The AWS modern data architecture shows a way to build a purpose-built, secure, and scalable data platform in the cloud.
The platform has been used to modernize and unify the information technology (IT) ecosystem of major financial firms, simplify human capital management (HCM) across brands’ subsidiaries, and optimize reporting processes in complex healthcare settings.
The selection of the best BI tools stands as a critical step in leveraging data effectively, driving success, and maintaining competitive advantage in modern markets. Data-driven Decisions: BI tools empower businesses to make informed decisions by furnishing actionable insights, optimizing operations, and uncovering growth opportunities.