
Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

This is part two of a three-part series showing how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue. To start the job, choose Run.
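A Glue job writing to Iceberg registers an Iceberg catalog through Spark configuration. The following is a minimal sketch of the settings typically involved; the catalog name `glue_catalog`, the database name, and the S3 warehouse path are placeholders, not values from the post.

```python
# Hedged sketch: Spark configuration keys an AWS Glue job typically sets to
# register an Apache Iceberg catalog backed by the AWS Glue Data Catalog.
# All names below are illustrative placeholders.
dbname = "legacy_sqlserver_db"               # hypothetical database name
warehouse = "s3://example-bucket/iceberg/"   # hypothetical warehouse path

iceberg_conf = {
    "spark.sql.catalog.glue_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.glue_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.glue_catalog.warehouse": warehouse,
    "spark.sql.catalog.glue_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
}
# In the Glue job these pairs would be applied one by one via
# SparkSession.builder.config(key, value) before getOrCreate().
```

With this catalog registered, tables can be addressed as `glue_catalog.<database>.<table>` from Spark SQL.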


The future of data: A 5-pillar approach to modern data management

CIO Business Intelligence

Manish Limaye. Pillar #1: Data platform. The data platform pillar comprises the tools, frameworks, and processing and hosting technologies that enable an organization to process large volumes of data, in both batch and streaming modes. Now, mature organizations implement cybersecurity broadly using DevSecOps practices.



Generate vector embeddings for your data using AWS Lambda as a processor for Amazon OpenSearch Ingestion

AWS Big Data

The Lambda function invokes the Amazon Titan Text Embeddings model hosted in Amazon Bedrock, allowing for efficient and scalable embedding creation. This architecture simplifies various use cases, including recommendation engines, personalized chatbots, and fraud detection systems.


Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

AWS Big Data

Together with price-performance, Amazon Redshift offers capabilities such as a serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments. Choose Create.


7 types of tech debt that could cripple your business

CIO Business Intelligence

Build up: Databases that have grown in size, complexity, and usage create the need, over time, to rearchitect the model and architecture to support that growth. It also anonymizes all PII so the cloud-hosted chatbot can't be fed private information.


How To Use Airbyte, dbt-teradata, Dagster, and Teradata Vantage™ for Seamless Data Integration

Teradata

Infrastructure layout: diagram illustrating the data flow between each component of the infrastructure.

Prerequisites: Before you embark on this integration, ensure you have the following set up:
- Access to a Vantage instance (if you need a test instance of Vantage, you can provision one for free)
- Python 3.10
- dbt-core
- dagster==1.7.9
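The Python-package prerequisites above could be pinned in a `requirements.txt` along these lines (only `dagster` is version-pinned in the excerpt; the `dbt-teradata` entry is inferred from the article title, and any other pins would come from the full post):

```text
dbt-core
dbt-teradata
dagster==1.7.9
```

Installing from a pinned file keeps the Dagster and dbt versions reproducible across environments.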


Centralize Apache Spark observability on Amazon EMR on EKS with external Spark History Server

AWS Big Data

Set up AWS Private CA and create a Route 53 private hosted zone using the following code (the deploy_ssl.sh script). For detailed guidance, refer to Spark's web UI security documentation and the SHS security features. Suvojit Dasgupta is a Principal Data Architect at AWS.
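The private-hosted-zone half of that step maps to a Route 53 `create_hosted_zone` call with a VPC association. A minimal sketch of the request shape follows; the domain, VPC ID, and helper name are placeholders, and the actual boto3 call (which needs AWS credentials) is shown only in comments.

```python
import uuid

def private_hosted_zone_request(domain: str, vpc_id: str, region: str) -> dict:
    """Build the arguments for route53.create_hosted_zone to create a
    private zone associated with one VPC (hypothetical helper)."""
    return {
        "Name": domain,
        # CallerReference must be unique per request to guard against retries.
        "CallerReference": str(uuid.uuid4()),
        "VPC": {"VPCRegion": region, "VPCId": vpc_id},
        "HostedZoneConfig": {
            "Comment": "Private zone for Spark History Server",
            "PrivateZone": True,
        },
    }

# Applied roughly as:
#   route53 = boto3.client("route53")
#   route53.create_hosted_zone(
#       **private_hosted_zone_request("shs.example.internal.", "vpc-0123abcd", "us-east-1"))
```

Records in the zone then resolve only from within the associated VPC, which keeps the Spark History Server endpoint off the public internet.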