To succeed in today's landscape, every company, whether small, mid-sized, or large, must embrace a data-centric mindset. This article proposes a methodology for organizations to implement a modern data management function that can be tailored to meet their unique needs.
Refer to How can I access OpenSearch Dashboards from outside of a VPC using Amazon Cognito authentication for a detailed evaluation of the available options and the corresponding pros and cons. For more information, refer to the AWS CDK v2 Developer Guide. For instructions, refer to Creating a public hosted zone.
Refer to IAM Identity Center identity source tutorials for the IdP setup. Copy and save the client ID and client secret needed later for the Streamlit application and the IAM Identity Center application to connect using the Redshift Data API. For more details, refer to Creating a workgroup with a namespace.
Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments. Choose Create.
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and do so with consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging. Example Corp.
In a two-part series, we talk about Swisscom's journey of automating Amazon Redshift provisioning as part of the Swisscom ODP solution using the AWS Cloud Development Kit (AWS CDK), and we provide code snippets and other useful references. See the following admin user code: admin_secret_kms_key_options = KmsKeyOptions(.
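As a rough, hypothetical sketch of what such provisioning code can look like (the names below are illustrative, not Swisscom's actual KmsKeyOptions implementation), an AWS CDK v2 stack in Python might create a KMS-encrypted admin secret like this:

# Hypothetical sketch, assuming aws-cdk-lib v2 for Python.
# Construct names and settings are illustrative placeholders.
from aws_cdk import Stack, RemovalPolicy
from aws_cdk import aws_kms as kms
from aws_cdk import aws_secretsmanager as secretsmanager
from constructs import Construct

class RedshiftAdminSecretStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Customer-managed key so the secret is not encrypted
        # with the default AWS-managed key
        admin_secret_key = kms.Key(
            self, "AdminSecretKey",
            enable_key_rotation=True,
            removal_policy=RemovalPolicy.RETAIN,
        )

        # Generated admin password stored in Secrets Manager,
        # encrypted with the key above
        secretsmanager.Secret(
            self, "RedshiftAdminSecret",
            encryption_key=admin_secret_key,
            generate_secret_string=secretsmanager.SecretStringGenerator(
                secret_string_template='{"username": "admin"}',
                generate_string_key="password",
                exclude_punctuation=True,
            ),
        )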
I did some research because I wanted to create a basic framework on the intersection between large language models (LLMs) and data management. But there are also a host of other issues (and cautions) to take into consideration. It was emphasized many times that LLMs are only as good as their data sources.
For detailed information on managing your Apache Hive metastore using Lake Formation permissions, refer to Query your Apache Hive metastore with AWS Lake Formation permissions. In this post, we present a methodology for deploying a data mesh consisting of multiple Hive data warehouses across EMR clusters.
But this glittering prize might cause some organizations to overlook something significantly more important: constructing the kind of event-driven data architecture that supports robust real-time analytics. We can, in the semantics of the software world, refer to digitally mediated business activities as real-time events.
Operations data: Data generated from a set of operations such as orders, online transactions, competitor analytics, sales data, point-of-sale data, pricing data, etc. The explosive growth of structured, unstructured, and semi-structured data is referred to as big data. Videos, pictures, etc.
Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), then enterprise-wide data lakes versus smaller, typically BU-specific, "data ponds".
Earlier this month we hosted the second annual Data-Centric Architecture Forum (#DCAF2020) in Fort Collins, CO. Last year (2019), we hosted the first Data-Centric Architecture conference, where the focus was on getting a sketch of a reference architecture.
The currently available choices include the following: the Amazon Redshift COPY command can load data from Amazon Simple Storage Service (Amazon S3), Amazon EMR, Amazon DynamoDB, or remote hosts over SSH. This native feature of Amazon Redshift uses massively parallel processing (MPP) to load objects directly from data sources into Redshift tables.
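As a minimal illustration of the COPY path (the cluster, table, bucket, and IAM role names are placeholders, not from the post), the statement can be issued through the Redshift Data API with boto3:

# Hypothetical sketch: issuing a COPY from Amazon S3 through the
# Redshift Data API. All identifiers below are placeholders.
import boto3

client = boto3.client("redshift-data")

copy_sql = """
    COPY sales_staging
    FROM 's3://example-bucket/sales/2024/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS PARQUET;
"""

# The Data API runs the statement asynchronously and returns a statement ID
response = client.execute_statement(
    ClusterIdentifier="example-cluster",
    Database="dev",
    DbUser="admin",
    Sql=copy_sql,
)
print(response["Id"])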
To create it, refer to Tutorial: Get started with Amazon EC2 Windows instances. To download and install AWS SCT on the EC2 instance that you created, refer to Installing, verifying, and updating AWS SCT. For more information about bucket names, refer to Bucket naming rules. Select Redshift data agent, then choose OK.
Uncomfortable truth incoming: Most people in your organization don’t think about the quality of their data from intake to production of insights. However, as a data team member, you know how important data integrity (and a whole host of other aspects of data management) is. Means of ensuring data integrity.
Just because technology is easy to use, it does not follow that the data is easy to understand. Don’t be fooled by easy-to-use technology; data can still be hard. Logic Apps are hosted in Azure and have a code view rather than a ‘business user coder’ view. Compliant or Complaint?
Cisco has multiple reference architectures for running Ozone. Relevance of operations per second to scale: Ozone Manager hosts the metadata for the objects stored within Ozone and consists of a cluster of Ozone Manager instances replicated via Ratis (a Raft implementation).
Cost and resource efficiency – This is an area where Acast observed reduced data duplication, and therefore lower costs (in some accounts, eliminating copied data entirely), by reading data across accounts while still enabling scaling. In this approach, teams responsible for generating data are referred to as producers.
When data is moved to the Infrequent Access tier, costs are reduced by up to 40%. Similarly, when data is moved to the Archive Instant Access tier, storage costs are reduced by up to 68%. Refer to Amazon S3 pricing for current pricing, as well as for information by region.
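As a brief, hedged example (bucket and key names are placeholders), opting an object into S3 Intelligent-Tiering with boto3 is what lets S3 move it between those access tiers automatically:

# Hypothetical sketch: store an object in the Intelligent-Tiering
# storage class so S3 shifts it to cheaper tiers as access drops off.
import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="example-bucket",
    Key="logs/2024/app.log",
    Body=b"example payload",
    # Objects not read for 30/90 days move to the Infrequent Access
    # and Archive Instant Access tiers automatically
    StorageClass="INTELLIGENT_TIERING",
)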
An essential capability needed in such a data lake architecture is the ability to continuously understand changes in the data lakes in various other domains and make those available to data consumers. The data mesh producer account hosts the encrypted S3 bucket, which is shared with the central governance account.
Overall, the current architecture didn't support workload prioritization, so a physical model of resources was reserved for this purpose. The system had an integration with legacy backend services that were all hosted on premises. Solution overview: Amazon Redshift is an industry-leading cloud data warehouse.
Success criteria alignment by all stakeholders (producers, consumers, operators, auditors) is key for a successful transition to a new Amazon Redshift modern data architecture. The success criteria are the key performance indicators (KPIs) for each component of the data workflow.
The Delta tables created by the EMR Serverless application are exposed through the AWS Glue Data Catalog and can be queried through Amazon Athena. Solution overview: The following diagram shows the overall architecture of the solution that we implement in this post. Let's refer to this S3 bucket as the raw layer.
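For illustration (the database, table, and output location are assumed names, not from the post), such a Delta table could be queried through Athena with boto3 like this:

# Hypothetical sketch: run an Athena query against a Delta table
# registered in the AWS Glue Data Catalog. Names are placeholders.
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT * FROM raw_layer.orders_delta LIMIT 10",
    QueryExecutionContext={"Database": "raw_layer"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
print(response["QueryExecutionId"])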
Alation Connect synchronizes metadata, sample data, and query logs into the Alation Data Catalog. This rich usage context is what makes our Data Catalog a powerful point of reference for data consumers and data stewards. In the release of Alation 4.0,
The data mesh framework In the dynamic landscape of data management, the search for agility, scalability, and efficiency has led organizations to explore new, innovative approaches. One such innovation gaining traction is the data mesh framework. This empowers individual teams to own and manage their data.
IaaS provides a platform for compute, data storage, and networking capabilities. IaaS is mainly used for developing software (testing and development, batch processing), hosting web applications, and data analysis. Companies develop data ecosystems in the cloud; the data architecture is now independent of on-premises systems.
When building a scalable data architecture on AWS, giving autonomy and ownership to the data domains is crucial for the success of the platform. Solution overview: In the first post of this series, we explained how Novo Nordisk and AWS Professional Services built a modern data architecture based on data mesh tenets.
It is prudent to consolidate this data into a single customer view, serving as a primary reference for downstream applications, ranging from ecommerce platforms to CRM systems. This consolidated view acts as a liaison between the data platform and customer-centric applications.
Metadata exporter: This section provides details on the AWS Glue job that exports the AWS Glue Data Catalog into an S3 location. The source code for the application is hosted in the AWS Glue GitHub repository. He advises clients on architecting and adopting data architectures that best serve their Data Analytics and Machine Learning needs.
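As a rough illustration only (this is not the source code from the AWS Glue GitHub repository; the database, bucket, and prefix are placeholders), a minimal exporter could look like this:

# Hypothetical sketch: dump Glue Data Catalog table definitions
# for one database to S3 as JSON.
import json

import boto3

glue = boto3.client("glue")
s3 = boto3.client("s3")

def export_catalog(database: str, bucket: str, prefix: str) -> None:
    # Page through all table definitions in the database
    paginator = glue.get_paginator("get_tables")
    tables = []
    for page in paginator.paginate(DatabaseName=database):
        tables.extend(page["TableList"])

    # Glue returns datetime objects; default=str keeps the dump serializable
    s3.put_object(
        Bucket=bucket,
        Key=f"{prefix}/{database}.json",
        Body=json.dumps(tables, default=str).encode("utf-8"),
    )

export_catalog("example_db", "example-bucket", "catalog-export")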
On Thursday, January 6th, I hosted Gartner's 2022 Leadership Vision for Data and Analytics webinar. Most D&A concerns and activities are done within EA in the information/data architecture domain/phases. Here is a suggested note: Use Gartner's Reference Model to Deliver Intelligent Composable Business Applications.
that gathers data from many sources. Data Environment: First off, the solutions you consider should be compatible with your current data architecture. We have outlined the requirements that most providers ask for. Data Sources (strategic objective): use native connectivity optimized for the data source.
The Lambda function will invoke the Amazon Titan Text Embeddings model hosted in Amazon Bedrock, allowing for efficient and scalable embedding creation. This architecture simplifies various use cases, including recommendation engines, personalized chatbots, and fraud detection systems.
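A minimal sketch of such a handler, assuming the v1 Titan embeddings model ID and a placeholder event shape:

# Hypothetical Lambda handler: create an embedding with the Amazon
# Titan Text Embeddings model on Amazon Bedrock. The input/output
# shapes around the Bedrock call are assumptions for illustration.
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def handler(event, context):
    text = event["text"]  # assumed event shape

    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    payload = json.loads(response["body"].read())

    # The Titan response body carries the embedding vector itself
    return {"embedding": payload["embedding"]}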
In modern data architectures, the need to manage and query vast datasets efficiently, consistently, and accurately is paramount. For organizations that deal with big data processing, managing metadata becomes a critical concern. Any reference to HMS refers to a Standalone Hive Metastore.
This is the final part of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to process data with Amazon Redshift Spectrum and create the gold (consumption) layer. His focus areas are MLOps, feature stores, data lakes, model hosting, and generative AI.