A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of a primary Region failure.
Open table formats are emerging in the rapidly evolving domain of big data management, fundamentally altering the landscape of data storage and analysis. By providing a standardized framework for data representation, open table formats break down data silos, enhance data quality, and accelerate analytics at scale.
With this integration, you can now seamlessly query your governed data lake assets in Amazon DataZone using popular business intelligence (BI) and analytics tools, including partner solutions like Tableau. Refer to the detailed blog post on how you can use this integration to connect through various other tools.
Under the hood, UniForm generates the Iceberg metadata files (including metadata and manifest files) that Iceberg clients require to access the underlying data files in Delta Lake tables. Both the Delta Lake and Iceberg metadata files reference the same data files, as described in the Delta Lake public documentation.
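As a minimal sketch of what enabling this looks like, the following assumes a Spark session already configured with Delta Lake 3.x (which introduced UniForm); the table name and columns are illustrative, and the table properties should be checked against your Delta Lake version:

```python
# Hypothetical sketch: creating a Delta table with UniForm enabled so that
# Iceberg clients can read it via the generated Iceberg metadata.
# Assumes a SparkSession already configured with the Delta Lake packages.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("uniform-demo").getOrCreate()

spark.sql("""
    CREATE TABLE sales_orders (order_id STRING, amount DOUBLE)
    USING DELTA
    TBLPROPERTIES (
      'delta.enableIcebergCompatV2' = 'true',
      'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
```

Once the table is created this way, Delta writes also produce the Iceberg metadata pointing at the same Parquet data files, which is what allows both engines to share one copy of the data.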
A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of cross-functional governance structure for customer data.
With this solution, the new table needs to be refreshed periodically to pick up the latest data from the shared Data Cloud objects. For a comprehensive list of considerations and limitations of data sharing, refer to Considerations when using data sharing in Amazon Redshift.
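One hedged sketch of such a periodic refresh, assuming the shared objects are exposed through a materialized view on a Redshift Serverless workgroup (the workgroup, database, and view names are placeholders):

```python
# Illustrative refresh of a materialized view built on shared objects,
# using the Redshift Data API. Names below are assumptions, not the
# blog post's actual resources.
import boto3

client = boto3.client("redshift-data", region_name="us-east-1")

response = client.execute_statement(
    WorkgroupName="my-serverless-workgroup",
    Database="dev",
    Sql="REFRESH MATERIALIZED VIEW mv_data_cloud_orders;",
)
print(response["Id"])  # statement ID; poll describe_statement for status
```

In practice this statement would be scheduled (for example with Amazon EventBridge) at whatever cadence the freshness requirement dictates.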
Ingestion: data lake batch, micro-batch, and streaming. Many organizations land their source data in their data lake in various ways, including batch, micro-batch, and streaming jobs. Amazon AppFlow can be used to transfer data from different SaaS applications to a data lake.
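A minimal sketch of triggering such a transfer on demand, assuming an AppFlow flow (source connector and S3 destination) has already been created; the flow name is a placeholder:

```python
# Hedged example: start an existing Amazon AppFlow flow that lands SaaS
# data in the data lake. The flow itself must already be configured.
import boto3

appflow = boto3.client("appflow", region_name="us-east-1")
run = appflow.start_flow(flowName="salesforce-accounts-to-datalake")
print(run["flowArn"], run.get("executionId"))
```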
Data can be shared with a Redshift Serverless or provisioned cluster in the same Region or with a Redshift Serverless cluster in a different Region. To get an overview of Salesforce Zero Copy integration with Amazon Redshift, please refer to this Salesforce Blog. For more details, refer to Querying the AWS Glue Data Catalog.
We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented at AWS re:Invent 2022, along with a few data lake solutions built by customers and AWS Partners, for easy reference.
Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).
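One possible way to implement that PII-redaction step, sketched here with Amazon Comprehend's DetectPiiEntities API (the redaction format is an assumption, not the post's actual approach):

```python
# Hedged sketch: replace each detected PII span with its entity type,
# e.g. [EMAIL], using Amazon Comprehend.
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

def redact_pii(text: str) -> str:
    entities = comprehend.detect_pii_entities(Text=text, LanguageCode="en")["Entities"]
    # Redact from the end of the string so earlier offsets stay valid.
    for e in sorted(entities, key=lambda x: x["BeginOffset"], reverse=True):
        text = text[:e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text

print(redact_pii("Contact Jane Doe at jane@example.com"))
```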
Data architect Armando Vázquez identifies eight common types of data architects: Enterprise data architect: These data architects oversee an organization’s overall data architecture, defining data architecture strategy and designing and implementing architectures.
A typical ask for this data may be to identify sales trends as well as sales growth on a yearly, monthly, or even daily basis. A key pillar of AWS's modern data strategy is the use of purpose-built data stores for specific use cases to achieve performance, cost, and scale. This is achieved by partitioning the data.
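An illustrative sketch of that partitioning step, assuming sales records in Parquet with an order_date column (paths and column names are placeholders); partitioning by year and month means yearly or monthly trend queries scan only the relevant partitions:

```python
# Write sales data partitioned by year and month for efficient trend queries.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("partition-demo").getOrCreate()

sales = spark.read.parquet("s3://my-bucket/raw/sales/")
(sales
 .withColumn("year", F.year("order_date"))
 .withColumn("month", F.month("order_date"))
 .write.mode("overwrite")
 .partitionBy("year", "month")
 .parquet("s3://my-bucket/curated/sales/"))
```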
While IT is happy to look after the technical storage and backup of data, they refer to line of business experts when it comes to quality and usability. Managers see data as relevant in the context of digitalization, but often think of data-related problems as minor details that have little strategic importance.
To create your namespace and workgroup, refer to Creating a data warehouse with Amazon Redshift Serverless. Use Query Editor v2 to load customer data from Amazon S3: you can use Query Editor v2 to submit queries and load data into your data warehouse through a web interface.
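A sketch of the kind of COPY statement you would run (in Query Editor v2, or programmatically as shown here); the bucket, table, and IAM role are placeholders:

```python
# Hedged example: load customer data from S3 into Redshift with COPY,
# submitted through the Redshift Data API. All names are illustrative.
import boto3

client = boto3.client("redshift-data", region_name="us-east-1")
client.execute_statement(
    WorkgroupName="my-serverless-workgroup",
    Database="dev",
    Sql="""
        COPY customer
        FROM 's3://my-bucket/customer/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
        FORMAT AS CSV
        IGNOREHEADER 1;
    """,
)
```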
With that in mind, the agency uses open-source technology and high-performance hybrid cloud infrastructure to transform how it processes demographic and economic data with an Enterprise Data Lake (EDL). This confidence and trust are key to enabling users to use data to its fullest potential and to generating business value.
The application gets prompt templates from an S3 data lake and creates the engineered prompt. The user interaction is stored in a data lake for downstream usage and BI analysis. Conclusion: in this post, we discussed the importance of using customer data to differentiate generative AI usage in applications.
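A minimal sketch of that template-fetch-and-fill pattern, assuming templates are stored as text objects with Python format placeholders (the bucket, key, and fields are hypothetical):

```python
# Fetch a prompt template from the S3 data lake and build the engineered
# prompt. Template format and names are assumptions for illustration.
import boto3

s3 = boto3.client("s3")

def build_prompt(bucket: str, key: str, **context) -> str:
    template = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    return template.format(**context)

prompt = build_prompt(
    "my-datalake-bucket",
    "prompts/product-qa.txt",   # e.g. "Answer as a {role}: {question}"
    role="support agent",
    question="How do I reset my password?",
)
```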
After countless open-source innovations ushered in the Big Data era, including the first commercial distribution of HDFS (Apache Hadoop Distributed File System), commonly referred to as Hadoop, the two companies joined forces, giving birth to an entire ecosystem of technology and tech companies.
Data producer setup. In this section, we present the steps to set up the data producer. In the navigation pane, under Register and ingest, choose Data lake locations. For additional information about roles, refer to Requirements for roles used to register locations. Choose Register location.
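For reference, an API-level sketch equivalent to those console steps, registering an S3 location with Lake Formation (the ARN is a placeholder, and whether to use the service-linked role or a custom RoleArn depends on the role requirements linked above):

```python
# Hedged equivalent of "Register location" in the Lake Formation console.
import boto3

lakeformation = boto3.client("lakeformation", region_name="us-east-1")
lakeformation.register_resource(
    ResourceArn="arn:aws:s3:::my-datalake-bucket/producer-data",
    UseServiceLinkedRole=True,  # or pass RoleArn for a custom role
)
```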
Businesses are using real-time data streams to gain insights into their company’s performance and make informed, data-driven decisions faster. As real-time data has become essential for businesses, a growing number of companies are adapting their data strategy to focus on data in motion.
By creating visual representations of data flows, organizations can gain a clear understanding of the lifecycle of personal data and identify potential vulnerabilities or compliance gaps. Note that putting a comprehensive data strategy in place is not in scope for this post.
The following is a high-level architecture of the solution we can build to process the unstructured data, assuming the input data is ingested into the raw input object store. The steps of the workflow are as follows: integrated AI services extract structured data from the unstructured input.
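As one hedged example of what that extraction step could look like, here is Amazon Textract pulling text out of a scanned document in the raw input bucket (bucket and key are placeholders; the actual solution may use different AI services):

```python
# Illustrative extraction step: detect text in a document stored in the
# raw input object store using Amazon Textract.
import boto3

textract = boto3.client("textract", region_name="us-east-1")
result = textract.detect_document_text(
    Document={"S3Object": {"Bucket": "raw-input-bucket", "Name": "scans/invoice-001.png"}}
)
lines = [b["Text"] for b in result["Blocks"] if b["BlockType"] == "LINE"]
print("\n".join(lines))
```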
Data sharing is becoming an important element of an enterprise data strategy. AWS services like AWS Data Exchange provide an avenue for companies to share or monetize their value-added data with other companies. Confidential or restricted data access might involve aspects of identity and authorization management.
This allows for transparency, speed to action, and collaboration across the group while enabling the platform team to evangelize the use of data: Altron engaged with AWS to seek advice on their data strategy and cloud modernization to bring their vision to fruition.
Though you may encounter the terms “data science” and “data analytics” used interchangeably in conversations or online, they refer to two distinctly different concepts. Watsonx comprises three powerful components: the watsonx.ai studio, the watsonx.data store, and the watsonx.governance toolkit.
With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.
Depending on your enterprise’s culture and goals, your migration of a legacy multi-tenant data platform to Amazon Redshift could use one of the following strategies: Leapfrog strategy – in this strategy, you move to an AWS modern data architecture and migrate one tenant at a time.
In turn, they both must also have the data literacy skills to be able to verify the data’s accuracy, ensure its security, and provide or follow guidance on when and how it should be used. Then, it applies these insights to automate and orchestrate the data lifecycle. What are your data and AI objectives?
With data streaming, you can power data lakes running on Amazon Simple Storage Service (Amazon S3), enrich customer experiences via personalization, improve operational efficiency with predictive maintenance of machinery in your factories, and achieve better insights with more accurate machine learning (ML) models.
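A minimal sketch of the producer side of such a stream, assuming an existing Kinesis data stream that feeds the S3 data lake (for example via Amazon Data Firehose); the stream name and event shape are placeholders:

```python
# Publish an event to a Kinesis data stream that downstream delivery
# lands in the S3 data lake. Names and payload are illustrative.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
kinesis.put_record(
    StreamName="clickstream-events",
    Data=json.dumps({"user_id": "u-123", "event": "page_view"}).encode("utf-8"),
    PartitionKey="u-123",  # keeps one user's events ordered within a shard
)
```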
The comprehensive system that collectively includes generating data, storing the data, aggregating and analyzing the data, and the tools, platforms, and other software involved is referred to as the big data ecosystem. Data management: the majority of the data a business has stored is generally unstructured.
Data governance and security measures are critical components of data strategy. Data strategy and management roadmap: effective management and utilization of information has become a critical success factor for organizations. Data is susceptible to breach for a number of reasons.
Control of Data to ensure it is Fit-for-Purpose. This refers to a wide range of activities from Data Governance to Data Management to Data Quality improvement and indeed related concepts such as Master Data Management. When I first started focussing on the data arena, Data Warehouses were state of the art.
The reasons for this are simple: before you can start analyzing data, huge datasets like data lakes must be modeled or transformed to be usable. According to a recent survey conducted by IDC, 43% of respondents were drawing intelligence from 10 to 30 data sources in 2020, with a jump to 64% in 2021! Discover why.
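An illustrative sketch of that modeling/transformation step, shaping raw data lake records into an analysis-ready aggregate; the paths, columns, and JSON input format are assumptions:

```python
# Transform raw event records in the data lake into a curated daily
# aggregate that analysts can query directly.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("model-demo").getOrCreate()

raw = spark.read.json("s3://my-bucket/raw/events/")
daily = (raw
         .withColumn("event_date", F.to_date("event_timestamp"))
         .groupBy("event_date", "source_system")
         .agg(F.count("*").alias("event_count")))
daily.write.mode("overwrite").parquet("s3://my-bucket/curated/daily_events/")
```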
Can I trust the data that I’m seeing? A Single Source of Reference. A data catalog has emerged as a core component of modern data organizations and key for CDOs making the transition from process-centric to data-driven. The catalog draws on third-party information to verify whether the data can be trusted.
I have been very much focussing on the start of a data journey in a series of recent articles about Data Strategy [3]. In fact, it is the crucial final link between an organisation’s data and the people who need to use it. In many ways, how people experience data capabilities will be determined by this final link.
I’m referring not only to our technology partners, but also to our cloud partners that host the Denodo Platform. Denodo is a very partner-friendly company, and here I’d like to share some thoughts about how Denodo works with our partners.
“Flashpoint” (2018) – GDPR went into effect, plus major data blunders happened seemingly everywhere. Data coming from machines tends to land (aka, data at rest) in durable stores such as Amazon S3, then gets consumed by Hadoop, Spark, etc. Somehow, the gravity of the data has a geological effect that forms data lakes.
Organizations across all industries have complex data processing requirements for their analytical use cases across different analytics systems, such as data lakes on AWS, data warehouses (Amazon Redshift), search (Amazon OpenSearch Service), NoSQL (Amazon DynamoDB), machine learning (Amazon SageMaker), and more.
Access your existing data and resources through Amazon SageMaker Unified Studio Part 1: AWS Glue Data Catalog and Amazon Redshift (this post) Part 2: Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR This series primarily focuses on the UI experience. Enter the S3 prefix for the Amazon S3 path.
Furthermore, we increased the breadth of sources to include Aurora PostgreSQL, DynamoDB, and Amazon RDS for MySQL to Amazon Redshift integrations, solidifying our commitment to making it seamless for you to run analytics on your data. For instructions, refer to Getting started with Aurora zero-ETL integrations with Amazon Redshift.
This is the final part of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to process data with Amazon Redshift Spectrum and create the gold (consumption) layer. The following diagram illustrates the different layers of the data lake.
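A hedged sketch of the Spectrum setup this implies: expose the data lake's curated layer as an external schema, then build a gold-layer aggregate from it. The schema, database, table, IAM role, and workgroup names below are placeholders, not the series' actual resources:

```python
# Create an external schema over the Glue Data Catalog and materialize a
# gold-layer table with Redshift Spectrum, via the Redshift Data API.
import boto3

client = boto3.client("redshift-data", region_name="us-east-1")
for sql in [
    """CREATE EXTERNAL SCHEMA IF NOT EXISTS lakehouse
       FROM DATA CATALOG DATABASE 'curated_db'
       IAM_ROLE 'arn:aws:iam::123456789012:role/SpectrumRole';""",
    """CREATE TABLE gold_daily_sales AS
       SELECT order_date, SUM(amount) AS total_amount
       FROM lakehouse.sales
       GROUP BY order_date;""",
]:
    client.execute_statement(WorkgroupName="my-serverless-workgroup",
                             Database="dev", Sql=sql)
```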