Today, Amazon Redshift is used by customers across all industries for a variety of use cases, including data warehouse migration and modernization, near real-time analytics, self-service analytics, data lake analytics, machine learning (ML), and data monetization.
Initially, data warehouses were the go-to solution for structured data and analytical workloads but were limited by proprietary storage formats and their inability to handle unstructured data. For more examples and references to other posts, refer to the following GitHub repository.
Unlocking the true value of data often gets impeded by siloed information. Traditional data management—wherein each business unit ingests raw data in separate data lakes or warehouses—hinders visibility and cross-functional analysis. Amazon DataZone natively supports data sharing for Amazon Redshift data assets.
Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open-format files in your Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries.
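As a rough sketch of this pattern, the following example uses the Amazon Redshift Data API through boto3 to register a Glue-backed external schema and query Parquet files in S3 in place; the cluster, database, IAM role, schema, and table names are placeholders, not values from the post.

    import boto3

    client = boto3.client("redshift-data")

    # Point Redshift Spectrum at a Glue Data Catalog database (all names are placeholders).
    ddl = """
    CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_schema
    FROM DATA CATALOG DATABASE 'datalake_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole';
    """

    # Query open-format files in S3 without loading them into Redshift tables.
    query = "SELECT region, SUM(amount) AS total_sales FROM spectrum_schema.sales GROUP BY region;"

    for sql in (ddl, query):
        client.execute_statement(
            ClusterIdentifier="analytics-cluster",
            Database="dev",
            DbUser="awsuser",
            Sql=sql,
        )

Query results can then be fetched asynchronously with describe_statement and get_statement_result.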
Data lakes are centralized repositories that can store all structured and unstructured data at any desired scale. The power of the data lake lies in the fact that it is often a cost-effective way to store data. Deploying Data Lakes in the cloud. Best practices to build a Data Lake.
That stands for “bring your own database,” and it refers to a model in which core ERP data are replicated to a separate standalone database used exclusively for reporting. Option 3: Azure Data Lakes. This leads us to Microsoft’s apparent long-term strategy for D365 F&SCM reporting: Azure Data Lakes.
As organizations across the globe are modernizing their data platforms with data lakes on Amazon Simple Storage Service (Amazon S3), handling slowly changing dimensions (SCDs) in data lakes can be challenging.
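As a concrete illustration of the challenge, here is a minimal PySpark sketch of a Type 2 SCD update against a Parquet-based customer dimension; the S3 paths and the customer_id, address, is_current, start_date, and end_date columns are hypothetical, and a production pipeline would more likely rely on a transactional table format such as Apache Iceberg, Apache Hudi, or Delta Lake rather than rewriting Parquet files.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

    current = spark.read.parquet("s3://my-datalake/dim_customer/")               # existing dimension
    updates = spark.read.parquet("s3://my-datalake/staging/customer_updates/")   # incoming source rows

    # Rows whose tracked attribute changed: expire the old version and insert a new one.
    cond = F.col("c.customer_id") == F.col("u.customer_id")
    changed = (current.alias("c").join(updates.alias("u"), cond)
               .where(F.col("c.address") != F.col("u.address")))

    expired = (changed.select("c.*")
               .withColumn("is_current", F.lit(False))
               .withColumn("end_date", F.current_date()))

    new_versions = (changed.select("u.*")
                    .withColumn("is_current", F.lit(True))
                    .withColumn("start_date", F.current_date())
                    .withColumn("end_date", F.lit(None).cast("date")))

    changed_keys = changed.select(F.col("c.customer_id").alias("customer_id"))
    unchanged = current.join(changed_keys, "customer_id", "left_anti")

    # Brand-new keys and deletes are omitted for brevity in this sketch.
    result = unchanged.unionByName(expired).unionByName(new_versions, allowMissingColumns=True)
    result.write.mode("overwrite").parquet("s3://my-datalake/dim_customer_v2/")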
Its solution was to replicate data from the production database, using data entities, into a traditional relational database. Microsoft referred to this approach as “bring your own database” (BYOD). There is an established body of practice around creating, managing, and accessing OLAP data (known as “cubes”). Data Lakes.
Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on.
First, many LLM use cases rely on enterprise knowledge that needs to be drawn from unstructured data such as documents, transcripts, and images, in addition to structured data from data warehouses. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).
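To make the PII-redaction step concrete, a minimal sketch using Amazon Comprehend's DetectPiiEntities API is shown below; the sample text and the redaction strategy are illustrative and not taken from the original post.

    import boto3

    comprehend = boto3.client("comprehend")

    def redact_pii(text: str, language_code: str = "en") -> str:
        """Replace detected PII spans with their entity type, e.g. [EMAIL]."""
        entities = comprehend.detect_pii_entities(Text=text, LanguageCode=language_code)["Entities"]
        # Replace from the end of the string so earlier offsets stay valid.
        for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
            text = text[:entity["BeginOffset"]] + f"[{entity['Type']}]" + text[entity["EndOffset"]:]
        return text

    print(redact_pii("Contact Jane Doe at jane@example.com about invoice 1042."))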
Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. Refer to the Amazon Redshift Database Developer Guide for more details. Refer to API Dimensions & Metrics for details.
Ingestion: Data lake batch, micro-batch, and streaming. Many organizations land their source data into their data lake in various ways, including batch, micro-batch, and streaming jobs. Amazon AppFlow can be used to transfer data from different SaaS applications to a data lake.
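The following is a minimal Spark Structured Streaming sketch of the streaming ingestion path, landing micro-batches as Parquet in the data lake; the Kafka topic, schema, and S3 paths are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("lake-ingestion-sketch").getOrCreate()

    schema = StructType([
        StructField("order_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    # Read JSON events from a (hypothetical) Kafka topic and unpack them into columns.
    orders = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "orders")
              .load()
              .select(F.from_json(F.col("value").cast("string"), schema).alias("o"))
              .select("o.*"))

    # Land micro-batches as Parquet files in the data lake, checkpointing progress in S3.
    query = (orders.writeStream
             .format("parquet")
             .option("path", "s3://my-datalake/raw/orders/")
             .option("checkpointLocation", "s3://my-datalake/checkpoints/orders/")
             .trigger(processingTime="1 minute")
             .start())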
New feature: Custom AWS service blueprints. Previously, Amazon DataZone provided default blueprints that created AWS resources required for data lake, data warehouse, and machine learning use cases. If you’re new to Amazon DataZone, refer to Getting started.
By changing the cost structure of collecting data, it increased the volume of data stored in every organization. Additionally, Hadoop removed the requirement to model or structure data when writing to a physical store. You did not have to understand or prepare the data to get it into Hadoop, so people rarely did.
Today, we are pleased to announce new AWS Glue connectors for Azure Blob Storage and Azure Data Lake Storage that allow you to move data bi-directionally between Azure Blob Storage, Azure Data Lake Storage, and Amazon Simple Storage Service (Amazon S3), for example: option("header","true").load("wasbs://yourblob@youraccountname.blob.core.windows.net/loadingtest-input/100mb")
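Filling out that fragment, a possible end-to-end read/write might look like the following; the CSV format, the S3 target path, and the Spark session setup are assumptions, and the Azure authentication configuration (storage account key or SAS token) is omitted.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("azure-to-s3-sketch").getOrCreate()

    # Read the sample file from Azure Blob Storage (placeholder URL from the excerpt above).
    df = (spark.read.format("csv")
          .option("header", "true")
          .load("wasbs://yourblob@youraccountname.blob.core.windows.net/loadingtest-input/100mb"))

    # Write the result to Amazon S3 as Parquet (hypothetical target bucket).
    df.write.mode("overwrite").parquet("s3://my-target-bucket/loadingtest-output/")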
Amazon Redshift Spectrum enables querying structured and semi-structured data in Amazon Simple Storage Service (Amazon S3) without having to load the data into Redshift tables. To get an overview of Salesforce Zero Copy integration with Amazon Redshift, please refer to this Salesforce Blog.
These business units have varying landscapes, where a data lake is managed by Amazon Simple Storage Service (Amazon S3) and analytics workloads are run on Amazon Redshift, a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data.
For instance, a Data Cloud-triggered flow could update an account manager in Slack when shipments in an external data lake are marked as delayed. Sharing Customer 360 insights back without data replication. Currently, Data Cloud leverages live SQL queries to access data from external data platforms via zero copy.
These services enable you to collect and analyze data in near real time and put a comprehensive data governance framework in place that uses granular access control to secure sensitive data from unauthorized users. To create an AWS HealthLake data store, refer to Getting started with AWS HealthLake.
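As a small illustration, creating a HealthLake data store with boto3 might look like this; the data store name is hypothetical, and the linked Getting started guide covers the full setup.

    import boto3

    healthlake = boto3.client("healthlake")

    response = healthlake.create_fhir_datastore(
        DatastoreName="clinical-data-store",   # hypothetical name
        DatastoreTypeVersion="R4",             # FHIR R4 data store
    )
    print(response["DatastoreId"], response["DatastoreStatus"])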
Business needs often drive changes to table structure, such as schema evolution (the addition of new columns, removal of existing columns, renaming of columns, and so on) for some of these tables in one business function, which then requires other business functions to replicate the same changes. Launch the following stack and provide your stack name.
Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. Solution overview: Amazon Redshift is an industry-leading cloud data warehouse.
Flexible and easy to use – The solutions should provide less restrictive, easy-to-access, and ready-to-use data. A data hub contains data at multiple levels of granularity and is often not integrated. It differs from a datalake by offering data that is pre-validated and standardized, allowing for simpler consumption by users.
To learn more about RAG, refer to Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart. Without RAG, a generative AI application can only produce generic responses based on its training data; a RAG-based application grounds its responses in the relevant documents in the knowledge base.
Data lakes are designed for storing vast amounts of raw, unstructured, or semi-structured data at a low cost, and organizations share those datasets across multiple departments and teams. The queries on these large datasets read vast amounts of data and can perform complex join operations on multiple datasets.
The challenge comes when we need to ask more complex questions of our data, for example, what was the year-on-year quarterly sales growth by product broken down by country? The case for a data warehouse A data warehouse is ideally suited to answer OLAP queries. To house our data, we need to define a data model.
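To make that OLAP question concrete, a warehouse query along the following lines could answer it; the fact_sales table, its columns, and the connection details are hypothetical, and the LAG offset of 4 assumes every quarter is present for each product-country pair.

    import redshift_connector

    sql = """
    SELECT product,
           country,
           DATE_TRUNC('quarter', order_date) AS quarter,
           SUM(amount) AS sales,
           SUM(amount) / NULLIF(LAG(SUM(amount), 4) OVER (
               PARTITION BY product, country
               ORDER BY DATE_TRUNC('quarter', order_date)), 0) - 1 AS yoy_growth
    FROM fact_sales
    GROUP BY product, country, DATE_TRUNC('quarter', order_date)
    ORDER BY product, country, quarter;
    """

    conn = redshift_connector.connect(host="my-cluster.example.redshift.amazonaws.com",
                                      database="dev", user="awsuser", password="...")
    cursor = conn.cursor()
    cursor.execute(sql)
    for row in cursor.fetchall():
        print(row)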
For a deeper exploration of configuring and using streaming ingestion in Amazon Redshift, refer to Real-time analytics with Amazon Redshift streaming ingestion. For streams that contain raw binary data encoded in JSON format, Amazon Redshift provides a variety of tools for parsing and managing the data.
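For orientation, the documented streaming-ingestion pattern pairs an external schema over the stream with an auto-refreshing materialized view that parses the JSON payload; the sketch below submits that SQL through the Redshift Data API, with all names and ARNs as placeholders.

    import boto3

    data_api = boto3.client("redshift-data")

    statements = [
        # Map a Kinesis Data Stream into Redshift as an external schema.
        """
        CREATE EXTERNAL SCHEMA kinesis_schema
        FROM KINESIS
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftStreamingRole';
        """,
        # Materialize the stream; JSON_PARSE turns each raw record payload into a SUPER value.
        """
        CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
        SELECT approximate_arrival_timestamp,
               JSON_PARSE(FROM_VARBYTE(kinesis_data, 'utf-8')) AS payload
        FROM kinesis_schema."my-click-stream";
        """,
    ]

    data_api.batch_execute_statement(
        ClusterIdentifier="analytics-cluster",
        Database="dev",
        DbUser="awsuser",
        Sqls=statements,
    )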
Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. Additionally, data related to product, marketing, and customer experience is extracted from vendor APIs.
The details of each step are as follows: Populate the Amazon Redshift Serverless data warehouse with company stock information stored in Amazon Simple Storage Service (Amazon S3). Redshift Serverless is a fully functional data warehouse holding data tables maintained in real time.
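A minimal sketch of that population step, assuming a hypothetical workgroup, table, S3 prefix, and IAM role, could use a COPY command submitted through the Redshift Data API:

    import boto3

    data_api = boto3.client("redshift-data")

    # Load the company stock files from S3 into a (hypothetical) "stocks" table.
    copy_sql = """
    COPY stocks
    FROM 's3://my-bucket/company-stock-data/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV IGNOREHEADER 1;
    """

    data_api.execute_statement(
        WorkgroupName="my-serverless-workgroup",   # Redshift Serverless workgroup
        Database="dev",
        Sql=copy_sql,
    )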
Most companies produce and consume unstructured data such as documents, emails, web pages, engagement center phone calls, and social media. By some estimates, unstructured data can make up 80–90% of all new enterprise data and is growing many times faster than structured data.
Let’s explore the continued relevance of data modeling and its journey through history, challenges faced, adaptations made, and its pivotal role in the new age of data platforms, AI, and democratized data access. Embracing the future In the dynamic world of data, data modeling remains an indispensable tool.
That means many of the reporting tools that customers previously used to access Microsoft Dynamics AX data will no longer work with D365 F&SCM. We refer to the first as “data entities.” You can think of data entities as a kind of translation layer or gatekeeper. Introducing Data Lakes.
We’ve seen that there is a demand to design applications that enable data to be portable across cloud environments and give you the ability to derive insights from one or more data sources. With this connector, you can bring the data from Google Cloud Storage to Amazon S3. A Secrets Manager secret to store a Google Cloud secret.
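As a rough sketch, the Google Cloud service-account key could be stored and retrieved through AWS Secrets Manager as follows; the secret name and key contents are placeholders, not values from the post.

    import json
    import boto3

    secrets = boto3.client("secretsmanager")

    # Store the (truncated, placeholder) service-account JSON once, during setup.
    secrets.create_secret(
        Name="gcs-connector/service-account",
        SecretString=json.dumps({"type": "service_account", "project_id": "my-gcp-project"}),
    )

    # The connector or job code can then read it back at runtime.
    key = json.loads(
        secrets.get_secret_value(SecretId="gcs-connector/service-account")["SecretString"]
    )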
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. It is designed for analyzing large volumes of data and performing complex queries on structured and semi-structured data. For more information about tagging, refer to Tagging resources in Amazon Redshift.
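For illustration, tagging a cluster with boto3 might look like the sketch below; the cluster ARN and tag values are hypothetical.

    import boto3

    redshift = boto3.client("redshift")

    cluster_arn = "arn:aws:redshift:us-east-1:123456789012:cluster:analytics-cluster"

    # Attach cost-allocation and environment tags to the cluster.
    redshift.create_tags(
        ResourceName=cluster_arn,
        Tags=[
            {"Key": "environment", "Value": "production"},
            {"Key": "cost-center", "Value": "analytics"},
        ],
    )

    # Tags can be listed back with describe_tags.
    print(redshift.describe_tags(ResourceName=cluster_arn))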
Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), then enterprise-wide data lakes versus smaller, typically BU-specific, “data ponds”.
It is prudent to consolidate this data into a single customer view, serving as a primary reference for downstream applications, ranging from ecommerce platforms to CRM systems. This consolidated view acts as a liaison between the data platform and customer-centric applications.
Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. This also makes sure that end-users have the latest data available in Amazon Redshift shortly after the source files are available.
Companies and businesses focus a lot on data collection in order to make sure they can get valuable insights out of it. Understanding data structure is key to unlocking its value. Data “structure” refers to a particular way of organizing and storing data in a database or warehouse so that it can be accessed and analyzed.
Data Storage. The data storage component of a pipeline provides secure, scalable storage for the data. Various data storage methods are available, including data warehouses for structured data or data lakes for unstructured, semi-structured, and structured data.
They classified the metrics and indicators in the following categories: Data usage – A clear understanding of who is consuming what data source, materialized with a mapping of consumers and producers. In this approach, teams responsible for generating data are referred to as producers.
Though you may encounter the terms “data science” and “data analytics” being used interchangeably in conversations or online, they refer to two distinctly different concepts. Data analytics is a task that resides under the data science umbrella and is done to query, interpret and visualize datasets.
The reasons for this are simple: Before you can start analyzing data, huge datasets like data lakes must be modeled or transformed to be usable. According to a recent survey conducted by IDC, 43% of respondents were drawing intelligence from 10 to 30 data sources in 2020, with a jump to 64% in 2021!
As referenced above, modern data catalogs often support an asset type, like an article or a project page, which allows data scientists to capture their work as its own discoverable entity. Cataloging data science projects in this way is critical to helping them generate value for the company.
Data governance is traditionally applied to structured data assets that are most often found in databases and information systems. This blog focuses on governing spreadsheets that contain data, information, and metadata, and must themselves be governed.
Amazon Redshift helps you break down the data silos and allows you to run unified, self-service, real-time, and predictive analytics on all data across your operational databases, data lake, data warehouse, and third-party datasets with built-in governance. This is often a laborious and error-prone process.