Data Lake, Data Warehouse and Statistics

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Amazon Redshift, launched in 2013, has undergone significant evolution since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. Industry-leading price-performance: Amazon Redshift offers up to three times better price-performance than alternative cloud data warehouses.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

AWS Big Data

OCTOBER 1, 2024

Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open format files in Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries.

Data Lake

Data Lake Statistics Broadcasting Optimization

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in-place with all Apache Iceberg compatible tools and engines.

Data Integration

Data Integration Data Lake Statistics Data-driven

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Statistics Optimization

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Amazon Redshift is a fast, fully managed petabyte-scale cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Amazon Redshift also supports querying nested data with complex data types such as struct, array, and map.

Data Lake

Data Lake Data Warehouse Metadata Data Architecture

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

Apache Iceberg is an Apache-licensed, 100% open-source data table format that helps simplify data processing on large datasets stored in data lakes. Data engineers use Apache Iceberg because it’s fast, efficient, and reliable at any scale and keeps records of how datasets change over time.

Data Lake

Data Lake Snapshot Metadata Data Architecture
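The excerpt above notes that Iceberg keeps records of how datasets change over time. Below is a toy Python sketch of that idea. It assumes a simplified model in which every commit produces an immutable snapshot listing the data files visible at that point. The `ToyTable` and `Snapshot` classes are hypothetical and do not reflect Iceberg's actual manifest and metadata-file layout.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple[str, ...]  # data files visible in this version of the table


@dataclass
class ToyTable:
    """Toy snapshot-versioned table (hypothetical; not Iceberg's metadata layout)."""

    snapshots: list[Snapshot] = field(default_factory=list)

    def commit(self, added: list[str], removed: list[str] | None = None) -> int:
        # Every commit produces a new immutable snapshot; older ones are retained.
        current = set(self.snapshots[-1].data_files) if self.snapshots else set()
        current.difference_update(removed or [])
        current.update(added)
        snap = Snapshot(len(self.snapshots) + 1, tuple(sorted(current)))
        self.snapshots.append(snap)
        return snap.snapshot_id

    def files_as_of(self, snapshot_id: int) -> tuple[str, ...]:
        # "Time travel": read the table as it was at an earlier snapshot.
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap.data_files
        raise KeyError(f"unknown snapshot {snapshot_id}")


if __name__ == "__main__":
    table = ToyTable()
    first = table.commit(["a.parquet"])
    table.commit(["b.parquet"])
    latest = table.commit(["c.parquet"], removed=["a.parquet"])
    print(table.files_as_of(first))   # ('a.parquet',)
    print(table.files_as_of(latest))  # ('b.parquet', 'c.parquet')
```

Keeping old snapshots instead of overwriting them is what makes reading an earlier version cheap. Real table formats also expire old snapshots to bound storage.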

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. Of those tables, some are larger (such as in terms of record volume) than others, and some are updated more frequently than others.

Data Lake

Data Lake Data Processing Metadata Snapshot
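Tables that are updated frequently are typically maintained with upserts. The Python sketch below assumes each record carries a unique key and an ordering field. The names `upsert`, `id`, and `updated_at` are hypothetical. The logic only loosely mirrors Apache Hudi's record key and precombine field; it is not Hudi's actual write path.

```python
from __future__ import annotations


def upsert(
    table: dict[str, dict],
    incoming: list[dict],
    record_key: str = "id",
    precombine: str = "updated_at",
) -> dict[str, dict]:
    """Merge `incoming` records into a copy of `table`, keyed by `record_key`.

    If two versions of a record share a key, the one with the larger
    `precombine` value wins; on a tie, the later-arriving record wins.
    Simplified illustration only, not Apache Hudi's write path.
    """
    merged = dict(table)  # leave the caller's table untouched
    for record in incoming:
        key = record[record_key]
        existing = merged.get(key)
        if existing is None or record[precombine] >= existing[precombine]:
            merged[key] = record
    return merged


if __name__ == "__main__":
    table = {"1": {"id": "1", "status": "new", "updated_at": 1}}
    batch = [
        {"id": "1", "status": "shipped", "updated_at": 3},
        {"id": "1", "status": "packed", "updated_at": 2},  # late-arriving older change
        {"id": "2", "status": "new", "updated_at": 1},
    ]
    result = upsert(table, batch)
    print(result["1"]["status"])  # shipped
    print(sorted(result))         # ['1', '2']
```

Comparing on the ordering field, rather than on arrival order, keeps a late-arriving stale change from overwriting a newer one.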

Top analytics announcements of AWS re:Invent 2024

AWS Big Data

FEBRUARY 26, 2025

Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.

Analytics

Analytics Data Lake Metadata Data Warehouse

2021 Gift Giving Guide for Data Nerds

DataKitchen

DECEMBER 7, 2021

This book is not available until January 2022, but considering all the hype around the data mesh, we expect it to be a best seller. In the book, author Zhamak Dehghani reveals that, despite the time, money, and effort poured into them, data warehouses and data lakes fail when applied at the scale and speed of today’s organizations.

Data-driven

Data-driven Data Governance Big Data Data Science

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

AWS Big Data

OCTOBER 14, 2024

Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena, Amazon Redshift, Amazon EMR, and so on. Outside of his work, Naidu practices yoga and goes trekking often.

Metadata

Metadata Data Lake Modeling Data Warehouse

What is Data Pipeline? A Detailed Explanation

Smart Data Collective

OCTOBER 17, 2022

A point of data entry in a given pipeline. Examples of an origin include storage systems like data lakes, data warehouses and data sources that include IoT devices, transaction processing applications, APIs or social media. The final point to which the data has to be eventually transferred is a destination.

Data Warehouse

Data Warehouse Data Lake Visualization Big Data
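The Python sketch below makes the origin and destination vocabulary concrete. It assumes an in-memory origin, standing in for an API or IoT feed, and a list as the destination, standing in for a warehouse or lake table. `run_pipeline` and the sample stage are hypothetical names.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator

Record = dict
Stage = Callable[[Iterator[Record]], Iterator[Record]]


def run_pipeline(origin: Iterable[Record], stages: list[Stage],
                 destination: list[Record]) -> int:
    """Stream records from an origin through stages into a destination.

    Returns the number of records delivered to the destination.
    """
    stream: Iterator[Record] = iter(origin)
    for stage in stages:
        stream = stage(stream)  # stages are lazy generators, chained in order
    delivered = 0
    for record in stream:
        destination.append(record)
        delivered += 1
    return delivered


if __name__ == "__main__":
    def drop_test_events(records: Iterator[Record]) -> Iterator[Record]:
        # Example stage: filter out synthetic test events.
        return (r for r in records if not r.get("test"))

    events = [{"device": "t1", "temp": 21.5}, {"device": "t2", "test": True}]
    sink: list[Record] = []
    print(run_pipeline(events, [drop_test_events], sink))  # 1
    print(sink)  # [{'device': 't1', 'temp': 21.5}]
```

Because each stage consumes and yields an iterator, records flow through one at a time rather than being materialized between steps.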

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

Top 15 data management platforms

CIO Business Intelligence

JUNE 9, 2022

In these instances, data feeds come largely from various advertising channels, and the reports they generate are designed to help marketers spend wisely. All this data arrives by the terabyte, and a data management platform can help marketers make sense of it all. SAS Data Management. Of course, marketing also works.

Management

Management Advertising Data Lake Sales

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architect Armando Vázquez identifies eight common types of data architects: Enterprise data architect: These data architects oversee an organization’s overall data architecture, defining data architecture strategy and designing and implementing architectures. Are data architects in demand?

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

Migrate Amazon Redshift from DC2 to RA3 to accommodate increasing data volumes and analytics demands

AWS Big Data

AUGUST 9, 2024

These processes retrieve data from around 90 different data sources, resulting in updating roughly 2,000 tables in the data warehouse and 3,000 external tables in Parquet format, accessed through Amazon Redshift Spectrum and a data lake on Amazon Simple Storage Service (Amazon S3). We started with 115 dc2.large

Data Lake

Data Lake Analytics Data Warehouse Data-driven

How Gilead used Amazon Redshift to quickly and cost-effectively load third-party medical claims data

AWS Big Data

NOVEMBER 8, 2023

Because Gilead is expanding into biologics and large molecule therapies, and has an ambitious goal of launching 10 innovative therapies by 2030, there is heavy emphasis on using data with AI and machine learning (ML) to accelerate the drug discovery pipeline. Loading data is a key process for any analytical system, including Amazon Redshift.

Data Lake

Data Lake Data Warehouse Cost-Benefit Optimization

Optimize your workloads with Amazon Redshift Serverless AI-driven scaling and optimization

AWS Big Data

AUGUST 21, 2024

The current scaling approach of Amazon Redshift Serverless increases your compute capacity based on the query queue time and scales down when the queuing reduces on the data warehouse. This post also includes example SQL queries, which you can run on your own Redshift Serverless data warehouse to experience the benefits of this feature.

Optimization

Optimization Data Lake Data Warehouse Cost-Benefit
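The queue-time-based behavior described above can be illustrated with a hypothetical scaling policy. This is not AWS's algorithm. The thresholds, step size, and capacity bounds are assumptions chosen for illustration, and capacity is expressed in abstract units loosely analogous to RPUs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class QueueScalingPolicy:
    """Hypothetical queue-time-driven scaler; all numbers are illustrative."""

    min_capacity: int = 8
    max_capacity: int = 128
    step: int = 8
    scale_up_queue_s: float = 5.0    # add capacity when average queue time exceeds this
    scale_down_queue_s: float = 0.5  # release capacity once queuing has mostly cleared

    def next_capacity(self, current: int, queue_times_s: list[float]) -> int:
        avg_queue = sum(queue_times_s) / len(queue_times_s) if queue_times_s else 0.0
        if avg_queue > self.scale_up_queue_s:
            return min(current + self.step, self.max_capacity)
        if avg_queue < self.scale_down_queue_s:
            return max(current - self.step, self.min_capacity)
        return current  # inside the dead band: hold steady to avoid flapping


if __name__ == "__main__":
    policy = QueueScalingPolicy()
    print(policy.next_capacity(32, [12.0, 8.0, 10.0]))  # 40
    print(policy.next_capacity(40, [0.1, 0.0]))         # 32
    print(policy.next_capacity(32, [1.0, 2.0]))         # 32
```

The gap between the two thresholds is a deliberate design choice: a purely reactive scaler with a single threshold tends to oscillate when load hovers near that threshold.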

Top 15 data management platforms available today

CIO Business Intelligence

SEPTEMBER 22, 2023

What are the benefits of data management platforms? Modern, data-driven marketing teams must navigate a web of connected data sources and formats. All this data arrives by the terabyte, and a data management platform can help marketers make sense of it all. Of course, marketing also works.

Management

Management Advertising Data Lake Sales

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

Large-scale data warehouse migration to the cloud is a complex and challenging endeavor that many organizations undertake to modernize their data infrastructure, enhance data management capabilities, and unlock new business opportunities. This makes sure the new data platform can meet current and future business goals.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

AWS Big Data

NOVEMBER 20, 2023

Use case: A typical workload for AWS Glue for Apache Spark jobs is to load data from a relational database to a data lake with SQL-based transformations. On the Graphed metrics tab, configure your preferred statistic, period, and so on. When the example job ran, the workerUtilization metrics showed the following trend.

Metrics

Metrics Data Lake Cost-Benefit Dashboards

AWS Glue Data Quality is Generally Available

AWS Big Data

JUNE 6, 2023

We are excited to announce the General Availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning. It takes days for data engineers to identify and implement data quality rules.

Data Quality

Data Quality Statistics Data Lake Visualization

Four Topics That Should Be Top of Mind for SAP Partners

Timo Elliott

JUNE 19, 2023

All of the statistics from IDC and the others show that there’s a massive market for digital services. The next area is data. There’s a huge disruption around data. Increasingly now, we can bring the technology to the data rather than the other way around. The first is the new digital opportunities.

Data Lake

Data Lake Digital Transformation Recreation/Entertainment Technology

Quantitative and Qualitative Data: A Vital Combination

Sisense

OCTOBER 6, 2020

Let’s consider the differences between the two, and why they’re both important to the success of data-driven organizations. Digging into quantitative data. This is quantitative data. It’s “hard,” structured data that answers questions such as “how many?” Qualitative data benefits: Unlocking understanding.

Statistics

Statistics Unstructured Data Data-driven Visualization

Demystifying Modern Data Platforms

Cloudera

SEPTEMBER 15, 2022

Mark: The first element in the process is the link between the source data and the entry point into the data platform. At Ramsey International (RI), we refer to that layer in the architecture as the foundation, but others call it a staging area, raw zone, or even a source data lake. What is a data fabric?

Data Lake

Data Lake Data Architecture Data-driven Data Warehouse

Unleashing the power of Presto: The Uber case study

IBM Big Data Hub

SEPTEMBER 25, 2023

Uber understood that digital superiority required the capture of all their transactional data, not just a sampling. They stood up a file-based data lake alongside their analytical database. Because much of the work done on their data lake is exploratory in nature, many users want to execute untested queries on petabytes of data.

OLAP

OLAP Data Lake Data-driven Online Analytical Processing

Successfully conduct a proof of concept in Amazon Redshift

AWS Big Data

MARCH 27, 2024

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. Complete the implementation tasks such as data ingestion and performance testing. Analyze the data and then optimize as necessary.

Testing

Testing Data Warehouse Metrics Cost-Benefit

Data science vs data analytics: Unpacking the differences

IBM Big Data Hub

SEPTEMBER 19, 2023

Though you may encounter the terms “data science” and “data analytics” being used interchangeably in conversations or online, they refer to two distinctly different concepts. Meanwhile, data analytics is the act of examining datasets to extract value and find answers to specific questions.

Data Science

Data Science Data Analytics Prescriptive Analytics Analytics

Data Visualization and Visual Analytics: Seeing the World of Data

Sisense

JUNE 30, 2020

Data is usually visualized in a pictorial or graphical form such as charts, graphs, lists, maps, and comprehensive dashboards that combine these multiple formats. Data visualization is used to make consuming, interpreting, and understanding data as simple as possible, and to make it easier to derive insights from data.

Visualization

Visualization Analytics Dashboards Data-driven

Get started with Amazon DynamoDB zero-ETL integration with Amazon Redshift

AWS Big Data

OCTOBER 17, 2024

You can then run enhanced analysis on this DynamoDB data with the rich capabilities of Amazon Redshift, such as high-performance SQL, built-in machine learning (ML) and Spark integrations, materialized views (MV) with automatic and incremental refresh, data sharing, and the ability to join data across multiple data stores and data lakes.

Metrics

Metrics Dashboards Data Warehouse Statistics
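Incremental refresh means applying only the changes since the last refresh, rather than recomputing the whole view. The toy sketch below maintains a SUM/COUNT view grouped by customer. It assumes changes arrive as insert/delete events, with an update modeled as a delete plus an insert. `IncrementalSumView` is hypothetical, and this is not how Amazon Redshift implements materialized view refresh.

```python
from __future__ import annotations

from collections import defaultdict


class IncrementalSumView:
    """Toy materialized view: SUM(amount) and COUNT(*) per customer.

    refresh() folds in only the changes since the last refresh instead of
    rescanning the base table. Hypothetical; not Amazon Redshift internals.
    """

    def __init__(self) -> None:
        self.totals: defaultdict[str, float] = defaultdict(float)
        self.counts: defaultdict[str, int] = defaultdict(int)

    def refresh(self, changes: list[tuple[str, str, float]]) -> None:
        # Each change is (op, customer, amount); an update is a delete plus an insert.
        for op, customer, amount in changes:
            if op not in ("insert", "delete"):
                raise ValueError(f"unsupported operation: {op}")
            sign = 1 if op == "insert" else -1
            self.totals[customer] += sign * amount
            self.counts[customer] += sign
            if self.counts[customer] == 0:
                # The group no longer has rows, so it disappears from the view.
                del self.totals[customer], self.counts[customer]


if __name__ == "__main__":
    view = IncrementalSumView()
    view.refresh([("insert", "alice", 10.0), ("insert", "bob", 5.0), ("insert", "alice", 2.5)])
    view.refresh([("delete", "bob", 5.0)])
    print(dict(view.totals))  # {'alice': 12.5}
    print(dict(view.counts))  # {'alice': 2}
```

SUM and COUNT work well here because both can be adjusted by adding or subtracting a delta. Aggregates such as MIN or MAX generally cannot be maintained this cheaply under deletes.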

Materialized Views in Hive for Iceberg Table Format

Cloudera

FEBRUARY 8, 2024

Cloudera Data Warehouse (CDW) running Hive has previously supported creating materialized views against Hive ACID source tables. release and the matching CDW Private Cloud Data Services release, Hive also supports creating, using, and rebuilding materialized views for Iceberg table format.

Snapshot

Snapshot Metadata Cost-Benefit Data Warehouse

How AWS helped Altron Group accelerate their vision for optimized customer engagement

AWS Big Data

JULY 13, 2023

To verify the data quality of the sources through statistically relevant metrics, AWS Glue Data Quality runs data quality tasks on relevant AWS Glue tables. Foundations for a data lake with data governance controls and data quality checks.

Optimization

Optimization B2B Data Quality Sales

The Data Scientist’s Guide to the Data Catalog

Alation

JULY 19, 2022

In this way, a data scientist benefits from business knowledge that they might not otherwise have access to. The catalog facilitates the synergy of the domain experts’ subject matter expertise with the data scientists’ statistical and coding expertise. Modern data catalogs surface a wide range of data asset types.

Metadata

Metadata Data Quality Statistics Data Science

Visualize data quality scores and metrics generated by AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

The purpose of this step is to understand our data quality statistics at the table level as well as at the ruleset level. Use the queries in this section to analyze your data quality metrics and create an Athena view to use to build a QuickSight dashboard in the next step.

Data Quality

Data Quality Metrics Visualization Dashboards
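The sketch below rolls rule outcomes up to the table level and to the ruleset level. It assumes a score defined as the percentage of rules that passed. The row fields (`table`, `ruleset`, `rule`, `outcome`) and the sample rule names are hypothetical, not the exact schema AWS Glue Data Quality emits.

```python
from __future__ import annotations

from collections import defaultdict


def quality_scores(results: list[dict], level: str = "table") -> dict:
    """Percentage of passed rules per table, or per (table, ruleset) pair."""
    if level not in ("table", "ruleset"):
        raise ValueError("level must be 'table' or 'ruleset'")
    passed: defaultdict = defaultdict(int)
    total: defaultdict = defaultdict(int)
    for row in results:
        key = row["table"] if level == "table" else (row["table"], row["ruleset"])
        total[key] += 1
        passed[key] += row["outcome"] == "Passed"  # a bool adds 0 or 1
    return {key: round(100 * passed[key] / total[key], 1) for key in total}


if __name__ == "__main__":
    rows = [
        {"table": "orders", "ruleset": "completeness", "rule": "order_id_not_null", "outcome": "Passed"},
        {"table": "orders", "ruleset": "completeness", "rule": "customer_id_not_null", "outcome": "Failed"},
        {"table": "orders", "ruleset": "validity", "rule": "quantity_positive", "outcome": "Passed"},
        {"table": "customers", "ruleset": "completeness", "rule": "email_not_null", "outcome": "Passed"},
    ]
    print(quality_scores(rows))  # {'orders': 66.7, 'customers': 100.0}
    print(quality_scores(rows, "ruleset"))
    # {('orders', 'completeness'): 50.0, ('orders', 'validity'): 100.0, ('customers', 'completeness'): 100.0}
```

The same rule-level rows feed both views. This is the kind of shape that could back a dashboard with a table-level drill-down into individual rulesets.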

New Thinking, Old Thinking and a Fairytale

Peter James Thomas

JUNE 20, 2019

Of course it can be argued that you can use statistics (and Google Trends in particular) to prove anything [1] , but I found the above figures striking. An obvious parallel in my world is to consider another business activity that reached peak popularity in the 2000s, Data Warehouse programmes [4]. Source: Google Trends.

Cost-Benefit

Cost-Benefit Data Warehouse Data Science Consulting

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

erwin

JANUARY 11, 2019

Consider the problematic issue of manually mapping source system fields (typically source files or database tables) to target system fields (such as different tables in target data warehouses or data marts). Creating a High-Quality Data Pipeline.

Data Governance

Data Governance Risk Metadata Management
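Instead of mapping fields by hand for each load, the source-to-target mapping can be kept as data and applied uniformly. The minimal sketch below uses a hypothetical legacy customer schema (`CUST_NO`, `CUST_NAME`, `CTRY`) mapped to target warehouse columns. The field names and conversion functions are illustrative assumptions.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical mapping: target column -> (source field, conversion function).
MAPPING: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "customer_id": ("CUST_NO", int),
    "full_name": ("CUST_NAME", str.strip),
    "country_code": ("CTRY", str.upper),
}


def map_record(source: dict[str, Any],
               mapping: dict[str, tuple[str, Callable[[Any], Any]]] = MAPPING) -> dict[str, Any]:
    """Translate one source record into the target schema."""
    missing = [src for src, _ in mapping.values() if src not in source]
    if missing:
        # Fail loudly rather than silently loading incomplete rows.
        raise KeyError(f"source record lacks mapped fields: {missing}")
    return {target: convert(source[src]) for target, (src, convert) in mapping.items()}


if __name__ == "__main__":
    print(map_record({"CUST_NO": "42", "CUST_NAME": "  Ada Lovelace ", "CTRY": "gb"}))
    # {'customer_id': 42, 'full_name': 'Ada Lovelace', 'country_code': 'GB'}
```

Keeping the mapping in one declarative structure also makes it reviewable and versionable, which is where governance and lineage tooling can attach.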

Accelerating model velocity through Snowflake Java UDF integration

Domino Data Lab

JUNE 15, 2021

At a certain point, as the demand keeps growing, the data volumes rapidly increase. Data is no longer stored in CSV files, but in a dedicated, purpose-built data lake / data warehouse. The challenges surface once the company hits the scalability wall.

Modeling

Modeling Data Science Data-driven Data Warehouse

Business Intelligence Dashboard (BI Dashboard): Best Practices and Examples

FineReport

APRIL 11, 2023

Additionally, they provide tabs, pull-down menus, and other navigation features to assist in accessing data. Data Visualizations: Dashboards are configured with a variety of data visualizations such as line and bar charts, bubble charts, heat maps, and scatter plots to show different performance metrics and statistics.

Dashboards

Dashboards Business Intelligence Metrics Cost-Benefit

How Data Governance Supports Analytics

Alation

JANUARY 27, 2022

What Are the Top Data Challenges to Analytics? The proliferation of data sources means there is an increase in data volume that must be analyzed. Large volumes of data have led to the development of data lakes, data warehouses, and data management systems.

Data Governance

Data Governance Analytics Cost-Benefit Data-driven

The Gartner 2021 Leadership Vision for Data & Analytics Leaders Webinar Q&A

Andrew White

JANUARY 11, 2021

Data and Analytics Governance: What’s Broken, and What We Need To Do To Fix It. Link Data to Business Outcomes. Will the data warehouse, as a software tool, play a role in the future of data & analytics strategy? Data lakes don’t offer this, nor should they. E.g. Data Lakes in Azure – as SaaS.

Data Analytics

Data Analytics Analytics Data-driven Finance

Convergent Evolution

Peter James Thomas

AUGUST 18, 2018

That was the Science, here comes the Technology… A Brief Hydrology of Data Lakes. Even back then, these were used for activities such as Analytics, Dashboards, Statistical Modelling, Data Mining and Advanced Visualisation. This is the essence of Convergent Evolution. In Closing.

Data Lake

Data Lake Data Warehouse Data mining Statistics

Interview with Dominic Sartorio, Senior Vice President for Products & Development, Protegrity

Corinium

APRIL 25, 2019

And it’s become a hyper-competitive business, so enhancing customer service through data is critical for maintaining customer loyalty. For example, auto insurance companies offer to capture real-time driving statistics from policy-holders’ cars to encourage and reward safe driving. In data-driven organizations, data is flowing.

Insurance

Insurance Risk IoT Data-driven

Breaking down Business Intelligence

BizAcuity

MAY 16, 2022

He went on to be the head brewer of Guinness, and we thank him not just for great hand-crafted beers but also for subsequent breakthroughs in statistical research. Data allowed Guinness to hold its market dominance for a long time. Data mining. That was in the 1900s.

Business Intelligence

Business Intelligence Data mining Visualization Data Lake

Your Data Architecture Holds the Key to Unlocking AI’s Full Potential

CIO Business Intelligence

APRIL 4, 2023

Let’s look at the data architecture journey to understand why and how data lakehouses help to solve complexity, value and security. Traditionally, data warehouses have stored curated, structured data to support analytics and business intelligence, with fast, easy access to data. Want to learn more?

Data Architecture

Data Architecture Data Lake Data Warehouse Cost-Benefit

What is a Data Pipeline?

Jet Global

MAY 9, 2024

The key components of a data pipeline are typically: Data Sources: The origin of the data, such as a relational database, data warehouse, data lake, file, API, or other data store. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.

Data Lake

Data Lake Data Warehouse Business Intelligence Machine Learning
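Several of the transformation tasks listed in the excerpt (cleansing, filtering, aggregation, standardization) can be sketched in a few lines of Python. The input rows, the `region`/`amount` fields, and the rule of dropping non-positive amounts are hypothetical choices for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def standardize(row: dict) -> dict:
    # Standardization: consistent casing and numeric types across sources.
    return {"region": row["region"].strip().upper(), "amount": float(row["amount"])}


def clean_and_aggregate(rows: list[dict]) -> dict[str, float]:
    totals: defaultdict[str, float] = defaultdict(float)
    for row in rows:
        # Cleansing: skip rows missing a required value.
        if not row.get("region") or row.get("amount") in (None, ""):
            continue
        rec = standardize(row)
        # Filtering: keep only positive amounts (e.g., exclude refunds from this report).
        if rec["amount"] <= 0:
            continue
        # Aggregation: total amount per region.
        totals[rec["region"]] += rec["amount"]
    return dict(totals)


if __name__ == "__main__":
    rows = [
        {"region": " emea", "amount": "100"},
        {"region": "EMEA", "amount": "50.5"},
        {"region": "apac", "amount": "-20"},
        {"region": "", "amount": "10"},
        {"region": "amer", "amount": None},
    ]
    print(clean_and_aggregate(rows))  # {'EMEA': 150.5}
```

Standardizing before aggregating matters here: without it, " emea" and "EMEA" would be counted as two different regions.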
