Data Lake, Data Quality, Data Warehouse and Modeling

AWS Glue Data Quality is Generally Available

AWS Big Data

JUNE 6, 2023

We are excited to announce the General Availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning.

Data Quality

Data Quality Statistics Data Lake Visualization

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Poor data quality is one of the top barriers faced by organizations aspiring to be more data-driven. Ill-timed business decisions and misinformed business processes, missed revenue opportunities, failed business initiatives and complex data systems can all stem from data quality issues.

Data Quality

Data Quality Data Architecture Strategy Data Lake

Set up advanced rules to validate quality of multiple datasets with AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

Poor-quality data can lead to incorrect insights, bad decisions, and lost opportunities. AWS Glue Data Quality measures and monitors the quality of your dataset. It supports both data quality at rest and data quality in AWS Glue extract, transform, and load (ETL) pipelines.

Data Quality

Data Quality Data Lake Visualization Data-driven
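
As a rough illustration of what measuring dataset quality with rules can look like, here is a minimal Python sketch. It does not use AWS Glue Data Quality or its rule language. The helper names (`is_complete`, `is_unique`, `evaluate`) and the sample `orders` rows are assumptions made up for this example.

```python
from typing import Callable

Row = dict[str, object]
Rule = Callable[[list[Row]], bool]


def is_complete(column: str) -> Rule:
    """Hypothetical rule: passes when no row has a missing (None) value in `column`."""
    return lambda rows: all(row.get(column) is not None for row in rows)


def is_unique(column: str) -> Rule:
    """Hypothetical rule: passes when every value in `column` appears exactly once."""
    def check(rows: list[Row]) -> bool:
        values = [row.get(column) for row in rows]
        return len(values) == len(set(values))
    return check


def evaluate(rows: list[Row], rules: dict[str, Rule]) -> dict[str, bool]:
    """Run every named rule against the dataset and report pass/fail per rule."""
    return {name: rule(rows) for name, rule in rules.items()}


# Made-up sample data: one missing amount and one repeated order_id.
orders = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": 5.5},
]

results = evaluate(orders, {
    "order_id is complete": is_complete("order_id"),
    "amount is complete": is_complete("amount"),
    "order_id is unique": is_unique("order_id"),
})

# A simple quality score: the fraction of rules that passed.
score = sum(results.values()) / len(results)
print(results, round(score, 2))
```

Running the same rules on a stored table or inside a pipeline step would correspond loosely to the "at rest" and "in ETL pipelines" modes the teaser mentions.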

Data Lakes: What Are They and Who Needs Them?

Jet Global

JULY 2, 2019

The sheer scale of data being captured by the modern enterprise has necessitated a monumental shift in how that data is stored. From the humble database through to data warehouses, data stores have grown both in scale and complexity to keep pace with the businesses they serve, and the data analysis now required to remain competitive.

Data Lake

Data Lake Data Warehouse Big Data Machine Learning

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. Data governance is a critical building block across all these approaches, and we see two emerging areas of focus.

Data Governance

Data Governance Unstructured Data Metadata Data Lake

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt. Second-generation – gigantic, complex data lake maintained by a specialized team drowning in technical debt.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

How Knowledge Graphs Power Data Mesh and Data Fabric

Ontotext

APRIL 10, 2024

Bad data tax is rampant in most organizations. Currently, every organization is blindly chasing the GenAI race, often forgetting that data quality and semantics are among the fundamentals of achieving AI success. Sadly, data quality is losing to data quantity, resulting in “Infobesity”.

Metadata

Metadata Data Lake Data Warehouse Data Quality

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

AWS Big Data

AUGUST 15, 2024

Unlocking the true value of data often gets impeded by siloed information. Traditional data management—wherein each business unit ingests raw data in separate data lakes or warehouses—hinders visibility and cross-functional analysis. Business units access clean, standardized data.

Data Lake

Data Lake Data Warehouse Data Governance Publishing

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

A data hub contains data at multiple levels of granularity and is often not integrated. It differs from a data lake by offering data that is pre-validated and standardized, allowing for simpler consumption by users. Data hubs and data lakes can coexist in an organization, complementing each other.

Analytics

Analytics Data Warehouse Data Lake Metadata

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

Data governance is increasingly top-of-mind for customers as they recognize data as one of their most important assets. Effective data governance enables better decision-making by improving data quality, reducing data management costs, and ensuring secure access to data for stakeholders.

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

Straumann Group is transforming dentistry with data, AI

CIO Business Intelligence

FEBRUARY 16, 2023

Hence the drive to provide ML as a service to the Data & Tech team’s internal customers. “All they would have to do is just build their model and run with it,” he says. That step, primarily undertaken by developers and data architects, established data governance and data integration. The offensive side?

Unstructured Data

Unstructured Data Data Lake Prescriptive Analytics Data Warehouse

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

NOVEMBER 9, 2023

The aim was to bolster their analytical capabilities and improve data accessibility while ensuring a quick time to market and high data quality, all with low total cost of ownership (TCO) and no need for additional tools or licenses. AWS Glue – AWS Glue is used to load files into Amazon Redshift through the S3 data lake.

Data Warehouse

Data Warehouse Testing Data Quality Reporting

Putting the Business Back Into Business Innovation

Timo Elliott

DECEMBER 14, 2022

Most innovation platforms make you rip the data out of your existing applications and move it to another environment (a data warehouse, data lake, data lakehouse, or data cloud) before you can do any innovation. Business Context. Business Content.

Data Lake

Data Lake Recreation/Entertainment Data Warehouse Metadata

Accelerate analytics on Amazon OpenSearch Service with AWS Glue through its native connector

AWS Big Data

DECEMBER 21, 2023

As the volume and complexity of analytics workloads continue to grow, customers are looking for more efficient and cost-effective ways to ingest and analyse data. AWS Glue provides both visual and code-based interfaces to make data integration effortless.

Analytics

Analytics IT Data Lake Visualization

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance. It uses metadata and data management tools to organize all data assets within your organization. After all, Alex may not be aware of all the data available to her.

Metadata

Metadata Data Quality Data-driven Data Governance

Case study: Policy Enforcement Automation With Semantics

Ontotext

MAY 2, 2024

Storage-centric approach: In the storage-centric approach, people try to address data silos by throwing everything into a data lake or a data warehouse. But although this helps somewhat in terms of architecture, these data lakes soon become unwieldy.

Metadata

Metadata Data Lake Data-driven Enterprise

Convergent Evolution

Peter James Thomas

AUGUST 18, 2018

That was the Science, here comes the Technology… A Brief Hydrology of Data Lakes. Even back then, these were used for activities such as Analytics, Dashboards, Statistical Modelling, Data Mining and Advanced Visualisation. This is the essence of Convergent Evolution. In Closing.

Data Lake

Data Lake Data Warehouse Data mining Statistics

What is Data Mesh?

Ontotext

NOVEMBER 16, 2023

Data mesh solves this by promoting data autonomy, allowing users to make decisions about domains without a centralized gatekeeper. It also improves development velocity with better data governance and access with improved data quality aligned with business needs.

Metadata

Metadata Data-driven Data Quality Data Architecture

How SumUp made digital analytics more accessible using AWS Glue

AWS Big Data

JUNE 6, 2023

Unless, of course, the rest of their data also resides in the Google Cloud. In this post we showcase how we used AWS Glue to move siloed digital analytics data, with inconsistent arrival times, to AWS S3 (our Data Lake) and our central data warehouse (DWH), Snowflake.

Analytics

Analytics Data Lake Testing Optimization

Better, faster decisions: Why businesses thrive on real-time data

CIO Business Intelligence

SEPTEMBER 8, 2022

In Foundry’s 2022 Data & Analytics Study, 88% of IT decision-makers agree that data collection and analysis have the potential to fundamentally change their business models over the next three years. The ability to pivot quickly to address rapidly changing customer or market demands is driving the need for real-time data.

Cost-Benefit

Cost-Benefit Internet of Things Data-driven Data Lake

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

erwin

JANUARY 11, 2019

It’s only when companies take their first stab at manually cataloging and documenting operational systems, processes and the associated data, both at rest and in motion, that they realize how time-consuming the entire data prepping and mapping effort is, and why that work is sure to be compounded by human error and data quality issues.

Data Governance

Data Governance Risk Metadata Management

Top Graph Use Cases and Enterprise Applications (with Real World Examples)

Ontotext

MARCH 8, 2023

With the size of data and dropping attention spans of online users, digital personalization has become one of the top priorities for companies’ business models. As such, most large financial organizations have moved their data to a data lake or a data warehouse to understand and manage financial risk in one place.

Enterprise

Enterprise Knowledge Discovery Risk Machine Learning

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

Large-scale data warehouse migration to the cloud is a complex and challenging endeavor that many organizations undertake to modernize their data infrastructure, enhance data management capabilities, and unlock new business opportunities. This makes sure the new data platform can meet current and future business goals.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

How BMO improved data security with Amazon Redshift and AWS Lake Formation

AWS Big Data

MARCH 1, 2024

One of the bank’s key challenges related to strict cybersecurity requirements is to implement field level encryption for personally identifiable information (PII), Payment Card Industry (PCI), and data that is classified as high privacy risk (HPR). Only users with required permissions are allowed to access data in clear text.

Data Lake

Data Lake Data Warehouse Risk Management
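
The access rule in the teaser (only users with the required permissions see clear text) can be sketched in a few lines of Python. This is a simplified masking example, not field-level encryption, and it is not how Amazon Redshift or AWS Lake Formation implement the control. The field names, the `read_clear_text` permission, and the `view_record` helper are all hypothetical.

```python
# Hypothetical set of fields treated as sensitive (PII / PCI / HPR in the teaser's terms).
SENSITIVE_FIELDS = {"card_number", "national_id"}

MASK = "****"


def view_record(record: dict[str, str], permissions: set[str]) -> dict[str, str]:
    """Return a copy of `record` with sensitive fields masked unless the
    caller holds the (hypothetical) `read_clear_text` permission."""
    can_read_clear = "read_clear_text" in permissions
    return {
        field: value if (field not in SENSITIVE_FIELDS or can_read_clear) else MASK
        for field, value in record.items()
    }


# Made-up customer record for illustration only.
customer = {"name": "A. Customer", "card_number": "4111111111111111", "national_id": "123-45-678"}

analyst_view = view_record(customer, permissions={"read_reports"})
auditor_view = view_record(customer, permissions={"read_reports", "read_clear_text"})

print(analyst_view)
print(auditor_view)
```

In a real deployment the sensitive values would be encrypted at rest and decrypted only for authorized callers; masking at read time is used here only to keep the sketch self-contained.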

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. Amazon DataZone allows you to simply and securely govern end-to-end data assets stored in your Amazon Redshift data warehouses or data lakes cataloged with the AWS Glue Data Catalog.

Metadata

Metadata Data Lake Data Processing Data-driven

Building a Beautiful Data Lakehouse

CIO Business Intelligence

MARCH 9, 2022

But the data repository options that have been around for a while tend to fall short in their ability to serve as the foundation for big data analytics powered by AI. Traditional data warehouses, for example, support datasets from multiple sources but require a consistent data structure. Meet the data lakehouse.

Data Lake

Data Lake Unstructured Data Data Warehouse Data Quality

Data Management Predictions for 2024: Five Trends

Data Virtualization

MARCH 7, 2024

As we move deeper into 2024, it is imperative for data management leaders to look in their rear-view mirrors to assess and, if needed, refine their data management strategies. One thing is clear; if data-centric organizations want to succeed in.

Management

Management Data Integration Strategy Data Lake

Data Management Predictions for 2024: Five Trends

Data Virtualization

JANUARY 25, 2024

As we head into 2024, it is imperative for data management leaders to look in their rear-view mirrors to assess and, if needed, refine their data management strategies.

Management

Management Data Integration Strategy Data Lake

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

FEBRUARY 1, 2023

With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. The response times for these data sources are critical to our key stakeholders.

Optimization

Optimization Forecasting Data Lake Metadata

Configure end-to-end data pipelines with Etleap, Amazon Redshift, and dbt

AWS Big Data

JULY 12, 2023

With a wide array of data sources, including transactional databases, log files, and event streams, you need a simple-to-use solution capable of efficiently ingesting and transforming large volumes of data in real time, ensuring data cleanliness, structural integrity, and data team collaboration.

Data Warehouse

Data Warehouse Modeling Dashboards Data Lake

Modeling, Modernization and Automation

BI-Survey

APRIL 27, 2023

While most continue to struggle with data quality issues and cumbersome manual processes, best-in-class companies are making improvements with commercial automation tools. The data vault has strong adherents among best-in-class companies, even though its usage lags the alternative approaches of third-normal-form and star schema.

Modeling

Modeling Data Warehouse Data Quality Business Driver

Building Robust Data Pipelines: 9 Fundamentals and Best Practices to Follow

Alation

MAY 16, 2023

Machine Learning: Data pipelines feed all the necessary data into machine learning algorithms, thereby making this branch of Artificial Intelligence (AI) possible. Data Quality: When using a data pipeline, data consistency, quality, and reliability are often greatly improved.

Data Lake

Data Lake Data Governance Data Warehouse Data Processing

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of cross-functional governance structure for customer data.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Accelerating model velocity through Snowflake Java UDF integration

Domino Data Lab

JUNE 15, 2021

Over the next decade, the companies that will beat competitors will be “model-driven” businesses. These companies often undertake large data science efforts in order to shift from “data-driven” to “model-driven” operations, and to provide model-underpinned insights to the business.

Modeling

Modeling Data Science Data-driven Data Warehouse

How AWS helped Altron Group accelerate their vision for optimized customer engagement

AWS Big Data

JULY 13, 2023

To reflect the needs of their customers spread across different geographies and industries, Altron has organized their operating model across individual Operating Companies (OpCos). Data quality for account and customer data – Altron wanted to enable data quality and data governance best practices.

Optimization

Optimization B2B Data Quality Sales

Data Mesh 101: What it is and Why You Should Care

Ontotext

FEBRUARY 12, 2024

It proposes a technological, architectural, and organizational approach to solving data management problems by breaking up the monolithic data platform and de-centralizing data management across different domain teams and services. Some examples of data products are data sets, tables, machine learning models, and APIs.

IT

IT Metadata Data Quality Data Lake

Data Strategies for Getting Greater Business Value from Distributed Data

Data Virtualization

MAY 19, 2023

Data Strategy

Data Strategy Strategy Data Integration Management

What is Business Intelligence Consulting

BizAcuity

APRIL 1, 2023

Thanks to recent technological innovations and the circumstances driving their rapid adoption, having a data warehouse has become quite common in enterprises across sectors. This also applies to businesses that may not have a data warehouse and operate with the help of a backend database system.

Business Intelligence

Business Intelligence Consulting KPI Data Warehouse

The Data Scientist’s Guide to the Data Catalog

Alation

JULY 19, 2022

Across the country, data scientists have an unemployment rate of 2% and command an average salary of nearly $100,000. As they attempt to put machine learning models into production, data science teams encounter many of the same hurdles that plagued data analytics teams in years past: Finding trusted, valuable data is time-consuming.

Metadata

Metadata Data Quality Statistics Data Science

What is Business Intelligence Consulting

BizAcuity

JANUARY 31, 2023

Thanks to recent technological innovations and the circumstances driving their rapid adoption, having a data warehouse has become quite common in enterprises across sectors. This also applies to businesses that may not have a data warehouse and operate with the help of a backend database system.

Business Intelligence

Business Intelligence Consulting KPI Data Warehouse

6 BI challenges IT teams must address

CIO Business Intelligence

DECEMBER 21, 2022

An IT-managed BI delivery model, Goris explains, requires a lot of effort and process, which wouldn’t work for some parts of the business. Lionel LLC, for instance, the American designer and importer of toy trains and model railroads based in Concord, N.C., What Gartner is writing about is the concept of a data fabric.”

IT

IT Business Intelligence Sales Key Performance Indicator

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

Since it's uniquely metadata-driven, the abstraction layer of a data fabric makes it easier to model, integrate and query any data sources, build data pipelines, and integrate data in real time. Data fabric vs data mesh: How does a data fabric relate to a data mesh?

Management

Management Metadata Data Architecture Data Lake
