The need for streamlined data transformations: As organizations increasingly adopt cloud-based data lakes and warehouses, the demand for efficient data transformation tools has grown. Using Athena and the dbt adapter, you can transform raw data in Amazon S3 into well-structured tables suitable for analytics.
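As a rough illustration of that workflow, the sketch below invokes a dbt project configured with an Athena adapter (such as dbt-athena) through dbt's programmatic API (dbt-core 1.5+). The project directory, profiles location, and model selector are assumptions, not details from the post.

```python
# Minimal sketch (not from the post): run dbt models against Athena via dbt's
# programmatic API (dbt-core >= 1.5), with profiles.yml set up for an Athena adapter.
# The project directory, profiles location, and model selector are assumptions.
from dbt.cli.main import dbtRunner

runner = dbtRunner()

# Equivalent to: dbt run --select staging+
result = runner.invoke(
    [
        "run",
        "--select", "staging+",           # hypothetical models that build curated tables from raw S3 data
        "--project-dir", "./retail_dbt",  # hypothetical dbt project
        "--profiles-dir", ".",            # profiles.yml configured for the Athena adapter
    ]
)

if not result.success:
    raise RuntimeError("dbt run failed", result.exception)
```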
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
cycle_end"', "sagemakedatalakeenvironment_sub_db", ctas_approach=False) A similar approach is used to connect to shared data from Amazon Redshift, which is also shared using Amazon DataZone. The data science and AI teams are able to explore and use new data sources as they become available through Amazon DataZone.
The core issue plaguing many organizations is the presence of out-of-control databases or data lakes characterized by unrestrained data changes: numerous users and tools incessantly alter data, leading to a tumultuous environment.
In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
“Digitizing was our first stake at the table in our data journey,” he says. That step, primarily undertaken by developers and data architects, established data governance and data integration.
In this post, we delve into a retail case study, exploring how the Data Build Tool (dbt) was used effectively within an AWS environment to build a high-performing, efficient, and modern data platform. dbt does this by helping teams handle the T in ETL (extract, transform, and load) processes. In this environment, the dbt executable is installed at usr/local/airflow/.local/bin/dbt.
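A minimal sketch of how such a dbt project might be orchestrated from Apache Airflow / Amazon MWAA follows, assuming the dbt executable sits at the path mentioned above; the DAG id, schedule, and project directory are illustrative assumptions.

```python
# Sketch: trigger dbt from an Airflow DAG using the dbt binary installed in the
# MWAA environment. DAG id, schedule, and project directory are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

DBT_BIN = "/usr/local/airflow/.local/bin/dbt"  # assumed absolute form of the path in the excerpt

with DAG(
    dag_id="retail_dbt_transformations",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Run the dbt project after upstream ingestion tasks complete.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=f"{DBT_BIN} run --project-dir /usr/local/airflow/dags/dbt_project",
    )
```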
However, you might face significant challenges when planning for a large-scale data warehouse migration. Data engineers are crucial for schema conversion and data transformation, and DBAs can handle cluster configuration and workload monitoring. Platform architects define a well-architected platform.
As the volume and complexity of analytics workloads continue to grow, customers are looking for more efficient and cost-effective ways to ingest and analyse data. AWS Glue provides both visual and code-based interfaces to make data integration effortless. Select the secret you created, and on the Actions menu, choose Delete.
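For the code-based side of that interface, a Glue job script typically looks something like the sketch below, which reads a Data Catalog table and writes it back to S3 as Parquet. The database, table, and bucket names are placeholders, and the awsglue module is only available inside the Glue job runtime.

```python
# Sketch of a Glue PySpark job script (runs inside the AWS Glue job environment).
# Database, table, and output path are placeholders, not from the source.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a source table registered in the Glue Data Catalog (placeholder names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="events"
)

# Write curated output to S3 in Parquet format (placeholder path).
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/events/"},
    format="parquet",
)

job.commit()
```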
It’s common to ingest multiple data sources into Amazon Redshift to perform analytics. Often, each data source will have its own processes of creating and maintaining data, which can lead to data quality challenges within and across sources. Answering questions as simple as “How many unique customers do we have?”
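A question like that is often checked with a simple profiling query after ingestion. The sketch below uses the redshift_connector driver to compare raw row counts with distinct customer identifiers per source; the connection details, schema, table, and column names are all assumptions.

```python
# Hedged sketch of a basic data quality probe after loading several sources into
# Amazon Redshift. All names and connection details are assumptions.
import redshift_connector

conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    database="analytics",
    user="etl_user",
    password="example-only",  # in practice, retrieve this from AWS Secrets Manager
)

cursor = conn.cursor()
cursor.execute(
    """
    SELECT source_system,
           COUNT(*)                    AS row_count,
           COUNT(DISTINCT customer_id) AS unique_customers
    FROM staging.customers_all_sources
    GROUP BY source_system;
    """
)

for source_system, row_count, unique_customers in cursor.fetchall():
    # A large gap between row_count and unique_customers hints at duplicates in that source.
    print(source_system, row_count, unique_customers)

cursor.close()
conn.close()
```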
Additionally, the scale is significant because the multi-tenant data sources provide a continuous stream of testing activity, and our users require quick data refreshes as well as historical context for up to a decade due to compliance and regulatory demands. Finally, data integrity is of paramount importance.
Organizations have spent a lot of time and money trying to harmonize data across diverse platforms, including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations, and so on. So questions linger about whether transformed data can be trusted.
Observability in DataOps refers to the ability to monitor and understand the performance and behavior of data-related systems and processes, and to use that information to improve the quality and speed of data-driven decision making.
As the latest iteration in this pursuit of high-quality data sharing, DataOps combines a range of disciplines. It synthesizes all we’ve learned about agile, data quality, and ETL/ELT. This produces end-to-end lineage so business and technology users alike can understand the state of a data lake and/or lakehouse.
Businesses face significant hurdles when preparing data for artificial intelligence (AI) applications. The existence of data silos and duplication, alongside apprehensions regarding data quality, presents a multifaceted environment for organizations to manage.
Today, the brightest minds in our industry are targeting the massive proliferation of data volumes and the accompanying but hard-to-find value locked within all that data. But there are only so many data engineers available in the market today; there’s a big skills shortage. Let’s take data privacy as an example.
Domain ownership recognizes that the teams generating the data have the deepest understanding of it and are therefore best suited to manage, govern, and share it effectively. This principle makes sure data accountability remains close to the source, fostering higher data quality and relevance.
This is especially beneficial when teams need to increase data product velocity with trust and data quality, reduce communication costs, and help data solutions align with business objectives. In most enterprises, data is needed and produced by many business units but owned and trusted by no one.
If your team has easy-to-use tools and features, you are much more likely to see the user adoption you want and to improve data literacy and data democratization across the organization. Machine learning capabilities can determine the best techniques and the best-fit transformations for the data, so the outcome is clear and concise.
“If each tool tells a different story because it has different data, we won’t have alignment within the business on what this data means.” The company also used the opportunity to reimagine its data pipeline and architecture.
The key components of a data pipeline are typically: Data Sources: The origin of the data, such as a relational database, data warehouse, data lake, file, API, or other data store. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.
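To make those stages concrete, here is a small illustrative sketch in pandas covering ingestion, cleansing, filtering, aggregation, and load; the file paths, column names, and rules are assumptions rather than anything from the original post.

```python
# Illustrative pipeline stages in pandas. All paths, columns, and rules are assumed.
import pandas as pd

# Ingestion: pull raw records from a source (a CSV file standing in for an API or database).
raw = pd.read_csv("orders_raw.csv")

# Cleansing: drop records missing a key field and standardize values.
clean = raw.dropna(subset=["order_id"])
clean["country"] = clean["country"].str.strip().str.upper()

# Filtering: keep only completed orders.
completed = clean[clean["status"] == "completed"]

# Aggregation: summarize revenue per country for the downstream data store.
summary = completed.groupby("country", as_index=False)["amount"].sum()

# Load: write the standardized output to the destination (a Parquet file here).
summary.to_parquet("orders_by_country.parquet", index=False)
```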
The quick and dirty definition of data mapping is the process of connecting different types of data from various data sources. Data mapping is a crucial step in data modeling and can help organizations achieve their business goals by enabling data integration, migration, transformation, and quality.
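As a toy illustration of that idea, the sketch below declares a source-to-target field mapping and applies it during a migration step; every field name here is a made-up example.

```python
# Toy data mapping example: rename source fields to the target schema and coerce types.
# All field names are illustrative assumptions.
import pandas as pd

# Source-to-target field mapping for a customer record.
FIELD_MAP = {
    "cust_no": "customer_id",
    "fname": "first_name",
    "lname": "last_name",
    "dob": "date_of_birth",
}

source = pd.DataFrame(
    [{"cust_no": 101, "fname": "Ana", "lname": "Silva", "dob": "1990-04-02"}]
)

# Apply the mapping and coerce types expected by the target schema.
target = source.rename(columns=FIELD_MAP)
target["date_of_birth"] = pd.to_datetime(target["date_of_birth"])
print(target)
```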