Data Lake, Data Processing and Data Quality

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake, Analytics, Snapshot, Data Quality

Set up advanced rules to validate quality of multiple datasets with AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

Poor-quality data can lead to incorrect insights, bad decisions, and lost opportunities. AWS Glue Data Quality measures and monitors the quality of your dataset. It supports both data quality at rest and data quality in AWS Glue extract, transform, and load (ETL) pipelines.

Data Quality, Data Lake, Visualization, Data-driven

Webinars

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

[...] cycle_end"', "sagemakedatalakeenvironment_sub_db", ctas_approach=False)

A similar approach is used to connect to shared data from Amazon Redshift, which is also shared using Amazon DataZone. The data science and AI teams are able to explore and use new data sources as they become available through Amazon DataZone.

IoT, Machine Learning, Metadata, Data-driven

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

Domain ownership recognizes that the teams generating the data have the deepest understanding of it and are therefore best suited to manage, govern, and share it effectively. This principle makes sure data accountability remains close to the source, fostering higher data quality and relevance.

Metadata, Data Governance, Data Quality, Data-driven

The essential check list for effective data democratization

CIO Business Intelligence

JANUARY 20, 2023

“All of a sudden, you’re trying to give this data to somebody who’s not a data person,” he says, “and it’s really easy for them to draw erroneous or misleading insights from that data.” As more companies use the cloud and cloud-native development, normalizing data has become more complicated.

Data Lake, Data-driven, Finance, Data Architecture

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

AWS Big Data

FEBRUARY 27, 2024

Each data producer within the organization has its own data lake in Apache Hudi format, ensuring data sovereignty and autonomy. This enables data-driven decision-making across the organization. AWS services like AWS Lake Formation in conjunction with Atlan help govern data access and policies.

Data-driven, Data Lake, Data Quality, Data Governance

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users. BMS’s EDLS platform hosts over 5,000 jobs and is growing at 15% YoY (year over year). About the authors: Sivaprasad Mahamkali is a Senior Streaming Data Engineer at AWS Professional Services.

Metadata, Data Lake, Visualization, Data Quality

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

It also makes it easier for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization to discover, use, and collaborate to derive data-driven insights. Note that a managed data asset is an asset for which Amazon DataZone can manage permissions.

Metadata, Data Lake, Data Processing, Data-driven

Accomplish Agile Business Intelligence & Analytics For Your Business

datapine

APRIL 15, 2020

You need to determine whether you are going with an on-premises or cloud-hosted strategy. You will need to continually return to your business dashboard to make sure that it’s working, that the data is accurate, and that it’s still answering the right questions in the most effective way. Ensure the quality of production.

Business Intelligence, Analytics, Testing, Dashboards

Introducing the technology behind watsonx.ai, IBM’s AI and data platform for enterprise

IBM Big Data Hub

MAY 9, 2023

Over the past decade, deep learning arose from a seismic collision of data availability and sheer compute power, enabling a host of impressive AI capabilities. Data: the foundation of your foundation model. Data quality matters. When objectionable data is identified, we remove it, retrain the model, and repeat.

Enterprise, Technology, Modeling, Cost-Benefit

HEMA accelerates their data governance journey with Amazon DataZone

AWS Big Data

DECEMBER 19, 2024

Data has become an invaluable asset for businesses, offering critical insights to drive strategic decision-making and operational optimization. Each service is hosted in a dedicated AWS account and is built and maintained by a product owner and a development team, as illustrated in the following figure.

Data Governance, Publishing, Data-driven, Metadata

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

Migrating to Amazon Redshift offers organizations the potential for improved price-performance, enhanced data processing, faster query response times, and better integration with technologies such as machine learning (ML) and artificial intelligence (AI).

Data Warehouse, KPI, Optimization, Cost-Benefit

CIOs rise to the ESG reporting challenge

CIO Business Intelligence

JANUARY 30, 2024

“Always the gatekeepers of much of the data necessary for ESG reporting, CIOs are finding that companies are even more dependent on them,” says Nancy Mentesana, ESG executive director at Labrador US, a global communications firm focused on corporate disclosure documents. “There are several things you need to report attached to that number.”

Reporting, Data Quality, Strategy, Data-driven

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of cross-functional governance structure for customer data.

Data Strategy, Strategy, Data Warehouse, Prescriptive Analytics

How data literacy allows gen AI to drive productivity at Dow

CIO Business Intelligence

JULY 31, 2024

We also have a blended architecture of deep process capabilities in our SAP system and decision-making capabilities in our Microsoft tools, and a great base of information in our integrated data hub, or data lake, which is all Microsoft-based. That’s what we’re running our AI and our machine learning against.

Manufacturing, Cost-Benefit, Digital Transformation, Forecasting

Modern Data Architecture for Telecommunications

Cloudera

SEPTEMBER 6, 2022

Previously, there were three types of data structures in telco: entity data sets (i.e. marketing data lakes) [...] Optimization: Data lakehouse is the platform wherein the data assets reside. [...] Application-based datasets (i.e. billing or contact center support systems).

Data Architecture, Cost-Benefit, Digital Transformation, Business Driver

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise, Data Warehouse, Snapshot, Cost-Benefit

Deep Thoughts on Data Flow with Alation & Trifacta

Alation

FEBRUARY 20, 2020

Data lakes, while useful in helping you to capture all of your data, are only the first step in extracting the value of that data. Additionally, because of the collaborative features found in the Alation Data Catalog, you also gain the ability for data to be easily shared, used and reused.

Data Lake, Data Processing, Data Quality, Visualization

Why enterprise CIOs need to plan for Microsoft gen AI

CIO Business Intelligence

AUGUST 14, 2024

Start where your data is: Using your own enterprise data is the major differentiator from open access gen AI chat tools, so it makes sense to start with the provider already hosting your enterprise data. “It’s the contextual information supporting the use of these tools,” Curran says.

Enterprise, Cost-Benefit, Experimentation, Modeling

Data Governance for Dummies: Your Questions, Answered

Alation

FEBRUARY 17, 2023

This past week, I had the pleasure of hosting Data Governance for Dummies author Jonathan Reichental for a fireside chat, along with Denise Swanson, Data Governance lead at Alation. Can you differentiate between governance of raw data and enhanced data (information)? Where do you govern? Here’s an example.

Data Governance, Data Quality, Metadata, Cost-Benefit

What Is Alation Connected Sheets? Q&A with the Creators

Alation

NOVEMBER 28, 2022

It is also hard to know whether one can trust the data within a spreadsheet. And they rarely, if ever, host the most current data available. Sathish Raju, cofounder & CTO, Kloudio and senior director of engineering, Alation: This presents challenges for both business users and data teams.

Metadata, Enterprise, Cost-Benefit, Finance

The Gartner 2021 Leadership Vision for Data & Analytics Leaders Webinar Q&A

Andrew White

JANUARY 11, 2021

On January 4th I had the pleasure of hosting a webinar titled The Gartner 2021 Leadership Vision for Data & Analytics Leaders. It was aimed at the Chief Data Officer, or head of data and analytics. Will the data warehouse, as a software tool, play a role in the future of data and analytics strategy?

Data Analytics, Analytics, Data-driven, Finance

CIOs weigh where to place AI bets — and how to de-risk them

CIO Business Intelligence

MARCH 18, 2024

Though a multicloud environment, the agency has most of its cloud implementations hosted on Microsoft Azure, with some on AWS and some on ServiceNow’s 311 citizen information platform. For a typical project that will likely involve a Snowflake data lake hosted currently on Azure, Menon stresses that quality of data is critical. “AI

Risk, Cost-Benefit, Data Processing, Testing

Build efficient ETL pipelines with AWS Step Functions distributed map and redrive feature

AWS Big Data

DECEMBER 18, 2023

Solution overview: One of the common functionalities involved in data pipelines is extracting data from multiple data sources and exporting it to a data lake or synchronizing the data to another database. There are multiple tables related to customers and order data in the RDS database.

Metadata, Visualization, Data-driven, Data Lake

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

The quick and dirty definition of data mapping is the process of connecting different types of data from various data sources. Data mapping is a crucial step in data modeling and can help organizations achieve their business goals by enabling data integration, migration, transformation, and quality.

Data Warehouse, Reporting, Data Transformation, Visualization

Prioritizing AI investments: Balancing short-term gains with long-term vision

CIO Business Intelligence

FEBRUARY 18, 2025

Start with data as an AI foundation: Data quality is the first and most critical investment priority for any viable enterprise AI strategy. Data trust is simply not possible without data quality. A decision made with AI based on bad data is still the same bad decision without it.

Machine Learning, Data Quality, Enterprise, Sales
