Blog, Data Lake and Data-driven - Data Leaders Brief

Drug Launch Case Study: Amazing Efficiency Using DataOps

DataKitchen

DECEMBER 9, 2024

A Drug Launch Case Study in the Amazing Efficiency of a Data Team Using DataOps How a Small Team Powered the Multi-Billion Dollar Acquisition of a Pharma Startup When launching a groundbreaking pharmaceutical product, the stakes and the rewards couldnt be higher. data engineers delivered over 100 lines of code and 1.5

Data Quality

Data Quality Data Lake Testing Statistics

Recap of Amazon Redshift key product announcements in 2024

AWS Big Data

DECEMBER 17, 2024

Amazon Redshift , launched in 2013, has undergone significant evolution since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. Industry-leading price-performance Amazon Redshift offers up to three times better price-performance than alternative cloud data warehouses.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

2021 Gift Giving Guide for Data Nerds

DataKitchen

DECEMBER 7, 2021

Back by popular demand, we’ve updated our data nerd Gift Giving Guide to cap off 2021. We’ve kept some classics and added some new titles that are sure to put a smile on your data nerd’s face. Fail Fast, Learn Faster: Lessons in Data-Driven Leadership in an Age of Disruption, Big Data, and AI, by Randy Bean.

Data-driven

Data-driven Data Governance Big Data Data Science

Webinars

Automation, Evolved: Your New Playbook For Smarter Knowledge Work

MORE WEBINARS

Data Lakes on Cloud & it’s Usage in Healthcare

BizAcuity

MARCH 29, 2019

Data lakes are centralized repositories that can store all structured and unstructured data at any desired scale. The power of the data lake lies in the fact that it often is a cost-effective way to store data. The power of the data lake lies in the fact that it often is a cost-effective way to store data.

Data Lake

Data Lake Unstructured Data Cost-Benefit Data Quality

An AI Chat Bot Wrote This Blog Post …

DataKitchen

DECEMBER 9, 2022

ChatGPT> DataOps, or data operations, is a set of practices and technologies that organizations use to improve the speed, quality, and reliability of their data analytics processes. The goal of DataOps is to help organizations make better use of their data to drive business decisions and improve outcomes.

Machine Learning

Machine Learning Data-driven Optimization Data Analytics

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on premises, and from third-party sources. Using Amazon DataZone lets us avoid building and maintaining an in-house platform, allowing our developers to focus on tailored solutions.

Analytics

Analytics Visualization Data Governance Data-driven

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

AWS Big Data

JULY 18, 2024

Over the years, organizations have invested in creating purpose-built, cloud-based data lakes that are siloed from one another. A major challenge is enabling cross-organization discovery and access to data across these multiple data lakes, each built on different technology stacks.

Data Lake

Data Lake Publishing Metadata Data-driven

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone now launched authentication supports through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.

Visualization

Visualization Data Lake Testing Data Governance

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

AWS Big Data

DECEMBER 4, 2024

With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. In addition, organizations rely on an increasingly diverse array of digital systems, data fragmentation has become a significant challenge.

Data Integration

Data Integration Data Lake Statistics Data-driven

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. DataOps helps the data mesh deliver greater business agility by enabling decentralized domains to work in concert. . But first, let’s define the data mesh design pattern.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

Enterprises and organizations across the globe want to harness the power of data to make better decisions by putting data at the center of every decision-making process. However, throughout history, data services have held dominion over their customers’ data.

Data Lake

Data Lake Metadata Snapshot Analytics

Centralize Your Data Processes With a DataOps Process Hub

DataKitchen

NOVEMBER 4, 2021

Data organizations often have a mix of centralized and decentralized activity. DataOps concerns itself with the complex flow of data across teams, data centers and organizational boundaries. It expands beyond tools and data architecture and views the data organization from the perspective of its processes and workflows.

Data Processing

Data Processing Data Lake Cost-Benefit Testing

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

The Analytics specialty practice of AWS Professional Services (AWS ProServe) helps customers across the globe with modern data architecture implementations on the AWS Cloud. In this post, we discuss a common use case in relation to operational data processing and the solution we built using Apache Hudi and AWS Glue.

Data Lake

Data Lake Data Processing Metadata Snapshot

Eight Top DataOps Trends for 2022

DataKitchen

NOVEMBER 29, 2021

DataOps adoption continues to expand as a perfect storm of social, economic, and technological factors drive enterprises to invest in process-driven innovation. Many in the data industry recognize the serious impact of AI bias and seek to take active steps to mitigate it. Data Gets Meshier. Companies Commit to Remote.

Testing

Testing Data Lake Data Architecture Manufacturing

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

erwin

AUGUST 31, 2023

Q: Is data modeling cool again? In today’s fast-paced digital landscape, data reigns supreme. The data-driven enterprise relies on accurate, accessible, and actionable information to make strategic decisions and drive innovation. A: It always was and is getting cooler!!

Data-driven

Data-driven Modeling Enterprise Structured Data

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

AWS Big Data

OCTOBER 21, 2024

Data is the most significant asset of any organization. However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to address these challenges and foster a data-driven culture.

Sales

Sales Data-driven Data Processing Key Performance Indicator

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts.

Data Lake

Data Lake Data Warehouse Marketing Management

Introducing generative AI upgrades for Apache Spark in AWS Glue (preview)

AWS Big Data

NOVEMBER 22, 2024

Organizations run millions of Apache Spark applications each month on AWS, moving, processing, and preparing data for analytics and machine learning. Data practitioners need to upgrade to the latest Spark releases to benefit from performance improvements, new features, bug fixes, and security enhancements. Original code (Glue 2.0)

Cost-Benefit

Cost-Benefit Data-driven Software Testing

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

AUGUST 1, 2023

Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications.

Data Lake

Data Lake Data Transformation Data-driven Cost-Benefit

Fire Your Super-Smart Data Consultants with DataOps

DataKitchen

JANUARY 25, 2022

Analytics are prone to frequent data errors and deployment of analytics is slow and laborious. When internal resources fall short, companies outsource data engineering and analytics. There’s no shortage of consultants who will promise to manage the end-to-end lifecycle of data from integration to transformation to visualization. .

Consulting

Consulting Testing Data Lake Data Quality

How DataOps is Transforming Commercial Pharma Analytics

DataKitchen

AUGUST 27, 2021

DataOps has become an essential methodology in pharmaceutical enterprise data organizations, especially for commercial operations. Companies that implement it well derive significant competitive advantage from their superior ability to manage and create value from data.

Analytics

Analytics Sales Testing Cost-Benefit

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.

Metadata

Metadata Snapshot Data Lake Metrics

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

AWS Big Data

JUNE 23, 2023

This blog post is co-written with Ori Nakar from Imperva. Events and many other security data types are stored in Imperva’s Threat Research Multi-Region data lake. Imperva harnesses data to improve their business outcomes. Imperva’s data lake has a few dozen different datasets, in the scale of petabytes.

Data Lake

Data Lake Dashboards Cost-Benefit Data Warehouse

Use open table format libraries on AWS Glue 5.0 for Apache Spark

AWS Big Data

DECEMBER 4, 2024

Open table formats are emerging in the rapidly evolving domain of big data management, fundamentally altering the landscape of data storage and analysis. By providing a standardized framework for data representation, open table formats break down data silos, enhance data quality, and accelerate analytics at scale.

Snapshot

Snapshot Metadata Data Lake Optimization

Scaling RISE with SAP data and AWS Glue

AWS Big Data

NOVEMBER 29, 2024

Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP.

Visualization

Visualization Data Processing Data-driven Cost-Benefit

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

AWS Big Data

NOVEMBER 14, 2024

The landscape of big data management has been transformed by the rising popularity of open table formats such as Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake. These formats, designed to address the limitations of traditional data storage systems, have become essential in modern data architectures.

Metadata

Metadata Data Warehouse Big Data Data Lake

Accomplish Agile Business Intelligence & Analytics For Your Business

datapine

APRIL 15, 2020

It’s necessary to say that these processes are recurrent and require continuous evolution of reports, online data visualization , dashboards, and new functionalities to adapt current processes and develop new ones. Discover the available data sources. ” “What do our users actually need?”. Determine BI funding and support.

Business Intelligence

Business Intelligence Analytics Testing Dashboards

How Salesforce optimized their detection and response platform using AWS managed services

AWS Big Data

APRIL 18, 2024

This is a guest blog post co-authored with Atul Khare and Bhupender Panwar from Salesforce. The platform ingests more than 1 PB of data per day, more than 10 million events per second, and more than 200 different log types. The data lake consumers then use Apache Presto running on Amazon EMR cluster to perform one-time queries.

Optimization

Optimization Data Lake Management Key Performance Indicator

DataOps For Business Analytics Teams

DataKitchen

JANUARY 3, 2022

Data tables from IT and other data sources require a large amount of repetitive, manual work to be used in analytics. The data analytics function in large enterprises is generally distributed across departments and roles. Figure 1: Data analytics challenge – distributed teams must deliver value in collaboration.

Business Analytics

Business Analytics Analytics Testing Dashboards

IBM showcases Gen AI-driven Concert to monitor and manage enterprise applications

CIO Business Intelligence

MAY 21, 2024

IBM has showcased its new generative AI -driven Concert offering that is designed to help enterprises monitor and manage their applications. Showcased at the ongoing annual Think conference, IBM Concert will be generally available in June and is underpinned by the watsonx platform.

Enterprise

Enterprise Management Data Lake Risk

Moving Enterprise Data From Anywhere to Any System Made Easy

Cloudera

JUNE 2, 2022

Since 2015, the Cloudera DataFlow team has been helping the largest enterprise organizations in the world adopt Apache NiFi as their enterprise standard data movement tool. This need has generated a market opportunity for a universal data distribution service. Why does every organization need it when using a modern data stack?

Enterprise

Enterprise Data Lake Data Collection Data-driven

Doing Cloud Migration and Data Governance Right the First Time

erwin

OCTOBER 8, 2020

So if you’re going to move from your data from on-premise legacy data stores and warehouse systems to the cloud, you should do it right the first time. And as you make this transition, you need to understand what data you have, know where it is located, and govern it along the way. Then you must bulk load the legacy data.

Data Governance

Data Governance Metadata Testing Data Lake

Data Modeling 301 for the cloud: data lake and NoSQL data modeling and design

erwin

AUGUST 15, 2022

This blog is based upon a recent webcast that can be viewed here. For NoSQL, data lakes, and data lake houses—data modeling of both structured and unstructured data is somewhat novel and thorny. As with the part 1 and part 2 of this data modeling blog series, the cloud is not nirvana.

Data Lake

Data Lake Modeling Unstructured Data Data Warehouse

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes. Iterations of the lakehouse.

Data Lake

Data Lake Data Warehouse Machine Learning Data-driven

Data Quality Power Moves: Scorecards & Data Checks for Organizational Impact

DataKitchen

SEPTEMBER 18, 2024

A DataOps Approach to Data Quality The Growing Complexity of Data Quality Data quality issues are widespread, affecting organizations across industries, from manufacturing to healthcare and financial services. 73% of data practitioners do not trust their data (IDC). The challenge is not simply a technical one.

Scorecard

Scorecard Data Quality Measurement Testing

Munich Re Launches Enterprise-Wide Data-Driven Platform for Analytics

Alation

FEBRUARY 13, 2020

. - Andreas Kohlmaier, Head of Data Engineering at Munich Re 1. --> Ron Powell, independent analyst and industry expert for the BeyeNETWORK and executive producer of The World Transformed FastForward Series, interviews Andreas Kohlmaier, Head of Data Engineering at Munich Re. Sometimes they didn’t really know about each other.

Data-driven

Data-driven Data Lake Enterprise Analytics

Data Mesh 101: How Data Mesh Helps Organizations Be Data-Driven and Achieve Velocity

Ontotext

FEBRUARY 12, 2024

In the final part of this three-part series, we’ll explore ho w data mesh bolsters performance and helps organizations and data teams work more effectively. Usually, organizations will combine different domain topologies, depending on the trade-offs, and choose to focus on specific aspects of data mesh.

Data-driven

Data-driven Data Lake Data Quality Business Objectives

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

AWS Big Data

AUGUST 19, 2024

In today’s data-driven world, the ability to seamlessly integrate and utilize diverse data sources is critical for gaining actionable insights and driving innovation. Use case Consider a large ecommerce company that relies heavily on data-driven insights to optimize its operations, marketing strategies, and customer experiences.

Analytics

Analytics Data-driven Data Integration Data Lake

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.

Data Lake

Data Lake Snapshot Big Data Data-driven

Overcome these six data consumption challenges for a more data-driven enterprise

IBM Big Data Hub

JUNE 8, 2022

Implementing the right data strategy spurs innovation and outstanding business outcomes by recognizing data as a critical asset that provides insights for better and more informed decision-making. By taking advantage of data, enterprises can shape business decisions, minimize risk for stakeholders, and gain competitive advantage.

Data-driven

Data-driven Enterprise Data Governance Data Lake

Apache Ozone and Dense Data Nodes

Cloudera

APRIL 22, 2021

Today’s enterprise data analytics teams are constantly looking to get the best out of their platforms. Storage plays one of the most important roles in the data platforms strategy, it provides the basis for all compute engines and applications to be built on top of it. Separates control and data plane enabling high performance.

Data Lake

Data Lake Cost-Benefit Metadata Big Data

How ATPCO enables governed self-service data access to accelerate innovation with Amazon DataZone

AWS Big Data

JULY 25, 2024

This blog post is co-written with Raj Samineni from ATPCO. In today’s data-driven world, companies across industries recognize the immense value of data in making decisions, driving innovation, and building new products to serve their customers.

Data Lake

Data Lake Metadata Sales Publishing

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

The need to integrate diverse data sources has grown exponentially, but there are several common challenges when integrating and analyzing data from multiple sources, services, and applications. First, you need to create and maintain independent connections to the same data source for different services.

Visualization

Visualization Data Processing Testing Publishing

Drug Launch Case Study: Amazing Efficiency Using DataOps

Recap of Amazon Redshift key product announcements in 2024

Webinars

Trending Sources

2021 Gift Giving Guide for Data Nerds

Webinars

Data Lakes on Cloud & it’s Usage in Healthcare

An AI Chat Bot Wrote This Blog Post …

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

Simplify data integration with AWS Glue and zero-ETL to Amazon SageMaker Lakehouse

What is a Data Mesh?

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Centralize Your Data Processes With a DataOps Process Hub

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Eight Top DataOps Trends for 2022

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Introducing generative AI upgrades for Apache Spark in AWS Glue (preview)

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

Fire Your Super-Smart Data Consultants with DataOps

How DataOps is Transforming Commercial Pharma Analytics

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

Use open table format libraries on AWS Glue 5.0 for Apache Spark

Scaling RISE with SAP data and AWS Glue

Expand data access through Apache Iceberg using Delta Lake UniForm on AWS

Accomplish Agile Business Intelligence & Analytics For Your Business

How Salesforce optimized their detection and response platform using AWS managed services

DataOps For Business Analytics Teams

IBM showcases Gen AI-driven Concert to monitor and manage enterprise applications

Moving Enterprise Data From Anywhere to Any System Made Easy

Doing Cloud Migration and Data Governance Right the First Time

Data Modeling 301 for the cloud: data lake and NoSQL data modeling and design

The Future of the Data Lakehouse – Open

Data Quality Power Moves: Scorecards & Data Checks for Organizational Impact

Munich Re Launches Enterprise-Wide Data-Driven Platform for Analytics

Data Mesh 101: How Data Mesh Helps Organizations Be Data-Driven and Achieve Velocity

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

Overcome these six data consumption challenges for a more data-driven enterprise

Apache Ozone and Dense Data Nodes

How ATPCO enables governed self-service data access to accelerate innovation with Amazon DataZone

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

Stay Connected