Why Implement a Data Catalog? Nowadays, businesses have more data than they know what to do with. Cutting-edge enterprises use their data to glean insights, make decisions, and drive value. In other words, they have a system in place for a data-driven strategy. Data Headache.
Data is the most significant asset of any organization. However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to addressing these challenges and fostering a data-driven culture.
Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on premises, and from third-party sources. Use case: Amazon DataZone addresses your data sharing challenges and optimizes data availability.
To achieve this, BMW aimed to break down data silos and centralize data from various business units and countries into the BMW Cloud Data Hub (CDH). However, the initial version of CDH supported only coarse-grained access control to entire data assets, so it was not possible to scope access to subsets of a data asset.
What attributes of your organization’s strategies can you attribute to successful outcomes? If you include the title of this blog, you were just presented with 13 examples of heteronyms in the preceding paragraphs. Can you find them all? Seriously now, what do these word games have to do with content strategy?
Every business wants to get on board with ChatGPT: to implement it, operationalize it, and capitalize on it. But any commitment to a disruptive technology (including data-intensive and AI implementations) must start with a business strategy.
Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP.
Amazon Redshift, launched in 2013, has undergone significant evolution since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. Industry-leading price-performance: Amazon Redshift offers up to three times better price-performance than alternative cloud data warehouses.
Enterprise data is brought into data lakes and data warehouses to carry out analytical, reporting, and data science use cases using AWS analytical services like Amazon Athena , Amazon Redshift , Amazon EMR , and so on. Subsequently, we’ll explore strategies for overcoming these challenges.
Download the 2021 DataOps Vendor Landscape here. This is not surprising given that DataOps enables enterprise data teams to generate significant business value from their data. Read the complete blog below for a more detailed description of the vendors and their capabilities, including testing and data observability.
Like the proverbial man looking for his keys under the streetlight , when it comes to enterprise data, if you only look at where the light is already shining, you can end up missing a lot. Remember that dark data is the data you have but don’t understand. So how do you find your dark data? Create a catalog.
Open table formats are emerging in the rapidly evolving domain of big data management, fundamentally altering the landscape of data storage and analysis. By providing a standardized framework for data representation, open table formats break down data silos, enhance data quality, and accelerate analytics at scale.
When it comes to using AI and machine learning across your organization, there are many good reasons to provide your data and analytics community with an intelligent data foundation. For instance, Large Language Models (LLMs) are known to ultimately perform better when data is structured.
As organizations deal with managing ever more data, the need to automate data management becomes clear. Last week erwin issued its 2020 State of Data Governance and Automation (DGA) Report. One piece of the research that stuck with me is that 70% of respondents spend 10 or more hours per week on data-related activities.
The cloud supports this new workforce, connecting remote workers to vital data, no matter their location. Data Cloud Migration Challenges and Solutions. Cloud migration is the process of moving enterprise data and infrastructure from on premises to off premises. However, cloud data migration can be difficult.
Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. Apache Iceberg is designed to support these features on cost-effective petabyte-scale data lakes on Amazon S3.
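Time travel and rollback in particular can be exercised directly from Spark SQL. The sketch below is illustrative: the catalog name `glue_catalog`, table `db.orders`, and snapshot ID are assumptions, not values from any specific deployment.

```sql
-- Query the table as of an earlier point in time (time travel)
SELECT * FROM glue_catalog.db.orders TIMESTAMP AS OF '2024-01-01 00:00:00';

-- Or pin a specific snapshot by its ID
SELECT * FROM glue_catalog.db.orders VERSION AS OF 8744736658442914487;

-- Roll the table back to that snapshot via an Iceberg Spark procedure
CALL glue_catalog.system.rollback_to_snapshot('db.orders', 8744736658442914487);
```

Because every write produces a new immutable snapshot, the rollback is a metadata-only operation; no data files are rewritten.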
As they continue to implement their Digital First strategy for speed, scale and the elimination of complexity, they are always seeking ways to innovate, modernize and also streamline data access control in the Cloud. Only users with required permissions are allowed to access data in clear text.
In today’s data-driven world, the ability to seamlessly integrate and utilize diverse data sources is critical for gaining actionable insights and driving innovation. Use case: Consider a large ecommerce company that relies heavily on data-driven insights to optimize its operations, marketing strategies, and customer experiences.
I’m excited to share the results of our new study with Dataversity that examines how data governance attitudes and practices continue to evolve. Defining Data Governance: What Is Data Governance? Constructing a Digital Transformation Strategy: How Data Drives Digital.
Data governance is best defined as the strategic, ongoing and collaborative processes involved in managing data’s access, availability, usability, quality and security in line with established internal policies and relevant data regulations. Data Governance Is Business Transformation. Predictability. Synchronicity.
Machine learning (ML) has become a critical component of many organizations’ digital transformation strategy. The answer lies in the data used to train these models and how that data is derived.
How do you initiate change within a system containing many thousands of people and millions of bytes of data? During my time as a data specialist at American Family Insurance, it became clear that we had to move away from the way things had been done in the past. So you can probably imagine: The company manages a lot of data.
This is part of our series of blog posts on recent enhancements to Impala. Apache Impala is synonymous with high-performance processing of extremely large datasets, but what if our data isn’t huge? It turns out that Apache Impala scales down with data just as well as it scales up. The entire collection is available here.
In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.
Generative AI has been the biggest technology story of 2023. Almost everybody’s played with ChatGPT, Stable Diffusion, GitHub Copilot, or Midjourney. A few have even tried out Bard or Claude, or run LLaMA 1 on their laptop. AI users say that AI programming (66%) and data analysis (59%) are the most needed skills.
A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights. Analytics use cases on data lakes are always evolving.
Data intelligence has a critical role to play in the supercomputing battle against Covid-19. While leveraging supercomputing power is a tremendous asset in our fight to combat this global pandemic, in order to deliver life-saving insights, you really have to understand what data you have and where it came from.
It’s time to consider data-driven enterprise architecture. The traditional approach to enterprise architecture – the analysis, design, planning and implementation of IT capabilities for the successful execution of enterprise strategy – seems to be missing something … data. That’s right.
Fostering organizational support for a data-driven culture might require a change in the organization’s culture. Recently, I co-hosted a webinar with our client E.ON, a global energy company that reinvented how it conducts business, from branding to customer engagement, with data as the conduit. Avoiding Hurdles.
Apache Flink is a scalable, reliable, and efficient data processing framework that handles real-time streaming and batch workloads (but is most commonly used for real-time streaming). AWS recently announced that Apache Flink is generally available for Amazon EMR on Amazon Elastic Kubernetes Service (EKS).
When the pandemic first hit, there was some negative impact on big data and analytics spending. But digital transformation accelerated, and budgets for spending on big data and analytics increased. Data without intelligence is just data, however, and this is why data intelligence is required.
When you build your transactional data lake using Apache Iceberg to solve your functional use cases, you need to focus on operational use cases for your S3 data lake to optimize the production environment. When the catalog property s3.delete-enabled is set to false and paired with the s3.delete.tags property, expired objects are tagged rather than hard-deleted, so an S3 lifecycle rule can have Amazon S3 delete the expired objects on your behalf.
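As a sketch, these properties are configured on the Spark catalog backing the Iceberg tables. The catalog name `glue_catalog` and the tag key/value here are illustrative assumptions:

```
# Skip hard deletes in Iceberg's S3FileIO and instead tag files removed
# by snapshot expiration; an S3 lifecycle rule filtering on this tag
# can then expire the objects on a schedule.
spark.sql.catalog.glue_catalog.s3.delete-enabled=false
spark.sql.catalog.glue_catalog.s3.delete.tags.delete-marker=true
```

Shifting physical deletion to S3 lifecycle rules keeps snapshot expiration fast and gives you a grace period before objects are permanently removed.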
Enterprises and organizations across the globe want to harness the power of data to make better decisions by putting data at the center of every decision-making process. However, throughout history, data services have held dominion over their customers’ data.
CFM takes a scientific approach to finance, using quantitative and systematic techniques to develop the best investment strategies. It was first opened to investors in 1995. Using social network data has also often been cited as a potential source of data to improve short-term investment decisions.
As part of their transformations, businesses are moving quickly from on premises to the cloud, and therefore need to make business process models available to everyone within the organization so they understand what data is tied to what applications and what processes are in place. BPM for Regulatory Compliance.
AWS Glue is a serverless data integration service that makes it straightforward to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. Furthermore, each node (driver or worker) in an AWS Glue job requires an IP address assigned from the subnet.
Over the past decade, the successful deployment of large-scale data platforms at our customers has acted as a big data flywheel, driving demand to bring in even more data, apply more sophisticated analytics, and onboard many new data practitioners, from business analysts to data scientists. Key Design Goals.
Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts.
In the last blog with Deloitte’s Marc Beierschoder, we talked about what the hybrid cloud is, why it can benefit a business, and what the key blockers often are in implementation. You can read it here. When building your data foundation, how can you prioritize innovation within a hybrid cloud strategy?
How CDP Enables and Accelerates Data Product Ecosystems. A multi-purpose platform focused on diverse value propositions for data products. As a result, CDP-enabled data products can meet multiple and varying functional and non-functional requirements that correspond to product attributes, each fulfilling specific customer needs.
Data governance isn’t a one-off project with a defined endpoint. Data governance, today, comes back to the ability to understand critical enterprise data within a business context, track its physical existence and lineage, and maximize its value while ensuring quality and security. Passing the Data Governance Ball.
Building a data lake on Amazon Simple Storage Service (Amazon S3) provides numerous benefits for an organization. However, many use cases, like performing change data capture (CDC) from an upstream relational database to an Amazon S3-based data lake, require handling data at a record level.
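At a small scale, the record-level merge logic behind CDC can be sketched in plain Python. This is an illustrative model of the upsert/delete semantics, not the AWS implementation; the record shape and operation names are assumptions:

```python
def apply_cdc(table, changes, key="id"):
    """Apply a stream of CDC events (insert/update/delete) to a
    snapshot held as a dict keyed by primary key, mimicking the
    record-level handling a data lake merge job performs."""
    for change in changes:
        op = change["op"]          # one of: "insert", "update", "delete"
        record = change["record"]
        if op == "delete":
            table.pop(record[key], None)   # ignore deletes for absent keys
        else:                              # insert and update are both upserts
            table[record[key]] = record
    return table

# Example: one insert, one update, then a delete of the inserted row
snapshot = {1: {"id": 1, "status": "new"}}
events = [
    {"op": "insert", "record": {"id": 2, "status": "new"}},
    {"op": "update", "record": {"id": 1, "status": "shipped"}},
    {"op": "delete", "record": {"id": 2}},
]
snapshot = apply_cdc(snapshot, events)
```

Table formats like Apache Iceberg implement the same semantics at scale (for example via MERGE INTO), which is what makes record-level CDC on an S3 data lake practical.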
The single biggest mistake web analysts make is working without purpose. Almost always we dive into the ocean of data first. We work very hard. We torture SiteCatalyst. We send out a lot of data. Then we resend it again and again. No impact from the data. Why this sad state? Sadder still, we don't ask questions later.
In today’s rapidly evolving digital landscape, enterprises across regulated industries face a critical challenge as they navigate their digital transformation journeys: effectively managing and governing data from legacy systems that are being phased out or replaced.