Iceberg’s metadata layer offers distinct advantages over plain Parquet, such as improved data management, performance optimization, and integration with various query engines. Unlike direct Amazon S3 access, Iceberg supports these operations on petabyte-scale data lakes without requiring complex custom code.
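To make that concrete, here is a minimal PySpark sketch of the kind of operations the metadata layer enables. The catalog name (lake), database, table, and snapshot ID are hypothetical, and it assumes a Spark session already configured with the Iceberg runtime and an AWS Glue or Hadoop catalog.

```python
from pyspark.sql import SparkSession

# Assumes the Iceberg Spark runtime is on the classpath and a catalog named
# "lake" points at an S3 warehouse location (hypothetical setup).
spark = SparkSession.builder.appName("iceberg-sketch").getOrCreate()

# Schema changes are metadata-only operations in Iceberg; no data files are rewritten.
spark.sql("ALTER TABLE lake.sales.orders ADD COLUMN discount_pct DOUBLE")

# Time travel: read the table as of an earlier snapshot instead of listing S3 objects.
spark.sql("""
    SELECT order_id, order_total
    FROM lake.sales.orders
    VERSION AS OF 4216649637273919556  -- hypothetical snapshot ID
""").show()

# Metadata tables expose snapshot- and file-level statistics without scanning data.
spark.sql("SELECT snapshot_id, operation, summary FROM lake.sales.orders.snapshots").show()
```

With plain Parquet on S3, each of these steps would require custom file listing, rewriting, or bookkeeping code.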
To achieve this, they aimed to break down data silos and centralize data from various business units and countries into the BMW Cloud Data Hub (CDH), because those silos had led to inefficiencies in data governance and access control. For example, a global sales dataset is created by a team of data engineers with the data provider role.
Amazon DataZone now supports authentication through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools such as Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.
A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.
Apache Iceberg is an Apache-licensed, 100% open-source data table format that helps simplify data processing on large datasets stored in data lakes. Data engineers use Apache Iceberg because it’s fast, efficient, and reliable at any scale and keeps records of how datasets change over time.
Amazon Redshift enables you to directly access data stored in Amazon Simple Storage Service (Amazon S3) using SQL queries and join data across your data warehouse and data lake. With Amazon Redshift, you can query the data in your S3 data lake using a central AWS Glue metastore from your Redshift data warehouse.
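For example, an external schema backed by the AWS Glue Data Catalog lets a single SQL statement join warehouse tables with S3 data. The sketch below is a hedged illustration using the Redshift Data API via boto3; the workgroup, IAM role, database, schema, and table names are all hypothetical.

```python
import boto3

client = boto3.client("redshift-data", region_name="us-east-1")

# Map a Glue database onto Redshift as an external schema (Redshift Spectrum).
create_schema_sql = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS lake
FROM DATA CATALOG
DATABASE 'sales_lake'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-spectrum-role';
"""

# Join an internal warehouse table with a data lake table in one query.
join_sql = """
SELECT w.customer_id, w.lifetime_value, SUM(l.order_total) AS lake_orders
FROM analytics.customers AS w
JOIN lake.orders AS l ON l.customer_id = w.customer_id
GROUP BY 1, 2;
"""

for sql in (create_schema_sql, join_sql):
    resp = client.execute_statement(
        WorkgroupName="analytics-serverless",  # hypothetical Redshift Serverless workgroup
        Database="dev",
        Sql=sql,
    )
    print(resp["Id"])  # statement ID, useful for polling results later
```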
An extract, transform, and load (ETL) process using AWS Glue is triggered once a day to extract the required data and transform it into the required format and quality, following the data product principle of data mesh architectures. From here, the metadata is published to Amazon DataZone by using AWS Glue Data Catalog.
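One common way to run such a job once a day is a scheduled AWS Glue trigger. A minimal boto3 sketch follows; the trigger name, job name, and schedule are hypothetical.

```python
import boto3

glue = boto3.client("glue", region_name="eu-central-1")

# Run the (hypothetical) data-product ETL job once a day at 02:00 UTC.
glue.create_trigger(
    Name="daily-sales-product-refresh",
    Type="SCHEDULED",
    Schedule="cron(0 2 * * ? *)",
    Actions=[{"JobName": "build_global_sales_data_product"}],
    StartOnCreation=True,
)
```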
For many organizations, this centralized data store follows a data lake architecture. Although data lakes provide a centralized repository, making sense of this data and extracting valuable insights can be challenging.
With improved access and collaboration, you’ll be able to create and securely share analytics and AI artifacts and bring data and AI products to market faster. This innovation drives an important change: you’ll no longer have to copy or move data between data lakes and data warehouses.
To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake. What’s in a Data Lake? All the while, your marketing team is relying on marketing automation or CRM software they find the most productive.
Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts.
These nodes can implement analytical platforms like data lakehouses, data warehouses, or data marts, all united by producing data products. The Institutional Data & AI platform adopts a federated approach to data while centralizing the metadata to facilitate simpler discovery and sharing of data products.
However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to addressing these challenges and fostering a data-driven culture. Amazon Athena is used to query and explore the data.
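For readers unfamiliar with running Athena programmatically, the following is a minimal boto3 sketch; the Glue database, table, and S3 results location are hypothetical.

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Hypothetical Glue database/table and S3 output location for query results.
query = "SELECT product_line, COUNT(*) AS orders FROM sales_lake.orders GROUP BY product_line"

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "sales_lake"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/queries/"},
)

# Poll until the query finishes, then print the first page of results.
query_id = execution["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```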
I previously wrote about the importance of open table formats to the evolution of data lakes into data lakehouses. The concept of the data lake was initially proposed as a single environment where data could be combined from multiple sources to be stored and processed to enable analysis by multiple users for multiple purposes.
“The challenge that a lot of our customers have is that requires you to copy that data, store it in Salesforce; you have to create a place to store it; you have to create an object or field in which to store it; and then you have to maintain that pipeline of data synchronization and make sure that data is updated,” Carlson said.
Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world’s largest corporations. Such data volumes are not easy to move, migrate, or modernize. The challenges of a monolithic data lake architecture: data lakes are, at a high level, single repositories of data at scale.
But even with the “need for speed” to market, new applications must be modeled and documented for compliance, transparency, and stakeholder literacy. With all these diverse metadata sources, it is difficult to understand the complicated web they form, much less get a simple visual flow of data lineage and impact analysis.
Lately, however, the term has been adopted by marketing teams, and many of the data management platforms that vendors currently offer are tuned to their needs. In these instances, data feeds come largely from various advertising channels, and the reports they generate are designed to help marketers spend wisely.
These specific connectivity integrations are meant to give healthcare providers a 360-degree view of all their important data and let them run analytics on it to make faster decisions and reduce time to market, Informatica said.
In today’s data-driven business landscape, organizations collect a wealth of data across various touch points and unify it in a central data warehouse or a data lake to deliver business insights. Connection to Amazon Redshift is established by deploying a data stream in Salesforce Data Cloud.
Then the data is consumed by SaaS-based computational tools, but it still sits within our organization and sits within the controls of our cloud-based solutions.” Much of Regeneron’s data, of course, is confidential. For that reason, many of its data tools, and even its data lake, were built in-house using AWS.
An Amazon DataZone domain contains an associated business data catalog for search and discovery, a set of metadata definitions to decorate the data assets that are used for discovery purposes, and data projects with integrated analytics and ML tools for users and groups to consume and publish data assets.
AWS-powered data lakes, supported by the unmatched availability of Amazon Simple Storage Service (Amazon S3), can handle the scale, agility, and flexibility required to combine different data and analytics approaches. Iceberg will never remove files that are still required by a non-expired snapshot.
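That safety property is what makes routine snapshot expiration on S3-backed tables practical. A minimal sketch, assuming a Spark session with an Iceberg catalog named lake; the table name and retention values are hypothetical.

```python
from datetime import datetime, timedelta

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

# Expire snapshots older than 7 days while always keeping the last 5.
# Iceberg only deletes data files that no remaining (non-expired) snapshot references.
cutoff = (datetime.utcnow() - timedelta(days=7)).strftime("%Y-%m-%d %H:%M:%S")

spark.sql(f"""
    CALL lake.system.expire_snapshots(
        table => 'sales.orders',
        older_than => TIMESTAMP '{cutoff}',
        retain_last => 5
    )
""").show()
```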
Quick setup enables two default blueprints and creates the default environment profiles for the data lake and data warehouse default blueprints. The script creates a table with sample marketing and sales data. You will then publish the data assets from these data sources.
Today’s data lakes are expanding across lines of business operating in diverse landscapes and using various engines to process and analyze data. Traditionally, SQL views have been used to define and share filtered data sets that meet the requirements of these lines of business for easier consumption.
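A line-of-business view is often just a filtered projection over a shared table. The following is a hedged Spark SQL sketch; the database, table, columns, and region filter are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lob-views").getOrCreate()

# Expose only the EMEA rows, and only the columns the EMEA sales team needs,
# as a view over the shared orders table (hypothetical names throughout).
spark.sql("""
    CREATE OR REPLACE VIEW sales.orders_emea AS
    SELECT order_id, customer_id, order_total, order_date
    FROM sales.orders
    WHERE region = 'EMEA'
""")
```

Consumers then query sales.orders_emea like any other table, without needing access to the unfiltered source.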
What’s changed since then, apart from Shih’s title, is that Salesforce has rearchitected its underlying Data Cloud and Einstein AI framework to use an improved metadata framework, creating a new platform it calls Einstein 1. “This means we now have a hyperscale data engine directly inside of Salesforce to connect all of your data,” he said.
Zero-ETL integration also enables you to load and analyze data from multiple operational database clusters in a new or existing Amazon Redshift instance to derive holistic insights across many applications. Use one click to access your data lake tables using auto-mounted AWS Glue data catalogs on Amazon Redshift for a simplified experience.
In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue, Apache Hudi, and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.
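The core of such a job is a Hudi upsert keyed on the record and precombine fields. Below is a hedged PySpark sketch of that write; the S3 paths, table name, field names, and Glue database are hypothetical, and it assumes the Hudi connector is available to the Glue/Spark runtime.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hourly-hudi-upsert").getOrCreate()

# Hypothetical hourly increment pulled from a staging area.
incremental_df = spark.read.parquet("s3://example-staging/sales/increment/")

hudi_options = {
    "hoodie.table.name": "sales_orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.partitionpath.field": "order_date",
    "hoodie.datasource.write.operation": "upsert",
    # Sync the table to the metastore so Athena/QuickSight can query it.
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.database": "sales_lake",
    "hoodie.datasource.hive_sync.table": "sales_orders",
}

(incremental_df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://example-datalake/sales_orders/"))
```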
Deploying a DMP can be a great way for companies to navigate a business world dominated by data, and these platforms have become the lifeblood of digital marketing today. In these instances, data feeds come largely from advertising channels, and the reports they generate are designed to help marketers spend wisely.
When global technology company Lenovo started utilizing data analytics, it helped identify a new market niche for its gaming laptops and powered remote diagnostics so its customers got the most from their servers and other devices. “Without those templates, it’s hard to add such information after the fact.”
In this post, Morningstar’s Data Lake Team Leads discuss how they utilized tag-based access control in their data lake with AWS Lake Formation and enabled similar controls in Amazon Redshift. This way, our existing data lake consumers could easily transition to Amazon Redshift.
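As a general illustration of tag-based access control in Lake Formation (not Morningstar’s specific setup), the following boto3 sketch defines an LF-tag, attaches it to a Glue database, and grants SELECT to a role based on the tag; all names and ARNs are hypothetical.

```python
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

# Define an LF-tag and attach it to a Glue database (hypothetical names).
lf.create_lf_tag(TagKey="domain", TagValues=["investment-research", "marketing"])

lf.add_lf_tags_to_resource(
    Resource={"Database": {"Name": "research_lake"}},
    LFTags=[{"TagKey": "domain", "TagValues": ["investment-research"]}],
)

# Grant SELECT on everything tagged domain=investment-research to an analyst role.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/research-analyst"},
    Resource={
        "LFTagPolicy": {
            "ResourceType": "TABLE",
            "Expression": [{"TagKey": "domain", "TagValues": ["investment-research"]}],
        }
    },
    Permissions=["SELECT"],
)
```

New tables that inherit the tag become accessible to the role automatically, which is what makes tag-based control scale better than per-table grants.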
Major market indexes, such as S&P 500, are subject to periodic inclusions and exclusions for reasons beyond the scope of this post (for an example, refer to CoStar Group, Invitation Homes Set to Join S&P 500; Others to Join S&P 100, S&P MidCap 400, and S&P SmallCap 600 ).
It provides personal and commercial banking, global markets, and investment banking services to 13 million customers. As they continue to implement their Digital First strategy for speed, scale, and the elimination of complexity, they are always seeking ways to innovate, modernize, and streamline data access control in the cloud.
Access audits are mastered centrally in Apache Ranger, which provides a comprehensive, non-repudiable audit log for every access event to every resource, with rich access event metadata such as the client IP. Both fine-grained access control of database objects and access to metadata are provided, along with sensitive data identification.
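As a rough sketch of what defining such a policy looks like through Ranger’s public REST API, the snippet below creates a read-only policy for a Hive-backed table; the Ranger URL, service name, credentials, and resource names are all hypothetical, and the exact policy schema may differ between Ranger versions.

```python
import requests

# Hypothetical Ranger endpoint, credentials, and Hive service name.
RANGER_URL = "https://ranger.example.internal:6182"
AUTH = ("admin", "example-password")

policy = {
    "service": "cm_hive",
    "name": "sales_readonly_analysts",
    "resources": {
        "database": {"values": ["sales"]},
        "table": {"values": ["orders"]},
        "column": {"values": ["*"]},
    },
    "policyItems": [
        {
            "accesses": [{"type": "select", "isAllowed": True}],
            "groups": ["sales-analysts"],
        }
    ],
}

# Every access decision made under this policy is recorded in Ranger's audit log,
# including the caller, resource, allow/deny result, and client IP.
resp = requests.post(
    f"{RANGER_URL}/service/public/v2/api/policy",
    json=policy,
    auth=AUTH,
)
resp.raise_for_status()
```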
smava believes in and takes advantage of data-driven decisions in order to become the market leader. The Data Platform team is responsible for supporting data-driven decisions at smava by providing data products across all departments and branches of the company.
We are excited to announce a new feature in Amazon DataZone that allows data producers to group data assets into well-defined, self-contained packages (data products) tailored for specific business use cases. This eliminates the need for multiple permissions, speeding up the initiation of data analysis.
Most innovation platforms make you rip the data out of your existing applications and move it to some other environment, such as a data warehouse, data lake, data lakehouse, or data cloud, before you can do any innovation.
Amazon DataZone is a powerful data management service that empowers data engineers, data scientists, product managers, analysts, and business users to seamlessly catalog, discover, analyze, and govern data across organizational boundaries, AWS accounts, data lakes, and data warehouses.
To provide a variety of products, services, and solutions that are better suited to customers and society in each region, we have built business processes and systems that are optimized for each region and its market. Responsibilities include loading raw data from the data source system at the appropriate frequency.
A data lakehouse is an emerging data management architecture that converges data warehouse and data lake capabilities, driven by a need to improve efficiency and obtain critical insights faster. Let’s start with why data lakehouses are becoming increasingly important.
Our position as a Visionary in the Gartner Magic Quadrant for Cloud DBMS speaks to our product excellence and market-leading vision of a hybrid, multifunction integrated platform with built-in security and governance. We were named a Visionary for having a strong market understanding and a robust roadmap for the cloud DBMS market.
Developers, data scientists, and analysts can work across databases, data warehouses, and data lakes to build reporting and dashboarding applications, perform real-time analytics, share and collaborate on data, and even build and train machine learning (ML) models with Redshift Serverless.
Thoughtworks says data mesh is key to moving beyond a monolithic data lake. Spoiler alert: data fabric and data mesh are independent design concepts that are, in fact, quite complementary.
“By 2025, it’s estimated we’ll have 463 million terabytes of data created every day,” says Lisa Thee, data for good sector lead at Launch Consulting Group in Seattle. BI software helps companies do just that by shepherding the right data into analytical reports and visualizations so that users can make informed decisions.