Data Lake, Data Processing and Publishing

Data Lake

Data Processing

Publishing

How BMW streamlined data access using AWS Lake Formation fine-grained access control

AWS Big Data

OCTOBER 29, 2024

This led to inefficiencies in data governance and access control. AWS Lake Formation is a service that streamlines and centralizes the data lake creation and management process. The Solution: How BMW CDH solved data duplication The CDH is a company-wide data lake built on Amazon Simple Storage Service (Amazon S3).

Data Lake

Data Lake Sales Metadata Machine Learning

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone now launched authentication supports through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.

Visualization

Visualization Data Lake Testing Data Governance

Join 42,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

Use cases for Hive metastore federation for Amazon EMR Hive metastore federation for Amazon EMR is applicable to the following use cases: Governance of Amazon EMR-based data lakes – Producers generate data within their AWS accounts using an Amazon EMR-based data lake supported by EMRFS on Amazon Simple Storage Service (Amazon S3)and HBase.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

Webinars

Data Talks, CFOs Listen: Why Analytics Are Key To Better Spend Management

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

AWS Big Data

OCTOBER 21, 2024

However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to address these challenges and foster a data-driven culture. To incorporate this third-party data, AWS Data Exchange is the logical choice.

Sales

Sales Data-driven Data Processing Key Performance Indicator

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

Plug-and-play integration : A seamless, plug-and-play integration between data producers and consumers should facilitate rapid use of new data sets and enable quick proof of concepts, such as in the data science teams. As part of the required data, CHE data is shared using Amazon DataZone.

IoT

IoT Machine Learning Metadata Data-driven

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

On your project, in the navigation pane, choose Data. For Add data source , choose Add connection. For Host , enter your host name of your Aurora PostgreSQL database cluster. format(connection_properties["HOST"],connection_properties["PORT"],connection_properties["DATABASE"]) df.write.format("jdbc").option("url",

Visualization

Visualization Data Processing Testing Publishing

Eight Top DataOps Trends for 2022

DataKitchen

NOVEMBER 29, 2021

From our unique vantage point in the evolution toward DataOps automation, we publish an annual prediction of trends that most deeply impact the DataOps enterprise software industry as a whole. For example, managing ordered data dependencies, inter-domain communication, shared infrastructure, and incoherent workflows.

Testing

Testing Data Lake Data Architecture Manufacturing

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

Data analytics on operational data at near-real time is becoming a common need. Due to the exponential growth of data volume, it has become common practice to replace read replicas with data lakes to have better scalability and performance. For more information, see Changing the default settings for your data lake.

Data Lake

Data Lake Visualization Dashboards Insurance

Ingest, transform, and deliver events published by Amazon Security Lake to Amazon OpenSearch Service

AWS Big Data

JUNE 19, 2023

Security Lake automatically centralizes security data from cloud, on-premises, and custom sources into a purpose-built data lake stored in your account. With Security Lake, you can get a more complete understanding of your security data across your entire organization.

Publishing

Publishing Dashboards Visualization Management

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

These nodes can implement analytical platforms like data lake houses, data warehouses, or data marts, all united by producing data products. This strategy supports each division’s autonomy to implement their own data catalogs and decide which data products to publish to the group-level catalog.

Metadata

Metadata Data Governance Data Quality Data-driven

Top 15 data management platforms

CIO Business Intelligence

JUNE 9, 2022

All this data arrives by the terabyte, and a data management platform can help marketers make sense of it all. Marketing-focused or not, DMPs excel at negotiating with a wide array of databases, data lakes, or data warehouses, ingesting their streams of data and then cleaning, sorting, and unifying the information therein.

Management

Management Advertising Data Lake Sales

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

It also makes it easier for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization to discover, use, and collaborate to derive data-driven insights. Note that a managed data asset is an asset for which Amazon DataZone can manage permissions.

Metadata

Metadata Data Lake Data Processing Data-driven

Data Management Requirements for the Enterprise Data Lake

In(tegrate) the Clouds

MAY 1, 2016

SnapLogic published Eight Data Management Requirements for the Enterprise Data Lake. They are: Storage and Data Formats. The company also recently hosted a webinar on Democratizing the Data Lake with Constellation Research and published 2 whitepapers from Mark Madsen.

Data Lake

Data Lake Enterprise Management Metadata

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

Typically, you have multiple accounts to manage and run resources for your data pipeline. Publish the QuickSight dashboard When the analysis is ready, complete the following steps to publish the dashboard: Choose PUBLISH. Select Publish new dashboard as , and enter GlueObservabilityDashboard.

Metrics

Metrics Visualization Dashboards Publishing

HEMA accelerates their data governance journey with Amazon DataZone

AWS Big Data

DECEMBER 19, 2024

Each service is hosted in a dedicated AWS account and is built and maintained by a product owner and a development team, as illustrated in the following figure. Delta tables technical metadata is stored in the Data Catalog, which is a native source for creating assets in the Amazon DataZone business catalog.

Data Governance

Data Governance Publishing Data-driven Metadata

10 Things AWS Can Do for Your SaaS Company

Smart Data Collective

FEBRUARY 20, 2022

Data storage databases. Your SaaS company can store and protect any amount of data using Amazon Simple Storage Service (S3), which is ideal for data lakes, cloud-native applications, and mobile apps. Well, let’s find out. Artificial intelligence (AI). SNS supports push messaging to mobile devices. Easy to use.

Cost-Benefit

Cost-Benefit Data Lake Software Machine Learning

Top 15 data management platforms available today

CIO Business Intelligence

SEPTEMBER 22, 2023

All this data arrives by the terabyte, and a data management platform can help marketers make sense of it all. DMPs excel at negotiating with a wide array of databases, data lakes, or data warehouses, ingesting their streams of data and then cleaning, sorting, and unifying the information therein.

Management

Management Advertising Data Lake Sales

Accomplish Agile Business Intelligence & Analytics For Your Business

datapine

APRIL 15, 2020

You need to determine if you are going with an on-premise or cloud-hosted strategy. It allows you to easily publish reports: the whole point of agile is to get the product out there. During this stage, you are also researching and vetting which online business intelligence software to use. Construction Iterations.

Business Intelligence

Business Intelligence Analytics Testing Dashboards

Announcing the 2020 Data Impact Award Winners

Cloudera

NOVEMBER 18, 2020

We also celebrated the first-ever winner of the Data Impact Achievement Award — a new award category that recognizes one customer who has consistently achieved transformation across their business, pursuing a diverse set of use cases and creating a culture of data-driven innovation. . Data Impact Achievement Award.

Internet Publishing and Broadcasting

Internet Publishing and Broadcasting Data-driven Broadcasting Digital Transformation

Access Amazon Athena in your applications using the WebSocket API

AWS Big Data

MARCH 2, 2023

Many organizations are building data lakes to store and analyze large volumes of structured, semi-structured, and unstructured data. In addition, many teams are moving towards a data mesh architecture, which requires them to expose their data sets as easily consumable data products.

Data Lake

Data Lake Testing Interactive Unstructured Data

Migrate a petabyte-scale data warehouse from Actian Vectorwise to Amazon Redshift

AWS Big Data

MAY 30, 2024

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you securely access your data in operational databases, data lakes, or third-party datasets with minimal movement or copying of data.

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Structured Data

Set up advanced rules to validate quality of multiple datasets with AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

It supports both data quality at rest and data quality in AWS Glue extract, transform, and load (ETL) pipelines. Data quality at rest focuses on validating the data stored in data lakes, databases, or data warehouses. It ensures that the data meets specific quality standards before it is consumed.

Data Quality

Data Quality Data Lake Visualization Data-driven

Periscope Data Expands to Israel, Empowering Data Teams with Powerful Tools

Sisense

DECEMBER 11, 2019

The challenge is to do it right, and a crucial way to achieve it is with decisions based on data and analysis that drive measurable business results. This was the key learning from the Sisense event heralding the launch of Periscope Data in Tel Aviv, Israel — the beating heart of the startup nation. What VCs want from startups.

Data Lake

Data Lake Big Data Sales Data-driven

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data) then enterprise-wide data lakes versus smaller, typically BU-Specific, “data ponds”.

Metadata

Metadata Cost-Benefit Enterprise Interactive

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

AWS Big Data

JULY 14, 2023

This enabled producers to publish data products that were curated and authoritative assets for their domain. For example, the AR team created and governed their cash application dataset in their AWS account AWS Glue Data Catalog. Data source locations are registered with Lake Formation.

Finance

Finance Metadata Big Data Recreation/Entertainment

Analyze Amazon S3 storage costs using AWS Cost and Usage Reports, Amazon S3 Inventory, and Amazon Athena

AWS Big Data

FEBRUARY 2, 2023

Since its launch in 2006, Amazon Simple Storage Service (Amazon S3) has experienced major growth, supporting multiple use cases such as hosting websites, creating data lakes, serving as object storage for consumer applications, storing logs, and archiving data. This could be your data lake or application S3 bucket.

Reporting

Reporting Data Lake Management Optimization

Automate deployment of an Amazon QuickSight analysis connecting to an Amazon Redshift data warehouse with an AWS CloudFormation template

AWS Big Data

FEBRUARY 16, 2023

In QuickSight, you analyze and visualize your data in analyses. When you’re finished, you can publish your analysis as a dashboard to share with others in your organization. Create an Amazon Redshift data source in AWS CloudFormation In this step, we add the AWS::QuickSight::DataSource section of the CloudFormation template.

Data Warehouse

Data Warehouse Sales Visualization Data Processing

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

We can determine the following are needed: An open data format ingestion architecture processing the source dataset and refining the data in the S3 data lake. This requires a dedicated team of 3–7 members building a serverless data lake for all data sources. Vijay Bagur is a Sr.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

Stitch Fix seamless migration: Transitioning from self-managed Kafka to Amazon MSK

AWS Big Data

SEPTEMBER 22, 2023

At Stitch Fix, we have been powered by data science since its foundation and rely on many modern data lake and data processing technologies. In our infrastructure, Apache Kafka has emerged as a powerful tool for managing event streams and facilitating real-time data processing.

Management

Management Metrics Cost-Benefit Data Lake

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

Cargotec captures terabytes of IoT telemetry data from their machinery operated by numerous customers across the globe. This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization. The job runs in the target account.

Metadata

Metadata Data Lake Machine Learning Big Data

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

APRIL 3, 2019

Fun fact : I co-founded an e-commerce company (realistically, a mail-order catalog hosted online) in December 1992 using one of those internetworking applications called Gopher , which was vaguely popular at the time. Somehow, the gravity of the data has a geological effect that forms data lakes. Upcoming Events.

Machine Learning

Machine Learning Data Governance Metadata Data Science

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

Watsonx.data is built on 3 core integrated components: multiple query engines, a catalog that keeps track of metadata, and storage and relational data sources which the query engines directly access. 1 When comparing published 2023 list prices normalized for VPC hours of watsonx.data to several major cloud data warehouse vendors.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Federate Amazon QuickSight access with open-source identity provider Keycloak

AWS Big Data

JUNE 13, 2023

Administrators can publish QuickSight applications on the Keycloak Admin console. Insert your specific host domain name where the Keycloak application resides in the following URL: [link] /realms/aws-realm/protocol/saml/descriptor. Vamsi Bhadriraju is a Data Architect at AWS. Change the IdP initiated SSO Relay State to [link].

Metadata

Metadata Dashboards Business Intelligence Data Lake

What Is Alation Connected Sheets? Q&A with the Creators

Alation

NOVEMBER 28, 2022

It is also hard to know whether one can trust the data within a spreadsheet. And they rarely, if ever, host the most current data available. Sathish Raju, cofounder & CTO, Kloudio and senior director of engineering, Alation: This presents challenges for both business users and data teams.

Metadata

Metadata Enterprise Cost-Benefit Finance

Generate security insights from Amazon Security Lake data using Amazon OpenSearch Ingestion

AWS Big Data

AUGUST 28, 2023

Optionally, specify the Amazon S3 storage class for the data in Amazon Security Lake. For more information, refer to Lifecycle management in Security Lake. Review the details and create the data lake. For example, choosing DNS Activity will give you dashboards of all DNS activity published in Amazon Security Lake.

Dashboards

Dashboards Visualization Metadata Management

The Gartner 2021 Leadership Vision for Data & Analytics Leaders Webinar Q&A

Andrew White

JANUARY 11, 2021

On January 4th I had the pleasure of hosting a webinar. It was titled, The Gartner 2021 Leadership Vision for Data & Analytics Leaders. This was for the Chief Data Officer, or head of data and analytics. Gartner also published the same piece of research for other roles, such as Application and Software Engineering.

Data Analytics

Data Analytics Analytics Data-driven Finance

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

An on-premise solution provides a high level of control and customization as it is hosted and managed within the organization’s physical infrastructure, but it can be expensive to set up and maintain. Next, identify the data sources that will be involved in the mapping.

Data Warehouse

Data Warehouse Reporting Data Transformation Visualization

Deploy real-time analytics with StarTree for managed Apache Pinot on AWS

AWS Big Data

MARCH 13, 2025

Flexible deployment options for StarTree Cloud StarTree offers multiple deployment options, including a StarTree hosted software as a service (SaaS) or customer hosted SaaS. StarTrees customer hosted SaaS provides flexibility for customers interested in deploying the solution within their AWS environment or other platform of choice.

Management

Management Analytics OLAP Online Analytical Processing

Access your existing data and resources through Amazon SageMaker Unified Studio, Part 1: AWS Glue Data Catalog and Amazon Redshift

AWS Big Data

APRIL 28, 2025

When the user interacts with resources within SageMaker Unified Studio, it generates IAM session credentials based on the users effective profile in the specific project context, and then users can use tools such as Amazon Athena or Amazon Redshift to query the relevant data. SageMaker Unified Studio supports Lake Formation hybrid mode.

Metadata

Metadata Data Lake Big Data Publishing

How BMW streamlined data access using AWS Lake Formation fine-grained access control

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

Webinars

Trending Sources

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

Webinars

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

How EUROGATE established a data mesh architecture using Amazon DataZone

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

Eight Top DataOps Trends for 2022

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

Ingest, transform, and deliver events published by Amazon Security Lake to Amazon OpenSearch Service

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

Top 15 data management platforms

Governing data in relational databases using Amazon DataZone

Data Management Requirements for the Enterprise Data Lake

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

HEMA accelerates their data governance journey with Amazon DataZone

10 Things AWS Can Do for Your SaaS Company

Top 15 data management platforms available today

Accomplish Agile Business Intelligence & Analytics For Your Business

Announcing the 2020 Data Impact Award Winners

Access Amazon Athena in your applications using the WebSocket API

Migrate a petabyte-scale data warehouse from Actian Vectorwise to Amazon Redshift

Set up advanced rules to validate quality of multiple datasets with AWS Glue Data Quality

Periscope Data Expands to Israel, Empowering Data Teams with Powerful Tools

How Cloudera Data Flow Enables Successful Data Mesh Architectures

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

Analyze Amazon S3 storage costs using AWS Cost and Usage Reports, Amazon S3 Inventory, and Amazon Athena

Automate deployment of an Amazon QuickSight analysis connecting to an Amazon Redshift data warehouse with an AWS CloudFormation template

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

Stitch Fix seamless migration: Transitioning from self-managed Kafka to Amazon MSK

How Cargotec uses metadata replication to enable cross-account data sharing

Themes and Conferences per Pacoid, Episode 8

Exploring the AI and data capabilities of watsonx

Federate Amazon QuickSight access with open-source identity provider Keycloak

What Is Alation Connected Sheets? Q&A with the Creators

Generate security insights from Amazon Security Lake data using Amazon OpenSearch Ingestion

The Gartner 2021 Leadership Vision for Data & Analytics Leaders Webinar Q&A

What is Data Mapping?

Deploy real-time analytics with StarTree for managed Apache Pinot on AWS

Access your existing data and resources through Amazon SageMaker Unified Studio, Part 1: AWS Glue Data Catalog and Amazon Redshift

Stay Connected