Data Lake, Data Processing and Enterprise

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

AWS Big Data

OCTOBER 30, 2024

This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake ( Apache Iceberg ) using AWS Glue. To start the job, choose Run. format(dbname)).config("spark.sql.catalog.glue_catalog.catalog-impl",

Data Lake

Data Lake Data Processing Optimization Machine Learning

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone now launched authentication supports through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.

Visualization

Visualization Data Lake Testing Data Governance

Webinars

Going Beyond Chatbots: Connecting AI to Your Tools, Systems, & Data

Automation, Evolved: Your New Playbook for Smarter Knowledge Work

MORE WEBINARS

Eight Top DataOps Trends for 2022

DataKitchen

NOVEMBER 29, 2021

DataOps adoption continues to expand as a perfect storm of social, economic, and technological factors drive enterprises to invest in process-driven innovation. As a result, enterprises will examine their end-to-end data operations and analytics creation workflows. Data Gets Meshier. Hub-Spoke Enterprise Architectures.

Testing

Testing Data Lake Data Architecture Manufacturing

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

AWS Big Data

OCTOBER 21, 2024

Data is the most significant asset of any organization. However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to address these challenges and foster a data-driven culture.

Sales

Sales Data-driven Data Processing Key Performance Indicator

Enrich your serverless data lake with Amazon Bedrock

AWS Big Data

SEPTEMBER 26, 2024

For many organizations, this centralized data store follows a data lake architecture. Although data lakes provide a centralized repository, making sense of this data and extracting valuable insights can be challenging.

Data Lake

Data Lake Cost-Benefit Unstructured Data Modeling

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

Use cases for Hive metastore federation for Amazon EMR Hive metastore federation for Amazon EMR is applicable to the following use cases: Governance of Amazon EMR-based data lakes – Producers generate data within their AWS accounts using an Amazon EMR-based data lake supported by EMRFS on Amazon Simple Storage Service (Amazon S3)and HBase.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

The success of GenAI models lies in your data management strategy

CIO Business Intelligence

OCTOBER 9, 2024

The rise of generative AI (GenAI) felt like a watershed moment for enterprises looking to drive exponential growth with its transformative potential. However, this enthusiasm may be tempered by a host of challenges and risks stemming from scaling GenAI. That’s why many enterprises are adopting a two-pronged approach to GenAI.

Strategy

Strategy Modeling Management Data Lake

Introducing the technology behind watsonx.ai, IBM’s AI and data platform for enterprise

IBM Big Data Hub

MAY 9, 2023

Over the past decade, deep learning arose from a seismic collision of data availability and sheer compute power, enabling a host of impressive AI capabilities. But these powerful technologies also introduce new risks and challenges for enterprises. Data: the foundation of your foundation model Data quality matters.

Enterprise

Enterprise Technology Modeling Cost-Benefit

Why enterprise CIOs need to plan for Microsoft gen AI

CIO Business Intelligence

AUGUST 14, 2024

Between building gen AI features into almost every enterprise tool it offers, adding the most popular gen AI developer tool to GitHub — GitHub Copilot is already bigger than GitHub when Microsoft bought it — and running the cloud powering OpenAI, Microsoft has taken a commanding lead in enterprise gen AI.

Enterprise

Enterprise Cost-Benefit Experimentation Modeling

Scaling RISE with SAP data and AWS Glue

AWS Big Data

NOVEMBER 29, 2024

Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP.

Visualization

Visualization Data Processing Data-driven Cost-Benefit

Data Management Requirements for the Enterprise Data Lake

In(tegrate) the Clouds

MAY 1, 2016

SnapLogic published Eight Data Management Requirements for the Enterprise Data Lake. They are: Storage and Data Formats. The company also recently hosted a webinar on Democratizing the Data Lake with Constellation Research and published 2 whitepapers from Mark Madsen. Ingest and Delivery.

Data Lake

Data Lake Enterprise Management Metadata

Achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore

AWS Big Data

NOVEMBER 11, 2024

The workflow consists of the following initial steps: OpenSearch Service is hosted in the primary Region, and all the active traffic is routed to the OpenSearch Service domain in the primary Region. Samir works directly with enterprise customers to design and build customized solutions catered to their data analytics and cybersecurity needs.

Snapshot

Snapshot Strategy Dashboards Data Lake

The essential check list for effective data democratization

CIO Business Intelligence

JANUARY 20, 2023

Starting on a solid data foundation Before choosing a platform for sharing data, an organization needs to understand what data it already has and strip it of errors and duplicates. Data formats and data architectures are often inconsistent, and data might even be incomplete. They have data swamps,” he says.

Data Lake

Data Lake Data-driven Finance Data Architecture

Your New Cloud for AI May Be Inside a Colo

CIO Business Intelligence

MAY 23, 2022

Enterprises moving their artificial intelligence projects into full scale development are discovering escalating costs based on initial infrastructure choices. Many companies whose AI model training infrastructure is not proximal to their data lake incur steeper costs as the data sets grow larger and AI models become more complex.

Experimentation

Experimentation Cost-Benefit Data Lake Data Science

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

To build a data-driven business, it is important to democratize enterprise data assets in a data catalog. With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. Verify all table metadata is stored in the AWS Glue Data Catalog.

Data Lake

Data Lake Metadata Business Analysis Data-driven

PODCAST: Making AI Real – Episode 4: Unlocking the Value of Enterprise AI with Data Engineering Capabilities

bridgei2i

MARCH 3, 2021

Episode 4: Unlocking the Value of Enterprise AI with Data Engineering Capabilities. Unlocking the Value of Enterprise AI with Data Engineering Capabilities. Tune in to the podcast to know more about the evolving industry and how new technologies are transforming the enterprise AI landscape. PODCAST: Making AI Real.

Enterprise

Enterprise Digital Transformation Data-driven Interactive

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

Under the federated mesh architecture, each divisional mesh functions as a node within the broader enterprise data mesh, maintaining a degree of autonomy in managing its data products. This model balances node or domain-level autonomy with enterprise-level oversight, creating a scalable and consistent framework across ANZ.

Metadata

Metadata Data Governance Data Quality Data-driven

BMC on BMC: How the company enables IT observability with BMC Helix and AIOps

CIO Business Intelligence

DECEMBER 7, 2023

As a global company with more than 6,000 employees, BMC faces many of the same data challenges that other large enterprises face. The organization has 500 applications for business services, 80,000 VMs, 3,000 hosts, and more than 100,000 containers. Given the sheer volume of enterprise data, it’s impossible to do this manually.

IT

IT Data Lake Business Services Data Processing

Announcing the 2020 Data Impact Award Winners

Cloudera

NOVEMBER 18, 2020

In fact, each of the 29 finalists represented organizations running cutting-edge use cases that showcase a winning enterprise data cloud strategy. The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. Data for Enterprise AI.

Internet Publishing and Broadcasting

Internet Publishing and Broadcasting Data-driven Broadcasting Digital Transformation

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

AWS Glue crawlers support cross-account crawling to support data mesh architecture

AWS Big Data

MARCH 27, 2023

Data lakes have come a long way, and there’s been tremendous innovation in this space. Today’s modern data lakes are cloud native, work with multiple data types, and make this data easily available to diverse stakeholders across the business.

Data Lake

Data Lake Data-driven Management Data Architecture

DS Smith sets a single-cloud agenda for sustainability

CIO Business Intelligence

DECEMBER 6, 2023

Its digital transformation began with an application modernization phase, in which Dickson and her IT teams determined which applications should be hosted in the public cloud and which should remain on a private cloud. Here, Dickson sees data generated from its industrial machines being very productive.

Manufacturing

Manufacturing Data Lake Digital Transformation Machine Learning

Alation’s Role in the Sentient Enterprise

Alation

FEBRUARY 20, 2020

Imagine a new type of business, one in which the fabric of data is so woven throughout the enterprise that it becomes almost a living, breathing entity that one day may even be able to make the right decisions for you. All of these Alation customers are experimenting with the beginnings of the Sentient Enterprise.

Enterprise

Enterprise Data Processing Data Lake Insurance

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users. BMS’s EDLS platform hosts over 5,000 jobs and is growing at 15% YoY (year over year). Pavan leads enterprise metadata capabilities at BMS.

Metadata

Metadata Data Lake Visualization Data Quality

FINRA CIO Steve Randich pushes the public cloud forward

CIO Business Intelligence

FEBRUARY 10, 2023

And, in his experience, the public cloud is “not quite” as infinitely horizontally scalable as many think — though only a handful of enterprises come even close to reaching the barrier, he says. Randich, who came to FINRA.org in 2013 after stints as co-CIO of Citigroup and former CIO of Nasdaq, is no stranger to the public cloud.

Unstructured Data

Unstructured Data Data Lake Machine Learning Enterprise

Building and Evaluating GenAI Knowledge Management Systems using Ollama, Trulens and Cloudera

Cloudera

MAY 23, 2024

In modern enterprises, the exponential growth of data means organizational knowledge is distributed across multiple formats, ranging from structured data stores such as data warehouses to multi-format data stores like data lakes. Langchain) and LLM evaluations (e.g.

Management

Management Metrics Data Processing Data Lake

How data literacy allows gen AI to drive productivity at Dow

CIO Business Intelligence

JULY 31, 2024

Data is at the heart of everything we do today, from AI to machine learning or generative AI. We’ve been leveraging predictive technologies, or what I call traditional AI, across our enterprise for nearly two decades with R&D and manufacturing, for example, all partnering with IT. This work is not new to Dow.

Manufacturing

Manufacturing Cost-Benefit Digital Transformation Forecasting

HEMA accelerates their data governance journey with Amazon DataZone

AWS Big Data

DECEMBER 19, 2024

From establishing an enterprise-wide data inventory and improving data discoverability, to enabling decentralized data sharing and governance, Amazon DataZone has been a game changer for HEMA. HEMA has a bespoke enterprise architecture, built around the concept of services.

Data Governance

Data Governance Publishing Data-driven Metadata

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

In legacy analytical systems such as enterprise data warehouses, the scalability challenges of a system were primarily associated with computational scalability, i.e., the ability of a data platform to handle larger volumes of data in an agile and cost-efficient way. Introduction. CRM platforms).

Data Processing

Data Processing Data Warehouse Enterprise Visualization

Accomplish Agile Business Intelligence & Analytics For Your Business

datapine

APRIL 15, 2020

It’s necessary to say that these processes are recurrent and require continuous evolution of reports, online data visualization , dashboards, and new functionalities to adapt current processes and develop new ones. You need to determine if you are going with an on-premise or cloud-hosted strategy. Construction Iterations.

Business Intelligence

Business Intelligence Analytics Testing Dashboards

How Data Governance Protects Sensitive Data

erwin

APRIL 2, 2021

With more companies increasingly migrating their data to the cloud to ensure availability and scalability, the risks associated with data management and protection also are growing. Data Security Starts with Data Governance. Do You Know Where Your Sensitive Data Is?

Data Governance

Data Governance Cost-Benefit Metadata Risk

Use AWS Glue to streamline SFTP data processing

AWS Big Data

AUGUST 13, 2024

With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog. It enables you to visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. Choose Store a new secret.

Data Processing

Data Processing Visualization Data Lake Data Processing

Implement alerts in Amazon OpenSearch Service with PagerDuty

AWS Big Data

JUNE 8, 2023

Create a service on PagerDuty To create a service on PagerDuty, complete the following steps: Log in to PagerDuty using your personal or enterprise account that is being used to enable the integration with OpenSearch Service. For Host , enter events.PagerDuty.com. A PagerDuty account with access to create a service and integration.

Data Lake

Data Lake Dashboards Metrics Testing

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

AWS Big Data

AUGUST 19, 2024

This involves creating VPC endpoints in both the AWS and Snowflake VPCs, making sure data transfer remains within the AWS network. Use Amazon Route 53 to create a private hosted zone that resolves the Snowflake endpoint within your VPC. This unlocks scalable analytics while maintaining data governance, compliance, and access control.

Analytics

Analytics Data-driven Data Integration Data Lake

TDC Digital leverages IBM Cloud for transparent billing and improved customer satisfaction

IBM Big Data Hub

MAY 19, 2023

Small and midsize enterprises (SMEs) are the fastest-growing segment in the market due to reliability, scalability, integration, flexibility and improved productivity. As a small- to medium-sized enterprise (SME), TDC Digital needed a transparent billing system to predict its expenses and price its services effectively.

Unstructured Data

Unstructured Data Data Processing Manufacturing Data Lake

Migrate a petabyte-scale data warehouse from Actian Vectorwise to Amazon Redshift

AWS Big Data

MAY 30, 2024

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you securely access your data in operational databases, data lakes, or third-party datasets with minimal movement or copying of data.

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Structured Data

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

JANUARY 21, 2021

Cloudera’s Data Warehouse service allows raw data to be stored in the cloud storage of your choice (S3, ADLSg2). It will be stored in your own namespace, and not force you to move data into someone else’s proprietary file formats or hosted storage. Proprietary file formats mean no one else is invited in! Separate compute.

Data Warehouse

Data Warehouse Data Lake IT Analytics

Integrate Amazon MWAA with Microsoft Entra ID using SAML authentication

AWS Big Data

JULY 30, 2024

Two private subnets are used to set up the Amazon MWAA environment, and the third private subnet is used to host the AWS Lambda authorizer function. Create the Azure AD service, users, groups, and enterprise application For the SSO integration with Azure, an enterprise application is required, which acts as the IdP for the SAML flow.

Metadata

Metadata Enterprise Management Data Lake

Access Amazon Athena in your applications using the WebSocket API

AWS Big Data

MARCH 2, 2023

Many organizations are building data lakes to store and analyze large volumes of structured, semi-structured, and unstructured data. In addition, many teams are moving towards a data mesh architecture, which requires them to expose their data sets as easily consumable data products.

Data Lake

Data Lake Testing Interactive Unstructured Data

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

Typically, you have multiple accounts to manage and run resources for your data pipeline. He has a track record of more than 18 years innovating and delivering enterprise products that unlock the power of data for users. Outside of work, Xiaorun enjoys exploring new places in the Bay Area.

Metrics

Metrics Visualization Dashboards Publishing

The disruptive potential of open data lakehouse architectures and IBM watsonx.data

IBM Big Data Hub

JUNE 15, 2023

The proliferation of data silos also inhibits the unification and enrichment of data which is essential to unlocking the new insights. Moreover, increased regulatory requirements make it harder for enterprises to democratize data access and scale the adoption of analytics and artificial intelligence (AI).

Data Warehouse

Data Warehouse Data Lake Optimization Data-driven

Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Webinars

Trending Sources

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

Webinars

Eight Top DataOps Trends for 2022

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

Enrich your serverless data lake with Amazon Bedrock

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

The success of GenAI models lies in your data management strategy

Introducing the technology behind watsonx.ai, IBM’s AI and data platform for enterprise

Why enterprise CIOs need to plan for Microsoft gen AI

Scaling RISE with SAP data and AWS Glue

Data Management Requirements for the Enterprise Data Lake

Achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore

The essential check list for effective data democratization

Your New Cloud for AI May Be Inside a Colo

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

Build a data lake with Apache Flink on Amazon EMR

PODCAST: Making AI Real – Episode 4: Unlocking the Value of Enterprise AI with Data Engineering Capabilities

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

BMC on BMC: How the company enables IT observability with BMC Helix and AIOps

Announcing the 2020 Data Impact Award Winners

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Glue crawlers support cross-account crawling to support data mesh architecture

DS Smith sets a single-cloud agenda for sustainability

Alation’s Role in the Sentient Enterprise

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

FINRA CIO Steve Randich pushes the public cloud forward

Building and Evaluating GenAI Knowledge Management Systems using Ollama, Trulens and Cloudera

How data literacy allows gen AI to drive productivity at Dow

HEMA accelerates their data governance journey with Amazon DataZone

Addressing the Three Scalability Challenges in Modern Data Platforms

Accomplish Agile Business Intelligence & Analytics For Your Business

How Data Governance Protects Sensitive Data

Use AWS Glue to streamline SFTP data processing

Implement alerts in Amazon OpenSearch Service with PagerDuty

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

TDC Digital leverages IBM Cloud for transparent billing and improved customer satisfaction

Migrate a petabyte-scale data warehouse from Actian Vectorwise to Amazon Redshift

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Integrate Amazon MWAA with Microsoft Entra ID using SAML authentication

Access Amazon Athena in your applications using the WebSocket API

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

The disruptive potential of open data lakehouse architectures and IBM watsonx.data

Stay Connected