This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue. To start the job, choose Run.
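The post configures the Glue job's Spark session to use the AWS Glue Data Catalog as an Iceberg catalog (via the spark.sql.catalog.glue_catalog.catalog-impl setting). Here is a minimal sketch of that setup; the database name and S3 warehouse path are hypothetical placeholders, not values from the post:

```python
# Minimal sketch: registering the AWS Glue Data Catalog as an Iceberg
# catalog named "glue_catalog" in a Glue Spark job. The database name
# and S3 warehouse path are hypothetical placeholders.
from pyspark.sql import SparkSession

dbname = "legacy_sqlserver_db"  # hypothetical Glue database name

spark = (
    SparkSession.builder
    .config("spark.sql.catalog.glue_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse",
            "s3://my-bucket/iceberg/{}/".format(dbname))
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)
```

With the catalog registered, tables extracted from SQL Server can be written out as glue_catalog.<database>.<table> Iceberg tables.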
However, this enthusiasm may be tempered by a host of challenges and risks stemming from scaling GenAI. Because the technology subsists on data, customers' trust and confidential information are at stake, and enterprises cannot afford to overlook its pitfalls. An example is Dell Technologies Enterprise Data Management.
This led to inefficiencies in data governance and access control. AWS Lake Formation is a service that streamlines and centralizes the data lake creation and management process. The Solution: How BMW CDH solved data duplication. The CDH is a company-wide data lake built on Amazon Simple Storage Service (Amazon S3).
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
For many organizations, this centralized data store follows a data lake architecture. Although data lakes provide a centralized repository, making sense of this data and extracting valuable insights can be challenging. We recommend testing your use case and data with different models.
Hive metastore federation for Amazon EMR is applicable to the following use cases: governance of Amazon EMR-based data lakes, where producers generate data within their AWS accounts using an Amazon EMR-based data lake supported by EMRFS on Amazon Simple Storage Service (Amazon S3) and HBase.
In 2022, data organizations will institute robust automated processes around their AI systems to make them more accountable to stakeholders. Model developers will test for AI bias as part of their pre-deployment testing. Continuous testing, monitoring and observability will prevent biased models from deploying or continuing to operate.
In addition to real-time analytics and visualization, the data needs to be shared for long-term data analytics and machine learning applications. To achieve this, EUROGATE designed an architecture that uses Amazon DataZone to publish specific digital twin data sets, enabling access to them with SageMaker in a separate AWS account.
However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to addressing these challenges and fostering a data-driven culture. To achieve this, they plan to use machine learning (ML) models to extract insights from data.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP.
Many companies whose AI model training infrastructure is not proximal to their data lake incur steeper costs as the data sets grow larger and AI models become more complex. The cloud is great for experimentation when data sets are smaller and model complexity is light.
In this regard, the enterprise data product catalog acts as a federated portal, facilitating cross-domain access and interoperability while maintaining alignment with governance principles. This model balances node or domain-level autonomy with enterprise-level oversight, creating a scalable and consistent framework across ANZ.
Business intelligence is moving away from the traditional engineering model: analysis, design, construction, testing, and implementation. In the traditional model, communication between developers and business users is not a priority. You need to determine whether you are going with an on-premises or cloud-hosted strategy.
Over the past decade, deep learning arose from a seismic collision of data availability and sheer compute power, enabling a host of impressive AI capabilities. Data must be laboriously collected, curated, and labeled with task-specific annotations to train AI models. We stand on the frontier of an AI revolution.
Its digital transformation began with an application modernization phase, in which Dickson and her IT teams determined which applications should be hosted in the public cloud and which should remain on a private cloud. Here, Dickson sees data generated by its industrial machines as highly productive.
Data storage databases. Your SaaS company can store and protect any amount of data using Amazon Simple Storage Service (S3), which is ideal for data lakes, cloud-native applications, and mobile apps. AWS also offers a variety of AI model development and delivery platforms, as well as packaged AI-based applications.
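As a minimal illustration of that storage pattern, here is a hedged sketch of landing a raw record in an S3-backed data lake with boto3; the bucket name and object key are hypothetical placeholders:

```python
# Sketch: writing a raw event into an S3 data lake bucket with
# server-side encryption. Bucket name and key layout are hypothetical.
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-saas-data-lake",               # hypothetical bucket
    Key="raw/events/2024/01/15/events.json",  # hypothetical key layout
    Body=b'{"event": "signup", "user_id": 42}',
    ServerSideEncryption="aws:kms",
)
```

Date-partitioned key prefixes like the one above keep the raw zone easy to scan selectively from downstream analytics engines.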
Recent research by McGuide Research Services for Avanade found 91% of organisations in the sector believe they need to shift to an AI-first operating model within the next 12 months, while 87% of employees feel generative AI tools will make them more efficient and more innovative.
As a global company with more than 6,000 employees, BMC faces many of the same data challenges that other large enterprises face. The organization has 500 applications for business services, 80,000 VMs, 3,000 hosts, and more than 100,000 containers. Given the sheer volume of enterprise data, it’s impossible to do this manually.
The Hive metastore is a repository of metadata about the SQL tables, such as database names, table names, schema, serialization and deserialization information, data location, and partition details of each table. Therefore, organizations have come to host huge volumes of metadata of their structured datasets in the Hive metastore.
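To make those metadata categories concrete, here is a hedged sketch of inspecting what the metastore tracks for a single table through Spark SQL; the database and table names are hypothetical placeholders:

```python
# Sketch: surfacing Hive metastore metadata for one table via Spark.
# Database and table names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Schema, SerDe classes, data location, and table properties
spark.sql("DESCRIBE FORMATTED sales_db.orders").show(100, truncate=False)

# Partition details the metastore tracks for the table
spark.sql("SHOW PARTITIONS sales_db.orders").show(truncate=False)
```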
The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. It hosts over 150 big data analytics sandboxes across the region with over 200 users utilizing the sandbox for data discovery.
In modern enterprises, the exponential growth of data means organizational knowledge is distributed across multiple formats, ranging from structured data stores such as data warehouses to multi-format data stores like data lakes. [Image: a knowledge management system (KMS) built using Meta’s Llama 3 model.]
At the lowest layer is the infrastructure, made up of databases and datalakes. These applications live on innumerable servers, yet some technology is hosted in the public cloud. PayPal’s deep learning models can be trained and put into production in two weeks, and even quicker for simpler algorithms.
We’ve built digital twins for several furnaces we operate across the globe, and we currently have 70 AI models running on those furnaces. These models allow us to predict failures early, and we forecast a 20% reduction in furnace unplanned events, improving repair times by at least two days. So AI helps us have fewer emergencies.
Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. Amazon DataZone allows you to simply and securely govern end-to-end data assets stored in your Amazon Redshift data warehouses or data lakes cataloged with the AWS Glue Data Catalog.
Each data producer within the organization has its own data lake in Apache Hudi format, ensuring data sovereignty and autonomy. This enables data-driven decision-making across the organization.
By separating the compute, the metadata, and data storage, CDW dynamically adapts to changing workloads and resource requirements, speeding up deployment while effectively managing costs and preserving a shared access and governance model. Proprietary file formats mean no one else is invited in! Separate compute.
This involves creating VPC endpoints in both the AWS and Snowflake VPCs, making sure data transfer remains within the AWS network. Use Amazon Route 53 to create a private hosted zone that resolves the Snowflake endpoint within your VPC. This unlocks scalable analytics while maintaining data governance, compliance, and access control.
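A hedged sketch of that pattern with boto3 follows; every ID and name below is a hypothetical placeholder, and the real Snowflake endpoint service name comes from Snowflake's SYSTEM$GET_PRIVATELINK_CONFIG function:

```python
# Sketch: private connectivity from an AWS VPC to Snowflake via
# PrivateLink. All IDs and names are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2")
route53 = boto3.client("route53")

# Interface VPC endpoint to the Snowflake PrivateLink service
endpoint = ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    VpcEndpointType="Interface",
    ServiceName="com.amazonaws.vpce.us-east-1.vpce-svc-EXAMPLE",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=False,
)

# Private hosted zone so the Snowflake account URL resolves to private
# IPs inside the VPC
zone = route53.create_hosted_zone(
    Name="privatelink.snowflakecomputing.com",
    CallerReference="snowflake-privatelink-001",
    VPC={"VPCRegion": "us-east-1", "VPCId": "vpc-0123456789abcdef0"},
)
```

A record set in the private hosted zone then points the Snowflake account URL at the endpoint's elastic network interface IPs, keeping all traffic on the AWS network.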
This blog post outlines detailed step-by-step instructions to perform Hive replication from an on-prem CDH cluster to a CDP Public Cloud Data Lake (CDP Data Lake cluster versions: CM 7.4.0, Runtime 7.2.8). Topics include pre-checks on the Data Lake cluster and understanding Ranger policies in the Data Lake cluster.
Deploying new data types for machine learning. Mai-Lan Tomsen Bukovec, vice president of foundational data services at AWS, has lately seen the cloud giant’s enterprise customers deploying more unstructured data, as well as wider varieties of data sets, to inform the accuracy and training of their ML models.
Previously head of cybersecurity at Ingersoll-Rand, Melby started developing neural networks and machine learning models more than a decade ago. “I was literally just waiting for commercial availability [of LLMs], but [services] like Azure Machine Learning made it so you could easily apply it to your data.”
It comprises commodity cloud object storage, open data and open table formats, and high-performance open-source query engines. To help organizations scale AI workloads, we recently announced IBM watsonx.data, a data store built on an open data lakehouse architecture and part of the watsonx AI and data platform.
To bring their customers the best deals and user experience, smava follows the modern data architecture principles with a data lake as a scalable, durable data store and purpose-built data stores for analytical processing and data consumption.
Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you securely access your data in operational databases, data lakes, or third-party datasets with minimal movement or copying of data.
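As a hedged sketch of one such access path, a query can be issued programmatically through the Redshift Data API, which avoids managing drivers or persistent connections; the workgroup, database, and table names are hypothetical placeholders:

```python
# Sketch: querying Amazon Redshift via the Redshift Data API.
# Workgroup, database, and table names are hypothetical placeholders.
import time
import boto3

client = boto3.client("redshift-data")

resp = client.execute_statement(
    WorkgroupName="analytics-wg",  # hypothetical serverless workgroup
    Database="dev",
    Sql="SELECT claim_year, COUNT(*) FROM claims GROUP BY claim_year;",
)

# Statements run asynchronously, so poll until the query settles
while client.describe_statement(Id=resp["Id"])["Status"] not in (
    "FINISHED", "FAILED", "ABORTED"
):
    time.sleep(1)

result = client.get_statement_result(Id=resp["Id"])
```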
They recently needed to do a monthly load of 140 TB of uncompressed healthcare claims data in under 24 hours after receiving it to provide analysts and data scientists with up-to-date information on a patient’s healthcare journey. This data volume is expected to increase monthly and is fully refreshed each month.
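A back-of-envelope calculation clarifies the scale of that requirement, assuming decimal terabytes and the full 24-hour window:

```python
# Back-of-envelope: sustained throughput to load 140 TB in 24 hours.
# Assumes decimal TB (10**12 bytes) and the full 24-hour window.
total_bytes = 140 * 10**12
window_seconds = 24 * 60 * 60
print(total_bytes / window_seconds / 10**9)  # ≈ 1.62 GB/s sustained
```

Any stall in the pipeline eats directly into that 24-hour budget, which is why such loads are typically parallelized across many files.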
Modern applications store massive amounts of data on Amazon Simple Storage Service (Amazon S3) data lakes, providing cost-effective and highly durable storage, and allowing you to run analytics and machine learning (ML) from your data lake to generate insights on your data.
The attack targeted a host of public and private sector organizations (18,000 customers) including NASA, the Justice Department, and Homeland Security, and it is believed the attackers persisted on SolarWinds systems for 14 months prior to discovery. Operationalize ML with the Cloudera Data Platform.
In legacy analytical systems such as enterprise data warehouses, the scalability challenges of a system were primarily associated with computational scalability, i.e., the ability of a data platform to handle larger volumes of data in an agile and cost-efficient way. Limited flexibility to use more complex hosting models (e.g., …).
It’s embedded in the applications we use every day and the security model overall is pretty airtight. Microsoft has also made investments beyond OpenAI, for example in Mistral and Meta’s Llama models, in its own small language models like Phi, and by partnering with providers like Cohere, Hugging Face, and Nvidia. “That’s risky.”
The company operates a global hybrid model—both on site and through a work-at-home platform—and provides a range of solutions from customer care to financial solutions and customized digital services. Finally, make sure you understand your data, because no machine learning solution will work for you if you aren’t working with the right data.
Many organizations are building data lakes to store and analyze large volumes of structured, semi-structured, and unstructured data. In addition, many teams are moving towards a data mesh architecture, which requires them to expose their data sets as easily consumable data products.
The challenge is to do it right, and a crucial way to achieve it is with decisions based on data and analysis that drive measurable business results. This was the key learning from the Sisense event heralding the launch of Periscope Data in Tel Aviv, Israel — the beating heart of the startup nation. What VCs want from startups.
According to the research, organizations are adopting cloud ERP models to identify the best alignment with their strategy, business development, workloads and security requirements. Furthermore, TDC Digital had not used any cloud storage solution and experienced latency and downtime while hosting the application in its data center.
Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more, all while providing up to 7.9x better price-performance.