This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue. The job configures the Spark session's Glue catalog properties for Iceberg (for example, spark.sql.catalog.glue_catalog.catalog-impl); to start the job, choose Run.
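As a minimal sketch (the bucket, database, and app names are hypothetical placeholders, not taken from the post), the job's Spark session configuration for an Iceberg catalog backed by the AWS Glue Data Catalog might look like this:

```python
# Minimal sketch: a Spark session wired to an Apache Iceberg catalog that is
# backed by the AWS Glue Data Catalog. Bucket and names are hypothetical.
from pyspark.sql import SparkSession

warehouse_path = "s3://my-datalake-bucket/iceberg/"  # hypothetical bucket

spark = (
    SparkSession.builder.appName("sqlserver-to-iceberg")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", warehouse_path)
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)
```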
A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to structure it first, and run different types of analytics for better business insights. Open AWS Glue Studio. Choose ETL Jobs.
However, this enthusiasm may be tempered by a host of challenges and risks stemming from scaling GenAI. Because the technology subsists on data, customer trust and confidential information are at stake, and enterprises cannot afford to overlook the pitfalls.
This led to inefficiencies in data governance and access control. AWS Lake Formation is a service that streamlines and centralizes the data lake creation and management process. To solve data duplication, BMW built the CDH as a company-wide data lake on Amazon Simple Storage Service (Amazon S3).
Amazon DataZone now supports authentication through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.
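As a rough, assumption-laden illustration, a BI tool would be pointed at an Athena JDBC URL along these lines (Simba-style driver syntax; the region and results bucket are hypothetical, and authentication options vary by driver version):

```python
# Hedged sketch of an Athena JDBC connection URL as a BI tool might use it.
# Region and S3 staging location are hypothetical placeholders.
ATHENA_JDBC_URL = (
    "jdbc:awsathena://AwsRegion=us-east-1;"
    "S3OutputLocation=s3://my-athena-results/;"  # hypothetical results bucket
)
```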
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
For many organizations, this centralized data store follows a data lake architecture. Although data lakes provide a centralized repository, making sense of this data and extracting valuable insights can be challenging. The process creates a JSON file with the original_content and summary fields.
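For illustration only, the output file presumably has a shape like the following; the values are invented:

```python
# Hypothetical shape of the output JSON described above; values are invented.
import json

record = {
    "original_content": "Full text of the source document...",
    "summary": "One-paragraph summary produced by the model.",
}

with open("summary_output.json", "w") as f:
    json.dump(record, f, indent=2)
```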
Disaster recovery is vital for organizations, offering a proactive strategy to mitigate the impact of unforeseen events like system failures, natural disasters, or cyberattacks. In Disaster Recovery (DR) Architecture on AWS, Part I: Strategies for Recovery in the Cloud, we introduced four major strategies for disaster recovery (DR) on AWS.
However, enterprises often encounter challenges with data silos, insufficient access controls, poor governance, and quality issues. Embracing data as a product is the key to addressing these challenges and fostering a data-driven culture. To achieve this, they plan to use machine learning (ML) models to extract insights from data.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP.
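A hedged sketch of what reading an SAP OData entity from a Glue job might look like; the connection type string, option names, and entity path follow the connector's documented pattern but are assumptions here, not taken from the post:

```python
# Hedged sketch: reading an SAP OData entity with the AWS Glue connector.
# The connection_type value, option names, and entity path are assumptions.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

sap_frame = glue_context.create_dynamic_frame.from_options(
    connection_type="SAPOData",  # assumed connector name
    connection_options={
        "connectionName": "my-sap-connection",  # hypothetical Glue connection
        "ENTITY_NAME": "/sap/opu/odata/sap/API_SALES_ORDER_SRV/A_SalesOrder",  # hypothetical entity
    },
)
```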
This post explores how the shift to a data product mindset is being implemented, the challenges faced, and the early wins that are shaping the future of data management in the Institutional Division. Consumer feedback and demand drive creation and maintenance of the data product.
When it comes to implementing and managing a successful BI strategy we have always proclaimed: start small, use the right BI tools, and involve your team. You need to determine whether you are going with an on-premises or cloud-hosted strategy. You want organization-wide buy-in of your business intelligence strategy.
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and do so with consistent performance. With this massive data growth, data proliferation across your data stores, data warehouses, and data lakes can become equally challenging.
British multinational packaging giant DS Smith has committed itself to ambitious sustainability goals, and its IT strategy to standardize on a single cloud will be a key enabler. The single-cloud platform strategy will include SaaS partners used for automation of more than 40 enterprise applications, Dickson says.
Data storage and databases. Your SaaS company can store and protect any amount of data using Amazon Simple Storage Service (S3), which is ideal for data lakes, cloud-native applications, and mobile apps. Other highlights include artificial intelligence (AI) capabilities and ease of use.
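A minimal sketch of storing an object in S3 with boto3; the bucket, key, and payload are hypothetical placeholders:

```python
# Minimal sketch: writing an object to S3 with boto3. Bucket and key are
# hypothetical placeholders, not taken from the post.
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-saas-datalake",  # hypothetical bucket
    Key="raw/events/2024/01/events.json",
    Body=b'{"event": "signup", "user_id": 42}',
)
```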
A Gartner Marketing survey found only 14% of organizations have successfully implemented a C360 solution, due to lack of consensus on what a 360-degree view means, challenges with data quality, and lack of cross-functional governance structure for customer data.
In fact, each of the 29 finalists represented organizations running cutting-edge use cases that showcase a winning enterprise data cloud strategy. The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform.
Amazon DataZone allows you to simply and securely govern end-to-end data assets stored in your Amazon Redshift data warehouses or data lakes cataloged with the AWS Glue Data Catalog. Note that a managed data asset is an asset for which Amazon DataZone can manage permissions.
Each data producer within the organization has its own data lake in Apache Hudi format, ensuring data sovereignty and autonomy. This enables data-driven decision-making across the organization. Lake Formation – Lake Formation emerged as a cornerstone in Bluestone’s data governance strategy.
In this post, we explore how AWS Glue can serve as the data integration service to bring in data from Snowflake as part of your data integration strategy, enabling you to harness the power of your data ecosystem and drive meaningful outcomes across various use cases.
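A minimal sketch of what such a read might look like from a Glue/Spark job using the Snowflake Spark connector; the account URL, credentials, and table are hypothetical, and the connector jars are assumed to be available to the job:

```python
# Hedged sketch: reading a Snowflake table from a Spark job via the
# Snowflake Spark connector. All connection values are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-read").getOrCreate()

snowflake_options = {
    "sfURL": "myaccount.snowflakecomputing.com",  # hypothetical account URL
    "sfUser": "GLUE_USER",
    "sfPassword": "********",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "COMPUTE_WH",
}

orders_df = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**snowflake_options)
    .option("dbtable", "ORDERS")  # hypothetical table
    .load()
)
```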
One strategy, five keys. From a technological point of view, the brand’s strategic engine is divided into five investment areas. At the lowest layer is the infrastructure, made up of databases and data lakes. These applications live on innumerable servers, yet some technology is hosted in the public cloud.
For Melanie Kalmar, the answer is data literacy and a strong foundation in tech. How do data and digital technologies impact your business strategy? At the core, digital at Dow is about changing how we work, which includes how we interact with systems, data, and each other to be more productive and to grow.
The challenge is to do it right, and a crucial way to achieve it is with decisions based on data and analysis that drive measurable business results. This was the key learning from the Sisense event heralding the launch of Periscope Data in Tel Aviv, Israel — the beating heart of the startup nation. What VCs want from startups.
Each service is hosted in a dedicated AWS account and is built and maintained by a product owner and a development team, as illustrated in the following figure. Delta tables’ technical metadata is stored in the Data Catalog, which is a native source for creating assets in the Amazon DataZone business catalog.
While managing unstructured data remains a challenge for 36% of organizations, according to the 2022 Foundry Data and Analytics Research survey, many IT leaders are actively seeking ways of harnessing all types of data stored in data lakes.
Beginning in 2021, the Minneapolis-based Microsoft partner helped Dairyland migrate from several custom legacy applications to a commercial implementation of Dynamics 365 and an Azure data lake, which set the stage for the power company’s early foray into AI, according to the systems integrator.
The term “data management platform” can be confusing because, while it sounds like a generalized product that works with all forms of data as part of generalized data management strategies, the term has been more narrowly defined of late as one targeted to marketing departments’ needs. Of course, marketing also works.
In one institution we recently spoke with, they told us it took them over 30 weeks to procure and deploy a new data warehouse, while with CDW they got everything up and running in just a few seconds (after, of course, a few days obtaining the data migration and policy clearances involved). Central control of security and governance.
Finally, make sure you understand your data, because no machine learning solution will work for you if you aren’t working with the right data. Data lakes have a new consumer in AI. Many of our service-based offerings include hosting and executing our customers’ omnichannel platforms.
I’m referring not only to our technology partners, but also to our cloud partners that host the Denodo Platform. Denodo is a very partner-friendly company, and here I’d like to share some thoughts about how Denodo works with our partners.
“Always the gatekeepers of much of the data necessary for ESG reporting, CIOs are finding that companies are even more dependent on them,” says Nancy Mentesana, ESG executive director at Labrador US, a global communications firm focused on corporate disclosure documents. “There are several things you need to report attached to that number.”
Recently, Cloudera announced the release of Cloudera CDP Private Cloud, delivering the final component of our hybrid cloud strategy. Additionally, lines of business (LOBs) are able to gain access to a shared data lake that is secured and governed by the use of Cloudera Shared Data Experience (SDX).
That focus includes not only the firm’s customer-facing strategies but also its commitment to investing in the development of its employees, a strategy that is paying off, as evidenced by Capital Group’s No. The bootcamp broadened my understanding of key concepts in data engineering.
The typical Cloudera Enterprise Data Hub Cluster starts with a few dozen nodes in the customer’s data center hosting a variety of distributed services. Over time, workloads start processing more data, tenants start onboarding more workloads, and administrators (admins) start onboarding more tenants. Cloudera Manager (CM) 6.2
According to the research, organizations are adopting cloud ERP models to identify the best alignment with their strategy, business development, workloads and security requirements. Furthermore, TDC Digital had not used any cloud storage solution and experienced latency and downtime while hosting the application in its data center.
To develop your disaster recovery plan, you should complete the following tasks: define your recovery objectives for downtime and data loss (RTO and RPO) for data and metadata, and identify recovery strategies to meet those objectives. Using backups: backing up data is an important part of data management.
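As one hedged illustration of the backup step, enabling S3 versioning guards against accidental overwrites and deletes; the bucket name below is a hypothetical placeholder, not from the post:

```python
# Hedged sketch: one simple backup safeguard, enabling S3 versioning on a
# data bucket with boto3. Bucket name is hypothetical.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_versioning(
    Bucket="my-datalake-bucket",  # hypothetical bucket
    VersioningConfiguration={"Status": "Enabled"},
)
```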
Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you securely access your data in operational databases, data lakes, or third-party datasets with minimal movement or copying of data.
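As a rough sketch of running such a workload programmatically, the Redshift Data API can submit SQL without managing drivers or connections; the cluster, database, user, and query below are hypothetical:

```python
# Hedged sketch: running an analytics query against Redshift through the
# Redshift Data API with boto3. All identifiers and SQL are hypothetical.
import boto3

client = boto3.client("redshift-data")
response = client.execute_statement(
    ClusterIdentifier="my-redshift-cluster",  # hypothetical cluster
    Database="analytics",
    DbUser="analyst",
    Sql="SELECT region, SUM(amount) AS revenue FROM sales GROUP BY region;",
)
print(response["Id"])  # statement ID to poll with describe_statement
```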
“FinOps is part of the equation, but from a CIO perspective, you need a top-down view that starts with the strategy before you talk about the components of it,” McMasters says. What’s the business case for use of the technology, and the strategy for a two- to three-year period, and where do we need to be two to three years from now?
Putting your data to work with generative AI – Innovation Talk Thursday, November 30 | 12:30 – 1:30 PM PST | The Venetian. Join Mai-Lan Tomsen Bukovec, Vice President, Technology at AWS to learn how you can turn your data lake into a business advantage with generative AI. Reserve your seat now!
Previously, there were three types of data structures in telco, including entity data sets (i.e., marketing data lakes). The result has been an extraordinary volume of data redundancy across the business, leading to a disaggregated data strategy, unknown compliance exposures, and inconsistencies in data-based processes.
Effective planning, thorough risk assessment, and a well-designed migration strategy are crucial to mitigating these challenges and implementing a successful transition to the new data warehouse environment on Amazon Redshift. Organic strategy – a lift-and-shift of the existing data schema using migration tools.
Building data lakes from the continuously changing transactional data of databases and keeping them up to date is a complex task that can be an operational challenge. You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes.
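As a hedged sketch of that pattern using the delta-spark API, a CDC feed can be merged into a Delta table to apply inserts, updates, and deletes; the paths, join key, and op column below are hypothetical, not taken from the post:

```python
# Hedged sketch: applying CDC changes to a Delta table with delta-spark.
# Paths, the join key, and the "op" change-type column are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cdc-merge").getOrCreate()

changes_df = spark.read.json("s3://my-bucket/cdc/latest/")  # hypothetical CDC feed

target = DeltaTable.forPath(spark, "s3://my-bucket/delta/customers/")
(
    target.alias("t")
    .merge(changes_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedDelete(condition="s.op = 'D'")  # apply deletes
    .whenMatchedUpdateAll()                     # apply updates
    .whenNotMatchedInsertAll()                  # apply inserts
    .execute()
)
```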
The data lakehouse is gaining in popularity because it enables a single platform for all your enterprise data with the flexibility to run any analytic and machine learning (ML) use case. Cloud data lakehouses provide significant scaling, agility, and cost advantages compared to cloud data lakes and cloud data warehouses.