A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure it, and run different types of analytics for better business insights.
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
The workflow consists of the following initial steps: OpenSearch Service is hosted in the primary Region, and all active traffic is routed to the OpenSearch Service domain in the primary Region. In the snapshot request sketched below, the repository name is os-snapshot-repo and the snapshot name is 2023-11-18.
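To make that concrete, here is a minimal sketch of what the snapshot request could look like against an Amazon OpenSearch Service domain. Only the repository name (os-snapshot-repo) and snapshot name (2023-11-18) come from the excerpt above; the domain endpoint, Region, and credential wiring are placeholder assumptions.

# Sketch: take a manual snapshot via the OpenSearch snapshot API.
# The endpoint and auth setup below are assumptions, not from the source.
import boto3
import requests
from requests_aws4auth import AWS4Auth

host = "https://search-my-domain.us-east-1.es.amazonaws.com"  # hypothetical domain endpoint
region = "us-east-1"

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   region, "es", session_token=credentials.token)

# PUT _snapshot/<repository>/<snapshot> starts the snapshot
response = requests.put(f"{host}/_snapshot/os-snapshot-repo/2023-11-18", auth=awsauth)
print(response.status_code, response.text)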
Save the date: AWS re:Invent 2023 is happening from November 27 to December 1 in Las Vegas, and you cannot miss it. Reserve your seat now! In today’s data-driven landscape, the quality of data is the foundation on which the success of organizations and their innovations stands. Your questions are welcome and encouraged.
At the same time, they need to optimize operational costs to unlock the value of this data for timely insights, and to do so with consistent performance. With this massive data growth, data proliferation across your data stores, data warehouses, and data lakes can become equally challenging.
Data also needs to be sorted, annotated, and labelled to meet the requirements of generative AI. No wonder CIO’s 2023 AI Priorities study found that data integration was the number one concern for IT leaders around generative AI integration, ahead of security, privacy, and the user experience.
Tens of thousands of customers use Amazon Redshift to gain business insights from their data. With Amazon Redshift, you can use standard SQL to query data across your data warehouse, operational data stores, and data lake. After you install the data extraction agent, register it in AWS SCT.
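As a rough illustration of that "standard SQL across warehouse and data lake" point, the sketch below submits one query through the Redshift Data API that joins a local warehouse table with a table in an external (data lake) schema. The workgroup, database, schema, and table names are all hypothetical.

# Sketch: one standard SQL statement spanning warehouse and data lake tables,
# submitted through the Redshift Data API. All names below are hypothetical.
import boto3

client = boto3.client("redshift-data")

sql = """
SELECT o.order_id, o.amount, c.segment
FROM sales.orders AS o                 -- local warehouse table
JOIN spectrum_lake.customers AS c      -- external schema over the data lake
  ON o.customer_id = c.customer_id
LIMIT 10;
"""

run = client.execute_statement(
    WorkgroupName="my-serverless-workgroup",  # hypothetical Redshift Serverless workgroup
    Database="dev",
    Sql=sql,
)
print("Statement ID:", run["Id"])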
Typically, you have multiple accounts to manage and run resources for your data pipeline. Looking at the Skewness Job per Job visualization, there was a spike on November 1, 2023. Visualizing and analyzing AWS Glue job observability metrics, let’s use the dashboard to make AWS Glue usage more performant and drill down into the details.
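If you want to pull that skewness series yourself rather than read it off the dashboard, a rough sketch follows. It assumes the metric is published to the Glue CloudWatch namespace under a name like glue.driver.skewness.job; the job name is hypothetical, and the dimensions are discovered rather than hardcoded because they vary by setup.

# Sketch: read a Glue job observability metric (job-level skewness) from CloudWatch.
# The metric name is an assumption; verify against what your account actually emits.
import boto3
from datetime import datetime

cw = boto3.client("cloudwatch")

found = cw.list_metrics(
    Namespace="Glue",
    MetricName="glue.driver.skewness.job",                     # assumed metric name
    Dimensions=[{"Name": "JobName", "Value": "my-glue-job"}],  # hypothetical job
)

for metric in found["Metrics"]:
    stats = cw.get_metric_statistics(
        Namespace=metric["Namespace"],
        MetricName=metric["MetricName"],
        Dimensions=metric["Dimensions"],
        StartTime=datetime(2023, 11, 1),
        EndTime=datetime(2023, 11, 2),
        Period=3600,
        Statistics=["Maximum"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Maximum"])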
On May 3, 2023, Cloudera kicked off a contest called “Best in Flow” for NiFi developers to compete to build the best data pipelines. RK built some simple flows to pull streaming data into Google Cloud Storage and Snowflake. This blog post congratulates our winner and reviews the top submissions.
Set up EMR Studio: In this step, we demonstrate the actions the data lake administrator needs to take to set up EMR Studio with trusted identity propagation enabled and with IAM Identity Center integration. On the Lake Formation console, choose Data lake permissions under Permissions in the navigation pane.
Microsoft itself claims that half of Fortune 500 companies use its Copilot tools and that the number of daily users doubled in Q4 2023, though without saying how widely they’re deployed in those organizations. Organizations with experience building enterprise data lakes that connect to many different data sources have AI advantages.
“Always the gatekeepers of much of the data necessary for ESG reporting, CIOs are finding that companies are even more dependent on them,” says Nancy Mentesana, ESG executive director at Labrador US, a global communications firm focused on corporate disclosure documents.
The DataRobot AI Platform integrates seamlessly with Azure cloud services, including Azure Machine Learning, Azure Data Lake Storage Gen2 (ADLS), Azure Synapse Analytics, and Azure SQL Database. Models trained in DataRobot can also be deployed to Azure Machine Learning, allowing users to host models more easily and securely.
Watsonx.data is built on three core integrated components: multiple query engines, a catalog that keeps track of metadata, and storage and relational data sources that the query engines access directly.
The use of separate data warehouses and lakes has created data silos, leading to problems such as lack of interoperability, duplicate governance efforts, complex architectures, and slower time to value. You can use Amazon SageMaker Lakehouse to achieve unified access to data in both data warehouses and data lakes.
But Barnett, who started work on a strategy in 2023, wanted to continue using Baptist Memorial’s on-premises data center for financial, security, and continuity reasons, so he and his team explored options that allowed for keeping that data center as part of the mix. This is a new way to interact with the web and search.
Next up: AI and data lake decisions. To that end, UAB’s next step is to tackle big decisions around expanding its AI and data analytics platforms, says Carver, who is not handling the long-term planning alone. UAB is a big Microsoft customer but also has master service agreements with Amazon and Google, Carver says.