This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue. To start the job, choose Run. The excerpt's Spark configuration snippet trails off mid-line; a reconstruction follows below.
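The truncated fragment (format(dbname)).config("spark.sql.catalog.glue_catalog.catalog-impl", ... appears to come from a SparkSession builder that registers an Iceberg catalog backed by the AWS Glue Data Catalog. A minimal sketch of what that setup typically looks like, assuming a catalog named glue_catalog; the database name and S3 warehouse path are placeholders, not values from the source:

```python
# Hypothetical reconstruction of the Glue job's Spark session setup.
# `dbname` and the warehouse bucket are illustrative placeholders.
from pyspark.sql import SparkSession

dbname = "sqlserver_src"  # hypothetical source database name

spark = (
    SparkSession.builder.appName("iceberg-load-{}".format(dbname))
    # Register an Iceberg catalog called `glue_catalog`
    .config("spark.sql.catalog.glue_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    # Back the catalog with the AWS Glue Data Catalog
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    # S3 location where Iceberg table data and metadata are written
    .config("spark.sql.catalog.glue_catalog.warehouse",
            "s3://example-bucket/iceberg-warehouse/")
    .config("spark.sql.catalog.glue_catalog.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    # Enable Iceberg SQL extensions (MERGE INTO, UPDATE, DELETE)
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)
```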
Verify that all table metadata is stored in the AWS Glue Data Catalog. Consume the data with Athena or Trino on Amazon EMR for business analysis. Update and delete source records in Amazon RDS for MySQL and validate that the changes are reflected in the data lake tables. The Flink Table API/SQL can also integrate with the AWS Glue Data Catalog.
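As a sketch of the consumption step, the query below runs against a table registered in the Glue Data Catalog through Athena with boto3; the database, table, and results bucket are assumptions for illustration:

```python
# Hedged example: run an ad hoc Athena query against a data lake table
# registered in the Glue Data Catalog. All names are placeholders.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT id, status FROM customers LIMIT 10",
    QueryExecutionContext={"Database": "iceberg_db"},  # hypothetical database
    ResultConfiguration={
        "OutputLocation": "s3://example-bucket/athena-results/"  # hypothetical
    },
)
print("Query started:", response["QueryExecutionId"])
```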
The workflow consists of the following initial steps: OpenSearch Service is hosted in the primary Region, and all active traffic is routed to the OpenSearch Service domain in the primary Region. Sesha Sanjana Mylavarapu is an Associate Data Lake Consultant at AWS Professional Services.
Of course, cost is a big consideration, says Orlandini, as well as deciding where to host the data and having it available in a fiscally responsible way. An organization might also question whether the data should be maintained on premises due to security concerns in the public cloud. “They have data swamps,” he says.
With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog. It enables you to visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. To store the source database credentials in AWS Secrets Manager, choose Store a new secret.
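The “Store a new secret” step refers to the Secrets Manager console; the same credentials can be stored programmatically, as in this sketch, where the secret name and values are placeholders:

```python
# Hedged example: store source database credentials in Secrets Manager,
# the programmatic equivalent of the console's "Store a new secret" step.
import json
import boto3

secrets = boto3.client("secretsmanager", region_name="us-east-1")

secrets.create_secret(
    Name="glue/sqlserver-source",  # hypothetical secret name
    SecretString=json.dumps({
        "username": "etl_user",  # placeholder credentials
        "password": "change-me",
    }),
)
```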
HBL started their data journey in 2019, when a data lake initiative was launched to consolidate complex data sources and enable the bank to use a single version of truth for decision making. Smooth, hassle-free deployment in just six weeks. Prior to the upgrade, HBL’s 27-node cluster ran on CDH 6.1.
To bring their customers the best deals and user experience, smava follows modern data architecture principles, with a data lake as a scalable, durable data store and purpose-built data stores for analytical processing and data consumption.
Finally, make sure you understand your data, because no machine learning solution will work for you if you aren’t working with the right data. Data lakes have a new consumer in AI. Many of our service-based offerings include hosting and executing our customers’ omnichannel platforms.
Furthermore, TDC Digital had not used any cloud storage solution and experienced latency and downtime while hosting the application in its data center. TDC Digital is excited about its plans to host its IT infrastructure in IBM data centers, offering better scalability, performance and security.
The challenge is to do it right, and a crucial way to achieve it is with decisions based on data and analysis that drive measurable business results. This was the key learning from the Sisense event heralding the launch of Periscope Data in Tel Aviv, Israel — the beating heart of the startup nation. What VCs want from startups.
About the Authors: Raj Patel is an AWS Lead Consultant for Data Analytics solutions based out of India. His background is in data warehouse and data lake architecture, development, and administration. He has been in the data and analytics field for over 14 years.
Set up a custom domain with Amazon Redshift in the primary Region. In the hosted zone that Route 53 created when you registered the domain, create records to tell Route 53 how to route traffic to the Redshift endpoint by completing the following steps: On the Route 53 console, choose Hosted zones in the navigation pane.
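The same record can be created with the Route 53 API instead of the console. A hedged sketch using boto3, where the hosted zone ID, custom domain, and Redshift endpoint are all placeholders:

```python
# Hedged example: create a CNAME record that points a custom domain at a
# Redshift endpoint. Zone ID, domain, and endpoint are placeholders.
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",  # hypothetical hosted zone
    ChangeBatch={
        "Comment": "Route custom domain to the Redshift endpoint",
        "Changes": [{
            "Action": "CREATE",
            "ResourceRecordSet": {
                "Name": "analytics.example.com",
                "Type": "CNAME",
                "TTL": 300,
                "ResourceRecords": [{
                    "Value": "my-cluster.abc123xy.us-east-1.redshift.amazonaws.com"
                }],
            },
        }],
    },
)
```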
Unlocking the Value of Enterprise AI with Data Engineering Capabilities. In this episode of the AI to Impact Podcast, host Pavan Kumar speaks to Prinkan Pal about the evolution of data engineering and ML operations from a closed team into a tech consulting unit. I’m your host, Pawan Kumar.
We can determine that the following are needed: an open-data-format ingestion architecture that processes the source dataset and refines the data in the S3 data lake. This requires a dedicated team of 3–7 members building a serverless data lake for all data sources. Vijay Bagur is a Sr.
Start where your data is. Using your own enterprise data is the major differentiator from open-access gen AI chat tools, so it makes sense to start with the provider already hosting your enterprise data. A data leakage plan helps here too.
2007: Amazon launches SimpleDB, a non-relational (NoSQL) database that allows businesses to cheaply process vast amounts of data with minimal effort. The platform is built on S3 and EC2 using a hosted Hadoop framework, an efficient big data management and storage solution that AWS quickly took advantage of. To be continued.
In this example, the analytics tool accesses the data lake on Amazon Simple Storage Service (Amazon S3) through Athena queries. As the data mesh pattern expands across domains covering more downstream services, we need a mechanism to keep IdPs and IAM role trusts continuously updated.
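One way such an update could look, sketched with boto3; the role name, account ID, and SAML provider are illustrative assumptions, not the article's actual mechanism:

```python
# Hedged example: refresh an IAM role's trust policy so a federated IdP
# can keep assuming it. Role name, account ID, and provider are placeholders.
import json
import boto3

iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {
            "Federated": "arn:aws:iam::123456789012:saml-provider/example-idp"
        },
        "Action": "sts:AssumeRoleWithSAML",
        "Condition": {
            "StringEquals": {"SAML:aud": "https://signin.aws.amazon.com/saml"}
        },
    }],
}

iam.update_assume_role_policy(
    RoleName="analytics-domain-access",  # hypothetical role
    PolicyDocument=json.dumps(trust_policy),
)
```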
Insert your specific host domain name where the Keycloak application resides in the following URL: [link]/realms/aws-realm/protocol/saml/descriptor. Vamsi Bhadriraju is a Data Architect at AWS. He works closely with enterprise customers to build data lakes and analytical applications on the AWS Cloud.
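As a quick check that the realm is reachable, the descriptor can be fetched over HTTP; the host below is a placeholder standing in for the elided [link], and only the realm path comes from the source:

```python
# Hedged example: fetch the Keycloak SAML metadata descriptor.
# The host name is a placeholder; the realm path is from the source URL.
import requests

host = "https://keycloak.example.com"  # hypothetical Keycloak host

resp = requests.get(f"{host}/realms/aws-realm/protocol/saml/descriptor")
resp.raise_for_status()
print(resp.text[:200])  # beginning of the SAML metadata XML
```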
It is also hard to know whether one can trust the data within a spreadsheet. And they rarely, if ever, host the most current data available. Sathish Raju, cofounder and CTO of Kloudio and senior director of engineering at Alation: This presents challenges for both business users and data teams.
On January 4th I had the pleasure of hosting a webinar titled The Gartner 2021 Leadership Vision for Data & Analytics Leaders, aimed at the Chief Data Officer or head of data and analytics. As with any good consulting response: “it depends.” Data lakes don’t offer this, nor should they.
This past week, I had the pleasure of hosting Data Governance for Dummies author Jonathan Reichental for a fireside chat, along with Denise Swanson, Data Governance lead at Alation. Can you differentiate between governance of raw data and enhanced data (information)? Where do you govern?
But Barnett, who started work on a strategy in 2023, wanted to continue using Baptist Memorial’s on-premises data center for financial, security, and continuity reasons, so he and his team explored options that allowed for keeping that data center as part of the mix.
“By investing in the development of our full-time equivalents [FTEs] and equipping our technologists with the requisite expertise, we aim to minimize reliance on external consultants and maximize our ability to drive innovation from within,” says Nafde. AI tools rely on the data in use in these solutions.
The sheer variety and volume of data used for precision therapeutics requires Athos to build its own AI algorithms and AI models, which it may commercialize to other biotechnology and pharmaceutical companies when fully baked. These providers thrive in areas where specialization, flexibility, and cost efficiency matter most.
This is the final part of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to process data with Amazon Redshift Spectrum and create the gold (consumption) layer. The following diagram illustrates the different layers of the data lake.
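As a sketch of how the gold layer can be exposed to Redshift Spectrum, the snippet below maps a Glue database into Redshift as an external schema via the Redshift Data API; the workgroup, database names, and IAM role are placeholders:

```python
# Hedged example: register the gold-layer Glue database as a Redshift
# Spectrum external schema. Workgroup, names, and role are placeholders.
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

rsd.execute_statement(
    WorkgroupName="analytics-wg",  # hypothetical Redshift Serverless workgroup
    Database="dev",
    Sql=(
        "CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_gold "
        "FROM DATA CATALOG DATABASE 'gold_db' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/SpectrumRole';"
    ),
)
```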