This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue.
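The post's pipeline is Glue-specific, but the core pattern can be sketched in plain PySpark: read the legacy table over JDBC, then write it to an Iceberg table registered in the AWS Glue Data Catalog. Everything below (hosts, database and table names, bucket paths) is an illustrative placeholder, not the article's actual code:

```python
# Minimal sketch: SQL Server -> Apache Iceberg with Spark on AWS Glue.
# The SQL Server JDBC driver jar must be available to the job.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-datalake-bucket/warehouse/")
    .getOrCreate()
)

# Read the legacy table over JDBC (all connection values are placeholders).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://legacy-host:1433;databaseName=sales")
    .option("dbtable", "dbo.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Create or replace the Iceberg table in the Glue Data Catalog.
orders.writeTo("glue_catalog.sales.orders").using("iceberg").createOrReplace()
```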
At AWS re:Invent 2024, we announced the next generation of Amazon SageMaker, the center for all your data, analytics, and AI. It enables teams to securely find, prepare, and collaborate on data assets and build analytics and AI applications through a single experience, accelerating the path from data to value.
This week on the keynote stages at AWS re:Invent 2024, you heard Matt Garman, CEO, AWS, and Swami Sivasubramanian, VP of AI and Data, AWS, speak about the next generation of Amazon SageMaker, the center for all of your data, analytics, and AI. The relationship between analytics and AI is rapidly evolving.
With this integration, you can now seamlessly query your governed data lake assets in Amazon DataZone using popular business intelligence (BI) and analytics tools, including partner solutions like Tableau. Choose Connect with JDBC. In the JDBC parameters dialog box, select Using IDC auth and copy the JDBC URL.
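The copied URL is driver configuration rather than code, but its general shape looks something like the sketch below. Every value here is an illustrative placeholder: the DataZone console generates the real string, and the exact property names depend on the Athena JDBC driver version in use.

```python
# Illustrative only: the Amazon DataZone console produces the actual URL
# when you choose "Connect with JDBC" with "Using IDC auth" selected.
jdbc_url = (
    "jdbc:awsathena://AwsRegion=us-east-1;"   # placeholder Region
    "Workgroup=my-datazone-workgroup;"        # placeholder workgroup
    "Catalog=AwsDataCatalog;"
    "CredentialsProvider=...;"                # IAM Identity Center auth; left elided
)
```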
Speaker: Anthony Roach, Director of Product Management at Tableau Software, and Jeremiah Morrow, Partner Solution Marketing Director at Dremio
Tableau works with Strategic Partners like Dremio to build data integrations that bring the two technologies together, creating a seamless and efficient customer experience. As a result, these two solutions come together to deliver: Lightning-fast BI and interactive analytics directly on data wherever it is stored.
Our research shows that external data sources are also a routine part of data preparation processes, with 80% of organizations incorporating one or more external data sources. And a similar proportion of participants in our research (84%) include external data in their data lakes.
Amazon Redshift , launched in 2013, has undergone significant evolution since its inception, allowing customers to expand the horizons of data warehousing and SQL analytics. This allowed customers to scale read analytics workloads and offered isolation to help maintain SLAs for business-critical applications.
This authority extends across realms such as business intelligence, data engineering, and machine learning, thus limiting the tools and capabilities that can be used. The landscape of data technology is swiftly advancing, driven frequently by projects led by the open source community in general and the Apache Foundation specifically.
Trying to get ahead of the backlog with incremental improvements to existing approaches and traditional technologies alone can be frustrating. This can potentially slow down the organizational functions that depend on the analysis results.
Amazon DataZone now supports authentication through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.
For example, I have previously written about the valuable connection between the use of data catalogs and satisfaction with an organization’s data lake. Our most recent Analytics and Data Benchmark Research demonstrates some of the beneficial links between data governance and analytics.
Initially, data warehouses were the go-to solution for structured data and analytical workloads but were limited by proprietary storage formats and their inability to handle unstructured data. In practice, OTFs are used in a broad range of analytical workloads, from business intelligence to machine learning.
The CDH is used to create, discover, and consume data products through a central metadata catalog, while enforcing permission policies and tightly integrating data engineering, analytics, and machine learning services to streamline the user journey from data to insight.
Migrating to the cloud does not solve the problems associated with performing analytics and business intelligence on data stored in disparate systems. Also, the computing power needed to process large volumes of data consists of clusters of servers with hundreds or thousands of nodes that can be difficult to administer.
When encouraging these BI best practices, what we are really doing is advocating for agile business intelligence and analytics. In our opinion, the terms agile BI and agile analytics are interchangeable and mean the same thing. What Is Agile Analytics and BI? Agile Business Intelligence & Analytics Methodology.
Tens of thousands of customers today use Amazon Redshift to analyze exabytes of data and run analytical queries, making it the most widely used cloud data warehouse. With Amazon Redshift, you can query the data in your S3 data lake using a central AWS Glue metastore from your Redshift data warehouse.
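As a rough sketch of what that looks like in practice, you can register a Glue database as an external schema (Redshift Spectrum) and then query it alongside local tables, here via the Redshift Data API. The workgroup, database, role ARN, and table names below are all placeholders:

```python
import boto3

redshift_data = boto3.client("redshift-data")

# One-time setup: expose a Glue Data Catalog database as an external schema.
redshift_data.execute_statement(
    WorkgroupName="my-workgroup",
    Database="dev",
    Sql="""
        CREATE EXTERNAL SCHEMA IF NOT EXISTS lake
        FROM DATA CATALOG DATABASE 'sales'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole';
    """,
)

# Data lake tables can then be queried (and joined) like any other schema.
redshift_data.execute_statement(
    WorkgroupName="my-workgroup",
    Database="dev",
    Sql="SELECT COUNT(*) FROM lake.orders WHERE order_date >= '2024-01-01';",
)
```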
A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.
We often see requests from customers who have started their data journey by building data lakes on Microsoft Azure and want to extend access to the data to AWS services. In such scenarios, data engineers face challenges in connecting to and extracting data from storage containers on Microsoft Azure.
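One common way to bridge that gap (a sketch of one option, not necessarily the post's exact method) is to give a Spark-based Glue job the hadoop-azure connector jars and read ADLS Gen2 directly. Account, container, and bucket names are placeholders, and the key should really come from a secrets manager rather than plain text:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Storage-account key auth via the hadoop-azure (ABFS) connector.
    .config(
        "spark.hadoop.fs.azure.account.key.mystorageacct.dfs.core.windows.net",
        "<storage-account-key>",
    )
    .getOrCreate()
)

# Read from Azure (ABFS) and land the data in the AWS-side data lake on S3.
events = spark.read.parquet(
    "abfss://landing@mystorageacct.dfs.core.windows.net/events/2024/"
)
events.write.mode("append").parquet("s3://my-aws-datalake/events/2024/")
```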
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
Over the years, organizations have invested in creating purpose-built, cloud-based data lakes that are siloed from one another. A major challenge is enabling cross-organization discovery and access to data across these multiple data lakes, each built on different technology stacks.
Data lakes have been gaining popularity for storing vast amounts of data from diverse sources in a scalable and cost-effective way. As the number of data consumers grows, data lake administrators often need to implement fine-grained access controls for different user profiles.
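For example, here is a minimal sketch of one such control with AWS Lake Formation: granting an analyst role SELECT on just two columns of a catalog table. The account ID, role, database, table, and column names are all hypothetical:

```python
import boto3

lf = boto3.client("lakeformation")

# Column-level grant: the analyst role can SELECT only customer_id and region.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/analyst"},
    Resource={
        "TableWithColumns": {
            "DatabaseName": "sales",
            "Name": "customers",
            "ColumnNames": ["customer_id", "region"],
        }
    },
    Permissions=["SELECT"],
)
```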
For many organizations, this centralized data store follows a data lake architecture. Although data lakes provide a centralized repository, making sense of this data and extracting valuable insights can be challenging.
Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.
With this new functionality, customers can create up-to-date replicas of their data from applications such as Salesforce, ServiceNow, and Zendesk in an Amazon SageMaker Lakehouse and Amazon Redshift. SageMaker Lakehouse gives you the flexibility to access and query your data in-place with all Apache Iceberg compatible tools and engines.
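As a small illustration of that engine flexibility (a sketch, with catalog, database, and table names invented for the example), the same tables can be read in place with the open source PyIceberg client rather than a warehouse engine:

```python
from pyiceberg.catalog import load_catalog

# Load a Glue-backed Iceberg catalog; "lakehouse" is just a local alias.
catalog = load_catalog("lakehouse", type="glue")
table = catalog.load_table("crm.salesforce_accounts")

# Scan straight from S3 into Arrow, no copy into a warehouse required.
accounts = table.scan(row_filter="region = 'EMEA'").to_arrow()
print(accounts.num_rows)
```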
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. With the addition of these technologies alongside existing systems like terminal operating systems (TOS) and SAP, the number of data producers has grown substantially.
The requirement to integrate enormous quantities and varieties of data, coupled with extreme pressure on analytics cycle time, has driven the pharmaceutical industry to lead in DataOps adoption. The bottom line: how do you attain analytic agility? It often takes months to progress from a data lake to the final delivery of insights.
Hive metastore federation for Amazon EMR is applicable to the following use cases: Governance of Amazon EMR-based data lakes – Producers generate data within their AWS accounts using an Amazon EMR-based data lake supported by EMRFS on Amazon Simple Storage Service (Amazon S3) and HBase.
Beyond breaking down silos, modern data architectures need to provide interfaces that make it easy for users to consume data using tools fit for their jobs. Data must be able to freely move to and from data warehouses, data lakes, and data marts, and interfaces must make it easy for users to consume that data.
DataOps adoption continues to expand as a perfect storm of social, economic, and technological factors drive enterprises to invest in process-driven innovation. As a result, enterprises will examine their end-to-end data operations and analytics creation workflows. Data Gets Meshier. Rise of the DataOps Engineer.
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. This zero-ETL integration reduces the complexity and operational burden of data replication to let you focus on deriving insights from your data.
Data lakes are a popular choice for today’s organizations to store their data around their business activities. As a best practice of data lake design, data should be immutable once stored. A data lake built on AWS uses Amazon Simple Storage Service (Amazon S3) as its primary storage environment.
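One way to back that immutability practice on the storage side (a minimal sketch; the bucket name is a placeholder) is to enable S3 versioning, so objects are never silently overwritten in place:

```python
import boto3

s3 = boto3.client("s3")

# With versioning on, every overwrite creates a new object version instead
# of destroying the old one. For stricter WORM-style retention, S3 Object
# Lock can be enabled as well (it must be set when the bucket is created).
s3.put_bucket_versioning(
    Bucket="my-datalake-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)
```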
As organizations across the globe are modernizing their data platforms with data lakes on Amazon Simple Storage Service (Amazon S3), handling slowly changing dimensions (SCDs) in data lakes can be challenging.
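A common way to handle a Type 2 SCD on an Iceberg table is a two-pass MERGE-then-INSERT in Spark SQL: first expire the current row for every changed key, then append the new row versions. The sketch below assumes a Spark session already configured for an Iceberg Glue catalog (as in the earlier sketch), and all table and column names are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes Iceberg catalog configured

# Pass 1: close out the current version of every row whose tracked attribute changed.
spark.sql("""
    MERGE INTO glue_catalog.dw.dim_customer AS t
    USING staged_customer_updates AS s
      ON t.customer_id = s.customer_id AND t.is_current = true
    WHEN MATCHED AND t.address <> s.address
      THEN UPDATE SET t.is_current = false, t.end_date = current_date()
""")

# Pass 2: insert new versions for changed keys (now expired) and brand-new keys.
spark.sql("""
    INSERT INTO glue_catalog.dw.dim_customer
    SELECT s.customer_id, s.address, current_date(), CAST(NULL AS DATE), true
    FROM staged_customer_updates s
    LEFT JOIN glue_catalog.dw.dim_customer t
      ON t.customer_id = s.customer_id AND t.is_current = true
    WHERE t.customer_id IS NULL
""")
```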
Many thousands of customers across various industries are using these services to transform, operationalize, and manage their data across data lakes and data warehouses. This includes the data integration capabilities mentioned above, with support for both structured and unstructured data.
Imperva harnesses data to improve their business outcomes. Events and many other security data types are stored in Imperva’s Threat Research Multi-Region data lake. As part of their solution, they are using Amazon QuickSight to unlock insights from their data.
Insights hidden in your data are essential for optimizing business operations, fine-tuning your customer experience, and developing new products — or new lines of business, like predictive maintenance. And as businesses contend with increasingly large amounts of data, the cloud is fast becoming the logical place where analytics work gets done.
Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts.
This cloud service was a significant leap from the traditional data warehousing solutions, which were expensive, not elastic, and required significant expertise to tune and operate. Amazon Redshift Serverless, generally available since 2021, allows you to run and scale analytics without having to provision and manage the data warehouse.
I was at the Gartner Data & Analytics conference in London a couple of weeks ago and I’d like to share some thoughts on what I think was interesting, and what I think I learned. First, data is by default, and by definition, a liability, because it costs money and has risks associated with it.
TIBCO is a large, independent cloud computing and data analytics software company that offers integration, analytics, business intelligence, and event-processing software. It enables organizations to analyze streaming data in real time and provides the capability to automate analytics processes.
Enter Microsoft Fabric, a cutting-edge solution designed to revolutionize how we interact with technology. This article will explore the key features and benefits, identify the ideal users for this solution, and guide you on when and how to […]
This innovation means that virtually any appropriately designed device can generate and transmit data about its operations, which can facilitate monitoring and a range of automatic functions.
The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios. He is devoted to designing and building end-to-end solutions to address customers’ data analytics and processing needs with cloud-based, data-intensive technologies.
Unstructured data has been a significant factor in data lakes and analytics for some time. Twelve years ago, nearly a third of enterprises were working with large amounts of unstructured data. As I’ve pointed out previously, unstructured data is really a misnomer.
Join the AWS Analytics team at AWS re:Invent this year, where new ideas and exciting innovations come together. For those in the data world, this post provides a curated guide for all analytics sessions that you can use to quickly schedule and build your itinerary. A shapeshifting guardian and protector of data like Data Lynx?