Initially, data warehouses were the go-to solution for structured data and analytical workloads but were limited by proprietary storage formats and their inability to handle unstructured data. Eventually, transactional data lakes emerged to bring the transactional consistency and performance of a data warehouse to the data lake.
Amazon Redshift is a fast, fully managed, petabyte-scale cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Amazon Redshift also supports querying nested data with complex data types such as struct, array, and map.
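Below is a minimal sketch, in Python with boto3, of how such a nested query might be submitted through the Redshift Data API; the workgroup, database, table, and column names are assumptions for illustration, not details from the excerpt.

    # Querying nested (SUPER) data in Amazon Redshift with dot notation via the Data API.
    import boto3

    client = boto3.client("redshift-data")

    sql = """
    SELECT o.order_id,
           o.customer.name AS customer_name,   -- struct field access (assumed schema)
           item.sku,
           item.qty
    FROM   orders o, o.items AS item           -- unnest the assumed 'items' array
    WHERE  o.customer.address.country = 'US';
    """

    response = client.execute_statement(
        WorkgroupName="my-serverless-workgroup",  # assumed Redshift Serverless workgroup
        Database="dev",
        Sql=sql,
    )
    print(response["Id"])  # poll describe_statement / get_statement_result for output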
A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.
One-time queries and complex queries are two common scenarios in enterprise data analytics. Complex queries refer to large-scale data processing and in-depth analysis based on petabyte-level data warehouses in massive data scenarios.
With this integration, you can now seamlessly query your governed data lake assets in Amazon DataZone using popular business intelligence (BI) and analytics tools, including partner solutions like Tableau. Refer to the detailed blog post on how you can use this to connect through various other tools.
Amazon Athena supports the MERGE command on Apache Iceberg tables, which allows you to perform inserts, updates, and deletes in your data lake at scale using familiar SQL statements that are compliant with ACID (Atomic, Consistent, Isolated, Durable).
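As a hedged illustration of what such a statement can look like when submitted programmatically, here is a short Python/boto3 sketch; the database, table, result bucket, and matching columns are assumed placeholders.

    # Running an ACID MERGE against an Apache Iceberg table in Amazon Athena.
    import boto3

    athena = boto3.client("athena")

    merge_sql = """
    MERGE INTO lakehouse.customers AS t
    USING lakehouse.customers_staging AS s
        ON t.customer_id = s.customer_id
    WHEN MATCHED AND s.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET email = s.email, updated_at = s.updated_at
    WHEN NOT MATCHED THEN INSERT (customer_id, email, updated_at)
        VALUES (s.customer_id, s.email, s.updated_at)
    """

    resp = athena.start_query_execution(
        QueryString=merge_sql,
        QueryExecutionContext={"Database": "lakehouse"},                    # assumed Glue database
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # assumed bucket
        WorkGroup="primary",
    )
    print(resp["QueryExecutionId"])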
Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications.
AWS Glue provides an extensible architecture that supports users with different data processing use cases. A common use case is building data lakes on Amazon Simple Storage Service (Amazon S3) using AWS Glue extract, transform, and load (ETL) jobs. [Table: Hudi, Delta Lake, and Iceberg versions supported by AWS Glue 3.0 and AWS Glue 4.0]
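The following is a minimal sketch of such a Glue ETL job (PySpark) that reads a cataloged table and writes partitioned Parquet into an S3 data lake; the database, table, bucket, and partition key are assumptions for illustration.

    # AWS Glue ETL job: Data Catalog table -> Parquet files in an S3 data lake.
    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the raw table registered in the Glue Data Catalog (assumed names)
    source = glue_context.create_dynamic_frame.from_catalog(
        database="raw_db", table_name="orders_raw"
    )

    # Write partitioned Parquet into the data lake bucket (assumed path)
    glue_context.write_dynamic_frame.from_options(
        frame=source,
        connection_type="s3",
        connection_options={"path": "s3://my-datalake/curated/orders/",
                            "partitionKeys": ["order_date"]},
        format="parquet",
    )

    job.commit()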
Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze your data using standard SQL and your existing business intelligence (BI) tools. The S3 object path can reference a set of folders that have the same key prefix.
Data analytics on operational data in near-real time is becoming a common need. Due to the exponential growth of data volume, it has become common practice to replace read replicas with data lakes to achieve better scalability and performance. Apache Hudi connector for AWS Glue: for this post, we use AWS Glue 4.0.
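As a rough sketch of what an upsert into a Hudi table on Amazon S3 can look like from a Spark session such as the one available in an AWS Glue 4.0 job; the input path, record key, precombine field, and table location are illustrative assumptions.

    # Upserting change data into an Apache Hudi table on Amazon S3 with Spark.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
             .getOrCreate())

    changes_df = spark.read.parquet("s3://my-datalake/raw/orders_cdc/")  # assumed CDC input

    hudi_options = {
        "hoodie.table.name": "orders",
        "hoodie.datasource.write.recordkey.field": "order_id",      # assumed primary key
        "hoodie.datasource.write.precombine.field": "updated_at",   # assumed ordering column
        "hoodie.datasource.write.partitionpath.field": "order_date",
        "hoodie.datasource.write.operation": "upsert",
    }

    (changes_df.write.format("hudi")
        .options(**hudi_options)
        .mode("append")
        .save("s3://my-datalake/hudi/orders/"))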
By collecting data from store sensors using AWS IoT Core, ingesting it into Amazon Aurora Serverless using AWS Lambda, and transforming it with AWS Glue from a database into an Amazon Simple Storage Service (Amazon S3) data lake, retailers can gain deep insights into their inventory and customer behavior.
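A minimal sketch of the Lambda ingestion step in that flow, using the RDS Data API against Aurora Serverless; the cluster and secret ARNs, database name, table schema, and event shape are all assumptions for illustration.

    # Lambda handler: write an IoT Core sensor reading into Aurora Serverless.
    import json
    import boto3

    rds_data = boto3.client("rds-data")

    CLUSTER_ARN = "arn:aws:rds:us-east-1:111122223333:cluster:store-sensors"      # assumed
    SECRET_ARN = "arn:aws:secretsmanager:us-east-1:111122223333:secret:db-creds"  # assumed

    def handler(event, context):
        # An IoT Core rule with a Lambda action passes the device payload as the event (assumed shape)
        reading = event if isinstance(event, dict) else json.loads(event)
        rds_data.execute_statement(
            resourceArn=CLUSTER_ARN,
            secretArn=SECRET_ARN,
            database="retail",
            sql="INSERT INTO sensor_readings (device_id, ts, temperature) VALUES (:d, :t, :v)",
            parameters=[
                {"name": "d", "value": {"stringValue": reading["device_id"]}},
                {"name": "t", "value": {"stringValue": reading["timestamp"]}},
                {"name": "v", "value": {"doubleValue": float(reading["temperature"])}},
            ],
        )
        return {"statusCode": 200}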
Refer to Configure SAML and SCIM with Okta and IAM Identity Center for instructions. You need to reference the bucket name and the certificate bundle.zip file in AWS CloudFormation. Refer to the following table for a list of important parameters. In this post, we use the us-east-1 Region. In this post, we grant access to Group1.
In today’s data-driven business environment, organizations face the challenge of efficiently preparing and transforming large amounts of data for analytics and data science purposes. Businesses need to build data warehouses and data lakes based on operational data.
This post is co-authored by Vijay Gopalakrishnan, Director of Product, Salesforce Data Cloud. In today’s data-driven business landscape, organizations collect a wealth of data across various touch points and unify it in a central data warehouse or a data lake to deliver business insights.
You can safely use an Apache Kafka cluster for seamless data movement from your on-premises hardware to the data lake using cloud services such as Amazon S3. It will enable you to quickly transform and load the data results into Amazon S3 data lakes or JDBC data stores.
Several large organizations have faltered at different stages of BI implementation, from poor data quality to the inability to scale due to larger volumes of data and extremely complex BI architecture. This is where business intelligence consulting comes into the picture. What is Business Intelligence?
The product data is stored on Amazon Aurora PostgreSQL-Compatible Edition. Their existing business intelligence (BI) tool runs queries on Athena. Furthermore, they have a data pipeline to perform extract, transform, and load (ETL) jobs when moving data from the Aurora PostgreSQL database cluster to other data stores.
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence (BI) tools. The Amazon Redshift service must be running in the same Region where the Salesforce Data Cloud is running.
If you’re used to using SQL Server Analysis Services for business intelligence, Analysis Services offers that enterprise-grade analytics engine as a cloud service that you can also connect to Power BI. Azure Data Lake Analytics. The reason Azure has so many analytics services is so you can build your entire stack there.
With this platform, Salesforce seeks to help organizations apply the cleverness of LLMs to the customer data they have squirreled away in Salesforce data lakes in the hopes of selling more. Einstein 1 Studio handles the piping so the data from your Einstein 1 platform instance will flow smoothly into the AI.
We have seen strong customer demand to expand its scope to cloud-based data lakes because data lakes are increasingly the enterprise solution for large-scale data initiatives due to their power and capabilities. The team uses dbt-glue to build a transformed gold model optimized for business intelligence (BI).
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning (ML), and application development. Apache Hudi supports ACID transactions and CRUD operations on a data lake. You don’t alter queries separately in the data lake.
In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.
Many of the existing visual business intelligence and dashboard tools also use SQL as a standard language. Democratizing data refers to a mechanism that provides a self-serve paradigm and culture for an ever-growing internal audience to get the data they need to add value to the business.
Flexible and easy to use – The solutions should provide less restrictive, easy-to-access, and ready-to-use data. A data hub contains data at multiple levels of granularity and is often not integrated. It differs from a data lake by offering data that is pre-validated and standardized, allowing for simpler consumption by users.
The term “business intelligence” (BI) has been in common use for several decades now, referring initially to the OLAP systems that drew largely upon pre-processed information stored in data warehouses. As the cost-benefit ratio of BI has become more and more attractive, the pace of global business has also accelerated.
For instance, a Data Cloud-triggered flow could update an account manager in Slack when shipments in an external data lake are marked as delayed. Sharing Customer 360 insights back without data replication. With zero-copy support, the insurance company wouldn’t have to load that weather data into their platform.
Today, tens of thousands of customers run business-critical workloads on Amazon Redshift to cost-effectively and quickly analyze their data using standard SQL and existing business intelligence (BI) tools. Change the settings for existing Data Catalog resources.
Which type(s) of storage consolidation you use depends on the data you generate and collect. One option is a data lake—on-premises or in the cloud—that stores unprocessed data in any type of format, structured or unstructured, and can be queried in aggregate. Set up unified data governance rules and processes.
Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. Additionally, data is extracted from vendor APIs, including data related to product, marketing, and customer experience.
Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you securely access your data in operational databases, data lakes, or third-party datasets with minimal movement or copying of data.
Data architect Armando Vázquez identifies eight common types of data architects: Enterprise data architect: These data architects oversee an organization’s overall data architecture, defining data architecture strategy and designing and implementing architectures.
It also makes it easier for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization to discover, use, and collaborate to derive data-driven insights. Note that a managed data asset is an asset for which Amazon DataZone can manage permissions.
Figure 2: Example data pipeline with DataOps automation. In this project, I automated data extraction from SFTP, public websites, and email attachments. The automated orchestration published the data to an Amazon S3 data lake. Historic Balance – compares current data to previous or expected values.
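A bare-bones sketch of the SFTP extraction step described above, using paramiko and boto3; the host, credentials, remote folder, and bucket are placeholder assumptions rather than the project's actual configuration.

    # Pull files from an SFTP server and publish them to the S3 data lake.
    import boto3
    import paramiko

    s3 = boto3.client("s3")

    transport = paramiko.Transport(("sftp.example.com", 22))      # assumed host
    transport.connect(username="ingest", password="change-me")    # assumed credentials
    sftp = paramiko.SFTPClient.from_transport(transport)

    try:
        for filename in sftp.listdir("/outbound"):                 # assumed remote folder
            local_path = f"/tmp/{filename}"
            sftp.get(f"/outbound/{filename}", local_path)
            s3.upload_file(local_path, "my-datalake", f"raw/sftp/{filename}")  # assumed bucket/prefix
    finally:
        sftp.close()
        transport.close()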
To visualize data stored in Confluent, you can use one of over 120 pre-built connectors, provided by Confluent, to write streaming data to a destination data store of your choice. Next, you connect your business intelligence (BI) tool to the data store to begin visualizing the data.
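As a hedged example of wiring up one of those pre-built connectors, the sketch below registers a Confluent S3 sink connector through the Kafka Connect REST API from Python; the Connect endpoint, topic, and bucket are assumptions.

    # Register an S3 sink connector so streaming data lands in a queryable store.
    import requests

    connector = {
        "name": "orders-s3-sink",
        "config": {
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            "topics": "orders",                      # assumed topic
            "s3.bucket.name": "my-datalake",         # assumed bucket
            "s3.region": "us-east-1",
            "storage.class": "io.confluent.connect.s3.storage.S3Storage",
            "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
            "flush.size": "1000",
            "tasks.max": "1",
        },
    }

    resp = requests.post("http://localhost:8083/connectors", json=connector)  # assumed Connect endpoint
    resp.raise_for_status()
    print(resp.json()["name"])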
By using AWS Glue to integrate data from Snowflake, Amazon S3, and SaaS applications, organizations can unlock new opportunities in generative artificial intelligence (AI), machine learning (ML), business intelligence (BI), and self-service analytics, or feed data to underlying applications.
To create your namespace and workgroup, refer to Creating a data warehouse with Amazon Redshift Serverless. Use Query Editor v2 to load customer data from Amazon S3: you can use Query Editor v2 to submit queries and load data to your data warehouse through a web interface.
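If you prefer to script the load rather than use the Query Editor v2 interface, a minimal sketch with the Redshift Data API might look like the following; the workgroup, table, S3 path, and IAM role are assumptions for illustration.

    # COPY customer data from Amazon S3 into Redshift Serverless via the Data API.
    import boto3

    client = boto3.client("redshift-data")

    copy_sql = """
    COPY customer
    FROM 's3://my-datalake/raw/customer/'
    IAM_ROLE 'arn:aws:iam::111122223333:role/redshift-copy-role'
    FORMAT AS CSV IGNOREHEADER 1;
    """

    client.execute_statement(
        WorkgroupName="my-serverless-workgroup",  # the workgroup created above (assumed name)
        Database="dev",
        Sql=copy_sql,
    )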
Application log challenges: Data management and compliance Application logs are an essential component of any application; they provide valuable information about the usage and performance of the system. Zoom and the AWS team (collectively referred to as “the team” going forward) identified two major workflows for data ingestion and deletion.
You can extend the solution in directions such as the business intelligence (BI) domain with customer 360 use cases, and the risk and compliance domain with transaction monitoring and fraud detection use cases. The application gets prompt templates from an S3 data lake and creates the engineered prompt.
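A small sketch of that prompt-assembly step, fetching a template from the S3 data lake and filling it with case-specific values; the bucket, key, and placeholder fields are assumptions, not the solution's actual artifacts.

    # Build an engineered prompt from a template stored in the S3 data lake.
    import boto3

    s3 = boto3.client("s3")

    obj = s3.get_object(Bucket="my-datalake", Key="prompts/transaction_monitoring.txt")
    template = obj["Body"].read().decode("utf-8")

    # Assumed placeholders, e.g. "Summarize activity for {customer_id} over the last {lookback_days} days"
    engineered_prompt = template.format(customer_id="C-1042", lookback_days=30)
    print(engineered_prompt)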
They can connect to multiple data streams and pull data directly into Amazon Redshift without staging it in Amazon Simple Storage Service (Amazon S3). After running analytics, the insights can be made available broadly across the organization with Amazon QuickSight, a cloud-native, serverless business intelligence service.
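For a rough idea of how that direct, no-staging ingestion can be set up, the sketch below creates a Kinesis-backed external schema and a streaming materialized view through the Redshift Data API; the IAM role ARN, stream name, and workgroup are assumptions.

    # Redshift streaming ingestion from Kinesis Data Streams (no S3 staging).
    import boto3

    client = boto3.client("redshift-data")

    statements = [
        # External schema mapped to the Kinesis source (assumed IAM role)
        """CREATE EXTERNAL SCHEMA kds
           FROM KINESIS
           IAM_ROLE 'arn:aws:iam::111122223333:role/redshift-streaming-role';""",
        # Materialized view that continuously materializes the stream (assumed stream name)
        """CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
           SELECT approximate_arrival_timestamp,
                  JSON_PARSE(kinesis_data) AS payload
           FROM kds."clickstream";""",
    ]

    client.batch_execute_statement(
        WorkgroupName="my-serverless-workgroup",
        Database="dev",
        Sqls=statements,
    )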
With Amazon EMR, you can take advantage of the power of these big data tools to process, analyze, and gain valuable business intelligence from vast amounts of data. Refer to How do I set up a NAT gateway for a private subnet in Amazon VPC? For more information, refer to Prerequisites.
If the level of information in the data is the same after anonymization, the data is still useful. But once personal or sensitive references are removed and the data is no longer effective, a problem arises. Synthetic data avoids these difficulties, but it isn’t exempt from the need for a trade-off.
It automatically provisions and intelligently scales data warehouse compute capacity to deliver fast performance, and you pay only for what you use. Just load your data and start querying right away in the Amazon Redshift Query Editor or in your favorite business intelligence (BI) tool.