Analytics, Data Processing and Data Transformation

SQL Streambuilder Data Transformations

Cloudera

FEBRUARY 21, 2023

SQL Stream Builder (SSB) is a versatile platform for data analytics using SQL. It is part of Cloudera Streaming Analytics and is built on top of Apache Flink. It enables users to easily write, run, and manage real-time continuous SQL queries on stream data, with a smooth user experience. What is a data transformation?

Data Transformation

Data Transformation Data Processing Data Collection Publishing

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

AWS Big Data

NOVEMBER 27, 2024

Zero-ETL integration with Amazon Redshift reduces the need for custom pipelines, preserves resources for your transactional systems, and gives you access to powerful analytics. The data in Amazon Redshift is transactionally consistent and updates are automatically and continuously propagated. Choose Create.

Data Warehouse

Data Warehouse Analytics Testing Modeling

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

AWS Big Data

DECEMBER 20, 2024

Your generated jobs can use a variety of data transformations, including filters, projections, unions, joins, and aggregations, giving you the flexibility to handle complex data processing requirements. In this post, we discuss how Amazon Q data integration transforms ETL workflow development.

Data Integration

Data Integration Visualization Data Processing Data Lake
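To make the transformation types named in the teaser above concrete, here is a minimal sketch in plain Python. It does not use Amazon Q or AWS Glue; the sample rows, the `transform` function, and the `min_amount` parameter are all hypothetical and exist only to illustrate a union, filter, projection, join, and aggregation in sequence.

```python
from collections import defaultdict

# Hypothetical sample data; none of these names come from the post.
orders_2023 = [
    {"order_id": 1, "customer_id": "c1", "amount": 120.0},
    {"order_id": 2, "customer_id": "c2", "amount": 35.5},
]
orders_2024 = [
    {"order_id": 3, "customer_id": "c1", "amount": 80.0},
    {"order_id": 4, "customer_id": "c3", "amount": 15.0},
]
customers = {"c1": "Alice", "c2": "Bob"}


def transform(orders_a, orders_b, customers, min_amount):
    # Union: concatenate the two inputs.
    combined = orders_a + orders_b
    # Filter: keep rows at or above the threshold.
    filtered = [r for r in combined if r["amount"] >= min_amount]
    # Projection: keep only the columns needed downstream.
    projected = [
        {"customer_id": r["customer_id"], "amount": r["amount"]}
        for r in filtered
    ]
    # Join (inner): attach the customer name, dropping rows with no match.
    joined = [
        {**r, "name": customers[r["customer_id"]]}
        for r in projected
        if r["customer_id"] in customers
    ]
    # Aggregation: total amount per customer name.
    totals = defaultdict(float)
    for r in joined:
        totals[r["name"]] += r["amount"]
    return dict(totals)


result = transform(orders_2023, orders_2024, customers, min_amount=30.0)
```

In a real job the same steps would typically be expressed as DataFrame operations; the list-of-dicts form is used here only to keep the sketch dependency-free.
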

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone has launched authentication support through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.

Visualization

Visualization Data Lake Testing Data Governance

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. Enhance agility by localizing changes within business domains and clear data contracts. Eliminate centralized bottlenecks and complex data pipelines.

IoT

IoT Machine Learning Metadata Data-driven

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

AWS Big Data

AUGUST 19, 2024

By using AWS Glue to integrate data from Snowflake, Amazon S3, and SaaS applications, organizations can unlock new opportunities in generative artificial intelligence (AI), machine learning (ML), business intelligence (BI), and self-service analytics or feed data to underlying applications.

Analytics

Analytics Data-driven Data Integration Data Lake

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of. On your project, in the navigation pane, choose Data. For Add data source, choose Add connection. Choose the plus sign.

Visualization

Visualization Data Processing Testing Publishing

10 Examples of How Big Data in Logistics Can Transform The Supply Chain

datapine

MAY 2, 2023

Big data is revolutionizing many fields of business, and logistics analytics is no exception. The complex and ever-evolving nature of logistics makes it an essential use case for big data applications.

Big Data

Big Data Internet of Things Cost-Benefit Optimization

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

OCTOBER 11, 2023

Organizations with legacy, on-premises, near-real-time analytics solutions typically rely on self-managed relational databases as their data store for analytics workloads. Near-real-time streaming analytics captures the value of operational data and metrics to provide new insights to create business opportunities.

Management

Management Metadata Analytics Dashboards

Gain insights from historical location data using Amazon Location Service and AWS analytics services

AWS Big Data

MARCH 13, 2024

Data analytics – Business analysts gather operational insights from multiple data sources, including the location data collected from the vehicles. You can also use the data transformation feature of Data Firehose to invoke a Lambda function to perform data transformation in batches.

Analytics

Analytics IoT Metadata Internet of Things
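The batch transformation mentioned in the teaser above can be sketched as a Lambda handler. This is a minimal illustration, not the post's code: it assumes the standard Data Firehose transformation contract (each incoming record carries a `recordId` and base64-encoded `data`, and each returned record carries `recordId`, `result`, and `data`), while the `lat` and `lon` fields and the rounding step are hypothetical.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform one Firehose batch; each record is handled independently."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical transformation: reduce coordinate precision.
            payload["lat"] = round(payload["lat"], 4)
            payload["lon"] = round(payload["lon"], 4)
            # Re-encode as newline-delimited JSON for downstream readers.
            encoded = base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": encoded}
            )
        except (ValueError, KeyError, TypeError):
            # Malformed input: hand the original data back marked as failed.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

Returning every `recordId` exactly once, with either `Ok` or `ProcessingFailed`, lets one bad record fail without discarding the rest of the batch.
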

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data. 10) Data Quality Solutions: Key Attributes. When teamed together with online BI tools, these rules can be key in predicting trends and reporting analytics.

Data Quality

Data Quality Metrics Data-driven Management

Stream data to Amazon S3 for real-time analytics using the Oracle GoldenGate S3 handler

AWS Big Data

AUGUST 8, 2024

Modern business applications rely on timely and accurate data with increasing demand for real-time analytics. There is a growing need for efficient and scalable data storage solutions. It captures and applies transactional changes in real time, minimizing latency and keeping target systems synchronized with source databases.

Analytics

Analytics Big Data Software Data Integration

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

These nodes can implement analytical platforms like data lake houses, data warehouses, or data marts, all united by producing data products. Domain-owned data assets – The domain-oriented data ownership approach distributes responsibility for data across the business units within the Institutional Division.

Metadata

Metadata Data Governance Data Quality Data-driven

Amazon Redshift data ingestion options

AWS Big Data

SEPTEMBER 5, 2024

The currently available choices include: The Amazon Redshift COPY command can load data from Amazon Simple Storage Service (Amazon S3), Amazon EMR, Amazon DynamoDB, or remote hosts over SSH. This native feature of Amazon Redshift uses massive parallel processing (MPP) to load objects directly from data sources into Redshift tables.

IoT

IoT Data Warehouse Cost-Benefit Reporting

Enhance your analytics embedding experience with the new Amazon QuickSight JavaScript SDK

AWS Big Data

MARCH 9, 2023

Raj focuses on helping customers develop sample dashboards, embed analytics and adopt BI design patterns and best practices. Rohit Pujari is the Head of Product for Embedded Analytics at QuickSight. He is passionate about shaping the future of infusing data-rich experiences into products and applications we use every day.

Slice and Dice

Slice and Dice Dashboards Analytics Interactive

Enable data analytics with Talend and Amazon Redshift Serverless

AWS Big Data

JULY 25, 2023

Today, in order to accelerate and scale data analytics, companies are looking for an approach to minimize infrastructure management and predict computing needs for different types of workloads, including spikes and ad hoc analytics. For Host, enter the Redshift Serverless endpoint’s host URL. This is optional.

Data Analytics

Data Analytics Analytics Data Warehouse Data Processing

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

In legacy analytical systems such as enterprise data warehouses, the scalability challenges of a system were primarily associated with computational scalability, i.e., the ability of a data platform to handle larger volumes of data in an agile and cost-efficient way.

Data Processing

Data Processing Data Warehouse Enterprise Visualization

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

One key component that plays a central role in modern data architectures is the data lake, which allows organizations to store and analyze large amounts of data in a cost-effective manner and run advanced analytics and machine learning (ML) at scale. To overcome these issues, Orca decided to build a data lake.

Data Lake

Data Lake Analytics Snapshot Data Quality

Use Snowflake with Amazon MWAA to orchestrate data pipelines

AWS Big Data

OCTOBER 31, 2023

The solution provides an end-to-end automated workflow that includes data ingestion, transformation, analytics, and consumption. The data used for transformation and analysis is based on the publicly available New York Citi Bike dataset. Choose Next. Leave all other values as default and choose Next.

Data Processing

Data Processing Management Publishing Visualization

Use AWS Glue to streamline SFTP data processing

AWS Big Data

AUGUST 13, 2024

In today’s data-driven world, seamless integration and transformation of data across diverse sources into actionable insights is paramount. With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog. Choose Store a new secret.

Data Processing

Data Processing Visualization Data Lake Data Processing

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Quality

The importance of data ingestion and integration for enterprise AI

IBM Big Data Hub

JANUARY 9, 2024

Data ingestion must be done properly from the start, as mishandling it can lead to a host of new issues. The groundwork of training data in an AI model is comparable to piloting an airplane. ELT tools such as IBM® DataStage® facilitate fast and secure transformations through parallel processing engines.

Enterprise

Enterprise Data Integration Data Quality Contextual Data

The disruptive potential of open data lakehouse architectures and IBM watsonx.data

IBM Big Data Hub

JUNE 15, 2023

The proliferation of data silos also inhibits the unification and enrichment of data which is essential to unlocking the new insights. Moreover, increased regulatory requirements make it harder for enterprises to democratize data access and scale the adoption of analytics and artificial intelligence (AI).

Data Warehouse

Data Warehouse Data Lake Optimization Data-driven

Run Apache Hive workloads using Spark SQL with Amazon EMR on EKS

AWS Big Data

OCTOBER 18, 2023

Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at a massive scale. Spark SQL is an Apache Spark module for structured data processing. host') export PASSWORD=$(aws secretsmanager get-secret-value --secret-id $secret_name --query SecretString --output text | jq -r '.password')

Big Data

Big Data Data Processing Interactive Testing

Data Integrity, the Basis for Reliable Insights

Sisense

AUGUST 28, 2020

Uncomfortable truth incoming: Most people in your organization don’t think about the quality of their data from intake to production of insights. However, as a data team member, you know how important data integrity (and a whole host of other aspects of data management) is.

Data Integration

Data Integration Testing Data Quality Data-driven

The 10 biggest issues IT faces today

CIO Business Intelligence

JUNE 13, 2022

According to Evanta’s 2022 CIO Leadership Perspectives study, CIOs’ second top priority within the IT function is around data and analytics, with CIOs seeing advancing organizational use of data as key to reaching enterprise objectives. Angel-Johnson shares that perspective.

IT

IT Digital Transformation Internet of Things Strategy

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To speed up self-service analytics and foster innovation based on data, a solution was needed to allow any team to create data products on their own in a decentralized manner. To create and manage the data products, smava uses Amazon Redshift, a cloud data warehouse.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Self-Service Data’s New Frontier: The Data Catalog

Alation

FEBRUARY 20, 2020

REFLECTIONS FROM THE GARTNER BI & ANALYTICS SUMMIT I hate to admit that the last time I attended the Gartner BI & Analytics Summit, Howard Dresner was still the host. For me personally, it was an amazing return this year to the now appropriately re-named, Gartner BI & Analytics Summit held in Grapevine, Texas.

Scorecard

Scorecard ROI Data-driven Visualization

Copy and mask PII between Amazon RDS databases using visual ETL jobs in AWS Glue Studio

AWS Big Data

AUGUST 26, 2024

The solution uses AWS Glue as an ETL engine to extract data from the source Amazon RDS database. Built-in data transformations then scrub columns containing PII using pre-defined masking functions. See JDBC connections for further details.

Visualization

Visualization Metadata Data Transformation Testing
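As a rough illustration of what scrubbing PII columns with masking functions can look like, the sketch below applies a per-column mask to each row. It is not AWS Glue's built-in transform: the functions `mask_hash`, `mask_partial`, and `scrub`, the `MASKS` mapping, and the column names are all hypothetical.

```python
import hashlib


def mask_hash(value, salt=""):
    """Replace a value with its SHA-256 hex digest (one-way, deterministic)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def mask_partial(value, visible=4, fill="*"):
    """Hide all but the last `visible` characters of a string."""
    if visible <= 0:
        return fill * len(value)
    return fill * max(len(value) - visible, 0) + value[-visible:]


# Hypothetical mapping of PII column name to masking function.
MASKS = {"email": mask_hash, "phone": mask_partial}


def scrub(row, masks=MASKS):
    """Return a copy of the row with configured columns masked."""
    return {
        key: masks[key](value) if key in masks and value is not None else value
        for key, value in row.items()
    }


masked = scrub({"id": 7, "email": "a@example.com", "phone": "5551234567"})
```

Hashing keeps masked values joinable across tables because equal inputs produce equal digests, while partial masking keeps a value recognizable to a human without exposing it in full.
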

How CFM built a well-governed and scalable data-engineering platform using Amazon EMR for financial features generation

AWS Big Data

SEPTEMBER 13, 2024

Although we explored the option of using AWS managed notebooks to streamline the provisioning process, we have decided to continue hosting these components on our on-premises infrastructure for the current timeline. At this stage, CFM data scientists can perform analytics and extract value from raw data.

Interactive

Interactive Strategy Cost-Benefit Data Governance

Best BI Tools For 2024 You Need to Know

FineReport

MARCH 31, 2024

Acting as a comprehensive solution, the best BI tools collect and analyze company data to generate easily interpretable graphs, reports, and charts, leveraging advanced data mining, analytics, and visualization techniques. Best BI Tools for Data Analysts 3.1

Dashboards

Dashboards Visualization Data mining Data-driven

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

As data volumes continue to grow exponentially, traditional data warehousing solutions may struggle to keep up with the increasing demands for scalability, performance, and advanced analytics. However, you might face significant challenges when planning for a large-scale data warehouse migration.

Data Warehouse

Data Warehouse KPI Optimization Cost-Benefit

Databricks’ Data+AI Summit 2022: A Show of Partner “Unity”

Alation

JULY 18, 2022

That was the message — delivered a little more elegantly than that — at Databricks’ Data+AI Summit 2022. As a proud Databricks partner, Alation joined the data and analytics community for three days of keynotes, leadership remarks, and breakout sessions during the last week of June in San Francisco. Destination Lakehouse.

ROI

ROI Metadata Data Lake Digital Transformation

Alation Steps Up APAC Presence Following Strong Growth

Alation

MAY 11, 2023

Simply put, enterprises are increasingly seeking ways to take better advantage of their data and analytics to make data-informed decisions, strengthen the customer experience, and capitalize on cost-saving opportunities.

B2B

B2B Digital Transformation Marketing Data Processing

End-to-end development lifecycle for data engineers to build a data integration pipeline using AWS Glue

AWS Big Data

JULY 26, 2023

Typically, you have multiple accounts to manage and provision resources for your data pipeline. Every time the business requirement changes (such as adding data sources or changing data transformation logic), you make changes on the AWS Glue app stack and re-provision the stack to reflect your changes.

Data Integration

Data Integration Snapshot Testing Visualization

Empowering data mesh: The tools to deliver BI excellence

erwin

APRIL 16, 2024

In this blog, we’ll delve into the critical role of governance and data modeling tools in supporting a seamless data mesh implementation and explore how erwin tools can be used in that role. erwin also provides data governance, metadata management and data lineage software called erwin Data Intelligence by Quest.

Metadata

Metadata Data Quality Data Governance Modeling

Cross-account integration between SaaS platforms using Amazon AppFlow

AWS Big Data

APRIL 25, 2023

On many occasions, they need to apply business logic to the data received from the source SaaS platform before pushing it to the target SaaS platform. AnyCompany’s marketing team hosted an event at the Anaheim Convention Center, CA. Debaprasun Chakraborty is an AWS Solutions Architect, specializing in the analytics domain.

Sales

Sales Visualization Software Metadata

The Modern Data Stack Explained: What The Future Holds

Alation

JANUARY 17, 2023

This shift addresses a growing demand for data access, which the modern data stack enables with cloud-based services and integration. There has also been a paradigm shift toward agile analytics and flexible options, where data assets can be moved around more quickly and easily, and not locked into a single vendor.

Data Warehouse

Data Warehouse Cost-Benefit Data Science Data Transformation

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

Visual modeling: Delivers easy-to-use workflows for data scientists to build data preparation and predictive machine learning pipelines that include text analytics, visualizations and a variety of modeling methods. Vitaly Tsivin, EVP Business Intelligence at AMC Networks.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes. Amazon EMR Serverless is a serverless option in Amazon EMR that makes it easy for data analysts and engineers to run open-source big data analytics frameworks without configuring, managing, and scaling clusters or servers.

Data Lake

Data Lake Dashboards Metrics Metadata

7 Things All Successful Data Product Managers Have In Common

Alation

FEBRUARY 2, 2023

This post will unpack the top 7 traits that successful data product managers have in common. Successful Data Product Managers Know Their Data and Analytics If a product data manager wants to excel in their field, they must analyze data and analytics effectively. Data can be complex and ever-changing.

Management

Management Data-driven Visualization Strategy

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

JANUARY 27, 2023

The AWS Glue Data Catalog provides a uniform repository where disparate systems can store and find metadata to keep track of data in data silos. Apache Flink is a widely used data processing engine for scalable streaming ETL, analytics, and event-driven applications. Transformed data can be stored in Amazon S3.

Data Lake

Data Lake Metadata Business Analysis Data-driven

Migrate your existing SQL-based ETL workload to an AWS serverless ETL infrastructure using AWS Glue

AWS Big Data

JULY 31, 2023

Customers often use many SQL scripts to select and transform the data in relational databases hosted either in an on-premises environment or on AWS and use custom workflows to manage their ETL. AWS Glue is a serverless data integration and ETL service with the ability to scale on demand. Deepti Venuturumilli is a Sr.

Sales

Sales Data Warehouse Visualization Testing
