Your generated jobs can use a variety of data transformations, including filters, projections, unions, joins, and aggregations, giving you the flexibility to handle complex data processing requirements. In this post, we discuss how Amazon Q data integration transforms ETL workflow development.
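To make those transformation types concrete, here is a minimal PySpark sketch (not Amazon Q's generated output; the S3 paths and column names are hypothetical) chaining a filter, a projection, a join, and an aggregation:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transform-sketch").getOrCreate()

# Hypothetical inputs; replace the S3 paths and columns with your own datasets.
orders = spark.read.parquet("s3://my-bucket/raw/orders/")
customers = spark.read.parquet("s3://my-bucket/raw/customers/")

curated = (
    orders
    .filter(F.col("status") == "shipped")             # filter
    .select("order_id", "customer_id", "amount")      # projection
    .join(customers, on="customer_id", how="inner")   # join
    .groupBy("region")                                 # aggregation (region comes from customers)
    .agg(F.sum("amount").alias("total_amount"))
)
curated.write.mode("overwrite").parquet("s3://my-bucket/curated/orders_by_region/")
```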
What is a data transformation? As data is consolidated during ETL, we notice that data from different sources is structured in different formats. It may be necessary to enhance, sanitize, and prepare the data so that it is fit for consumption by the SQL engine.
Together with price-performance, Amazon Redshift offers capabilities such as a serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments.
Certain Big Cloud Providers treat autoML as an on-ramp to model hosting, which raises the question: is autoML the bait for long-term model hosting? Relatedly, a company could go from "raw data" to "it's serving predictions on live data" in a single work day.
The applications are hosted in dedicated AWS accounts and require a BI dashboard and reporting services based on Tableau. With a unified catalog, enhanced analytics capabilities, and efficient data transformation processes, we're laying the groundwork for future growth.
You will load the event data from the SFTP site, join it to the venue data stored on Amazon S3, apply transformations, and store the data in Amazon S3. The event and venue files are from the TICKIT dataset. You will need access to an SFTP server with permissions to upload and download data.
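As a rough sketch of that flow (assuming the event file has already been staged from the SFTP server into S3, and using the pipe-delimited TICKIT file layout with hypothetical paths):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tickit-events-venues").getOrCreate()

# Event data staged from the SFTP site; venue data already lives in Amazon S3.
events = spark.read.csv("s3://my-bucket/staging/allevents_pipe.txt",
                        sep="|", inferSchema=True).toDF(
    "eventid", "venueid", "catid", "dateid", "eventname", "starttime")
venues = spark.read.csv("s3://my-bucket/tickit/venue_pipe.txt",
                        sep="|", inferSchema=True).toDF(
    "venueid", "venuename", "venuecity", "venuestate", "venueseats")

# Join events to their venues and store the result back in Amazon S3.
joined = events.join(venues, on="venueid", how="inner")
joined.write.mode("overwrite").parquet("s3://my-bucket/curated/events_with_venues/")
```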
The Amazon Redshift COPY command can load data from Amazon Simple Storage Service (Amazon S3), Amazon EMR, Amazon DynamoDB, or remote hosts over SSH. This native feature of Amazon Redshift uses massively parallel processing (MPP) to load objects directly from data sources into Redshift tables.
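For example, a load from Amazon S3 might look like the following sketch, issued here from Python with psycopg2; the cluster endpoint, table, bucket, and IAM role are all placeholders:

```python
import psycopg2  # assumes network access to the Redshift cluster

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439, dbname="dev", user="awsuser", password="...",
)
copy_sql = """
    COPY venue
    FROM 's3://my-bucket/tickit/venue_pipe.txt'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    DELIMITER '|';
"""
with conn, conn.cursor() as cur:
    cur.execute(copy_sql)  # Redshift parallelizes the load across slices
```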
Using EventBridge integration, filtered positional updates are published to an EventBridge event bus. Amazon Location device position events arrive on the EventBridge default bus with source: ["aws.geo"] and detail-type: ["Location Device Position Event"]. In this model, the Lambda function is invoked for each incoming event.
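A rule for those events can be wired up along these lines, a sketch using boto3 in which the rule name and Lambda ARN are placeholders (the function also needs a resource-based permission allowing EventBridge to invoke it):

```python
import json
import boto3

events = boto3.client("events")

# Match Amazon Location device position events on the default bus.
events.put_rule(
    Name="location-position-updates",
    EventPattern=json.dumps({
        "source": ["aws.geo"],
        "detail-type": ["Location Device Position Event"],
    }),
)

# Invoke a Lambda function for each matching event.
events.put_targets(
    Rule="location-position-updates",
    Targets=[{"Id": "position-handler",
              "Arn": "arn:aws:lambda:us-east-1:123456789012:function:position-handler"}],
)
```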
Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources. Data transformation through stored procedures, combined with materialized views to curate datasets and generate insights, is a well-known pattern with relational databases.
This means there are no unintended data errors, and the data corresponds to its appropriate designation. Here, it all comes down to the data transformation error rate. Data time-to-value evaluates how long it takes you to gain insights from a dataset; this is driven by the technical nature of the data system itself.
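As a simple illustration of the first metric (my own sketch, not a definition from the source), the transformation error rate is just failed records over total records processed:

```python
def transformation_error_rate(failed_records: int, total_records: int) -> float:
    """Fraction of records that failed transformation; 0.0 means fully clean output."""
    if total_records == 0:
        return 0.0
    return failed_records / total_records

# e.g. 42 failures out of 10,000 records -> 0.42% error rate
print(f"{transformation_error_rate(42, 10_000):.2%}")
```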
Additionally, there are major rewrites to deliver developer-focused improvements, including static type checking, enhanced runtime validation, strong consistency in call patterns, and optimized event chaining. The API is modernized by using promises for all actions, so developers can use async and await for better event management.
On many occasions, they need to apply business logic to the data received from the source SaaS platform before pushing it to the target SaaS platform. AnyCompany’s marketing team hosted an event at the Anaheim Convention Center, CA. The marketing team created leads based on the event in Adobe Marketo.
Customers rely on data from different sources such as mobile applications, clickstream events from websites, historical data, and more to deduce meaningful patterns to optimize their products, services, and processes. To create the connection string, the Snowflake host and account name are required.
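For reference, a minimal connection sketch with the snowflake-connector-python package (the account identifier and credentials are placeholders) looks like this:

```python
import snowflake.connector

# The account identifier determines the host, e.g. <account>.snowflakecomputing.com.
conn = snowflake.connector.connect(
    account="myorg-myaccount",   # placeholder account identifier
    user="MY_USER",
    password="...",              # prefer a secrets manager in practice
    warehouse="MY_WH",
    database="MY_DB",
)
cur = conn.cursor()
cur.execute("SELECT CURRENT_VERSION()")
print(cur.fetchone())
```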
Uncomfortable truth incoming: Most people in your organization don’t think about the quality of their data from intake to production of insights. However, as a data team member, you know how important data integrity (and a whole host of other aspects of data management) is.
Angel-Johnson says she, too, has a heightened level of concern around security issues, and more specifically data protection. "I thought I was hired for digital transformation, but what is really needed is a data transformation," she says. Indeed, the 2022 CIO Leadership Perspectives study from Evanta found that the No.
Oracle GoldenGate for Oracle Database and Big Data adapters: Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, and high availability. GoldenGate provides special tools called S3 event handlers to integrate with Amazon S3 for data replication.
In this article, we discuss how this data is accessed, an example environment and setup for data processing, sample lines of Python code that show the simplicity of data transformations using Pandas, and how this simple architecture can enable you to unlock new insights from this data yourself.
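In that spirit, a few illustrative lines of Pandas (the column names are hypothetical rather than the article's dataset):

```python
import pandas as pd

# Hypothetical raw readings; in practice this would come from pd.read_csv(...).
df = pd.DataFrame({
    "sensor": ["a", "a", "b", "b"],
    "reading": [1.0, None, 3.5, 4.5],
})

clean = (
    df.dropna(subset=["reading"])                                # sanitize: drop missing readings
      .assign(reading_f=lambda d: d["reading"] * 9 / 5 + 32)     # enrich: convert C to F
      .groupby("sensor", as_index=False)["reading_f"].mean()     # aggregate per sensor
)
print(clean)
```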
Typically, organizations approach generative AI POCs in one of two ways: by using third-party services, which are easy to implement but require sharing private data externally, or by developing self-hosted solutions using a mix of open-source and commercial tools.
Solution overview: The following diagram illustrates the solution architecture. The solution uses AWS Glue as an ETL engine to extract data from the source Amazon RDS database. Built-in data transformations then scrub columns containing PII using pre-defined masking functions. See JDBC connections for further details.
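To illustrate the masking step, here is a sketch that uses plain Spark SQL functions rather than the Glue built-in transforms the post refers to; the columns are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pii-mask").getOrCreate()
df = spark.createDataFrame(
    [("Jane Doe", "jane@example.com", "555-0100", 120.0)],
    ["name", "email", "phone", "amount"],
)

masked = (
    df.withColumn("name", F.lit("***"))                            # redact outright
      .withColumn("email", F.sha2(F.col("email"), 256))            # one-way hash
      .withColumn("phone", F.regexp_replace("phone", r"\d", "#"))  # pattern mask
)
masked.show(truncate=False)
```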
You simply configure your data sources to send information to OpenSearch Ingestion, which then automatically delivers the data to your specified destination. Additionally, you can configure OpenSearch Ingestion to apply datatransformations before delivery. This allows for easy access and analysis of these events.
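Creating such a pipeline might look roughly like this boto3 sketch; the Data Prepper-style configuration body and every name in it are placeholders, so consult the OpenSearch Ingestion documentation for the exact schema (a real sink also needs IAM role settings):

```python
import boto3

osis = boto3.client("osis")  # OpenSearch Ingestion Service

# Minimal pipeline: HTTP source -> transformation -> OpenSearch sink (placeholders).
pipeline_body = """
version: "2"
log-pipeline:
  source:
    http:
      path: "/logs"
  processor:
    - grok:
        match:
          message: ['%{COMMONAPACHELOG}']
  sink:
    - opensearch:
        hosts: ["https://my-domain.us-east-1.es.amazonaws.com"]
        index: "app-logs"
"""

osis.create_pipeline(
    PipelineName="log-pipeline",
    MinUnits=1,
    MaxUnits=4,
    PipelineConfigurationBody=pipeline_body,
)
```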
Although we explored the option of using AWS managed notebooks to streamline the provisioning process, we have decided to continue hosting these components on our on-premises infrastructure for the current timeline. Joel has led data transformation projects on fraud analytics, claims automation, and Master Data Management.
We use Apache Spark as our main data processing engine and have over 1,000 Spark applications running over massive amounts of data every day. These Spark applications implement our business logic, ranging from data transformation and machine learning (ML) model inference to operational tasks. Their costs were climbing.
Customers often use many SQL scripts to select and transform the data in relational databases hosted either in an on-premises environment or on AWS, and use custom workflows to manage their ETL. AWS Glue is a serverless data integration and ETL service with the ability to scale on demand.
These mandates ensure that PHI and PII data are protected and managed properly, so that patients are protected in the event of data breaches. Yet this same data is critical to improving patient outcomes. Today, lawmakers impose larger and larger fines on organizations that handle this data and don't properly protect it.
By treating the data as a product, the outcome is a reusable asset that outlives a project and meets the needs of the enterprise consumer. Consumer feedback and demand drive the creation and maintenance of the data product.
But Barnett, who started work on a strategy in 2023, wanted to continue using Baptist Memorial's on-premises data center for financial, security, and continuity reasons, so he and his team explored options that allowed for keeping that data center as part of the mix.
The AWS Glue Data Catalog provides a uniform repository where disparate systems can store and find metadata to keep track of data in data silos. Apache Flink is a widely used data processing engine for scalable streaming ETL, analytics, and event-driven applications. Transformed data can be stored in Amazon S3.
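As a minimal PyFlink sketch of that pattern, with a generated source standing in for a real stream and a hypothetical S3 path (the S3 filesystem plugin must be configured on the cluster):

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# 'datagen' stands in for a real streaming source; schema and paths are hypothetical.
t_env.execute_sql("""
    CREATE TABLE events (user_id BIGINT, amount DOUBLE)
    WITH ('connector' = 'datagen', 'rows-per-second' = '10')
""")
t_env.execute_sql("""
    CREATE TABLE s3_sink (user_id BIGINT, amount DOUBLE)
    WITH ('connector' = 'filesystem',
          'path' = 's3://my-bucket/transformed/',
          'format' = 'parquet')
""")

# Simple streaming transformation: filter before storing to Amazon S3.
t_env.execute_sql(
    "INSERT INTO s3_sink SELECT user_id, amount FROM events WHERE amount > 0"
).wait()  # block so the sketch keeps the job running
```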
The data products from the Business Vault and Data Mart stages are now available for consumers. smava decided to use Tableau for business intelligence, data visualization, and further analytics. The data transformations are managed with dbt to simplify workflow governance and team collaboration.
Amazon EC2 hosts and runs a Jenkins build server. Solution walkthrough: The solution architecture is shown in the preceding figure and includes continuous integration and delivery (CI/CD) for data processing. Data engineers can define the underlying data processing job within a JSON template.
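The excerpt doesn't show the template's schema; purely as a hypothetical illustration of the idea, every field name below is invented:

```python
import json

# Hypothetical job template a data engineer might check into the CI/CD repo.
job_template = {
    "jobName": "daily-orders-etl",          # invented field names throughout
    "source": {"type": "s3", "path": "s3://my-bucket/raw/orders/"},
    "transformations": [
        {"type": "filter", "condition": "status = 'shipped'"},
        {"type": "aggregate", "groupBy": ["region"], "metrics": ["sum(amount)"]},
    ],
    "target": {"type": "s3", "path": "s3://my-bucket/curated/orders/"},
    "schedule": "cron(0 6 * * ? *)",
}
print(json.dumps(job_template, indent=2))
```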
Strategic Objective: Create a complete, user-friendly view of the data by preparing it for analysis.
Requirement: Multi-Source Data Blending. Data from multiple sources is compiled and the output is a single view, metric, or visualization.
Requirement: Data Transformation and Enrichment. Data can be enriched for analysis.
This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why Data Mapping Is Important: Data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.
This approach helps mitigate risks associated with data security and compliance, while still harnessing the benefits of cloud scalability and innovation. Simplify Data Integration: Angles for Oracle offers data transformation and cleansing features that allow finance teams to clean, standardize, and format data as needed.