As an essential part of ETL, as data is consolidated we notice that data from different sources is structured in different formats. The data may need to be enhanced, sanitized, and prepared so that it is fit for consumption by the SQL engine. What is a data transformation?
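As a minimal illustration of that kind of preparation, the sketch below uses Pandas to reconcile two hypothetical sources that deliver the same records under different column names and date formats; every name in it is a stand-in, not a reference to any specific dataset.

```python
import pandas as pd

# Hypothetical illustration: two sources deliver the same customer data
# in different shapes, which must be standardized before loading.
source_a = pd.DataFrame({"customer_id": [1, 2], "signup": ["2024-01-05", "2024-02-10"]})
source_b = pd.DataFrame({"CustomerID": [3, 4], "SignupDate": ["05/03/2024", "10/04/2024"]})

# Normalize column names and date formats so both sources share one schema
source_b = source_b.rename(columns={"CustomerID": "customer_id", "SignupDate": "signup"})
source_b["signup"] = pd.to_datetime(source_b["signup"], dayfirst=True).dt.strftime("%Y-%m-%d")

combined = pd.concat([source_a, source_b], ignore_index=True)
print(combined)
```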
On the Data Transformers podcast, hosts Peggy Tsai and Ramesh Dontha chat with DataKitchen CEO Chris Bergh about how DataOps should be 10% of every data team member's job.
There's a renewed focus on on-premises, on-premises private cloud, or hosted private cloud versus public cloud, especially as data-heavy workloads such as generative AI have started to push cloud spend up astronomically, adds Woo. "I'd be cautious about going down the path of private cloud hosting or on-premises," says Nag.
Especially when you consider how Certain Big Cloud Providers treat autoML as an on-ramp to model hosting. Is autoML the bait for long-term model hosting? Related to the previous point, a company could go from “raw data” to “it’s serving predictions on live data” in a single work day.
This means there are no unintended data errors and the data corresponds to its appropriate designation. Here, it all comes down to the data transformation error rate. Data time-to-value evaluates how long it takes you to gain insights from a data set; this depends on the technical nature of the data system itself.
In addition to driving operational efficiency and consistently meeting fulfillment targets, logistics providers use big data applications to provide real-time updates as well as a host of flexible pick-up, drop-off, or ordering options. Use our 14-day free trial today & transform your supply chain!
The currently available choices include: The Amazon Redshift COPY command can load data from Amazon Simple Storage Service (Amazon S3), Amazon EMR, Amazon DynamoDB, or remote hosts over SSH. This native feature of Amazon Redshift uses massively parallel processing (MPP) to load objects directly from data sources into Redshift tables.
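Because COPY is ordinary SQL, it can also be issued programmatically, for example through the Redshift Data API. The sketch below assumes a Redshift Serverless workgroup; the workgroup, database, table, S3 path, and IAM role ARN are all placeholders.

```python
import boto3

# Hedged sketch: issue a COPY command against Redshift Serverless
# using the Redshift Data API. All resource names are placeholders.
client = boto3.client("redshift-data")

response = client.execute_statement(
    WorkgroupName="my-workgroup",
    Database="dev",
    Sql="""
        COPY sales FROM 's3://my-bucket/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
        FORMAT AS CSV IGNOREHEADER 1;
    """,
)
print(response["Id"])  # statement ID, useful for polling via describe_statement
```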
Data ingestion must be done properly from the start, as mishandling it can lead to a host of new issues. Laying the groundwork of training data for an AI model is comparable to piloting an airplane. ELT tools such as IBM® DataStage® facilitate fast and secure transformations through parallel processing engines.
Your generated jobs can use a variety of data transformations, including filters, projections, unions, joins, and aggregations, giving you the flexibility to handle complex data processing requirements. In this post, we discuss how Amazon Q data integration transforms ETL workflow development.
In addition to using native managed AWS services that BMS didn't need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
To create the connection string, the Snowflake host and account name are required. Using the worksheet, run the following SQL commands to find the host and account name. The account, host, user, password, and warehouse can differ based on your setup. Choose Next. For Secret name, enter airflow/connections/snowflake_accountadmin.
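As a hedged sketch of that step done programmatically, the snippet below stores those connection details in AWS Secrets Manager under the secret name given above using boto3; all of the values are placeholders for your own setup.

```python
import json
import boto3

# Hedged sketch: store the Snowflake connection details in AWS Secrets
# Manager under the name expected by the Airflow connection.
# Every value here is a placeholder.
client = boto3.client("secretsmanager")

client.create_secret(
    Name="airflow/connections/snowflake_accountadmin",
    SecretString=json.dumps(
        {
            "host": "myorg-myaccount.snowflakecomputing.com",
            "account": "myorg-myaccount",
            "user": "ACCOUNTADMIN_USER",
            "password": "REPLACE_ME",
            "warehouse": "COMPUTE_WH",
        }
    ),
)
```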
Uncomfortable truth incoming: Most people in your organization don’t think about the quality of their data from intake to production of insights. However, as a data team member, you know how important data integrity (and a whole host of other aspects of data management) is.
Access to an SFTP server with permissions to upload and download data. If the SFTP server is hosted on Amazon Elastic Compute Cloud (Amazon EC2) , we recommend that the network communication between the SFTP server and the AWS Glue job happens within the virtual private cloud (VPC) as pictured in the preceding architecture diagram.
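For illustration, a minimal SFTP upload and download in Python might look like the sketch below, using the paramiko library; the hostname, credentials, and paths are hypothetical, and in the VPC setup described above the hostname would resolve to the EC2 instance's private address.

```python
import paramiko

# Hypothetical sketch of moving files over SFTP with paramiko;
# host, credentials, and paths are placeholders.
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # pin host keys in production
ssh.connect(hostname="sftp.example.internal", username="etl_user", password="REPLACE_ME")

sftp = ssh.open_sftp()
sftp.put("local/input.csv", "/uploads/input.csv")    # upload
sftp.get("/exports/output.csv", "local/output.csv")  # download
sftp.close()
ssh.close()
```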
This involves creating VPC endpoints in both the AWS and Snowflake VPCs, making sure data transfer remains within the AWS network. Use Amazon Route 53 to create a private hosted zone that resolves the Snowflake endpoint within your VPC. Refer to Editing AWS Glue managed data transform nodes for more information.
According to Evanta's 2022 CIO Leadership Perspectives study, CIOs' second top priority within the IT function is around data and analytics, with CIOs seeing advancing organizational use of data as key to reaching enterprise objectives. Angel-Johnson shares that perspective.
In this article, we discuss how this data is accessed, an example environment and setup for data processing, sample lines of Python code that show the simplicity of data transformations using Pandas, and how this simple architecture can enable you to unlock new insights from this data yourself.
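In the same spirit, here is a hedged example of the kind of Pandas transformation the article describes; the CSV file and column names are assumptions for illustration, not the article's actual dataset.

```python
import pandas as pd

# Hedged illustration: a few lines of Pandas are enough to reshape raw
# records into an insight. File and column names are assumptions.
df = pd.read_csv("events.csv", parse_dates=["timestamp"])

# Count distinct users per day and device type
daily_usage = (
    df.assign(day=df["timestamp"].dt.date)
      .groupby(["day", "device_type"], as_index=False)["user_id"]
      .nunique()
      .rename(columns={"user_id": "unique_users"})
)
print(daily_usage.head())
```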
With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of. On your project, in the navigation pane, choose Data. For Add data source , choose Add connection. Choose the plus sign.
Data-driven companies typically enjoy an increase in profit of eight to ten percent and a ten percent reduction in overall cost. As many as 30% also say that R&D has been fundamentally changed by Big Data and analytics. It can also help you see ROI from your digital transformation sooner.
Typically, organizations approach generative AI POCs in one of two ways: by using third-party services, which are easy to implement but require sharing private data externally, or by developing self-hosted solutions using a mix of open-source and commercial tools.
For Host, enter the Redshift Serverless endpoint's host URL. As well as Talend Cloud for enterprise-level data transformation needs, you could also use Talend Stitch to handle data ingestion and data replication to Redshift Serverless.
Solution overview: The following diagram illustrates the solution architecture. The solution uses AWS Glue as an ETL engine to extract data from the source Amazon RDS database. Built-in data transformations then scrub columns containing PII using pre-defined masking functions. See JDBC connections for further details.
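The snippet below is a generic sketch of column-level PII masking in Spark, standing in for the pre-defined masking functions mentioned above rather than reproducing them; the table locations and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, regexp_replace, sha2

# Hedged sketch of column-level masking in Spark; paths and column
# names are placeholders, not the solution's actual schema.
spark = SparkSession.builder.appName("pii-masking").getOrCreate()
df = spark.read.parquet("s3://my-bucket/raw/customers/")

masked = (
    df.withColumn("email", sha2(col("email"), 256))  # one-way hash
      .withColumn("phone", regexp_replace(col("phone"), r"\d(?=\d{4})", "*"))  # keep last 4 digits
)
masked.write.mode("overwrite").parquet("s3://my-bucket/clean/customers/")
```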
The applications are hosted in dedicated AWS accounts and require a BI dashboard and reporting services based on Tableau. With a unified catalog, enhanced analytics capabilities, and efficient data transformation processes, we're laying the groundwork for future growth.
Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments. Choose Create.
REFLECTIONS FROM THE GARTNER BI & ANALYTICS SUMMIT: I hate to admit that the last time I attended the Gartner BI & Analytics Summit, Howard Dresner was still the host. Alation helps analysts find, understand and use their data. Everything you need to do to prepare for analysis before data transformation and visualization.
You can also use the data transformation feature of Data Firehose to invoke a Lambda function that transforms data in batches. Query the data using Athena Athena is a serverless, interactive analytics service built to analyze unstructured, semi-structured, and structured data where it is hosted.
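A transformation Lambda for Firehose follows a fixed contract: it receives a batch of base64-encoded records and must return one result per recordId. The sketch below shows that contract with a placeholder transform (uppercasing a field); the payload shape is an assumption.

```python
import base64
import json

# Hedged sketch of a Firehose transformation Lambda. Each invocation
# receives base64-encoded records and must return one entry per recordId.
def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["status"] = payload.get("status", "").upper()  # placeholder transform

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(json.dumps(payload).encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```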
In this blog post, I'll share some exciting details about how Alation is growing in APAC and what this means for data transformation more widely in the region.
Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources. Data transformations through stored procedures and the use of materialized views to curate datasets and generate insights are a known pattern with relational databases.
Oracle GoldenGate for Oracle Database and Big Data adapters Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, and high availability. GoldenGate supports flexible replication topologies such as unidirectional, bidirectional, and multi-master configurations.
Solution overview: Typically, you have multiple accounts to manage and provision resources for your data pipeline. Every time the business requirement changes (such as adding data sources or changing data transformation logic), you make changes on the AWS Glue app stack and re-provision the stack to reflect your changes.
The following eventNames and eventCodes are returned as part of the onChange callback when there is a change in the SDK code status. Another callback supported by SDK v2.0 monitors interactions in embedded dashboards.
These help data analysts visualize key insights that support better data-backed decisions. ELT Data Transformation Tools: ELT data transformation tools are used to extract, load, and transform your data. Examples of data transformation tools include dbt and Dataform.
Although we explored the option of using AWS managed notebooks to streamline the provisioning process, we have decided to continue hosting these components on our on-premises infrastructure for now. Joel has led data transformation projects on fraud analytics, claims automation, and Master Data Management.
However, you might face significant challenges when planning for a large-scale data warehouse migration. Data engineers are crucial for schema conversion and data transformation, and DBAs can handle cluster configuration and workload monitoring. Platform architects define a well-architected platform.
On many occasions, they need to apply business logic to the data received from the source SaaS platform before pushing it to the target SaaS platform. AnyCompany’s marketing team hosted an event at the Anaheim Convention Center, CA. Let’s take an example. The marketing team created leads based on the event in Adobe Marketo.
You simply configure your data sources to send information to OpenSearch Ingestion, which then automatically delivers the data to your specified destination. Additionally, you can configure OpenSearch Ingestion to apply datatransformations before delivery.
The Delta tables created by the EMR Serverless application are exposed through the AWS Glue Data Catalog and can be queried through Amazon Athena. Data ingestion – Steps 1 and 2 use AWS DMS, which connects to the source database and moves full and incremental data (CDC) to Amazon S3 in Parquet format.
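Once a table is registered in the Data Catalog, querying it from code is straightforward. The sketch below runs an Athena query with boto3 and polls for completion; the database, table, and S3 output location are placeholders.

```python
import time
import boto3

# Hedged sketch: query a Glue Data Catalog table through Athena.
# Database, table, and output location are placeholders.
athena = boto3.client("athena")

qid = athena.start_query_execution(
    QueryString="SELECT * FROM orders LIMIT 10",
    QueryExecutionContext={"Database": "lakehouse"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)["QueryExecutionId"]

# Poll until the query finishes, then fetch rows
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```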
Having the right tools is essential for any successful data product manager focused on enterprise data transformation. When choosing the tools for a project, whether it be the CIO, CDO, or data product managers themselves, the buyers must see the big picture.
The system ingests data from various sources such as cloud resources, cloud activity logs, and API access logs, and processes billions of messages, resulting in terabytes of data daily. This data is sent to Apache Kafka, which is hosted on Amazon Managed Streaming for Apache Kafka (Amazon MSK).
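As a simplified sketch of the producing side, the snippet below sends a JSON log event to a Kafka topic using the kafka-python client; the broker address, topic, and message shape are placeholders, and a real MSK cluster using IAM authentication would need additional security configuration.

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

# Hedged sketch of producing log events to a Kafka topic such as one
# hosted on Amazon MSK. Broker and topic names are placeholders.
producer = KafkaProducer(
    bootstrap_servers=["b-1.my-msk.example.com:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("cloud-activity-logs", {"source": "api-gateway", "action": "GetObject"})
producer.flush()
```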
We use Apache Spark as our main data processing engine and have over 1,000 Spark applications running over massive amounts of data every day. These Spark applications implement our business logic, ranging from data transformation and machine learning (ML) model inference to operational tasks. Their costs were climbing.
It uses not just open-source technologies, but those with open governance and broad and diverse communities of users and contributors, like Apache Iceberg and Presto, which is hosted by the Linux Foundation.
Customers often use many SQL scripts to select and transform the data in relational databases hosted either in an on-premises environment or on AWS and use custom workflows to manage their ETL. AWS Glue is a serverless data integration and ETL service with the ability to scale on demand.
Today, lawmakers impose larger and larger fines on organizations that handle this data and don't properly protect it. More and more companies are handling such data. No matter where a healthcare organization is located or the services it provides, it will likely host data pursuant to a number of regulatory laws.