SQL Stream Builder (SSB) is a versatile platform for data analytics using SQL, part of Cloudera Streaming Analytics and built on top of Apache Flink. It enables users to easily write, run, and manage real-time continuous SQL queries on streaming data, with a smooth user experience. What is a data transformation?
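Since SSB runs Flink SQL under the hood, the kind of continuous query it manages looks roughly like the following PyFlink sketch. This is a minimal illustration, not SSB's own API: the `orders` table, Kafka topic, and broker address are hypothetical, and the Kafka SQL connector is assumed to be on the classpath.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Streaming-mode table environment (the Flink layer SSB builds on)
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hypothetical source table backed by a Kafka topic
t_env.execute_sql("""
    CREATE TABLE orders (
        order_id STRING,
        amount   DOUBLE,
        ts       TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json'
    )
""")

# A continuous query: per-minute revenue, updated as events arrive
result = t_env.execute_sql("""
    SELECT TUMBLE_START(ts, INTERVAL '1' MINUTE) AS window_start,
           SUM(amount) AS revenue
    FROM orders
    GROUP BY TUMBLE(ts, INTERVAL '1' MINUTE)
""")
result.print()
```

Unlike a batch query, this statement never terminates; each tumbling window emits a new row as stream data flows in.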
There's a renewed focus on on-premises, on-premises private cloud, or hosted private cloud versus public cloud, especially as data-heavy workloads such as generative AI have started to push cloud spend up astronomically, adds Woo. But data-heavy workloads can be expensive, especially if constant, high-compute capacity is required.
1) What Is Data Quality Management? 4) Data Quality Best Practices. 5) How Do You Measure Data Quality? 6) Data Quality Metrics Examples. 7) Data Quality Control: Use Case. 8) The Consequences Of Bad Data Quality. 9) 3 Sources Of Low-Quality Data. 10) Data Quality Solutions: Key Attributes.
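Two of the metrics that lists like this typically cover, completeness and uniqueness, are straightforward to compute. A minimal pandas sketch, with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4, None],
    "email": ["a@x.com", None, "b@x.com", "c@x.com", "d@x.com"],
})

# Completeness: share of non-null values per column
completeness = df.notna().mean()

# Uniqueness: share of distinct values in a key column
uniqueness = df["customer_id"].nunique(dropna=True) / len(df)

print(completeness)
print(f"customer_id uniqueness: {uniqueness:.0%}")
```

Tracking a handful of such ratios over time is often the first step toward the measurable data quality management the article describes.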
This means you can refine your ETL jobs through natural follow-up questions, starting with a basic data pipeline and progressively adding transformations, filters, and business logic through conversation. The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios.
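The excerpt doesn't show the generated code itself, but iteratively refined DataFrame code of the kind described might look like this PySpark sketch, where the S3 paths and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Step 1: the basic pipeline - read the raw data (hypothetical path)
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Step 2: a follow-up question adds a filter
orders = orders.filter(F.col("status") == "COMPLETED")

# Step 3: another follow-up adds business logic - net revenue per region
revenue = (orders
           .withColumn("net", F.col("amount") - F.col("discount"))
           .groupBy("region")
           .agg(F.sum("net").alias("net_revenue")))

revenue.write.mode("overwrite").parquet("s3://example-bucket/curated/revenue/")
```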
For each service, you need to learn the supported authorization and authentication methods, data access APIs, and frameworks to onboard and test data sources. This fragmented, repetitive, and error-prone experience for data connectivity is a significant obstacle to data integration, analysis, and machine learning (ML) initiatives.
Their terminal operations rely heavily on seamless data flows and the management of vast volumes of data. With the addition of these technologies alongside existing systems like terminal operating systems (TOS) and SAP, the number of data producers has grown substantially.
Ali Tore, Senior Vice President of Advanced Analytics at Salesforce, highlighting the value of this integration, says “We’re excited to partner with Amazon to bring Tableau’s powerful data exploration and AI-driven analytics capabilities to customers managing data across organizational boundaries with Amazon DataZone.”
Within seconds of transactional data being written into Amazon Aurora (a fully managed modern relational database service offering performance and high availability at scale), the data is seamlessly made available in Amazon Redshift for analytics and machine learning. Create dbt models in dbt Cloud. Choose Create.
Organizations with legacy, on-premises, near-real-time analytics solutions typically rely on self-managed relational databases as their data store for analytics workloads. Near-real-time streaming analytics captures the value of operational data and metrics to provide new insights to create business opportunities.
Especially when you consider how Certain Big Cloud Providers treat autoML as an on-ramp to model hosting. Is autoML the bait for long-term model hosting? Related to the previous point, a company could go from “raw data” to “it’s serving predictions on live data” in a single work day.
Recognizing this paradigm shift, ANZ Institutional Division has embarked on a transformative journey to redefine its approach to data management and utilization, and to extracting significant business value from data insights. This enables global discoverability and collaboration without centralizing ownership or operations.
Benefits Of Big Data In Logistics: Before we look at our selection of practical examples and applications, let’s examine the benefits of big data in logistics – starting with the (not so) small matter of costs. A testament to the rising role of optimization in logistics. Why are logistics companies so interested in optimization?
With a data pipeline, which is a set of tasks used to automate the movement and transformation of data between different systems, you can reduce the time and effort needed to gain insights from the data. Apache Airflow and Snowflake have emerged as powerful technologies for data management and analysis.
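A minimal sketch of the Airflow-plus-Snowflake pattern, assuming Airflow 2.4+, the apache-airflow-providers-snowflake package, and a pre-configured `snowflake_default` connection; the DAG name, schedule, and tables are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="load_daily_orders",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Run a transformation inside Snowflake on each scheduled run
    refresh = SnowflakeOperator(
        task_id="refresh_orders_summary",
        snowflake_conn_id="snowflake_default",
        sql="""
            CREATE OR REPLACE TABLE analytics.orders_summary AS
            SELECT order_date, SUM(amount) AS total
            FROM raw.orders
            GROUP BY order_date;
        """,
    )
```

Airflow handles scheduling, retries, and dependencies, while the heavy lifting runs inside Snowflake itself.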
The currently available choices include: The Amazon Redshift COPY command can load data from Amazon Simple Storage Service (Amazon S3), Amazon EMR, Amazon DynamoDB, or remote hosts over SSH. This native feature of Amazon Redshift uses massively parallel processing (MPP) to load objects directly from data sources into Redshift tables.
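A sketch of the S3 variant issued from Python via the Redshift Data API; the cluster identifier, database, IAM role ARN, and bucket are all hypothetical:

```python
import boto3

client = boto3.client("redshift-data")

# COPY loads S3 objects into a Redshift table in parallel (MPP)
response = client.execute_statement(
    ClusterIdentifier="example-cluster",  # hypothetical
    Database="dev",
    DbUser="awsuser",
    Sql="""
        COPY sales
        FROM 's3://example-bucket/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS CSV
        IGNOREHEADER 1;
    """,
)
print(response["Id"])  # statement ID to poll for completion
```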
In addition, data pipelines include more and more stages, making it difficult for data engineers to compile, manage, and troubleshoot those analytical workloads. As a result, alternative data integration technologies (e.g., …). Limited flexibility to use more complex hosting models (e.g., CRM platforms).
AWS Glue is a serverless data integration service that helps analytics users to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. The SFTP connector is used to manage the connection to the SFTP server. Create the gateway endpoint.
This involves creating VPC endpoints in both the AWS and Snowflake VPCs, making sure data transfer remains within the AWS network. Use Amazon Route 53 to create a private hosted zone that resolves the Snowflake endpoint within your VPC. Snowflake credentials are securely stored in AWS Secrets Manager.
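For the Secrets Manager piece, a minimal sketch of retrieving the stored credentials and opening a Snowflake session from Python; the secret name and its key layout are hypothetical:

```python
import json

import boto3
import snowflake.connector

# Fetch credentials stored in AWS Secrets Manager (hypothetical secret name)
secrets = boto3.client("secretsmanager")
secret = json.loads(
    secrets.get_secret_value(SecretId="prod/snowflake/etl")["SecretString"]
)

# Connect over the PrivateLink endpoint resolved inside the VPC
conn = snowflake.connector.connect(
    account=secret["account"],
    user=secret["user"],
    password=secret["password"],
    warehouse=secret["warehouse"],
)
```

Because the private hosted zone resolves the Snowflake hostname to the VPC endpoint, this traffic never leaves the AWS network.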
In collaboration with AWS, BMS identified a business need to migrate and modernize their custom extract, transform, and load (ETL) platform to a native AWS solution to reduce complexities, resources, and investment to upgrade when new Spark, Python, or AWS Glue versions are released.
Data product managers are in high demand these days. In 2020, Glassdoor rated product manager as the 4th best job in the US. This makes it more important for aspiring data product managers to stay ahead of the competition. So what sets data product managers apart from the pack? Sounds exciting?
“So now there’s a focus on ‘transversal transformation,’” Hackenson adds. Market pressures continue to make customer experience a top CIO concern, says Aamer Baig, a senior partner with management consulting firm McKinsey & Co. To get there, Angel-Johnson has embarked on a master data management initiative.
To look for fraud, market manipulation, insider trading, and abuse, FINRA’s technology group has developed a robust set of big data tools in the AWS Cloud to support these activities. Go to the Amazon EC2 console and connect to the master node through Session Manager.
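The stray `Instances[].InstanceId[]` fragments in the original excerpt appear to come from a CLI query for the cluster's instance IDs. A sketch of the same lookup from Python before opening a Session Manager session; the cluster ID is hypothetical:

```python
import boto3

emr = boto3.client("emr")

# Find the primary (master) node of a hypothetical EMR cluster
instances = emr.list_instances(
    ClusterId="j-EXAMPLE12345",
    InstanceGroupTypes=["MASTER"],
)
master_id = instances["Instances"][0]["Ec2InstanceId"]
print(f"Connect via Session Manager to: {master_id}")
```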
Uncomfortable truth incoming: Most people in your organization don’t think about the quality of their data from intake to production of insights. However, as a data team member, you know how important data integrity (and a whole host of other aspects of data management) is.
The modern data stack is a data management system built out of cloud-based data systems. A given modern data stack will usually include components for data ingestion from your data sources, data transformation, data storage, and data analysis and reporting.
How does watsonx.data bring disruptive innovation to data management? watsonx.data is truly open and interoperable. The solution leverages not just open-source technologies, but those with open-source project governance and diverse communities of users and contributors, like Apache Iceberg and Presto, hosted by the Linux Foundation.
AWS Glue Studio visual editor provides a low-code graphic environment to build, run, and monitor extract, transform, and load (ETL) scripts. There’s no infrastructure to manage, so you can focus on rapidly building compliant data flows between key systems. An AWS Identity and Access Management (IAM) role is used for AWS Glue.
Oracle GoldenGate for Oracle Database and Big Data adapters: Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, and high availability. An AWS Identity and Access Management (IAM) user. An existing or new S3 bucket. From the ggsci prompt: GGSCI> add replicat rps3, exttrail ./dirdat/tr/ea
Typically, organizations approach generative AI POCs in one of two ways: by using third-party services, which are easy to implement but require sharing private data externally, or by developing self-hosted solutions using a mix of open-source and commercial tools.
To speed up self-service analytics and foster innovation based on data, a solution was needed to provide ways to allow any team to create data products on their own in a decentralized manner. To create and manage the data products, smava uses Amazon Redshift, a cloud data warehouse.
Capital Fund Management ( CFM ) is an alternative investment management company based in Paris with staff in New York City and London. CFM assets under management are now $13 billion. In this post, we share how we built a well-governed and scalable data engineering platform using Amazon EMR for financial features generation.
Today, in order to accelerate and scale data analytics, companies are looking for an approach to minimize infrastructure management and predict computing needs for different types of workloads, including spikes and ad hoc analytics. For Host, enter the Redshift Serverless endpoint’s host URL. For Port, enter 5439.
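Those connection fields map directly onto a client connection. A sketch using the redshift_connector Python driver, with a hypothetical workgroup endpoint and credentials:

```python
import redshift_connector

# Host is the Redshift Serverless endpoint; 5439 is the default Redshift port
conn = redshift_connector.connect(
    host="default-wg.123456789012.us-east-1.redshift-serverless.amazonaws.com",
    port=5439,
    database="dev",
    user="admin",
    password="example-password",  # better sourced from Secrets Manager
)
cursor = conn.cursor()
cursor.execute("SELECT current_date")
print(cursor.fetchone())
```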
Amazon QuickSight is a fully managed, cloud-native business intelligence (BI) service that makes it easy to connect to your data, create interactive dashboards and reports, and share these with tens of thousands of users, either within QuickSight or embedded in your application or website. The QuickSight SDK v2.0
This post shows how you can use Amazon Location, EventBridge, Lambda, Amazon Data Firehose , and Amazon S3 to build a location-aware data pipeline, and use this data to drive meaningful insights using AWS Glue and Athena. Overview of solution This is a fully serverless solution for location-based asset management.
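A sketch of the Lambda piece of such a pipeline, forwarding an EventBridge-delivered location event into Amazon Data Firehose; the delivery stream name and the event's detail shape are hypothetical:

```python
import json

import boto3

firehose = boto3.client("firehose")

def handler(event, context):
    # EventBridge delivers the Amazon Location update in event["detail"]
    record = {
        "device_id": event["detail"].get("DeviceId"),
        "position": event["detail"].get("Position"),
        "sample_time": event["detail"].get("SampleTime"),
    }
    # Firehose buffers and lands the records in S3 for Glue/Athena
    firehose.put_record(
        DeliveryStreamName="location-events",  # hypothetical stream
        Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
    )
    return {"statusCode": 200}
```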
Large-scale data warehouse migration to the cloud is a complex and challenging endeavor that many organizations undertake to modernize their data infrastructure, enhance data management capabilities, and unlock new business opportunities.
More specifically, you also need to fix bugs, resolve customer issues, and manage software changes. In addition, you need to monitor the overall system performance, security, and user experience to identify new ways to improve the existing data integration pipeline. It is the dictionary generated from default-config.yaml.
Moreover, running advanced analytics and ML on disparate data sources proved challenging. To overcome these issues, Orca decided to build a data lake. By decoupling storage and compute, data lakes promote cost-effective storage and processing of big data. This ensures that the data is suitable for training purposes.
In 2024, business intelligence (BI) software has undergone significant advancements, revolutionizing data management and decision-making processes. In essence, the core capabilities of the best BI tools revolve around four essential functions: data integration, data transformation, data visualization, and reporting.
In this blog post, I’ll share some exciting details about how Alation is growing in APAC and what this means for data transformation more widely in the region. All these businesses understand that there must be a better way to manage, find, and govern trusted data.
Extract, load, transform (ELT) tools. Data ingestion/integration services. Data orchestration tools. Reverse ETL tools. These tools are used to manage big data, which is defined as data that is too large or complex to be processed by traditional means. How Did the Modern Data Stack Get Started?
On many occasions, they need to apply business logic to the data received from the source SaaS platform before pushing it to the target SaaS platform. AnyCompany’s marketing team hosted an event at the Anaheim Convention Center, CA. Solution overview Considering our example of AnyCompany, let’s look at the data flow.
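As a toy illustration of that business-logic-in-the-middle step, a Python sketch that reshapes a record from a hypothetical source schema into a hypothetical target schema; every field name here is invented for the example:

```python
def transform_registration(source: dict) -> dict:
    """Apply business logic to a source SaaS record before pushing it onward."""
    full_name = f"{source['first_name']} {source['last_name']}".strip()
    return {
        "attendee_name": full_name,
        "event_location": source.get("venue", "Anaheim Convention Center"),
        # Business rule: only opted-in contacts flow to the target platform
        "marketing_eligible": source.get("opt_in", False),
    }

record = {"first_name": "Jane", "last_name": "Doe", "opt_in": True}
print(transform_registration(record))
```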
“Leveraging the data captured by the Unity metastore, Alation will enhance our existing integration with Databricks by easily including metadata from multiple workspaces,” said Alation director of product marketing Ibby Rahmani. The Power of Partnership to Accelerate Data Transformation. A Giant Partnership and a Giants Game.
The data mesh framework: In the dynamic landscape of data management, the search for agility, scalability, and efficiency has led organizations to explore new, innovative approaches. One such innovation gaining traction is the data mesh framework. This empowers individual teams to own and manage their data.
You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes. The Delta tables created by the EMR Serverless application are exposed through the AWS Glue Data Catalog and can be queried through Amazon Athena. Let’s refer to this S3 bucket as the raw layer.
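A sketch of that Delta pattern with PySpark; the bucket paths and the `customer_id` merge key are hypothetical, and the session is assumed to be configured with the Delta Lake package and SQL extensions:

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-sketch").getOrCreate()

# Read from the raw layer (hypothetical bucket)
raw = spark.read.json("s3://example-bucket/raw/customers/")

# Store in Delta format so later inserts, updates, and deletes are managed
(raw.write
    .format("delta")
    .mode("overwrite")
    .save("s3://example-bucket/curated/customers/"))

# Subsequent loads can upsert through the Delta merge API
target = DeltaTable.forPath(spark, "s3://example-bucket/curated/customers/")
(target.alias("t")
    .merge(raw.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```

Once the table is registered in the AWS Glue Data Catalog, Athena can query it in place.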
This is a guest post by Nan Zhu, Tech Lead Manager, SafeGraph, and Dave Thibault, Sr. We use Apache Spark as our main data processing engine and have over 1,000 Spark applications running over massive amounts of data every day. SafeGraph found itself with a less-than-optimal Spark environment with their incumbent Spark vendor.