SQL Stream Builder (SSB), part of Cloudera Streaming Analytics and built on top of Apache Flink, is a versatile platform for data analytics using SQL. It enables users to easily write, run, and manage real-time continuous SQL queries on streaming data, with a smooth user experience. What is a data transformation?
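SSB itself is a managed interface over Flink SQL, but the continuous-query model it exposes can be sketched with plain PyFlink. A minimal sketch, assuming a local apache-flink install; the clicks table and its datagen source are hypothetical, not from SSB:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Streaming mode: queries run continuously over unbounded input.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hypothetical unbounded source using Flink's built-in datagen connector.
t_env.execute_sql("""
    CREATE TABLE clicks (
        user_id INT,
        url     STRING
    ) WITH (
        'connector' = 'datagen',
        'rows-per-second' = '5'
    )
""")

# A continuous SQL query: results keep updating as new events arrive.
t_env.execute_sql(
    "SELECT user_id, COUNT(url) AS clicks FROM clicks GROUP BY user_id"
).print()
```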
"There's a renewed focus on on-premises, on-premises private cloud, or hosted private cloud versus public cloud, especially as data-heavy workloads such as generative AI have started to push cloud spend up astronomically," adds Woo. "I don't see that evolving too much beyond where we are today."
Your generated jobs can use a variety of data transformations, including filters, projections, unions, joins, and aggregations, giving you the flexibility to handle complex data processing requirements. In this post, we discuss how Amazon Q data integration transforms ETL workflow development.
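Those five transformation types look roughly like this in PySpark. A minimal sketch with made-up tables and columns, not code generated by Amazon Q:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transform-demo").getOrCreate()

orders = spark.createDataFrame(
    [(1, "EU", 20.0), (2, "US", 35.0), (3, "EU", 10.0)],
    ["order_id", "region", "amount"],
)
backfill = spark.createDataFrame([(4, "US", 50.0)],
                                 ["order_id", "region", "amount"])
regions = spark.createDataFrame(
    [("EU", "Europe"), ("US", "United States")], ["region", "region_name"]
)

result = (
    orders.union(backfill)                       # union
    .filter(F.col("amount") > 15)                # filter
    .select("order_id", "region", "amount")      # projection
    .join(regions, "region")                     # join
    .groupBy("region_name")
    .agg(F.sum("amount").alias("total_amount"))  # aggregation
)
result.show()
```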
A common task for a data scientist is to build a predictive model. You know the drill: pull some data, carve it up into features, feed it into one of scikit-learn’s various algorithms. Collectively, your attempts teach you about your data and its relation to the problem you’re trying to solve.
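That drill, as a minimal scikit-learn sketch on a built-in dataset; the model and metric choices are illustrative only:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Pull some data and split it into train/test sets.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Feed the features into one of scikit-learn's algorithms.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print(f"accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```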
Ever-increasing demands for transformation. Growing cybersecurity and data privacy threats. According to Evanta's 2022 CIO Leadership Perspectives study, CIOs' second top priority within the IT function is data and analytics, with CIOs seeing advancing organizational use of data as key to reaching enterprise objectives.
With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of. For Host, enter the host name of your Aurora PostgreSQL database cluster. Choose Add data. option("url", jdbcurl).option("dbtable",
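The snippet above is cut off mid-expression; a plausible completion, reading an Aurora PostgreSQL table over JDBC with PySpark, might look like the following. Host, database, table, and credentials are placeholders, and the PostgreSQL JDBC driver is assumed to be on the Spark classpath:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("aurora-read").getOrCreate()

# Placeholder endpoint; substitute the Host value entered above.
jdbc_url = "jdbc:postgresql://<cluster-endpoint>:5432/<database>"

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "public.customers")  # hypothetical table name
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)
df.printSchema()
```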
The data in Amazon Redshift is transactionally consistent and updates are automatically and continuously propagated. Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse and secure data sharing across the organization.
Their terminal operations rely heavily on seamless data flows and the management of vast volumes of data. Recently, EUROGATE has developed a digital twin for its container terminal Hamburg (CTH), generating millions of data points every second from Internet of Things (IoT) devices attached to its container handling equipment (CHE).
What Is Data Quality Management (DQM)? Data quality management is a set of practices aimed at maintaining a high quality of information. It spans everything from the acquisition of data and the implementation of advanced data processes to the effective distribution of data.
Table of Contents: 1) Benefits of Big Data in Logistics 2) 10 Big Data in Logistics Use Cases
Big data is revolutionizing many fields of business, and logistics analytics is no exception. The complex and ever-evolving nature of logistics makes it an essential use case for big data applications. Did you know?
Amazon Redshift, a data warehousing service, offers a variety of options for ingesting data from diverse sources into its high-performance, scalable environment. This native feature of Amazon Redshift uses massively parallel processing (MPP) to load objects directly from data sources into Redshift tables.
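Loading from Amazon S3 with COPY is the canonical example of that MPP-backed ingestion. A hedged sketch using the boto3 Redshift Data API; the workgroup, database, table, bucket, and IAM role are all placeholders:

```python
import boto3

client = boto3.client("redshift-data")

# COPY loads S3 objects in parallel across the warehouse's slices (MPP).
client.execute_statement(
    WorkgroupName="my-workgroup",  # placeholder; use ClusterIdentifier for provisioned
    Database="dev",
    Sql="""
        COPY sales
        FROM 's3://my-bucket/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS PARQUET;
    """,
)
```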
In addition to using native managed AWS services that BMS didn't need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
To create the connection string, the Snowflake host and account name are required. Using the worksheet, run the following SQL commands to find the host and account name. The account, host, user, password, and warehouse can differ based on your setup. Choose Next. For Secret name, enter airflow/connections/snowflake_accountadmin.
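If you prefer to create that secret programmatically rather than in the console, a sketch with boto3 might look like the following. The JSON shape follows Airflow's connection-as-JSON convention; every value is a placeholder for the results of the SQL commands above:

```python
import json

import boto3

secrets = boto3.client("secretsmanager")

# One possible JSON shape for an Airflow connection in Secrets Manager.
secrets.create_secret(
    Name="airflow/connections/snowflake_accountadmin",
    SecretString=json.dumps({
        "conn_type": "snowflake",
        "login": "<user>",
        "password": "<password>",
        "extra": {
            "account": "<account>",
            "warehouse": "<warehouse>",
            "host": "<host>",
        },
    }),
)
```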
AWS Glue is a serverless data integration service that helps analytics users to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. Access to an SFTP server with permissions to upload and download data. Choose Store a new secret.
The emergence of generative AI prompted several prominent companies to restrict its use because of the mishandling of sensitive internal data. Currently, no standardized process exists for overcoming data ingestion's challenges, but the model's accuracy depends on it. Increased variance: variance measures a model's consistency, that is, how much its predictions fluctuate across different training samples.
In this post, we explore how AWS Glue can serve as the data integration service to bring data from Snowflake into your data integration strategy, enabling you to harness the power of your data ecosystem and drive meaningful outcomes across various use cases. You can review and customize it to suit your needs.
We live in a world of data: There’s more of it than ever before, in a ceaselessly expanding array of forms and locations. Dealing with Data is your window into the ways data teams are tackling the challenges of this new world to help their companies and their customers thrive. What is data integrity?
Spark SQL is an Apache Spark module for structured data processing. To detect fraud, market manipulation, insider trading, and abuse, FINRA's technology group has developed a robust set of big data tools in the AWS Cloud. Since version 1.2.0, Apache Spark has supported queries written in HiveQL.
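The Spark SQL pattern in question: register a DataFrame as a view, then query it with SQL (HiveQL-compatible syntax). A minimal sketch with illustrative trade data, not FINRA's actual logic:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()

trades = spark.createDataFrame(
    [("ACME", 100, 10.5), ("ACME", 400, 10.7), ("BETA", 50, 99.0)],
    ["symbol", "qty", "price"],
)
trades.createOrReplaceTempView("trades")

# Structured data queried with plain SQL against the registered view.
spark.sql("""
    SELECT symbol, SUM(qty * price) AS notional
    FROM trades
    GROUP BY symbol
    HAVING SUM(qty * price) > 1000
""").show()
```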
You will learn how to prepare a multi-account environment to access the databases from AWS Glue, and how to model an ETL data flow that automatically masks PII as part of the transfer process, so that no sensitive information is copied to the target database in its original form. See JDBC connections for further details.
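As a rough illustration of that masking step, here is a PySpark sketch that hashes PII columns in flight. The column list is hypothetical, and a production flow would detect PII rather than hard-code it:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("mask-pii").getOrCreate()

df = spark.createDataFrame(
    [("alice@example.com", "555-0100", "reader")],
    ["email", "phone", "role"],
)

# Hypothetical PII columns; hash them so originals never reach the target.
PII_COLUMNS = {"email", "phone"}
masked = df.select(
    *[F.sha2(F.col(c), 256).alias(c) if c in PII_COLUMNS else F.col(c)
      for c in df.columns]
)
masked.show(truncate=False)
```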
That’s why 98% of firms are investing in Big Data and AI initiatives. Nevertheless, coming up with a digital transformation strategy can cause some nervousness: 73% of firms consider it to be an ongoing challenge and only 38% feel they’ve created data-driven organizations to date. Digital transformation has proven benefits.
In this article, we discuss how this data is accessed, an example environment and setup to be used for data processing, sample lines of Python code to show the simplicity of data transformations using Pandas, and how this simple architecture can enable you to unlock new insights from this data yourself.
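In that spirit, a few sample lines of Pandas: deriving a column, filtering, and aggregating an illustrative DataFrame (the data is made up, not from the article):

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02"]),
    "sensor": ["a", "b", "a"],
    "reading": [1.2, 3.4, 2.1],
})

daily = (
    df.assign(day=df["timestamp"].dt.date)  # derive a calendar-day column
      .query("reading > 1.5")               # filter out low readings
      .groupby(["day", "sensor"], as_index=False)["reading"]
      .mean()                               # aggregate to daily means
)
print(daily)
```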
Typically, organizations approach generative AI POCs in one of two ways: by using third-party services, which are easy to implement but require sharing private data externally, or by developing self-hosted solutions using a mix of open-source and commercial tools. By 2023, the focus shifted towards experimentation.
The lift and shift migration approach is limited in its ability to transform businesses because it relies on outdated, legacy technologies and architectures that limit flexibility and slow down productivity. It shows a call center streaming data source that sends the latest call center feed every 15 seconds.
The solution consists of the following interfaces: IoT or mobile application – A mobile application or an Internet of Things (IoT) device allows the tracking of a company vehicle while it is in use and transmits its current location securely to the data ingestion layer in AWS. The ingestion approach is not in scope of this post.
Data is at times stored in different datasets and needs to be consolidated before meaningful and complete insights can be drawn from it. This is where replication tools help move the data from its source to the target systems in real time and transform it as necessary to help businesses with consolidation.
To grow the power of data at scale for the long term, it’s highly recommended to design an end-to-end development lifecycle for your data integration pipelines. The following are common asks from our customers: Is it possible to develop and test AWS Glue data integration jobs on my local laptop?
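One common answer to that local-development question is to factor the business logic into plain PySpark functions and unit-test them with a local SparkSession (AWS also publishes a Glue Docker image when a full GlueContext is needed). A sketch assuming pytest and pyspark are installed locally, with a hypothetical transform:

```python
import pytest
from pyspark.sql import SparkSession


def dedupe_orders(df):
    """Hypothetical transform under test: keep one row per order_id."""
    return df.dropDuplicates(["order_id"])


@pytest.fixture(scope="session")
def spark():
    return (SparkSession.builder
            .master("local[1]")
            .appName("glue-local-test")
            .getOrCreate())


def test_dedupe_orders(spark):
    df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "b")],
                               ["order_id", "sku"])
    assert dedupe_orders(df).count() == 2
```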
Amazon QuickSight is a fully managed, cloud-native business intelligence (BI) service that makes it easy to connect to your data, create interactive dashboards and reports, and share these with tens of thousands of users, either within QuickSight or embedded in your application or website.
In this post, we demonstrate how Talend easily integrates with Redshift Serverless to help you accelerate and scale data analytics with trusted data. About Redshift Serverless: Redshift Serverless makes it simple to run and scale analytics without having to manage your data warehouse infrastructure. For Port, enter 5439.
REFLECTIONS FROM THE GARTNER BI & ANALYTICS SUMMIT I hate to admit that the last time I attended the Gartner BI & Analytics Summit, Howard Dresner was still the host. It was down to Qlik, Microsoft, MicroStrategy, and Tableau to represent and discover the complexities of the College Scorecard Data from the U.S.
In recent years, driven by the commoditization of data storage and processing solutions, the industry has seen a growing number of systematic investment management firms switch to alternative data sources to drive their investment decisions. It was first opened to investors in 1995. CFM assets under management are now $13 billion.
Our growth and further expansion of our team in the region underscores the strong demand for global cloud services and data intelligence, highlighting the tremendous market opportunity for digital transformation. We are actively hiring to bolster our APAC expansion, with a full list of open roles available here.
However, you might face significant challenges when planning for a large-scale data warehouse migration. Identify all upstream and downstream applications, as well as business processes that rely on the data warehouse. Data transformation experts to convert database stored functions in the producer or consumer.
Customers often need to share data between disparate software as a service (SaaS) platforms within their organization or across organizations. On many occasions, they need to apply business logic to the data received from the source SaaS platform before pushing it to the target SaaS platform. Let’s take an example. Choose Create stack.
The Orca Platform is powered by a state-of-the-art anomaly detection system that uses cutting-edge ML algorithms and big data capabilities to detect potential security threats and alert customers in real time, ensuring maximum security for their cloud environment. Why did Orca build a data lake? Why did Orca choose Apache Iceberg?
Chances are, you've heard of the term "modern data stack" before. In this article, I will explain the modern data stack in detail, list some benefits, and discuss what the future holds. It is known to have benefits in handling data due to its robustness, speed, and scalability. Extract, Load, Transform (ELT) tools.
When you start the process of designing your data model for Amazon Keyspaces, it’s essential to possess a comprehensive understanding of your access patterns, similar to the approach used in other NoSQL databases. It empowers businesses to explore and gain insights from large volumes of data quickly.
You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes. Amazon EMR Serverless is a serverless option in Amazon EMR that makes it easy for data analysts and engineers to run open-source big data analytics frameworks without configuring, managing, and scaling clusters or servers.
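For the inserts, updates, and deletes part, Delta's MERGE is the usual mechanism. A hedged PySpark sketch using the delta-spark package; the path, keys, and data are placeholders, and a Delta table is assumed to already exist at that path:

```python
from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("delta-merge")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Hypothetical change set: order 1 was updated upstream, order 3 is new.
updates = spark.createDataFrame([(1, "shipped"), (3, "new")],
                                ["order_id", "status"])

# Placeholder path to an existing Delta table.
target = DeltaTable.forPath(spark, "/tmp/delta/orders")
(
    target.alias("t")
    .merge(updates.alias("u"), "t.order_id = u.order_id")
    .whenMatchedUpdateAll()     # update existing keys
    .whenNotMatchedInsertAll()  # insert new keys
    .execute()
)
```

A whenMatchedDelete clause handles the delete case in the same statement.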
Data product managers are in high demand these days. This makes it more important for aspiring data product managers to stay ahead of the competition. So what sets data product managers apart from the pack? This post will unpack the top 7 traits that successful data product managers have in common.
By supporting open-source frameworks and tools for code-based, automated and visual data science capabilities — all in a secure, trusted studio environment — we’re already seeing excitement from companies ready to use both foundation models and machine learning to accomplish key tasks. What is watsonx.data?
We use Apache Spark as our main data processing engine and have over 1,000 Spark applications running over massive amounts of data every day. These Spark applications implement our business logic ranging from data transformation and machine learning (ML) model inference to operational tasks. Their costs were climbing.
Healthcare is changing, and it all comes down to data. Data & analytics represents a major opportunity to tackle these challenges. Indeed, many healthcare organizations today are embracing digital transformation and using data to enhance operations. How can data help change how care is delivered?
Data has become an integral part of most companies, and the complexity of data processing is increasing rapidly with the exponential growth in the amount and variety of data. AWS Glue is a serverless data integration and ETL service with the ability to scale on demand. Under Transforms, choose SQL Query.
Amazon DataZone natively integrates with Amazon-specific options like Amazon Athena, Amazon Redshift, and Amazon SageMaker, allowing users to analyze their project-governed data. Connect to Tableau Desktop: use the Athena JDBC driver to connect Tableau to Amazon DataZone and visualize your subscribed data.