"There's a renewed focus on on-premises, on-premises private cloud, or hosted private cloud versus public cloud, especially as data-heavy workloads such as generative AI have started to push cloud spend up astronomically," adds Woo. "I'd be cautious about going down the path of private cloud hosting or on-premises," says Nag.
Your generated jobs can use a variety of data transformations, including filters, projections, unions, joins, and aggregations, giving you the flexibility to handle complex data processing requirements. In this post, we discuss how Amazon Q data integration transforms ETL workflow development.
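As a rough illustration (not the code Amazon Q itself emits), a generated AWS Glue job might express a filter, a join, and an aggregation in PySpark along these lines; the table paths and column names here are hypothetical:

    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session

    # Hypothetical inputs for illustration only
    orders = spark.read.parquet("s3://example-bucket/orders/")
    customers = spark.read.parquet("s3://example-bucket/customers/")

    result = (
        orders.filter(orders.status == "shipped")             # filter
              .join(customers, on="customer_id", how="inner") # join
              .groupBy("region")                              # aggregation
              .agg({"order_total": "sum"})
    )
    result.write.mode("overwrite").parquet("s3://example-bucket/curated/orders_by_region/")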
For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. In addition to real-time analytics and visualization, the data needs to be shared for long-term data analytics and machine learning applications.
Together with price-performance, Amazon Redshift offers capabilities such as a serverless architecture, machine learning integration within your data warehouse, and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments. Note that Redshift's default port is 5439.
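As a minimal sketch of that connection setup (host, credentials, and database names are placeholders), a Python client would point at that port like so:

    import redshift_connector  # pip install redshift_connector

    conn = redshift_connector.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,  # Redshift's default port
        database="dev",
        user="awsuser",
        password="example-password",
    )
    cursor = conn.cursor()
    cursor.execute("SELECT current_date")
    print(cursor.fetchone())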
However, with all good things come many challenges, and businesses often struggle with managing their information in the correct way. Oftentimes, the data being collected and used is incomplete or damaged, leading to many other issues that can considerably harm the company. Enter data quality management.
To work effectively, big data requires a large number of high-quality information sources. Where is all of that data going to come from? Proactivity: another key benefit of big data in the logistics industry is that it encourages informed, proactive decision-making.
The company stores vast amounts of transactional data, customer information, and product catalogs in Snowflake. However, it also generates and collects data from various other sources, such as web logs stored in Amazon S3, social media platforms, and third-party data providers.
The currently available source choices for the Amazon Redshift COPY command include Amazon Simple Storage Service (Amazon S3), Amazon EMR, Amazon DynamoDB, and remote hosts over SSH. This native feature of Amazon Redshift uses massively parallel processing (MPP) to load objects directly from data sources into Redshift tables.
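As a hedged sketch using the Redshift Data API (the cluster, role, bucket, and table names are placeholders), a COPY from Amazon S3 might look like this:

    import boto3

    client = boto3.client("redshift-data")
    client.execute_statement(
        ClusterIdentifier="example-cluster",
        Database="dev",
        DbUser="awsuser",
        Sql="""
            COPY public.sales
            FROM 's3://example-bucket/sales/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole'
            FORMAT AS PARQUET
        """,
    )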
In addition to using native managed AWS services that BMS didn't need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
Configure a Secrets Manager secret
Next, we set up Secrets Manager, a supported option for storing Snowflake connection information and credentials. To create the connection string, the Snowflake host and account name are required. The account, host, user, password, and warehouse can differ based on your setup.
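A minimal sketch of reading such a secret at runtime, assuming a hypothetical secret named snowflake/connection whose JSON body carries those keys:

    import json
    import boto3

    secret = boto3.client("secretsmanager").get_secret_value(SecretId="snowflake/connection")
    creds = json.loads(secret["SecretString"])

    # Assumed key layout: account, host, user, password, warehouse
    connection_args = {
        "account": creds["account"],
        "host": creds["host"],
        "user": creds["user"],
        "password": creds["password"],
        "warehouse": creds["warehouse"],
    }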
By treating the data as a product, the outcome is a reusable asset that outlives a project and meets the needs of the enterprise consumer. Consumer feedback and demand drives creation and maintenance of the data product.
Duplicating data from a production database to a lower or lateral environment and masking personally identifiable information (PII) to comply with regulations enables development, testing, and reporting without impacting critical systems or exposing sensitive customer data. See AWS Glue: How it works for further details.
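As one hedged approach (the column names are hypothetical, and hashing is only one of several masking strategies), PII columns can be hashed or redacted in PySpark before the copy lands in the lower environment:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import sha2, col, lit

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.parquet("s3://example-bucket/prod-export/customers/")  # assumed path

    masked = (
        df.withColumn("email", sha2(col("email"), 256))  # one-way hash keeps joinability
          .withColumn("ssn", sha2(col("ssn"), 256))
          .withColumn("phone", lit("REDACTED"))          # full redaction
    )
    masked.write.mode("overwrite").parquet("s3://example-bucket/test-env/customers/")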
Access to an SFTP server with permissions to upload and download data. If the SFTP server is hosted on Amazon Elastic Compute Cloud (Amazon EC2) , we recommend that the network communication between the SFTP server and the AWS Glue job happens within the virtual private cloud (VPC) as pictured in the preceding architecture diagram.
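Inside the Glue job, the SFTP exchange itself might be handled with a library such as paramiko (packaged via --additional-python-modules); the endpoint, paths, and credentials below are placeholders:

    import paramiko

    # In the VPC setup described above, this would be the EC2 host's private address
    transport = paramiko.Transport(("sftp.example.internal", 22))
    transport.connect(username="example-user", password="example-password")
    sftp = paramiko.SFTPClient.from_transport(transport)
    sftp.get("/uploads/daily_extract.csv", "/tmp/daily_extract.csv")  # download
    sftp.put("/tmp/results.csv", "/uploads/results.csv")              # upload
    sftp.close()
    transport.close()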
Data ingestion must be done properly from the start, as mishandling it can lead to a host of new issues. The groundwork of training data in an AI model is comparable to piloting an airplane. ELT tools such as IBM® DataStage® facilitate fast and secure transformations through parallel processing engines.
Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources. Performing data transformations through stored procedures and using materialized views to curate datasets and generate insights is a well-known pattern with relational databases.
However, as a data team member, you know how important data integrity (and a whole host of other aspects of data management) is. In this article, we’ll dig into the core aspects of data integrity, what processes ensure it, and how to deal with data that doesn’t meet your standards.
FINRA centralizes all its data in Amazon Simple Storage Service (Amazon S3) with a remote Hive metastore on Amazon Relational Database Service (Amazon RDS) to manage their metadata information.

    export HOST=$(aws secretsmanager get-secret-value --secret-id $secret_name --query SecretString --output text | jq -r '.host')
    export PASSWORD=$(aws secretsmanager get-secret-value --secret-id $secret_name --query SecretString --output text | jq -r '.password')
The QuickSight SDK v2.0 offers increased visibility by providing new experience-specific information and warnings within the SDK. Additionally, SDK v2.0 supports a new callback, onChange, which returns eventNames along with corresponding eventCodes to indicate errors, warnings, or information from the SDK.
By combining historical vehicle location data with information from other sources, the company can devise empirical approaches for better decision-making. For example, the company’s procurement team can use this information to make decisions about which vehicles to prioritize for replacement before policy changes go into effect.
For Host, enter the Redshift Serverless endpoint's host URL. The output component defines that the data being processed in the job's workflow will land in Redshift Serverless. For more information on how to connect to a database, refer to tDBConnection.
A major risk is data exposure — AI systems must be designed to align with company ethics and meet strict regulatory standards without compromising functionality. Ensuring that AI systems prevent breaches of client confidentiality, personally identifiable information (PII), and data security is crucial for mitigating these risks.
However, you might face significant challenges when planning for a large-scale data warehouse migration. Discovery of workload and integrations Conducting discovery and assessment for migrating a large on-premises data warehouse to Amazon Redshift is a critical step in the migration process.
Oracle GoldenGate for Oracle Database and Big Data adapters
Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, and high availability. Refer to Amazon EBS-optimized instance types for more information.
The data products from the Business Vault and Data Mart stages are now available for consumers. smava decided to use Tableau for business intelligence, data visualization, and further analytics. The data transformations are managed with dbt to simplify workflow governance and team collaboration.
For example, the Flink FileSystem connector has FileSystemTableFactory to read/write data in Hadoop Distributed File System (HDFS) or Amazon Simple Storage Service (Amazon S3), the Flink HBase connector has HBase2DynamicTableFactory to read/write data in HBase, and the Flink Kafka connector has KafkaDynamicTableFactory to read/write data in Kafka.
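To make the factory resolution concrete, here is a minimal PyFlink sketch in which 'connector' = 'kafka' is what routes the table to KafkaDynamicTableFactory at runtime; the topic and broker address are placeholders:

    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    t_env.execute_sql("""
        CREATE TABLE clicks (
            user_id STRING,
            url     STRING
        ) WITH (
            'connector' = 'kafka',
            'topic' = 'clicks',
            'properties.bootstrap.servers' = 'broker.example:9092',
            'format' = 'json',
            'scan.startup.mode' = 'earliest-offset'
        )
    """)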
Simply put, enterprises are increasingly seeking ways to take better advantage of their data and analytics to make data-informed decisions, strengthen the customer experience, and capitalize on cost-saving opportunities.
Furthermore, these tools boast customization options, allowing users to tailor data sources to address areas critical to their business success, thereby generating actionable insights and customizable reports.
Best BI Tools for Data Analysts
Key Features: Extensive library of pre-built connectors for diverse data sources.
Trials are short projects, usually taking up to several months; the output of a trial is a buy (or no-buy) decision, made if we detect information in the dataset that can help us in our investment process. Joel has led data transformation projects on fraud analytics, claims automation, and Master Data Management.
Unity features include: built-in search and discovery; a simple model to control access to data via a UI or SQL; automatic tracking of data lineage across queries executed in any language; an information schema in the Lakehouse; and much more! … The Power of Partnership to Accelerate Data Transformation.
On many occasions, they need to apply business logic to the data received from the source SaaS platform before pushing it to the target SaaS platform. Let's take an example: AnyCompany's marketing team hosted an event at the Anaheim Convention Center, CA, and created leads based on the event in Adobe Marketo.
The system ingests data from various sources such as cloud resources, cloud activity logs, and API access logs, and processes billions of messages, resulting in terabytes of data daily. This data is sent to Apache Kafka, which is hosted on Amazon Managed Streaming for Apache Kafka (Amazon MSK).
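A minimal sketch of a producer sending one such message to an MSK-hosted topic (the broker endpoint, topic, and payload are all placeholders):

    import json
    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(
        bootstrap_servers="b-1.example-msk.amazonaws.com:9092",
        value_serializer=lambda record: json.dumps(record).encode("utf-8"),
    )
    producer.send("cloud-activity-logs", {"source": "api-access-log", "action": "GetObject"})
    producer.flush()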
The Delta tables created by the EMR Serverless application are exposed through the AWS Glue Data Catalog and can be queried through Amazon Athena. Data ingestion – Steps 1 and 2 use AWS DMS, which connects to the source database and moves full and incremental data (CDC) to Amazon S3 in Parquet format.
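A hedged sketch of querying one of those tables through Athena (the database, table, and results bucket are placeholders):

    import boto3

    athena = boto3.client("athena")
    athena.start_query_execution(
        QueryString="SELECT * FROM example_delta_table LIMIT 10",
        QueryExecutionContext={"Database": "example_db"},
        ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
    )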
In this blog, we’ll delve into the critical role of governance and data modeling tools in supporting a seamless data mesh implementation and explore how erwin tools can be used in that role. erwin also provides data governance, metadata management and data lineage software called erwin Data Intelligence by Quest.
Leaders are asking how they might use data to drive smarter decision making to support this new model and improve medical treatments that lead to better outcomes. Healthcare organizations need to manage and protect sensitive information in a consistent, secure, and organized way. More and more companies are handling such data.
These encoder-only architecture models are fast and effective for many enterprise NLP tasks, such as classifying customer feedback and extracting information from large documents. While they require task-specific labeled data for fine-tuning, they also offer clients the best cost-performance trade-off for non-generative use cases.
Data literacy — Employees can interpret and analyze data to draw logical conclusions; they can also identify subject matter experts best equipped to educate on specific data assets. Data governance is a key use case of the modern data stack. Examples of data transformation tools include dbt and Dataform.
Having the right tools is essential for any successful data product manager focused on enterprise data transformation. When choosing the tools for a project, whether it be the CIO, CDO, or data product managers themselves, the buyers must see the big picture. They work hard to collect feedback and user data.
We use Apache Spark as our main data processing engine and have over 1,000 Spark applications running over massive amounts of data every day. These Spark applications implement our business logic ranging from data transformation and machine learning (ML) model inference to operational tasks. Their costs were climbing.
Customers often use many SQL scripts to select and transform the data in relational databases hosted either in an on-premises environment or on AWS and use custom workflows to manage their ETL. AWS Glue is a serverless data integration and ETL service with the ability to scale on demand.
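As a rough sketch of that migration pattern (the connection details and the SQL itself are placeholders), an existing SQL script can often be reused largely unchanged inside a Glue job via spark.sql:

    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session

    # Register the source table, then run the legacy SQL unchanged
    spark.read.jdbc(
        url="jdbc:postgresql://db.example.internal:5432/sales",
        table="public.orders",
        properties={"user": "example-user", "password": "example-password"},
    ).createOrReplaceTempView("orders")

    transformed = spark.sql("SELECT region, SUM(total) AS revenue FROM orders GROUP BY region")
    transformed.write.mode("overwrite").parquet("s3://example-bucket/curated/revenue/")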
But Barnett, who started work on a strategy in 2023, wanted to continue using Baptist Memorial's on-premises data center for financial, security, and continuity reasons, so he and his team explored options that allowed for keeping that data center as part of the mix. "It's an exciting new way of finding information."
As the rapid spread of COVID-19 continues, data managers around the world are pulling together a wide variety of global data sources to inform governments, the private sector, and the public with the latest on the spread of this disease. But this type of globally aggregated data doesn’t just appear all on its own.
It eliminates the need for third-party tools to ingest data into your OpenSearch service setup. You simply configure your data sources to send information to OpenSearch Ingestion, which then automatically delivers the data to your specified destination.
Aggregated views of information may come from a department, function, or entire organization. These systems are designed for people whose primary job is data analysis. The data may come from multiple systems or aggregated views, but the output is a centralized overview of information. Who Uses Embedded Analytics?