Think about what the model results tell you: “Maybe a random forest isn’t the best tool to split this data, but XLNet is.” If none of your models performed well, that tells you that your dataset (your choice of raw data, feature selection, and feature engineering) is not amenable to machine learning.
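To make that point concrete, here is a minimal scikit-learn sketch of such a model bake-off; the dataset is synthetic and the model choices are illustrative, not from the article:

```python
# Compare a few model families on the same data with cross-validation.
# The dataset here is synthetic; substitute your own X and y.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "random_forest": RandomForestClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")

# If every model hovers near chance, revisit the raw data, feature
# selection, and feature engineering before blaming the algorithms.
```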
The following requirements were essential in EUROGATE's decision to adopt a modern data mesh architecture. Domain-oriented ownership and data-as-a-product: EUROGATE aims to enable scalable and straightforward data sharing across organizational boundaries and to eliminate centralized bottlenecks and complex data pipelines.
There's a renewed focus on on-premises, on-premises private cloud, or hosted private cloud versus public cloud, especially as data-heavy workloads such as generative AI have started to push cloud spend up astronomically, adds Woo. "I'd be cautious about going down the path of private cloud hosting or on-premises," says Nag.
Third, some services require you to set up and manage compute resources used for federated connectivity, and capabilities like connection testing and data preview aren't available in all services. To solve these challenges, we launched Amazon SageMaker Lakehouse unified data connectivity. For Add data source, choose Add connection.
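As a hedged illustration of what such a connection looks like programmatically, the sketch below registers a JDBC connection through the AWS Glue API, which underpins this connectivity; the connection name, URL, and secret are placeholders, not values from the article:

```python
# Register a federated JDBC connection in AWS Glue. The name, JDBC URL,
# and Secrets Manager secret ID below are all hypothetical.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_connection(
    ConnectionInput={
        "Name": "my-federated-source",
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:postgresql://db.example.com:5432/sales",
            "SECRET_ID": "my-db-credentials",
        },
    }
)
```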
Within seconds of transactional data being written into Amazon Aurora (a fully managed modern relational database service offering performance and high availability at scale), the data is seamlessly made available in Amazon Redshift for analytics and machine learning. Enter a name for your repo, such as dbt-zeroetl.
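For readers who prefer the API to the console, a minimal sketch of creating such a zero-ETL integration with boto3 follows; the integration name and both ARNs are placeholders:

```python
# Create an Aurora -> Amazon Redshift zero-ETL integration.
# The name and ARNs are hypothetical; substitute your own resources.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_integration(
    IntegrationName="aurora-to-redshift",
    SourceArn="arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster",
    TargetArn="arn:aws:redshift-serverless:us-east-1:123456789012:namespace/my-namespace",
)
```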
By treating the data as a product, you create a reusable asset that outlives a project and meets the needs of the enterprise consumer. Consumer feedback and demand drives creation and maintenance of the data product. Her special areas of interest are data analytics, machine learning/AI, and application modernization.
By using AWS Glue to integrate data from Snowflake, Amazon S3, and SaaS applications, organizations can unlock new opportunities in generative artificial intelligence (AI), machine learning (ML), business intelligence (BI), and self-service analytics, or feed data to underlying applications.
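A hedged sketch of such a Glue job follows: it reads a Snowflake table through a pre-created Glue connection and lands it in Amazon S3 as Parquet. The connection name, database, table, and bucket are assumptions, and the Snowflake option keys may vary by Glue version:

```python
# Glue PySpark job: Snowflake table -> S3 (Parquet).
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read via a pre-created Glue Snowflake connection (name is hypothetical).
frame = glue_context.create_dynamic_frame.from_options(
    connection_type="snowflake",
    connection_options={
        "connectionName": "snowflake-conn",
        "sfDatabase": "SALES",
        "sfSchema": "PUBLIC",
        "dbtable": "ORDERS",
    },
)

# Write the result to S3 as Parquet (bucket and path are placeholders).
glue_context.write_dynamic_frame.from_options(
    frame=frame,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/orders/"},
    format="parquet",
)

job.commit()
```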
In today’s data-driven world, seamless integration and transformation of data across diverse sources into actionable insights is paramount. With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog. Choose Store a new secret.
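The "Store a new secret" step can also be done programmatically; here is a minimal boto3 sketch, with the secret name and credentials as placeholders:

```python
# Store data source credentials in AWS Secrets Manager.
# The secret name and values are hypothetical examples.
import json

import boto3

secrets = boto3.client("secretsmanager", region_name="us-east-1")

secrets.create_secret(
    Name="my-datasource-credentials",
    SecretString=json.dumps({"username": "etl_user", "password": "example-only"}),
)
```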
The currently available choices include: The Amazon Redshift COPY command can load data from Amazon Simple Storage Service (Amazon S3), Amazon EMR, Amazon DynamoDB, or remote hosts over SSH. This native feature of Amazon Redshift uses massively parallel processing (MPP) to load objects directly from data sources into Redshift tables.
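As a sketch of the S3 path, the snippet below issues a COPY through the Redshift Data API; the workgroup, table, bucket, and IAM role ARN are placeholders:

```python
# Run a COPY from Amazon S3 into a Redshift table via the Redshift Data API.
# Workgroup, database, table, bucket, and role ARN are hypothetical.
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

rsd.execute_statement(
    WorkgroupName="my-workgroup",  # Redshift Serverless workgroup
    Database="dev",
    Sql="""
        COPY sales
        FROM 's3://my-bucket/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS PARQUET;
    """,
)
```

Because COPY fans the load out across slices, splitting the input into multiple files generally loads faster than one large object.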
In addition to using native managed AWS services that BMS didn't need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users who could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
In addition, data pipelines include more and more stages, making it difficult for data engineers to compile, manage, and troubleshoot those analytical workloads. As a result, alternative data integration technologies (e.g., …) have emerged. Limited flexibility to use more complex hosting models (e.g., public, private, or hybrid cloud)?
According to Evanta’s 2022 CIO Leadership Perspectives study, CIOs’ second top priority within the IT function is around data and analytics, with CIOs seeing advancing organizational use of data as key to reaching enterprise objectives. Others also list data initiatives as a top issue for CIOs.
A lakehouse should make it easy to combine new data from a variety of different sources with mission-critical data about customers and transactions that resides in existing repositories. New insights are found in the combination of new data with existing data, and in the identification of new relationships.
Typically, organizations approach generative AI POCs in one of two ways: by using third-party services, which are easy to implement but require sharing private data externally, or by developing self-hosted solutions using a mix of open-source and commercial tools.
He is passionate about shaping the future of infusing data-rich experiences into the products and applications we use every day. Rohit brings a wealth of experience in analytics and machine learning from having worked with leading data companies and their customers.
One key component that plays a central role in modern data architectures is the data lake, which allows organizations to store and analyze large amounts of data in a cost-effective manner and run advanced analytics and machine learning (ML) at scale. This ensures that the data is suitable for training purposes.
Redshift Serverless automatically provisions and intelligently scales data warehouse capacity to deliver fast performance for even the most demanding and unpredictable workloads, and you pay only for what you use. For Host, enter the Redshift Serverless endpoint's host URL. For Port, enter 5439 (the Redshift default). This is optional.
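Those Host and Port values map directly onto a driver connection; here is a minimal sketch with the redshift_connector package, using a placeholder endpoint and credentials:

```python
# Connect to Redshift Serverless with the redshift_connector driver.
# Endpoint, database, and credentials are hypothetical.
import redshift_connector

conn = redshift_connector.connect(
    host="my-workgroup.123456789012.us-east-1.redshift-serverless.amazonaws.com",
    port=5439,  # Redshift's default port
    database="dev",
    user="awsuser",
    password="example-only",
)

cur = conn.cursor()
cur.execute("SELECT current_user;")
print(cur.fetchone())
```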
Solution overview: Typically, you have multiple accounts to manage and provision resources for your data pipeline. Every time the business requirement changes (such as adding data sources or changing data transformation logic), you make changes on the AWS Glue app stack and re-provision the stack to reflect your changes.
IBM watsonx.ai is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. Automated development: Automates data preparation, model development, feature engineering, and hyperparameter optimization using AutoAI.
On many occasions, they need to apply business logic to the data received from the source SaaS platform before pushing it to the target SaaS platform. Let's take an example: AnyCompany's marketing team hosted an event at the Anaheim Convention Center in Anaheim, CA. Step Functions is used to orchestrate the data processing.
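A minimal sketch of triggering that orchestration with boto3 follows; the state machine ARN and input payload are illustrative, not from the article:

```python
# Start the Step Functions workflow that applies business logic between
# the source and target SaaS platforms. ARN and payload are hypothetical.
import json

import boto3

sfn = boto3.client("stepfunctions", region_name="us-east-1")

sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:SaaSDataSync",
    input=json.dumps({"event": "anaheim-marketing-event", "source": "crm"}),
)
```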
Furthermore, these tools boast customization options, allowing users to tailor data sources to address areas critical to their business success, thereby generating actionable insights and customizable reports. Best BI Tools for Data Analysts. Key Features: Extensive library of pre-built connectors for diverse data sources.
Migrating to Amazon Redshift offers organizations the potential for improved price-performance, enhanced data processing, faster query response times, and better integration with technologies such as machine learning (ML) and artificial intelligence (AI). Platform architects define a well-architected platform.
We use Apache Spark as our main data processing engine and have over 1,000 Spark applications running over massive amounts of data every day. These Spark applications implement our business logic, ranging from data transformation and machine learning (ML) model inference to operational tasks.
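To give a flavor of that business logic, here is a minimal PySpark transformation sketch; the input path, columns, and aggregation are assumptions for illustration:

```python
# A representative Spark transformation: raw events -> daily counts.
# Paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-transform").getOrCreate()

events = spark.read.parquet("s3://my-bucket/raw/events/")

daily = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "event_type")
    .agg(F.count("*").alias("event_count"))
)

daily.write.mode("overwrite").parquet("s3://my-bucket/curated/daily_events/")
```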
They are knowledgeable about the different tools and technologies used in the industry, such as programming languages, data processing frameworks, machine learning libraries, databases, and analytics platforms for call history. They know how to choose the best tools for the job and use them effectively.
Customers often use many SQL scripts to select and transform the data in relational databases hosted either in an on-premises environment or on AWS, and they use custom workflows to manage their ETL. AWS Glue is a serverless data integration and ETL service with the ability to scale on demand.
To counter bad actors, TCS decided to deploy automation, artificial intelligence, and machine learning, resulting in a more sophisticated, AI-assisted enterprise defense. Options included hosting a secondary data center, outsourcing business continuity to a vendor, and establishing private cloud solutions.
This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why Data Mapping is Important: Data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.
Strategic Objective: Create a complete, user-friendly view of the data by preparing it for analysis. Requirement: Multi-Source Data Blending: Data from multiple sources is compiled and the output is a single view, metric, or visualization. Data Transformation and Enrichment: Data can be enriched for analysis.
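To ground the blending and enrichment requirements, here is a minimal pandas sketch that joins two sources into a single view and derives a metric; the data and column names are invented for illustration:

```python
# Multi-source blending: join CRM and order data, then enrich with a
# derived total_spend metric. All data here is illustrative.
import pandas as pd

crm = pd.DataFrame({"customer_id": [1, 2], "region": ["West", "East"]})
orders = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [120.0, 80.0, 45.0]})

blended = (
    orders.merge(crm, on="customer_id", how="left")
          .groupby(["customer_id", "region"], as_index=False)["amount"]
          .sum()
          .rename(columns={"amount": "total_spend"})
)

print(blended)  # one row per customer: a single blended, enriched view
```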
Solution walkthrough: The solution architecture is shown in the preceding figure and includes Amazon EC2 to host and run a Jenkins build server, and continuous integration and delivery (CI/CD) for data processing. Data engineers can define the underlying data processing job within a JSON template.
Amazon EMR has long been the leading solution for processing big data in the cloud. Amazon EMR is the industry-leading big data solution for petabyte-scale data processing, interactive analytics, and machine learning using over 20 open source frameworks such as Apache Hadoop, Hive, and Apache Spark.