Table of Contents: 1) Benefits Of Big Data In Logistics 2) 10 Big Data In Logistics Use Cases. Big data is revolutionizing many fields of business, and logistics analytics is no exception. The complex and ever-evolving nature of logistics makes it an essential use case for big data applications.
Your generated jobs can use a variety of data transformations, including filters, projections, unions, joins, and aggregations, giving you the flexibility to handle complex data processing requirements. In this post, we discuss how Amazon Q data integration transforms ETL workflow development.
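The post does not show the generated code itself, but a minimal PySpark sketch of such a chain of transformations might look like the following; the table names, column names, and S3 paths are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("generated-etl-sketch").getOrCreate()

# Hypothetical source datasets
orders = spark.read.parquet("s3://example-bucket/orders/")
customers = spark.read.parquet("s3://example-bucket/customers/")

result = (
    orders
    .filter(F.col("status") == "SHIPPED")            # filter
    .select("order_id", "customer_id", "amount")     # projection
    .join(customers, on="customer_id", how="inner")  # join
    .groupBy("region")
    .agg(F.sum("amount").alias("total_amount"))      # aggregation
)

result.write.mode("overwrite").parquet("s3://example-bucket/curated/orders_by_region/")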
With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of. On your project, in the navigation pane, choose Data. For Add data source, choose Add connection. Choose the plus sign.
Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments. Choose Create.
The applications are hosted in dedicated AWS accounts and require a BI dashboard and reporting services based on Tableau. With a unified catalog, enhanced analytics capabilities, and efficient data transformation processes, we're laying the groundwork for future growth. She can be reached via LinkedIn.
With quality data at their disposal, organizations can form data warehouses for the purposes of examining trends and establishing future-facing strategies. Industry-wide, the positive ROI on quality data is well understood. This means there are no unintended data errors, and it corresponds to its appropriate designation (e.g.,
This involves creating VPC endpoints in both the AWS and Snowflake VPCs, making sure data transfer remains within the AWS network. Use Amazon Route 53 to create a private hosted zone that resolves the Snowflake endpoint within your VPC. Refer to Editing AWS Glue managed data transform nodes for more information.
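As a rough illustration of that setup, the following boto3 sketch creates an interface VPC endpoint and a private hosted zone; the service name, VPC and subnet IDs, and hostname are placeholders for values from your own Snowflake and AWS configuration.

import boto3

ec2 = boto3.client("ec2")
route53 = boto3.client("route53")

# Interface endpoint for the Snowflake PrivateLink service (service name is provided by Snowflake)
endpoint = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.vpce.us-east-1.vpce-svc-example",
    SubnetIds=["subnet-0123456789abcdef0"],
)

# Private hosted zone so the Snowflake hostname resolves inside the VPC
zone = route53.create_hosted_zone(
    Name="example-account.snowflakecomputing.com",
    CallerReference="snowflake-privatelink-001",
    VPC={"VPCRegion": "us-east-1", "VPCId": "vpc-0123456789abcdef0"},
)

You would then add a record in that zone pointing the Snowflake hostname at the endpoint's DNS name, so traffic never leaves the AWS network.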
Access to an SFTP server with permissions to upload and download data. If the SFTP server is hosted on Amazon Elastic Compute Cloud (Amazon EC2) , we recommend that the network communication between the SFTP server and the AWS Glue job happens within the virtual private cloud (VPC) as pictured in the preceding architecture diagram.
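As a hedged sketch of that interaction, a Glue Python job could reach the SFTP server with paramiko (supplied, for example, through --additional-python-modules); the host, credentials, and file paths below are placeholders.

import paramiko

# Connect to the EC2-hosted SFTP server over its private address within the VPC
transport = paramiko.Transport(("10.0.1.25", 22))
transport.connect(username="etl_user", password="example-password")
sftp = paramiko.SFTPClient.from_transport(transport)

sftp.get("/uploads/daily_extract.csv", "/tmp/daily_extract.csv")  # download input for the job
sftp.put("/tmp/results.csv", "/processed/results.csv")            # upload processed output

sftp.close()
transport.close()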
To run HiveQL-based data workloads with Spark on Kubernetes mode, engineers must embed their SQL queries into programmatic code such as PySpark, which requires additional effort to manually change code. export HOST=$(aws secretsmanager get-secret-value --secret-id $secret_name --query SecretString --output text | jq -r '.host') export PASSWORD=$(aws secretsmanager get-secret-value --secret-id $secret_name --query SecretString --output text | jq -r '.password')
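A minimal sketch of what that embedding looks like, assuming a reachable Hive metastore and using hypothetical table names:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hiveql-on-spark")
    .enableHiveSupport()   # lets spark.sql() resolve Hive tables through the metastore
    .getOrCreate()
)

# The original HiveQL statement, wrapped in programmatic PySpark code
df = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales.transactions
    GROUP BY customer_id
""")
df.write.mode("overwrite").saveAsTable("sales.customer_totals")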
We all know that data is becoming more and more essential for businesses, as the volume of data keeps growing. Dresner reported that nearly 97% of respondents in their Big Data Analytics Market Study consider big data to be either important or critical to their businesses.
The currently available choices include: The Amazon Redshift COPY command can load data from Amazon Simple Storage Service (Amazon S3), Amazon EMR , Amazon DynamoDB , or remote hosts over SSH. This native feature of Amazon Redshift uses massive parallel processing (MPP) to load objects directly from data sources into Redshift tables.
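For illustration, a COPY from Amazon S3 could be issued through the Redshift Data API as sketched below; the workgroup, database, table, bucket, and IAM role are placeholders.

import boto3

redshift_data = boto3.client("redshift-data")

copy_sql = """
    COPY sales.orders
    FROM 's3://example-bucket/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS PARQUET;
"""

response = redshift_data.execute_statement(
    WorkgroupName="example-workgroup",  # or ClusterIdentifier for a provisioned cluster
    Database="dev",
    Sql=copy_sql,
)
print(response["Id"])  # statement ID you can poll with describe_statement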
In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.
To create the connection string, the Snowflake host and account name are required. Using the worksheet, run the following SQL commands to find the host and account name. The account, host, user, password, and warehouse can differ based on your setup. Choose Next. For Secret name, enter airflow/connections/snowflake_accountadmin.
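A sketch of storing those connection details as the secret named above with boto3 follows; every value shown is a placeholder for your own account.

import json
import boto3

secrets = boto3.client("secretsmanager")

secrets.create_secret(
    Name="airflow/connections/snowflake_accountadmin",
    SecretString=json.dumps({
        "conn_type": "snowflake",
        "host": "example-account.snowflakecomputing.com",  # host found with the SQL lookup
        "login": "ACCOUNTADMIN_USER",
        "password": "example-password",
        "extra": json.dumps({"account": "example-account", "warehouse": "COMPUTE_WH"}),
    }),
)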
The Orca Platform is powered by a state-of-the-art anomaly detection system that uses cutting-edge ML algorithms and big data capabilities to detect potential security threats and alert customers in real time, ensuring maximum security for their cloud environment. This ensures that the data is suitable for training purposes.
Data ingestion must be done properly from the start, as mishandling it can lead to a host of new issues. The groundwork of training data in an AI model is comparable to piloting an airplane. ELT tools such as IBM® DataStage® facilitate fast and secure transformations through parallel processing engines.
Solution overview: Typically, you have multiple accounts to manage and provision resources for your data pipeline. Every time the business requirement changes (such as adding data sources or changing data transformation logic), you make changes on the AWS Glue app stack and re-provision the stack to reflect your changes.
However, you might face significant challenges when planning for a large-scale data warehouse migration. Data engineers are crucial for schema conversion and data transformation, and DBAs can handle cluster configuration and workload monitoring. Platform architects define a well-architected platform.
Although we explored the option of using AWS managed notebooks to streamline the provisioning process, we have decided to continue hosting these components on our on-premises infrastructure for the current timeline. Joel has led data transformation projects on fraud analytics, claims automation, and Master Data Management.
Solution overview: The following diagram illustrates the solution architecture. The solution uses AWS Glue as an ETL engine to extract data from the source Amazon RDS database. Built-in data transformations then scrub columns containing PII using pre-defined masking functions. See JDBC connections for further details.
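The post relies on Glue's built-in transforms for this, but a simplified PySpark stand-in for the masking step could look like the following; the column names and masking rules are hypothetical.

from pyspark.sql import functions as F

def mask_pii(df):
    return (
        df
        # keep only the last four digits of the phone number
        .withColumn("phone", F.regexp_replace("phone", r"\d(?=\d{4})", "*"))
        # redact the local part of the email address
        .withColumn("email", F.regexp_replace("email", r"^[^@]+", "****"))
        # replace the SSN column entirely
        .withColumn("ssn", F.lit("***-**-****"))
    )

# source_df is assumed to be the DataFrame extracted from the RDS source earlier in the job
masked_df = mask_pii(source_df)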
Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources. Performing data transformations through stored procedures and using materialized views to curate datasets and generate insights is a known pattern with relational databases.
REFLECTIONS FROM THE GARTNER BI & ANALYTICS SUMMIT I hate to admit that the last time I attended the Gartner BI & Analytics Summit, Howard Dresner was still the host. Alation helps analysts find, understand and use their data. Everything you need to do to prepare for analysis before data transformation and visualization.
You can also use the data transformation feature of Data Firehose to invoke a Lambda function to perform data transformation in batches. Query the data using Athena Athena is a serverless, interactive analytics service built to analyze unstructured, semi-structured, and structured data where it is hosted.
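A minimal Lambda handler of the shape Data Firehose expects for record transformation is sketched below: each record is base64-decoded, modified, and returned with a result status. The enrichment shown is only a placeholder for real business logic.

import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["source"] = "firehose-transform"  # example enrichment
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(json.dumps(payload).encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}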
The following eventNames and eventCodes are returned as part of the onChange callback when there is a change in the SDK code status. (The excerpted error-handling snippet appends 'Unable to load Dashboard at this time.' when the dashboard cannot be loaded.) Monitor interactions in embedded dashboards: another callback supported by SDK v2.0
For Host, enter the Redshift Serverless endpoint’s host URL. In addition to Talend Cloud for enterprise-level data transformation needs, you could also use Talend Stitch to handle data ingestion and data replication to Redshift Serverless.
Extract, Load, Transform (ELT) tools. Data ingestion/integration services. Data orchestration tools. Reverse ETL tools. These tools are used to manage big data, which is defined as data that is too large or complex to be processed by traditional means. How Did the Modern Data Stack Get Started?
You can then apply transformations and store data in Delta format for managing inserts, updates, and deletes. Amazon EMR Serverless is a serverless option in Amazon EMR that makes it easy for data analysts and engineers to run open-source big data analytics frameworks without configuring, managing, and scaling clusters or servers.
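A hedged sketch of handling inserts and updates with Delta Lake in such a Spark job follows; the S3 paths and key column are placeholders, and the Delta libraries are assumed to be available to the application.

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-upsert-sketch").getOrCreate()

updates = spark.read.parquet("s3://example-bucket/staging/customer_updates/")
target = DeltaTable.forPath(spark, "s3://example-bucket/delta/customers/")

(
    target.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()      # apply updates to existing rows
    .whenNotMatchedInsertAll()   # insert new rows
    .execute()
)

Deletes follow the same pattern with a whenMatchedDelete clause keyed on a delete flag in the incoming data.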
On many occasions, they need to apply business logic to the data received from the source SaaS platform before pushing it to the target SaaS platform. Let’s take an example: AnyCompany’s marketing team hosted an event at the Anaheim Convention Center, CA. The marketing team created leads based on the event in Adobe Marketo.
You simply configure your data sources to send information to OpenSearch Ingestion, which then automatically delivers the data to your specified destination. Additionally, you can configure OpenSearch Ingestion to apply data transformations before delivery.
We use Apache Spark as our main data processing engine and have over 1,000 Spark applications running over massive amounts of data every day. These Spark applications implement our business logic, ranging from data transformation and machine learning (ML) model inference to operational tasks. Their costs were climbing.
It uses not just open-source technologies, but those with open governance and broad and diverse communities of users and contributors, like Apache Iceberg and Presto, which is hosted by the Linux Foundation.
Customers often use many SQL scripts to select and transform the data in relational databases hosted either in an on-premises environment or on AWS and use custom workflows to manage their ETL. AWS Glue is a serverless data integration and ETL service with the ability to scale on demand.
Oracle GoldenGate for Oracle Database and Big Data adapters: Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, and high availability. Configure GoldenGate for Oracle Database and extract data from the Oracle database to trail files.
In the Driver Properties section, enter the parameters that you captured from Amazon DataZone: CredentialsProvider, the credentials provider to authenticate requests to AWS; DataZoneDomainId, the ID of your Amazon DataZone domain; and DataZoneDomainRegion, the AWS Region where your domain is hosted. Lionel Pulickal is Sr.
Treating the data as a product yields a reusable asset that outlives a project and meets the needs of the enterprise consumer. Consumer feedback and demand drive creation and maintenance of the data product.
In addition, more data is becoming available for processing and enrichment of existing and new use cases; for example, we have recently experienced rapid growth in data collection at the edge and an increase in the availability of frameworks for processing that data. As a result, alternative data integration technologies (e.g.,
The Amazon EMR Flink CDC connector reads the binlog data and processes the data. Transformed data can be stored in Amazon S3. We use the AWS Glue Data Catalog to store the metadata such as table schema and table location. The Flink Table API/SQL can integrate with the AWS Glue Data Catalog.
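A hedged PyFlink sketch of the source side of that pattern follows; the connection details and table definitions are placeholders, and the MySQL CDC connector is assumed to be on the Flink classpath. The query result would then be written to an S3-backed sink table whose schema and location are registered in the AWS Glue Data Catalog.

from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Register a CDC source that reads the MySQL binlog
t_env.execute_sql("""
    CREATE TABLE orders_cdc (
        order_id BIGINT,
        customer_id BIGINT,
        amount DECIMAL(10, 2),
        PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
        'connector' = 'mysql-cdc',
        'hostname' = 'example-rds-host',
        'port' = '3306',
        'username' = 'flink_user',
        'password' = 'example-password',
        'database-name' = 'sales',
        'table-name' = 'orders'
    )
""")

# A simple transformation over the change stream
totals = t_env.sql_query("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders_cdc
    GROUP BY customer_id
""")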
watsonx.data is truly open and interoperable: The solution leverages not just open-source technologies, but those with open-source project governance and diverse communities of users and contributors, like Apache Iceberg and Presto, hosted by the Linux Foundation.
The data products from the Business Vault and Data Mart stages are now available for consumers. smava decided to use Tableau for business intelligence, data visualization, and further analytics. The data transformations are managed with dbt to simplify the workflow governance and team collaboration.
Amazon EMR has long been the leading solution for processing big data in the cloud. Amazon EMR is the industry-leading big data solution for petabyte-scale data processing, interactive analytics, and machine learning using over 20 open source frameworks such as Apache Hadoop, Hive, and Apache Spark.
Amazon EC2 to host and run a Jenkins build server. Solution walkthrough: The solution architecture is shown in the preceding figure and includes continuous integration and delivery (CI/CD) for data processing. Data engineers can define the underlying data processing job within a JSON template.
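As a purely hypothetical illustration of that idea, a small script could read such a JSON template and submit the described job through the AWS Glue API; the template fields and job name below are assumptions, not the solution's actual schema.

import json
import boto3

# e.g. {"job_name": "daily-orders-etl", "arguments": {"--input_path": "s3://example-bucket/raw/"}}
with open("data_processing_job.json") as f:
    template = json.load(f)

glue = boto3.client("glue")
run = glue.start_job_run(
    JobName=template["job_name"],
    Arguments=template.get("arguments", {}),
)
print(run["JobRunId"])  # ID to track the run status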
Ideally, your primary data source should belong in this group. Modern Data Sources: Painlessly connect with modern data such as streaming, search, big data, NoSQL, cloud, and document-based sources. Quickly link all your data from Amazon Redshift, MongoDB, Hadoop, Snowflake, Apache Solr, Elasticsearch, Impala, and more.