"There's a renewed focus on on-premises, on-premises private cloud, or hosted private cloud versus public cloud, especially as data-heavy workloads such as generative AI have started to push cloud spend up astronomically," adds Woo. Organizations don't have much choice when it comes to using the larger foundation models such as ChatGPT 3.5.
Given that, what would you say is the job of a data scientist (or ML engineer, or any other such title)? Building models. A common task for a data scientist is to build a predictive model. You know the drill: pull some data, carve it up into features, and feed it into one of scikit-learn's various algorithms.
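As a minimal sketch of that pull-data, build-features, fit-model loop (the CSV path, column names, and the churn target are hypothetical, not from the original article):

```python
# Rough sketch of the standard predictive-modeling workflow with scikit-learn.
# File name and columns (tenure_days, monthly_spend, churned) are placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("customers.csv")                      # pull some data
df["spend_per_day"] = df["monthly_spend"] / 30         # carve out a feature
X = df[["tenure_days", "monthly_spend", "spend_per_day"]]
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```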
Together with price-performance, Amazon Redshift offers capabilities such as serverless architecture, machine learning integration within your data warehouse and secure data sharing across the organization. dbt Cloud is a hosted service that helps data teams productionize dbt deployments. Create dbt models in dbt Cloud.
In addition to real-time analytics and visualization, the data needs to be shared for long-term data analytics and machine learning applications. To achieve this, EUROGATE designed an architecture that uses Amazon DataZone to publish specific digital twin data sets, enabling access to them with SageMaker in a separate AWS account.
In this regard, the enterprise data product catalog acts as a federated portal, facilitating cross-domain access and interoperability while maintaining alignment with governance principles. This model balances node or domain-level autonomy with enterprise-level oversight, creating a scalable and consistent framework across ANZ.
Business/Data Analyst: The business analyst is all about the “meat and potatoes” of the business. These needs are then quantified into data models for acquisition and delivery. This person (or group of individuals) ensures that the theory behind data quality is communicated to the development team. 2 – Data profiling.
Big data enables automated systems by intelligently routing many data sets and data streams. In a recent move towards a more autonomous logistical future, Amazon has launched an upgraded model of its highly successful KIVA robots. Use our 14-day free trial today & transform your supply chain!
Companies still often accept the risk of using internal data when exploring large language models (LLMs) because this contextual data is what enables LLMs to change from general-purpose to domain-specific knowledge. In the generative AI or traditional AI development cycle, data ingestion serves as the entry point.
In legacy analytical systems such as enterprise data warehouses, the scalability challenges of a system were primarily associated with computational scalability, i.e., the ability of a data platform to handle larger volumes of data in an agile and cost-efficient way. As a result, alternative data integration technologies (e.g.,
The currently available choices include: The Amazon Redshift COPY command can load data from Amazon Simple Storage Service (Amazon S3), Amazon EMR , Amazon DynamoDB , or remote hosts over SSH. This native feature of Amazon Redshift uses massively parallel processing (MPP) to load objects directly from data sources into Redshift tables.
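One way to issue such a COPY from Amazon S3 programmatically is through the Redshift Data API with boto3; this is a hedged sketch, and the cluster, database, table, bucket, and IAM role names below are placeholders:

```python
# Sketch: load a CSV prefix from S3 into a Redshift table via a COPY statement
# submitted through the Redshift Data API. All identifiers are hypothetical.
import boto3

client = boto3.client("redshift-data", region_name="us-east-1")

copy_sql = """
    COPY sales_staging
    FROM 's3://example-bucket/sales/2024/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV IGNOREHEADER 1;
"""

response = client.execute_statement(
    ClusterIdentifier="example-cluster",   # or WorkgroupName for Redshift Serverless
    Database="dev",
    DbUser="awsuser",
    Sql=copy_sql,
)
print("Statement Id:", response["Id"])
```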
This service supports a range of optimized AI models, enabling seamless and scalable AI inference. By 2023, the focus shifted towards experimentation: enterprise developers began exploring proofs of concept (POCs) for generative AI applications, leveraging API services and open models such as Llama 2 and Mistral.
This involves creating VPC endpoints in both the AWS and Snowflake VPCs, making sure data transfer remains within the AWS network. Use Amazon Route 53 to create a private hosted zone that resolves the Snowflake endpoint within your VPC. Refer to Editing AWS Glue managed data transform nodes for more information.
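A rough boto3 sketch of that Route 53 step follows; the zone name, account locator, VPC ID, and VPC endpoint DNS name are placeholders, not values from the original post:

```python
# Sketch: create a private hosted zone attached to the VPC and point a Snowflake
# hostname at the interface VPC endpoint. All identifiers are hypothetical.
import time
import boto3

route53 = boto3.client("route53")

zone = route53.create_hosted_zone(
    Name="privatelink.snowflakecomputing.com",
    VPC={"VPCRegion": "us-east-1", "VPCId": "vpc-0123456789abcdef0"},
    CallerReference=str(time.time()),
    HostedZoneConfig={"Comment": "Resolve Snowflake inside the VPC", "PrivateZone": True},
)

route53.change_resource_record_sets(
    HostedZoneId=zone["HostedZone"]["Id"],
    ChangeBatch={
        "Changes": [{
            "Action": "CREATE",
            "ResourceRecordSet": {
                "Name": "myaccount.privatelink.snowflakecomputing.com",
                "Type": "CNAME",
                "TTL": 300,
                "ResourceRecords": [{
                    "Value": "vpce-0abc123-xyz.vpce-svc-0123.us-east-1.vpce.amazonaws.com"
                }],
            },
        }]
    },
)
```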
According to Evanta’s 2022 CIO Leadership Perspectives study, CIOs’ second top priority within the IT function is around data and analytics, with CIOs seeing advancing organizational use of data as key to reaching enterprise objectives. Angel-Johnson shares that perspective, and Colisto says he’s seizing on those opportunities.
watsonx.data is truly open and interoperable. The solution leverages not just open-source technologies, but those with open-source project governance and diverse communities of users and contributors, like Apache Iceberg and Presto, hosted by the Linux Foundation. This provides further opportunities for cost optimization.
In this post, I’ll walk you through how to copy data from one Amazon Relational Database Service (Amazon RDS) for PostgreSQL database to another, while scrubbing PII along the way using AWS Glue. Built-in data transformations then scrub columns containing PII using pre-defined masking functions. PII detection and scrubbing.
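To illustrate the masking idea in isolation (this is a simplified standalone sketch in pandas, not the AWS Glue built-in transforms described in the post, and the table and column names are hypothetical):

```python
# Toy illustration of PII scrubbing: hash emails into stable tokens and redact
# phone digits before the data is written to the target database.
import hashlib
import pandas as pd

def mask_email(value: str) -> str:
    """Replace an email address with a stable, non-reversible token."""
    return "user_" + hashlib.sha256(value.encode()).hexdigest()[:10] + "@masked.example"

customers = pd.DataFrame({
    "customer_id": [1, 2],
    "email": ["alice@example.com", "bob@example.com"],
    "phone": ["+1-555-0100", "+1-555-0101"],
})

customers["email"] = customers["email"].map(mask_email)
customers["phone"] = customers["phone"].str.replace(r"\d", "*", regex=True)  # redact digits
print(customers)
```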
Redshift Serverless automatically provisions and intelligently scales data warehouse capacity to deliver fast performance for even the most demanding and unpredictable workloads, and you pay only for what you use. For Host, enter the Redshift Serverless endpoint’s host URL. For Port, enter 5439 (the Redshift default). This is optional.
The AWS pay-as-you-go model and the constant pace of innovation in data processing technologies enable CFM to maintain agility and facilitate a steady cadence of trials and experimentation. In this post, we share how we built a well-governed and scalable data engineering platform using Amazon EMR for financial features generation.
You can modify the Lambda function to fetch additional vehicle information from a separate data store (for example, a DynamoDB table or a Customer Relationship Management system) to enrich the data, before storing the results in an S3 bucket. In this model, the Lambda function is invoked for each incoming event.
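A minimal sketch of such an enrichment Lambda follows, assuming a DynamoDB lookup table keyed by VIN; the table, bucket, and event field names are placeholders:

```python
# Sketch of a per-event enrichment Lambda: look up vehicle details in DynamoDB
# and write the enriched record to S3. All names are hypothetical.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")
vehicles = dynamodb.Table("VehicleRegistry")

def lambda_handler(event, context):
    record = vehicles.get_item(Key={"vin": event["vin"]}).get("Item", {})
    enriched = {**event, "make": record.get("make"), "model": record.get("model")}
    s3.put_object(
        Bucket="example-enriched-events",
        Key=f"events/{event['vin']}/{event['event_id']}.json",
        Body=json.dumps(enriched),
    )
    return {"status": "ok"}
```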
Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources. Data transformation through stored procedures and the use of materialized views to curate datasets and generate insights is a known pattern with relational databases.
IBM watsonx.ai is our enterprise-ready next-generation studio for AI builders, bringing together traditional machine learning (ML) and new generative AI capabilities powered by foundation models. With watsonx.ai, businesses can effectively train, validate, tune and deploy AI models with confidence and at scale across their enterprise.
Furthermore, these tools boast customization options, allowing users to tailor data sources to address areas critical to their business success, thereby generating actionable insights and customizable reports. Best BI Tools for Data Analysts: cost-effective pricing and comprehensive supporting services, maximizing value.
For the downstream consumption by all departments across the organization, smava’s Data Platform team prepares curated data products following the extract, load, and transform (ELT) pattern. The Raw Vault describes objects loaded directly from the data sources and represents a copy of the landing stage in the data lake.
In this post, we assume the following three accounts: a Pipeline account, which hosts the end-to-end pipeline; a Dev account, which hosts the integration pipeline in the development environment; and a Prod account, which hosts the data integration pipeline in the production environment. If you want, you can use the same account and the same Region for all three.
In this blog, we’ll delve into the critical role of governance and data modeling tools in supporting a seamless data mesh implementation and explore how erwin tools can be used in that role. erwin also provides data governance, metadata management and data lineage software called erwin Data Intelligence by Quest.
And the highlight, for us data intelligence folks, was Databricks’ announcement that Unity Catalog , its unified governance solution for all data assets on its Lakehouse platform, will soon be available on AWS and Azure in the upcoming weeks. A simple model to control access to data via a UI or SQL, and much more!
The system ingests data from various sources such as cloud resources, cloud activity logs, and API access logs, and processes billions of messages, resulting in terabytes of data daily. This data is sent to Apache Kafka, which is hosted on Amazon Managed Streaming for Apache Kafka (Amazon MSK).
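As a hedged sketch of the publish side of such a pipeline (not the system's actual producer), a log record could be sent to a topic on the MSK cluster with kafka-python; the broker address, topic, and payload fields are placeholders:

```python
# Sketch: publish a processed activity-log record to a topic hosted on Amazon MSK.
# Broker endpoint, topic name, and message fields are hypothetical.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["b-1.example-msk.amazonaws.com:9094"],
    security_protocol="SSL",  # MSK brokers commonly accept TLS client traffic on 9094
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("cloud-activity-logs", {"account": "123456789012", "action": "AssumeRole"})
producer.flush()
```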
However, you might face significant challenges when planning for a large-scale data warehouse migration. Trace the flow of data from its origins in the source systems, through the data warehouse, and ultimately to its consumption by reporting, analytics, and other downstream processes.
On many occasions, they need to apply business logic to the data received from the source SaaS platform before pushing it to the target SaaS platform. Let’s take an example. AnyCompany’s marketing team hosted an event at the Anaheim Convention Center, CA. The marketing team created leads based on the event in Adobe Marketo.
As data science grows in popularity and importance, if your organization uses data science you’ll need to pay more attention to picking the right tools for it. An example of a data science tool is Dataiku. Business Intelligence Tools: Business intelligence (BI) tools are used to visualize your data.
Healthcare is changing, and it all comes down to data. Leaders in healthcare seek to improve patient outcomes, meet changing business models (including value-based care ), and ensure compliance while creating better experiences. Data & analytics represents a major opportunity to tackle these challenges.
We use Apache Spark as our main data processing engine and have over 1,000 Spark applications running over massive amounts of data every day. These Spark applications implement our business logic, ranging from data transformation and machine learning (ML) model inference to operational tasks.
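A minimal PySpark sketch of the kind of transformation such an application might run (the S3 paths and column names are hypothetical, not from the original workload):

```python
# Sketch: aggregate raw order events into a curated daily-revenue dataset.
# Input/output paths and columns are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-orders-transform").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/raw/orders/")
daily_revenue = (
    orders
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("order_date", "country")
    .agg(F.sum("amount").alias("revenue"), F.count("*").alias("orders"))
)
daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/daily_revenue/"
)
```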
“This project represents a transformative initiative designed to address the evolving landscape of cyber threats,” says Kunal Krushev, head of cybersecurity automation and intelligence with the firm’s Corporate IT — Digital Infrastructure Services. The system complements preconfigured components, workflows, and libraries.
In this article, we discuss how this data is accessed, an example environment and set-up to be used for data processing, sample lines of Python code to show the simplicity of data transformations using Pandas and how this simple architecture can enable you to unlock new insights from this data yourself.
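In that spirit, here is a short illustrative set of Pandas transformations; the CSV path and column names are placeholders rather than the article's actual dataset:

```python
# Illustrative Pandas transformations: clean, derive a feature, and aggregate.
# File name and columns (timestamp, value) are hypothetical.
import pandas as pd

df = pd.read_csv("measurements.csv", parse_dates=["timestamp"])

df = df.dropna(subset=["value"])                                   # drop incomplete rows
df["value_zscore"] = (df["value"] - df["value"].mean()) / df["value"].std()
monthly = (
    df.set_index("timestamp")
      .resample("MS")["value"]                                     # month-start buckets
      .agg(["mean", "max", "count"])
)
print(monthly.head())
```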
When you start the process of designing your data model for Amazon Keyspaces, it’s essential to possess a comprehensive understanding of your access patterns, similar to the approach used in other NoSQL databases. Additionally, you can configure OpenSearch Ingestion to apply data transformations before delivery.
These licensing terms are critical: Perpetual license vs subscription: Subscription is a pay-as-you-go model that provides flexibility as you evaluate a vendor. Pricing model: The pricing scale is dependent on several factors. Some cloud applications can even provide new benchmarks based on customer data.
Data mapping is essential for integration, migration, and transformation of different data sets; it allows you to improve your data quality by preventing duplications and redundancies in your data fields. Data mapping helps standardize, visualize, and understand data across different systems and applications.
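A toy sketch of that idea follows, expressing a source-to-target field mapping in Python and using it to standardize and deduplicate records; all field names and values are hypothetical:

```python
# Sketch: apply a field-level mapping between a source system and a target system,
# standardizing names and removing duplicate rows along the way.
import pandas as pd

FIELD_MAP = {"cust_no": "customer_id", "eml": "email", "sign_up": "signup_date"}

source = pd.DataFrame({
    "cust_no": [101, 101, 102],
    "eml": ["a@example.com", "a@example.com", "b@example.com"],
    "sign_up": ["2024-01-05", "2024-01-05", "2024-02-10"],
})

target = (
    source.rename(columns=FIELD_MAP)
          .drop_duplicates()                                       # prevent duplications
          .assign(signup_date=lambda d: pd.to_datetime(d["signup_date"]))
)
print(target)
```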
Although many companies run their own on-premises servers to maintain IT infrastructure, nearly half of organizations already store data on the public cloud. The Harvard Business Review study finds that 88% of organizations that already have a hybrid model in place see themselves maintaining the same strategy into the future.