Data Transformation and Visualization

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

AWS Big Data

OCTOBER 30, 2024

Highlighting the value of this integration, Ali Tore, Senior Vice President of Advanced Analytics at Salesforce, says: “We’re excited to partner with Amazon to bring Tableau’s powerful data exploration and AI-driven analytics capabilities to customers managing data across organizational boundaries with Amazon DataZone.”

Visualization

Visualization, Data Lake, Testing, Data Governance

Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone

AWS Big Data

OCTOBER 30, 2024

Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on premises, and from third-party sources. Refer to the detailed blog post on how you can use Amazon DataZone to connect to your data through various other tools.

Analytics

Analytics, Visualization, Data Governance, Data-driven

Biggest Trends in Data Visualization Taking Shape in 2022

Smart Data Collective

OCTOBER 13, 2021

There are countless examples of big data transforming many different industries. It can be used for everything from reducing traffic jams to personalizing products and services to improving the experience in multiplayer video games. This is something that you can learn more about in just about any technology blog.

Visualization

Visualization, Cost-Benefit, Big Data, Prescriptive Analytics

Author data integration jobs with an interactive data preparation experience with AWS Glue visual ETL

AWS Big Data

JULY 10, 2024

We are excited to announce a new capability of the AWS Glue Studio visual editor: a new visual user experience for data preparation. Now you can author data preparation transformations and edit them in the AWS Glue Studio visual editor, choosing from hundreds of prebuilt transformations.

Interactive

Interactive, Visualization, Data Integration, Statistics

Introducing a new unified data connection experience with Amazon SageMaker Lakehouse unified data connectivity

AWS Big Data

DECEMBER 16, 2024

With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of. You can navigate to the project’s Data page to visually verify the existence of the newly created table. option("url", jdbcurl).option("dbtable",

Visualization

Visualization, Data Processing, Testing, Publishing

From Disparate Data to Visualized Knowledge Part I: Moving from Spreadsheets to an RDF Database

Ontotext

NOVEMBER 18, 2021

And all of them are asking hard questions: “Can you integrate my data, with my particular format?”, “How well can you scale?”, “How many visualizations do you offer?”. Nowadays, data analytics doesn’t exist on its own. You have to take care of data extraction, transformation and loading, and of visualization.

Visualization

Visualization, Reporting, Metadata, Enterprise

10 Examples of How Big Data in Logistics Can Transform The Supply Chain

datapine

MAY 2, 2023

Financial efficiency: One of the key benefits of big data in supply chain and logistics management is the reduction of unnecessary costs. Using the right dashboard and data visualizations, it’s possible to home in on any trends or patterns that uncover inefficiencies within your processes. Now’s the time to strike.

Big Data

Big Data, Internet of Things, Cost-Benefit, Optimization

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

He or she assists the organization by providing clarity and insight into advanced data technology solutions. As quality issues are often highlighted with the use of dashboard software, the change manager plays an important role in the visualization of data quality. Here, it all comes down to the data transformation error rate.

Data Quality

Data Quality, Metrics, Data-driven, Management
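
The DQM excerpt singles out the data transformation error rate. As a minimal sketch, assuming the rate is defined as the share of records whose transformation failed (the guide's exact definition is not shown here), it could be computed as below. The `to_order_amount` transformation and the sample records are hypothetical.

```python
from collections.abc import Callable, Iterable


def transformation_error_rate(records: Iterable[dict], transform: Callable[[dict], dict]) -> float:
    """Return the fraction of records for which `transform` failed.

    Assumption: a transformation error is a KeyError, ValueError or TypeError
    raised by `transform` (for example, a missing field or an unparsable value).
    """
    total = 0
    failed = 0
    for record in records:
        total += 1
        try:
            transform(record)
        except (KeyError, ValueError, TypeError):
            failed += 1
    # Avoid division by zero on an empty input.
    return failed / total if total else 0.0


def to_order_amount(record: dict) -> dict:
    # Hypothetical transformation: parse the "amount" field as a float.
    return {"order_id": record["order_id"], "amount": float(record["amount"])}


sample = [
    {"order_id": 1, "amount": "19.99"},
    {"order_id": 2, "amount": "n/a"},  # ValueError: not a number
    {"order_id": 3},                   # KeyError: missing field
    {"order_id": 4, "amount": "5"},
]
print(transformation_error_rate(sample, to_order_amount))  # 0.5
```

Tracking this ratio per pipeline run, rather than as a raw count of failures, keeps it comparable across batches of different sizes.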

Automating Data Pipelines in CDP with CDE Managed Airflow Service

Cloudera

AUGUST 17, 2021

When we announced the GA of Cloudera Data Engineering back in September of last year, a key vision we had was to simplify the automation of data transformation pipelines at scale. Typically users need to ingest data, transform it into optimal format with quality checks, and optimize querying of the data by visual analytics tools.

Management

Management, Cost-Benefit, Data Transformation, Optimization

Unveiling the Top 10 Data Visualization Companies of 2024

FineReport

JUNE 7, 2024

In 2024, data visualization companies play a pivotal role in transforming complex data into captivating narratives. This blog provides an insightful exploration of the leading entities shaping the data visualization landscape.

Visualization

Visualization, Predictive Analytics, Dashboards, Predictive Modeling

Migrate from Apache Solr to OpenSearch

AWS Big Data

JULY 18, 2024

The main driving factors include lower total cost of ownership, scalability, stability, improved ingestion connectors (such as Data Prepper, Fluent Bit, and OpenSearch Ingestion), elimination of external cluster managers like Apache ZooKeeper, enhanced reporting, and rich visualizations with OpenSearch Dashboards.

Dashboards

Dashboards, Testing, Data-driven, Visualization

Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake

AWS Big Data

AUGUST 19, 2024

This allows business analysts and decision-makers to gain valuable insights, visualize key metrics, and explore the data in depth, enabling informed decision-making and strategic planning for pricing and promotional strategies. Open the secret blog-glue-snowflake-credentials. Under Secret value, choose Retrieve secret value.

Analytics

Analytics, Data-driven, Data Integration, Data Lake

Use AWS Glue to streamline SFTP data processing

AWS Big Data

AUGUST 13, 2024

With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog. It enables you to visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. Select Visual ETL in the central pane.

Data Processing

Data Processing, Visualization, Data Lake

What is a DataOps Engineer?

DataKitchen

OCTOBER 5, 2021

DataOps establishes a process hub that automates data production and analytics development workflows so that the data team is more efficient, innovative and less prone to error. In this blog, we’ll explore the role of the DataOps Engineer in driving the data organization to higher levels of productivity.

Testing

Testing, Dashboards, Measurement, Experimentation

Unlock scalable analytics with AWS Glue and Google BigQuery

AWS Big Data

OCTOBER 27, 2023

AWS Glue, a serverless data integration and extract, transform, and load (ETL) service, has revolutionized this process, making it more accessible and efficient. AWS Glue eliminates complexities and costs, allowing organizations to perform data integration tasks in minutes, boosting efficiency. Customers can now use AWS Glue 4.0

Analytics

Analytics, Visualization, Data Integration, Cost-Benefit

Orchestrate Amazon EMR Serverless jobs with AWS Step Functions

AWS Big Data

OCTOBER 12, 2023

AWS Step Functions is a serverless orchestration service that enables developers to build visual workflows for applications as a series of event-driven steps. EMR Serverless automatically scales resources up and down to provide just the right amount of capacity for your application, and you only pay for what you use.

Big Data

Big Data, Data-driven, Management, Visualization
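
The Step Functions excerpt describes workflows built as a series of event-driven steps. As a minimal sketch of one such step, the Python below builds an Amazon States Language definition with a single Task state that starts an EMR Serverless job run and waits for it to finish. It assumes the Step Functions optimized EMR Serverless integration (`arn:aws:states:::emr-serverless:startJobRun.sync`) and its `ApplicationId`, `ExecutionRoleArn` and `JobDriver` parameters; check these against the current Step Functions documentation. The application ID, role ARN and S3 path are hypothetical placeholders.

```python
import json


def emr_serverless_state_machine(application_id: str, role_arn: str, entry_point: str) -> str:
    """Build a one-step Amazon States Language definition that runs a Spark job.

    Assumption: the `.sync` suffix on the resource makes the Task state wait
    until the EMR Serverless job run completes before the workflow ends.
    """
    definition = {
        "Comment": "Run one EMR Serverless Spark job and wait for it to complete",
        "StartAt": "RunSparkJob",
        "States": {
            "RunSparkJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::emr-serverless:startJobRun.sync",
                "Parameters": {
                    "ApplicationId": application_id,
                    "ExecutionRoleArn": role_arn,
                    "JobDriver": {"SparkSubmit": {"EntryPoint": entry_point}},
                },
                "End": True,
            }
        },
    }
    return json.dumps(definition, indent=2)


# Hypothetical placeholder values; replace with your own application, role and script.
print(emr_serverless_state_machine(
    "00example-app-id",
    "arn:aws:iam::123456789012:role/example-emr-serverless-job-role",
    "s3://example-bucket/scripts/job.py",
))
```

Generating the definition in code makes it easy to produce per-environment variants. In a fuller workflow, earlier states could create or start the application before this job runs.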

Happy Birthday, CDP Public Cloud

Cloudera

OCTOBER 13, 2020

CDP Data Engineering (1): a service purpose-built for data engineers focused on deploying and orchestrating data transformation using Spark at scale. (3) Data Visualization is in Tech Preview on AWS and Azure.

Data Warehouse

Data Warehouse, Machine Learning, Visualization, Data Lake

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

DataKitchen

JULY 27, 2023

Here are a few examples that we have seen of how this can be done: Batch ETL with Azure Data Factory and Azure Databricks: In this pattern, Azure Data Factory is used to orchestrate and schedule batch ETL processes. Azure Blob Storage serves as the data lake to store raw data. Azure Machine Learning).

Machine Learning

Machine Learning, Cost-Benefit, Data Transformation, Testing

Use Snowflake with Amazon MWAA to orchestrate data pipelines

AWS Big Data

OCTOBER 31, 2023

This blog post is co-written with James Sun from Snowflake. Customers rely on data from different sources such as mobile applications, clickstream events from websites, historical data, and more to deduce meaningful patterns to optimize their products, services, and processes. Choose Airflow version 2.6.3. Choose Next.

Data Processing

Data Processing, Management, Publishing, Visualization

Alteryx to Dataiku: AutoML

Dataiku

APRIL 24, 2024

In our last three blogs, we covered how Dataiku’s visual flow can help enhance collaboration and visibility, differences in how you work with datasets, and one of the key tools to accelerate data transformations: recipes. Welcome back to part four of the Alteryx to Dataiku series!

Visualization

Visualization, Data Transformation

What is Data Lineage? Top 5 Benefits of Data Lineage

erwin

APRIL 29, 2020

These tools range from enterprise service bus (ESB) products; data integration tools; extract, transform and load (ETL) tools; procedural code; application programming interfaces (APIs); and file transfer protocol (FTP) processes to even business intelligence (BI) reports that further aggregate and transform data.

Key Performance Indicator

Key Performance Indicator, Metadata, Data Governance, Data Quality

DataOps Observability: Taming the Chaos (Part 2)

DataKitchen

OCTOBER 25, 2022

It’s because it’s a hard thing to accomplish when there are so many teams, locales, data sources, pipelines, dependencies, data transformations, models, visualizations, tests, internal customers, and external customers.

Testing

Testing, Data-driven, Visualization, Dashboards

Using COD and CML to build applications that predict stock data

Cloudera

FEBRUARY 8, 2021

Continuing from my previous blog post about how awesome and easy it is to develop web-based applications backed by Cloudera Operational Database (COD), I started a small project to integrate COD with another CDP cloud experience, Cloudera Machine Learning (CML). b) Basic data transformation. Go to runner.py and run it.

Machine Learning

Machine Learning, Statistics, Dashboards, Modeling

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

In addition, more data is becoming available for processing and enrichment of existing and new use cases; for example, we have recently experienced rapid growth in data collection at the edge and an increase in the availability of frameworks for processing that data. As a result, alternative data integration technologies (e.g.,

Data Processing

Data Processing, Data Warehouse, Enterprise, Visualization

Introducing Cloudera DataFlow Designer: Self-service, No-Code Dataflow Design

Cloudera

DECEMBER 9, 2022

Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow. A reimagined visual editor to boost developer productivity and enable self-service. Enabling self-service for developers.

Testing

Testing, Cost-Benefit, Interactive, Visualization

Simplify data transfer: Google BigQuery to Amazon S3 using Amazon AppFlow

AWS Big Data

OCTOBER 5, 2023

Amazon AppFlow, a fully managed data integration service, has been at the forefront of streamlining data transfer between AWS services, software as a service (SaaS) applications, and now Google BigQuery. Select a data source: In Amazon AppFlow, select Google BigQuery as your data source.

Data Warehouse

Data Warehouse, Machine Learning, Data Integration, Data-driven

How to Use Apache Iceberg in CDP’s Open Lakehouse

Cloudera

AUGUST 8, 2022

The general availability covers Iceberg running within some of the key data services in CDP, including Cloudera Data Warehouse (CDW), Cloudera Data Engineering (CDE), and Cloudera Machine Learning (CML). In this first blog, we shared with you how to use Apache Iceberg in Cloudera Data Platform to build an open lakehouse.

Snapshot

Snapshot, Data Warehouse, Machine Learning, Cost-Benefit

Improve power utility operational efficiency using smart sensor data and Amazon QuickSight

AWS Big Data

MAY 16, 2023

This blog post is co-written with Steve Alexander at PG&E. Data collection and processing are handled by a third-party smart sensor manufacturer application residing in Amazon Virtual Private Cloud (Amazon VPC) private subnets behind a Network Load Balancer.

Dashboards

Dashboards, Statistics, Data Collection, Business Intelligence

How to Include BI in Your 2020 Budget

Sisense

DECEMBER 12, 2019

Building a data-driven business includes choosing the right software and implementing best practices around its use. Every year when budget time rolls around, many organizations find themselves asking the same question: “What are we going to do about our data?” This is a summary article. New year, same questions.

Business Intelligence

Business Intelligence, Software, Data-driven, Visualization

Harnessing Streaming Data: Insights at the Speed of Life

Sisense

OCTOBER 15, 2020

We live in a world of data: There’s more of it than ever before, in a ceaselessly expanding array of forms and locations. Dealing with Data is your window into the ways data teams are tackling the challenges of this new world to help their companies and their customers thrive. billion market by 2025.

Dashboards

Dashboards, IoT, Optimization, Internet of Things

Apache Spark on Kubernetes: How Apache YuniKorn (Incubating) helps

Cloudera

OCTOBER 14, 2020

Also, such a concept helps admins visualize the scheduled jobs for debugging purposes. YuniKorn thus empowers Apache Spark to become an enterprise-grade essential platform for users, offering a robust platform for a variety of applications ranging from large-scale data transformation to analytics to machine learning.

Machine Learning

Machine Learning, Management, Big Data, Optimization

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

AUGUST 1, 2023

Provision resources with AWS CloudFormation: For the initial setup, you launch an AWS CloudFormation stack to create an S3 bucket to store data, IAM roles for data access, and the AWS Glue crawler and Data Catalog components. This will enable both the CDC steps and the data transformation steps for the Jira data.

Data Lake

Data Lake, Data Transformation, Data-driven, Cost-Benefit

Self-Service Data’s New Frontier: The Data Catalog

Alation

FEBRUARY 20, 2020

For ease of understanding the differences between all of them, Rita shared this visual, categorizing the vendors: So at least for now, it looks like we’re a self-service data prep vendor, which makes sense. Alation helps analysts find, understand and use their data. I hope to see you there!

Scorecard

Scorecard, ROI, Data-driven, Visualization

Cloudera DataFlow Designer: The Key to Agile Data Pipeline Development

Cloudera

MARCH 14, 2023

We just announced the general availability of Cloudera DataFlow Designer, bringing self-service data flow development to all CDP Public Cloud customers. In our previous DataFlow Designer blog post, we introduced you to the new user interface and highlighted its key capabilities.

Testing

Testing, Publishing, Metadata, Interactive

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

In this blog, I will cover: What is watsonx.ai? What capabilities are included in watsonx.ai? What is watsonx.data? What capabilities are included in watsonx.data? Capabilities within the Prompt Lab include: Summarize: Transform text with domain-specific content into personalized overviews and capture key points (e.g., …)

Machine Learning

Machine Learning, Data Warehouse, Modeling, Cost-Benefit

The Modern Data Stack Explained: What The Future Holds

Alation

JANUARY 17, 2023

Business Intelligence Tools: Business intelligence (BI) tools are used to visualize your data. You should pick those that allow for easy integration and can create beautiful data visualizations. These help data analysts visualize key insights that can help you make better data-backed decisions.

Data Warehouse

Data Warehouse, Cost-Benefit, Data Science, Data Transformation

How CFM built a well-governed and scalable data-engineering platform using Amazon EMR for financial features generation

AWS Big Data

SEPTEMBER 13, 2024

The bulk of our data scientists are heavy users of Jupyter Notebook. Jupyter notebooks are interactive computing environments that allow users to create and share documents containing live code, equations, visualizations, and narrative text.

Interactive

Interactive, Strategy, Cost-Benefit, Data Governance

How Alation’s Data Team Uses the Modern Data Stack to Power Insights

Alation

OCTOBER 27, 2022

Few actors in the modern data stack have inspired as much enthusiasm and fervent support as dbt. This data transformation tool enables data analysts and engineers to transform, test and document data in the cloud data warehouse. Curious to learn how the data catalog can power your data strategy?

Dashboards

Dashboards, Metrics, Sales, Reporting

Migrate your existing SQL-based ETL workload to an AWS serverless ETL infrastructure using AWS Glue

AWS Big Data

JULY 31, 2023

We create the insert_orders_fact_tbl AWS Glue job manually using AWS Glue Studio. You will see the message “Successfully connected to the data store with connection blog-redshift-connection.” Under Data Catalog in the navigation pane, choose Crawlers. Select Visual with a blank canvas, then choose Create.

Sales

Sales, Data Warehouse, Visualization, Testing

Use fuzzy string matching to approximate duplicate records in Amazon Redshift

AWS Big Data

FEBRUARY 8, 2023

For addressing data quality challenges in Amazon Simple Storage Service (Amazon S3) data lakes and data pipelines, AWS has announced AWS Glue Data Quality (preview). Stored procedures: Stored procedures are commonly used to encapsulate logic for data transformation, data validation, and business-specific logic.

Data Quality

Data Quality, Testing, Data Warehouse, Unstructured Data
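
The post above is about approximating duplicate records with fuzzy string matching; its Amazon Redshift implementation is not shown in this excerpt. As a swapped-in illustration, the sketch below uses Python's standard-library `difflib.SequenceMatcher` to flag pairs of names whose similarity ratio meets a threshold. The 0.85 threshold and the sample names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    # Lowercase and collapse runs of whitespace so trivial differences don't lower the score.
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    # SequenceMatcher.ratio() returns 2*M/T in [0.0, 1.0], where M is the number of
    # matched characters and T the combined length of both strings; 1.0 means identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def approximate_duplicates(names: list[str], threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Return every pair of names whose similarity is at least `threshold`.

    Compares all pairs (O(n^2)), which is only practical for small inputs.
    """
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    return pairs


# Hypothetical sample records.
print(approximate_duplicates(["Jon Smith", "John Smith", "Acme Corporation", "Acme Corp."]))
# → [('Jon Smith', 'John Smith', 0.947)]
```

"Acme Corp." scores only about 0.69 against "Acme Corporation", so abbreviation-heavy data may need a lower threshold or extra normalization. On large tables, you would typically compare only records that share a blocking key (for example, the same postal code) rather than every pair.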

Transforming Big Data into Actionable Intelligence

Sisense

MARCH 14, 2021

Attempting to learn more about the role of big data (here taken to mean datasets of high volume, velocity, and variety) within business intelligence today can sometimes create more confusion than it alleviates, as vital terms are used interchangeably instead of distinctly.

Big Data

Big Data, IoT, Data Warehouse, Data-driven

7 Things All Successful Data Product Managers Have In Common

Alation

FEBRUARY 2, 2023

Many are subject matter experts for a particular kind of data, which enables them to spot anomalies in that data quickly, understand the root cause, and resolve the issue. Further, they can quickly create helpful visualizations from the data they analyze.

Management

Management, Data-driven, Visualization, Strategy

Building Better Data Models to Unlock Next-Level Intelligence

Sisense

MAY 11, 2021

You can’t talk about data analytics without talking about data modeling. The reasons for this are simple: Before you can start analyzing data, huge datasets like data lakes must be modeled or transformed to be usable.

Modeling

Modeling, Big Data, IoT, Data Warehouse

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Data ingestion: Steps 1 and 2 use AWS DMS, which connects to the source database and moves full and incremental data (CDC) to Amazon S3 in Parquet format. Data transformation: Steps 3 and 4 represent an EMR Serverless Spark application (Amazon EMR 6.9 …). For Name, enter emr-delta-blog. For Type, choose Spark.

Data Lake

Data Lake, Dashboards, Metrics, Metadata
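
In the pipeline above, AWS DMS lands full-load and incremental (CDC) data in Amazon S3, and a Spark application on EMR Serverless merges the changes into Delta tables. As a simplified, in-memory illustration of that merge step (not the post's Spark or Delta code), the sketch below applies CDC rows to a table keyed by primary key. It assumes each change row carries an `Op` column of `I`, `U` or `D`, which is how AWS DMS marks inserts, updates and deletes in CDC files. It also assumes the rows are already in commit order. The `id` key and the sample rows are hypothetical.

```python
def apply_cdc(table: dict[int, dict], changes: list[dict], key: str = "id") -> dict[int, dict]:
    """Apply CDC change rows in order and return the updated table.

    Assumptions: `changes` is sorted in commit order, and each row has an "Op"
    field set to "I" (insert), "U" (update) or "D" (delete).
    """
    result = dict(table)  # leave the caller's snapshot untouched
    for change in changes:
        op = change["Op"]
        row = {k: v for k, v in change.items() if k != "Op"}
        if op in ("I", "U"):
            # Upsert: an insert or update replaces the whole row for that key.
            result[row[key]] = row
        elif op == "D":
            # Delete: drop the key if it exists.
            result.pop(row[key], None)
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return result


# Hypothetical full-load snapshot and CDC batch.
full_load = {1: {"id": 1, "status": "new"}, 2: {"id": 2, "status": "new"}}
changes = [
    {"Op": "U", "id": 1, "status": "shipped"},
    {"Op": "D", "id": 2, "status": "new"},
    {"Op": "I", "id": 3, "status": "new"},
]
print(apply_cdc(full_load, changes))
# → {1: {'id': 1, 'status': 'shipped'}, 3: {'id': 3, 'status': 'new'}}
```

Here the dictionary stands in for the target table. In the pipeline described above, the same upsert-and-delete logic would be expressed as a Delta Lake merge executed by Spark, which also handles concurrency and persistence.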
