64% of the respondents took part in training or obtained certifications in the past year, and 31% reported spending over 100 hours in training programs, ranging from formal graduate degrees to reading blog posts. The tools category includes tools for building and maintaining data pipelines, like Kafka. Salaries by Programming Language.
Read the complete blog below for a more detailed description of the vendors and their capabilities. Because it is such a new category, both overly narrow and overly broad definitions of DataOps abound. Studio.ML — A model management framework written in Python to help simplify and expedite your model-building experience.
First, locate the value scale axis and the category axis to identify what is being visualised. Each category is assigned its own bar, and the length of each bar is proportional to the value it represents. Colour-coding can be assigned to the bars to distinguish each category in the dataset.
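As a quick illustration, a minimal bar chart along those lines in Python with Matplotlib (the categories and values are made up):

```python
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]   # the category axis
values = [23, 17, 35, 29]           # the value scale axis

# Bar length is proportional to each value; colour distinguishes categories.
plt.bar(categories, values, color=["tab:blue", "tab:orange", "tab:green", "tab:red"])
plt.ylabel("Value")
plt.show()
```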
A Dot Distribution Plot visualises the data distribution across multiple categories by plotting dots along an axis. There are two variations of Dot Distribution Plot: first, the kind that plots a series of dots to compare the distributions between various categories across a single dimension.
While there are certainly engineers and scientists who may be entrenched in one camp or another (the R camp vs. Python, for example, or SAS vs. MATLAB), there has been a growing trend towards dispersion of data science tools. Key categories of tools and a few examples include: Data Sources. They range from flat files (e.g.
My top blog posts of 2018 (by reads) all relate to data science, with posts that address the practice of data science, commonly used artificial intelligence and machine learning tools and methods, and even a post on the problems with the Net Promoter Score claims.
Colours can be assigned to the tiles in a Mosaic Cartogram to distinguish geographical regions, represent categories, or visualise an additional numerical variable. Hence, the number of tiles assigned to a region is proportional to the data value assigned to that region. Tools to generate a Mosaic Cartogram: R / Python / D3.js.
The function of a Jitter Plot is to visualise the data distribution across multiple categories by plotting dots along a value axis. The dots in a Jitter Plot can use colour-coding to distinguish categories or to visualise an additional variable. How To Make Stripplot with Jitter in Altair Python?
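A minimal sketch of a jittered stripplot in Altair, following the approach used in the Altair example gallery (the data is invented, and the yOffset channel requires Altair 5 / Vega-Lite 5):

```python
import altair as alt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "category": ["A", "B", "C"] * 50,   # made-up categories
    "value": rng.normal(0, 1, 150),
})

chart = alt.Chart(df).mark_circle(size=14).encode(
    x="value:Q",
    y="category:N",
    yOffset="jitter:Q",                 # spread overlapping dots within a category
    color="category:N",                 # colour-code the categories
).transform_calculate(
    jitter="sqrt(-2*log(random()))*cos(2*PI*random())"  # Gaussian jitter
)
chart.save("jitter_stripplot.html")
```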
Colours can be assigned to the hexagonal tiles in a Hex Cartogram to distinguish geographical regions, represent categories, or visualise an additional numerical variable. Hence, the number of hexagonal tiles assigned to a region is proportional to the data value assigned to that region.
The X-axis is used for the time scale, which makes this chart ideal for showing the changing overall percentages of categories over time. The data series for each category is colour-coded, which helps to illustrate a part-to-whole relationship. One solution to this issue could be to group minor categories under an ‘other’ category.
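A small sketch of such a 100% stacked area chart with Matplotlib (all series are made up):

```python
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(2016, 2022)                     # time on the X-axis
a = np.array([5, 6, 7, 8, 9, 10], dtype=float)    # invented series
b = np.array([3, 3, 4, 4, 5, 5], dtype=float)
c = np.array([2, 2, 2, 3, 3, 4], dtype=float)
total = a + b + c

# Normalising each colour-coded series to a share of the total gives
# the part-to-whole view over time described above.
plt.stackplot(years, 100 * a / total, 100 * b / total, 100 * c / total,
              labels=["A", "B", "Other"])
plt.ylabel("Share (%)")
plt.legend(loc="upper left")
plt.show()
```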
The groups for the illustration can be broadly classified into the following categories: regional sales managers, who will be granted access to view sales data only for the specific country or region they manage. Prerequisites: a Python virtual environment (v3.12.2).
In a previous blog, we covered how Pandas Profiling can supercharge the data exploration required to bring our data into a predictive modelling phase. For example, are the numbers we are seeing actually referring to categories, or are the dates provided in a specific format? Python and Pandas Profiling.
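A minimal sketch of generating such a profile; note the library has since been renamed to ydata-profiling, and the CSV path here is hypothetical:

```python
import pandas as pd
from ydata_profiling import ProfileReport  # formerly pandas_profiling

df = pd.read_csv("reviews.csv")            # hypothetical input file
profile = ProfileReport(df, title="Data Exploration Report")
profile.to_file("report.html")             # flags types, e.g. numbers that are really categories
```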
The CRN Tech Innovator Awards spotlight innovative products and services across 36 categories, with winners chosen by CRN staff from over 320 product applications. The release was named a finalist under the category of Business Intelligence and Data Analytics. Open Data Lakehouse also offers expanded support for Python 3.10.
category – This column represents the category of an item. product_id – This is the primary key column in the source data table. Specify the bucket name as iceberg-blog and leave the remaining fields as default. For this post, we create iceberg-blog/raw-csv-input and iceberg-blog/iceberg-output.
Both Python and R are advanced coding languages that can produce beautiful images that allow humans to understand vast datasets with ease. Data Visualization in Python. There is a wide array of libraries you can use to create Python data visualizations, including Matplotlib, seaborn, Plotly, and others. Import Libraries.
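For instance, importing the usual libraries and drawing a first chart might look like this (the data frame is made up):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [10, 14, 12, 18, 16]})  # invented data
sns.lineplot(data=df, x="x", y="y")
plt.title("A first Python visualization")
plt.show()
```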
For the connection name, enter MWAA-Glue-Blog-Subnet1. Use the default security group. Choose Next. Repeat these steps using PrivateSubnet2 and name the connection MWAA-Glue-Blog-Subnet2. Replace the placeholder script with the following Python code: import ipaddress import socket subnets = { "PrivateSubnet1": "10.192.20.0/24",
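The excerpt cuts the script off mid-dictionary; a sketch of what a subnet check of that shape might look like (the PrivateSubnet2 CIDR is an assumption, since only PrivateSubnet1's value appears above):

```python
import ipaddress
import socket

subnets = {
    "PrivateSubnet1": "10.192.20.0/24",
    "PrivateSubnet2": "10.192.21.0/24",  # assumed CIDR; not given in the excerpt
}

# Resolve this worker's IP and report which private subnet it landed in.
ip = socket.gethostbyname(socket.gethostname())
for name, cidr in subnets.items():
    if ipaddress.ip_address(ip) in ipaddress.ip_network(cidr):
        print(f"{ip} is in {name} ({cidr})")
```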
Analyzing the hiring behaviors of companies on its platform, freelance work marketplace Upwork found AI to be the fastest-growing category for 2023, noting that posts for generative AI jobs increased more than 1000% in Q2 2023 compared to the end of 2022, and that related searches for AI rose more than 1500% over the same period.
Tools: D3.js, R Graph Gallery + R Option 2 (R), Radialtree (Python), RAWGraphs, React Graph Gallery (React), SRPlot, Vega (JS), ZingChart (JS). Examples of a Circular Dendrogram: Tree of Life, a phylogenetic tree inspired by a figure from Nature and Jason Davies; Circular Dendrogram with Nodes Colored Based on Different Categories (2016), Figure 1.
This blog post provides a step-by-step guide for building a multimodal search solution using OpenSearch Service. The domain has public access, fine-grained access control is enabled, and a master user is created. Set up a Python client to interact with the OpenSearch Service domain, preferably in a Jupyter Notebook interface.
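A minimal sketch of that client setup with the opensearch-py library (the endpoint and credentials are placeholders, not values from the post):

```python
from opensearchpy import OpenSearch  # pip install opensearch-py

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],  # hypothetical endpoint
    http_auth=("master-user", "master-password"),  # the master user created above
    use_ssl=True,
    verify_certs=True,
)
print(client.info())  # sanity-check the connection from the notebook
```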
I attended the machine learning meetup and reached out to Mawer for permission to excerpt Mawer’s work for this blog post. Like JSON, YAML files can easily be read into Python as a dictionary, but unlike JSON, a YAML file is human-readable, allowing easy changing of configurations all in one place, e.g., test_size: 0.25.
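A minimal sketch of reading such a YAML config into Python with PyYAML (the file name is hypothetical):

```python
import yaml  # pip install pyyaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)  # parses straight into a plain Python dict

print(config["test_size"])  # e.g. 0.25, as in the excerpt above
```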
Install AWS CLI and Python library: install and configure the AWS Command Line Interface (AWS CLI) if you don’t have it already set up. Optionally, if you want to run a local notebook from your computer, install Python 3.7. For Service category, select AWS services. Choose Create subnet, enter the CIDR block (…/24), and create your subnet.
It’s a portable columnar format, future-proofed to support additional encodings as technology develops, and it has library support across a broad set of languages like Python, Java, and Go. And the best part is that Apache Parquet is open source! Enter a name for your stack; for this post, we name the stack blog-lambda. Choose Next.
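For example, a Parquet round trip from Python with the pyarrow library looks like this:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Write a small columnar table to Parquet, then read it back.
table = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})
pq.write_table(table, "example.parquet")
print(pq.read_table("example.parquet").to_pandas())
```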
It lists forty-five metrics to track across their Operational categories: DataOps, Self-Service, ModelOps, and MLOps. It takes them too long to write SQL or Python, or to make a dashboard. However, these various metrics can be seen as rolling into our four categories in the table below. Forty-five metrics! Data trust is imperative.
In this article, we will explain how to execute five statistical techniques using Python. Code: let’s see a Python implementation of logistic regression on the Amazon fine food reviews dataset. In Next-Level Moves, we dig into the ways advanced analytics are paving the way for the next wave of innovation.
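A minimal sketch of that approach with scikit-learn, using a tiny stand-in corpus rather than the full reviews dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny stand-in for the Amazon fine food reviews data.
reviews = ["great taste, will buy again", "stale and bland",
           "loved every bite", "awful, returned it"]
labels = [1, 0, 1, 0]  # 1 = positive review, 0 = negative

vec = TfidfVectorizer()
X = vec.fit_transform(reviews)
model = LogisticRegression().fit(X, labels)
print(model.predict(vec.transform(["really great snack"])))  # expect [1]
```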
When Google talked about releasing this tool on its blog, the brand pointed out that if you don’t protect user data, you risk losing people’s trust. Plotly Python Open Source Graphing Library: the website breaks down the types of charts into categories. Alternatively, Plotly offers geographical maps.
For the Cloudera and AMD Applied Machine Learning Prototype Hackathon, competitors were tasked with creating their own unique AMP for one of five categories (Sports and Entertainment, Environment, Business and Economy, Society, and Open Innovation). As you can tell, we left the guidance pretty open-ended.
But that isn’t all the art that a company needs: “hero images” for blog posts, designs for reports and whitepapers, edits to publicity photos, and more are all necessary. The LLaMA-family models also fall into the “so-called open source” category that restricts what you can build. Is generative AI the answer? Perhaps not yet.
For a good overview of what DevOps entails and how to transition, check out this blog post. The activities within each category are ranked more or less in order of importance as well. If you don’t have a strong preference, I recommend Python, and practice, practice, practice. I suggest Python, but you can work in any language.
By DAVID ADAMS. Since inception, this blog has defined “data science” as inference derived from data too big to fit on a single computer. Thus it is easy to use Spark for interactive analysis (from a Python or Scala shell) and to prototype ML algorithms, but harder to do so in Dataflow. map(lambda x: convert_line(x))
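For context, the kind of interactive Spark analysis being described looks roughly like this; the input path is hypothetical, and convert_line stands in for whatever parser the original post used:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("interactive-analysis").getOrCreate()

def convert_line(line):
    # Stand-in parser; the original post's version isn't shown in the excerpt.
    return line.strip().split(",")

lines = spark.sparkContext.textFile("input.txt")  # hypothetical input path
parsed = lines.map(lambda x: convert_line(x))
print(parsed.take(3))
```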
Start a blog or contribute articles to media sites and other established blogs. Soft skills and collaborative competencies are broad categories, so CIOs should focus on the ones that resonate with their leadership styles. Focus on technologies in demand in your industries of interest.
Besides strong technical skills (for instance, use of Hadoop, programming in R and Python , math, statistics), data scientists should also be able to tackle open-ended questions and undirected research in ways that bring measurable business benefits to their organization. We live in a constantly-evolving world of data.
In this blog, I’ll describe how to solve this with Contextual Topic Identification, leveraging machine learning methods to identify semantically similar groups and surface relevant category tags of the reviews. Then, leveraging Python frameworks, I implemented the topic identification models.
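The excerpt doesn't show the models themselves; as one common way to surface such category tags, a hedged sketch using scikit-learn's LDA on toy reviews (not necessarily the post's exact method):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

reviews = ["fast shipping, great packaging", "battery died after a week",
           "battery life is amazing", "arrived late and the box was damaged"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Surface each topic's top words as candidate category tags.
terms = vec.get_feature_names_out()
for topic in lda.components_:
    print([terms[i] for i in topic.argsort()[-3:]])
```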
This blog is for anyone who was interested but unable to attend the conference, or anyone interested in a quick summary of what happened there. Operationalizing Python for real-time ML pipelines was a hot topic. The post 5 Key Takeaways from #Current2023 appeared first on Cloudera Blog.
Pillow is a fork of the Python Imaging Library (PIL), and is along the same lines as OpenCV. Keras is an open source deep learning API written in Python that runs on top of TensorFlow, so it’s a little more user-friendly and higher-level than TensorFlow. You can view the documentation and check out a quick tutorial.
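A minimal taste of Pillow's API (the image file name is hypothetical):

```python
from PIL import Image  # pip install Pillow

img = Image.open("photo.jpg")        # hypothetical input image
print(img.size, img.mode)

# A couple of typical PIL-style operations.
thumb = img.convert("L").resize((224, 224))  # grayscale + resize
thumb.save("photo_gray.jpg")
```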
In this blog, I will share how I built Pair, a scalable web application that takes in a product image, analyzes its design features using a convolutional neural network, and recommends products in other categories with similar style elements. Multiple such index libraries are generated for different furniture categories.
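A hedged sketch of that idea: extract style embeddings with a pretrained CNN and query a per-category nearest-neighbour index. VGG16 and the file names are assumptions, not necessarily what Pair used:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image

# Pretrained CNN as a feature extractor (VGG16 is an assumption).
model = VGG16(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

# One index per furniture category, as the post describes; file names are hypothetical.
sofa_vectors = np.stack([embed(p) for p in ["sofa1.jpg", "sofa2.jpg"]])
sofa_index = NearestNeighbors(n_neighbors=1).fit(sofa_vectors)
dist, idx = sofa_index.kneighbors(embed("query_lamp.jpg").reshape(1, -1))
print(idx)  # position of the stylistically closest sofa
```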
In this blog post, I will cover a family of techniques known as density-based clustering. Using these two parameters, DBSCAN categorizes the data points into three categories: Core Points: a data point p is a core point if Nbhd(p, ε), the ε-neighborhood of p, contains at least MinPts points. Border Points: a data point q is a border point if Nbhd(q, ε) contains fewer than MinPts points but q is reachable from some core point.
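A minimal sketch of DBSCAN in practice via scikit-learn, where eps plays the role of ε and min_samples the role of MinPts:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
db = DBSCAN(eps=0.2, min_samples=5).fit(X)  # eps = ε radius, min_samples = MinPts
print(set(db.labels_))  # cluster ids; -1 marks noise points
```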
Amazon EMR Studio is an integrated development environment (IDE) that makes it straightforward for data scientists and data engineers to develop, visualize, and debug data engineering and data science applications written in R, Python, Scala, and PySpark. For instructions, refer to Create a data lake administrator.
And my favorite topic: what are some of the best books, blogs, podcasts, etc.? While some prefer R or Julia (winner of the 2018 Wilkinson Prize), most people are focusing on Python for an introduction to data science. There are oh-so-many good blogs about data science, and one of my top picks is the go-to site for data visualization, Flowingdata.
By MUKUND SUNDARARAJAN, ANKUR TALY, QIQI YAN Editor's note: Causal inference is central to answering questions in science, engineering and business and hence the topic has received particular attention on this blog. It takes an image as input and assigns scores for 1000 different ImageNet categories.
Text classification: Useful for tasks like sentiment classification, spam filtering and topic classification, text classification involves categorizing documents into predefined classes or categories. Using programming languages like Python with libraries like NLTK and spaCy, companies can analyze user-generated content (e.g.,
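A minimal sketch of such a classifier with scikit-learn; Naive Bayes here is just one common choice, and the corpus is a toy:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus; a real system trains on labelled user-generated content.
docs = ["win a free prize now", "meeting moved to 3pm",
        "claim your cash reward", "agenda for tomorrow's standup"]
labels = ["spam", "ham", "spam", "ham"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(docs, labels)
print(clf.predict(["free cash inside"]))  # expect ['spam']
```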
The only caveat is that they must be adapted to classify inputs into one of n emotional categories rather than a binary positive or negative. Further reading: Coursera – Applied Text Mining in Python (video demonstration); MonkeyLearn – A guide to sentiment analysis functions and resources.
In this blog, I’ll address some of the questions we did not have time to answer live, pulling from both Dr. Reichental’s book as well as my own experience as a data governance leader for 30+ years. And for good reason: many data governance job postings seek skills like Python and other programming skills. Curious to hear from the author?
A Cleveland Dot Plot is a simple form of data visualisation that plots dots to compare the values of a one-dimensional variable across multiple categories. On a Cleveland Dot Plot, one axis will list the categories, while the other axis represents a discrete value scale.
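A minimal sketch of a Cleveland Dot Plot with Matplotlib (the data is made up):

```python
import matplotlib.pyplot as plt

categories = ["North", "South", "East", "West"]  # invented categories
values = [42, 28, 35, 19]
y = range(len(categories))

fig, ax = plt.subplots()
ax.hlines(y, xmin=0, xmax=values, color="lightgray")  # guide lines to each dot
ax.plot(values, y, "o")                               # one dot per category
ax.set_yticks(y, labels=categories)                   # the category axis
ax.set_xlabel("Value")                                # the discrete value scale
plt.show()
```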
It is particularly effective for comparing quantitative value ranges across different categories, offering a clear visualisation of the differences or changes between them. The categories or groups are typically represented along one axis, while the quantitative values are plotted along the other axis.