This is an exploratory data analysis of the 2022 Kaggle survey. The objective is to understand the different job families in data science, the age groups of data professionals, gender differences, the tools and technologies they use, and so on.
I was job hunting while doing this analysis, so I also wanted to get an idea of the tools and technologies in use, the types of companies, and so on.
Exploratory Data Analysis
Data Science is notorious for confusing job descriptions. Let’s see the top job families in the field.
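A count like this can be sketched with pandas for any single-choice question. This is a minimal illustration on toy data, not the notebook's actual code: the `responses` frame and the `job_title` column name are hypothetical stand-ins for the survey file and its job-title question.

```python
import pandas as pd

# Toy stand-in for the survey responses; "job_title" is a hypothetical column name.
responses = pd.DataFrame({
    "job_title": ["Data Scientist", "Data Analyst", "Data Scientist",
                  "ML Engineer", None, "Data Scientist", "Data Analyst"]
})

# Count each job title, largest first. Missing answers are skipped by default.
title_counts = responses["job_title"].value_counts()

# Share of each title among respondents who answered, as a percentage.
title_share = (responses["job_title"].value_counts(normalize=True) * 100).round(1)

print(title_counts)
print(title_share)
```

The same pattern applies to the other single-choice questions below, such as age group, gender, country, industry, and company size.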
What about the age of data professionals?
Young people aged 18-24 are very active in data, but most of the working professionals are between 25 and 29.
What about gender?
The number of men working in data is significantly higher than that of any other group.
Which country has the highest number of data professionals?
India and the US have the highest number of data professionals.
What about the industries these people are working in?
What are the sizes of companies where data professionals work?
Now that we have a background on data professionals, let’s look at the details of their work.
Which is the most commonly used programming language?
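Questions like this one allow several answers per respondent. The sketch below assumes each option is stored in its own column, holding the option's name when selected and a missing value otherwise. The column names (`lang_1`, `lang_2`, `lang_3`) and the data are made up for illustration.

```python
import pandas as pd

# Toy multi-select data: one column per option, missing when not selected.
# Column names are hypothetical.
responses = pd.DataFrame({
    "lang_1": ["Python", "Python", None, "Python"],
    "lang_2": ["SQL", None, "SQL", "SQL"],
    "lang_3": [None, "R", None, None],
})

# Pick out every column that belongs to this question.
lang_cols = [c for c in responses.columns if c.startswith("lang_")]

# Reshape to one long column of selected options, then count them.
# value_counts() drops the missing entries left by unselected options.
language_counts = (
    responses[lang_cols]
    .melt(value_name="language")["language"]
    .value_counts()
)

print(language_counts)
```

Because a respondent can pick more than one option, these counts can add up to more than the number of respondents.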
Which is the most commonly used IDE by data professionals?
Do professionals use hosted notebooks?
What is the most commonly used cloud computing service?
AWS and GCP are the top cloud computing services.
Which is the most commonly used BI tool?
What are the top ML algorithms used by professionals?
Linear regression and decision trees are at the top. This would have surprised me when I started in data science, but not anymore.
Which frameworks do professionals commonly use?
What are the most commonly used visualization libraries by professionals?
Are NLP methods widely used in the industry?
Are computer vision methods widely used in the industry?
How about AutoML?
Finally, how about compensation?
This is not the best way to visualize compensation because of the currency differences.
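One possible way to make the comparison less misleading is to look at compensation within each country instead of pooling everyone together. The sketch below is only an illustration on invented data, with hypothetical `country` and `compensation` column names. It does not convert currencies or adjust for cost of living; it only avoids mixing countries in a single distribution.

```python
import pandas as pd

# Toy data; column names and values are made up for illustration.
responses = pd.DataFrame({
    "country": ["India", "India", "India", "USA", "USA", "USA"],
    "compensation": ["0-999", "0-999", "10,000-14,999",
                     "100,000-124,999", "100,000-124,999", "0-999"],
})

# Row-normalized cross-tabulation: for each country, the share of
# respondents falling into each compensation bucket.
share_by_country = pd.crosstab(
    responses["country"],
    responses["compensation"],
    normalize="index",
)

print(share_by_country.round(2))
```

Each row sums to 1, so countries with very different respondent counts can be compared by shape and not by raw totals.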
Check out the code on Kaggle.