Exploratory data analysis to find out which countries' respondents spent the most time on the Kaggle survey.
Data Collection and Preprocessing
Data Source
Survey data published by Kaggle in 2020.
Exploratory Data Analysis (EDA)
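Created a data frame with the data we need: time duration in seconds and country.
# We only need two columns for this task: completion time and country (Q3).
# .copy() gives an independent frame, so later column assignments do not raise SettingWithCopyWarning.
countries_time_df = df[['Time from Start to Finish (seconds)', 'Q3']].copy()
countries_time_df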
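As a minimal sketch of this selection step on made-up data (the rows and the `Q1` column below are invented for illustration, not taken from the survey), taking an explicit `.copy()` means new columns can be added to the subset without touching the original frame. The names `TIME_COL`, `subset`, `Country`, and `Hours` are assumptions of this sketch.

```python
import pandas as pd

TIME_COL = 'Time from Start to Finish (seconds)'

# Toy stand-in for the survey data frame; all values are invented.
df = pd.DataFrame({
    TIME_COL: [1800, 3600, 7200],
    'Q1': ['25-29', '30-34', '22-24'],
    'Q3': ['India', 'Brazil', 'India'],
})

# Keep only the two columns of interest, as an independent copy.
subset = df[[TIME_COL, 'Q3']].copy()

# Hypothetical readability step: give Q3 a descriptive name.
subset = subset.rename(columns={'Q3': 'Country'})

# Adding a column to the copy leaves the original frame untouched.
subset['Hours'] = subset[TIME_COL] / 3600
```

Renaming `Q3` is optional; it only makes later tables and plot labels easier to read.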
Converted the time to integer seconds, averaged it per country, and converted the averages into hours.
# Convert the time column to integers
countries_time_df['Time from Start to Finish (seconds)'] = countries_time_df['Time from Start to Finish (seconds)'].astype(int)
# Average time spent on the survey by Kagglers from each country
countries_time_df1 = countries_time_df.groupby('Q3', as_index=False).mean()
countries_time_df1['Average Time In Hours'] = countries_time_df1['Time from Start to Finish (seconds)'] / 3600
# Keep the 10 countries with the highest average time
countries_time_df1 = countries_time_df1.sort_values(by='Average Time In Hours', ascending=False).head(10)
countries_time_df2 = countries_time_df1.copy()  # already limited to the top 10 above
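The averaging step can be sketched end to end on invented data. This sketch assumes the time column may arrive as strings and may contain a non-numeric entry (for example a question-text row, if one was not dropped when loading the CSV), so it swaps `astype(int)` for `pd.to_numeric(..., errors='coerce')`. The helper name `average_hours_by_country` and the toy rows are hypothetical.

```python
import pandas as pd

TIME_COL = 'Time from Start to Finish (seconds)'

def average_hours_by_country(frame, top_n=10):
    """Return the top_n countries by mean completion time, in hours."""
    data = frame[[TIME_COL, 'Q3']].copy()
    # Turn non-numeric entries into NaN, then drop those rows.
    data[TIME_COL] = pd.to_numeric(data[TIME_COL], errors='coerce')
    data = data.dropna(subset=[TIME_COL])
    # Mean completion time per country, in seconds.
    result = data.groupby('Q3', as_index=False)[TIME_COL].mean()
    result['Average Time In Hours'] = result[TIME_COL] / 3600
    return result.sort_values('Average Time In Hours', ascending=False).head(top_n)

# Invented rows; the first one mimics a question-text row (assumption).
toy = pd.DataFrame({
    TIME_COL: ['Duration (in seconds)', '3600', '7200', '1800'],
    'Q3': ['In which country do you currently reside?', 'India', 'India', 'Brazil'],
})
top = average_hours_by_country(toy)
```

If the real frame is already clean and numeric, the coercion is a no-op and the result matches the `astype(int)` approach.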
Created a bar plot to visualize the top 10 countries by average time and by total time.
import seaborn as sns  # needed for the bar plot below

# See which countries come out on top if we use total time instead of average time
countries_time_df3 = countries_time_df.groupby('Q3', as_index=False).sum()
countries_time_df3['Total Time In Hours'] = countries_time_df3['Time from Start to Finish (seconds)'] / 3600
countries_time_df4 = countries_time_df3.sort_values(by='Total Time In Hours', ascending=False).head(10)
# Visualize the top 10 countries by total time
ax = sns.barplot(x='Total Time In Hours', y='Q3', data=countries_time_df4)
ax.set_title("Top 10 Countries by Total Time Spent on the Survey")
ax.set_ylabel("Country")
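To see why the total-time ranking can differ from the average-time ranking, note that a country's total equals its average multiplied by its number of respondents, so totals favor countries with many respondents. A small sketch on invented numbers (the durations are made up, and the column names `respondents`, `avg_hours`, and `total_hours` are assumptions of this example):

```python
import pandas as pd

TIME_COL = 'Time from Start to Finish (seconds)'

# Invented data: four quick respondents from one country, one slow from another.
toy = pd.DataFrame({
    TIME_COL: [3600, 3600, 3600, 3600, 10800],
    'Q3': ['India', 'India', 'India', 'India', 'Brazil'],
})

# Respondent count, mean, and sum per country in a single pass.
summary = (
    toy.groupby('Q3')[TIME_COL]
       .agg(respondents='count', avg_seconds='mean', total_seconds='sum')
       .reset_index()
)
summary['avg_hours'] = summary['avg_seconds'] / 3600
summary['total_hours'] = summary['total_seconds'] / 3600

# The two rankings disagree on this toy data.
top_by_avg = summary.sort_values('avg_hours', ascending=False).iloc[0]['Q3']
top_by_total = summary.sort_values('total_hours', ascending=False).iloc[0]['Q3']
```

Keeping the respondent count next to both metrics makes it easier to tell whether a country ranks high because of long individual sessions or simply because of many responses.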
You can take a look at the notebook on Kaggle.