It is not easy to be a wannabe Data Scientist.
Being a Data Scientist is fucking hard; being a self-taught Data Scientist is even harder.
There is never enough time, so you need to focus, and focus on what the market needs. That way you have a better chance of surviving.
Where to focus?
You need to identify a path to follow and practice, or you will be distracted by all the noise on the web.
Since September 2017, after sending my CV for Data Scientist positions, I have often taken note of the required skills and added them manually to a Google Sheet.
It now has more than 430 rows, each containing one required skill.
Today I decided to analyze this sheet, exported as a CSV, to identify the skills most frequently required of a Data Scientist.
The analysis is very rough and needs improvement, but it already shows me where to focus.
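# Import the libraries
import pandas as pd
import matplotlib.pyplot as plt

# Load the exported sheet; the file has no header row
csvname = "skill.csv"
df = pd.read_csv(csvname, sep=",", header=None, index_col=False)
print(df.head(30))
print(df.columns)

# Name the two columns: the skills and an empty trailing column
df.columns = ["skills", "empty"]
print(df.head())

# Keep only the skills column
df_skill = df[["skills"]]
print(df_skill.head(5))
df_skill.info()  # info() prints its summary itself and returns None

# Count how often each skill appears, most frequent first
df_skill_grouped = df_skill.groupby("skills").size().sort_values(ascending=False)
print(df_skill_grouped)

# Bar chart of the 25 most frequent skills
df_skill_grouped.head(25).plot.bar()
plt.show()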
I will improve this analysis by working on:
1) Regex, so I can fix typing errors and be more accurate (see, for example, “Python” and “Python ” in the bar graph)
2) Web scraping my applications to automatically extract all the required skills
3) Improving my ClockWork Pomodoro Analyzer so I know where my time goes and whether it is consistent with market requirements
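As a first step toward point 1, here is a minimal sketch of how a regex could merge entries like “Python” and “Python ” before counting. The helper name normalize_skill and the sample rows are made up for illustration, and lowercasing everything is my own assumption about what should count as the same skill.

```python
import re
from collections import Counter


def normalize_skill(raw: str) -> str:
    """Collapse runs of whitespace, trim the ends, and lowercase a skill name."""
    return re.sub(r"\s+", " ", raw).strip().lower()


# Hypothetical rows standing in for the real sheet
sample_skills = ["Python", "Python ", "SQL", " sql", "Machine  Learning"]

# Count the skills after normalization, so spelling variants are merged
skill_counts = Counter(normalize_skill(s) for s in sample_skills)
for skill, count in skill_counts.most_common():
    print(skill, count)
```

With the DataFrame from the script above, the same function could probably be applied with df_skill["skills"].map(normalize_skill) before the groupby, assuming the column has no missing values.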