-Doctor, my son wants to become a Data Scientist. Should I be worried?
-It is a critical situation, Madam. I am sorry, but I have to warn you.
Sadly, we don’t have a cure for this kind of illness.
-You have to be prepared, you must be prepared. Your son will go to IKEA or, to save time, he will buy some blackboards on Amazon (https://amzn.to/2NpOImd). It is not the first time we have seen this. Has he started talking enthusiastically about Monte Carlo (https://en.wikipedia.org/wiki/Monte_Carlo)?
-Yes, he did.
-Oh, this is a lot worse than I originally thought.
A few days ago Davide Sicignani wrote to me.
Davide is a brother from the InnLab Family in Terracina (so, near home), but I met him for the first time in London.
Davide asked me for advice for a biologist friend of his who wants to start with Python and Data Science.
Specifically, he pinged me because I started from zero and I am self-taught.
He is not the first person to ping me for this reason. Some months ago Stefano Spe (also from InnLab) asked too, so I decided to write briefly about it.
I need to distinguish three categories that we can call “DataScientist”, but I will do that in another post:
- Data Engineering (Computer Science)
- Data Modelling (Probability and Statistics, Operational Research)
- Business Intelligence (Analytic Knowledge)
I am going to write a brief summary of my path and the resources I used.
No doubt, the best way to become a Data Scientist is the same as the way to become a great surgeon:
A perfect synergy between Practice, Study and Great Mentors.
What I have understood so far:
- Courses that turn you into a Data Scientist in one week or one month don’t exist; if someone says they do, that is bullshit
- You can’t become a Data Scientist only with practice
- You can’t become a Data Scientist only with books
- It’s really exciting
- It’s hard, very hard
- A lot of people in the field are available to help you (for free)
- Before you start to understand something, you need one year of training (Practice+Study+Mentors)
- Even if you need a year of training, you have to look for a job in the Data Science job market anyway: what you can do and what you know is something that other people have to judge. (Here is a great video on “How To Start”.) Otherwise you risk delaying your debut in the job market for fear of being underprepared.
“The world ain’t all sunshine and rainbows. It is a very mean and nasty place and it will beat you to your knees and keep you there permanently if you let it. You, me, or nobody is gonna hit as hard as life. But it ain’t how hard you hit; it’s about how hard you can get hit, and keep moving forward. How much you can take, and keep moving forward.” Rocky
Which resources should you use to start the journey?
- Practice Practice Practice
- PostgreSQL exercises https://pgexercises.com/
- Secret Sauce
In August 2017 I started studying on DataCamp.com. I enrolled in and completed the “DataScientist Career Path”.
One year of access to all the courses costs about 130/180$; I do not remember exactly.
Great investment: simple but effective courses.
These courses are useful as a first overview.
The DataCamp mobile app allows you to train on basic concepts during your commute.
The con is that it is not sufficient to start working; you need to integrate it with other resources.
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
This book was written by Wes McKinney, the author of pandas, one of the most used Python libraries for data manipulation and cleaning.
This book was a present from Marchetti when I started this journey.
It is a great resource because, step by step, it explains everything you need to know about data manipulation.
It took me one year to study it completely, and another six months will be useful to re-read it and practice all the topics described.
You must read and study the book with your laptop and a Jupyter notebook open. This way you can quickly replicate all the examples and tips in the book.
If you don’t put the examples into practice, even modifying them, the book loses its effectiveness.
This section is F-U-N-D-A-M-E-N-T-A-L
I had the chance to practice through some consultancy projects, non-profit projects and public datasets.
Most of the time goes into data cleaning and manipulation; it is an annoying operation, but it is always the same.
There are a lot of public datasets, Italian ones too, where you can start doing some Data Visualisation and Inference and building some basic models.
Some datasets:
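To give an idea of what “always the same” means, here is a minimal sketch of a typical cleaning routine, assuming pandas is installed. The data, the column names and the `clean` helper are all invented for illustration; they do not come from a real project or dataset.

```python
import pandas as pd

# Hypothetical raw data, invented for illustration: messy column names,
# inconsistent casing, stray spaces, a duplicate row and a missing value.
raw = pd.DataFrame(
    {
        " Region ": ["Lazio", "lazio ", "Campania", "Campania", None],
        "Visitors": ["1200", "800", "950", "950", "400"],
        "Year": [2017, 2017, 2017, 2017, 2017],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a few routine cleaning steps and return a new DataFrame."""
    out = df.copy()
    # Normalize column names: strip spaces and lowercase them.
    out.columns = out.columns.str.strip().str.lower()
    # Normalize text values: strip spaces and use consistent capitalization.
    out["region"] = out["region"].str.strip().str.title()
    # Convert numeric strings to numbers; unparseable values become NaN.
    out["visitors"] = pd.to_numeric(out["visitors"], errors="coerce")
    # Drop rows without a region, then drop exact duplicate rows.
    out = out.dropna(subset=["region"]).drop_duplicates()
    return out.reset_index(drop=True)


tidy = clean(raw)
# Aggregate only once the data is clean.
visitors_by_region = tidy.groupby("region")["visitors"].sum()
print(visitors_by_region)
```

The steps (rename, normalize, convert types, drop missing and duplicate rows, then aggregate) are the ones that tend to come back in every project; only the column names change.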
Open Data on Italian Election from Viminale https://dait.interno.gov.it/elezioni/open-data
Open Data from Lazio Region on Tourism and Hospitality http://dati.lazio.it/catalog/it/dataset?category=Turismo%2C+sport+e+tempo+libero
For an international view:
Kaggle is a platform built specifically for Data Scientists.
Getting a good ranking on Kaggle by participating in competitions is a great way to do self-branding, get spotted by recruiters and show your knowledge.
SQL is the second most requested skill in job posts, ahead of Python.
This is based on the CVs I sent and the job descriptions I analyzed (100+).
A great platform to practice on is https://pgexercises.com/
PostgreSQL was one of the most frequent DBMSs in the job posts, but there are others, so don’t feel constrained in your choice.
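If you want to try a query before installing any database server, here is a small sketch that uses SQLite through Python’s built-in `sqlite3` module instead of PostgreSQL. The `members` and `bookings` tables and their rows are invented for illustration. The query itself uses only standard SQL (a join plus an aggregation), so it should also run on PostgreSQL, but take that as an assumption to check on your own setup.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical tables and rows, invented for illustration.
conn.executescript(
    """
    CREATE TABLE members (member_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bookings (
        booking_id INTEGER PRIMARY KEY,
        member_id INTEGER REFERENCES members(member_id),
        slots INTEGER
    );
    INSERT INTO members VALUES (1, 'Anna'), (2, 'Luca'), (3, 'Sara');
    INSERT INTO bookings VALUES (1, 1, 2), (2, 1, 3), (3, 2, 1);
    """
)

# LEFT JOIN keeps members without bookings;
# COALESCE turns their NULL total into 0.
query = """
    SELECT m.name, COALESCE(SUM(b.slots), 0) AS total_slots
    FROM members AS m
    LEFT JOIN bookings AS b ON b.member_id = m.member_id
    GROUP BY m.member_id, m.name
    ORDER BY total_slots DESC, m.name
"""

rows = conn.execute(query).fetchall()
for name, total_slots in rows:
    print(name, total_slots)

conn.close()
```

Joins, grouping and handling NULL values are the kind of thing worth practicing until they feel automatic, whatever DBMS you end up using.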
A technical mentor is a key resource for different reasons:
- He pushes you to do better
- He can help you solve any problem faster during hard times (obviously after you have sweated blood on the issue for at least two days)
- He makes a path otherwise characterized only by numbers and lines of code more human
There are various podcasts on SoundCloud and Spotify that you can listen to in your downtime to stay updated on new technologies and market trends.
The secret sauce is passion.
If you are not electrified by a good plot, if you are not curious about the possibility of planning and predicting sales trends, if you don’t go crazy at the idea of spending the night analyzing the exponential process that could represent electronic component failure rates, please don’t start this career.
Passion moves everything; the other resources are secondary.
Thanks for reading! If you liked the post and found it useful, share it with others on LinkedIn or Twitter.
I really appreciate it.