As you probably remember from my last post, I started my internship at B12 Consulting two weeks ago as the second secondment planned within the network’s activities.
So, today my third week begins, and I guess it’s a good time to tell you what I have experienced and learned so far.
If you are a PhD student, or a researcher in general, and you think that the rhythms in consulting are the same as yours, well… forget about it. I’m not talking about the amount of work, which basically remains the same. But unlike researchers (who don’t have to show up at the office at the same time every morning or go back home at a fixed time in the evening), in a consulting company you have office hours. To be honest, I can’t say yet which of these two “work styles” I prefer. Both have pros and cons, so just give me more time to figure it out; I will tell you in the coming weeks 🙂
I started working on data mining using some Python libraries. If you start working on Python projects, the first thing you are strongly advised to do is download and install Anaconda, which, according to Wikipedia, is an
open source distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment.
To simplify a bit, Anaconda lets you create separate Python environments on your computer, each with its own set of packages, while taking care of the dependencies among those packages. This turns out to be really useful if you have to work on projects that require different Python versions. Here you can find a really simple and quick tutorial on how to manage different environments.
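If you are ever unsure which environment a script is actually running in, a small check like the one below can help. It is only a sketch, assuming an environment created and activated from the command line (for example `conda create -n py36 python=3.6` followed by `conda activate py36`). It reads `sys.version_info`, `sys.prefix` and the `CONDA_DEFAULT_ENV` variable that `conda activate` sets; the function name `describe_environment` is hypothetical.

```python
import os
import sys


def describe_environment():
    """Report which Python interpreter and (if any) conda environment is active.

    CONDA_DEFAULT_ENV is set by `conda activate`; when no conda environment
    is active the variable is missing and None is reported instead.
    """
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "prefix": sys.prefix,  # root directory of the running environment
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running the same script after activating two environments with different Python versions should print different versions and prefixes, which is a quick way to confirm that the environments really are separate.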
When you work with different data sets, you will probably want your code to run as fast and efficiently as possible. Problem solved: you can use Pandas, the Python Data Analysis Library, which provides easy-to-handle data structures and data analysis tools. Here is a ten-minute tutorial about it. It’s the same one I used to learn it, and I found it quite effective.
Just one thing, to be honest with you: ten minutes is probably how long it takes just to read the whole webpage. If you want to follow it step by step, reproducing the commands in your own script and tweaking the code to see whether things work one way or another, even a whole day may not be enough. But that’s not a big deal in the end: new things always take some time to learn properly 🙂
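To give a flavour of the kind of commands such a tutorial walks through, here is a short sketch of a few basic Pandas operations: building a `DataFrame`, summary statistics, boolean filtering, group-by aggregation and column arithmetic. The column names and values are made up for illustration only. Column-wise operations like these generally run in optimized compiled code rather than in a Python loop, which is where much of the speed-up comes from.

```python
import pandas as pd

# Hypothetical toy data set, invented for this example.
df = pd.DataFrame(
    {
        "store": ["north", "south", "north", "east", "south"],
        "sales": [120, 80, 150, 95, 60],
    }
)

print(df.head())      # first rows of the table
print(df.describe())  # summary statistics of the numeric columns

# Boolean indexing: keep only the rows where sales exceed 90.
big_sales = df[df["sales"] > 90]

# Split-apply-combine: average sales per store (groups sorted by key).
mean_by_store = df.groupby("store")["sales"].mean()
print(mean_by_store)

# Column arithmetic applied to the whole column at once, no explicit loop.
df["sales_k"] = df["sales"] / 1000
```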
This week I will start applying some machine learning techniques to my data, beginning with k-means, a clustering algorithm. But I’m not gonna spend more words on that: leave me something to tell you in my future posts!
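Without spoiling future posts, here is a minimal sketch of what a clustering algorithm like k-means does, written from scratch in plain Python using the classic Lloyd’s iteration. It assumes 2-D points stored as tuples; the function names `kmeans` and `_nearest` and their parameters are hypothetical and not taken from any library. In a real project a library implementation, such as scikit-learn’s `KMeans`, would more likely be used.

```python
import math
import random


def _nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))


def kmeans(points, k, n_iter=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) on a list of 2-D tuples."""
    rng = random.Random(seed)
    # Initialize the centroids with k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [_nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    # Final labels consistent with the returned centroids.
    labels = [_nearest(p, centroids) for p in points]
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = kmeans(data, k=2)
    print(centroids)
    print(labels)
```

On two well-separated groups of points like the ones above, the two clusters found should match the two groups.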
So far, I’m really enjoying this experience: my new colleagues are very nice people, my office is nice enough that I don’t miss my old one and, overall, I’m learning a lot, which is perhaps the part of my work I like most. So, till next time!