A few days ago I left Padua, where I spent an intensive month working with other network members and ESR fellows (Giles, Greg and Pablo) at the Statistics Department.
A first observation
When people from different disciplines work together, as physicists and statisticians do in our case, the first stumbling block is communication. This is mostly because we are used to thinking in different ways to solve problems of a different nature. But even when we deal with the same context, we often use different words to describe the same thing. For example, a signal search is called anomaly detection in statistical language, and our kinematic variables are called features by statisticians. Sometimes we also use the same word or acronym to designate different concepts (pdf, model, etc.).
But, after an initial period of training, everyone can learn a lot in interdisciplinary collaborations such as our network. Indeed, despite the initial difficulty, we managed to understand each other and proceed with the work.
Status of the work
You certainly remember that in July, Alessia and Giles joined me in Oxford to work on the Monte Carlo (MC) generation of the samples needed for our studies on the pair production of Higgs bosons in the bbbb and bbττ final states. On that occasion, we produced the samples for the signal and background processes, summarized in the following table.
The samples were generated at the parton level using MadGraph5_aMC@NLO, a framework that provides the elements necessary for Standard Model (SM) and Beyond the Standard Model (BSM) phenomenology, such as the computation of cross sections and the generation of hard events from the simulated collisions between elementary particles. The MadGraph output files were then showered and hadronised by Pythia8, a C++-based high-energy physics event generator. Finally, we passed the output through Delphes, which performs a fast and realistic simulation of a general-purpose collider detector, such as ATLAS or CMS.
What we did in Padua was a first study of the classification of these signal and background processes. Our studies in the network do include the development and refinement of tools suitable for discriminating between signal and background. But, be careful, this is not the final goal!
Once we have optimized the best-performing tool, the plan is to use it for statistical inference, which means drawing conclusions based on data (special thanks to Pablo for shedding light on this sensitive issue and a lot more!). How we will do this, and what conclusions we will draw, have yet to be determined. For now, starting from a baseline provided by Giles, which makes use of some Python libraries, we trained a deep neural network (DNN). We then compared the DNN result with that of an XGBoost (Extreme Gradient Boosting) classifier.
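To give a flavour of what comparing two classifiers can look like, here is a minimal sketch in plain Python. It is not our analysis code. The metric (the area under the ROC curve), the function name `roc_auc` and the toy scores are all assumptions chosen for illustration, and the numbers are invented rather than taken from our study. The idea is that each classifier assigns every event a score, and the ROC AUC is the probability that a randomly chosen signal event gets a higher score than a randomly chosen background event.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    labels: 1 for signal events, 0 for background events.
    scores: classifier outputs, higher meaning "more signal-like".
    Returns the fraction of (signal, background) pairs in which the
    signal event has the higher score; ties count as half a pair.
    """
    signal = [s for label, s in zip(labels, scores) if label == 1]
    background = [s for label, s in zip(labels, scores) if label == 0]
    if not signal or not background:
        raise ValueError("need at least one signal and one background event")

    wins = 0.0
    for s in signal:
        for b in background:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal) * len(background))


# Invented toy scores for six events (three signal, three background).
# These are NOT results from the study, only an illustration.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = {
    "classifier A": [0.90, 0.80, 0.40, 0.50, 0.20, 0.10],
    "classifier B": [0.70, 0.60, 0.55, 0.50, 0.30, 0.20],
}

if __name__ == "__main__":
    for name, scores in toy_scores.items():
        print(f"{name}: ROC AUC = {roc_auc(toy_labels, scores):.3f}")
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect separation. The pairwise loop is quadratic in the number of events, so for realistic sample sizes one would use a rank-based implementation from a library instead.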
The month spent in Padua proved really useful for getting started with Machine Learning (ML). I had the chance to discuss the main ideas with the other ESRs, and I received many suggestions on how to improve my knowledge of this discipline. I also started an online course on ML, which seems very promising; I may tell you about it when I finish it.
Besides the great work with the other network participants, this month gave me the opportunity to explore a wonderful city I had never visited before and to savour my last few moments in Italy before moving to the UK. Now, instead of bothering you with the tales of Cecilia’s Travels, I will leave you with a photo gallery of the beauties and goodness I found in Padua!