by Giles Strong

Ciao. As the title suggests, it’s been about half a year now since I started my PhD research, and last week I presented a summary of my work so far to the CMS group here in Padova. I thought it would be an interesting exercise to translate my presentation into a more blog-friendly form, but for the more scientifically minded, I’ll link the original at the end. Here goes!

Introduction

Run I of the LHC exceeded expectations with the discovery of the Higgs boson, and one of the focuses for Run II, and beyond, is to perform precise measurements of its fundamental properties. One of these is how strongly the Higgs couples to itself, that is, how likely it is that multiple Higgs bosons will interact directly.

Within the Standard Model, our currently accepted model for particle interactions, this can be measured by examining processes such as the one in the Feynman diagram shown in Figure 1, in which two Higgs bosons are produced via a third, intermediary Higgs. This is known as di-Higgs production.

Figure 1: Di-Higgs production.

The Higgs, however, is unstable and will quickly decay to other particles. The most common decay channel for di-Higgs is to four bottom-quarks. However, detecting this is difficult due to other processes easily obscuring the signal. Choosing instead a channel in which one Higgs decays to a pair of tau leptons still retains a relatively large probability of occurrence (cross-section), whilst providing a source of leptons, which can easily be detected.

Data samples and event selection

In order to understand what the di-Higgs signal is likely to look like, and which other processes could act as backgrounds by producing the same final-state particles in our detector, Alessia, Cecillia, and I produced simulated data-sets for signal and background via Monte Carlo generation.

Because our detector will not pick up all the particles which enter it and will also absorb some of the particles’ energy, we used another simulation software to account for this.

Having produced the samples, I developed an event selector to categorise the events according to how the tau leptons decay: each tau can decay to quarks or to lighter leptons, resulting in several different final-state categories. I also applied some cuts on the kinematics of the particles.

Multivariate analysis

After event selection, the number of background events far exceeded the number of expected signal events, meaning that distinguishing the signal from statistical fluctuations in data would be extremely difficult. A method was required to separate the contribution of background processes from that of the signal. Enter our network’s namesake: multivariate analysis (MVA).

MVAs take many forms, but the state of the art is considered to be deep artificial neural networks.

The name neural network comes from our understanding of how a brain functions, i.e. via a series of interconnected neurons. Artificial neural networks aim to simulate this by arranging nodes (neurons) in layers and allowing them to apply mathematical functions to input variables. Deep implies that the network contains many layers.
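To make the idea of nodes arranged in layers a little more concrete, here is a minimal sketch of a forward pass through a tiny network in plain Python. It is an illustration only, not the network used in this analysis: the layer sizes, the weights, the ReLU and sigmoid functions, and the input values are all made up, and a real network would learn its weights during training.

```python
import math


def relu(z):
    """Rectified linear unit: pass positive values, zero out negative ones."""
    return max(0.0, z)


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation):
    """Each node takes a weighted sum of the inputs, adds a bias, then applies the activation."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the inputs through every layer in turn."""
    values = inputs
    for weights, biases, activation in layers:
        values = dense_layer(values, weights, biases, activation)
    return values


# Hypothetical, untrained weights: 3 input variables -> 2 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], relu),
    ([[1.0, -1.0]], [0.0], sigmoid),
]

output = forward([1.0, 2.0, 3.0], layers)
print(output)
```

A deep network simply has more entries in `layers`. The sigmoid on the final node is what keeps the output between zero and one.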

The power of neural networks (NNs) comes from the fact that they are able to consider all the features of a data set simultaneously, unlike a human brain, which struggles to conceptualise data in any more than perhaps three dimensions at once. By doing this, they can discover high-dimensional patterns in the data, which are different for signal and background.

Regression

As well as being applicable to event classification, NNs can be used to regress variables, that is, to produce more accurate and precise evaluations of values.

Some of the products of the decays of the tau-leptons are neutrinos (incredibly light, fast moving, weakly interacting particles), which are considered to be undetectable in our detector. This causes difficulty when trying to calculate accurately the masses of certain particles, or pairs of particles. However, from the generation software we know what the masses should really be. Of course this information isn’t available in the real world, but we can train an MVA to understand the connection between real-world observables and the true values of reconstructed variables.

These regressed variables can also then be fed into a classifier, effectively folding in more of our knowledge of physics into the neural network and hopefully increasing its performance.

One such variable is the di-Higgs mass, the combined mass of the two Higgs bosons. Due to the equivalence of energy and mass, this value is not (always) equal to twice the Higgs mass, because Higgs bosons move and so have extra energy. Instead, it is a distribution which can be inferred from the observed decay products.
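As a rough illustration of why the di-Higgs mass is a distribution rather than a fixed number, the sketch below computes the invariant mass of a two-particle system from the four-momenta, using m² = (E₁ + E₂)² − |p₁ + p₂|² in natural units. The Higgs mass of 125 GeV and the 100 GeV momentum are illustrative values, and the function names are my own.

```python
import math

M_HIGGS = 125.0  # GeV, approximate Higgs boson mass


def four_vector(mass, px, py, pz):
    """Build (E, px, py, pz) for a particle of given mass and momentum (GeV, natural units)."""
    energy = math.sqrt(mass**2 + px**2 + py**2 + pz**2)
    return (energy, px, py, pz)


def invariant_mass(p4_a, p4_b):
    """Invariant mass of a two-particle system: m^2 = (sum E)^2 - |sum p|^2."""
    energy = p4_a[0] + p4_b[0]
    px = p4_a[1] + p4_b[1]
    py = p4_a[2] + p4_b[2]
    pz = p4_a[3] + p4_b[3]
    mass_squared = energy**2 - (px**2 + py**2 + pz**2)
    return math.sqrt(max(mass_squared, 0.0))  # guard against tiny negative rounding errors


# Two Higgs bosons at rest: the pair mass is exactly twice the Higgs mass.
at_rest = invariant_mass(four_vector(M_HIGGS, 0.0, 0.0, 0.0),
                         four_vector(M_HIGGS, 0.0, 0.0, 0.0))

# Two Higgs bosons flying apart back to back: the pair mass is larger.
back_to_back = invariant_mass(four_vector(M_HIGGS, 100.0, 0.0, 0.0),
                              four_vector(M_HIGGS, -100.0, 0.0, 0.0))

print(at_rest, back_to_back)
```

In a real event the inputs would be the measured decay products rather than the Higgs bosons themselves, and any energy carried away undetected shifts the result.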

Using a neural network I was able to regress to the di-Higgs mass using the basic kinematic and geometric properties of the final-state particles, and the results can be seen in Figure 2. The Reco Signal distribution, in red, is my ‘best guess’ for the mass distribution, calculated by reconstructing the di-Higgs ‘by hand’. As can be seen, it is quite offset from the true Gen mass, in blue.

Passing the variables through my regressor resulted in the Reg Reco Signal, in green, and reassuringly we can see that it shows good agreement with the true Gen distribution. Passing the background data through the regressor results in a different distribution (not shown), meaning the regressed mass can be used to help separate signal from background in a classifier.

Figure 2: Di-Higgs mass distributions.

Classification

I developed another neural network, which took the same variables as the regressor and the output of the regressor, and trained it to classify events into signal and background by assigning each event a value between zero and one. Events with values close to one are considered to be signal-like and events close to zero background-like.

Figure 3: Classifier response on data samples.

Looking at the response of the classifier in Figure 3, we see that it provides very good separation. Signal events, in green, are indeed clustered very close to one and background is clustered close to zero. Plotting on a log-scale, and weighting events by their probability of occurrence (cross-section times acceptance), we see in Figure 4 that background still has a significant contribution even at high MVA values. This is due to the huge difference between the production cross-sections of the signal and background processes. Definitely room for improvement.

Figure 4: Classifier response on data samples, normalised to cross-section times acceptance.
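The effect of the weighting can be sketched with a few lines of Python. Every number here is invented purely for illustration (the scores, the cross-sections, the acceptances, and the cut at 0.9), and I assume each simulated event in a sample carries an equal weight so that the sample sums to its cross-section times acceptance.

```python
def weighted_yield(scores, cross_section, acceptance, cut):
    """Weighted yield of events whose classifier score lies above the cut.

    Each simulated event gets the same weight, chosen so that the whole
    sample sums to cross_section * acceptance.
    """
    weight = cross_section * acceptance / len(scores)
    return weight * sum(1 for score in scores if score > cut)


# Made-up classifier responses: signal mostly near one, background mostly near zero.
signal_scores = [0.99, 0.97, 0.95, 0.90, 0.40]
background_scores = [0.01] * 97 + [0.05, 0.10, 0.96]

# Made-up cross-sections (arbitrary units): background is 10,000 times more common.
signal = weighted_yield(signal_scores, cross_section=1.0, acceptance=0.1, cut=0.9)
background = weighted_yield(background_scores, cross_section=1.0e4, acceptance=0.1, cut=0.9)

print(signal, background)
```

In this toy example only one background event in a hundred passes the cut, compared with three signal events in five, yet the weighted background yield is still far larger than the signal yield.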

Future plans

Although the regressor looks to perform well, checking the pull distribution, that is the distribution of the difference between the regressed mass and the true mass, I found that it was quite wide, indicating that the ‘correction’ was not always correct. One way to improve the regressor’s performance could be to regress first to the variables related to the tau-leptons, since this is where the majority of the missing energy (which leads to inaccurate calculation) is generated. If this is seen to offer improvement, I could try regressing every variable.
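To show how two distributions can agree overall while individual events are still badly corrected, here is a small sketch of the pull check as defined above (regressed mass minus true mass, event by event). The masses are made-up numbers and the function name is hypothetical.

```python
import statistics


def pull_summary(regressed, true):
    """Mean (bias) and standard deviation (width) of regressed-minus-true differences."""
    differences = [r - t for r, t in zip(regressed, true)]
    return statistics.mean(differences), statistics.stdev(differences)


# Made-up di-Higgs masses in GeV for four events.
true_masses = [300.0, 400.0, 500.0, 600.0]
regressed_masses = [350.0, 360.0, 540.0, 550.0]

bias, width = pull_summary(regressed_masses, true_masses)
print(bias, width)
```

Here the bias is zero, so the regressed values are right on average, but the width of roughly 50 GeV shows that each individual event can be quite far off.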

One of the challenges of developing neural networks is optimising their architectures, e.g. how many nodes to use, how many layers of nodes to have, et cetera. Currently I use a process of (semi) educated guessing, but it would be good to have a more formal, accurate way of coming up with more performant networks. The most complete way would be to test each possible arrangement (set of hyper-parameters) step by step (an exhaustive grid-search). However, with each network taking about two hours to train and test, this process, even if automated, would take far too long.
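A quick back-of-the-envelope sketch shows how fast an exhaustive grid-search grows. The search space below is entirely hypothetical (the hyper-parameter names and values are placeholders of my own choosing); only the two hours per network comes from the estimate above.

```python
import itertools

# Hypothetical search space: four hyper-parameters with a handful of values each.
search_space = {
    "n_layers": [2, 3, 4, 5, 6],
    "nodes_per_layer": [25, 50, 100, 200],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "dropout": [0.0, 0.25, 0.5],
}

HOURS_PER_NETWORK = 2  # approximate train-and-test time quoted in the text

# Every combination of values is one network to train and test.
parameter_sets = [
    dict(zip(search_space, values))
    for values in itertools.product(*search_space.values())
]

total_hours = len(parameter_sets) * HOURS_PER_NETWORK
total_days = total_hours / 24

print(f"{len(parameter_sets)} networks, {total_hours} hours (~{total_days:.0f} days)")
```

Even this modest grid gives 180 networks and 360 hours, about 15 days of back-to-back training, and adding one more hyper-parameter multiplies the total again.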

A more appealing option is to use Bayesian optimisation, a process where test points are used to predict new parameter-sets which are expected to be highly performant. These can be tested and used to refine the model of the performance function. Since each train-test cycle will build a slightly different network, even for the same parameter set, uncertainties are assigned to the model. When choosing the next test-point a trade-off is performed between re-sampling points to reduce uncertainty and sampling new points which are predicted to be good. By doing this, one can find highly performant parameter-sets for a minimal number of train-test cycles.
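The loop described above can be sketched in a few dozen lines using NumPy. This is a toy under stated assumptions, not a recipe for the real analysis: I assume a single hyper-parameter scaled to the range 0 to 1, a Gaussian-process model of the performance function with a fixed squared-exponential kernel, an upper-confidence-bound rule for the trade-off, and a made-up `toy_performance` function standing in for a two-hour train-test cycle.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_seen, y_seen, x_query, noise=1e-2):
    """Gaussian-process posterior mean and standard deviation at x_query."""
    k_seen = rbf_kernel(x_seen, x_seen) + noise * np.eye(len(x_seen))
    k_cross = rbf_kernel(x_seen, x_query)
    mean = k_cross.T @ np.linalg.solve(k_seen, y_seen)
    variance = 1.0 - np.sum(k_cross * np.linalg.solve(k_seen, k_cross), axis=0)
    return mean, np.sqrt(np.clip(variance, 0.0, None))


def toy_performance(x, rng):
    """Hypothetical stand-in for one train-test cycle: a noisy score peaking at x = 0.7."""
    return np.exp(-((x - 0.7) / 0.2) ** 2) + rng.normal(scale=0.01)


def bayesian_optimise(n_iterations=15, kappa=2.0, seed=0):
    """Repeatedly fit the model, then test the point with the best mean-plus-uncertainty."""
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 101)
    x_seen = np.array([0.0, 0.5, 1.0])  # initial test points
    y_seen = np.array([toy_performance(x, rng) for x in x_seen])
    for _ in range(n_iterations):
        mean, std = gp_posterior(x_seen, y_seen, candidates)
        # Upper confidence bound: a high mean favours points predicted to be good,
        # a high std favours points where the model is still uncertain.
        next_x = candidates[np.argmax(mean + kappa * std)]
        x_seen = np.append(x_seen, next_x)
        y_seen = np.append(y_seen, toy_performance(next_x, rng))
    mean, _ = gp_posterior(x_seen, y_seen, candidates)
    return candidates[np.argmax(mean)], x_seen


best_x, tested_points = bayesian_optimise()
print(best_x, len(tested_points))
```

The parameter `kappa` sets the balance: larger values spend more cycles reducing uncertainty, smaller values concentrate on regions already predicted to perform well. With 3 initial points and 15 iterations, the toy uses 18 train-test cycles in total.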

Conclusion

All in all, work’s going well, and dare I say it, ahead of schedule. There’s certainly lots to work on, and I have lots of ideas I want to try, but I’m pleased with how well the MVAs appear to be performing.

As promised, here’s a link to the full presentation I gave.

Feature image taken from http://www.milestonesociety.co.uk/