It is with a certain satisfaction that I can announce today that the AMVA4NewPhysics network is in complete control of its planned schedule, and has now started to provide real research-grade output, delivering its first two scientific products of relevance. Deliverable 1.1 (from work package 1, which focuses on MVA applications to Higgs boson studies) and Deliverable 4.1 (from work package 4, which focuses on the development of entirely new Machine Learning tools designed with specific HEP problems in mind) were submitted on February 27th, one full day ahead of their due date (hehm).
Here I want to give you access to Deliverable 1.1, which is a public report. It is a 60-page document, so I expect it to be of interest only to those of you who are also doing Higgs boson studies at the LHC. Nevertheless, I encourage you to give it a look. It is a pretty damn good study, resulting from the efforts of more than half of the early-stage researchers who are working for the network.
The title of the report is “Multivariate Analysis Methods for Higgs Boson Searches at the Large Hadron Collider”. What we focus on, as foreseen in the network programme, is the decay of Higgs bosons to tau lepton pairs and to bottom quark pairs. The process of interest is the production of Higgs boson pairs in 13 TeV LHC proton-proton collisions, a topic to which three of the network institutions in particular have turned their eyes in the recent past.
The reason for studying Higgs pair production (a very rare process, which takes place only once every 10^13 collisions!) is that it is the main avenue for determining a crucial parameter of the standard model: the self-coupling of the Higgs boson. Only if we measure that parameter to be equal to the standard model prediction will we be satisfied that the 125 GeV boson we discovered in 2012 is really, really the particle we expect it to be. On the other hand, many new physics models might make the parameter different, so its precise measurement is extremely valuable science.
So what have we done in the past year, and recently documented in the above report? We studied how events with four b-quark jets, or events with two tau leptons and two b-quark jets, can be selected to bring out a signal of Higgs pair decay. The Higgs decay to b quarks takes place about 58% of the time, so it is the most advantageous to focus on, given that the production rate is so small to begin with; but tau leptons, to which Higgs bosons decay at a much smaller relative rate (about 6% of the time), offer a more distinctive experimental signature.
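To put rough numbers on that trade-off, here is a small back-of-the-envelope sketch. The branching fractions plugged in are my own assumed round values for a 125 GeV standard model Higgs boson, not numbers taken from the report, and `pair_fraction` is just a hypothetical helper name.

```python
# Approximate branching fractions for a 125 GeV standard model Higgs boson.
# These are assumed round numbers for illustration, not values from the report.
BR_BB = 0.58
BR_TAUTAU = 0.063


def pair_fraction(br_a: float, br_b: float, same: bool) -> float:
    """Fraction of HH events where one Higgs decays via mode a and the other via mode b."""
    # Two distinct decay modes can be assigned to the two bosons in two ways,
    # hence the factor of 2 when the modes differ.
    return br_a * br_b if same else 2.0 * br_a * br_b


frac_4b = pair_fraction(BR_BB, BR_BB, same=True)
frac_bbtautau = pair_fraction(BR_BB, BR_TAUTAU, same=False)

print(f"HH -> bbbb      : {frac_4b:.3f}")
print(f"HH -> bb tautau : {frac_bbtautau:.3f}")
```

With these inputs the four-b final state collects roughly a third of all Higgs pair decays, several times more than the two-b, two-tau one, which is the price paid for the cleaner tau signature.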
Background reduction is the name of the game, and so classification is the tool. We need to use all the available observable characteristics of the events to determine whether the events smell like Higgs pair decays or like backgrounds, which are mainly due to QCD multijet production in the case of the four-b final state, and to top pair production in the case of the tau-tau-b-b final state. What the ESRs (early-stage researchers) of the network did, with help from other staff members, was to simulate large datasets of signal and background with a publicly available package, DELPHES (which, by the way, is a product of members of one of the network nodes, Université Catholique de Louvain). Then, for the tau-tau-b-b signature the students considered deep neural network architectures and advanced boosted decision tree implementations, and constructed complex sequential procedures, first regressing some important variables to their most likely true values, and then feeding the output of the regression step to a powerful neural network. For the b-b-b-b signature they considered different architectures, identifying the best-performing ones, and they studied regression of the dijet masses (the “money variable” in a H->bb decay search) to improve the signal’s observability.
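The "regress first, then classify" idea can be illustrated with a toy that has nothing to do with the actual report code. Everything here is an assumption made for illustration: the event generator, the degrading observable `x`, the quadratic least-squares fit standing in for the regression step, and a single decision stump (the building block of a boosted decision tree) standing in for the neural network.

```python
import numpy as np

rng = np.random.default_rng(seed=1)


def simulate(n):
    """Toy events: true mass, a degraded measured mass, an observable x, a signal flag."""
    is_signal = rng.random(n) < 0.5
    # Signal peaks at 125 GeV; background is flat between 50 and 250 GeV (toy choice).
    m_true = np.where(is_signal, rng.normal(125.0, 2.0, n), rng.uniform(50.0, 250.0, n))
    x = rng.uniform(0.0, 1.0, n)  # hypothetical observable, e.g. a lost-energy fraction
    m_meas = m_true * (1.0 - 0.3 * x) + rng.normal(0.0, 2.0, n)
    return m_true, m_meas, x, is_signal


def design(x):
    """Design matrix for a quadratic fit in x."""
    return np.column_stack([np.ones_like(x), x, x**2])


# Stage 1: regress the correction factor m_true / m_meas on x, using simulation truth.
m_true, m_meas, x, y = simulate(4000)
coef, *_ = np.linalg.lstsq(design(x), m_true / m_meas, rcond=None)


def corrected(m_meas, x):
    """Apply the regressed correction factor to the measured mass."""
    return m_meas * (design(x) @ coef)


# Stage 2: a decision stump on the distance of the mass from the signal peak.
def fit_stump(mass, is_signal):
    centre = np.median(mass[is_signal])
    dist = np.abs(mass - centre)
    cuts = np.linspace(1.0, 60.0, 119)  # candidate window half-widths, 0.5 GeV steps
    acc = [np.mean((dist < c) == is_signal) for c in cuts]
    return centre, cuts[int(np.argmax(acc))]


def accuracy(mass, is_signal, centre, cut):
    return float(np.mean((np.abs(mass - centre) < cut) == is_signal))


# Evaluate both variants on an independent toy sample.
_, m_meas_t, x_t, y_t = simulate(4000)
acc_raw = accuracy(m_meas_t, y_t, *fit_stump(m_meas, y))
acc_reg = accuracy(corrected(m_meas_t, x_t), y_t, *fit_stump(corrected(m_meas, x), y))
print(f"accuracy with raw mass      : {acc_raw:.3f}")
print(f"accuracy with regressed mass: {acc_reg:.3f}")
```

In this toy the regressed mass gives the stump a narrower signal peak to cut on, so it classifies better than the raw mass does. The real analysis works with many more variables and far more capable models, but the logic of the sequence is the same.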
In the graph below, for instance, you can see the result of the multi-stage regression of the di-Higgs boson mass (the four-body mass) calculated from the observed four-momenta of the tau leptons and bottom-quark jets. The top panel shows the difference between the true and measured values of the di-Higgs mass at various stages of the correction (the red curve being the final result); the bottom panel shows the distribution for signal and background, with the signal distribution compared to the true value to show the excellent performance of the regression procedure. Please note how well separated the background and the signal become after the regression!
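For readers who have never computed one, the four-body mass is just the invariant mass of the summed four-momenta. Here is a minimal sketch in natural units (c = 1), with the function name and the toy momenta being my own illustrative choices rather than anything from the report.

```python
import math


def invariant_mass(four_momenta):
    """Invariant mass of a set of particles given as (E, px, py, pz) tuples, in GeV."""
    e = sum(p[0] for p in four_momenta)
    px = sum(p[1] for p in four_momenta)
    py = sum(p[2] for p in four_momenta)
    pz = sum(p[3] for p in four_momenta)
    m2 = e * e - (px * px + py * py + pz * pz)
    # Rounding can push m^2 slightly below zero for light systems; clamp it.
    return math.sqrt(max(m2, 0.0))


# Toy example: four massless objects (two "taus", two "b-jets") with made-up momenta.
toy_event = [
    (50.0, 50.0, 0.0, 0.0),
    (50.0, -50.0, 0.0, 0.0),
    (75.0, 0.0, 75.0, 0.0),
    (75.0, 0.0, -75.0, 0.0),
]
m_hh = invariant_mass(toy_event)
print(f"four-body mass: {m_hh:.1f} GeV")
```

In real events the neutrinos from the tau decays escape undetected and the b-jet energies are mismeasured, so the mass computed this way from observed objects is off from the true value, and that is the gap the regression is meant to close.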
So, in summary, the results are very encouraging, and they beg us to carry out the same studies on fully reconstructed, private simulations of ATLAS and CMS detector data. Those, of course, cannot be distributed as a network product; that is why we used DELPHES samples, which allowed us to carry out the work jointly in a group of network members that includes members of both collaborations, plus theorists, plus statisticians.
The next step is to produce at least a preprint article from the material. This will take a few more weeks at least. Yet for today we can say: mission accomplished!