In the last post I mentioned a problem connected with the use of a classification tool to discriminate a new particle signal from backgrounds. The case in point is the search for the production of Higgs boson pairs, which may decay (36% of the time) into a final state including four b-quarks.

The b-quarks produce jets of hadrons as they fly away from the interaction point. These jets can be identified by an algorithm which searches for “secondary vertices” created by the back-propagated trajectories of charged particles, as explained by Pablo in an earlier post. So the final state includes a very distinctive “four-b-jets” signature. It should be easy to spot in LHC collisions, right?

Alas, no. The strong force (QCD) obeyed by quarks and gluons is the master of proton-proton collisions. It produces energetic gluons that “split” into pairs of b-quarks, so a four-b-jets final state is not terribly rare to obtain from QCD processes that have nothing to do with di-Higgs boson production. In fact, if you compare the rate of four-b-jet production by QCD with that from Higgs pair decays, you are looking at a 10,000 to 1 ratio!

Fortunately we may rely on the kinematics of the events to increase the signal purity of our selected data. And here Machine Learning tools like neural networks or boosted decision trees or other gizmos come into play. However, if you take QCD events and select the very few that have kinematics similar to that of HH decays, you end up with what you bargained for: events that have pairs of b-jets whose combined mass peaks at 125 GeV, exactly like the signal does!
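
To see how this “sculpting” happens in practice, here is a toy demonstration in Python (everything in it, from the numbers to the choice of a gradient-boosted classifier, is invented for illustration and has nothing to do with the actual CMS analysis). A classifier trained with the dijet mass among its inputs happily learns the 125 GeV peak, so the background events it likes best end up clustering right under the signal:

```python
# Toy sketch: a classifier trained with the dijet mass among its features
# "sculpts" the selected background towards the signal peak. All numbers
# below are made up for the demonstration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 50_000

# Toy signal: dijet mass peaked at 125 GeV, plus one extra kinematic variable x.
sig_mass = rng.normal(125.0, 12.0, n)
sig_x = rng.normal(1.0, 1.0, n)

# Toy background: smoothly falling mass spectrum, different x distribution.
bkg_mass = 60.0 + rng.exponential(80.0, n)
bkg_x = rng.normal(0.0, 1.0, n)

X = np.column_stack([np.concatenate([sig_mass, bkg_mass]),
                     np.concatenate([sig_x, bkg_x])])
y = np.concatenate([np.ones(n), np.zeros(n)])

clf = GradientBoostingClassifier().fit(X, y)

# Keep only the background events with the most signal-like scores...
score_bkg = clf.predict_proba(np.column_stack([bkg_mass, bkg_x]))[:, 1]
passed = bkg_mass[score_bkg > np.quantile(score_bkg, 0.99)]

# ...and their mass now tends to cluster near the 125 GeV signal peak.
print("mean background mass before the cut: %.1f GeV" % bkg_mass.mean())
print("mean background mass after the cut:  %.1f GeV" % passed.mean())
```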

So what, you could say: if your signal-to-noise ratio is high enough, you may spot an excess of events from the HH decay signal and be done with it; the excess can be evidence of the sought process. No, that is not enough: what people really love to see in high-energy physics searches is a histogram where a “bump” is caused by the clustering of signal events at the same invariant mass (in this case, that of the Higgs boson, 125 GeV). A similar thing is shown in the graph on the right, which reports on a search for Higgs boson decays to b-quarks produced in association with a W or Z boson, performed with CMS. As you see, spotting a bump is not as easy as it seems… (the signal is the small distribution in red, which lies low in the 100-150 GeV region).

What one then really wants is a background that is not only small, but also “not peaky”, so that the signal can be seen on top of it even if it is small! What we need is a machine learning classifier that increases the signal purity without creating a bump in the background mass distribution. And that is not easy to find!

The idea I had this morning, just five minutes before the alarm clock started ringing (am I going crazy or what?), is that one may give a weight to background events during the training of the classifier. This weight could be proportional to the ratio between signal and background in the mass distribution. If that were done, the classifier would no longer be tempted to consider background events as “signal-like” solely because they have a high probability of having a dijet mass in the 125 GeV whereabouts: it would “decouple” from that part of the supplied information. Cutting at high values of a discriminator produced by such a recipe would not preferentially select background events in the 125 GeV mass region, and this in turn would allow one to “see” a bump in that distribution if a signal is present, and no bump if there is none.
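
For what it is worth, here is a minimal sketch of that recipe on the same kind of invented toy data as above (the binning, the normalization of the weights, and the classifier are arbitrary choices of mine, not a prescription). Each background event gets a weight proportional to the signal-to-background ratio in its mass bin, the weights are passed to the training, and a tight cut on the resulting discriminator should then leave the background mass spectrum bump-free:

```python
# Toy sketch of the reweighting recipe: background events are weighted by
# the signal/background ratio of their mass bin, so the 125 GeV peak no
# longer helps the classifier. All numbers are invented for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 50_000
sig_mass, sig_x = rng.normal(125.0, 12.0, n), rng.normal(1.0, 1.0, n)
bkg_mass, bkg_x = 60.0 + rng.exponential(80.0, n), rng.normal(0.0, 1.0, n)

X = np.column_stack([np.concatenate([sig_mass, bkg_mass]),
                     np.concatenate([sig_x, bkg_x])])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Per-event background weight ~ S/B evaluated in the event's mass bin.
bins = np.linspace(60.0, 500.0, 45)
s_hist, _ = np.histogram(sig_mass, bins=bins)
b_hist, _ = np.histogram(bkg_mass, bins=bins)
ratio = s_hist / np.maximum(b_hist, 1)
idx = np.clip(np.digitize(bkg_mass, bins) - 1, 0, len(ratio) - 1)
w_bkg = ratio[idx]
w_bkg *= n / w_bkg.sum()                       # keep the total background weight fixed
weights = np.concatenate([np.ones(n), w_bkg])  # signal keeps unit weights

clf = GradientBoostingClassifier().fit(X, y, sample_weight=weights)

# The background surviving a tight cut should no longer pile up at 125 GeV.
score_bkg = clf.predict_proba(np.column_stack([bkg_mass, bkg_x]))[:, 1]
passed = bkg_mass[score_bkg > np.quantile(score_bkg, 0.99)]
print("mean background mass before the cut: %.1f GeV" % bkg_mass.mean())
print("mean background mass after the cut:  %.1f GeV" % passed.mean())
```

Note that the reweighting only equalizes the two training samples in the mass variable; the classifier remains free to exploit whatever other kinematic differences are left, which is exactly the point.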

Confused? Okay, think of it this way. A classifier is just a tool to determine the density of background and signal anywhere in a large parameter space (often a multi-dimensional one). If you fool the classifier by giving more weight to background events of some kind, the density it estimates is distorted. Events with a mass in the 125 GeV whereabouts will no longer be favoured by the classifier, as they would be if you gave the backgrounds no mass-dependent weights. The classifier will thus be forced to “look elsewhere” for discrimination power, in the different densities of signal and background.

This idea is of course neither new nor very smart. But the field of statistical learning has grown so incredibly thick with methods, ideas, and tools that one has a lot of room for “rediscovering” techniques designed for different tasks and using them to attack a new problem. It is a less “romantic” and inventive activity than designing a new algorithm from scratch, but it may be just what you need to boost your analysis sensitivity…

I would be very happy if any of the twenty-three readers of this blog were willing to comment on the ideas above. I would be even happier if you could spot what citations I just implicitly made (hint: Manzoni).