I lost a bet!

Almost two months ago, Tommaso and I designed a challenge about guessing the b-flavour content of jets in simulated QCD processes. The aim of the competition was to predict the fraction of events with 0, 1, 2, 3 and 4 selected b-jets (i.e. jets which contain b-hadrons) after an event selection which resembles the one used for the HH → bbbb analysis we are working on.

To make the game more interesting, we increased the number of variables to predict by dividing the data into four subsets corresponding to four different jet invariant mass ranges: 200-400 GeV, 400-600 GeV, 600-800 GeV and 800-1000 GeV. Therefore, a total of 20 variables (5 fractions in each of 4 mass bins) were to be guessed, but four of them were not independent because the fractions in each bin had to sum to 1, leaving 16 independent values.

We both made an educated guess for these variables and agreed that the loser had to buy a refreshing beverage for the winner. We also invited readers of this blog to join us in the game, but we have not heard of any external participants, which is kind of understandable given how strange and specific this bet is.

Some time has passed and it is time to see who won. Furthermore, given that we are currently generating QCD processes enriched in b-jets, we are genuinely interested in how common they are in the final state for the inclusive simulated sample.

Simulated datasets contain sets of events corresponding to a physical process. Simulated (aka Monte Carlo) samples can be treated as data, but they also include what we call truth information: the type and momenta of all the particles that were generated as products of the simulated collision. For more information on Monte Carlo simulations, have a look at Giles’ last post.

The MC truth can be linked with data-like observables (i.e. reconstructed objects) through matching procedures. There are several ways of doing this, especially for compound objects such as jets, and they usually lead to one-to-many and many-to-one pairings.

For this problem, I have used what is called the hadron flavour of the jets, which basically tells me if a jet is matched with a generated jet that clustered the products of heavy flavour hadrons (bottom or charm).

Therefore, to check who won the bet, I had to count the number of selected b-flavoured jets in each of the four invariant mass ranges, for all the events in the simulated QCD samples that pass the event selection mentioned in the previous post.
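The counting step can be sketched in a few lines. This is only a minimal illustration and not the actual analysis code: the per-event arrays `mass` and `n_b`, their values, and the helper `bjet_fractions` are all hypothetical, and event weights are ignored.

```python
import numpy as np

# Hypothetical per-event values (NOT the real simulated sample):
# invariant mass in GeV and number of selected jets with b hadron flavour.
mass = np.array([250.0, 390.0, 410.0, 450.0, 650.0, 700.0, 820.0, 990.0])
n_b = np.array([4, 2, 3, 4, 4, 0, 1, 4])

# Edges of the four invariant mass ranges used in the bet.
mass_edges = np.array([200.0, 400.0, 600.0, 800.0, 1000.0])


def bjet_fractions(mass, n_b, mass_edges, max_b=4):
    """Return an array of shape (n_mass_bins, max_b + 1) of event fractions."""
    n_bins = len(mass_edges) - 1
    # np.digitize gives 1..n_bins for values inside the edges, so shift by one.
    bin_idx = np.digitize(mass, mass_edges) - 1
    in_range = (bin_idx >= 0) & (bin_idx < n_bins)
    counts = np.zeros((n_bins, max_b + 1))
    # Add one count per event to its (mass bin, number of b-jets) cell.
    np.add.at(counts, (bin_idx[in_range], n_b[in_range]), 1)
    totals = counts.sum(axis=1, keepdims=True)
    # Leave empty mass bins at zero instead of dividing by zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


fractions = bjet_fractions(mass, n_b, mass_edges)
```

Each row of `fractions` corresponds to one mass range and sums to 1, which is the constraint that removes four of the 20 degrees of freedom.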

So let’s see graphically how our guesses and results compare:

stacked_flav_compostion: All figures follow the same format. The fraction of events with 0, 1, 2, 3 and 4 b-jets is stacked for each invariant mass bin using different colours. On the left you see Tommaso’s guess, the one on the right is mine, and the center figure shows the results from the simulated QCD dataset.

Can you guess who won without further reading and forgetting the title of this post? That would depend on the figure of merit agreed upon when we defined the challenge.

In this case, we opted for the sum of squared deviations of our predictions from the QCD simulation values, which is what the previous post refers to as χ². I prefer to use the root mean squared deviation (RMSD) instead, which is a monotonic function of χ² (i.e. it gives the same winner) but is easier to interpret, and it makes my score closer to my supervisor’s.
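To make the relation between the two figures of merit concrete, here is a minimal sketch. The function names are mine and the numbers are invented for illustration (a single mass bin, not our actual guesses or the simulation results). With N predicted values, RMSD = sqrt(χ² / N), so ranking by one is the same as ranking by the other.

```python
import numpy as np


def sum_squared_deviation(pred, truth):
    """Sum of squared deviations, called chi2 in the text."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sum((pred - truth) ** 2))


def rmsd(pred, truth):
    """Root mean squared deviation: sqrt(chi2 / N)."""
    n_values = np.asarray(pred).size
    return float(np.sqrt(sum_squared_deviation(pred, truth) / n_values))


# Invented fractions for 0, 1, 2, 3 and 4 b-jets in one mass bin.
truth = [0.05, 0.10, 0.15, 0.20, 0.50]
guess_a = [0.05, 0.10, 0.25, 0.30, 0.30]
guess_b = [0.00, 0.05, 0.10, 0.55, 0.30]

chi2_a, chi2_b = sum_squared_deviation(guess_a, truth), sum_squared_deviation(guess_b, truth)
rmsd_a, rmsd_b = rmsd(guess_a, truth), rmsd(guess_b, truth)

# The square root is monotonic, so both scores pick the same winner.
same_winner = (chi2_a < chi2_b) == (rmsd_a < rmsd_b)
```

The RMSD has the same units as the fractions themselves, so it can be read as a typical error per predicted value.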

In the figure preceding this paragraph, we see that our guesses are clearly different. Tommaso expected that a small but not negligible fraction of events would have 0 and 1 selected b-jets, while approximately the same number of events with 2, 3, and 4 b-flavoured jets would be present. Maybe he can explain his rationale for the invariant mass dependence chosen.

However, I thought that b-tagging criteria would be tight enough to reduce 0 and 1 b-jet contribution under the 5% level, leaving a small 2 b-jet fraction and making the 3 b-jet category dominant.

Looking at the QCD simulation results, we see that neither of us was very close to the truth. The largest fraction is made up of events with 4 b-jets, which neither of us had expected, and events with a low number of b-jets also make a non-negligible contribution.

This was a difficult challenge indeed, because the variables to be predicted depend on the number and spectrum of jets from different flavours in an event, the efficiency and fake rate of online and offline b-tagging algorithms and how the invariant mass of four bodies depends on the jet variables.

In the next figures I compare predictions and QCD simulation results for each category independently. You can see that Tommaso did much better in the 0, 1, 2 and 3 b-jet categories, while I almost nailed the 4 b-jet category (the loser's consolation prize). In the last plot, you can compare our scores. I had an RMSD of 0.15, while Tommaso won the bet with an RMSD of 0.10!

category_flav_compostion (1): Comparison of predictions and QCD simulation for each b-jet multiplicity category, and final scores of the challenge for both participants. The error bars in the histograms are due to the limited statistics of the simulated sample. Predictions are in 5% units as agreed.

The take-home message is that we need simulated data for modelling complex phenomena in colliders: it is really hard to predict compound quantities without crunching numbers.

How could we test the hypothesis that our educated guesses were better than randomly picked numbers? In other words, has previously acquired knowledge helped us to achieve a higher prediction accuracy? I am not sure, and I am eager to discuss it in the comments if you please.
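One possible way to approach this question is a small Monte Carlo experiment: draw many random guesses, score them with the same RMSD, and see how often they do at least as well as the educated guess. The sketch below is only a starting point for discussion. The meaning of "random" is an assumption (here each mass bin gets 20 units of 5% distributed among the five categories with equal probability), the function names are mine, and the `truth` and `guess` arrays are invented placeholders, not our real numbers.

```python
import numpy as np


def rmsd(pred, truth):
    """RMSD over the last two axes (mass bin, b-jet category)."""
    return np.sqrt(np.mean((pred - truth) ** 2, axis=(-2, -1)))


def random_guess_pvalue(guess, truth, n_trials=10000, units=20, seed=0):
    """Fraction of random guesses scoring at least as well as `guess`.

    Assumed null model: in every mass bin, `units` steps of 1/units are
    assigned to the categories by a multinomial with equal probabilities.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_cat = truth.shape
    pvals = np.full(n_cat, 1.0 / n_cat)
    # Shape (n_trials, n_bins, n_cat); each row sums to 1 by construction.
    random_guesses = rng.multinomial(units, pvals, size=(n_trials, n_bins)) / units
    random_scores = rmsd(random_guesses, truth)
    return float(np.mean(random_scores <= rmsd(guess, truth)))


# Invented placeholders: same composition in all four mass bins.
truth = np.array([[0.04, 0.09, 0.17, 0.21, 0.49]] * 4)
guess = np.array([[0.05, 0.10, 0.15, 0.20, 0.50]] * 4)

p_value = random_guess_pvalue(guess, truth)
```

A small value would suggest that the guess beats this particular notion of random picking, but the answer depends strongly on the chosen null model, so a different definition of "random" could give a different conclusion.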

The only thing I am certain about is that I will have to buy Tommaso ${another euphemism for beer} as agreed. Stay tuned to this blog for more challenges and games, which we will try to make more welcoming for takers other than us.

2 thoughts on “I lost a bet!”

amva4np
14 April 2016 at 14:56


Excellent Pablo! I am looking forward to a good beer at CERN!
I also regret that not many others took this challenge… It would have doubled the fun!
Cheers,
Tommaso

amva4np
15 April 2016 at 22:45


Oh, and about the reason why I chose some specific dependence of the fractions on invariant mass: my reasoning, IIRC, was the following.

1) Fake b-tags are more likely in higher-energy jets. So the fraction of processes with fewer b-jets should be some function of the abscissa (which is also correlated with jet energies). I opted for some mild dependence, split between the 0-, 1-, and 2-b categories. If you look at the sum of these three, you see that I predicted it would grow with mass. That is not really what happens in reality, but it was my guess.

2) But then there is the effect of the trigger. The trigger selects events with two or three b-tags, by asking that these jets have high energy. When one looks for the highest b-tag-variable jets, the triggering jets will likely be among the set. Now, what the trigger really selects is tough to guess, as we have to deal with initial rates which vary by orders of magnitude. I must admit I was not really sure what the above considerations would lead to.

3) Then there is the fact that QCD produces b-jets in even numbers, most of the time. That is because the process of b-quark creation is caused by what is called “gluon splitting”: a gluon creates a b-antib pair. True, a b-quark can be “picked up” from the initial state, but this is rarer. So I tended to believe that the fraction of events with 3 b-jets would signal the “loss” of a b-jet, something which might be more likely in lower-energy events. Or at least, this was sort of my thought, I think.

As for the relative fractions, I think this comes from experience. I knew the composition of multijet data in CDF very well, but that was a different experiment, at a different center-of-mass energy… Still, I remember that I was surprised by how high the fraction of light quark or gluon jets was in the sample of b-tagged jets, even when the b-tagger was made very tight. Light quarks and gluons are so frequent, you always get some of them! Some back-of-the-envelope calculation made me predict that when you require four b-tags (or well, three medium and one loose b-tag) you still get a significant fraction of fakes….

Hope that helps!
T.

AMVA4NewPhysics

A Marie Sklodowska-Curie ITN funded by the Horizon2020 program of the European Commission

Pablo de Castro
