Ciao! Hope you found enjoyable things to do these days!

In my first blog post, I mentioned that b-tagging surely deserved a post of its own on this blog. As you already know, or will understand after reading the following paragraphs, this tool is widely used in high energy physics data analyses. Therefore, I expect AMVA4NewPhysics participants to allude to it several times in the coming years. The aim of this post is to introduce this experimental technique and explain why it is useful for New Physics searches and precise Standard Model measurements.

When proton bunches collide at very high center-of-mass energies, as they do at the LHC, the quarks or gluons within them (referred to as partons) might interact, producing other particles through hard scattering processes. An astonishing number of different processes might occur and different sets of particles can be created. Indeed, everything that respects nature's symmetries can happen, such as the production of a quark and its antiquark or of a Higgs boson. Nevertheless, not all processes are equally likely to happen, and their rate is quantified by the process cross section, which can be calculated for a given theoretical model.
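To make the link between cross section and rate concrete, here is a minimal sketch of the standard relation N = σ × L, where L is the integrated luminosity. The function name and the numbers in the usage lines are illustrative placeholders, not measured values.

```python
PB_INV_PER_FB_INV = 1000.0  # unit conversion: 1 fb^-1 = 1000 pb^-1


def expected_events(cross_section_pb, integrated_luminosity_fb_inv):
    """Expected number of produced events, N = sigma * L.

    cross_section_pb: process cross section in picobarns.
    integrated_luminosity_fb_inv: integrated luminosity in inverse femtobarns.
    """
    return cross_section_pb * integrated_luminosity_fb_inv * PB_INV_PER_FB_INV


# Placeholder inputs (assumptions for illustration only):
# a rare process with 1 pb and a common one with 1e6 pb, both at 20 fb^-1.
rare = expected_events(1.0, 20.0)
common = expected_events(1.0e6, 20.0)
print(rare, common, common / rare)
```

With these made-up inputs the common process outnumbers the rare one by a factor of a million, which is the kind of imbalance that makes background rejection so important.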

Cross sections (or rates) of different physical processes as a function of center-of-mass energy. Note that the scale is logarithmic! Source: MCFM.

The problem is that the most frequent physical processes are usually already well known, while the interesting stuff is produced at comparatively low rates. Sometimes boring events, which are referred to as background events, can look very much like the things you are after, and they are especially problematic for final states with jets. Because the LHC is a proton collider, quarks and gluons assiduously populate the final states: they can come from the hard scattering itself, from initial or final state radiation, or from other proton interactions in the same bunch crossing.

Due to QCD confinement, the produced gluons and quarks rapidly form a stream of colourless bound states (i.e. hadrons), which can be detected directly or which further decay into detectable particles. For their experimental treatment, these particles are clustered with fancy algorithms into objects that look roughly like cones. Those cones are called jets, and by themselves they do not provide any accurate information about the flavour or charge of the initial particles. The previous statements do not apply to top quarks, because their lifetime is much shorter than the hadronization timescale.
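As a rough illustration of what "clustering into a cone" means, here is a toy seeded-cone sketch. It is a deliberate simplification and not the algorithm the experiments run (the LHC experiments typically use sequential recombination algorithms such as anti-kT). The particle format `(pt, eta, phi)`, the function names, the radius of 0.4, and the scalar sum of transverse momenta are all assumptions made for this example.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal angle difference wrapped into [-pi, pi]."""
    dphi = phi1 - phi2
    while dphi > math.pi:
        dphi -= 2.0 * math.pi
    while dphi < -math.pi:
        dphi += 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def cone_cluster(particles, radius=0.4):
    """Toy seeded cone: the hardest unused particle seeds a jet and
    absorbs every unused particle closer than `radius` to it."""
    remaining = sorted(particles, key=lambda p: p[0], reverse=True)
    jets = []
    while remaining:
        seed = remaining[0]
        members = [p for p in remaining
                   if delta_r(seed[1], seed[2], p[1], p[2]) < radius]
        remaining = [p for p in remaining if p not in members]
        jets.append({
            "pt": sum(p[0] for p in members),  # scalar sum, an approximation
            "eta": seed[1],                    # jet axis taken from the seed
            "phi": seed[2],
            "n_constituents": len(members),
        })
    return jets


# Hand-made particles (pt in GeV, eta, phi); not real data.
toy_particles = [(50.0, 0.0, 0.0), (10.0, 0.1, 0.1),
                 (30.0, 2.0, 3.0), (5.0, 2.1, -3.1)]
toy_jets = cone_cluster(toy_particles)
```

Note how the last two particles end up in the same jet even though their azimuthal angles are 3.0 and -3.1: the angle wraps around, so they are actually close to each other.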

Let’s review what we just went over! We can detect jets in an event, which are complex experimental-level objects coming from quarks and gluons. After some hard work on calibration they provide useful information about the energy and momentum of those partons, but nothing clear about the quark type or charge. Because the LHC is a hadron collider, many jets per event might be produced in addition to those that correspond to the process of interest, and many not-very-fascinating-but-much-more-likely processes might generate a set of jets with similar kinematic characteristics.

But here is the catch: the nature of the hadrons formed in the fragmentation process does depend on quark flavour. Hadrons coming from charm and bottom quarks have some distinguishable physical properties, which can be exploited by high resolution tracking and vertexing detectors. The algorithmic identification of jets containing b and c hadrons, generally called b-tagging for simplicity, is a Swiss Army knife for simplifying event jet combinatorics and separating unexciting light-quark-rich background events from b-quark enriched signals.

Public CMS event display from the 8 TeV analysis of VBF Higgs production with decay to bottom quarks. In this channel, two additional quarks are expected. Yellow cones are jets and lines are charged-particle tracks. Two secondary vertices are found, and two of the jets are b-tagged and identified as Higgs decay candidates.

Bottom quarks are in fact decay products of many interesting processes in the Standard Model and of possible Beyond Standard Model extensions. As mentioned before, top quarks decay before hadronization, and when they do they nearly always produce a bottom quark and a W boson. A bottom-antibottom quark pair is also the most likely decay mode of the discovered Higgs boson, and accurate jet flavour tagging can greatly improve the power of analyses studying its properties.

A modern collider experiment with a high resolution tracking detector walks into a bar and says: “Let’s get to the bottom of why those jets taste funny!”

After such a bad joke, we can go through the properties of jets coming from bottom quarks, referred to as b-jets, that make b-tagging possible. Hadrons that contain bottom quarks (and, to a lesser extent, those containing charm quarks) have lifetimes of the order of a picosecond, so when highly boosted they can travel several millimetres away from the primary vertex (PV, where the hard scattering occurred) before decaying. They are characterized by large masses and typically decay to several charged particles, whose tracks can be detected with the tracker and extrapolated back towards the collision region. The impact parameter (IP, the distance of closest approach to the PV) of those tracks will be larger than usual. In fact, if the track resolution is high enough, a secondary vertex (SV, where the heavy hadron decayed) can be found.
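The "several millimetres" figure can be checked with a quick back-of-the-envelope sketch. The mean flight distance of a particle with momentum p, mass m and proper lifetime τ is L = βγcτ = (p/m)cτ. The inputs below (a mass of about 5.28 GeV, a lifetime of about 1.5 ps, and a momentum of 50 GeV) are approximate, illustrative choices for a B meson, and the function name is made up for this example.

```python
C_MM_PER_PS = 0.299792458  # speed of light in millimetres per picosecond


def mean_flight_distance_mm(momentum_gev, mass_gev, lifetime_ps):
    """Mean lab-frame flight distance, L = beta*gamma * c * tau.

    beta*gamma equals p/m when momentum and mass use the same units
    (natural units, c = 1).
    """
    beta_gamma = momentum_gev / mass_gev
    return beta_gamma * C_MM_PER_PS * lifetime_ps


# Approximate, illustrative inputs (assumptions, see the text above).
distance = mean_flight_distance_mm(momentum_gev=50.0, mass_gev=5.28, lifetime_ps=1.5)
print(f"mean flight distance: {distance:.1f} mm")
```

With these inputs the result is roughly 4 mm. Keep in mind this is only the mean of an exponential distribution, so individual hadrons may decay much closer to or further from the PV.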

Tracks coming from b-hadron decays have a larger IP (so they are displaced) and come from an SV that can be resolved with a high resolution tracker. Source: DØ Collaboration.
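For readers who like to see the geometry, here is a minimal sketch of an impact parameter calculation. It assumes the track can be treated as a straight line near the collision region, which is a simplification (real tracks curve in the detector's magnetic field), and the function name and coordinates are purely illustrative.

```python
import math


def impact_parameter(point, direction, primary_vertex):
    """Distance of closest approach between a straight-line track and the PV.

    point: any (x, y, z) position on the track.
    direction: (dx, dy, dz) vector along the track (need not be normalised).
    primary_vertex: (x, y, z) position of the PV.
    Uses |(point - PV) x direction| / |direction|.
    """
    rx, ry, rz = (p - v for p, v in zip(point, primary_vertex))
    dx, dy, dz = direction
    cx = ry * dz - rz * dy
    cy = rz * dx - rx * dz
    cz = rx * dy - ry * dx
    cross_norm = math.sqrt(cx * cx + cy * cy + cz * cz)
    dir_norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    return cross_norm / dir_norm


pv = (0.0, 0.0, 0.0)
# A track pointing straight back to the PV has zero impact parameter...
prompt_ip = impact_parameter((2.0, 2.0, 2.0), (1.0, 1.0, 1.0), pv)
# ...while a track that misses the PV by one unit does not.
displaced_ip = impact_parameter((0.0, 1.0, 0.0), (1.0, 0.0, 0.0), pv)
```

In practice what matters is not the IP alone but how large it is compared with its measurement uncertainty, since a poorly measured prompt track can also appear displaced.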

Hence, a jet that contains several displaced tracks or a reconstructed secondary vertex is very likely to come from a bottom quark. While these are currently the most used b-jet properties for identification, it can also be useful that a lepton is quite likely to be produced in the b-hadron decay chain (~36%), which is not the case for light jets. In order to improve identification, these differentiating properties are expressed as numerical variables and usually combined into a single b-tagging discriminator, so that jets with high discriminator values are very likely to be b-jets and not light jets.
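To give a feeling for what "combining variables into a single discriminator" might look like, here is a toy sketch. The three input variables mirror the properties mentioned above, but the weights, the bias, the logistic form and the 0.5 threshold are hypothetical values picked by hand for illustration; they do not correspond to any real CMS or ATLAS tagger.

```python
import math

# Hypothetical hand-picked weights (assumption, for illustration only).
WEIGHTS = {
    "n_displaced_tracks": 0.8,
    "has_secondary_vertex": 2.0,
    "has_soft_lepton": 1.0,
}
BIAS = -2.5


def discriminator(jet):
    """Map the jet's variables to a single value between 0 and 1.

    Higher values mean the jet looks more b-like.
    """
    score = BIAS + sum(WEIGHTS[name] * jet[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))


def is_b_tagged(jet, threshold=0.5):
    """Tag the jet if its discriminator value exceeds the threshold."""
    return discriminator(jet) > threshold


# Two made-up jets: one b-like, one light-like.
b_like_jet = {"n_displaced_tracks": 3, "has_secondary_vertex": 1, "has_soft_lepton": 1}
light_like_jet = {"n_displaced_tracks": 0, "has_secondary_vertex": 0, "has_soft_lepton": 0}
print(is_b_tagged(b_like_jet), is_b_tagged(light_like_jet))
```

A real tagger would learn how to combine the inputs from simulated or real jets rather than rely on weights set by hand.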

The identification of b-jets can be thought of as a machine learning classification problem. In fact, the most powerful discriminators in CMS and ATLAS are built using neural networks and other multivariate analysis techniques. Apart from standard b-tagging, there are ongoing efforts in both collaborations to develop algorithms optimized for c-jet tagging, b-jet charge identification and fat jet tagging. Stay tuned to this blog for more information about that!
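Seen as a classification problem, a tagger is usually judged by two numbers at a given threshold: the fraction of true b-jets it tags (the efficiency) and the fraction of light jets it tags by mistake (the mistag rate). The sketch below computes both for hand-written toy scores; the scores, the thresholds and the function name are assumptions for illustration, not results from any experiment.

```python
def tag_rates(b_scores, light_scores, threshold):
    """Return (b-tag efficiency, light-jet mistag rate) for one threshold."""
    efficiency = sum(s > threshold for s in b_scores) / len(b_scores)
    mistag_rate = sum(s > threshold for s in light_scores) / len(light_scores)
    return efficiency, mistag_rate


# Toy discriminator values, invented for this example.
toy_b_scores = [0.95, 0.90, 0.70, 0.40]
toy_light_scores = [0.60, 0.20, 0.10, 0.05]

loose = tag_rates(toy_b_scores, toy_light_scores, threshold=0.5)
tight = tag_rates(toy_b_scores, toy_light_scores, threshold=0.8)
print("loose:", loose, "tight:", tight)
```

Raising the threshold lowers the mistag rate at the cost of efficiency, so each analysis has to pick the trade-off that suits it best.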

This post is getting too long already, so before I go I would just like to mention that, while the performance of these techniques can be estimated from simulation, many dedicated analyses use clever methods to measure it directly from experimental data. I will try to explain these techniques here at some point in the near future. By the way, do not hesitate to ask for further explanations or to suggest topics for new posts in the comments!