by Andrea Giammanco

This series of posts is about an ill-posed mathematical problem, and will only marginally touch on particle physics and on management decisions in HEP analysis groups.

The ill-posed problem is the so-called “unfolding” problem. Calling it ill-posed is not a judgment about the value of the problem: the term is to be understood in its mathematical meaning. (Although many colleagues use it, intentionally, in both the mathematical and the common meaning.) This topic has already been the subject of two posts on the work of Mikael Kuusela, one written by himself and one by Tommaso about a seminar by Mikael. In short, it is about how to invert a matrix that you should not invert.

My vantage point over unfolding is very different from Mikael’s. For example, I have never published papers about the technique itself, while he has. He is a mathematician, while I am an experimental particle physicist. So now you know that, by training, I don’t tremble with indignation when some corners are cut, unless that results in some numerically significant effect.

I am currently coordinating a group inside the CMS collaboration entirely devoted to a single particle, the top quark. But that particular particle is not important in this post; here it suffices to say that, as time passes, this group is becoming more and more of an unfolding factory.

That’s because when you increase your statistics, there are two natural ways to improve a measurement of something (like some property of some particle): by making it more precise, and by measuring it as a function of something else. We call the latter a differential measurement. (It is worth noting that the Higgs boson, too, is now entering the age of differential measurements.)

Surprising as it may be, this topic raises quite some controversy, and even strong emotions. That’s probably because there are several methods to tackle the unfolding problem, and none of them can be proven right. Yes, like religions.

Like religions, you choose the one that feels right in your guts, or you blindly follow the one you were taught long ago and that has shaped your world view ever since. And of course, you easily find arguments to prove that all the others are wrong (or not even wrong), while brushing off the known defects of your favourite one as practically irrelevant in the cases of real interest.

Let’s start from the basics.
When you measure an observable X, its “true” distribution is “folded” with several experimental effects. For example, acceptance effects: the probability of observing some data may be larger for some values of X than for others. And resolution effects: any instrument has a finite resolution, so the value that you measure is randomly scattered around the “true” value.

We like histograms, so we “bin” (= discretize) the data. A histogram is nothing more than a vector, and that’s why all the mathematics related to the unfolding business only requires knowing how to do arithmetic with vectors and matrices. (Vectors are 1-dimensional matrices, so let’s just talk about matrix arithmetic from now on.)
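To make this concrete, here is a minimal numpy sketch (the values of X are made up for illustration), showing that a histogram is literally just a vector of counts:

```python
import numpy as np

# Hypothetical measured values of the observable X, made up for illustration
measured_x = np.array([0.1, -0.4, 0.7, 0.3, -0.2, 0.9, 0.5, -0.8])

# Two bins: "backward" (X < 0) and "forward" (X >= 0)
counts, edges = np.histogram(measured_x, bins=[-1.0, 0.0, 1.0])
print(counts)  # [3 5] -- the histogram is nothing more than a vector of counts
```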

Let’s start with a simple example: a measurement with two “bins”, as you have for example when you are interested in some asymmetry between forward-going and backward-going particles (a rather common case in HEP: the electroweak force, for example, causes this kind of asymmetry; some “beyond the Standard Model” interactions may induce such asymmetries where the Standard Model does not, or conversely wash away the ones that the Standard Model predicts).

One way of measuring that (beware: not the only one) passes through some simple matrix arithmetic. Let’s say that The Truth (a priori unknown to us, of course) is that a certain particle from the decay of a certain other particle, in some appropriate rest frame, should be produced in the “forward” direction 75% of the time.

We like vectors, so let’s write a vector (i.e. a 2×1 matrix):

\left(\begin{array}{c}0.25\\ 0.75\end{array}\right).

We produce 1000 such particles (again: that’s The Truth, so a priori we don’t know that), so let’s consider the vector

\vec{t} = \left(\begin{array}{c}250\\ 750\end{array}\right).

Now suppose that the detector response is not completely symmetric, and the efficiency to select the signal is 95% in the forward direction and just 90% in the backward direction. To get a 2×1 matrix from a 2×1 matrix, just multiply it by a 2×2 matrix; let’s call it the acceptance matrix:

\displaystyle A = \left(\begin{array}{cc}0.90&0\\ 0&0.95\end{array}\right)

Note that it is diagonal: the distortion it induces comes from biasing in favour of one bin versus the other, not from migrating data between them.

But there are also plenty of processes that can migrate data from one bin to the other, like the detector resolution. Let’s say that 20% of the time your assessment of the direction is wrong.
So let’s write the migration matrix:

\displaystyle M = \left(\begin{array}{cc}0.80& 0.20 \\ 0.20 & 0.80 \end{array}\right)

And of course there may be background processes contaminating the samples, and they may be asymmetric too; let’s define the background vector

\vec{b} = \left(\begin{array}{c}55 \\ 60\end{array}\right)

So now from the truth vector \vec{t} one gets

\vec{x} = A\times M\times\vec{t} + \vec{b} = \left(\begin{array}{c} 370 \\ 677.5 \end{array}\right).

(Note that it is not bound to contain only integers, as this is still an expectation value and not an actual observation.)
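For the record, here is how the folding above looks in numpy (same numbers as in the text, with the bins ordered backward first, forward second):

```python
import numpy as np

t = np.array([250.0, 750.0])   # truth vector: (backward, forward)
A = np.diag([0.90, 0.95])      # acceptance matrix (diagonal)
M = np.array([[0.80, 0.20],
              [0.20, 0.80]])   # migration matrix
b = np.array([55.0, 60.0])     # expected background

x = A @ M @ t + b              # fold the truth with the experimental effects
print(x)                       # [370.  677.5]
```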

And then, of course, on top of that one has to consider the statistical fluctuations (given by a Poissonian distribution), also known as “statistical noise” (although almost nobody in HEP would call it that), which means that some information is irremediably lost. Let’s call \vec{x^{\prime}} the vector of data that we actually observe, let’s say

\vec{x^{\prime}}= \left(\begin{array}{c} 380 \\ 655 \end{array}\right).

(Note that it is bound to contain only integers, as this is an actual observation.)
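In a toy simulation one would mimic this step by drawing each bin from a Poissonian with mean given by the folded expectation; here is a minimal numpy sketch (the seed is arbitrary), of which the (380, 655) above is just one possible outcome:

```python
import numpy as np

x = np.array([370.0, 677.5])    # the folded expectation from above
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
x_prime = rng.poisson(x)        # one possible observation: integer counts
print(x_prime)                  # integers scattered around x
```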

What we have done so far is to “fold” The Truth with all the experimental effects. But of course we start from the data, so we need to “unfold” from what we see to a credible estimator of The Truth, within some uncertainty.

So you just invert the matrix equation above, and you get

\displaystyle \vec{t^{\prime}} = (A\times M)^{-1}\times (\vec{x^{\prime}}-\vec{b})

The matrices A and M can be estimated from Monte Carlo, but of course they are in principle only valid for the model assumed in the MC; so one also extracts them from different (reasonable) models, and takes the difference into account as a systematic uncertainty. The prediction of the background in the two bins has an uncertainty, too. And the statistical noise is an intrinsic uncertainty, which is propagated to the measurement uncertainty as well.

So far so good; we know how to invert matrices. When using a good mathematical library, e.g. numpy in python, you don’t even need to remember how to do that. Or you can google for an online calculator 😉
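For instance, a minimal numpy sketch for the inversion (same matrices as above):

```python
import numpy as np

A = np.diag([0.90, 0.95])
M = np.array([[0.80, 0.20],
              [0.20, 0.80]])
R_inv = np.linalg.inv(A @ M)   # inverse of the full response matrix
print(R_inv)                   # [[ 1.48 -0.35]
                               #  [-0.37  1.40]]  (rounded to two decimals)
```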

But let’s not be lazy, we are talking about a teeny weeny 2×2 matrix, we can do that even in the little margin of this webpage if we remember the matrix inversion rule that we learned at school:

\displaystyle \left(\begin{array}{cc}a&b\\ c&d\end{array}\right)^{-1} = \frac{1}{ad-bc}\left(\begin{array}{cc}d&-b\\ -c&a\end{array}\right)

The matrix inversion rule (see e.g. https://www.mathsisfun.com/algebra/matrix-inverse.html).

So in our case one gets:

\displaystyle (A\times M)^{-1} = \left(\begin{array}{cc} 1.48 & -0.35 \\ -0.37 & 1.40 \end{array}\right)

(The rest of the calculation, up to the extraction of \vec{t^{\prime}}, is left as an exercise to the reader. Including the extraction of the statistical uncertainty on the elements of the vector!) Note that if the A\times M matrix were singular, its determinant would be zero and the inverse matrix would not exist.
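If you want to check your exercise against a machine, here is one way to do it in numpy: a sketch that neglects the uncertainties on A, M and b, and assumes independent Poissonian bins, so that the covariance matrix of the observed counts is diagonal with the counts themselves on the diagonal:

```python
import numpy as np

A = np.diag([0.90, 0.95])
M = np.array([[0.80, 0.20],
              [0.20, 0.80]])
b = np.array([55.0, 60.0])
x_prime = np.array([380.0, 655.0])

R = A @ M
R_inv = np.linalg.inv(R)
t_prime = R_inv @ (x_prime - b)   # the unfolded vector

# Statistical uncertainty: for independent Poissonian bins, Cov(x') = diag(x');
# linear error propagation then gives Cov(t') = R^-1 diag(x') (R^-1)^T
cov_t = R_inv @ np.diag(x_prime) @ R_inv.T
sigma_t = np.sqrt(np.diag(cov_t))
print(t_prime)   # close to the truth (250, 750)
print(sigma_t)   # per-bin statistical uncertainties
```

Note, incidentally, that the off-diagonal element of the unfolded covariance comes out negative: the inversion anti-correlates the two unfolded bins, a first taste of the trouble mentioned below.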

Now, increase the number of bins and therefore the dimension of the matrices, and you will start getting into trouble. Why exactly we get into trouble will be the topic of the next post.
Hint: it has something to do with “near singularity”.