by Greg Kotkowski

On the 19th of May I was very glad to take part in the RooStats tutorial organised by the AMVA4NewPhysics Network as part of a workshop in Oviedo. RooStats is a ROOT library that builds on the “RooFit” package and provides classes for performing statistical analysis. The tutorial was attended by all the ESRs from our Network, among whom I was the only non-physicist. I am a statistician who does not use ROOT at all. For this reason, my attendance at the tutorial could seem pointless. As it turned out, however, I learnt an important lesson. Below I would like to discuss my experience of that day, but first let me introduce a real-world example that I will use later for comparison.

During the Olympic Games in Rio 2016 there were three ties for medals in swimming competitions. One of them was even a three-way tie for the silver medal. Times are measured to a hundredth of a second, and the results seem to indicate that this is not enough. People have asked whether the time could be measured more accurately in order to eliminate ties and choose the three real medalists.

Technically, time could be measured with much greater precision, to a thousandth or even a millionth of a second. However, according to the International Swimming Federation’s (FINA) regulations, the length of each lane is only guaranteed to within 3 cm, owing to small variations in the placement or thickness of a touch pad, the position of a starting block and the roughness of the tiles.

Given that the Olympic record for the Men’s 50 m freestyle is 21.30 seconds, achieved by César Cielo, his average speed was 2.35 m/s. In other words, in a hundredth of a second he moved on average 2.35 cm through the pool, which is roughly the precision of the lane length. For this reason, FINA does not introduce a more precise time measurement: it would wrongly suggest that the winning swimmer was the one who swam faster, when the victory could actually be due to having a slightly shorter distance to cover than the others.
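The arithmetic behind this comparison can be checked in a few lines. The sketch below uses only the figures quoted above (50 m, 21.30 s, a 0.01 s timing resolution and a 3 cm lane tolerance); the variable names are my own, and the average speed is of course a simplification, since a swimmer does not move at constant speed.

```python
# Figures quoted in the text above
distance_m = 50.0           # race length
record_s = 21.30            # Olympic record time
timing_resolution_s = 0.01  # current timing precision (a hundredth of a second)
lane_tolerance_cm = 3.0     # allowed uncertainty on the lane length

# Average speed over the whole race (a simplification: real speed varies)
avg_speed_m_per_s = distance_m / record_s

# Distance covered during one timing step, in centimetres
cm_per_hundredth = avg_speed_m_per_s * timing_resolution_s * 100.0

# Same quantity if times were measured to a thousandth of a second
cm_per_thousandth = avg_speed_m_per_s * 0.001 * 100.0

print(f"average speed: {avg_speed_m_per_s:.2f} m/s")
print(f"distance per 0.01 s: {cm_per_hundredth:.2f} cm")
print(f"distance per 0.001 s: {cm_per_thousandth:.3f} cm")
print(f"lane tolerance: {lane_tolerance_cm:.1f} cm")
```

One timing step already corresponds to a distance comparable to the lane tolerance, so a finer clock would resolve differences roughly ten times smaller than the uncertainty on the distance itself.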

This example closely reflects my impressions of the RooStats tutorial. Before the workshop I had not been aware of the number of uncertainties that physicists have to deal with in their measurements, nor of the reasons why a specific approach is adopted. In the swimming example, I had never thought about the accuracy of the lane length; in the physics applications studied during the tutorial, I was unaware of how complex it is to combine all the statistical and experimental uncertainties. And just as FINA sets standards for swimmers, there are also somewhat arbitrary conventions by which physicists perform data analysis, justified by “this is how we do it”.

During the tutorial, I focused mostly on the statistical tools used in RooStats and on the foundations of the models. For example, the speaker presented the “CLs” method for estimating an upper limit on the parameters of interest. For experimental particle physicists this is a well-known methodology, while for statisticians it is something completely unknown. Unfortunately, statisticians often find physicists difficult to understand. For example, CLs intervals are described by saying what they are not:

“confidence intervals obtained in this manner do not have the same interpretation as traditional frequentist confidence intervals nor as Bayesian credible intervals.”

The above is quite confusing for me. I more or less grasp the aim of the CLs method, but the interpretation of its results is still beyond me.
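For what it is worth, the construction itself is short to write down for a toy counting experiment, even if its interpretation remains debatable. The sketch below is plain Python, not RooStats, and it assumes the simplest possible setting: a single Poisson count with a known expected background and no systematic uncertainties, with the observed count itself used as the test statistic. The function names and the step size are my own choices for illustration. The CLs value is the ratio of two tail probabilities, one under the signal-plus-background hypothesis and one under the background-only hypothesis, and a signal strength is excluded when that ratio falls below the chosen level (0.05 for a 95% limit).

```python
import math


def poisson_cdf(n_obs, mean):
    """P(N <= n_obs) for N ~ Poisson(mean)."""
    return sum(
        math.exp(-mean) * mean**k / math.factorial(k)
        for k in range(n_obs + 1)
    )


def cls_value(n_obs, signal, background):
    """CLs = CL_{s+b} / CL_b for a single-bin counting experiment."""
    cl_sb = poisson_cdf(n_obs, signal + background)  # tail under signal + background
    cl_b = poisson_cdf(n_obs, background)            # tail under background only
    return cl_sb / cl_b


def cls_upper_limit(n_obs, background, alpha=0.05, step=0.01):
    """Smallest scanned signal yield whose CLs value is at or below alpha.

    A crude upward scan in steps of `step`; a real analysis would use a
    proper root finder and include systematic uncertainties.
    """
    signal = 0.0
    while cls_value(n_obs, signal, background) > alpha:
        signal += step
    return signal


# Hypothetical toy inputs: 0 events observed, 3 background events expected
limit = cls_upper_limit(n_obs=0, background=3.0)
print(f"95% CLs upper limit on the signal yield: {limit:.2f} events")
```

With zero observed events the ratio reduces to exp(-s), so the limit is about 3 signal events whatever the expected background. Dividing by the background-only tail probability is what prevents a downward background fluctuation from excluding arbitrarily small signals, and it is also why the resulting interval has neither the usual frequentist coverage nor a Bayesian reading.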

My impression of RooStats was that the library is very well written, much better than most R packages (R being the language commonly used by statisticians). RooStats is uniform and makes it easy to switch from a frequentist to a Bayesian approach, which in my opinion could be a big disadvantage for scientists who do not have a strong statistical foundation. On the other hand, its documentation is quite poor in comparison to R, where packages are often backed up by precisely written scientific papers.

All in all, I am happy to have attended the tutorial. I am now even more convinced that the worlds of statisticians and physicists are clearly different, and that cooperation between us should only increase. Finally, I want to thank Mario Pelliccioni, who gave the tutorial and was patient enough to answer my strange questions! For anyone who wants to learn RooStats, I’d recommend contacting him.