Have you ever wondered how Facebook suggests the tags for the picture you post on your wall, or how the photo library on your computer manages to automatically create albums containing pictures of particular people? Well, they use facial recognition software based on Convolutional Neural Networks (CNNs).
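CNNs are among the most popular and effective methods for object recognition. A CNN is a specialized kind of neural network for processing data that has a known grid-like topology, such as the pixels of an image. The network employs a mathematical operation called convolution: a small matrix of weights (a filter) slides across the input image, and at each position the overlapping values are multiplied and summed. This allows the network to extract a set of features from the input image.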
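To make the convolution step concrete, here is a minimal sketch in NumPy. The toy image and the hand-picked edge filter are my own illustrative assumptions; in a real CNN the filter weights are learned during training, and deep learning frameworks implement the operation far more efficiently.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products.

    Strictly speaking this is a cross-correlation (the kernel is not flipped),
    which is what deep learning frameworks usually call "convolution".
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the filter with the patch beneath it
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 image (assumption): dark on the left, bright on the right
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-picked 3x3 filter (assumption) that responds to vertical edges
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

feature_map = convolve2d_valid(image, kernel)
print(feature_map)
```

The resulting feature map is large where the filter overlaps the dark-to-bright boundary and zero in the uniform region, which is the sense in which a filter "detects" an edge.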
The main architecture is the same as a “regular” neural network, consisting of multiple stacked layers of inter-connected neurons (for a very nice review on the topic see the “Understanding Neural Network” series by Giles Strong). The main difference here is that the inputs are images characterized by a three-dimensional structure: height, width and depth (the pixel intensity in each color channel). The first layers in the structure learn to recognize simple patterns such as lines, edges or corners in the input image; intermediate layers might recognize more complex patterns, like an eye or a nose; finally, the later layers locate and recognize major objects in the image, like a human face or a dog.
There are a lot of recent and really exciting applications of CNNs. If you have recently felt like an artist by transforming your banal selfie into a piece of art, you have probably used Prisma, a photo-editing mobile app (https://www.facebook.com/getprisma/) which uses a CNN to render pictures in an artistic style. Or maybe you’ve heard about automatic lip reading of whole sentences (https://arxiv.org/pdf/1611.05358.pdf) or instant visual translation (https://research.googleblog.com/2015/07/how-google-translate-squeezes-deep.html). I could go on and on, but I have something else to tell you today. Just keep in mind that we are talking about a very powerful and widely applicable tool.
Given that, and remembering that we are physicists mainly interested in physics searches, we are wondering if there is any chance to use these promising methods for physics analyses. Obviously the answer is yes! And this is what I’d like to focus on today.
It turns out that CNNs are well suited to measurements that exploit a particular class of detectors, such as sampling calorimeters that use scintillators, liquid argon time projection chambers and water Cherenkov detectors. These kinds of detectors, widely used in high energy neutrino physics, allow us to record the amount of energy deposited in small regions throughout the volume of the detector. The result is a collection of images of the physics interactions that can be treated with CNNs.
A very interesting recent application of CNNs to neutrino event classification has been carried out by the NOvA collaboration and reported in this article. NOvA is a long baseline neutrino experiment. The Near Detector is located at Fermilab, where the neutrino beam (almost pure in νμ) is produced, and the Far Detector is about 800 km away near Ash River, Minnesota. The main goal of NOvA is to make precision measurements of neutrino oscillation parameters by looking at the disappearance of νμ and the consequent appearance of νe.
The oscillation phenomenon originates from the fact that we can describe neutrinos in two different bases, one given by the mass eigenstates and the other by the flavor eigenstates. What we conventionally call a neutrino is a state that is produced in a weak interaction, with a well defined flavor (e, μ, τ). We can switch from one description to the other via a mixing matrix, meaning that each flavor state is a superposition of mass eigenstates, and each of these evolves in time with its own phase (following the Schrödinger equation). This means that as neutrinos propagate through space, their flavor content changes as a function of the distance travelled, and we can observe a different composition in the neutrino beam reaching the Far Detector.
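To get a feeling for the size of the effect, here is a rough sketch based on the standard two-flavor approximation, in which the probability that a νμ is still a νμ after a distance L is 1 − sin²(2θ) · sin²(1.267 Δm² L / E), with Δm² in eV², L in km and E in GeV. This is a simplification of the full three-flavor picture, and the parameter values below (maximal mixing, Δm² = 2.5 × 10⁻³ eV², a 2 GeV neutrino) are illustrative assumptions of mine, not NOvA results.

```python
import math

def survival_probability(L_km, E_GeV, sin2_2theta=1.0, dm2_eV2=2.5e-3):
    """Two-flavor nu_mu survival probability (approximation).

    sin2_2theta and dm2_eV2 are illustrative defaults, not measured values.
    """
    # 1.267 collects the unit conversions for L in km, E in GeV, dm2 in eV^2
    phase = 1.267 * dm2_eV2 * L_km / E_GeV
    return 1.0 - sin2_2theta * math.sin(phase) ** 2

# At the source no oscillation has happened yet
p_near = survival_probability(L_km=0.0, E_GeV=2.0)

# Roughly the Far Detector distance quoted above, for an assumed 2 GeV neutrino
p_far = survival_probability(L_km=800.0, E_GeV=2.0)

print(f"survival probability at   0 km: {p_near:.3f}")
print(f"survival probability at 800 km: {p_far:.3f}")
```

With these assumed numbers most of the νμ have "disappeared" by the time the beam reaches the far site, which is why comparing the two detectors is so informative.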
Precision measurements of oscillation require an optimal reconstruction of the neutrino energy and flavor state. The flavor can be determined in charged-current (CC) interactions with the detector material, which produce a charged lepton in the final state that has the same flavor as the interacting neutrino, in addition to a hadronic component. Every kind of interaction produces a characteristic signature. The muon produced in a νμ interaction, for instance, leaves a long track, due to the low dE/dx typical of a minimum ionizing particle. The electron coming from a νe interaction instead produces a wide shower, rather than a track. Furthermore, neutrinos can interact via neutral-current (NC) interactions. In this case the final state lepton is a neutrino, which travels onwards undetected and carries away the information about the flavor state. Here the hadronic component can mimic the electron activity, so the entire process can be mistaken for a CC interaction. NC interactions are therefore treated as background for CC analyses.
So, let’s cut to the chase. The NOvA collaboration takes the detector readout of each interaction, treats it as an image (a pixel map), and feeds it to a state-of-the-art CNN that classifies the candidate neutrino event into one of the interaction types. The CNN is implemented using Caffe, an open source framework for deep learning applications, and is trained on millions of Far Detector events in several categories (depending on a further classification of the interaction).
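To illustrate what "pixel map" means in practice, here is a minimal sketch of how a list of detector hits could be turned into an image. The function name, the hit format (plane index, cell index, deposited energy) and the grid size are hypothetical choices of mine for illustration; this is not the collaboration's actual code or geometry.

```python
import numpy as np

def make_pixel_map(planes, cells, energies, n_planes=100, n_cells=80):
    """Bin hits into a 2D image: one pixel per (plane, cell), value = energy.

    n_planes and n_cells are placeholder dimensions (assumption).
    """
    pixel_map, _, _ = np.histogram2d(
        planes,
        cells,
        bins=(n_planes, n_cells),
        range=((0, n_planes), (0, n_cells)),
        weights=energies,  # sum the deposited energy falling in each pixel
    )
    # Scale to [0, 1] so that the brightest pixel has intensity 1
    peak = pixel_map.max()
    if peak > 0:
        pixel_map = pixel_map / peak
    return pixel_map

# Four made-up hits; the last two land in the same pixel
planes = np.array([10, 11, 12, 12])
cells = np.array([40, 40, 41, 41])
energies = np.array([1.0, 2.0, 0.5, 1.5])

pixel_map = make_pixel_map(planes, cells, energies)
print(pixel_map.shape, pixel_map[12, 41])
```

An array like this has exactly the grid-like structure a CNN expects, with deposited energy playing the role of pixel brightness.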
The output of the CNN, shown in the figure, can be interpreted as the probability that the input event falls into one of the main categories, surviving νμ CC or appearing νe CC. In both cases the CNN achieves an excellent separation. A natural way to evaluate its performance is to compare it with the sophisticated identification algorithms already in use by the collaboration. In the surviving νμ channel the CNN separates signal from background with an efficiency of 58%, a modest improvement with respect to previous results. The significant gain is observed in the νe appearance channel, where a classification efficiency of 49% considerably outperforms the 35% of the existing algorithms. Since the separation of the νe CC signal from the background is particularly hard and the measurement is statistically limited, this improvement is very significant.
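For readers wondering how a network output becomes a "probability" and then an efficiency, here is a small sketch. Classifiers of this kind typically end with a softmax layer, and I am assuming that convention here; the category names, raw scores and threshold are made-up values for illustration only.

```python
import numpy as np

def softmax(scores):
    """Turn raw network scores into per-category probabilities summing to 1."""
    # Subtracting the row maximum avoids overflow without changing the result
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def selection_efficiency(signal_probs, threshold):
    """Fraction of true signal events whose signal probability passes the cut."""
    return float(np.mean(np.asarray(signal_probs) > threshold))

# Hypothetical category labels and raw scores for two events
categories = ["numu_CC", "nue_CC", "NC"]
scores = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.4]])

probs = softmax(scores)
predicted = [categories[i] for i in probs.argmax(axis=1)]
print(predicted)

# Made-up signal probabilities for four true signal events
efficiency = selection_efficiency([0.9, 0.8, 0.3, 0.6], threshold=0.5)
print(efficiency)
```

Moving the threshold trades efficiency against background contamination, so a quoted efficiency only makes sense together with the selection it was optimized for.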
The amazing work of the NOvA collaboration shows that CNNs are a powerful approach to event classification and also work well with non-natural images like the readout of sampling calorimeters. The same approach could be exploited in the future by a wider range of detector technologies and analyses. Let me just say that I have noticed that the physics community has some reservations about the reliability of these methods when applied to physics analyses. In particular, it seems quite hard to obtain reliable results in collider physics. On the other hand, there is a lot of excitement around the topic, so I think it’s worth keeping an eye out!
[The featured image at the top was obtained with Prisma using the “Disco” filter. The original picture can be found here.]