In my last article (on the need for dimension reduction) I discussed the sparsity of data in high dimensions and the difficulty classifiers have in digging a signal out of the noise. There are, of course, numerous methods for transforming data into a space of smaller dimension.

Different methods can yield different transformations of the data, so choosing an adequate one is an important step. That is why it is crucial to understand at least the basic approaches and to learn their assumptions, strengths and weaknesses. So let’s focus on Independent Component Analysis (ICA) – a particular method that can be used for dimension reduction.

The classical and well-known method is Principal Component Analysis (PCA). I’m not going to debate it much here. In general, it tries to pick a possibly small subspace of the original, rotated space so that the maximum amount of variability is preserved. The resulting principal components, taken as basis vectors, are orthogonal, which in statistical language means uncorrelated. In general, however, uncorrelated does not mean independent. Here comes the main idea of ICA: among the uncorrelated components, choose the independent ones, in order to find the true underlying signals and to reduce the dimension even further.
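
A toy example makes the distinction concrete. Below is a minimal sketch (in Python with scikit-learn, on synthetic data, so purely illustrative): two independent, non-Gaussian sources are mixed linearly; PCA decorrelates the mixtures, but the resulting components are still dependent.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(10_000, 2))     # two independent, non-Gaussian (uniform) sources
X = S @ np.array([[2.0, 1.0],
                  [1.0, 1.0]]).T             # observations: linear mixtures of the sources

pcs = PCA(n_components=2).fit_transform(X)
print(np.corrcoef(pcs, rowvar=False)[0, 1])            # ~0: the PCs are uncorrelated
print(np.corrcoef(pcs[:, 0]**2, pcs[:, 1]**2)[0, 1])   # noticeably nonzero: not independent
```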

Let’s consider a vector X(t) of dimension p. X(t) contains our measurements in time, or just any set of independent observations of a phenomenon. We assume that the observed data arises as a linear combination of some underlying, independent, non-Gaussian signals, such as light sources or physical processes. This can be written as follows:

X(t) = A S(t),   i.e.   xi(t) = Σj aij sj(t)

The aim of ICA is now to estimate the mixing matrix A = (aij) and the source signals S = (si(t)) = (sij), given only the data X.
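
In practice A and S can be estimated, for example, with the FastICA algorithm. Here is a minimal sketch using scikit-learn’s FastICA on made-up signals (the sources and the mixing matrix below are purely illustrative, and the estimates are only defined up to the order and scale of the components):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t),                     # source 1: sinusoid
          np.sign(np.cos(3 * t)),            # source 2: square wave
          rng.laplace(size=t.size)]          # source 3: heavy-tailed noise
A = np.array([[1.0, 0.5, 0.3],
              [0.7, 1.2, 0.2],
              [0.4, 0.3, 1.0]])              # "unknown" mixing matrix
X = S @ A.T                                  # observed data: X(t) = A S(t)

ica = FastICA(n_components=3, random_state=0)
S_hat = ica.fit_transform(X)                 # estimate of the sources S
A_hat = ica.mixing_                          # estimate of the mixing matrix A
```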

The ICA assumptions seemed relevant enough to me to try the method on CMS data. Just imagine that the signals si were the underlying physical processes, like W, Z or Higgs boson production. A single observation would then be just a linear combination of these signals. It seems too perfect to be true, but why not try?

So I took some CMS data that is considered to consist almost entirely of irreducible QCD processes, together with MC-simulated data of double Higgs production. I trained a simple classifier (a CV decision tree) on the training set and tested its performance on the test set in three cases: without dimension reduction, after PCA and after ICA. My results are presented below as ROC curves.
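
Schematically, the comparison looked roughly like the sketch below. This is not my exact code: the data here is a synthetic stand-in for the 46 CMS variables with QCD-vs-HH labels, and the tree settings and component counts are only illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, FastICA
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: in my case X held the 46 input variables, y the QCD-vs-HH labels.
X, y = make_classification(n_samples=5000, n_features=46, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

pipelines = {
    "no reduction": make_pipeline(StandardScaler(), DecisionTreeClassifier(max_depth=5)),
    "PCA": make_pipeline(StandardScaler(), PCA(n_components=0.80),
                         DecisionTreeClassifier(max_depth=5)),
    "ICA": make_pipeline(StandardScaler(), FastICA(n_components=5, random_state=0),
                         DecisionTreeClassifier(max_depth=5)),
}

for name, pipe in pipelines.items():
    pipe.fit(X_train, y_train)
    scores = pipe.predict_proba(X_test)[:, 1]          # "signal" probability, used for the ROC
    print(f"{name}: AUC = {roc_auc_score(y_test, scores):.3f}")
```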

Figure: ROC curves for the three classifiers. The legend also gives the area under the curve (AUC) for each classifier.

From the figure above it is clear that there is almost no difference in performance between the classifiers. However, from the 46 variables of the initial data set, I obtained 16 principal components preserving 80% of the total variance, and only 5 independent components for ICA. That means the data can be reduced to just 5 dimensions while the classification ability is preserved!
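
For completeness, the number of retained principal components comes from the usual cumulative-explained-variance cut. Continuing the sketch above (X_train standing for the training part of the input variables):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

pca = PCA().fit(StandardScaler().fit_transform(X_train))
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_pcs = int(np.argmax(cum_var >= 0.80)) + 1   # smallest number of PCs covering 80% of the variance
```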

To tell you the truth, I expected the classifier based on ICA to outperform the others, because retrieving a clear signal of Higgs boson production should have boosted its accuracy. It seems that either such a signal hasn’t been found, or rather that the ICA assumptions were too strict and the observed data does not have this simple linear form.

I wonder what you physicists think about this concept.