by Greg Kotkowski

Modern statistical modelling seeks ever more flexible methods to describe a wide variety of random phenomena. The Gaussian distribution is heavily exploited thanks to its convenient properties, easy interpretation and simplicity. Real data, however, are often more complex, and fitting a normal distribution is insufficient in skewed or heavy-tailed settings. Hence, more sophisticated methods are of great importance.

On the other hand, more complex approaches bring new difficulties. Obtaining parameter estimates is often not as straightforward as for a simple model. However, thanks to the simultaneous development of computer technology and new algorithms, it is now feasible to employ complicated models.

In this article, the finite mixture model of Gaussian distributions is described. The modelling scheme remains simple, and parameter estimation is achievable in a fairly easy manner using the expectation-maximisation (EM) algorithm.

Let $\mathbf{Y}_j$ denote a $p$-dimensional random vector with probability density function $f(\mathbf{y}_j)$. Under a mixture model with $g$ components, the density of $\mathbf{Y}_j$ can be written as $f(\mathbf{y}_j)=\sum_{i=1}^g\pi_i f_i(\mathbf{y}_j)$,

where the component densities $f_i$ are Gaussian in our case and the mixing proportions $\pi_i$ are non-negative and sum to 1.
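As a minimal sketch of this definition, the density of a univariate Gaussian mixture can be evaluated directly as a weighted sum of component densities. The function name and the component parameters below are illustrative, not from any particular library:

```python
import numpy as np
from scipy.stats import norm

def mixture_pdf(y, weights, means, sds):
    """Density of a univariate Gaussian mixture: sum_i pi_i * N(y; mu_i, sigma_i^2)."""
    weights = np.asarray(weights, dtype=float)
    # The mixing proportions must be non-negative and sum to 1.
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return sum(w * norm.pdf(y, loc=m, scale=s)
               for w, m, s in zip(weights, means, sds))

# Example: a 2:1 mixture of N(0, 1) and N(3, 2^2), evaluated at y = 0.
density = mixture_pdf(0.0, weights=[2/3, 1/3], means=[0.0, 3.0], sds=[1.0, 2.0])
```

Each component contributes its own density, scaled by its mixing proportion, so the result is itself a valid probability density.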

Karl Pearson was the first to use mixture models for data analysis, in 1894. He studied the body length data of 1,000 crabs. Since it had been suggested that two subspecies of crabs were present in the sample, a mixture model was a natural choice for him.

Unfortunately for Pearson, parameter estimation was a huge effort in his day. He used the method of moments to estimate the five parameters of a heteroscedastic two-component mixture (two means, two variances and one mixing proportion). To obtain the solution he had to find the roots of a ninth-degree polynomial, carrying out all the calculations by hand with simple algebra.
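The theoretical side of Pearson's moment equations is easy to write down, because the $k$-th raw moment of a mixture is just the weighted sum of the component moments. A minimal sketch, using the standard recursion for the raw moments of a normal distribution (the function name is illustrative):

```python
# Raw moments E[Y^k] of Y ~ N(mu, sigma^2) satisfy the recursion
# E[Y^k] = mu * E[Y^(k-1)] + (k - 1) * sigma^2 * E[Y^(k-2)].
def normal_moments(mu, sigma, kmax):
    """Return [E[Y^0], E[Y^1], ..., E[Y^kmax]] for Y ~ N(mu, sigma^2)."""
    m = [1.0, float(mu)]
    for k in range(2, kmax + 1):
        m.append(mu * m[k - 1] + (k - 1) * sigma**2 * m[k - 2])
    return m

# Third raw moment of a 2:1 mixture of N(0, 1) and N(3, 2^2):
m1 = normal_moments(0.0, 1.0, 3)
m2 = normal_moments(3.0, 2.0, 3)
third = (2/3) * m1[3] + (1/3) * m2[3]
```

Equating the first five such expressions to the corresponding sample moments gives Pearson's system of equations; eliminating variables from that system is what led him to the ninth-degree polynomial.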

To illustrate the power of this method, consider the simple example of skewed data presented in the figure below. Modelling the data with a single normal distribution (black line in the figure) is very simple but not the best choice. The blue line represents a 2:1 mixture of two Gaussian distributions with different means and variances, and this modelling scheme is clearly more appealing for these data. In fact, in 1994 Priebe showed that for 10,000 observations from a log-normal density (which is known to be skewed), a mixture of only 30 normal distributions is enough to approximate the density.

Nowadays the parameter estimation is usually performed using the EM algorithm. It is an iterative procedure that seeks a maximum of the likelihood function (in general a local one) for data regarded as incomplete, since the component memberships of the observations are unknown. This method has been applied successfully in many fields of science.
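The EM iteration for a two-component univariate Gaussian mixture fits in a few lines: the E-step computes each observation's posterior probability of belonging to each component, and the M-step re-estimates the parameters by weighted maximum likelihood. Below is a minimal sketch with numpy only; the initialisation, the fixed iteration count and the simulated parameters are all illustrative choices, not a production implementation:

```python
import numpy as np

def em_gaussian_mixture(y, n_iter=200):
    """Fit a two-component univariate Gaussian mixture by EM.

    Returns mixing proportions, means and standard deviations.
    """
    y = np.asarray(y, dtype=float)
    # Crude initialisation: equal weights, quartiles as means, pooled spread.
    pi = np.array([0.5, 0.5])
    mu = np.array([np.percentile(y, 25), np.percentile(y, 75)])
    sd = np.array([y.std(), y.std()])
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from each component.
        dens = pi * np.exp(-0.5 * ((y[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates of all parameters.
        nk = resp.sum(axis=0)
        pi = nk / len(y)
        mu = (resp * y[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sd

# Simulate a 2:1 mixture of N(0, 1) and N(4, 0.5^2) and recover the parameters.
rng = np.random.default_rng(0)
z = rng.random(3000) < 2 / 3
y = np.where(z, rng.normal(0.0, 1.0, 3000), rng.normal(4.0, 0.5, 3000))
pi, mu, sd = em_gaussian_mixture(y)
```

Each iteration can only increase the likelihood, which is why EM converges to a stationary point; in practice one monitors the log-likelihood and uses several random restarts to guard against poor local maxima.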

In high-energy physics, finite mixture models have been applied, for example, by Mikael Kuusela (a member of our network), co-author of the paper “Semi-supervised detection of collective anomalies with an application in high energy particle physics”. The paper presents a semi-supervised approach to signal searches in which a labelled background sample and unlabelled experimental data are both modelled using finite mixture models.

For more detailed information concerning the EM algorithm or finite mixture models, I recommend the book by G. McLachlan and D. Peel, “Finite Mixture Models”. Although quite technical, it is a useful resource for studying the underlying concepts.