In a previous post I entertain with the data of my newborn daughter’s daily habits. After publishing the article I was asked if the number of barfing events per day was Poisson distributed. I want to do a small research about that. I’m also sorry if anybody is disguised by the subject, but the curious question cannot be left unanswered.

In my previous data I haven’t collected vomiting events because it has never been an issue with her. My daughter rather used to burp – only after being fed and in about 90% of the cases. However, such a data also has not been collected.

Let us focus now more on the statistical aspect of the analysis. I’m rather a frequentist, but given no proper data I cannot infer on the burping distribution. That is why I should perform the analysis in Bayesian style.

Our goal is to find a distribution of the number of burping events per day, π(θ). We know the distribution of daily burping events given the number of feedings. It is simply a Binomial distribution f(θ|n)=B(n, 0.9), as the probability of the single event is 90% after a feeding. The additional variable “feedings” in the data is a sample of the distribution of the number of daily feedings, π(n). Therefore, using Bayes’ Theorem we can write

π(θ)∝∫f(θ|n)·π(n)dn.

However, we don’t know the analytical density function of π(n), so we cannot calculate the posterior. We only possess a sample of the prior distribution consisting in 73 observations. Therefore, we can use a bootstrap simulation to get an estimate of the posterior. The procedure has a very simple idea and it might be hard to believe that it actually works. It is just a simple version of Monte Carlo simulation.

At first we pick some huge number T that is going to be the number of simulations. Each simulation step consists of:

  • Take ni as a single sample with replacement from the data of daily feedings.
  • Each observation ni is assumed to be from the prior distribution π(n).
  • For the observation ni, simulate another variable yi from the distribution B(ni, 0.9).
  • The histogram of variables yi is an estimate of the posterior density π(θ).

In figure 1 we can compare the posterior and prior distribution. The prior is a skewed distribution with mean 10.3 and variance 2.2. The posterior is more symmetrically distributed with mean 9.24 and variance 2.7.

Rplot_001
Figure 1. Comparison of prior and posterior distribution

Let us consequently find a distribution for the posterior. In figure 2, the kernel estimate of the posterior is drawn in blue. Certainly, the posterior is discrete, but continuous density functions are better for eye comparisons. In addition, you can see the kernel estimate of  a Poisson distribution with the parameter λ equal to the sample mean (9.24) and a normal distribution with maximum likelihood estimators of μ and σ2, hence N(9.24, 2.7).

Rplot_002
Figure 2. Comparison of the kernel estimator of the posterior distribution with a Poisson and a normal distribution.

The final comment is that the number of burping events per day is NOT Poisson distributed. The distribution of burping is highly under-dispersed. We have rather found that the posterior is approximately normally distributed (certainly after correction of discreteness – if Z is a normal random variable, then the posterior is the random variable Z rounded to its integer part).

Well, it is quite surprising for me. I expected it to be a bimodal, skewed or heavy tailed, not an uninformative Gaussian distribution.