In a previous post I entertain with the data of my newborn daughter’s daily habits. After publishing the article I was asked if the number of barfing events per day was Poisson distributed. I want to do a small research about that. I’m also sorry if anybody is disguised by the subject, but the curious question cannot be left unanswered.
In my previous data I haven’t collected vomiting events because it has never been an issue with her. My daughter rather used to burp – only after being fed and in about 90% of the cases. However, such a data also has not been collected.
Let us focus now more on the statistical aspect of the analysis. I’m rather a frequentist, but given no proper data I cannot infer on the burping distribution. That is why I should perform the analysis in Bayesian style.
Our goal is to find a distribution of the number of burping events per day, π(θ). We know the distribution of daily burping events given the number of feedings. It is simply a Binomial distribution f(θ|n)=B(n, 0.9), as the probability of the single event is 90% after a feeding. The additional variable “feedings” in the data is a sample of the distribution of the number of daily feedings, π(n). Therefore, using Bayes’ Theorem we can write
However, we don’t know the analytical density function of π(n), so we cannot calculate the posterior. We only possess a sample of the prior distribution consisting in 73 observations. Therefore, we can use a bootstrap simulation to get an estimate of the posterior. The procedure has a very simple idea and it might be hard to believe that it actually works. It is just a simple version of Monte Carlo simulation.
At first we pick some huge number T that is going to be the number of simulations. Each simulation step consists of:
- Take ni as a single sample with replacement from the data of daily feedings.
- Each observation ni is assumed to be from the prior distribution π(n).
- For the observation ni, simulate another variable yi from the distribution B(ni, 0.9).
- The histogram of variables yi is an estimate of the posterior density π(θ).
In figure 1 we can compare the posterior and prior distribution. The prior is a skewed distribution with mean 10.3 and variance 2.2. The posterior is more symmetrically distributed with mean 9.24 and variance 2.7.
Let us consequently find a distribution for the posterior. In figure 2, the kernel estimate of the posterior is drawn in blue. Certainly, the posterior is discrete, but continuous density functions are better for eye comparisons. In addition, you can see the kernel estimate of a Poisson distribution with the parameter λ equal to the sample mean (9.24) and a normal distribution with maximum likelihood estimators of μ and σ2, hence N(9.24, 2.7).
The final comment is that the number of burping events per day is NOT Poisson distributed. The distribution of burping is highly under-dispersed. We have rather found that the posterior is approximately normally distributed (certainly after correction of discreteness – if Z is a normal random variable, then the posterior is the random variable Z rounded to its integer part).
Well, it is quite surprising for me. I expected it to be a bimodal, skewed or heavy tailed, not an uninformative Gaussian distribution.
21 April 2016 at 5:49
You don’t have a posterior distribution at all since you didn’t observe any actual data. What you’ve generated is the “Prior predictive distribution”. https://en.wikipedia.org/wiki/Posterior_predictive_distribution#Prior_vs._posterior_predictive_distribution
It’s a useful idea for evaluating the reasonableness of a prior since sometimes it’s easier to think in terms of what you believe the data looks like rather than what you think the parameters are. With that said it’s not unusual to see things look fairly normal for these distributions. In this case you’re simulating from a binomial (I mean sure you’re changing the parameter for n each time but really it’s still a binomial)
21 April 2016 at 11:40
besides the useful comment of the reader above, I point you to some more discussion that has been going on concerning your analysis, in the “statistics” group at Reddit: https://www.reddit.com/r/statistics/comments/4fnozk/a_bayesian_analysis_of_the_burping_frequency_in/d2ahpej
21 April 2016 at 21:46
Dear Dason Kurkiewicz,
Thank you for your comment. Now I see that you are right. Definitely the distribution I called the posterior is the prior predictive. Shame on me. I’m just being introduced into Bayesian framework. I tried to use my not developed skills to solve the given problem. I probably should have used some uninformative prior and run MCMC.
Well, at least I’ve learnt something. Thanks again.