by Greg Kotkowski

Living abroad brings many new experiences and surprises. One is enjoying a country's distinct customs on holidays that are not celebrated back home. Another is discovering a day off on your calendar because of some national event you had never heard of. At other times, national holidays can cause trouble for cooperation between nodes in different countries.

I tried to capture our expectations about the timing of holidays in different countries in a data-driven manner. The analysis uses the 2018 national holiday data, available here, for 18 European countries (Austria, Belgium, Czech Republic, Denmark, France, Germany, Hungary, Italy, Luxembourg, Netherlands, Norway, Poland, Portugal, Romania, Spain, Sweden, Switzerland and the United Kingdom).

It is no surprise that December 25th and January 1st are public holidays in all the countries considered. However, December 26th, which most Europeans would also expect to be a day off, is not a holiday in Belgium or Portugal, nor in some parts of Switzerland and France.

A country's holiday calendar can be simple, as in the United Kingdom, where there are 6 national holidays on specific days, or complex, as in Switzerland, where each canton has its own calendar.

Each country was encoded as a vector with one entry per day of the year, where each entry takes one of three values:

• 1 – national holidays
• 0 – regular day
• 0.5 – free day only in some parts of the country
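The encoding above can be sketched in a few lines of Python. This is only an illustration: the function name `encode_calendar` and the toy dates are assumptions made here, not the data or code used for the analysis.

```python
from datetime import date


def encode_calendar(national, regional, year=2018):
    """Return one value per day of the year: 1, 0.5 or 0 (hypothetical helper)."""
    n_days = (date(year + 1, 1, 1) - date(year, 1, 1)).days
    vec = [0.0] * n_days                       # 0 = regular day
    for d in regional:
        vec[d.timetuple().tm_yday - 1] = 0.5   # free day only in some parts of the country
    for d in national:
        vec[d.timetuple().tm_yday - 1] = 1.0   # national holiday (overrides a regional entry)
    return vec


# Toy calendar with made-up entries, for illustration only.
toy = encode_calendar(
    national=[date(2018, 1, 1), date(2018, 12, 25)],
    regional=[date(2018, 12, 26)],
)
```

Stacking one such vector per country gives a table with 18 rows and 365 columns, which is the shape assumed in the rest of the analysis.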

The cosine distance was used to measure how dissimilar two countries' calendars are. To compare all countries, we need a table with the distance between each pair of countries, a so-called similarity matrix. The resulting matrix could be shown here, but printing a bunch of numbers and leaving the reader to deal with them would not be very informative. Instead, multidimensional scaling is used as a natural tool to visualise the data, with the similarity matrix as input.
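The distance computation described above can be sketched with NumPy as follows. The helper name `cosine_distance_matrix` and the three five-day calendars are invented for illustration, and the sketch assumes that every country has at least one holiday, so that no row has zero length.

```python
import numpy as np


def cosine_distance_matrix(X):
    """Pairwise cosine distance (1 - cosine similarity) between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / norms                          # assumes no all-zero row
    sim = np.clip(unit @ unit.T, -1.0, 1.0)   # guard against rounding error
    return 1.0 - sim


# Hypothetical five-day calendars for three made-up countries.
calendars = np.array([
    [1, 0, 1, 0, 1],    # country A
    [1, 0, 1, 0, 0.5],  # country B: last day is a holiday only in some regions
    [1, 1, 0, 0, 0],    # country C
])
dist = cosine_distance_matrix(calendars)
```

In this toy example A and B end up much closer to each other than either is to C, because they share most of their holidays.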

In essence, multidimensional scaling finds coordinates for the observations in a P-dimensional space such that the distances between points are preserved as well as possible. In particular, setting P=2 allows a simple visualisation with a 2-dimensional plot, as presented below.
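For readers who want to see the mechanics, here is a minimal sketch of the classical (Torgerson) variant of multidimensional scaling. The article does not say which variant or software was used, so treat this as one possible implementation under that assumption; `classical_mds` is a name chosen here for illustration.

```python
import numpy as np


def classical_mds(D, p=2):
    """Embed n points in p dimensions from an n x n distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:p]      # indices of the p largest eigenvalues
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Sanity check on points whose true layout is known: corners of a unit square.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
D_square = np.linalg.norm(square[:, None, :] - square[None, :, :], axis=-1)
coords = classical_mds(D_square, p=2)
D_recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

The recovered coordinates may be rotated or mirrored relative to the original square, since only the distances are preserved. For the same reason, the orientation of a plot produced this way carries no meaning of its own.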

The figure reveals relations hidden in the data. For example, a Norwegian would not be much surprised by the national calendar of Sweden or Denmark, but might find living in Belgium or Portugal strange. It is also interesting that 2 dimensions are already enough to show the separation between countries whose public holidays originate predominantly from the Catholic or the Protestant tradition (positioned closer to the bottom right and top left corners of the figure, respectively).

Given the data and the algorithm, I wanted to experiment with the stability of the results. For example, removing New Year's Day and December 25th from the data (holidays common to all the countries considered) gives a second figure, shown below. The results remain similar to the previous ones.

However, if August 15th is removed (the Assumption of Mary, a distinctly Catholic holiday), the results change completely into a cloud of points in which the former grouping is no longer present. I was surprised that a single day can have such a tremendous influence on the results.
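A toy example can make this sensitivity less mysterious. The four six-day calendars below are invented for illustration and are not the real data: day 0 is shared by everyone, day 1 is observed by only two of the countries, and each country has one holiday of its own.

```python
import numpy as np


def cosine_distances(X):
    """Pairwise cosine distance between the rows of X."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def drop_days(X, day_indices):
    """Remove the given day columns from every calendar."""
    return np.delete(X, day_indices, axis=1)


toy = np.array([
    [1, 1, 1, 0, 0, 0],  # country A, observes day 1
    [1, 1, 0, 1, 0, 0],  # country B, observes day 1
    [1, 0, 0, 0, 1, 0],  # country C
    [1, 0, 0, 0, 0, 1],  # country D
], dtype=float)

full = cosine_distances(toy)
without_common = cosine_distances(drop_days(toy, [0]))  # drop the shared day
without_split = cosine_distances(drop_days(toy, [1]))   # drop the group-defining day
```

Dropping the shared day leaves A and B closer to each other than to C or D. Dropping day 1 makes every pair of countries equally distant, so no grouping is left for the scaling step to display. The real calendars are far richer than this, but the same mechanism may be part of why one date matters so much.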

I have no particular conclusion to offer. This is a simple application of the well-known multidimensional scaling algorithm. I found it interesting to play with data I came across by chance, and I wanted to share it with you. Besides, it is always good practice to revisit methods learned a long time ago.