Statistics of Handicap Matches
Don Gugan, Bristol Croquet Club
Handicapping in croquet has a simple aim, to give each player the same chance of winning a handicap game, but whether this is achieved in practice is difficult to know. However, when there are groups of games which share some features, then one can apply statistics to test whether or not they are shared as expected. The results of matches between clubs allow one to test how well the handicap system is working, and to quantify the size of 'home advantage'. If handicapping were perfect one would expect the match scores, i.e. the number of wins and losses in each match, to follow a random binomial distribution, analogous to repeated tossing of a coin, but this is not the case, and a better analogy is with tossing a weighted coin, modelled by a skewed binomial distribution with unequal probabilities of heads and tails.
An ideal handicap system would make the winning ratio for individual games equal to 50%, and the distribution of match results should then follow the values of the successive terms in a binomial expansion, (a + b)^n, where a and b are the probabilities of two mutually exclusive events such that (a + b) = 1 (i.e. the win or the loss of a game, with a = b = 50% for correctly bisqued players), and where n is the number of independent trials (i.e. the number of games in the match). Croquet matches often comprise seven games, in which case the eight possible match results 7-0, 6-1, etc., should occur in the ratios 1:7:21:35:35:21:7:1, i.e. as the percentage probabilities shown in the upper part of Table 1, column 2, while the probabilities for 5-game matches are given in the lower part of the Table; the figures given in columns 3-5 are for various skewed binomial distributions met later, where the probabilities a ≠ b. This predicted distribution of match scores is independent of the scores in individual games. Even very close games, though important to the players, are immaterial to the match statistics because over a large enough number of games a team will on average have as many narrow wins as narrow losses.
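The binomial probabilities described above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `match_score_distribution` and its parameter `p_win` are introduced here and do not come from the original analysis.

```python
from math import comb

def match_score_distribution(n_games, p_win=0.5):
    """Probability of each match score, listed from n_games-0 down to 0-n_games.

    Each entry is one term of the binomial expansion (a + b)^n,
    with a = p_win and b = 1 - p_win.
    """
    q = 1.0 - p_win
    return [comb(n_games, k) * p_win**k * q**(n_games - k)
            for k in range(n_games, -1, -1)]

# Print the 1:1 (ideal handicapping) distributions for 7-game and 5-game matches.
for n in (7, 5):
    for wins, p in zip(range(n, -1, -1), match_score_distribution(n)):
        print(f"{wins}-{n - wins}: {100 * p:6.2f}%")
```

Passing a value of `p_win` other than 0.5 gives a skewed binomial distribution of the kind considered later.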
There are several reasons why the statistics of matches may differ from this 1:1 binomial distribution of coin tossing, e.g.:
and probably other reasons too.
If one tries to devise models beyond the simple 1:1 binomial distribution, there are many possibilities. The assumption of independence does not apply in general, and there is no reason to expect that the departures from the condition a = b = 50% should be the same for different team members, or for different teams at different times. The game probabilities still sum to unity, of course, and the individual games can be regarded as statistically independent, but the winning probabilities in them should be written as a1 and b1, etc. (which can include 'home advantage'), when the distribution of 7-game match results depends on the expansion of (a1 + b1) … (a7 + b7). This is not calculable without knowing the individual values of ai and bi, but a practicable way to make some estimate of the size of systematic departures from the 1:1 binomial is to assume that all the ai and bi are equal to some average values, α and β say, and to calculate the coefficients of a skewed binomial expansion with α ≠ β. Some of these issues are explored in what follows.
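To see how much the averaging assumption matters, the product (a1 + b1) … (a7 + b7) can be expanded numerically once values for the ai are chosen. The sketch below is illustrative only: the seven probabilities are hypothetical values invented for the example (chosen so that their mean is 4/7), not estimates taken from any match data.

```python
def match_distribution(win_probs):
    """Expand (a1 + b1)(a2 + b2)...(an + bn) with bi = 1 - ai.

    Returns a list dist where dist[k] is the probability of winning
    exactly k games, assuming the games are independent.
    """
    dist = [1.0]
    for a in win_probs:
        b = 1.0 - a
        new = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            new[k] += p * b      # this game lost: win count unchanged
            new[k + 1] += p * a  # this game won: win count goes up by one
        dist = new
    return dist

# Hypothetical, unequal game-winning probabilities (assumption for illustration).
unequal = [0.45, 0.50, 0.55, 0.60, 0.60, 0.65, 0.65]
alpha = sum(unequal) / len(unequal)          # average value, here 4/7
averaged = [alpha] * len(unequal)

for k, (p_u, p_a) in enumerate(zip(match_distribution(unequal),
                                   match_distribution(averaged))):
    print(f"{k} wins: unequal {100 * p_u:5.2f}%   averaged {100 * p_a:5.2f}%")
```

For spreads of this size the two columns differ only slightly, which is the sense in which replacing the individual ai by an average α is a workable approximation.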
A large amount of data is needed to apply meaningful statistical tests, but fortunately the results of matches since 2000 are given on the SW Federation web site for three leagues: the Federation league (handicaps <16), the Intermediate league (9-18), and the 'B' (originally for 'Beginners') league (>16). The results for 401 completed 5-game and 7-game matches (all containing one doubles match) are listed in Table 2. In addition, the results of 557 Longman Cup matches from 1989 onwards are listed on the CA web site, in 5-game format (with three doubles) initially, and then in 7-game format (one doubles) since 2000. The Longman matches are knockout matches (handicaps 3½ - 20, team total greater than or equal to 24) but as the home team is not recorded, the distribution of scores is only known in aggregate, e.g. the entry in Table 2 for the score 7-0 includes both results 7-0 and 0-7, etc.; these are predicted to be equal for the simple 1:1 binomial distribution, but can be seen from Table 1 to differ greatly for skewed binomial distributions.
In making statistical tests one compares the observed distribution of data with some likely theoretical model. The procedure is quantified by a well established statistical test, the X2 (chi-square) test, which relates the magnitude of a calculated quantity, X2, to the probability, F, that it could be expected to occur by random statistical fluctuations of the data. A very high value of the parameter X2 and a correspondingly very low probability that the observed distribution could have differed by chance from the assumed model is a powerful argument for rejecting the model. It is important to remember, however, that the opposite is not true; a value of X2 corresponding to a high probability does not validate the initial model. It provides reassurance that there is nothing inconsistent with it, but there may be other models which agree with the observations as well, or better. This is the limitation of all statistical tests, but nevertheless, they are very powerful for revealing inappropriate models. As an example, Table 3 shows the results for the 7-game Longman matches from Table 2 analysed using the X2 test. The total number of matches having each score (row 2) is compared with the model of the ideal 1:1 binomial distribution (row 3) calculated from the probabilities given in Table 1, and normalised to the total number of matches, 152. The numbers are clearly generally similar, decreasing for the extreme match scores, but the X2 test allows one to quantify how significant the similarity is. The values of a deviation parameter, x2 say, in row 4 are formed by taking the difference between the observed and the theoretical numbers in each column, squaring it, and then dividing by the theoretical number; one expects this value of x2 to be about unity as a result of random statistical fluctuations for each group we are comparing, and its actual value quantifies the deviations for that group. 
On adding together all the values of x2, we obtain the sum X2 = 7.2, the bottom right-hand entry in the Table, a value which characterises the overall agreement between the observed and the model distributions. The significance of this final number is that, dependent on the number of groups which we are comparing, one can find the probability that a particular value of X2 could be exceeded by random statistical fluctuations. In our case there are four groups, the four different match scores, but as the total number of matches is fixed (152), the numbers could have varied independently in only three of them, i.e. there are three degrees of freedom. Statistical tables show that a value of X2 = 7 for three degrees of freedom has a probability, F, of only about 7%; the odds are about 14 to 1 against the match data being consistent with the 1:1 binomial distribution of the model, i.e. that they conform to the assumption of equal probabilities of winning a game. This result hardly endorses the model, but neither is it so unlikely as to raise serious doubts about it.
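The arithmetic of the test can be sketched in Python using only the standard library. The observed counts of Table 3 are not reproduced here, so the example evaluates only the expected counts for 152 matches and the tail probability for the quoted X2 = 7.2. The function names are introduced for illustration, and the closed-form tail formula is valid for three degrees of freedom only.

```python
from math import erfc, exp, pi, sqrt

def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_tail_3dof(x):
    """Probability that X2 exceeds x by chance, for 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

# Folded 1:1 binomial weights out of 128 for the scores 7-0, 6-1, 5-2, 4-3,
# each counted in either direction because the home team is unknown.
weights = [2, 14, 42, 70]
expected = [152 * w / 128 for w in weights]
print("expected counts:", expected)

# Tail probability F for the value X2 = 7.2 quoted in the text.
print(f"F = {100 * chi2_tail_3dof(7.2):.1f}%")
```

Given the observed counts, `chi_square(observed, expected)` returns the statistic itself. A library routine such as `scipy.stats.chi2.sf` could replace the closed-form tail function for other numbers of degrees of freedom.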
Similar analyses have been made of the other sets of match data given in Table 2, and the results are shown in Table 4. None of the sets of data conforms well with the 1:1 binomial distribution; all are odds against, varying from 14:1 to millions to one against. In statistical tests it is always desirable to have large numbers in each group in order to reduce the effect of fluctuations in the data, and for the South West Federation data this analysis combines the groups predicted to contain few members (e.g. the scores 6-1 and 7-0 for the 7-game matches), with a resultant reduction in the number of degrees of freedom as shown in the table. One can go further and combine the whole of the SWF data, but while this leads to different values of X2, it does not alter the conclusion that most of the data are strongly inconsistent with a 1:1 binomial distribution, and that the overall match statistics require us to reject it as an appropriate model. The inconsistency appears to arise from an excess of extreme match scores, but examining these in detail would mean reviewing the results of individual clubs in separate competitions in different years, and while the data exist, statistics cannot help, even if the exercise had any point. We are forced to conclude that the chances of winning individual games are for some reason not equal, that the results do not follow a 1:1 binomial distribution, and that we should consider skewed binomials with the probabilities α ≠ β ≠ 50%.
The SW Federation data allow one to find whether there is any home advantage in handicap matches. The data can be arranged in four groups: (5-game/7-game) matches and home (wins/losses), as in Table 5. If there were no systematic departures from equal chances of winning individual games, the wins and losses for the middle two rows would be the same, and just half the totals in the right-hand column. This is evidently not the case, and the departures from equality give X2 = 29 for two degrees of freedom, with a likelihood of occurring by chance, F, of less than one in a million! The raw data from Table 5 give an observed home match winning ratio of 254/401 = 63%, and if one assumes that there is a home advantage shared equally by all the members of a team, then it is possible to recalculate the value of X2 by comparison with a skewed binomial. It can be seen from Table 1 that the winning advantage depends on the amount of skewness, and also that it differs slightly for 7-game and 5-game matches, but a skewness ratio of 4:3 gives winning probabilities for 7-game and 5-game matches of 65.31% and 63.21% respectively, close to the average value observed, and scaling these to the totals for each type of match in column four gives the numbers shown in italics in Table 5, which lead to values of X2 = 1.95, and of F = 38%. This huge reduction in the discrepancy of fit gives strong support to the idea that match statistics follow a skewed distribution, and that a home match winning probability of about 64%, i.e. a ratio of wins to losses of about 1.8, must be allowed for in the statistical analysis (cf. below). The size of the home advantage is perhaps unexpectedly large, but is not inherently unlikely. It conforms almost optimally with an individual game winning advantage of 4:3, or about 57.1%.
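The link between a game-winning ratio of 4:3 and the match-winning probabilities quoted above can be checked directly, as can the two values of F for two degrees of freedom. The following is a minimal sketch. The function names are illustrative, and the tail formula exp(-X2/2) holds only for two degrees of freedom.

```python
from math import comb, exp

def match_win_probability(n_games, p_game):
    """Probability of winning a majority of n_games independent games."""
    need = n_games // 2 + 1
    return sum(comb(n_games, k) * p_game**k * (1 - p_game)**(n_games - k)
               for k in range(need, n_games + 1))

def chi2_tail_2dof(x):
    """Probability that X2 exceeds x by chance, for 2 degrees of freedom."""
    return exp(-x / 2)

p_game = 4 / 7  # game-winning probability for a 4:3 ratio, about 57.1%
for n in (7, 5):
    print(f"{n}-game match: {100 * match_win_probability(n, p_game):.2f}%")

print(f"F for X2 = 29:   {chi2_tail_2dof(29.0):.1e}")
print(f"F for X2 = 1.95: {100 * chi2_tail_2dof(1.95):.0f}%")
```

With a game-winning probability of exactly 50% the same function returns 50% for both match formats, which is the no-home-advantage case.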
To translate a difference of winning probability into bisques is not straightforward, but Louis Nel has discussed such matters recently on the Oxford web site, and it appears that winning odds of 2:1, i.e. a winning probability of 66.7%, correspond to a difference of 150 grade points, irrespective of handicap level, while, for instance, the difference between handicaps 5 and 6 is equivalent to only 27 grade points (a winning probability of 53%), with steps between higher handicaps having successively smaller differences of grade points and winning probability. Using these figures, the 'home advantage' suggested by the match results analysed above is equivalent to about two bisques for each player.
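The two figures quoted from Nel are both reproduced if one assumes that the winning odds double for every 150 grade points. That exponential form is an assumption made here for illustration, since Nel's actual formula is not given in this article. Under that assumption, a sketch of the conversion in both directions is:

```python
from math import log2

def win_probability(grade_diff, points_per_doubling=150.0):
    """Winning probability for a given grade-point advantage.

    ASSUMPTION: odds of winning double for every `points_per_doubling`
    grade points. This matches the two figures quoted in the text but is
    not taken from Nel's own work.
    """
    odds = 2.0 ** (grade_diff / points_per_doubling)
    return odds / (1.0 + odds)

def grade_difference(p_win, points_per_doubling=150.0):
    """Inverse of win_probability: grade points implied by p_win."""
    return points_per_doubling * log2(p_win / (1.0 - p_win))

print(f"150 points -> {100 * win_probability(150):.1f}%")
print(f" 27 points -> {100 * win_probability(27):.1f}%")
print(f"4:3 game odds -> {grade_difference(4 / 7):.0f} grade points")
```

On this assumed scale a 4:3 game advantage comes out at a little over twice the 27 points separating handicaps 5 and 6, in line with the estimate of about two bisques per player.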
Whether this value of 1.8 for 'home advantage' is typical of other times and places is not known. It could in principle be extracted from the full records of the Longman Cup matches, though not without a great deal of effort to find the home teams, details not given on the CA web site, but since the SW Federation contains at least as wide a range of sizes and types of club as do other parts of the country, there seems no reason to doubt that it is a fair average value over a typical spectrum of players and clubs. If one accepts that it is a realistic value for the match winning advantage, and makes the assumption that each player in a team is affected by the same amount, the winning chance in individual games must be very close to 4:3, or 57.1%, and rather than comparing the match results with a 1:1 binomial model as in Table 4, one should instead use a 4:3 binomial model, which then gives the results in Table 6 below.
On comparing Tables 4 and 6, it is clear that the 4:3 skewed binomial distribution is a much better model for the match data. Most of the sets of data conform well to it, which suggests that the asymmetric distribution of match scores shown in Table 4 can reasonably be explained entirely on the basis of a 'home advantage'. There are still large discrepancies, however, for the 'B' league data and also, surprisingly, for the Longman 5-game data. The 'B' league data fit considerably better to a model where, after taking account of home advantage in the ratio of wins to losses, the actual match scores are then assumed to be random, i.e. to fall equally amongst the different scores: the values of X2 and F for this model are 13 and 7%, and 11 and 6% for the 7-game and 5-game matches respectively, values which are in themselves unexceptionable, but only on the basis of an ad hoc model which takes no account of the statistical realities for independent matches. Probably all one can conclude is that the 'B' league data cannot be regarded as a reliable test of the handicap system, which will probably not surprise those who have witnessed what can happen in games between inexperienced and often erratic high-bisquers.
The very high value of X2 = 28 for the Longman 5-game data is a more difficult problem since the data set is large and, unlike the 'B' league data, believed to be reliable, and also since the Longman 7-game data conform well with a credible model. The handicap range involved and the nature of the competition guarantee that, unlike the 'B' league, these are nearly always experienced competitors, but nevertheless, there are twice as many extreme match scores (5-0 and 0-5) as one should expect. One should, however, still expect the data to conform to a binomial distribution, and a 2:1 binomial proves to be very close to an optimal fit, with X2 = 3.1 (two degrees of freedom) and F = 22%. The Longman Cup matches changed from 5-game to 7-game format in 2000, with the result that the part played by doubles was reduced from 60% of the match to only 14%, and since doubles matches depend much more than singles on tactics and psychology, one may wonder whether the greater doubles component in the 5-game matches had something to do with the striking difference between the values of X2 for the two formats, but whether it could credibly increase the average winning probability for a game from 57%, as deduced for 7-game matches, to the 67% implied by the 2:1 binomial model is problematic. Unless it proves possible to identify the home teams for the 5-game Longman matches, it is pointless to speculate further about the origin of their anomalous value of X2, but whatever its origin, its size is unexpectedly large, and if translated into bisques using Nel's figures as before, amounts to about six bisques per player. One must conclude that there was something seriously wrong with the handicap system for the Longman Cup matches prior to 2000, and also be relieved that the present 7-game format appears to be working fairly.
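Because the home team is not recorded for the Longman matches, the model has to be compared with scores counted in either direction. A short sketch of such a 'folded' distribution shows how the expected share of 5-0 results grows with the game-winning ratio. The function name is illustrative, and the code assumes an odd number of games so that no match can be tied.

```python
from math import comb

def folded_distribution(n_games, p_game):
    """Probability of each score when the winner's identity is ignored.

    Returns a dict such as {"5-0": ..., "4-1": ..., "3-2": ...} in which
    "5-0" combines the results 5-0 and 0-5. Assumes n_games is odd.
    """
    if n_games % 2 == 0:
        raise ValueError("n_games must be odd")
    q = 1 - p_game
    pmf = [comb(n_games, k) * p_game**k * q**(n_games - k)
           for k in range(n_games + 1)]
    return {f"{n_games - k}-{k}": pmf[k] + pmf[n_games - k]
            for k in range((n_games + 1) // 2)}

# Compare the 1:1, 4:3 and 2:1 models for 5-game matches.
for label, p in (("1:1", 1 / 2), ("4:3", 4 / 7), ("2:1", 2 / 3)):
    shares = folded_distribution(5, p)
    print(label, {score: f"{100 * v:.1f}%" for score, v in shares.items()})
```

On these figures the 2:1 model predicts nearly twice the share of 5-0 results that the 4:3 model does, consistent with the excess of extreme scores noted above.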
Bearing in mind the proviso made above that statistical tests cannot provide certain confirmation of an assumed model, but only show that it may disagree with observation by amounts so large that it is very unlikely (or even inconceivable) that the disagreement could arise from random statistical fluctuations, the analysis of all the handicap match data conveniently available leads to the following conclusions:
Document received 29th September 2004
The results of 35 Longman Cup matches for 2004 are now available on the CA website, though still without identifying the home teams, unfortunately. The value of F for the 1:1 or 'even chances' distribution of match scores is 1.3%, which increases to 8.7% if one compares the data with the 4:3 'home advantage' skewed distribution; not a very convincing agreement with statistical expectations. However, one team had several extreme match scores, wins of 7-0, and 6-1 (twice), and if one excludes all their matches from the analysis, the remaining 30 matches yield a value of F = 43%.
Postscript received 1st November 2004
All rights reserved © 2004