
Dr Ian Plummer


Bayesian Ranking for Croquet: Discussion

Discussions on Louis Nel's Bayesian Ranking paper:

Sam Tudor wrote:

One thing that I don't understand about this system is the fact that it assumes that my chances of playing above my mean standard are improved by absence from competition. Does anybody have any thoughts on why this might be true?

Louis Nel responds:

An increase in Standard Deviation flattens the Rating Curve. So the curve will be lower near the mean and higher far from the mean. I suppose Sam's question is with reference to the part of the Rating Curve far enough from the mean to have gone up. For definiteness, let us suppose it involves the Performance Interval within 10 points of Mean + 55.

The paradoxical impression that the player somehow seems stronger comes from looking at that interval in isolation. It ignores the fact that the player is also deemed to have a similarly higher probability of performing well below the Mean, e.g. in the interval within 10 points of Mean - 55, and, perhaps more to the point, the fact that the player is deemed to have a LOWER probability of performing in intervals close to the Mean, e.g. within 10 points of the Mean.

On the whole, when a player's Standard Deviation is increased by 20 points (say), the system deems the player's win probability against a given opponent to go down. This is illustrated by the following numerical example.


Player A with data 2386 61 (Mean and Standard Deviation)
Player B with data 2054 71
Bayesian Win Probability = 0.813

Now consider the situation when the Temporal Update increases the Standard Deviation of Player A by 20. Then we have

Player A with data 2386 81
Player B with data 2054 71
Bayesian Win Probability = 0.811

Thus the BWP decreased from 0.813 to 0.811.
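The article does not spell out how the Bayesian Win Probability is computed, but the direction of this effect can be reproduced with a sketch: assume a CGS-style expected-score function 1/(1 + 10^((b-a)/500)) (the 500-point scale is an assumption here) and average it over performances drawn from both players' Rating Curves. A minimal Monte Carlo version:

```python
import random

def win_prob(mean_a, sd_a, mean_b, sd_b, n=100_000, seed=1):
    """Monte Carlo estimate of P(A beats B): draw each player's
    performance from their Rating Curve and average a CGS-style
    expected-score function (the 500-point scale is an assumption)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mean_a, sd_a)   # A's performance this game
        y = rng.gauss(mean_b, sd_b)   # B's performance this game
        total += 1.0 / (1.0 + 10 ** ((y - x) / 500))
    return total / n

p1 = win_prob(2386, 61, 2054, 71)  # A's original Standard Deviation
p2 = win_prob(2386, 81, 2054, 71)  # after the Temporal Update (+20)
print(p1 > p2)  # widening A's curve lowers the estimate
```

Under these assumptions the estimate for the first pair of ratings comes out a little above 0.8, and widening Player A's curve nudges it down, matching the direction of the worked example above.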

With regard to Chris Dent's question about the post-game reduction in Standard Deviation even in the case of upsets, I have the following remarks.

The post-game reduction of Standard Deviation comes as an automatic consequence of the application of Bayes' Rule. The system is not programmed to handle upsets differently from expected results. In general, every game result gives more information and this should in general reduce uncertainty about the player's true strength. This is consistent with general post-game reduction of the Standard Deviation.
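The automatic shrinkage can be seen in the simplest conjugate setting. This is only an analogue: the real update conditions on a win/loss result rather than a directly observed Gaussian performance, but the mechanism is the same, since the posterior variance depends only on the prior and observation variances, not on the observed value.

```python
import math

def gaussian_update(prior_mean, prior_sd, obs, obs_sd):
    """Conjugate normal-normal Bayes update (a simplified analogue of
    the rating update; the real system conditions on win/loss, not on
    a Gaussian observation).  The posterior variance
    1/(1/sigma^2 + 1/tau^2) is smaller than the prior variance no
    matter what value was observed, upset or expected result alike."""
    prior_prec = 1.0 / prior_sd ** 2
    obs_prec = 1.0 / obs_sd ** 2
    post_var = 1.0 / (prior_prec + obs_prec)
    post_mean = post_var * (prior_prec * prior_mean + obs_prec * obs)
    return post_mean, math.sqrt(post_var)

# An "expected" performance and an upset-sized surprise shrink the
# Standard Deviation by exactly the same amount:
m1, s1 = gaussian_update(2386, 61, 2400, 150)
m2, s2 = gaussian_update(2386, 61, 2100, 150)
print(s1 == s2, s1 < 61)  # True True
```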

Louis 29.xi.6

Chris Dent wrote:

Could you confirm please whether the player's performance curve represents the uncertainty in our knowledge of his/her true ability, or whether it is a probability distribution for his/her performance in a particular game. I think that Bayesian statistics implies the former interpretation, but by my reading the following statements in your oxfordcroquet.com article are ambiguous on this point:

"A player will, with probability 0.68, perform at a level that lies within Standard Deviation points of his Mean"

"The area below a Rating Curve and vertically above the Performance Interval represents the probability that the players' performance will be somewhere in that interval."

Louis Nel responds:

I'm glad to confirm that the second sentence you quoted:

"The area below a Rating Curve and vertically above the Performance Interval represents the probability that the players' performance will be somewhere in that interval"

puts you on track to understand what the Rating Curve means. The first quoted sentence

"A player will, with probability 0.68, perform at a level that lies within Standard Deviation points of his Mean"

follows from the second as a special case (at least for Normal Probability Density curves).

Think first about the Mean of the curve (i.e. the number listed as "BGrade" on www.croquetrecords.com). Suppose it is 2300. Now think of the Standard Deviation. Suppose it is 60. Then think of the set of points within Standard Deviation points of the Mean, i.e. within 60 points of 2300, i.e. the points X that satisfy 2240 <= X <= 2360, i.e. the points in the interval [2240,2360]. Imagine that the points in the plane that lie vertically above these X and below the curve are painted orange. Then those orange points must cover an area of 0.68, and therefore this is the probability that the player is performing at level X for some X in the mentioned interval. (See the sketch in the "Oxford Croquet" article, where this area is really coloured orange.)

That the "orange area", i.e. the region below the curve and vertically above the interval

[Mean - Standard Deviation, Mean + Standard Deviation],

must equal 0.68 is a property of all Normal Probability Density curves.
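The 0.68 figure is simply the standard normal mass within one Standard Deviation of the Mean, which can be checked directly with the error function; the 2300/60 numbers below are the ones from the example above.

```python
import math

def normal_cdf(x, mu, sd):
    """Cumulative distribution of a Normal curve, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2))))

# Mass within one Standard Deviation of the Mean: the "orange area"
# above [2240, 2360] for a curve with Mean 2300 and SD 60.  The answer,
# erf(1/sqrt(2)) ~ 0.6827, does not depend on the particular Mean or SD.
orange_area = normal_cdf(2360, 2300, 60) - normal_cdf(2240, 2300, 60)
print(round(orange_area, 4))  # 0.6827
```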

So the Rating Curve implicitly tells us what the uncertainty of the player's performance level is. It tells us via the Standard Deviation. The Standard Deviation relates to the shape of the curve. A Rating Curve with a high narrow peak has a small Standard Deviation. A flatter curve has a larger Standard Deviation. The Standard Deviation of any known probability density function can be computed in terms of an integral.

I hope this helps.

Louis  30.xi.6

Chris Dent wrote:

There are two candidates for the correct interpretation of the performance distribution from the Bayesian system:

1. it encodes our knowledge of and uncertainty about a player's true underlying ability, this being a single precise (but unknown) number.

2. it gives the probability distribution of a player's ability in a particular game.

My understanding of Bayesian stats points towards the former - could you confirm whether or not this is the case.

Louis responds:

I agree that (1) works, but feel a little worried by your emphatic words "this being a single precise (but unknown) number".

I believe a player's performance level is always continuously varying. For the top players the variation may be slight, but it is still there. For lesser players it is often visible during a tournament, during a match and even during a single game. The Bayesian rating accommodates this view very well. The Rating Curve provides an indication of the bounds between which the performance level varies, with the given likelihoods expressed by the curve. So, if you mean in (1) that at every instant it is a "single precise number" which may be a different "single precise number" from one moment to the next, then we are in agreement. But not if you thought it is a "single precise number" which remains constant, at least for a while.

My first email gave an example of the significance of this point. The property of the Bayesian update that the Standard Deviation always decreases is clearly consistent with (1): additional information will decrease the uncertainty in our knowledge of a player's true underlying ability. However, it is clearly possible for a player to become more erratic, so the property that the Standard Deviation always decreases does not sit quite comfortably with (2).

Your observations have increased my awareness that the update algorithm of Bayesian Ranking always decreases the Standard Deviation and in particular that it ignores the possibility that upset results could be a source of uncertainty. So much so that I have embarked on a series of experiments to determine if the performance of Bayesian Ranking could be improved by programming in an increase in Standard Deviation as a result of upset results. This will require patience, because one has to consider all relevant angles and my program runs rather slowly.

Could you confirm whether the following understanding of how the system works is correct please: if the level of a player's performance in a given match is a random sample from some underlying probability distribution, then given enough games in the system the player's Mean/Standard Deviation in the Bayesian ranking system eventually converges to the Mean/Standard Deviation of the underlying distribution.

I also view this as a convergence process in a vague sense. One can regard the Rating Curve as an approximation of the player's ability. If the latter remained absolutely constant, the approximation would become better up to a point. Randomness (in the guise of upset results) will disallow real convergence even in this idealised case. But the ability is never really constant...

Hence, while changes in a player's true underlying distribution won't necessarily be picked up during a tournament due to the nature of the Bayesian update, the increase in Standard Deviation between events allows a 'fresh start' for convergence towards a different underlying distribution (perhaps with a different width) of the player's performances. As a result, a player's Mean and Standard Deviation at the end of an event provide a robust estimate of his/her underlying performance distribution at that time.

As you know, even in mathematical analysis, the limit of a sequence (e.g. 1/1, 1/2, 1/3, ..., 1/n, ...) is often not reached, but merely exists as an idealised outcome. So also the convergence process under discussion here. But it is worse, because even the "sequence" that is trying to converge, is itself forever changing...
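The idealised case, where the underlying ability really is constant, can be sketched numerically. This toy model (an assumption, not the real win/loss update) treats each game's performance level as a directly observed draw from the underlying distribution and folds it in with a conjugate normal update; the rating Mean then homes in on the true Mean while the Standard Deviation shrinks.

```python
import math
import random

def track_rating(true_mean, true_sd, n_games, start=(2300.0, 350.0), seed=2):
    """Idealised convergence experiment (not the real win/loss update):
    each game's performance is an i.i.d. draw from the player's
    underlying N(true_mean, true_sd), observed directly, and folded in
    with a conjugate normal-normal Bayes update."""
    rng = random.Random(seed)
    mean, sd = start
    for _ in range(n_games):
        perf = rng.gauss(true_mean, true_sd)
        prec = 1 / sd ** 2 + 1 / true_sd ** 2
        mean = (mean / sd ** 2 + perf / true_sd ** 2) / prec
        sd = math.sqrt(1 / prec)
    return mean, sd

mean, sd = track_rating(2450, 70, 200)
print(round(mean), round(sd, 1))  # Mean near 2450, SD shrunk well below 70
```

In line with Louis's caveat, the limit is never actually reached, and if the true ability drifts the target of the convergence drifts with it.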

Louis 30.xi.6

Ian Burridge wrote:

Having just read the details of this system I think it is fair to say that I am still coming to terms with it! Can anyone who understands it try to verbalise why the following players are those that have moved up and down most in terms of absolute position in the list (Bayesian Grade v CGS Grade):-

 Chambers +32 places
 Bassett +28
 Newcombe +24
 Toye +21
 Jackson +20
 Cumming +20
 Birch +15
 Bidencope +15
 Comish -24
 Cordingley -23
 Burridge -21 (no idea why the question came to mind!)
 Dickson -20
 Hardy -18
 Cunningham -16

I cannot see any obvious link. There does seem to be some correlation (I haven’t done any tests) with the players in the + list generally having an index higher than their grade and those in - list having an index lower than their grade. Given that I thought it was generally felt that the current ranking system was too volatile I was slightly surprised that there was evidence of a link along these lines.

Louis responds:

I agree with your suggestion that players who have a large difference between Grade and Index are natural candidates for having a large difference between their Bayesian Ranking and their CGS ranking. It is interesting to see that about half of them went up while the other half went down. Movement in both directions occurs both where the Index was higher than the Grade and in the opposite case. Those that went down this time will likely go up on another occasion.

In 7 of your cases the absolute difference between Grade and Idx was 48 or more; the highest was 119. In all but two cases the difference was substantial.

The only two cases for which this difference was small (Hardy and Bassett) were also pushed in opposite directions, and I have not yet figured out a reasonable explanation why they were treated so differently by the two systems.

It is really no surprise that a large abs(Gr - Idx) will be handled differently by the two systems. Bayesian Ranking puts greater emphasis on how recent the performance is while CGS, by its very design - the lagging effect of the Grade - tends to dwell more on not so recent performance.

Examination of the top of the list, where it is obviously harder to generate such large moves, shows the only change in the ordering of the top 6 being Westerby moving up 2 places from 5 to 3; he also happens to have the biggest Grade/Index difference of any of the top 6.

From what I can gather from the article (which I admit to not fully understanding) there appear to be two types of Bayesian Grade (Recent and Period). I assume that my observations are due to the list now available on the web-site being the "recent" list, and as such positional movements correlate with grade/index differences, as this is an indicator of recent form. I have to ask, though, why, after years of the general debate focussing on the over-volatility of the Croquet Grading System (CGS), it has now been decided to publish something apparently even more volatile. Is it actually the intention to propose this list as a successor to CGS?

I don't think it is fair to characterize Bayesian Ranking as "even more volatile" than CGS. Its volatility has a rather different profile. My impression is that BR is significantly less volatile than CGS near the top, but becomes more volatile lower down. The latter is a consequence of the large Standard Deviation with which players are started. This has the desirable effect that newcomers reach relatively stable performance data more quickly, so it contributes to the overall efficiency of the system.

This beneficial side-effect of Bayesian Ranking was not deliberately designed, but it is not purely serendipitous either. In my view it is futile to design a ranking system specifically to have a predetermined kind of volatility. A system should be designed to have the greatest achievable efficiency with respect to the kind of player performance it wants to measure. Success in that endeavour will automatically ensure appropriate volatility. I believe Bayesian Ranking ended up having appropriate volatility.

Louis Nel 1.xii.6

Ian Burridge wrote:

Looking at my specific case I do agree with the figures you use as your illustration. Going back to 1997 and using a similar approach for my decline that year:

2317 (-108)   2283 (-172)   2317 (-103)
2291 (-26)    2209 (-74)    2274 (-41)
The high point occurs at game 10; I have taken an interim trough that occurred after 60 games as well as the final figure, which was the low point for the year. The 50-game interval between the first two sets of figures is comparable to the 51 games in the trough-to-peak interval of your example. I observe that, unlike in the rising case of your example, the BGrade has followed the Grade more closely, which is the effect that I observed and commented on. I cannot draw any conclusions from this, but if I had to speculate I would say that the BGrade presumably regards the more rapid increase in Index (+319 points in 51 games) with more scepticism than a loss of 172 points in 50 games. Is this correct?

Louis responded:

I take it you are still looking for a pattern in the phenomenon that confronts you: in 2006 Bayesian Ranking appeared to hold back when CGS showed a vigorous climb, but in 1997, when CGS showed a long downward slide, Bayesian Ranking closely echoed that slide. You want to know why it did not stay close both times.

I looked at the details of the two situations and while there are similarities, there is also a notable difference. In 1997 the ever explosive Index more or less cancelled every big upward (or downward) spurt with a corresponding downward (or upward) spurt. So it kept on "returning" to near the Bayesian Ranking, sometimes even overshooting to a point below BR. That is the kind of slide (upward or downward) which will cause Bayesian Ranking and CGS to stay close. In 2006, by contrast, your plot shows more than one upward spurt which was not cancelled by a corresponding downward spurt soon thereafter. Take, for example, your match with Fulford. You went WLL and it gave you a net Index gain of 49 points (in a class 1 match the Index goes into overdrive ...), not followed by a downward spurt. In the Opens (also class 1), where you went 7-3 in the block, you had another upward spurt of 81 Index points which was not followed by a downward spurt. This is the kind of upward (or downward) trend that will cause CGS to deviate far from Bayesian Ranking.

You may now ask: why does it deviate far? Well, for players with smallish Standard Deviation (around 70, say), Bayesian Ranking behaves locally much like an Index20 system (i.e. a system defined by an Index with step size 20). So where you gained 49 Index points in CGS for going WLL against Rob, you gained only 14 points in BR. Keep repeating this kind of occurrence and you can see how the systems will grow wide apart. The only thing that prevents them from going hundreds of points apart is that the large Index gains in CGS get cancelled by correspondingly large losses. But when somebody is on a long winning streak (or losing streak) the cancellation does not happen ... When you look at the Idx30 ranking list you find that the top players there also have noticeably lower ratings than in CGS. The same reason applies there.
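The Index-style behaviour described here can be sketched as a step-k update. The expected-score formula and 500-point scale are assumptions modelled on the CGS convention, and the 2600 opponent rating below is purely illustrative (the document does not give Fulford's rating).

```python
def index_update(rating, opp_rating, won, step=20, scale=500):
    """One Index-style step-k update: move the rating by
    step * (actual - expected), where the expected score is a
    CGS-style 1/(1 + 10^((opp - you)/scale)) (scale 500 assumed)."""
    expected = 1.0 / (1.0 + 10 ** ((opp_rating - rating) / scale))
    return rating + step * ((1.0 if won else 0.0) - expected)

# WLL against a (hypothetical) much stronger opponent, step size 20:
r = 2311.0
for result in (True, False, False):
    r = index_update(r, 2600.0, result)
net = r - 2311.0
print(round(net, 1))  # modest net gain: the win pays well, the losses cost little
```

With a step of 20 the same WLL sequence yields only a small net gain, whereas an Index with larger effective steps (as in CGS class 1 events) produces the 49-point swings described above.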

Supposing that the games in 2006 were my first games in the system but that I entered the system with the same starting BRgrade (e.g. we are just removing all of my history from the system). Would my Bgrade at the end of the year be the same but with a larger Standard Deviation figure? Or as I believe to be the case is the calculation actually affected with the Bgrade coming out as a different number, would it be higher or lower and by roughly how much?

If you entered the system with BRGrade = 2279 (your starting Bgrade for 2006) and the customary Standard Deviation = 350 given to all newcomers, then your final Bgrade for 2006 would be different from what it is now, and your Standard Deviation may not have dropped quite to where it is now. Your hypothetical end-2006 Bgrade will be influenced by the win-loss mix of your record. A good mix of wins as well as losses will give much the same as your present Bgrade. A start on a winning streak will increase the likelihood of a higher Bgrade later.

For the avoidance of any doubt I am quite happy with "Bayesian Ranking has been submitted to the WCF Ranking Review committee for consideration"; I just wanted to know what was going on, and there was nothing sinister in my request. The reason for asking was the publishing of the data on the Butedock web-site. I read into this (wrongly, as you have now made clear) that another step down the path had been taken and thought that we were at the point where the committee had made their decision and their proposal was entering a public consultation phase. I would still like to know if the absolute values of the CGS grade and Bgrade are directly comparable. If it helps you understand why I keep asking: I had a conversation with a friend at the end of last season when I said that I thought my CGS grade (2444) flattered me and that I felt 2350 was more realistic. Can I directly compare my 2311 Bgrade with the 2350 in this conversation, or is this not valid? If the numbers are comparable, why, for the top players, are Bayesian Ranking grades typically 150 points lower than their CGS grade? If they are not comparable, why not change the number scale so that misleading comparisons are not made?

The absolute values are theoretically comparable, because they operate on the same scale. However, because of the different behaviours, for most practical purposes it makes more sense to compare Bayesian Ranking ratings only to Bayesian Ranking ratings and CGS ratings only to CGS ratings. Since most top players have a Standard Deviation near 70, it is helpful to think (while dealing with these players) of Bayesian Ranking as a system that behaves locally (i.e. in the short term, for these players) much as an Index20 system. So all the top players can be expected to have lower ratings than in CGS (because of the vigorous spurts of the Index, which then become echoed by the Grade a little later). The larger ratings of CGS (where they occur) will often give rise to larger win probability estimates.

Louis Nel 5.xii.6



All rights reserved © 2006-2017

Updated 28.i.16
on www.oxfordcroquet.com