Bayesian Ranking for Croquet: Discussion
Discussions on Louis Nel's Bayesian Ranking paper:
Sam Tudor wrote:
Louis Nel responds:
An increase in Standard Deviation flattens the Rating Curve. So the curve will be lower near the mean and higher far from the mean. I suppose Sam's question is with reference to the part of the Rating Curve far enough from the mean to have gone up. For definiteness, let us suppose it involves the Performance Interval within 10 points of Mean + 55.
The paradoxical impression that the player somehow seems stronger comes from looking at that interval in isolation. It ignores the fact that the player is also deemed to have a similarly higher probability of performing well below the Mean, e.g. in the interval within 10 points of Mean - 55, and, perhaps more to the point, the fact that the player is deemed to have a LOWER probability of performing in intervals close to the Mean, e.g. within 10 points of the Mean.
On the whole, when a player's Standard Deviation is increased by 20 points (say), the system deems the player's win probability against a given opponent to go down. This is illustrated by the following numerical example.
Now consider the situation when the Temporal Update increases the Standard Deviation of Player A by 20. Then we have
Thus the BWP decreased from 0.813 to 0.811.
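The mechanics behind a numerical example of this kind can be sketched as follows. If we assume (Thurstone-style) that each player's performance on the day is drawn from a Normal distribution, then A beats B with probability Φ((μA − μB)/√(σA² + σB²)). The exact model and constants Nel uses may differ, so the figures below are purely illustrative and are not a reproduction of his 0.813/0.811 numbers; the ratings and Standard Deviations are hypothetical.

```python
from math import erf, sqrt

def win_probability(mean_a, sd_a, mean_b, sd_b):
    """P(performance of A > performance of B) for independent
    Normally distributed performances (Thurstone-style model)."""
    z = (mean_a - mean_b) / sqrt(sd_a**2 + sd_b**2)
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard Normal CDF at z

# Hypothetical players: A rated 100 points above B, both with SD 60.
before = win_probability(2400, 60, 2300, 60)
# A Temporal Update widens A's curve by 20 points; the win probability drops.
after = win_probability(2400, 80, 2300, 60)
print(f"before={before:.3f} after={after:.3f}")  # prints before=0.881 after=0.841
```

Whatever the particular numbers, the direction of the effect is forced: increasing either Standard Deviation enlarges the denominator, so the stronger player's win probability moves toward 0.5.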
With regard to Chris Dent's question about the post-game reduction in Standard Deviation even in the case of upsets, I have the following remarks.
The post-game reduction of Standard Deviation comes as an automatic consequence of the application of Bayes' Rule. The system is not programmed to handle upsets differently from expected results. In general, every game result gives more information, and this should reduce uncertainty about the player's true strength. This is consistent with the general post-game reduction of the Standard Deviation.
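This automatic shrinkage is easiest to see in the simplest Gaussian case: if the prior on a player's strength is Normal and a game supplies a noisy observation of it, Bayes' Rule gives a posterior whose variance is strictly smaller than the prior variance, no matter how surprising the observed value is. This is a minimal sketch of that principle only; the actual update in Bayesian Ranking works on discretised Rating Curves and win/loss results, not on direct Gaussian observations.

```python
def gaussian_posterior(prior_mean, prior_sd, obs, obs_sd):
    """Bayes' Rule for a Normal prior combined with a Normal likelihood.
    Precisions (inverse variances) add, so the posterior variance is
    always smaller than the prior variance."""
    prior_prec = 1 / prior_sd**2
    obs_prec = 1 / obs_sd**2
    post_var = 1 / (prior_prec + obs_prec)
    post_mean = post_var * (prior_prec * prior_mean + obs_prec * obs)
    return post_mean, post_var**0.5

# An "upset": the observed performance is far below the prior mean.
mean, sd = gaussian_posterior(2300, 60, 2100, 150)
print(round(mean, 1), round(sd, 2))  # the posterior SD is below 60 despite the surprise
```

Note that the surprising observation pulls the Mean down, but the Standard Deviation shrinks regardless; that is exactly the behaviour Chris Dent noticed.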
Chris Dent wrote:
Louis Nel responds:
I'm glad to confirm that the second sentence you quoted:
puts you on track to understand what the Rating Curve means. The second quoted sentence
follows from the first as a special case (at least for Normal Probability Density curves).
Think first about the Mean of the curve (i.e. the number listed as "BGrade" on www.croquetrecords.com). Suppose it is 2300. Now think of the Standard Deviation. Suppose it is 60. Then consider the set of points within one Standard Deviation of the Mean, i.e. within 60 points of 2300, i.e. the points X that satisfy 2240 <= X <= 2360, i.e. the points in the interval [2240, 2360]. Imagine that the points in the plane that lie vertically above these X and below the curve are painted orange. Those orange points must cover an area of 0.68 (out of a total area of 1 under the curve), and this is therefore the probability that the player is performing at level X for some X in the mentioned interval. (See the sketch in the "Oxford Croquet" article, where this area really is coloured orange.)
That the "orange area", i.e. the area below the curve and vertically above the interval
[Mean - Standard Deviation, Mean + Standard Deviation],
must equal approximately 0.68, is a property of all Normal Probability Density curves.
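The 0.68 figure can be checked numerically, and it holds for every choice of Mean and Standard Deviation. A quick sketch using the Normal cumulative distribution function:

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """Standard result: P(performance <= x) for a Normal(mean, sd) curve."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

mean, sd = 2300, 60   # the BGrade and Standard Deviation from the example
orange_area = normal_cdf(mean + sd, mean, sd) - normal_cdf(mean - sd, mean, sd)
print(round(orange_area, 4))   # prints 0.6827, for every Normal curve
```

Changing `mean` and `sd` to any other values leaves the result unchanged, which is why the 0.68 rule can be stated without reference to a particular player.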
So the Rating Curve implicitly tells us what the uncertainty of the player's performance level is. It tells us via the Standard Deviation. The Standard Deviation relates to the shape of the curve. A Rating Curve with a high narrow peak has a small Standard Deviation. A flatter curve has a larger Standard Deviation. The Standard Deviation of any known probability density function can be computed in terms of an integral.
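The final remark can be illustrated numerically: for any density f, the mean is ∫x·f(x)dx and the variance is ∫(x − μ)²·f(x)dx, which can be approximated on a grid. A rough sketch, using a Normal curve so that the computed value can be checked against the Standard Deviation used to build it:

```python
from math import exp, pi, sqrt

mean, sd = 2300, 60
# The Normal density with the given mean and standard deviation.
f = lambda x: exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * sqrt(2 * pi))

# Simple Riemann sum over a grid wide enough to capture the tails.
step = 0.1
xs = [mean - 8 * sd + i * step for i in range(int(16 * sd / step) + 1)]
mu = sum(x * f(x) * step for x in xs)                # mean as an integral
var = sum((x - mu) ** 2 * f(x) * step for x in xs)   # variance as an integral
print(round(sqrt(var), 2))   # recovers 60.0, up to discretisation error
```

The same two sums work for any probability density, including the non-Normal Rating Curves that arise after Bayesian updates; only the function `f` changes.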
I hope this helps.
Chris Dent wrote:
I agree that (1) works, but feel a little worried by your emphatic words "this being a single precise (but unknown) number".
I believe a player's performance level is always continuously varying. For the top players the variation may be slight, but it is still there. For lesser players it is often visible during a tournament, during a match and even during a single game. The Bayesian rating accommodates this view very well. The Rating Curve provides an indication of the bounds between which the performance level varies, with the given likelihoods expressed by the curve. So, if you mean in (1) that at every instant it is a "single precise number" which may be a different "single precise number" from one moment to the next, then we are in agreement. But not if you thought it is a "single precise number" which remains constant, at least for a while.
Your observations have increased my awareness that the update algorithm of Bayesian Ranking always decreases the Standard Deviation and in particular that it ignores the possibility that upset results could be a source of uncertainty. So much so that I have embarked on a series of experiments to determine if the performance of Bayesian Ranking could be improved by programming in an increase in Standard Deviation as a result of upset results. This will require patience, because one has to consider all relevant angles and my program runs rather slowly.
I also view this as a convergence process in a vague sense. One can regard the Rating Curve as an approximation of the player's ability. If the latter remained absolutely constant, the approximation would become better up to a point. Randomness (in the guise of upset results) will disallow real convergence even in this idealised case. But the ability is never really constant...
As you know, even in mathematical analysis, the limit of a sequence (e.g. 1/1, 1/2, 1/3, ..., 1/n, ...) is often not reached, but merely exists as an idealised outcome. So also the convergence process under discussion here. But it is worse, because even the "sequence" that is trying to converge, is itself forever changing...
Ian Burridge wrote:
I agree with your suggestion that players who have a large difference between Grade and Index are natural candidates for having a large difference between their Bayesian Ranking and their CGS ranking. It is interesting to see that about half of them went up while the other half went down. Movement in both directions occurs where the Index was higher than the Grade, as well as in the other case. Those that went down this time will likely go up on another occasion.
In 7 of your cases the absolute difference between Grade and Idx was 48 or more; the highest was 119. In all but two cases the difference was substantial.
The only two cases for which this difference was small (Hardy and Basset) were also pushed in opposite directions, and I have not yet figured out a reasonable explanation for why they were treated so differently by the two systems.
It is really no surprise that a large abs(Gr - Idx) will be handled differently by the two systems. Bayesian Ranking puts greater emphasis on how recent the performance is while CGS, by its very design - the lagging effect of the Grade - tends to dwell more on not so recent performance.
Examination of the top of the list, where it is obviously harder to generate such large moves, shows the only change in the ordering of the top 6 being Westerby moving up 2 places from 5 to 3; he also happens to have the biggest Grade/Index difference of any of the top 6.
I don't think it is fair to characterize Bayesian Ranking as "even more volatile" than CGS. Its volatility has a rather different profile. My impression is that BR is significantly less volatile than CGS near the top, but becomes more volatile lower down. The latter is a consequence of the large Standard Deviation with which players are started. This has the desirable effect that newcomers reach relatively stable performance data more quickly, so it contributes to the overall efficiency of the system.
This beneficial side-effect of Bayesian Ranking was not deliberately designed, but it is not purely serendipitous either. In my view it is futile to design a ranking system specifically to have a predetermined kind of volatility. A system should be designed to have the greatest achievable efficiency with respect to the kind of player performance it wants to measure. Success in that endeavour will automatically ensure appropriate volatility. I believe Bayesian Ranking ended up having appropriate volatility.
Louis Nel 1.xii.6
Ian Burridge wrote:
I take it you are still looking for a pattern in the phenomenon that confronts you: in 2006 Bayesian Ranking appeared to hold back while CGS showed a vigorous climb, but in 1997, when CGS showed a long downward slide, Bayesian Ranking closely echoed that slide. You want to know why it did not stay close both times.
I looked at the details of the two situations and, while there are similarities, there is also a notable difference. In 1997 the ever-explosive Index more or less cancelled every big upward (or downward) spurt with a corresponding downward (or upward) spurt. So it kept on "returning" to near Bayesian Ranking, sometimes even overshooting to a point below BR. That is the kind of slide (upward or downward) which will cause Bayesian Ranking and CGS to stay close. In 2006, by contrast, your plot shows more than one upward spurt which was not cancelled by a corresponding downward spurt soon thereafter. Take, for example, your match with Fulford. You went WLL and it gave you a net Index gain of 49 points (in a class 1 match the Index goes into overdrive ...), not followed by a downward spurt. In the Opens (also class 1), where you went 7-3 in the block, you had another upward spurt of 81 Index points which was not followed by a downward spurt. This is the kind of upward (or downward) trend that will cause CGS to deviate far from Bayesian Ranking.
You may now ask: why does it deviate so far? Well, for players with smallish Standard Deviation (around 70, say) Bayesian Ranking behaves locally much like an Index20 system (i.e. a system defined by an Index with step size 20). So where you gained 49 Index points in CGS for going WLL against Rob, you gained only 14 points in BR. Keep repeating this kind of occurrence and you can see how the systems will drift far apart. The only thing that prevents them from going hundreds of points apart is that the large Index gains in CGS get cancelled by correspondingly large losses. But when somebody is on a long winning streak (or losing streak) the cancellation does not happen ... When you look at the Idx30 ranking list you find that the top players there also have noticeably lower ratings than in CGS. The same reason applies there.
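The step-size contrast can be sketched with the generic index update rule: rating gain = step × (actual result − expected result). The Elo-style expected-win formula and the step sizes below are illustrative assumptions only, not the exact CGS or Bayesian Ranking formulae; the point is just that the same result moves a large-step index much further than a small-step one.

```python
def index_update(rating_a, rating_b, won, step):
    """Generic index update: gain = step * (actual - expected).
    Elo-style expected-win formula, used here purely for illustration."""
    expected = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    return rating_a + step * ((1 if won else 0) - expected)

r = 2600
# A win against a stronger (hypothetical) opponent moves a step-50 index
# far more than a step-20 one -- the mechanism behind the 49 vs 14 contrast.
gain_50 = index_update(r, 2700, True, 50) - r
gain_20 = index_update(r, 2700, True, 20) - r
print(round(gain_50, 1), round(gain_20, 1))  # prints 32.0 12.8
```

With identical expected-win values, the gains are simply proportional to the step sizes, so a run of uncancelled results separates the two systems at a rate proportional to the difference in step size.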
If you had entered the system with BRGrade = 2279 (your starting Bgrade for 2006) and the customary Standard Deviation = 350 given to all newcomers, then your final Bgrade for 2006 would be different from what it is now, and your Standard Deviation might not have dropped quite to where it is now. Your hypothetical end-2006 Bgrade would be influenced by the win-loss mix of your record. A good mix of wins as well as losses would give much the same as your present Bgrade. A start on a winning streak would increase the likelihood of a higher Bgrade later.
The absolute values are theoretically comparable, because they operate on the same scale. However, because of the different behaviours, for most practical purposes it makes more sense to compare Bayesian Ranking ratings only to Bayesian Ranking ratings and CGS ratings only to CGS ratings. Since most top players have a Standard Deviation near 70, it is helpful to think (while dealing with these players) of Bayesian Ranking as a system that behaves locally (i.e. in the short term, for these players) much as an Index20 system. So all the top players can be expected to have lower ratings than in CGS (because of the vigorous spurts of the Index, which are then echoed by the Grade a little later). The larger ratings of CGS (where they occur) will often give rise to larger win probability estimates.
Louis Nel 5.xii.6
All rights reserved © 2006-2017