A Continuous Grading System
By Tim Harrison
The croquet grading system has been the source of much comment recently. Unfortunately some of this has been rather uninformed criticism, based largely on what the critics think the ranking order should be, rather than on possible drawbacks in the system. Steve Mulliner described the mechanics of his system in the November 1986 'Croquet' leaving aside the questions of why. In this article I will explain the theory underlying this system and then 1 will propose an enhancement that is both elegant and original.
The objective of any grading system is to provide a figure, the grade, for every player in a pool of players. This figure says something about the ability of a player. In the system to be described, it is a measure of the player's current ability. To be more specific, the difference between two player's grades is related to the expected outcome of any game between them.
As well as defining what the grade means, it is necessary to find an objective method for estimating the grade. If this is to be a general method, the only information that can be used from a single game is the result (win/loss/draw) - the significance of the score is too game dependent. The problem then is that there is very little information arising from any single game. Moreover we are also often working with a small sample of games, which further compounds our difficulties. Two desirable properties of any grading system are that grades can be continuously updated and secondly that there is a sound theoretical basis.
Definition of grade
The first step in designing a grading system is to decide upon a mathematical model which accurately explains past results and can be used to predict future results. We will use a relatively simple model, in which one figure, the grade, is assigned to each player in the pool. The meaning of this figure is seen by examining how the model predicts the result of a game between two of the players.
Consider two players A and B with grades and Xa and Xb. The model gives the expected probability of A beating B as:
This probability function is called the logistic function. For those whose mathematics may be slightly rusty this looks like a sigmoid. If A is much worse than R then Pab is near but always greater than zero, whereas if A is far superior the probability approaches one. For the gamblers a probability can be converted to odds according to the formula:
The table shows the probabilities and corresponding odds for a series of grade differences.
This formula does rather appear 'out of thin air' However its validity can be evaluated by considering the situation of three players, A, B and C. What is the relationship between Pab, Pbc and Pac. From the above equation if can be shown that:
By inspecting what happens when some numbers are put in, it can be seen that the model is reasonable for most games, croquet included. However if this is not the case, then the grading system derived from it is likely to be inaccurate. This probability function does seem a sensible choice but it is not essential to use it. There are some major constraints on the function though. Firstly it must always be increasing from 0 to 1 in a continuous fashion and secondly it must be true that Pab + Pba =1.
A player's grade is in theory only relative in that it is only the difference between two grades that has real meaning - it gives the odds of each player winning. However in practice, with a large enough pool of players, the overall ability of the pool can be considered stationary and then a player's grade does say something about his ability. Note though that the absolute value of a grade is arbitrary.
There is an analogy between a grading scale and a temperature scale. In the Celsius scale 0 is chosen as the freezing point of water and there are 100 degrees between the freezing and boiling points of water. Both these numbers are arbitrary and in fact for the Fahrenheit scale the numbers are chosen are 32 and 180. However the freezing point of water is the same temperature on both scales. What is important though is to pick a scale and then to stick to it.
In choosing a suitable mode, some consideration has to be given to the estimation of the parameters, which in our model are the grades. The more complicated the model, the more results are required to estimate the parameters. Thus a more complicated model will not necessarily give a better grading system.
Estimation of Grade
1. Big Change Method
The broad principle behind the estimation formula is that given the result of a game the winner's grade is increased by a small amount, and at the same time the loser's grade is decreased by the same amount. Thus the total number of grade points in the system is unchanged. The amount that the grades are changed by depends on how unlikely the result was - the more unlikely, the greater the change.
Consider a game between A and B. If A wins increase his grade by Iw and if he loses decrease his grade by Il. Let D Xa be the change in A's grade. The expected change in A's grade E(D Xa) is zero as we assume both Xa and Xb are correct grades. Now,
E (D Xa) = Pab Iw - (1 - Pab) Il
Thus we let
i.e. Iw is proportional to (1 - Pab), the probability of B winning, and Ie is proportional to -Pab, the probability of A winning. In other words, I is proportional to the difference between the actual result (1 or 0, corresponding to win or loss) and the expected result, Pab. Thus the change is greater the more unlikely the actual result. An intuitive reason for this is that the result is only unlikely because our estimates of the two grades are inaccurate and so we wish to alter our estimates. The more inaccurate the estimates, the greater the change. C is a constant which affects the total amount the grades are changed as a result of one game. The setting of C is quite crucial to the performance of the system and in the present system is one of 4, 5 or 6, depending on the game's importance.
2. Smoothed Method
If C is set to a reasonable value it is found that the formula above gives a grade figure which is too volatile. To circumvent this it is necessary to provide some smoothing, whereby the change in the grade depends not only on the last game, but also on the most recent previous ones. Mulliner does this by taking an average of the grade over several games to produce a ranking - the figure that is published. This is normally a 20 game average but since this will be too sluggish for a rapidly improving or deteriorating player, this can be lowered to 10 or even 5 games depending on how rapid the change is. The major problem with this is the sharp discontinuities introduced when converting from a five game average to a ten game average and similarly from ten to twenty. Also the figures of 5, 10 and 20 seem rather arbitrary.
I propose a different method that is relatively simple but highly effective. Define a figure f which will be called a player's current form (relative to his grade). It will be positive if his recent results are better than his grade would indicate and negative if they are worse. After every game f is recalculated according to
fnew = fold + 1, 0 < a < 1
where I is the increment as calculated above (but with C much smaller, typically 0.7) and can be positive or negative, and a is the momentum constant and is typically 0.9. Then the grade is recalculated simply by adding the form figure to the old value i.e.
Xnew = Xold + fnew
Note that changing grades using this method satisfies the condition that the total number of grade points in the system is constant.
These formulae effectively take an exponentially weighted average of the grades produced by the Big Change method but without having to store all the most recent grades individually. The increment produced by the Big Change method is applied to the grade over the course of several games and so for a given size increment (i.e. a given value of the constant C), a much smoother grade is produced.
I have performed several computer simulations of such a grading system and made comparisons with the present system. I will gladly send details to any interested readers but for now the conclusions will suffice. At the very least the smoothed method is definitely no worse than the big change method and I would claim it behaves considerably better. For a rapidly changing player the sharp discontinuities no longer occur. The present system cannot cope with a rise in grade of 0.5 per game, which is equivalent to a rise in grade from 100 to 150 in 100 games, but lags behind by an average of 11 points. This compares with a figure of 7 points for the smoothed method.
As an estimation of accuracy, the probable error (a 50% confidence limit) for the current system is 2.6 points, so it is only differences of three or more points which are significant. With the smoothed method the probable error is 3.7 points. However this figure does not take into account the slow response to rapid changes and so is biased towards the current system.
The true test of any grading system is to see how it works in practice. It has been shown though that the Continuous Grading System does have a very pure mathematical basis to it. The system is also relatively easy to implement with only two figures needing to be stored at any stage, one of which is the grade.
From "Croquet" pp8-9, July 1988, Issue No. 198
All rights reserved © 1998