|
Technical
A Continuous Grading System
By Tim Harrison
The croquet grading system has been the source of much comment recently. Unfortunately
some of this has been rather uninformed criticism, based largely on what the
critics think the ranking order should be, rather than on possible drawbacks
in the system. Steve Mulliner described the mechanics of his system in the
November 1986 'Croquet' leaving aside the questions of why. In this article
I will explain the theory underlying this system and then 1 will propose an
enhancement that is both elegant and original.
Introduction
The objective of any grading system is to provide a figure, the grade, for
every player in a pool of players. This figure says something about the ability
of a player. In the system to be described, it is a measure of the player's
current ability. To be more specific, the difference between two player's grades
is related to the expected outcome of any game between them.
As well as defining what the grade means, it is necessary to find an objective
method for estimating the grade. If this is to be a general method, the only
information that can be used from a single game is the result
(win/loss/draw) - the significance of the score is too game dependent. The
problem then is that there is very little information arising from any single
game. Moreover we are also often working with a small sample of games, which
further compounds our difficulties. Two desirable properties of any grading
system are that grades can be continuously updated and secondly that there
is a sound theoretical basis.
Definition of grade
The first step in designing a grading system is to decide upon a mathematical
model which accurately explains past results and can be used to predict future
results. We will use a relatively simple model, in which one figure, the grade,
is assigned to each player in the pool. The meaning of this figure is seen
by examining how the model predicts the result of a game between two of the
players.
Consider two players A and B with grades and Xa and Xb. The model gives the
expected probability of A beating B as:
This probability function is called the logistic function. For those whose
mathematics may be slightly rusty this looks like a sigmoid. If A is much worse
than R then Pab is near but always greater than zero, whereas if A is far superior
the probability approaches one. For the gamblers a probability can be converted
to odds according to the formula:
The table shows the probabilities and corresponding odds for a series of
grade differences.
| Xa-Xb |
Pab |
Odds |
| -50 |
0.09 |
10/1 |
| -40 |
0.14 |
6/1 |
| -30 |
0.20 |
4/1 |
| -20 |
0.29 |
5/2 |
| -10 |
0.39 |
6/4 |
| 0 |
0.50 |
Evens |
| 10 |
0.61 |
6/4 on |
| 20 |
0.71 |
5/2 on |
| 30 |
0.80 |
4/1 on |
| 40 |
0.86 |
6/1 on |
| 50 |
0.91 |
10/1 on |
This formula does rather appear 'out of thin air' However its validity can
be evaluated by considering the situation of three players, A, B and C. What
is the relationship between Pab, Pbc and Pac. From the above equation if can
be shown that:
By inspecting what happens when some numbers are put in, it can be seen that
the model is reasonable for most games, croquet included. However if this is
not the case, then the grading system derived from it is likely to be inaccurate.
This probability function does seem a sensible choice but it is not essential
to use it. There are some major constraints on the function though. Firstly
it must always be increasing from 0 to 1 in a continuous fashion and secondly
it must be true that Pab + Pba =1.
A player's grade is in theory only relative in that it is only the difference
between two grades that has real meaning - it gives the odds of each player
winning. However in practice, with a large enough pool of players, the overall
ability of the pool can be considered stationary and then a player's grade
does say something about his ability. Note though that the absolute value of
a grade is arbitrary.
There is an analogy between a grading scale and a temperature scale. In the
Celsius scale 0 is chosen as the freezing point of water and there are 100
degrees between the freezing and boiling points of water. Both these numbers
are arbitrary and in fact for the Fahrenheit scale the numbers are chosen are
32 and 180. However the freezing point of water is the same temperature on
both scales. What is important though is to pick a scale and then to stick
to it.
In choosing a suitable mode, some consideration has to be given to the estimation
of the parameters, which in our model are the grades. The more complicated
the model, the more results are required to estimate the parameters. Thus a
more complicated model will not necessarily give a better grading system.
Estimation of Grade
1. Big Change Method
The broad principle behind the estimation formula is that given the result
of a game the winner's grade is increased by a small amount, and at the same
time the loser's grade is decreased by the same amount. Thus the total number
of grade points in the system is unchanged. The amount that the grades are
changed by depends on how unlikely the result was - the more unlikely, the
greater the change.
Consider a game between A and B. If A wins increase his grade by Iw and if
he loses decrease his grade by Il. Let D Xa
be the change in A's grade. The expected change in A's grade E(D Xa)
is zero as we assume both Xa and Xb are correct grades. Now,
E (D Xa) = Pab Iw - (1 - Pab)
Il
Thus we let
i.e. Iw is proportional to (1 - Pab), the probability of B winning, and Ie is proportional to -Pab, the probability of A winning. In other words, I is
proportional to the difference between the actual result (1 or 0, corresponding
to win or loss) and the expected result, Pab. Thus the change is greater the
more unlikely the actual result. An intuitive reason for this is that the result
is only unlikely because our estimates of the two grades are inaccurate and
so we wish to alter our estimates. The more inaccurate the estimates, the greater
the change. C is a constant which affects the total amount the grades are changed
as a result of one game. The setting of C is quite crucial to the performance
of the system and in the present system is one of 4, 5 or 6, depending on the
game's importance.
2. Smoothed Method
If C is set to a reasonable value it is found that the formula above gives
a grade figure which is too volatile. To circumvent this it is necessary to
provide some smoothing, whereby the change in the grade depends not only on
the last game, but also on the most recent previous ones. Mulliner does this
by taking an average of the grade over several games to produce a ranking -
the figure that is published. This is normally a 20 game average but since
this will be too sluggish for a rapidly improving or deteriorating player,
this can be lowered to 10 or even 5 games depending on how rapid the change
is. The major problem with this is the sharp discontinuities introduced when
converting from a five game average to a ten game average and similarly from
ten to twenty. Also the figures of 5, 10 and 20 seem rather arbitrary.
I propose a different method that is relatively simple but highly effective.
Define a figure f which will be called a player's current form (relative to
his grade). It will be positive if his recent results are better than his grade
would indicate and negative if they are worse. After every game f is recalculated
according to
fnew = fold + 1, 0 < a < 1
where I is the increment as calculated above (but with C much smaller, typically
0.7) and can be positive or negative, and a is the
momentum constant and is typically 0.9. Then the grade is recalculated simply
by adding the form figure to the old value i.e.
Xnew = Xold + fnew
Note that changing grades using this method satisfies the condition that
the total number of grade points in the system is constant.
These formulae effectively take an exponentially weighted average of the grades
produced by the Big Change method but without having to store all the most
recent grades individually. The increment produced by the Big Change method
is applied to the grade over the course of several games and so for a given
size increment (i.e. a given value of the constant C), a much smoother grade
is produced.
I have performed several computer simulations of such a grading system and
made comparisons with the present system. I will gladly send details to any
interested readers but for now the conclusions will suffice. At the very least
the smoothed method is definitely no worse than the big change method
and I would claim it behaves considerably better. For a rapidly changing player
the sharp discontinuities no longer occur. The present system cannot cope with
a rise in grade of 0.5 per game, which is equivalent to a rise in grade from
100 to 150 in 100 games, but lags behind by an average of 11 points. This compares
with a figure of 7 points for the smoothed method.
As an estimation of accuracy, the probable error (a 50% confidence limit)
for the current system is 2.6 points, so it is only differences of three or
more points which are significant. With the smoothed method the probable error
is 3.7 points. However this figure does not take into account the slow response
to rapid changes and so is biased towards the current system.
The true test of any grading system is to see how it works in practice. It
has been shown though that the Continuous Grading System does have a very pure
mathematical basis to it. The system is also relatively easy to implement with
only two figures needing to be stored at any stage, one of which is the grade.
From "Croquet" pp8-9, July 1988, Issue No. 198
Author: Tim Harrison
All rights reserved © 1998
|