Using Elo rating system to rank players within a club

Discussion in 'General Forum' started by Line & Length, Nov 19, 2010.

  1. Line & Length

    Line & Length Regular Member

    Joined:
    Nov 9, 2010
    Messages:
    220
    Likes Received:
    4
    Location:
    Worcestershire
    The 'how do we work out handicaps for internal club tournaments' conversation reared its ugly head last week. Having played chess in my (misspent) youth, I was familiar with the Elo rating system (see the article 'Elo rating' in Wikipedia for details). Essentially, the higher your rating relative to your opponent's, the fewer points you gain for winning and the more you lose for losing.

    Does anyone have any experience with using the Elo rating system in badminton? If so, which C & K-factors did you use & how did you combine ratings for doubles?
     
  2. amleto

    amleto Regular Member

    Joined:
    Feb 12, 2008
    Messages:
    2,890
    Likes Received:
    89
    Location:
    UK
    How does a rating (Elo or otherwise) give you a handicap??

    Hint: it doesn't.

    So not only would you have to calculate ratings over some period of time, you would then have to devise a method of converting the players' ratings into a handicap.
     
  3. Line & Length

    Line & Length Regular Member

    I will answer in brief because all the detail about how it works is contained in the aforementioned article.

    The Elo rating system uses the difference between two players' ratings (and a scaling 'C' factor) to generate an expected result, which is a number between 0 and 1. This is compared with the actual result (also between 0 and 1), and the difference, scaled by a 'K' factor, is added to or subtracted from each player's rating. Over time, therefore, the ratings come to represent the relative strengths of the players.

    The system I was thinking about would use the difference in ratings to estimate the proportion of points that a player would win. For example, 0.6 would equate to a 21-14 win (21 out of 35 points equals 0.6). Then, depending upon the proportion of points actually won, the ratings would be adjusted. For example, if our player won 21-7 (21 out of 28 equals 0.75), their rating would rise by K * (0.75 - 0.60). Their opponent's rating would fall by the same amount.
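
    A minimal Python sketch of the update described above, for anyone who would rather script it than use a spreadsheet. The logistic expected-score formula is the standard Elo one from the Wikipedia article; the function names and the default values C = 400 and K = 32 are illustrative assumptions, not settled choices for badminton.

```python
def expected_share(rating_a, rating_b, c=400.0):
    """Expected proportion of points won by A (standard Elo logistic form)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / c))


def update_ratings(rating_a, rating_b, points_a, points_b, k=32.0, c=400.0):
    """Return the new (rating_a, rating_b) after one game.

    The 'actual result' is the proportion of points won, as proposed above.
    k and c are assumed placeholder values.
    """
    expected_a = expected_share(rating_a, rating_b, c)
    actual_a = points_a / (points_a + points_b)
    delta = k * (actual_a - expected_a)
    # Whatever A gains, B loses, so the total of the two ratings is unchanged.
    return rating_a + delta, rating_b - delta


# Two equally rated players: A is expected to win half the points,
# wins 21-7 (0.75), and so gains K * (0.75 - 0.50).
new_a, new_b = update_ratings(1500, 1500, 21, 7)
print(round(new_a, 2), round(new_b, 2))
```

    Because the same amount is added to one player and taken from the other, the club's average rating stays fixed unless new players join.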

    Needless to say, past results would need to be recorded, but re-calculating the ratings can be done easily in a spreadsheet. So, other than recording who played whom and the score, followed by 5 minutes of manual data entry, there's no massive outlay.

    One point that does need to be cleared up is the method of combining the ratings of 2 players to make a single doubles rating. I was hoping that someone out there had some guidance on that.
     
  4. hhwoot

    hhwoot Regular Member

    Joined:
    Sep 22, 2008
    Messages:
    163
    Likes Received:
    0
    Occupation:
    Graduate Student
    Location:
    Urbana, IL
    This is certainly an interesting idea. If you really want to do an accurate Elo rating system, you probably need the correct badminton points distribution model. I think one of my previous threads should be an interesting read for you.

    http://www.badmintoncentral.com/for...adminton-Win-Probability-From-Points-to-Games

    I'll have to think about making such an Elo system, and I am interested in what you come up with.
     
  5. Line & Length

    Line & Length Regular Member

    Reading my last post, I realised that I haven't specifically addressed amleto's question with regard to handicapping. I'll do so here:

    A benchmark is set (typically half-way between the highest and lowest rankings). The expected result of each participant (x) would then be evaluated against the benchmark:
    S_x = P * 10 ^ ( (R_x - R_bm) / C), where P is the number of points in a game, C is a scaling parameter described in the last post, R_x is the participant's rating and R_bm is the rating of the benchmark.

    For example, in a tournament played to 21 points with a scaling factor of 400, a participant whose rating is 50 higher than the benchmark would get S_x = 21 * 10 ^ (50/400) = 28.004. In other words, in the time it takes the benchmark to get to 21 points, our player would get 28 and a small bit. Therefore, a handicap of -7 would bring the player in line with the benchmark.

    Therefore, the general formula for a handicap is:
    H_x = P * ( 1 - 10^((R_x - R_bm)/C) )
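
    The handicap formula above can be sketched in Python as follows. The function name and the example ratings (1550 against a benchmark of 1500) are made up for illustration; P = 21 and C = 400 are the values from the worked example.

```python
def handicap(rating, benchmark_rating, points_per_game=21, c=400.0):
    """H_x = P * (1 - 10 ** ((R_x - R_bm) / C)).

    A negative value means points are deducted from the player;
    a positive value means the player starts with a head start.
    """
    return points_per_game * (1.0 - 10 ** ((rating - benchmark_rating) / c))


# Worked example from the post: 50 rating points above the benchmark.
print(round(handicap(1550, 1500)))  # -7
```

    In practice the result would have to be rounded to a whole number of points, as the example does.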
     
  6. amleto

    amleto Regular Member

    That's quite interesting.

    I think trying to get a handicap for pairs from singles ratings is only good enough for a first-order estimate, though, simply because some people play together much better than others. That is to say, four people of similar abilities do not necessarily make 6 pairs of similar abilities:

    1-2
    1-3
    1-4

    2-3
    2-4

    3-4

    Maybe this first estimate is 'good enough', otherwise it looks to be necessary to track doubles games as well as singles.
     
  7. hhwoot

    hhwoot Regular Member

    Just some quick thoughts during lunch break.

    Use the Elo ratings to set the point win probability between two players (this is, of course, a little bit arbitrary):

    Elo of player A = Elo_A = 1500
    Elo of player B = Elo_B = 1000

    Probability of player A winning any one point from B = (Elo_A)/(Elo_A + Elo_B)
    = (1500)/(1500+1000) = 0.6

    Then we can use the point distribution to determine the expected game scores to see if the players performed above or below expectations.
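
    A small Python sketch of the ratio idea above. The function names are illustrative, and the expected losing score is a plain proportional estimate (the same arithmetic as "0.6 equates to 21-14" in post #3); it is an assumption of this sketch and not the full points distribution model.

```python
def point_win_probability(elo_a, elo_b):
    """Ratio model from this post: P(A wins a point) = Elo_A / (Elo_A + Elo_B)."""
    return elo_a / (elo_a + elo_b)


def expected_loser_points(p_point, target=21):
    """Points the weaker side is expected to have when the stronger side
    reaches `target`, assuming points are shared in the ratio p : (1 - p).
    Only meaningful for p_point >= 0.5."""
    return target * (1.0 - p_point) / p_point


p = point_win_probability(1500, 1000)
print(round(p, 3))                         # 0.6
print(round(expected_loser_points(p), 2))  # about 14
```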

    Thinking about it, the Elo system probably doesn't really work with single elimination tournaments, since players knocked out early will not have another chance to play.

    For the doubles ratings, it's probably best to use a separate rating for doubles play, although using the average of the pair is probably good as an initial rating.

    I'm in a bit of a hurry, so sorry for the bits of rambling.
     
  8. urameatball

    urameatball Regular Member

    Joined:
    Jul 24, 2010
    Messages:
    417
    Likes Received:
    4
    Occupation:
    Photographer
    Location:
    Drill-Sergeant Troll-Face
    I generally dislike these handicap systems in sports with so many variables.
    Often you get 3 players, A, B and C, where A>B>C but C>A because of different play styles.
    If, in the past, C has lost to B and B has lost to A, then C supposedly gets a large handicap when playing A. However, everyone knows that C can beat A even without a handicap. In that case, the handicap is not evening the odds, but actually stacking the odds against the weaker player.

    If you wanna nerd out using handicaps, go bowling or play some golf.
     
  9. alexh

    alexh Regular Member

    Joined:
    May 19, 2009
    Messages:
    408
    Likes Received:
    1
    Location:
    Adelaide, Australia
    Ratings for A, B and C would depend on how well they play against other people, not only against each other. It might turn out that on average they're all the same strength, so no handicap needed.
     
  10. Line & Length

    Line & Length Regular Member

    Glad hhwoot got in touch. His excellent thread (see #4) was one of the inspirations for this, though I'd forgotten where I'd seen it. One thing I'd like to correct is that the predicted score depends upon the difference in ratings. Therefore the predicted score for 1000 vs 800 would be the same as for 2000 vs 1800, etc. The average of the group is arbitrary and Elo works fine with negative ratings. However, I'm not sure my ego could handle a negative rating!
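
    A quick Python check of the point about differences, comparing the standard Elo logistic form with the ratio form from post #7. The function names are illustrative and C = 400 is just the chess default, used here as an assumption.

```python
def expected_share(rating_a, rating_b, c=400.0):
    """Standard Elo form: depends only on the difference rating_a - rating_b."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / c))


def ratio_share(rating_a, rating_b):
    """Ratio form from post #7: depends on the absolute size of the ratings."""
    return rating_a / (rating_a + rating_b)


# Same 200-point gap at three different levels, including negative ratings.
print(expected_share(1000, 800), expected_share(2000, 1800), expected_share(-100, -300))

# The ratio form gives different answers for the same gap.
print(round(ratio_share(1000, 800), 3), round(ratio_share(2000, 1800), 3))
```

    The first line prints the same value three times (about 0.76), while the two ratio values differ.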

    I agree that individuals should have separate 'singles' and 'doubles' ratings. I was going to just add the two players' doubles ratings together to make a pair's rating, but the average would be better. Presuming a player is equally 'good' at singles and doubles, if they are 400 points better in singles, they would roughly be only 200 points better in doubles. Taking the average would make the ratings more comparable.

    With regards to A>B>C>A, Elo is a reflection of the average performance, so there is always going to be some variation. You can't tell me that A's handicap should depend upon whether B and/or C is in their group in a tournament.

    Reading the Wikipedia article, Elo's intention was that a 200 point difference in rating should equate to the stronger player having a 75% chance of winning. From hhwoot's thread, a point percentage of 55% leads to a 75% chance of winning in a game to 21. Rearranging the equation for the predicted proportion, this leads to a 'C' factor of 2200 (compared with 400 in chess). Further rearrangement yields:
    A = P * 10^(-del_R/C), where A is the score of the expected loser, P is the length of the game in points, del_R is the difference in ratings and C is the scaling parameter. Therefore a rating difference of 400 in a game to 21 would lead to a predicted score of 21 * 10^(-400/2200) = 13.82
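
    The three steps in this paragraph can be sketched in Python. Everything here is an assumption layered on the post: rallies are treated as independent with a fixed win probability, a game is played to 21 with a two-point lead needed and a cap at 30, and the function names are invented. Solving the standard logistic form exactly gives a C of roughly 2295, a little above the rounded 2200 quoted above; the sketch keeps 2200 as the default so that it reproduces the 13.82 figure.

```python
from functools import lru_cache
import math


def game_win_probability(p, target=21, cap=30):
    """P(A wins a game) if A wins every rally independently with probability p."""
    @lru_cache(maxsize=None)
    def win(a, b):
        if a == cap:
            return 1.0
        if b == cap:
            return 0.0
        if a >= target and a - b >= 2:
            return 1.0
        if b >= target and b - a >= 2:
            return 0.0
        return p * win(a + 1, b) + (1 - p) * win(a, b + 1)

    return win(0, 0)


def solve_c(point_share, rating_gap):
    """C such that 1 / (1 + 10 ** (-rating_gap / C)) equals point_share."""
    return rating_gap / math.log10(point_share / (1.0 - point_share))


def predicted_loser_score(rating_gap, points=21, c=2200.0):
    """A = P * 10 ** (-del_R / C), the formula from the post."""
    return points * 10 ** (-rating_gap / c)


print(round(game_win_probability(0.55), 3))  # roughly three in four
print(round(solve_c(0.55, 200)))             # roughly 2295
print(round(predicted_loser_score(400), 2))  # 13.82
```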

    Finally, one of the nice things about doing Elo against the proportion of points won is that it can be used for different game lengths and for matches that may go to a deciding game. However, this can lead to the issue where the player/pair who wins the most points loses the match.
     
  11. PopsiclePete

    PopsiclePete Regular Member

    Joined:
    Jan 14, 2009
    Messages:
    326
    Likes Received:
    0
    Location:
    London, Ontario
    Listen big shot, don't make me come there and show you how abc works...you know on my greatest day and your worst I can clean the floor with you :D :D :D
     
  12. alexh

    alexh Regular Member

    Probably you want a weighted average: if two players differ in strength a lot, then opponents will tend to hit to the weaker player, so the rating of the pair will be more like 2/3 of the lower rating plus 1/3 of the higher rating.
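
    As a rough Python sketch, both the plain average and the weighted version fit in one function. The name pair_rating and the weight_lower parameter are illustrative, and the 2/3 weight is only the guess made above, not a measured value.

```python
def pair_rating(rating_1, rating_2, weight_lower=0.5):
    """Combine two doubles ratings into one rating for the pair.

    weight_lower = 0.5 gives the plain average; weight_lower = 2/3 gives
    the '2/3 of the lower plus 1/3 of the higher' suggestion.
    """
    lower, higher = sorted((rating_1, rating_2))
    return weight_lower * lower + (1.0 - weight_lower) * higher


print(pair_rating(1500, 1200))                              # 1350.0
print(round(pair_rating(1500, 1200, weight_lower=2 / 3)))   # 1300
```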
     
  13. urameatball

    urameatball Regular Member

    For me... you can clean the floor FOR me, LOL
     
  14. phil-mm

    phil-mm Regular Member

    Joined:
    Aug 19, 2010
    Messages:
    2,304
    Likes Received:
    1
    Occupation:
    None
    Location:
    Misty Mountains
    The Elo rating system has its inherent flaws. It is inflationary with time. Actually, I think it depends on the total number of players having Elo ratings. Let me illustrate.
    These were the ratings of the top ten players in the world in January 1980:

    Player              Rating
    Anatoly Karpov      2725
    Mikhail Tal         2705
    Victor Korchnoi     2695
    Lajos Portisch      2655
    Lev Polugaevsky     2635
    Boris Spassky       2615
    Tigran Petrosian    2615
    Zoltan Ribli        2610
    Florin Gheorghiu    2605
    Yuri Balashov       2600


    30 years later, in January 2010

    Player              Rating
    Magnus Carlsen      2810
    Veselin Topalov     2805
    Viswanathan Anand   2790
    Vladimir Kramnik    2788
    Levon Aronian       2781
    Boris Gelfand       2761
    Vugar Gashimov      2759
    Vassily Ivanchuk    2749
    Wang Yue            2749
    Peter Svidler       2744


    So, either there has been a tremendous improvement in the playing skills of the world's best, or the ratings cannot be taken at their absolute value. Of course, with the advent of all the computer programs and other aids, the levels of the players are different from those of earlier generations. But that cannot account for the meteoric rise of the ratings. The Elo rating system, and other similar ones, are useful only when comparing the relative strengths of players at a particular instant, not of players that are years or decades apart. So we'll never really know if Garry Kasparov is actually better than Jose Capablanca or Mikhail Botvinnik.
     
    #14 phil-mm, Nov 24, 2010
    Last edited: Nov 24, 2010
  15. Line & Length

    Line & Length Regular Member

    phil-mm is correct to state that Elo doesn't give an estimation of absolute standard. However, I don't believe that we are/should be asking it to.

    Elo gives an estimation of result based on the difference between 2 ratings (see first paragraph of post #10). For the purposes of handicapping players within a relatively closed group (e.g. the players at a club) and/or currently ranking those players relative to one another, this is fine.

    That said, Elo may struggle to deal with new or returning players. A returning player's rating may no longer be applicable to the changed group dynamic, in which case it may have to be artificially tweaked to be more representative. Newcomers would also have to have their initial ratings estimated.

    Interesting idea from alexh (post #12). Would suggest that we'd need some experimental data to draw conclusions on the split between higher/lower doubles rating. Another question would be what proportion of the rating points gained or lost would go to the higher ranked player.
     
  16. phil-mm

    phil-mm Regular Member

    In fact, I think the Elo rating system may not even truly reflect relative strengths. For many years, I had been pondering why the ratings are inflationary, but without making much headway. It's only recently that I have had some inkling as to how it works. Imagine a large pyramid, where the volume represents the total number of players, and the height their ratings. Within this pyramid are many smaller ones situated at different heights. These represent the tournaments of different categories. The stronger players will cannibalise the ratings of the weaker ones and move upwards. The larger the pyramid, the higher the apex will be. This is what has happened over the last few decades. The total number of players has increased significantly. Elo should have modified his formula to account for this increase.
    Also, imagine two separate communities of players, with differing populations, who do not interact with each other. Two players of the same standard, one in each community, will evolve to different ratings because of the difference in population. As long as they stay within their own communities, their relative strengths within those communities are accurate. When they play each other, obviously the relative strength implied by their ratings is wrong. So, technically, the Elo system does not really work. It is just an approximation. How complicated Elo is, encompassing both Einsteinian stuff and Darwinism.
     
    #16 phil-mm, Nov 24, 2010
    Last edited: Nov 24, 2010
  17. Line & Length

    Line & Length Regular Member

    The pyramid analogy is a good one, but I believe that a normal distribution (upon which Elo is based) is an even better one. As the number of players increases, the top 10 go from being the top 1 in 1,000 to the top 1 in 10,000 etc. As the proportion gets smaller, the number of sigmas (standard deviations from the mean) gets larger, hence inflation. I still maintain that the absolute strength (and its inflation) isn't relevant to a single club of players.
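
    To put rough numbers on that, here is a short Python sketch using the standard library's NormalDist. It assumes playing strength really is normally distributed with a fixed spread, which is a simplification; the function name is illustrative.

```python
from statistics import NormalDist


def sigmas_above_mean(one_in_n):
    """Standard deviations above the mean for a 'top 1 in n' player."""
    return NormalDist().inv_cdf(1.0 - 1.0 / one_in_n)


for n in (1_000, 10_000, 100_000):
    print(n, round(sigmas_above_mean(n), 2))
```

    Under those assumptions, the top 1 in 1,000 sits roughly 3.1 sigmas above the mean and the top 1 in 10,000 roughly 3.7, so a bigger pool alone pushes the top ratings up.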

    However, 'separate communities' is a real issue and could also be a contributory factor to top 10 inflation. Club players will very rarely (if ever) play internationals, so the difference in their ratings won't be accurate. However, national players tend to beat club players and internationals tend to beat nationals, so the ratings gap may be unrealistically high.

    For just the players of one club, the 'separate community' issue is kept to a minimum, so long as most players regularly play with and against most of the other players. However, clubs that have multiple 'tiers' will suffer from this issue, and Elo isn't appropriate when players from 2 groups 'mix'.

    As with any system, Elo isn't perfect. It is initially complicated (though once set up, maintenance is relatively easy) and it is vulnerable to separate communities and ratings inflation.

    However, if its limitations are observed and accounted for, proper implementation of Elo does provide an unbiased estimator of relative ability.
     
  18. phil-mm

    phil-mm Regular Member

    Yes, but of course. I had been wrongly assuming that the weakest players are the most numerous :eek:. That is not true. The pyramid model is misconceived. It has to be a normal distribution. A larger curve will explain both inflation and deflation, which is also a feature of the Elo system. I just realised that both FIDE and USCF have mechanisms to combat this inflation and deflation.
     
  19. urameatball

    urameatball Regular Member

    Has anyone considered that tournaments handicapped by Elo rating will be both a spectator's and a player's nightmare?
    Finals: 2 mediocre players who marginally exceeded their normal playing performance.
    And in any match-up where the best player meets the worst player, the best player will need to completely destroy the worst player in order to win. Competitive players might start head hunting in these matches.
     
  20. Gollum

    Gollum Regular Member

    Joined:
    May 23, 2003
    Messages:
    4,642
    Likes Received:
    298
    Location:
    Surrey, UK

    The Elo rating system is not a handicap system, although it could be used to help assign handicaps more objectively.

    The Elo system is a way of measuring how good a player is, relative to other players. A tournament could specify a range of allowed Elo ratings; this would determine the playing standard of the tournament. Using Elo doesn't necessarily imply a handicap tournament.

    Imagine an international tournament where some amateur players can win places by lottery. I win a ticket, and I get to play against Lin Dan. I have an Elo rating of (say) 1700, and Lin Dan has an Elo rating of 2800 (I'm just using rough numbers from chess here).

    The difference in Elo ratings indicates that Lin Dan is a much, much stronger player than I am, and will almost certainly win. Indeed, you can use the ratings to estimate the probability of him winning (nearly 100%, in this case).

    He still wins a match by winning 2 games of 21 points -- there's no handicap applied here. Even if the score line is 21--19, 21--19 (in his favour), he still wins (obviously that's not going to happen. I'd be delighted with even one point).

    The improvement in his Elo rating will be tiny, because I'm such an easy player for him to beat. In the extremely unlikely event that I beat Lin Dan, however, my rating will improve quite a lot, and his rating will suffer quite badly.
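
    To put numbers on that example, here is a minimal Python sketch of the ordinary win/loss Elo update. The chess scaling factor C = 400 matches the rough chess numbers above; K = 20 and the function names are assumptions chosen for illustration.

```python
def expected_score(rating, opponent_rating, c=400.0):
    """Standard Elo expected score (win probability, ignoring draws)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / c))


def rating_change(rating, opponent_rating, won, k=20.0, c=400.0):
    """Rating points gained (positive) or lost (negative) from one match."""
    actual = 1.0 if won else 0.0
    return k * (actual - expected_score(rating, opponent_rating, c))


print(round(expected_score(2800, 1700), 4))        # 0.9982
print(round(rating_change(2800, 1700, True), 3))   # tiny gain for the expected win
print(round(rating_change(1700, 2800, True), 3))   # almost the full K for the upset
print(round(rating_change(2800, 1700, False), 3))  # matching loss for the favourite
```

    With these assumed values the favourite gains only a few hundredths of a point for winning, while an upset moves both ratings by almost the full K.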
     
    #20 Gollum, Nov 25, 2010
    Last edited: Nov 25, 2010
