# Badminton Win Probability - From Points to Games

Discussion in 'General Forum' started by hhwoot, Aug 13, 2010.

1. ### hhwoot Regular Member

[SIZE=+2]Badminton Win Probability - From Points to Games[/SIZE]

This is a statistical analysis of how the probability of winning a single point can be used to calculate the probability of winning a game. For this analysis, I assume that each point/game is independent of any other. This means that no psychological effects are considered (chance to win a point with a 10 point lead is the same as with a 10 point deficit).

If we take the probability of winning a single point as a measure of the skill difference between players, then this analysis shows both the expected result of a match and the expected points breakdown in a game.

The rally scoring system will be used, in which:
A point is scored on each serve.
A game is won by the first player to:

• reach 21 points with at least a two point lead over the opponent
• after 20:20, score a two point lead over the opponent
• after 20:20, reach 30 points
A match is won by the first player to win two games.

[SIZE=+2]Introduction[/SIZE]

Consider a match between players A and B. Let:
p_point = probability of player A winning any one point.
p_game = probability of player A winning any one game.
p_match = probability of player A winning the match.

Due to the simple combinatorics of winning two games out of three, it's easy to see that:
Code:
`p_match = p_game * p_game + 2 * p_game * p_game * (1 - p_game)`
In words this means that there is just one way for A to win two games in a row (Win-Win-END), and two ways for A to win two games and lose one game (Win-Lose-Win-END, or Lose-Win-Win-END).

To introduce a bit more notation, p_match can be rewritten as:
Code:
`p_match = (2,0) * p_game^2 + (2,1) * p_game^2 * (1 - p_game)`
Here the caret, "^", signifies exponentiation (5^3 = 5*5*5 = "five to the power of 3").

The notation (n,k) signifies the binomial coefficient, the number of ways k items can be chosen from a set of n items. For (2,1) in this case, we are choosing where the one lost game can occur in the set of two games (first or second game). There is only one lost game to consider since we are calculating for when player A wins. Also, we only consider putting it in either the first or second game, since A would not have a chance to lose the third game if he won both earlier games.
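As a quick check of the best-of-three formula, here is a minimal Python sketch. The function name `p_match_best_of_three` is made up for this illustration; it simply adds the "win in two" and "win in three" cases described above.

```python
def p_match_best_of_three(p_game: float) -> float:
    """Probability that A wins a best-of-three match, given A's per-game probability."""
    lose = 1.0 - p_game
    win_in_two = p_game ** 2               # Win-Win
    win_in_three = 2 * p_game ** 2 * lose  # Win-Lose-Win or Lose-Win-Win
    return win_in_two + win_in_three


for p_game in (0.5, 0.7458, 0.9086):
    print(f"p_game = {p_game:.4f} -> p_match = {p_match_best_of_three(p_game):.4f}")
```

An even player (p_game = 0.5) comes out at exactly 0.5, as it should by symmetry.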

[SIZE=+2]Analysis[/SIZE]

With the introduction out of the way, the true objective is to calculate how p_point is related to p_game. This is done in essentially the same way as above, where p_match was calculated from p_game.

Up until 20:20, p_game is easy to think about. Player A can win anywhere from 21:0 to 21:19. This means that for each particular score, A scores 21 points and B scores i points, where i is between 0 and 19. Also, for A to win, A must score the last point. So the combinatorics is to choose where A's other 20 points are placed among the first (20+i) points.

The formula in pseudo-code is:
Code:
```p_game = 0
for i = 0:19
p_game = p_game + (20+i, 20) * p_point^21 * (1 - p_point)^i
end```
20:20 occurs when A and B score 20 points each, which means that we are choosing 20 out of 40. The probability to reach 20:20 is:
Code:
`(40, 20) * p_point^20 * (1 - p_point)^20`
From here A needs a two point advantage in order to win the game (until 29:29). This means an extra factor of p_point^2. Also, going up the score chart, there is a factor of 2 * p_point * (1 - p_point) every time deuce is reached again. To explain in more detail, going from 20:20 to 21:21 requires one point for A and one point for B, and this can be done in two ways (A-B or B-A).

The formula for 22+ point victories is:
Code:
```for i = 0:8
p_game = p_game + (40, 20) * p_point^22 * (1 - p_point)^20 * (2 * p_point * (1 - p_point))^i
end```
At 29:29, A needs just one more point to win, so the probability to win 30:29 is simply:
Code:
`(40, 20) * p_point^21 * (1 - p_point)^20 * (2 * p_point * (1 - p_point))^9`
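The three pieces above (wins before 20:20, wins by two after 20:20, and the 30:29 cap) can be put together as a runnable Python sketch. The function name `badminton_p_game` is an assumption for illustration, and `math.comb` (Python 3.8+) plays the role of the (n,k) notation.

```python
from math import comb


def badminton_p_game(p_point: float) -> float:
    """Probability that A wins a rally-scoring game to 21 (win by two, capped at 30)."""
    q = 1.0 - p_point

    # 21:0 through 21:19. A takes the last point, so A's other 20 points
    # are placed among the first 20 + i points.
    total = sum(comb(20 + i, 20) * p_point**21 * q**i for i in range(20))

    # Probability of reaching 20:20, and of trading one point each from a level score.
    reach_deuce = comb(40, 20) * p_point**20 * q**20
    split = 2 * p_point * q

    # 22:20 through 30:28: i splits, then two points in a row for A.
    total += sum(reach_deuce * split**i * p_point**2 for i in range(9))

    # 30:29: nine splits to reach 29:29, then a single point for A.
    total += reach_deuce * split**9 * p_point
    return total


for p in (0.50, 0.55, 0.60, 0.70):
    print(f"p_point = {p:.2f} -> p_game = {badminton_p_game(p):.4f}")
```

A useful sanity check is that `badminton_p_game(p) + badminton_p_game(1 - p)` equals 1, since every game ends with exactly one winner.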

[SIZE=+2]Results[/SIZE]

After putting everything together, let's plug in some numbers and get some results.

Code:
```p_point       p_game        p_match
0.5000        0.5000        0.5000
0.5500        0.7458        0.8390
0.6000        0.9086        0.9765
0.7000        0.9970        1.0000```
It's quite clear that even a small change in p_point leads to big changes in p_game. Winning 55% of rallies leads to winning 75% of games. Increasing that to 60% of rallies gives a 90% win probability for games.

The probability for how points break down for each game is plotted at the end of the post. The plots show results for the four p_points above for both A winning and B winning. The probability dip beyond 20 points is expected, since that can only be achieved with a score of 20:20 at some point and can only be continued with deuces. The x-axis label kept getting cut off for some reason, but the last word is just "game".
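For anyone who wants to regenerate the point breakdown numbers, here is a small Python sketch using the same terms as the p_game formula. The function name `win_score_distribution` and the list layout (index = loser's final points) are choices made for this example, not part of the original analysis.

```python
from math import comb


def win_score_distribution(p_point: float) -> list:
    """dist[k] = probability that this player wins the game with the opponent on k points."""
    q = 1.0 - p_point
    # 21:0 ... 21:19
    dist = [comb(20 + k, 20) * p_point**21 * q**k for k in range(20)]
    reach_deuce = comb(40, 20) * p_point**20 * q**20
    split = 2 * p_point * q
    # 22:20 ... 30:28 (opponent on 20 ... 28 points)
    dist += [reach_deuce * split**i * p_point**2 for i in range(9)]
    # 30:29 (opponent on 29 points)
    dist.append(reach_deuce * split**9 * p_point)
    return dist


a_wins = win_score_distribution(0.55)  # A winning
b_wins = win_score_distribution(0.45)  # B winning, seen from B's side
print(f"total probability = {sum(a_wins) + sum(b_wins):.6f}")
print(f"A wins 21:19 = {a_wins[19]:.4f}, A wins 22:20 = {a_wins[20]:.4f}")
```

The two lists together should sum to 1, and the 22:20 entry comes out smaller than the 21:19 entry, which is the dip beyond 20 points mentioned above.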

In the next section, I will use the same method for tennis.

[SIZE=+2]Tennis Win Probability - From Points to Games to Sets[/SIZE]

I had worked out the above analysis for a while now, but had been too lazy to write it all up. GameGod's post, http://www.badmintoncentral.com/forums/showthread.php?88196, was what finally motivated me to do the analysis for tennis and post both results.

I will again be assuming a single probability for winning a point. This is much more questionable in tennis since the serve is such an advantage. Since this tennis analysis is just to answer GameGod's question, it should suffice. But if there is interest, I may redo the analysis to account for service advantage.

[SIZE=+2]Tennis Scoring[/SIZE]

There are more variations in tennis scoring than badminton. I will use the following:
A point is scored on each serve.
A game is won by the first player to:

• reach 4 points with at least a two point lead over the opponent
• after 3:3, score a two point lead over the opponent with no limit

A set is won by the first player to:

• reach 6 games with at least a two game lead over the opponent
• after 5:5, win two games in a row
• after 6:6, win seven points with a two point lead over the opponent with no limit
A match is won by the first player to win two/three sets.

[SIZE=+2]Analysis[/SIZE]

I will not be posting the full tennis analysis, since this is a badminton forum after all. The basic idea is the same, but there are more tricky parts. If there is interest in seeing the full analysis, I will consider posting it.

Consider a match between players A and B. Let:
p_point = probability of player A winning any one point.
p_game = probability of player A winning any one game.
p_set = probability of player A winning any one set.
p_match = probability of player A winning the match.

The main difference from badminton is the addition of a "set". This really just adds another level of analysis and is not much of a problem. The main variations in tennis rules are due to tie-breaking, and there are major differences in this aspect even between major tournaments.

Tie-breaking to win a game follows the deuce/advantage system, in which a two point lead is needed and there is no limit. However, there is also "no-advantage" scoring, where a game is won by the first player to win four points, full stop.

Tie-breaking to win a set is usually done with a "seven point tie-break", where after reaching 6:6, a final game is played to seven points with at least a two point lead and no limit. However, some tournaments still play an "advantage set", where the set continues until a player has a two game lead, with no limit. This was shown recently in the 2010 Wimbledon first-round match between John Isner and Nicolas Mahut.
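The full tennis derivation is not given in this post, so the following Python sketch is only one reading of the scoring rules listed above: advantage games, and a seven point tie-break at 6:6. The names `race_with_deuce` and `tennis_p_set` are invented for this example. It keeps the single p_point assumption, so the tie-break is treated as a plain race to seven points regardless of who serves. The unlimited deuce is summed as a geometric series, p^2 / (1 - 2*p*q).

```python
from math import comb


def race_with_deuce(p: float, target: int) -> float:
    """Win a race to `target` points, needing a two point lead, with no cap."""
    q = 1.0 - p
    n = target - 1
    # Wins before the score reaches n:n (e.g. 4:0, 4:1, 4:2 for a game).
    clean = sum(comb(n + i, i) * p**target * q**i for i in range(n))
    # Reach n:n, then win two in a row after any number of one-each splits.
    reach_level = comb(2 * n, n) * (p * q) ** n
    return clean + reach_level * p**2 / (1.0 - 2 * p * q)


def tennis_p_set(p_point: float) -> float:
    """Tie-break set: first to 6 games by two, 7:5 from 5:5, tie-break at 6:6."""
    g = race_with_deuce(p_point, 4)  # p_game
    t = race_with_deuce(p_point, 7)  # tie-break, same p_point for both servers
    h = 1.0 - g
    clean = sum(comb(5 + i, i) * g**6 * h**i for i in range(5))  # 6:0 ... 6:4
    reach_5_5 = comb(10, 5) * (g * h) ** 5
    # From 5:5: win two games in a row (7:5), or split them and win the tie-break (7:6).
    return clean + reach_5_5 * (g * g + 2 * g * h * t)


for p in (0.50, 0.55, 0.60, 0.70):
    print(f"{p:.4f}  {race_with_deuce(p, 4):.4f}  {tennis_p_set(p):.4f}")
```

Under these assumptions the printed p_game and p_set values should land close to the Results table in this post.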

To win a match, men usually play best of 5 sets, while women usually play best of 3 sets. For best of three sets, the method to get p_match from p_set is the same as in badminton (where p_game is used to get p_match). Best of five sets is left as an exercise to the reader.
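One way to work the best-of-five exercise is to generalize the best-of-three counting: the winner takes the last set, and the k lost sets are placed among the earlier ones. The Python sketch below does this; the function name `p_match_best_of` and its arguments are hypothetical.

```python
from math import comb


def p_match_best_of(sets_to_win: int, p_set: float) -> float:
    """Probability of winning a match that needs `sets_to_win` sets (2 = best of 3, 3 = best of 5)."""
    q = 1.0 - p_set
    # The winner takes the final set; the k lost sets fall among the first sets_to_win - 1 + k sets.
    return sum(
        comb(sets_to_win - 1 + k, k) * p_set**sets_to_win * q**k
        for k in range(sets_to_win)
    )


for p_set in (0.5, 0.8150, 0.9634):
    best3 = p_match_best_of(2, p_set)
    best5 = p_match_best_of(3, p_set)
    print(f"p_set = {p_set:.4f}: best of 3 = {best3:.4f}, best of 5 = {best5:.4f}")
```

For three sets to win, this expands to p_set^3 * (1 + 3*(1 - p_set) + 6*(1 - p_set)^2), and with two sets to win it reduces to the badminton p_match formula.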

[SIZE=+2]Results[/SIZE]

Code:
```p_point       p_game        p_set
0.5000        0.5000        0.5000
0.5500        0.6231        0.8150
0.6000        0.7357        0.9634
0.7000        0.9008        0.9998```
It's again clear that a small change in p_point leads to big changes in p_game and p_set. It's worth mentioning that, for the same p_point, the badminton p_game and the p_set of this tennis scoring variant come out reasonably close (0.7458 vs. 0.8150 at p_point = 0.55, for example). This will allow us to compare the point breakdown in a game of badminton and the game breakdown in a set of tennis to answer GameGod's question.

The probability for how games break down for each set is plotted at the end of the post. The plots show results for the four p_points above for both A winning and B winning. While the distribution moves left just like the badminton point breakdown distribution, the tennis distribution moves much faster. The x-axis label again keeps getting cut off for some reason; the last word is just "set".

[SIZE=+2]GameGod's Question[/SIZE]

My answer is to compare the point distributions from badminton to the game distributions from tennis. This is done by taking groups of points in badminton and summing their individual probabilities and matching the sum to the probability for each game in tennis.

This comparison needs to be done with both distributions calculated from the same p_point. This is justified since the p_game from badminton roughly matches the p_set from tennis for the same p_point.

For p_point = 0.5
Code:
```tennis       badminton
6 : 0        0-8
6 : 1        9-11
6 : 2        12-14
6 : 3        15-16
6 : 4        17-18
7 : 5        19
7 : 6        20-29```
For p_point = 0.7
Code:
```tennis       badminton
6 : 0        0-8
6 : 1        9-12
6 : 2        13-15
6 : 3        16-18
6 : 4        19-20
7 : 5        21-22
7 : 6        23-29```
Due to the integer nature of points and games, the sums do not match up exactly. However, I think it is sufficient given the coarse nature of the tennis analysis. If a comparison between a match of badminton and a match of tennis was desired, it would probably require a tedious cross multiplication between the distributions before the summation comparison.
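The grouping step is only described in words, so the following Python sketch is one possible reading of it, not necessarily the exact procedure used for the tables. It assumes that both distributions are normalized to the case where A wins, and that each tennis score is matched by cutting the cumulative badminton distribution at the point where it is closest to the cumulative tennis distribution. All function names are made up for this example.

```python
from itertools import accumulate
from math import comb


def race(p, target, max_trail):
    """P(winning target:k) for k = 0..max_trail, before any deuce."""
    return [comb(target - 1 + k, k) * p**target * (1 - p) ** k for k in range(max_trail + 1)]


def badminton_dist(p):
    """Index = loser's points (0..29) in a game won by the player with point probability p."""
    q = 1 - p
    deuce = comb(40, 20) * (p * q) ** 20
    extra = [deuce * (2 * p * q) ** i * p**2 for i in range(9)] + [deuce * (2 * p * q) ** 9 * p]
    return race(p, 21, 19) + extra


def tennis_set_dist(p):
    """Set won 6:0 ... 6:4, 7:5, 7:6 (advantage games, tie-break at 6:6)."""
    q = 1 - p

    def deuce_race(target):
        n = target - 1
        return sum(race(p, target, n - 1)) + comb(2 * n, n) * (p * q) ** n * p**2 / (1 - 2 * p * q)

    g, t = deuce_race(4), deuce_race(7)
    h = 1 - g
    five_all = comb(10, 5) * (g * h) ** 5
    return race(g, 6, 4) + [five_all * g * g, five_all * 2 * g * h * t]


def group_points(p):
    """Badminton loser-point ranges matched to the seven tennis set scores."""
    def cumulative(d):
        return [c / sum(d) for c in accumulate(d)]

    bad, ten = cumulative(badminton_dist(p)), cumulative(tennis_set_dist(p))
    groups, start = [], 0
    for i, target in enumerate(ten[:-1]):
        # Leave at least one badminton score for each remaining tennis score.
        end = min(range(start, 24 + i), key=lambda j: abs(bad[j] - target))
        groups.append((start, end))
        start = end + 1
    groups.append((start, 29))
    return groups


labels = ["6 : 0", "6 : 1", "6 : 2", "6 : 3", "6 : 4", "7 : 5", "7 : 6"]
for label, (lo, hi) in zip(labels, group_points(0.5)):
    print(f"{label}        {lo}-{hi}")
```

For p_point = 0.5 this reading gives the same ranges as the first table above. It has not been checked against the p_point = 0.7 table, so treat it as a starting point rather than a reproduction.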

I hope this has been an interesting analysis for everyone to read. Thanks.

- hhwoot

#### Attached Files:

Eight plot images were attached (the point and game breakdown plots referenced in the post). Only one file name survives: tennis_050.png.
#1
Last edited: Aug 13, 2010

Wow, really intensive and impressive work done. Thumbs up for this, I'm sure this will satisfy a lot of curious readers. Job well done!

#2
3. ### hhwoot Regular Member

[SIZE=+2]Supplemental - Professional Results[/SIZE]

I have compiled some results of professional players from http://www.tournamentsoftware.com to show how my analysis performs.

I've selected Lin Dan, Lee Chong Wei, and Peter Gade as the players and have compiled the results from their 8 most recent major tournaments. I have also chosen to apply the analysis to each of their head to head matches (for rally scoring only, of course).

It should not be too surprising to see that the analysis does not match up perfectly for each tournament. Since each player only plays about 10 games total in each tournament, a one game difference translates into a ~0.1 difference in p_game.

The results are much better for the TOTAL over 8 tournaments. We see that Lin Dan performs a bit better than his point percentage predicts, Lee Chong Wei performs a bit worse, and Peter Gade performs as predicted. However, it should be noted that the deviation is only about two games out of ~75 games.
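The "about two games" figure can be checked with a few lines of Python, using the expected p_game and the games won/played copied from the TOTAL rows of the tables in this post. The helper name `games_above_expectation` is just for this illustration.

```python
# (expected p_game, games won, games played), copied from the TOTAL rows of the tables
totals = {
    "Lin Dan": (0.8488, 66, 75),
    "Lee Chong Wei": (0.8497, 62, 76),
    "Peter Gade": (0.7205, 58, 80),
}


def games_above_expectation(expected_p_game: float, won: int, played: int) -> float:
    """Actual games won minus the number of wins the point percentage predicts."""
    return won - expected_p_game * played


for name, (expected, won, played) in totals.items():
    diff = games_above_expectation(expected, won, played)
    print(f"{name:14s} {diff:+.1f} games over {played}")
```

This prints +2.3 for Lin Dan, -2.6 for Lee Chong Wei, and +0.4 for Peter Gade.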

The head to head results are much more interesting, since they do away with the variable of playing against other players. We can see that Lin Dan holds a bit of an advantage over both Lee Chong Wei and Peter Gade, while Lee Chong Wei has a more significant advantage over Peter Gade.

It may be interesting to do the same analysis with top tennis players. I have not found (and have not really looked for) a site that provides an easy to read point/game/set breakdown for tennis tournaments. But I'm sure there's one out there.

Code:
```Lin Dan        Points Points Actual        Expected      Actual
Tournament     Won    Played p_point       p_game        p_game
2010 TC        214    342    0.6257        0.9539        1.0000 (10/10)
2010 SO        117    206    0.5680        0.8163        0.6667 (4/6)
2010 AE        138    261    0.5287        0.6477        0.7143 (5/7)
2009 CO        210    337    0.6231        0.9503        1.0000 (10/10)
2009 FO        212    354    0.5989        0.9061        1.0000 (10/10)
2009 CM        241    440    0.5477        0.7359        0.8333 (10/12)
2009 WC        266    456    0.5833        0.8657        0.9231 (12/13)
2009 IO        127    244    0.5205        0.6067        0.7143 (5/7)
TOTAL         1525   2640    0.5777        0.8488        0.8800 (66/75)
```
Code:
```Lee Chong Wei  Points Points Actual        Expected      Actual
Tournament     Won    Played p_point       p_game        p_game
2010 IO        218    359    0.6072        0.9237        0.9091 (10/11)
2010 SO        173    317    0.5457        0.7272        0.5556 (5/9)
2010 TC        151    252    0.5992        0.9068        0.7500 (6/8)
2010 AE        224    401    0.5586        0.7811        0.9091 (10/11)
2010 MO        248    421    0.5891        0.8819        0.8333 (10/12)
2010 KO        210    325    0.6462        0.9755        1.0000 (10/10)
2009 CO         44     99    0.4444        0.2309        0.3333 (1/3)
2009 HKO       236    428    0.5514        0.7518        0.8333 (10/12)
TOTAL         1504   2602    0.5780        0.8497        0.8158 (62/76)
```
Code:
```Peter Gade     Points Points Actual        Expected      Actual
Tournament     Won    Played p_point       p_game        p_game
2010 SO        198    348    0.5690        0.8198        0.7000 (7/10)
2010 TC        118    197    0.5990        0.9063        0.6667 (4/6)
2010 SO        163    295    0.5525        0.7564        0.7500 (6/8)
2010 AE        182    360    0.5056        0.5295        0.6667 (6/9)
2010 MO        133    243    0.5473        0.7342        0.7143 (5/7)
2010 KO        222    449    0.4944        0.4705        0.6667 (8/12)
2009 HKO       239    419    0.5704        0.8247        0.7500 (9/12)
2009 FO        174    315    0.5524        0.7559        0.6667 (6/9)
TOTAL         1429   2626    0.5442        0.7205        0.7250 (58/80)
```
Code:
```Head to Head   Points Points Actual        Expected      Actual
Matchup        Won    Played p_point       p_game        p_game
LD - LCW       658   1253    0.5251        0.6299        0.6000 (21/35)
LD - PG        260    489    0.5317        0.6623        0.6923 (9/13)
LCW - PG       388    702    0.5527        0.7572        0.7895 (15/19)
```
- hhwoot

#3
4. ### Gollum Regular Member

Fascinating. Although I don't have time to do more than scan this, I'll be interested to look at it more closely later on.

#4
5. ### Andy05 Regular Member

I am impressed at the accuracy over the players' averages.
Very interesting, well done.

#5
6. ### stumblingfeet Regular Member

Interesting stuff. I wonder if it would be possible to extend this by classifying different levels of players and looking at which point there are significant shifts in winning percentage relative to point winning percentage. I can see top players having this distribution of point percentage won vs player level where it's closest to 50% at the highest level, goes up as the player level decreases, then dips a bit as effort reduces against "easy" opponents that are reasonably skilled, then goes up rapidly as the level difference is too large.

#6
7. ### Dedes New Member

Hi,

Isn't there a problem with this? I mean, there is a probability for player A to win a point (PtA), and it was assumed that the probability for B to win a point is 1 - PtA. But in fact there should be 4 percentages:
When A is serving, A has a probability to win the point as server and B has one as receiver. And when B is serving, B has a probability to win the point as server and A has one as receiver.
Don't you agree?

#7
8. ### SolsticeOfLight Regular Member

You can make it as complicated as you want. The original post takes this complexity out of the game in order to make the maths easier. If you want to know who's serving, you have to know who won the previous point, and then your maths is going to get exhaustingly complicated.

#8