[SIZE=+2]Badminton Win Probability - From Points to Games[/SIZE]

This is a statistical analysis of how the probability of winning a single point can be used to calculate the probability of winning a game. For this analysis, I assume that each point/game is independent of any other. This means that no psychological effects are considered (the chance to win a point with a 10 point lead is the same as with a 10 point deficit).

If we take the probability to win a single point as a measure of the skill difference between players, then this analysis will show both the expected result of a match and the expected points breakdown in a game.

[SIZE=+2]Badminton Scoring[/SIZE]

The rally scoring system will be used, in which:

A point is scored on each serve.

A game is won by the first player to:
[LIST]
[*]reach 21 points with at least a two point lead over the opponent
[*]after 20:20, score a two point lead over the opponent
[*]after 20:20, reach 30 points
[/LIST]

A match is won by the first player to win two games.

[SIZE=+2]Introduction[/SIZE]

Consider a match between players A and B. Let:
p_point = probability of player A winning any one point.
p_game = probability of player A winning any one game.
p_match = probability of player A winning the match.

Due to the simple combinatorics of winning two games out of three, it's easy to see that:

Code:
p_match = p_game * p_game + 2 * p_game * p_game * (1 - p_game)

In words, this means that there is just one way for A to win two games in a row (Win-Win-END), and two ways for A to win two games and lose one game (Win-Lose-Win-END, or Lose-Win-Win-END).

To introduce a bit more notation, p_match can be rewritten as:

Code:
p_match = (2,0) * p_game^2 + (2,1) * p_game^2 * (1 - p_game)

Here the caret, "^", signifies exponentiation (5^3 = 5*5*5 = "five to the power of 3"), while the notation (n,k) signifies the binomial coefficient, the number of ways k items can be chosen from a set of n items.
For (2,1) in this case, we are choosing where the one lost game can occur in the set of two games (first or second game). There is only one lost game to consider since we are calculating for when player A wins. Also, we only consider putting it in either the first or second game since A would not have a chance to lose the third game if he won both earlier games.

[SIZE=+2]Analysis[/SIZE]

With the introduction out of the way, the true objective is to calculate how p_point is related to p_game. This is done in essentially the same way as above, where p_match was calculated from p_game.

Up until 20:20, p_game is easy to think about. Player A can win anywhere from 21:0 to 21:19. This means that for each particular score, A scores 21 points and B scores i points, where i is between 0 and 19. Also, for A to win, A will always score the last point. So the combinatorics is to choose where A's other 20 points can be placed among the first (20+i) points. The formula in pseudo-code is:

Code:
p_game = 0
for i = 0:19
    p_game = p_game + (20+i, 20) * p_point^21 * (1 - p_point)^i
end

20:20 occurs when both A and B score 20 points each, which of course means that we are choosing 20 out of 40. The probability to reach 20:20 is:

Code:
(40, 20) * p_point^20 * (1 - p_point)^20

From here A needs a two point advantage in order to win the game (until 29:29). This means an extra factor of p_point^2. Also, going up the score charts, there needs to be a factor of 2 * p_point * (1 - p_point) every time deuce is reached again. To explain in more detail, going from 20:20 to 21:21 requires one point for A and one point for B, and this can be done in two ways (A-B or B-A).
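As a sanity check on the two pieces so far (the sum for wins up to 21:19, and the probability of reaching 20:20), here is a minimal Python sketch. The function names win_before_deuce and reach_deuce are made up for illustration, and math.comb(n, k) plays the role of the (n,k) notation. Every game either ends before 20:20 or reaches it, so A's early wins, B's early wins, and the 20:20 probability should add up to 1.

```python
from math import comb


def win_before_deuce(p_point):
    """Probability that A wins 21:i for some i in 0..19 (20:20 is never reached)."""
    q = 1.0 - p_point
    # A scores the last point; A's other 20 points sit among the first 20+i points.
    return sum(comb(20 + i, 20) * p_point**21 * q**i for i in range(20))


def reach_deuce(p_point):
    """Probability that the score reaches 20:20."""
    q = 1.0 - p_point
    return comb(40, 20) * p_point**20 * q**20


if __name__ == "__main__":
    p = 0.55
    a_wins = win_before_deuce(p)
    b_wins = win_before_deuce(1.0 - p)  # the same sum from B's point of view
    deuce = reach_deuce(p)
    # The three outcomes cover every game, so the last number should be 1.
    print(a_wins, b_wins, deuce, a_wins + b_wins + deuce)
```

If the total drifts away from 1, one of the binomial coefficients or exponents has been mistyped.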
The formula for 22+ point victories is:

Code:
for i = 0:8
    p_game = p_game + (40, 20) * p_point^22 * (1 - p_point)^20 * (2 * p_point * (1 - p_point))^i
end

Reaching 29:29, A just needs one more point to win. The probability to win 30:29 is simply:

Code:
(40, 20) * p_point^21 * (1 - p_point)^20 * (2 * p_point * (1 - p_point))^9

[SIZE=+2]Results[/SIZE]

After putting everything together, let's plug in some numbers and get some results.

Code:
p_point  p_game  p_match
0.5000   0.5000  0.5000
0.5500   0.7458  0.8390
0.6000   0.9086  0.9765
0.7000   0.9970  1.0000

It's quite clear that even a small change in p_point leads to big changes in p_game. Winning 55% of rallies leads to winning about 75% of games. Increasing that to 60% of rallies gives about a 91% win probability for games.

The probability for how points break down for each game is plotted at the end of the post. The plots show results for the four p_points above for both A winning and B winning. The probability dip beyond 20 points is expected since that can only be achieved with a score of 20:20 at some point and can only be continued with deuces. The x-axis label kept getting cut off for some reason, but the last word is just "game".

In the next section, I will use the same method for tennis.

[SIZE=+2]Tennis Win Probability - From Points to Games to Sets[/SIZE]

I had worked out the above analysis a while ago, but had been too lazy to write it all up. GameGod's post, http://www.badmintoncentral.com/forums/showthread.php?88196, was what finally motivated me to do the analysis for tennis and post both results.

I will again be assuming a single probability for winning a point. This is much more questionable in tennis since the serve is such an advantage. Since this tennis analysis is just to answer GameGod's question, it should suffice. But if there is interest, I may redo the analysis to account for service advantage.

[SIZE=+2]Tennis Scoring[/SIZE]

There are more variations in tennis scoring than badminton.
I will use the following:

A point is scored on each serve.

A game is won by the first player to:
[LIST]
[*]reach 4 points with at least a two point lead over the opponent
[*]after 3:3, score a two point lead over the opponent with no limit
[/LIST]

A set is won by the first player to:
[LIST]
[*]reach 6 games with at least a two game lead over the opponent
[*]after 5:5, win two games in a row
[*]after 6:6, win seven points with a two point lead over the opponent with no limit
[/LIST]

A match is won by the first player to win two/three sets.

[SIZE=+2]Analysis[/SIZE]

I will not be posting the tennis analysis; this is a badminton forum after all. The basic idea is the same, but there are more tricky parts. If there is interest to see the full analysis, I will consider posting it.

Consider a match between players A and B. Let:
p_point = probability of player A winning any one point.
p_game = probability of player A winning any one game.
p_set = probability of player A winning any one set.
p_match = probability of player A winning the match.

The main difference from badminton is the addition of a "set"; this really just adds another level of analysis and is not that much of a problem.

The main variations in tennis rules are due to tie-breaking. There are differences in this aspect even among major tournaments.

Tie-breaking to win a game follows the deuce/advantage system, in which a two point lead is needed and there is no limit. However, there is also "no-advantage" scoring, where a game is won by the first player to win four points, full stop.

Tie-breaking to win a set is usually done with a "seven point tie-break", where after reaching 6:6, a final game is played to seven points with at least a two point lead and no limit. However, some tournaments still play with an "advantage set", where the set continues until a player has a two game lead with no limit. This was shown recently with the 2010 Wimbledon first-round match between John Isner and Nicolas Mahut.
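For the deuce/advantage game described above, the same counting used for badminton carries over. The Python sketch below is my own reconstruction rather than the full tennis analysis, and the function name tennis_game_win_prob is made up. It assumes a single p_point for every point, and because deuce has no cap it sums the repeated deuces as a geometric series instead of a loop.

```python
from math import comb


def tennis_game_win_prob(p_point):
    """Probability that A wins a deuce/advantage game with a fixed p_point."""
    p, q = p_point, 1.0 - p_point
    # Wins to 0, 15 and 30: A takes the last point, and A's other 3 points
    # sit among the first 3+i points.
    before_deuce = sum(comb(3 + i, 3) * p**4 * q**i for i in range(3))
    # 3:3 is reached by choosing A's 3 points out of the first 6.
    reach_deuce = comb(6, 3) * p**3 * q**3
    # From deuce, A wins two points in a row (p^2), or the players split the
    # next two points (2pq) and are back at deuce: a geometric series.
    win_from_deuce = p**2 / (1.0 - 2.0 * p * q)
    return before_deuce + reach_deuce * win_from_deuce


if __name__ == "__main__":
    for p in (0.50, 0.55, 0.60, 0.70):
        print(f"{p:.4f}  {tennis_game_win_prob(p):.4f}")
```

The set level (including the seven point tie-break) is not covered here; it needs one more layer of the same bookkeeping.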
To win a match, men's matches at the major tournaments are best of 5 sets, while women's matches are best of 3 sets. For best of three sets, the method to get p_match from p_set is the same as in badminton (where p_game is used to get p_match). Best of five sets is left as an exercise for the reader.

[SIZE=+2]Results[/SIZE]

Code:
p_point  p_game  p_set
0.5000   0.5000  0.5000
0.5500   0.6231  0.8150
0.6000   0.7357  0.9634
0.7000   0.9008  0.9998

It's again clear that a small change in p_point leads to big changes in p_game and p_set. It's worth mentioning that for the same p_point, the p_game of badminton and the p_set of this tennis scoring variant come out similar. This will allow us to compare the point breakdown in a game of badminton and the game breakdown in a set of tennis to answer GameGod's question.

The probability for how games break down for each set is plotted at the end of the post. The plots show results for the four p_points above for both A winning and B winning. While the distribution moves left just like the badminton point breakdown distribution, the tennis distribution moves much faster. The x-axis label again keeps getting cut off for some reason; the last word is just "set".

[SIZE=+2]GameGod's Question[/SIZE]

GameGod's question from http://www.badmintoncentral.com/forums/showthread.php?88196 essentially asks how to compare scores between badminton and tennis. My answer is to compare the point distributions from badminton to the game distributions from tennis.

This is done by taking groups of points in badminton, summing their individual probabilities, and matching the sum to the probability for each game score in tennis. This comparison needs to be done when the distributions are calculated from the same p_point. This is justified since the p_game from badminton roughly matches the p_set from tennis for the same p_point.
For p_point = 0.5

Code:
tennis  badminton
6 : 0   0-8
6 : 1   9-11
6 : 2   12-14
6 : 3   15-16
6 : 4   17-18
7 : 5   19
7 : 6   20-29

For p_point = 0.7

Code:
tennis  badminton
6 : 0   0-8
6 : 1   9-12
6 : 2   13-15
6 : 3   16-18
6 : 4   19-20
7 : 5   21-22
7 : 6   23-29

Due to the integer nature of points and games, the sums do not match up exactly. However, I think it is sufficient given the coarse nature of the tennis analysis.

If a comparison between a match of badminton and a match of tennis was desired, it would probably require a tedious cross multiplication between the distributions before the summation comparison.

I hope this has been an interesting analysis for everyone to read. Thanks.

- hhwoot
Wow, really intensive and impressive work done. Thumbs up for this, I'm sure this will satisfy a lot of curious readers. Job well done!
[SIZE=+2]Supplemental - Professional Results[/SIZE]

I have compiled some results of professional players from http://www.tournamentsoftware.com to show how my analysis performs. I've selected Lin Dan, Lee Chong Wei, and Peter Gade as the players and have compiled the results from their 8 most recent major tournaments. I have also chosen to apply the analysis to each of their head to head matches (for only the rally scoring, of course).

It should not be too surprising to see that the analysis does not match up perfectly for each tournament. Since each player only plays about 10 games total in each tournament, one game difference translates into ~0.1 difference in p_game.

The results are much better for the TOTAL over 8 tournaments. We see that Lin Dan performs a bit better than his point percentage predicts, Lee Chong Wei performs a bit worse, and Peter Gade performs as predicted. However, it should be noted that the deviation is only about two games out of ~75 games.

The head to head results are much more interesting, since they do away with the variable of playing against other players. We can see that Lin Dan holds a bit of an advantage over both Lee Chong Wei and Peter Gade, while Lee Chong Wei has a more significant advantage over Peter Gade.

It may be interesting to do the same analysis with top tennis players. I have not found (have not looked for) a site that provides an easy to read point/game/set breakdown for tennis tournaments. But I'm sure there's one out there.
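For anyone who wants to try this with other players, here is a minimal Python sketch of how an "Expected p_game" can be computed from points won and points played. It simply strings together the formulas from the first post. The names game_win_prob and expected_game_rate are made up for illustration, and math.comb(n, k) stands in for the (n,k) notation.

```python
from math import comb


def game_win_prob(p_point):
    """p_game for rally scoring to 21, with deuce from 20:20 and a cap at 30."""
    p, q = p_point, 1.0 - p_point
    # Wins from 21:0 to 21:19.
    total = sum(comb(20 + i, 20) * p**21 * q**i for i in range(20))
    # Probability of reaching 20:20.
    deuce = comb(40, 20) * p**20 * q**20
    # Wins from 22:20 up to 30:28, after i further deuces.
    total += sum(deuce * p**2 * (2 * p * q)**i for i in range(9))
    # Win 30:29 after reaching 29:29.
    total += deuce * p * (2 * p * q)**9
    return total


def expected_game_rate(points_won, points_played):
    """Actual p_point from the tallies, then the expected p_game."""
    return game_win_prob(points_won / points_played)


if __name__ == "__main__":
    # Lin Dan's TOTAL row: 1525 points won out of 2640 played.
    print(round(expected_game_rate(1525, 2640), 4))
```

The printed value should land close to the Expected p_game in the corresponding TOTAL row. Note that points_played must be non-zero.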
Code:
Lin Dan
            Points  Points  Actual   Expected  Actual
Tournament  Won     Played  p_point  p_game    p_game
2010 TC     214     342     0.6257   0.9539    1.0000 (10/10)
2010 SO     117     206     0.5680   0.8163    0.6667 (4/6)
2010 AE     138     261     0.5287   0.6477    0.7143 (5/7)
2009 CO     210     337     0.6231   0.9503    1.0000 (10/10)
2009 FO     212     354     0.5989   0.9061    1.0000 (10/10)
2009 CM     241     440     0.5477   0.7359    0.8333 (10/12)
2009 WC     266     456     0.5833   0.8657    0.9231 (12/13)
2009 IO     127     244     0.5205   0.6067    0.7143 (5/7)
TOTAL       1525    2640    0.5777   0.8488    0.8800 (66/75)

Code:
Lee Chong Wei
            Points  Points  Actual   Expected  Actual
Tournament  Won     Played  p_point  p_game    p_game
2010 IO     218     359     0.6072   0.9237    0.9091 (10/11)
2010 SO     173     317     0.5457   0.7272    0.5556 (5/9)
2010 TC     151     252     0.5992   0.9068    0.7500 (6/8)
2010 AE     224     401     0.5586   0.7811    0.9091 (10/11)
2010 MO     248     421     0.5891   0.8819    0.8333 (10/12)
2010 KO     210     325     0.6462   0.9755    1.0000 (10/10)
2009 CO     44      99      0.4444   0.2309    0.3333 (1/3)
2009 HKO    236     428     0.5514   0.7518    0.8333 (10/12)
TOTAL       1504    2602    0.5780   0.8497    0.8158 (62/76)

Code:
Peter Gade
            Points  Points  Actual   Expected  Actual
Tournament  Won     Played  p_point  p_game    p_game
2010 SO     198     348     0.5690   0.8198    0.7000 (7/10)
2010 TC     118     197     0.5990   0.9063    0.6667 (4/6)
2010 SO     163     295     0.5525   0.7564    0.7500 (6/8)
2010 AE     182     360     0.5056   0.5295    0.6667 (6/9)
2010 MO     133     243     0.5473   0.7342    0.7143 (5/7)
2010 KO     222     449     0.4944   0.4705    0.6667 (8/12)
2009 HKO    239     419     0.5704   0.8247    0.7500 (9/12)
2009 FO     174     315     0.5524   0.7559    0.6667 (6/9)
TOTAL       1429    2626    0.5442   0.7205    0.7250 (58/80)

Code:
Head to Head
            Points  Points  Actual   Expected  Actual
            Won     Played  p_point  p_game    p_game
LD - LCW    658     1253    0.5251   0.6299    0.6000 (21/35)
LD - PG     260     489     0.5317   0.6623    0.6923 (9/13)
LCW - PG    388     702     0.5527   0.7572    0.7895 (15/19)

- hhwoot
Fascinating. Although I don't have time to do more than scan this, I'll be interested to look at it more closely later on.
Interesting stuff. I wonder if it would be possible to extend this by classifying different levels of players and looking at which point there are significant shifts in winning percentage relative to point winning percentage. I can see top players having this distribution of point percentage won vs player level where it's closest to 50% at the highest level, goes up as the player level decreases, then dips a bit as effort reduces against "easy" opponents that are reasonably skilled, then goes up rapidly as the level difference is too large.
Hi, isn't there a problem with this? I mean, there is a probability for player A to win a point (PtA), and it was assumed that the probability for B to win a point is 1 - PtA. But in fact there should be 4 percentages: when A is serving, A has a probability to win the point as the server and B has a probability to win it as the receiver. And then when B is serving, there is a probability for him to win the point on his serve and for A to win it as the receiver. Don't you agree?
You can make it as complicated as you want. The original post takes this complexity out of the game in order to make the maths easier. If you want to know who's serving, you have to know who won the previous point, and then your maths is going to get exhaustingly complicated.
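That said, the bookkeeping is only exhausting by hand. A short recursion over the score and the current server can do it instead. The Python sketch below is one possible way, not something from the original post. The name rally_game_win_prob and the parameters p_serve (chance that A wins a rally on A's own serve) and p_receive (chance that A wins a rally on B's serve) are made up for illustration, and it assumes the winner of a rally serves the next one.

```python
from functools import lru_cache


def rally_game_win_prob(p_serve, p_receive, serves_first=True):
    """Probability that A wins a game to 21 (deuce from 20:20, cap at 30).

    p_serve:      probability that A wins a rally when A serves
    p_receive:    probability that A wins a rally when B serves
    serves_first: True if A serves the first rally of the game
    """

    @lru_cache(maxsize=None)
    def win(a, b, a_serving):
        # The game is over at 21+ with a two point lead, or at 30 points.
        if (a >= 21 or b >= 21) and (abs(a - b) >= 2 or max(a, b) == 30):
            return 1.0 if a > b else 0.0
        p = p_serve if a_serving else p_receive
        # Whoever wins the rally scores the point and serves next.
        return p * win(a + 1, b, True) + (1.0 - p) * win(a, b + 1, False)

    return win(0, 0, serves_first)


if __name__ == "__main__":
    # With p_serve == p_receive this reduces to the single p_point model.
    print(rally_game_win_prob(0.55, 0.55))
    # A hypothetical player who is stronger on serve than on return.
    print(rally_game_win_prob(0.60, 0.50))
```

Setting both parameters to the same value gives back the p_game of the original analysis, which is a handy check before trying unequal values.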