Predicting service games
Hi! You probably know me by a variety of names, but whether you do or you don’t, I’m “Arjun” (at least in this form), and this is a place for me to throw various rantings, research pieces, and other miscellaneous items. If you aren’t interested in analyzing things, well, I’ll probably have ranty pieces as well.
I’m a huge sports fan, so a lot of this will be about sports and baseball especially (because baseball is quantifiable and very prone to statistical thinking). I’ve published lots of stuff in varied places all over the internet, but at least here I can put what I want. There really won’t be a rhyme or reason to what’s exactly on here, but that’s on purpose; that’s how I think. Hey, for all I know, there might just be short stories or something here too!
Please comment! I have the little hit-tracker on the side to satisfy my curiosity/overwhelming need for ego-stroking, but if you comment, that makes me feel even better! And that’s what you want…right?
I have a list of sites on the sidebar; mostly baseball stats websites for now, but I’ll probably add more later. Give them a check if you’re interested based on the description: I promise only to link to the very best.
I’ll have a “jump” in all of my articles, just because I think it’s really cool-looking.
Without further rambling, the article! Tennis! Yay!
Anyway, I’ve been watching a fair bit of the Australian Open (by “a fair bit”, I mean “far far too much”) and generally am one of the sorts to enjoy tennis. By implication, this means that I’m also the type to be curious as to what exactly makes tennis players click – how do they do what they do? Specifically, in a recent match between Roger Federer and Nikolay Davydenko (the numbers one and six in the world, respectively; no idea how they met in the quarterfinals, so if someone understands the Australian Open seeding, please tell me), it seemed that the primary difference between the players was that whenever Federer got into trouble in his service games, he had an ace (or two, or three, or even just a big serve) to throw out to even up the game, while Davydenko had an unfortunate habit of double-faulting in those situations.
Naturally, a lot of this is just from my eyes (which are, of course, simultaneously the most reliable and unreliable sources of information in the world), but this got me thinking: how well do aces and double faults predict a player holding serve? Naturally, players who have better serves will have more aces and will hold their serve more often (here in America, the player who is immediately associated with this kind of thinking is Andy Roddick), but to what extent is that true? I wanted to investigate further and, since “having a life” is beyond me, I decided to do just that.
So I used a sample of sixty-three male players, looking at their results over the last year. There are several problems with this dataset. First, I used the totally arbitrary cut-off of fifteen games played, and I have no idea if this cut-off is too low or too high. Second, I used the data from the whole year, so I’m combining surfaces – never a good idea! Third, I’m not accounting for player ability: Federer and Rafael Nadal are weighted the same as Lukas Lacko. Oh well, at least it’s a fun exercise.
All statistics courtesy tennisinsight.com.
Actual data after the “jump”.
Okay, here we are. Glad you made the horrific crossing over the chasm.
So, my first idea was to see whether the ratio of aces to double faults fit well with the data for holding serve. That’s the most intuitive logic; using a baseball analogy, a pitcher is good if they get strikeouts and limit walks. The aces are free points (strikeouts) and the double faults are free points to the opponent (walks). I think the analogy works pretty well.
The data, on the other hand, did not.
The r-squared value for the regression was 38.7%, meaning that, in the dataset, aces/double faults accounted for 38.7% of the variation in serve hold percentage. So I altered my idea and went with what seemed to be correct from the data – only looking at aces.
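For anyone who wants to try this without Minitab, here is a minimal sketch of how an r-squared value for a one-predictor regression can be computed by hand. The function name `r_squared` and the numbers fed into it are made up for illustration; they are not the tennisinsight.com data and will not reproduce the 38.7% figure.

```python
def r_squared(xs, ys):
    """R-squared of a one-predictor least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Share of the variation in ys that the fitted line accounts for.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical values only: aces/double faults ratio vs. service hold percentage.
ratio = [1.5, 2.0, 2.5, 3.0, 4.0]
hold_pct = [74.0, 79.0, 76.0, 83.0, 82.0]
print(round(r_squared(ratio, hold_pct), 3))
```

A value near 1 would mean the ratio explains almost all of the variation in hold percentage; a value near 0 would mean it explains almost none.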
The r-squared value for this regression was 45.0%, a noticeable improvement. I was relatively confident that at least some of the remaining variance in the dataset could be explained by double faults, so using the magic of Minitab, I performed a multiple regression using both Aces / Game and Double Faults / Game as predictors for service hold percentage. The r-squared value of this regression was 50.5%, meaning that using only aces and double faults, one can account for over half the variation in service game win percentage. That’s pretty incredible. The regression equation for this new regression was:
Service Hold Percentage = 77.2 + 15.0 Aces / Game – 23.2 Double Faults / Game
I’m not sure if that means anything, but I might as well pretend that it does! There were several interesting points in the data:
Tomas Berdych (world #21), Janko Tipsarevic (world #36), and Stanislas Wawrinka (world #19) all had the exact same aces/game (0.53), double faults/game (0.20) and thus aces/double faults (2.65). Their service hold percentages were, respectively, 81.4%, 78.7%, and 82.2%. I’m somewhat confident that this variation can be traced back to their relative rankings (and, by extrapolation, their relative abilities).
Fabrice Santoro (world #67), Mischa Zverev (world #78), and Rik de Voest (world #239) all had the exact same aces/game (0.38), double faults/game (0.15) and aces/double faults (2.53). Their service hold percentages were, respectively, 75.2%, 75.5%, and 77.9%. That’s pretty close and seems to fall in line with their rankings.
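As a quick check, the two-predictor equation above can be plugged in for these two groups of players. This is only a sketch: the function name `predicted_hold_pct` is invented here, while the coefficients and the player figures are the ones quoted in the post. The predictions work out to roughly 80.5% for the first group and 79.4% for the second.

```python
def predicted_hold_pct(aces_per_game, double_faults_per_game):
    """Service hold percentage from the aces + double faults regression."""
    return 77.2 + 15.0 * aces_per_game - 23.2 * double_faults_per_game


# (aces/game, double faults/game, actual hold percentages), as quoted above.
groups = {
    "Berdych / Tipsarevic / Wawrinka": (0.53, 0.20, [81.4, 78.7, 82.2]),
    "Santoro / Zverev / de Voest": (0.38, 0.15, [75.2, 75.5, 77.9]),
}

for names, (aces, dfs, actual) in groups.items():
    predicted = predicted_hold_pct(aces, dfs)
    print(f"{names}: predicted {predicted:.1f}%, actual {actual}")
```

The first group straddles its prediction, while all three players in the lower-ranked second group fall below theirs, which at least does not contradict the idea that ranking explains some of the leftover variation.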
Another note is that the first graph maybe suggests that there’s an upper limit on this kind of regression; once one reaches a certain level, there are other factors (talent, primarily) which affect winning service games far more. Perhaps this exercise would be better done with junior-level tennis (if anyone has those statistics anywhere)?
I’d be interested to see if this data is similar for women, or if it changes depending on the tournament/era/whatnot. Nonetheless, it was a fun exercise!
UPDATE, January 30, 12:18 pm:
I added more variables and ran more regression analyses. I specifically wanted to look at the effect of the first serve, so for each player in the dataset I calculated the percentage of overall points that were won on the first serve and added that to the regression, along with the percentage of second serve points won. (Unsurprisingly, those two were inversely correlated: if two players have similar service hold percentages but very different percentages of points won on the first serve, then logically the one who wins fewer on the first serve has to make up the gap on the second serve.) The r-squared value of this regression was 60.0%. Here’s the equation:
Service Hold Percentage = 12.8 + 42.3 Point Percentage Won-1st Serve + 0.867 Second Serve Win Percentage + 6.33 Aces / Game + 3.9 Double Faults / Game
One point of interest is that if one removes the double faults / game from the regression, one gets an r-squared value of 59.9% and actually a higher adjusted r-squared value (57.9% to 57.3%). I’m more inclined towards simpler equations, so here’s the equation without double faults / game:
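The adjusted r-squared comparison can be reproduced from the standard formula, which penalizes each extra predictor. The sketch below assumes a sample of 64 players (the original sixty-three plus Gael Monfils, who was added in this update); that count is an assumption, and the r-squared inputs are the rounded figures quoted above, so the results are approximate. Under that assumption they come out to about 57.3% with double faults and 57.9% without.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Standard adjusted r-squared: penalizes r2 for each added predictor."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


n_players = 64  # assumption: 63 players in the original sample, plus Monfils

# Four predictors: first serve, second serve, aces/game, double faults/game.
with_double_faults = adjusted_r_squared(0.600, n_players, 4)
# Three predictors: the same model with double faults/game dropped.
without_double_faults = adjusted_r_squared(0.599, n_players, 3)

print(f"with double faults:    {with_double_faults:.1%}")
print(f"without double faults: {without_double_faults:.1%}")
```

Dropping a predictor that barely moves the plain r-squared raises the adjusted value, which is why the simpler model scores higher here.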
Service Hold Percentage = 14.6 + 39.1 Point Percentage Won-1st Serve + 0.866 Second Serve Win Percentage + 6.96 Aces / Game
Somewhat surprisingly (or maybe not), the coefficients essentially remain the same. Also, as a note, in the previous regression, one has a 1:2 relationship between the coefficients of aces/game and double faults/game. I wouldn’t read too much into it. My gut feeling is that double faults are rare enough in the men’s game that they simply don’t influence a regression as large as this one.
I also calculated the Pearson Correlation Coefficient for every variable tested against Service Hold Percentage and surprisingly, the closest to 1/-1 was the percentage of points overall that was won on the first serve with a value of 0.615. Not a great value, but something to consider.
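For reference, here is a minimal sketch of that kind of comparison: compute the Pearson correlation of each variable against service hold percentage and rank the variables by how close they get to 1 or -1. The function name `pearson_r`, the column names, and every number in the example are hypothetical; none of it is the actual dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values for five imaginary players.
hold_pct = [74.0, 79.0, 76.0, 83.0, 82.0]
variables = {
    "aces_per_game": [0.30, 0.45, 0.40, 0.70, 0.55],
    "double_faults_per_game": [0.25, 0.15, 0.22, 0.18, 0.12],
}

# Rank the variables by the absolute size of their correlation.
ranked = sorted(
    ((name, pearson_r(values, hold_pct)) for name, values in variables.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:.3f}")
```

Sorting on the absolute value matters because a strong negative correlation is just as informative as a strong positive one.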
As a note: when going over my spreadsheet, I realized that I’d left off World #12 Gael Monfils completely by mistake. I’ve since added him, but there are no super-significant changes to the regressions.
I may attempt to do this same regression with the women just to see the differences. My gut believes that first serve percentage, aces/game, and double faults/game will play a larger role; particularly aces and double faults, the former of which is rarer in the women’s game and the latter of which is more common.

