We asked Christopher Phillips and Tim Chartier how they would describe the intersection between statistics and sports. How does one inform the other?
Christopher Phillips, author of Scouting and Scoring: How We Know What We Know about Baseball
Sports have undoubtedly become one of the most visible and important sites for the rise of data analytics and statistics. In some respects, sports seem to be an easy, even inevitable, place to apply new statistical tools: most sports produce a lot of data across teams and seasons; games have fixed rules and clear measures of success (e.g., wins or points); and players and teams have incentives to adjust in order to gain a competitive edge.
But as I discuss in my new book Scouting and Scoring: How We Know What We Know about Baseball, it is also easy to fall prey to myths about the use of statistics in sports. Though these myths apply across many sports, it is easiest to home in on baseball, as it has been one of the most consequential areas for statistics.
Perhaps the most persistent and pernicious myth is that data emerge naturally from sporting events. There is no doubt that new video-, Doppler-, and radar-based technologies, especially when combined with increasingly cheap computing power and storage, have dramatically expanded the amount of data that can be collected. But it takes a huge amount of labor to create, collect, clean, and curate data, even before anyone tries to analyze them. Moreover, some data, like errors in baseball, are inescapably the product of individual judgment, which must be standardized and monitored.
The second myth is that sports statistics emerged only recently, particularly after the rise of the electronic computer. In fact, statistical analysis in sports goes back more than a century: in baseball, playing statistics have been used to evaluate players for year-end awards and to negotiate contracts for as long as professional baseball has existed. (And statistics were collected and published for cricket decades before baseball’s rules were formalized.) As new methods of statistical analysis emerged in the early twentieth century in fields like psychology and physiology, some observers immediately tried to apply them to sports. In the 1910 book Touching Second, the authors promoted the use of data for shifting fielders and for scouting prospects, two of the most important uses of statistical data in the modern era as well. There’s certainly been a flurry of new statistics over the last twenty years, but the general idea isn’t new. Consider that Allen Guttmann’s half-century-old book From Ritual to Record highlights the “numeration of achievement” and the “quantification of the aesthetic” as defining features of modern sport.
Finally, it’s a myth that there is a fundamental divide between those who look at performance statistics (i.e., scorers) and those who evaluate bodies (i.e., scouts). The usual gloss is that scouts are holistic, subjective judges of quality, whereas scorers are precise, objective measurers. In reality, baseball scouts have long used methods of quantification, whether for pricing amateur prospects, grading skills, or creating composite metrics like the Overall Future Potential, which reduces a player to a single number. There’s a fairly good case to be made that scouts and other evaluators of talent are even more audacious quantifiers than scorers, since the latter mainly analyze things that can be easily counted.
Tim Chartier, author of Math Bytes: Google Bombs, Chocolate-Covered Pi, and Other Cool Bits in Computing
Data surrounds us. The rate at which it is produced can make us feel like specks in a cavernous expanse of digital information. Each day, 3 billion photos and videos are shared on Snapchat. Every minute, 300 hours of video are uploaded to YouTube. Data offers new possibilities for insight. Sports is an area where data has both a traditional role and newfound possibilities, due in part to ever-larger datasets.
For years, baseball has had a number of constants, including the ball, bat, and bases, along with statistics like balls, strikes, hits, and outs. Statistics are, and always have been, part of the game. The 1920 box score shows, for instance, that Babe Ruth got 2 hits in 4 at-bats in his first game as a Yankee. While new metrics have emerged with analytical advances, the game has been well studied for a long time. As Ford C. Frick stated in Games, Asterisks and People, “Baseball is probably the world’s best documented sport.”
Yet an abundance of data does not necessarily lead people to trust the recommendations of those who study it. For example, manager Bobby Bragan stated, “Say you were standing with one foot in the oven and one foot in an ice bucket. According to the percentage people, you should be perfectly comfortable.” This underscores an important aspect of data and analytics: data can lead to insight, but it becomes actionable only when people trust that it accurately reflects the world.
Other sports, while not as statistically robust as baseball, also have an influx of data. In basketball, cameras positioned in the rafters record the (x, y) position of every player on the court and the (x, y, z) position of the ball many times per second throughout the entire game. As a result, aspects of games can be replayed from this data for years to come, and such detailed data yields new information. For example, we know that Steph Curry, while averaging just over 34 minutes a game, runs just over 2.6 miles per game on average. He also runs almost a quarter of a mile more on offense than on defense.
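Figures like Curry’s mileage come from adding up how far a player moves between successive tracked positions. The Python below is a minimal sketch of that idea, assuming positions in feet and a per-sample label saying whether the player’s team is on offense or defense. The function names, the sample track, and the labeling scheme are hypothetical illustrations, not the method used by any particular tracking provider.

```python
import math
from typing import Dict, Sequence, Tuple

FEET_PER_MILE = 5280.0

Point = Tuple[float, float]  # (x, y) court position in feet (assumed unit)


def distance_traveled(positions: Sequence[Point]) -> float:
    """Sum straight-line distances between consecutive (x, y) samples, in feet."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        total += math.hypot(x1 - x0, y1 - y0)
    return total


def distance_by_phase(positions: Sequence[Point], phases: Sequence[str]) -> Dict[str, float]:
    """Credit each step's distance to the phase label (e.g. 'offense') of its starting sample.

    Hypothetical scheme: phases[i] labels positions[i]; both sequences have equal length.
    """
    totals: Dict[str, float] = {}
    for i in range(len(positions) - 1):
        (x0, y0), (x1, y1) = positions[i], positions[i + 1]
        totals[phases[i]] = totals.get(phases[i], 0.0) + math.hypot(x1 - x0, y1 - y0)
    return totals


def feet_to_miles(feet: float) -> float:
    return feet / FEET_PER_MILE


if __name__ == "__main__":
    # Hypothetical track: 3 ft across the court on offense, then 4 ft on defense.
    track = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
    print(distance_traveled(track))  # 7.0
    print(distance_by_phase(track, ["offense", "defense", "defense"]))
```

Summing raw samples this way would tend to overstate distance when the positions are noisy, so a real pipeline would likely smooth the track first; that step is omitted here for brevity.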
While such data can be stunning in its size and detail, it also comes with challenges. How do you distinguish a pick-and-roll from an isolation play using nothing more than dots moving in a plane? Further, basketball, like football but unlike baseball, generally involves multiple players acting at the same time. How much credit should each player get for a basket? A player’s position may open up scoring possibilities even if that player never touches the ball. As such, metrics have been, and continue to be, created to better understand the game.
Sports are played with a combination of analytics, gut, and experience. The right mix depends on the sport, player, coach, and context. Nonetheless, data is here to stay and will continue to offer insight into the game.