RESEARCH: Using Elo Ratings to rank MLB players

One of the elements of baseball that makes it unique compared to most major team sports is the batter-pitcher matchup. Because each matchup is independent of all other matchups, and because each plate appearance (PA) results in a discrete outcome, baseball results are particularly amenable to statistical analysis. In other sports and games, one-on-one matchups are often analyzed using Elo ratings. Originally developed in the mid-20th century by Arpad Elo to rank chess players, Elo ratings have expanded beyond chess into professional sports (note that Elo is a person's last name and not the initialism ELO).

Nate Silver has written about ranking baseball teams using the Elo system, dating back to at least 2006 when he wrote for Baseball Prospectus. Since then, Silver has expanded his use of Elo ratings to generate "power rankings" in sports like basketball, American football, and baseball for FiveThirtyEight.com. However, Silver's Elo ratings focus only on team-level outcomes (i.e., wins and losses), not on individual players. Baseball-Reference.com and its sister sites under the Sports-Reference umbrella used to have a system for generating Elo ratings for individual players, but these ratings were based on user opinions rather than head-to-head matchups, and they were abandoned after being found too unreliable. Even so, it may be possible to use Elo ratings to rank baseball players on an individual level because of the properties of the batter-pitcher matchup discussed above.

Briefly, an Elo rating begins by assigning a new player (e.g., a rookie making his first MLB plate appearance) a default rating that is exactly average, such as 1500 (this number is arbitrary, but commonly used). If two 1500-rated players face off, each player has a 50% chance of winning. A win raises a player's rating, and the loser's rating falls by the same amount that the winner's rating rises. Bigger upsets cause bigger rating swings, and ratings possess the transitive property, such that if players A > B and B > C, then we can infer that A > C even if A and C never play each other. As such, it should be possible for the Elo system to create relative ranks of batters and pitchers, even though batters never enter head-to-head batting competitions and pitchers never enter head-to-head pitching competitions. For those who are more interested in the gory details of Elo ratings, Silver's early Baseball Prospectus article is a good place to start.
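To make those mechanics concrete, here is a minimal sketch of the standard Elo update. The function names and the k value of 20 are illustrative assumptions chosen for this example, not parameters taken from this study.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) for player A against player B."""
    return 1.0 / (1.0 + 10 ** (-(rating_a - rating_b) / 400))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return both players' new ratings after one contest.

    score_a is 1 for an A win, 0.5 for a tie, and 0 for an A loss.
    k = 20 is an illustrative value only (an assumption for this sketch).
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the update is zero-sum.
    return rating_a + delta, rating_b - delta


# Two average players: the winner gains exactly what the loser gives up.
even_winner, even_loser = update(1500, 1500, score_a=1)

# A 1400-rated underdog beating a 1600-rated favorite moves both ratings further.
upset_winner, upset_loser = update(1400, 1600, score_a=1)

print(even_winner - 1500, even_loser - 1500)
print(round(upset_winner - 1400, 1), round(upset_loser - 1600, 1))
```

The upset produces a larger swing than the even matchup because the underdog's expected score was well below 0.5 going in.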




Why might Elo ratings be useful for fantasy baseball players? There are two primary reasons.

First, Elo ratings take into account quality of opposition. For example, two different hitters might have the same OPS, but if one hitter achieved that OPS against great pitchers and the second hitter faced poor pitchers, then we would consider the first hitter to have better skills than the second hitter despite the same OPS.

Second, Elo ratings can be used to make probabilistic predictions about future matchups. For instance, if a pitcher with an Elo rating of 1550 is facing a batter whose Elo rating is 1450, one simple way of predicting the outcome of that matchup would be to use this equation: 1/(1+10^[-DIFF/400]), where DIFF is the difference between the two players' ratings (here, the pitcher's rating minus the batter's). In this example, DIFF = 100, so the probability of the pitcher earning a "point" against the batter (where 1 point is credited for a win and 0.5 points is credited for a tie) is 64%. Being able to predict the outcome of each PA has obvious implications for daily transaction leagues and even for informing decisions about weekly lineups. In other words, Elo ratings may provide a useful adjunct to BaseballHQ.com's Pitcher Matchups Tool.
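The equation above is easy to check by hand or in a few lines of code. This sketch simply evaluates it for the 1550 vs. 1450 example; the function and variable names are made up for illustration.

```python
def point_probability(diff: float) -> float:
    """Evaluate 1/(1+10^[-DIFF/400]) for a given rating difference."""
    return 1.0 / (1.0 + 10 ** (-diff / 400))


pitcher_rating = 1550
batter_rating = 1450

# DIFF is taken from the perspective of the player whose probability we want.
p_pitcher = point_probability(pitcher_rating - batter_rating)
p_batter = point_probability(batter_rating - pitcher_rating)

print(f"pitcher: {p_pitcher:.3f}, batter: {p_batter:.3f}")
```

The two probabilities sum to 1, so the batter's expected share of the "point" in this matchup is about 36%.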

Method

The current study sought to preliminarily establish Elo ratings for MLB players and to describe some of the findings. One of the challenges with applying Elo ratings to batter-pitcher matchups is determining whether a plate appearance results in a "win" or a "loss." Therefore, two different approaches will be used here.

1. The first way to define a plate appearance as a "win" or a "loss" is to use the Three True Outcomes (TTO) model. Under this paradigm, a batter would be credited a "win" if he walks or hits a home run. In contrast, a batter would get a "loss" if he strikes out. All other outcomes (balls in play, hit by pitch, errors, etc.) will be considered a tie. This will generate ratings that are more consistent with statistics that attempt to strip luck away from outcomes, like xERA.

2. The second way to define a plate appearance as a "win" or a "loss" is to take a more granular approach that uses balls in play (BIP) in addition to the TTOs. As such, batter "wins" would be any of the following: single, double, triple, home run, walk, and hit by pitch. In contrast, a batter "loss" would include strikeouts and any ball in play out. Ties would occur when a batter reaches on an error or a fielder's choice. These ratings will incorporate luck (e.g., hit rate, or h%) but may also provide evidence for whether or not a player has true ball-in-play skills.
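The two definitions above amount to two lookup rules that turn a PA outcome into a batter score of 1, 0.5, or 0. The sketch below is one way to write them down; the event labels (e.g., "in_play_out") are hypothetical and would need to be mapped to whatever codes a real play-by-play data source uses.

```python
# Event labels are hypothetical placeholders, not codes from any real data feed.
TTO_BATTER_WINS = {"walk", "home_run"}
TTO_BATTER_LOSSES = {"strikeout"}

BIP_BATTER_WINS = {"single", "double", "triple", "home_run", "walk", "hit_by_pitch"}
BIP_BATTER_LOSSES = {"strikeout", "in_play_out"}
BIP_TIES = {"reached_on_error", "fielders_choice"}


def tto_batter_score(event: str) -> float:
    """Batter's score under the TTO definition; all other outcomes are ties."""
    if event in TTO_BATTER_WINS:
        return 1.0
    if event in TTO_BATTER_LOSSES:
        return 0.0
    return 0.5


def bip_batter_score(event: str) -> float:
    """Batter's score under the BIP definition."""
    if event in BIP_BATTER_WINS:
        return 1.0
    if event in BIP_BATTER_LOSSES:
        return 0.0
    if event in BIP_TIES:
        return 0.5
    # The article does not say how other events are scored, so fail loudly.
    raise ValueError(f"outcome not covered by the BIP definition: {event}")


for event in ("single", "in_play_out", "hit_by_pitch", "strikeout"):
    print(event, tto_batter_score(event), bip_batter_score(event))
```

In both cases the pitcher's score for the same PA is simply 1 minus the batter's score.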

Both of the above approaches were applied to per-PA data dating back to 2002. An initial rating of 1500 was assigned to each new player; this 1500 value also represents the average Elo rating of all players. Because a single PA should not cause wide swings in Elo ratings, a small k-factor of 3 was used (the k-factor is the maximum number of points a rating can move after one PA). Also, because pitchers are more likely to "win" any given PA, a gamma value was applied, which essentially grades batters on a curve and ensures that the average player has a rating of 1500 regardless of whether that player is a batter or a pitcher. This gamma adjustment (-35 for the TTO approach and -127 for the BIP approach) was derived iteratively and allows all players to be compared on the same numeric scale.
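The exact way gamma enters the calculation is not spelled out here, so the following is only a sketch under a stated assumption: gamma is treated as a fixed shift added to the batter-minus-pitcher rating gap before the expected score is computed. The k-factor of 3 and the gamma values come from the text; the function names are illustrative.

```python
K_FACTOR = 3
GAMMA = {"TTO": -35, "BIP": -127}


def batter_expected(batter: float, pitcher: float, gamma: float) -> float:
    """Batter's expected score, with gamma shifting the rating gap (assumption)."""
    return 1.0 / (1.0 + 10 ** (-(batter - pitcher + gamma) / 400))


def update_after_pa(batter, pitcher, batter_score, gamma, k=K_FACTOR):
    """Return (new_batter, new_pitcher) after one plate appearance.

    batter_score is 1 for a batter win, 0.5 for a tie, 0 for a batter loss.
    """
    delta = k * (batter_score - batter_expected(batter, pitcher, gamma))
    return batter + delta, pitcher - delta


# Two 1500-rated players under the BIP definition.
baseline = batter_expected(1500, 1500, GAMMA["BIP"])
after_out = update_after_pa(1500, 1500, 0.0, GAMMA["BIP"])
after_hit = update_after_pa(1500, 1500, 1.0, GAMMA["BIP"])

print(f"average batter's expected score vs. average pitcher: {baseline:.3f}")
print(f"after an out: batter {after_out[0]:.2f}, pitcher {after_out[1]:.2f}")
print(f"after a hit:  batter {after_hit[0]:.2f}, pitcher {after_hit[1]:.2f}")
```

Under this assumption an average batter is expected to "win" well under half the time against an average pitcher, so an out costs him less than a hit earns him, and both players can stay at 1500 in the long run.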

Results

The top 10 batters and pitchers, as measured by each of the two approaches, are shown below. These ratings represent the estimated talent of each player as of the last day of the 2017 season and will begin to change as soon as the 2018 season gets under way.

Batters - TTO-based ratings

Rank    Player            Elo
==============================
1.      Joey Votto        1576
2.      Mike Trout        1569
3.      Edwin Encarnacion 1558
4.      Jose Ramirez      1557
5.      Carlos Santana    1554
6.      Josh Donaldson    1554
7.      Mookie Betts      1554
8.      Anthony Rendon    1553
9.      Joe Panik         1552
10.     Zack Cozart       1552

Pitchers - TTO-based ratings

Rank    Player           Elo
=============================
1.      Kenley Jansen    1605
2.      Craig Kimbrel    1597
3.      Andrew Miller    1588
4.      Aroldis Chapman  1576
5.      Corey Kluber     1571
6.      Chad Green       1568
7.      Chris Sale       1567
8.      Carlos Carrasco  1563
9.      Roberto Osuna    1561
10.     Ken Giles        1560

Batters - BIP-based ratings

Rank    Player            Elo
==============================
1.      Joey Votto        1602
2.      Aaron Judge       1592
3.      Mike Trout        1585
4.      Tommy Pham        1583
5.      Carlos Correa     1578
6.      Josh Donaldson    1578
7.      Eric Hosmer       1576
8.      Jose Altuve       1575
9.      Cesar Hernandez   1573
10.     Kris Bryant       1571

Pitchers - BIP-based ratings

Rank    Player            Elo
==============================
1.      Kenley Jansen     1627
2.      Corey Kluber      1607
3.      Craig Kimbrel     1601
4.      Andrew Miller     1599
5.      Justin Verlander  1593
6.      Pat Neshek        1585
7.      David Robertson   1584
8.      Sean Doolittle    1584
9.      Roberto Osuna     1583
10.     Aroldis Chapman   1580

Differences between Elo ratings are meaningful. For instance, using the BIP-based approach, a Kenley Jansen (1627) vs. Joey Votto (1602) matchup would predict that Jansen has a 70.5% probability of earning a "point" versus Votto (1 point for a win; 0.5 points for a tie, which here includes reaching on error or fielder's choice). Similarly, using the TTO-based approach, a Kenley Jansen (1605) vs. Joey Votto (1576) matchup would predict that Jansen has a 59.2% probability of earning a "point" versus Votto (1 point for a win; 0.5 points for a tie, which here includes any outcome that is not a walk, home run, or strikeout). Note that these probabilities are larger than the raw rating gaps of 25 and 29 points alone would imply (roughly 54% each); they are consistent with the gamma adjustment described in the Method section being added to each gap (25 + 127 = 152 points for BIP; 29 + 35 = 64 points for TTO).
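As a rough check on these figures, the sketch below assumes that the gamma adjustment from the Method section is added to the pitcher-minus-batter rating gap before the probability is computed. That is an inference from the published numbers rather than a documented formula, and the function name is illustrative.

```python
GAMMA = {"BIP": -127, "TTO": -35}


def pitcher_point_probability(pitcher: float, batter: float, gamma: float) -> float:
    """Pitcher's expected score, assuming gamma widens the gap in his favor."""
    diff = (pitcher - batter) - gamma  # gamma is negative, so the gap grows
    return 1.0 / (1.0 + 10 ** (-diff / 400))


# Kenley Jansen vs. Joey Votto, using the published end-of-2017 ratings.
bip_prob = pitcher_point_probability(1627, 1602, GAMMA["BIP"])
tto_prob = pitcher_point_probability(1605, 1576, GAMMA["TTO"])

# The same matchups with no gamma adjustment, for comparison.
bip_raw = pitcher_point_probability(1627, 1602, 0)
tto_raw = pitcher_point_probability(1605, 1576, 0)

print(f"BIP: {bip_prob:.1%} with gamma, {bip_raw:.1%} without")
print(f"TTO: {tto_prob:.1%} with gamma, {tto_raw:.1%} without")
```

With gamma included, the results land within about a tenth of a percentage point of the 70.5% and 59.2% quoted above; the small remaining differences could come from the published ratings being rounded to whole numbers.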

One of the potentially useful outcomes from using both the TTO and BIP approaches is that for any given player, the difference between his two Elo ratings could be interpreted as reflecting that player's BIP skill. For instance, the plots below show two batters and two pitchers. One batter (Carlos Santana, 1B, PHI) has quite a small difference between his BIP and TTO ratings, whereas the other batter (Miguel Sano, 3B, MIN) has a large difference between his BIP and TTO ratings. This might suggest that Sano has a real skill at reaching on balls in play, whereas Santana does not. The same interpretation could be made for the two pitchers shown.
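The same BIP-minus-TTO comparison can be run for any player with both ratings. Sano's ratings and Santana's BIP rating are not listed in the tables above, so this small sketch instead uses the three batters who appear in both top-10 lists; the variable names are illustrative.

```python
# End-of-2017 ratings for batters who appear in both top-10 tables above.
tto_ratings = {"Joey Votto": 1576, "Mike Trout": 1569, "Josh Donaldson": 1554}
bip_ratings = {"Joey Votto": 1602, "Mike Trout": 1585, "Josh Donaldson": 1578}

# Gap between the two systems: a larger positive gap hints at more BIP skill.
bip_gap = {
    name: bip_ratings[name] - tto_ratings[name]
    for name in tto_ratings
    if name in bip_ratings
}

for name, gap in sorted(bip_gap.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<16}{gap:+d}")
```

Three players is far too few to draw conclusions from; the point is only to show how the gap would be computed once a full list of ratings is available.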

A few other interesting tidbits can be gleaned from these Elo ratings. The high and low points of all players since 2002 can be examined, although this is more for trivia purposes than for fantasy baseball purposes. These outcomes are shown in the following table (note that these ratings are most applicable to players whose careers began in 2002 or later, because only for those players do the ratings reflect every career PA).

System  Player High (Rating), Date      Player Low (Rating), Date
=====================================================================
TTO     Kenley Jansen (1616), 5/18/17   Daniel Cabrera (1441), 9/6/09
BIP     Koji Uehara (1655), 4/5/14      Mark Mulder (1411), 7/9/08  

Discussion

Presented here is a preliminary overview of using Elo ratings to rank MLB players, including some of its features, capabilities, and interesting findings. Although the world of baseball analytics may not need another statistic to rate players, there is a case to be made that Elo ratings may add some incremental contributions beyond existing statistics such as OPS, BPV, wOBA, xERA, FIP, WAR, and so on. For instance, most baseball statistics are retrospective, meaning that they simply describe what has happened in the past. In contrast, Elo ratings can be converted into expected win probabilities for future events. Because fantasy baseball is all about making predictions about the future, Elo ratings may have some added value.

However, before Elo ratings can be trusted to perform as desired, they must be validated. Therefore, future research will report on this author's attempts to predict the outcomes of the 2018 season with these Elo ratings to determine how well they perform when compared to other systems for projecting player performance. So keep your eye on this space, both in the pre-season and during the regular season, for more details about how well (or poorly) the Elo system does at predicting player success in the future. This may include some adjustments to some of the parameters discussed above (e.g., gamma and k-factor) to generate the most accurate predictions. This may also include more complicated methods to differentially weight higher-value outcomes, like home runs. If the Elo system is found to be valid for predicting per-PA outcomes, a comprehensive list of player ratings will be provided, and we'll dig deeper into the potential value of this approach.



  For more information about the terms used in this article, see our Glossary Primer.