RESEARCH: Extreme Makeovers: PQS Edition

BaseballHQ.com has used our Pure Quality Starts (PQS) metric to measure game-level starting pitcher performance for well over a decade. It is a brilliant way to cut through the noise of bad fielding, bad timing, and bad luck to determine the truth about a single start—or a season’s worth of them—all with a metric you can calculate in your head.

But major league benchmarks have changed dramatically since PQS was introduced back in 2002. Because PQS isn’t allowed to use modern statistical conveniences like league indexing or long division, the time has come to update PQS to reflect today’s game. We briefly touched on the issue in a column last June, but with the new season, it seemed like the perfect time for a revamp—and maybe fix a few other problems while we’re at it.

The problem(s)

At its core, PQS makes perfect sense: look at a starting pitcher’s box score, cross your eyes, and count up to five:

  • Did pitcher pitch deep into game? 1 if Yes, 0 if No.
  • Did pitcher limit hits? 1 if Yes, 0 if No.
  • Did pitcher strike out batters? 1 if Yes, 0 if No.
  • Did pitcher limit walks? 1 if Yes, 0 if No.
  • Did pitcher limit home runs? 1 if Yes, 0 if No.

For a given start, this works well: a score of 4 or 5 is relatively “dominant” (or "PQS-DOM"), a score of 0 or 1 is a relative “disaster” ("PQS-DIS") and a score of 2 or 3 is somewhere in between. When you start to aggregate PQS data, however—for a given pitcher-season, for all pitchers in a given season, or for all pitchers across multiple seasons—you begin to sense there may be a problem. Here is the distribution of PQS scores for all pitchers from 2012-2014:

Many 5s, 4s, and 3s, very few 2s and 1s, and a bunch of zeroes. The zeroes are due primarily to the caveat that any game in which the starter records fewer than 5.0 IP is automatically scored a PQS-0 regardless of how many “Yes” points he earns otherwise. During this period, the “average” PQS score was 3.1/5.0 (without automatic zeroes, the average would have shot up to 3.4/5.0), and the DOM:DIS ratio was 50%:21%. Looking back all the way to 2002, the distribution of PQS scores still looked similar to the above figures, though not quite as heavily skewed toward the high end.

In simpler terms, the “average” start is WAY above average. And if you start filtering out the pitchers that are unlikely to be rostered in a typical fantasy league, then the “average” shoots up even higher.

So why is this happening?

Two words: faulty inputs. Specifically, each PQS criterion is far too likely to be met. Again using data from 2012-2014, we find:

  • In 62% of starts, IP>=6
  • In 57% of starts, H<=IP
  • In 69% of starts, K>=(IP-2)
  • In 67% of starts, K/BB>=2 or BB=0 & K>=1
  • In 86% of starts, HR<=1
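For reference, old PQS is easy to write down as code: the five thresholds listed above plus the automatic PQS-0 for any start under 5.0 IP. The Python below is a minimal sketch, not anything BaseballHQ publishes; the function names are hypothetical, and it assumes innings arrive in standard box-score notation, where 6.1 means 6⅓ innings and 6.2 means 6⅔. Exact fractions avoid floating-point surprises when comparing hits or strikeouts against innings.

```python
from fractions import Fraction


def ip_to_innings(ip: str) -> Fraction:
    """Convert box-score IP notation ("6.1" = 6 1/3 innings) to an exact fraction."""
    whole, _, thirds = ip.partition(".")
    return Fraction(int(whole)) + Fraction(int(thirds or 0), 3)


def old_pqs(ip: str, h: int, bb: int, k: int, hr: int) -> int:
    """Hypothetical sketch of the original PQS rules as described in this article."""
    inn = ip_to_innings(ip)
    if inn < 5:
        return 0  # automatic PQS-0 for any start shorter than 5.0 IP
    points = [
        inn >= 6,                                   # pitched deep into the game
        h <= inn,                                   # limited hits
        k >= inn - 2,                               # strikeouts relative to innings
        (k >= 1) if bb == 0 else (k >= 2 * bb),     # command: K/BB >= 2, or BB=0 and K>=1
        hr <= 1,                                    # limited home runs
    ]
    return sum(points)


# A 6 IP, 6 H, 2 BB, 4 K, 1 HR line clears every old threshold.
print(old_pqs("6.0", h=6, bb=2, k=4, hr=1))  # 5
```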

So, this off-season, we set out to redefine PQS to better reflect today’s statistical era, without losing what we love about the PQS methodology. Here is the challenge we gave ourselves:

  1. Must still be able to “calculate” PQS in one’s head. After all, this is PQS’s signature feature.
  2. All PQS variables must be available in the starting pitcher’s simple box score. Even dipping below the fold to find things like ground balls vs. fly balls allowed is too much work when it comes to PQS.
  3. Must maintain the familiar 0-5 scale. This lets us keep our useful DOM:DIS definitions intact and makes it relatively straightforward to restate historical data.
  4. Must account for the recent league-wide uptick in strikeouts. Strikeouts affect two of the five PQS categories, so they are especially important to get right for today's game.
  5. Should fix the abnormal distribution of the current PQS methodology. Ideally, PQS would be more normally distributed, with few 0s and 5s, a few more 1s and 4s, and even more 2s and 3s.
  6. Must still pass the “smell test” at the individual game level. When your starter tosses a PQS-5, it should feel great, and when he hurls a PQS-0, it should hurt.

We considered all sorts of new methodologies, but they all violated at least one of the above constraints. Then suddenly the skies cleared and rainbows appeared and a very simple solution presented itself: We should set the threshold for each PQS category so that, on a given day, an average major league starting pitcher has a roughly 50% chance of earning that PQS point. This should naturally result in the average PQS score settling closer to the midpoint of 2.5/5.0, and yield a distribution of PQS scores across pitchers that looks more “normal” (in both the intuitive and statistical sense).
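The 50% rule translates directly into a search. The sketch below scans candidate "at least N" thresholds and keeps the one whose success rate lands nearest 50%. The strikeout totals are made-up toy numbers, not real league data, and the helper names are hypothetical; a real calibration would run over every start in the sample and flip the comparison for "at most" criteria such as hits or home runs.

```python
def success_rate(values, threshold):
    """Share of starts whose stat meets the threshold (stat >= threshold)."""
    return sum(v >= threshold for v in values) / len(values)


def closest_to_half(values, candidates):
    """Pick the candidate threshold whose success rate is nearest 50%."""
    return min(candidates, key=lambda t: abs(success_rate(values, t) - 0.5))


# Toy strikeout totals for ten starts (illustrative only, not real data).
toy_strikeouts = [2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
best = closest_to_half(toy_strikeouts, candidates=range(1, 11))
print(best, success_rate(toy_strikeouts, best))  # 5 0.5
```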

New PQS

We looked at data from 2002-2014, with a particular focus on the more recent period of 2012-2014, to determine what game-level thresholds deliver the 50/50 balance we want for each PQS category. This is what we settled on:

Innings Pitched:

  • Old PQS: >=6 IP, auto PQS-0 if <5IP
  • New PQS: >6 IP. This is a subtle but important difference, as 21.4% of all starts from 2002-2014 lasted exactly 6 IP. Essentially, new PQS requires that a starter successfully "pitch into the seventh inning" to earn his PQS point for innings. Also, we removed the auto PQS-0 caveat.
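Because box scores record innings in thirds (6.1 means 6⅓ innings, 6.2 means 6⅔), the cleanest way to apply the strict >6 IP test is to count outs: 6.0 IP is 18 outs, so new PQS needs at least 19. A minimal sketch, assuming IP arrives as a box-score string; `ip_to_outs` and the two predicates are hypothetical helpers.

```python
def ip_to_outs(ip: str) -> int:
    """Convert box-score IP notation to outs recorded: "6.1" -> 19."""
    whole, _, thirds = ip.partition(".")
    return 3 * int(whole) + int(thirds or 0)


def old_ip_point(ip: str) -> bool:
    return ip_to_outs(ip) >= 18  # old PQS: at least 6.0 IP


def new_ip_point(ip: str) -> bool:
    return ip_to_outs(ip) > 18  # new PQS: at least one out recorded beyond 6.0 IP


for ip in ["5.2", "6.0", "6.1"]:
    print(ip, old_ip_point(ip), new_ip_point(ip))
# 5.2 False False
# 6.0 True False
# 6.1 True True
```

The exactly-6.0 start is the only case the two rules disagree on, which is why that 21.4% share matters so much.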

Hits allowed:

  • Old PQS: H<=IP
  • New PQS: H<IP. Another subtle difference. Over the entire 2002-2014 period the old threshold was closer to a coin flip, but the change gets us closer to a 50/50 component success rate in recent years.

Strikeouts:

  • Old PQS: K>=(IP-2)
  • New PQS: K>=5. This change is not so subtle. We wanted to simplify the strikeout component in general, and this change lets us both simplify and modernize in one move. Using an absolute count of five strikeouts (a) is at least as tough as K>=(IP-2) for any start that ends before the third out of the seventh inning, and strictly tougher for starts of 6.0 IP or fewer, (b) doesn't raise the strikeout bar too high for starts that last into the eighth inning and beyond, and (c) holds up well in early exits that are no longer automatic PQS-0s.
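To see how the two strikeout rules compare across start lengths, the sketch below prints the fewest strikeouts each rule demands. Since strikeouts come in whole numbers, the old K>=(IP-2) requirement works out to the ceiling of IP - 2; innings are handled as outs to keep the arithmetic exact. The helper name is hypothetical.

```python
def old_k_needed(outs: int) -> int:
    """Fewest strikeouts meeting K >= IP - 2, where IP = outs / 3."""
    return max(0, -((6 - outs) // 3))  # integer ceiling of (outs - 6) / 3, floored at 0


NEW_K_NEEDED = 5  # new PQS: a flat five strikeouts, however long the start


for outs in (15, 18, 19, 21, 24, 27):
    ip = f"{outs // 3}.{outs % 3}"  # back to box-score notation, e.g. 19 outs -> "6.1"
    print(f"{ip} IP: old needs {old_k_needed(outs)} K, new needs {NEW_K_NEEDED} K")
# 5.0 IP: old needs 3 K, new needs 5 K
# 6.0 IP: old needs 4 K, new needs 5 K
# 6.1 IP: old needs 5 K, new needs 5 K
# 7.0 IP: old needs 5 K, new needs 5 K
# 8.0 IP: old needs 6 K, new needs 5 K
# 9.0 IP: old needs 7 K, new needs 5 K
```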

Command:

  • Old PQS: (K/BB)>=2 (or if BB=0, K>=1)
  • New PQS: (K/BB)>=3 (or if BB=0, K>=3). This makes sense, as the average K/BB is now closer to 3.0 than it is to 2.0. Something like 2.5 could also have worked, but it's a bit harder on the brain and 3.0 gets us closer to a 50% success rate in recent years.

Home runs:

  • Old PQS: HR<=1
  • New PQS: HR=0. Giving up a HR in a given start is essentially already a 50/50 chance, so this was an easy call.
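Taken together, the five new thresholds fit in a few lines. This is a minimal sketch of the rules as stated above, not BaseballHQ's production code; the function names are hypothetical, innings use box-score notation (6.1 = 6⅓), and there is deliberately no automatic PQS-0.

```python
from fractions import Fraction


def ip_to_innings(ip: str) -> Fraction:
    """Convert box-score IP notation ("6.1" = 6 1/3 innings) to an exact fraction."""
    whole, _, thirds = ip.partition(".")
    return Fraction(int(whole)) + Fraction(int(thirds or 0), 3)


def new_pqs(ip: str, h: int, bb: int, k: int, hr: int) -> int:
    """Hypothetical sketch of the revised PQS thresholds; no automatic PQS-0."""
    inn = ip_to_innings(ip)
    points = [
        inn > 6,                                    # pitched into the seventh (6.1 IP or more)
        h < inn,                                    # fewer hits than innings
        k >= 5,                                     # flat strikeout threshold
        (k >= 3) if bb == 0 else (k >= 3 * bb),     # command: K/BB >= 3, or BB=0 and K>=3
        hr == 0,                                    # no home runs
    ]
    return sum(points)


# 6.1 IP, 6 H, 3 BB, 6 K, 0 HR: misses only the command point.
print(new_pqs("6.1", h=6, bb=3, k=6, hr=0))  # 4
```

Integer checks such as `k >= 3 * bb` stand in for K/BB >= 3, which sidesteps division entirely, in keeping with PQS's do-it-in-your-head spirit.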

Now, new PQS has a close-to-50/50 achievement rate on all five components, far closer than old PQS:

And, as we had hoped, new PQS is distributed more normally:

From 2012-2014, the average DOM:DIS ratio under new PQS is closer to 1:1 (vs. 2.5:1 under old PQS), and the new average PQS score is 2.40 (vs. 3.08 in old PQS) … roughly in the middle of the 0-5 scale as we intended. On both of these fronts, we are comfortable with the league average DOM:DIS and PQS being on the slightly low side because these figures are weighed down by poor starts from pitchers who are typically not rostered even in the deepest of leagues.
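The headline aggregates are simple tallies over a list of game scores: the mean score, the share of starts scoring 4-5 (DOM), and the share scoring 0-1 (DIS). A minimal sketch; `summarize` is a hypothetical helper and the eight scores are toy values, not real data.

```python
from statistics import mean


def summarize(scores):
    """Average PQS plus DOM (4-5) and DIS (0-1) shares for a list of 0-5 scores."""
    dom = sum(s >= 4 for s in scores) / len(scores)
    dis = sum(s <= 1 for s in scores) / len(scores)
    return mean(scores), dom, dis


# Toy scores for eight starts (illustrative only, not real data).
avg, dom, dis = summarize([0, 1, 2, 2, 3, 3, 4, 5])
print(f"avg {avg:.2f}, DOM {dom:.0%}, DIS {dis:.0%}, DOM:DIS {dom / dis:.1f}:1")
# avg 2.50, DOM 25%, DIS 25%, DOM:DIS 1.0:1
```

The ratio line assumes at least one DIS start; a real report would guard against dividing by zero.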

Having gotten this far, we then took our new toy for a spin on 2015 full-season data to see how well everything held up when transitioning from old PQS to new PQS in a season not already included in our initial analysis data.

You read this table by first looking at the row header (down the left side) for what the previous PQS score was, and then looking at the column header (across the top) to see what the new PQS score was for that same start.
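A transition table like this is a cross-tabulation of (old score, new score) pairs, with old scores as rows and new scores as columns. A minimal standard-library sketch; `transition_table` is a hypothetical helper, and the sample input is just the handful of 2015 starts whose old and new scores this article reports (Kershaw 0 to 4, Dickey 4 to 5, Santiago 3 to 0, and the five borderline 5-to-0 starts), not the full season.

```python
from collections import Counter


def transition_table(pairs):
    """Cross-tabulate (old PQS, new PQS) pairs: rows are old scores, columns new scores."""
    counts = Counter(pairs)  # missing (old, new) combinations count as 0
    header = "old\\new " + " ".join(f"{n:>3}" for n in range(6))
    rows = [
        f"{old:>7} " + " ".join(f"{counts[(old, new)]:>3}" for new in range(6))
        for old in range(6)
    ]
    return "\n".join([header, *rows])


# (old, new) pairs for the 2015 starts discussed in this article.
pairs = [(0, 4), (4, 5), (3, 0)] + [(5, 0)] * 5
print(transition_table(pairs))
```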

In the first row, we see lots of PQS-0's have become higher scores. This is exclusively the result of removing the automatic zero rule. In doing so, we let 0s and 1s represent true disasters, and raise the value of starts that had some redeeming qualities. To illustrate, look at Clayton Kershaw's last start of 2015—his tuneup start before the playoffs. He threw only 60 pitches by design, pitched 3.2 innings, gave up 2 hits, no walks, no home runs, and struck out 7. Under the old PQS system, he gets a PQS-0. But was that really a "disaster" start? Of course not, and new PQS reveals the truth (this earns a PQS-4 under the updated PQS). Not every example is this extreme, of course, but the spirit is the same—new PQS gives starters credit where credit is due.

A few rows down, you can see where some 2s become 3s, 3s become 4s, and 4s become 5s. These are all starts in which the pitcher failed to meet the strikeout threshold under old PQS but met or exceeded it under new PQS. One example is R.A. Dickey's 9/2/15 start vs. Cleveland. He pitched 9 innings, gave up 4 hits, no walks, and struck out 6. Under the old system, this was a PQS-4; under the new, it’s a PQS-5 because he doesn't get "penalized" for failing to strike out that seventh batter just because he went the distance.

Now for the situations where pitchers lose PQS points. First, we have some PQS-1s that drop to PQS-0s. Why? Because either the pitcher gave up exactly one home run (thus earning a point in old PQS but not in new PQS) or they went 5ish innings with only 3 or 4 strikeouts. Both of these situations would have earned a point before, but not anymore. We think rightly so.

Next, there are some PQS-2s and PQS-3s that drop to 0s or 1s. These are due to the same home run and strikeout issues as in the previous situation, but these starts may also fail the tweaked innings threshold or the more substantial command threshold change. An example of a PQS-3 that would now get classified as a PQS-0 disaster would be Hector Santiago’s 5/2/15 start vs. TEX. He pitched only 5 innings, gave up 5 hits, 3 runs, 3 walks, struck out 4, and served up a home run. Sounds much more like a disaster than one step away from dominance.

Finally, we have the real controversies: starts that were classified as DOMinant under old PQS but are now classified as DISasters under new PQS. These fall squarely into what we'll call the "no good answer" zone. Because PQS involves a binary yes-or-no decision for each threshold, there will always be cases that are right on the borderline in every PQS component metric. In 2015, 2.4% of starts (115 of 4,858) fell into this group.

Let's look at the most extreme cases—the five starts in 2015 that were full PQS-5s under old PQS but are now classified as full PQS-0s (!) under new PQS: Odrisamer Despaigne on 5/21/15 vs. CHC, R.A. Dickey on 8/12/15 vs. OAK, Rick Porcello on 4/8/15 at PHI, Eduardo Rodriguez on 9/21/15 vs. TAM, and C.J. Wilson on 7/5/15 vs. TEX. Reviewing these starts together is very easy because in all five cases the starter pitched exactly 6 innings, gave up exactly 6 hits, walked exactly 2 batters, struck out exactly 4 batters, and allowed exactly 1 home run. Do these deserve to be called a DISaster? Probably not. But do they deserve to be called DOMinant? Equally probably not. They are almost the very definition of borderline: they don't hurt you, but they don't really help you either. Both old PQS and new PQS involve a simple sum of component parts, and starts like these now just happen to fall on the wrong side of our new equation.
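The game-level examples in this section can be checked by scoring each stat line under both rule sets side by side. The sketch below encodes the old and new thresholds exactly as described in this article; the `Start` type and function names are hypothetical. One assumption: the text does not give Dickey's 9/2/15 home-run total, so it is set to 0, which his new PQS-5 implies.

```python
from fractions import Fraction
from typing import NamedTuple


class Start(NamedTuple):
    label: str
    ip: str  # box-score notation: "3.2" means 3 2/3 innings
    h: int
    bb: int
    k: int
    hr: int

    @property
    def inn(self) -> Fraction:
        whole, _, thirds = self.ip.partition(".")
        return int(whole) + Fraction(int(thirds or 0), 3)


def old_pqs(s: Start) -> int:
    if s.inn < 5:
        return 0  # old automatic PQS-0
    cmd = (s.k >= 1) if s.bb == 0 else (s.k >= 2 * s.bb)
    return sum([s.inn >= 6, s.h <= s.inn, s.k >= s.inn - 2, cmd, s.hr <= 1])


def new_pqs(s: Start) -> int:
    cmd = (s.k >= 3) if s.bb == 0 else (s.k >= 3 * s.bb)
    return sum([s.inn > 6, s.h < s.inn, s.k >= 5, cmd, s.hr == 0])


STARTS = [
    Start("Kershaw, final 2015 start", "3.2", h=2, bb=0, k=7, hr=0),
    Start("Dickey 9/2/15 vs. CLE", "9.0", h=4, bb=0, k=6, hr=0),  # HR assumed 0
    Start("Santiago 5/2/15 vs. TEX", "5.0", h=5, bb=3, k=4, hr=1),
    Start("Borderline five (e.g. Porcello 4/8/15)", "6.0", h=6, bb=2, k=4, hr=1),
]

for s in STARTS:
    print(f"{s.label}: old PQS-{old_pqs(s)} -> new PQS-{new_pqs(s)}")
```

Each line reproduces the move described in the text: 0 to 4, 4 to 5, 3 to 0, and 5 to 0.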

Now that we’re comfortable with how our new PQS works at the game level, let’s step back out and look at how performance shakes out in aggregate for each new PQS score:

At every step up the PQS ladder, results get demonstrably better. And now we see some real separation of the absolute disaster and absolute dominant starts (PQS-0s and PQS-5s) from the merely very bad and very good starts (PQS-1s and PQS-4s), both skills-wise and results-wise.

Finally, let’s aggregate these into their most used form: PQS-DISaster vs. PQS-DECent vs. PQS-DOMinant starts:

The best place to start here might be in the middle: our newly termed PQS-DECent starts (PQS-2s and PQS-3s). Look at how much these have shifted toward more, well, average results between old PQS and new PQS. That’s because with new PQS we’re now pulling more statistically average starts down from the DOMinant level, pushing very poor starts from DECent down to the DISaster level, and even bringing some not-so-bad-just-really-short starts up from DISaster to DECent.
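The three buckets are a fixed mapping of the 0-5 score: 0-1 is DISaster, 2-3 is DECent, and 4-5 is DOMinant. A minimal sketch of that mapping plus the share of starts in each bucket; the helper names are hypothetical and the scores are toy values, not real data.

```python
from collections import Counter


def tier(score: int) -> str:
    """Map a 0-5 PQS score to its DIS / DEC / DOM bucket."""
    if not 0 <= score <= 5:
        raise ValueError(f"PQS must be 0-5, got {score}")
    if score <= 1:
        return "DIS"
    if score <= 3:
        return "DEC"
    return "DOM"


def tier_shares(scores):
    """Fraction of starts falling in each bucket."""
    counts = Counter(tier(s) for s in scores)
    return {t: counts[t] / len(scores) for t in ("DIS", "DEC", "DOM")}


# Toy scores for eight starts (illustrative only, not real data).
print(tier_shares([0, 1, 2, 2, 3, 3, 4, 5]))  # {'DIS': 0.25, 'DEC': 0.5, 'DOM': 0.25}
```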

At the DISaster level, the results are still awful whether you’re looking at old or new PQS, but notice that this now incorporates a much larger group of outings, 32% (roughly 1 out of 3) rather than 21% (roughly 1 out of 5). The average innings pitched are also way up with new PQS, because it is no longer automatically weighed down by every start that lasted less than 5 innings.

Lastly, we see the DOMinant level, which is where we think new PQS really shines. The PQS-DOM group is now just over half of its previous size under old PQS. And because of this, the term DOMinant is now reserved for only genuinely valuable fantasy starts. On average, a DOMinant outing means that the starter pitched into the eighth inning, gave up only 4-5 hits, 1-2 walks, 0 home runs, and struck out close to 7 batters. And a just barely DOMinant start might be something like 6.1 innings pitched, 6 hits, 3 walks, 0 home runs, and 6 strikeouts. Still pretty dominant.

The 2016 season is now under way and we're ready to roll. BaseballHQ.com is now using this modernized version of PQS wherever you see it on the site. Let us know what you think in the comments below and in the forums.


  For more information about the terms used in this article, see our Glossary Primer.