RESEARCH: Expected Dom & Swing Metrics

Introduction

BaseballHQ.com and Baseball Forecaster readers are acquainted with expected Dom (xDom), which describes the relationship between swinging strikes (SwK) and Dom. The Pitcher Matchup Tool also uses recent SwK% to project upcoming starts.

Today, we’ll examine whether additional swing and contact metrics can add to our understanding of expected Dom. We’ll also review several models for predicting Dom in the upcoming season.

Methodology

We use season totals for all pitchers from 2002-2018, downloaded from fangraphs.com. For multi-year comparisons, we weight the results by the harmonic mean of innings pitched over the seasons in question, so that a short season limits a pitcher’s weight even when it is paired with a full one. The parameters we will consider are as follows:

  • Dom (K/9)
  • Ctl (BB/9)
  • Fa% (fastball percentage)
  • O-Swing% (% swings at pitches outside the zone)
  • Z-Swing% (% swings at pitches inside the zone)
  • Swing% (% of all pitches swung at)
  • O-Contact% (% of swings outside the zone resulting in contact)
  • Z-Contact% (% of swings inside the zone resulting in contact)
  • Contact% (% of all swings resulting in contact)
  • Zone% (% of pitches in the zone)
  • FpK% (% of first pitches that are strikes)
  • SwK (% of all pitches swung at and missed)

[Note: Contact includes foul balls]
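
The harmonic-mean weighting mentioned above can be sketched in a few lines. This is a minimal illustration rather than the code used for the study; the function name ip_weight and the innings totals are hypothetical, and innings are assumed to be true decimals (not the 180.1 = 180 1/3 box-score notation).

```python
from statistics import harmonic_mean


def ip_weight(innings_by_season):
    """Weight for a multi-year comparison: harmonic mean of innings pitched.

    innings_by_season is a list of IP totals, one per season compared
    (hypothetical input format, assumed to be plain decimal innings).
    """
    return harmonic_mean(innings_by_season)


# Hypothetical pitchers: one with a full season and a short one,
# one with two full seasons.
print(round(ip_weight([200.0, 20.0]), 1))
print(round(ip_weight([180.0, 180.0]), 1))
```

The harmonic mean is pulled toward the smaller total, so the 200 IP / 20 IP pitcher gets a weight of roughly 36 rather than the 110 an ordinary average would give.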

The original work on swinging strikes at BaseballHQ.com used data from 2005-2008, and found that:

xDom = 335.93 * e ^ (-4.9048 * Contact%)

Revisiting that formula, we see it still holds up pretty well, with an R2 of 0.61. Still, there is a bend in the scatter plot that suggests the exponential function isn’t the best fit.
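
For reference, the original exponential formula can be evaluated directly. This sketch assumes Contact% is entered as a fraction (0.80 for 80%), which is the only reading that yields realistic K/9 values; the function name is hypothetical.

```python
import math


def xdom_exponential(contact):
    """Original xDom formula from the 2005-2008 study.

    contact: Contact% as a fraction, e.g. 0.80 for 80% (assumed units).
    """
    return 335.93 * math.exp(-4.9048 * contact)


# Evaluate across a typical range of contact rates.
for contact in (0.70, 0.75, 0.80, 0.85):
    print(f"Contact% {contact:.0%}: xDom = {xdom_exponential(contact):.2f}")
```

At 80% contact the formula gives an xDom of about 6.6 K/9, and the expected strikeout rate climbs increasingly fast as contact falls, which is the curvature the linear fits replace.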

Using the updated data set, let’s revisit how Dom is related to each of SwK and Contact%:

 

Each of these linear relationships is marginally better than the previous exponential, with SwK yielding an R2 value of 0.63, and Contact% at 0.68.
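
A single-variable fit of this kind, weighted by innings, can be sketched as follows. The pitcher seasons below are made up purely for illustration and are not drawn from the study's data set; the function name is also hypothetical.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares line y = a + b*x.

    Returns (intercept, slope, weighted R^2).
    """
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Made-up seasons (NOT from the article's data): SwK as a fraction,
# Dom in K/9, innings pitched as the weight.
swk = [0.07, 0.09, 0.10, 0.12, 0.14]
dom = [5.8, 7.1, 7.4, 9.0, 10.3]
ip = [150.0, 60.0, 190.0, 35.0, 120.0]

a, b, r2 = weighted_linear_fit(swk, dom, ip)
print(f"Dom = {a:.2f} + {b:.1f} * SwK, weighted R2 = {r2:.2f}")
```

Weighting by innings means a 190 IP season pulls the line much harder than a 35 IP season, which keeps small-sample relievers from dominating the fit.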

Next, let’s dump all our plate discipline metrics into a linear model and run a backward elimination, removing insignificant factors one at a time. We are left with a formula for Dom with an R2 value of 0.74, a modest improvement over the single-variable fits. Here is the formula with factors listed in order of significance:

xDom = 22.9 + 53.6*SwK - 15.9*Z-Swing% - 12.3*Z-Contact% + 2.17*Fa% - 2.28*Zone% 
       + 1.59*O-Swing%

Most of this makes sense intuitively:

  • SwK is the most important factor: every percentage-point increase is worth about 0.5 K/9
  • Z-Swing%: more swings at pitches in the zone means fewer Ks
  • Z-Contact%: more contact on pitches in the zone means fewer Ks
  • Zone%: more pitches in the zone, fewer Ks
  • Fa%: this isn’t immediately obvious. By itself, Fa% is negatively correlated with Dom, but once we account for SwK, its coefficient turns positive
  • O-Swing%: more chases lead to more Ks
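
The multi-factor formula above is easy to apply by hand or in code. The sketch below assumes every rate is entered as a fraction (0.10 for 10%), consistent with one percentage point of SwK being worth about 0.5 K/9; the pitcher's numbers and the function name are hypothetical.

```python
XDOM_INTERCEPT = 22.9
XDOM_COEFFICIENTS = {
    "SwK": 53.6,
    "Z-Swing%": -15.9,
    "Z-Contact%": -12.3,
    "Fa%": 2.17,
    "Zone%": -2.28,
    "O-Swing%": 1.59,
}


def xdom_linear(metrics):
    """Multi-factor xDom; metrics maps each name to a fraction (assumed units)."""
    return XDOM_INTERCEPT + sum(
        coef * metrics[name] for name, coef in XDOM_COEFFICIENTS.items()
    )


# Hypothetical pitcher, not a real stat line.
pitcher = {
    "SwK": 0.10,
    "Z-Swing%": 0.65,
    "Z-Contact%": 0.87,
    "Fa%": 0.55,
    "Zone%": 0.45,
    "O-Swing%": 0.30,
}
print(f"xDom = {xdom_linear(pitcher):.1f} K/9")

# Effect of one extra percentage point of SwK, all else equal.
bumped = dict(pitcher, SwK=pitcher["SwK"] + 0.01)
print(f"+1 point of SwK adds {xdom_linear(bumped) - xdom_linear(pitcher):.2f} K/9")
```

This hypothetical pitcher comes out at roughly 7.9 K/9, and the second print shows the "about 0.5 K/9 per point of SwK" rule of thumb falling straight out of the 53.6 coefficient.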

The last thing we’ll do is check whether this formula holds up over all seasons. We’ve included the SwK-only and Contact%-only models as well:

The new model does better in all seasons. Additionally, all three models show the same systematic change in the relationship over time. Fortunately, the models do well in recent years, so we'll leave further investigation of that variability for another time. One guess is that when batting and pitching styles change quickly, our ability to correlate results is temporarily diminished.

Behavior of outliers

Now we have an updated xDom. What happens to pitchers when Dom exceeds xDom, and vice versa?

This line fit has an R2 value of 0.06, so there is only a small tendency to regress. Still, at the extremes there are some potentially useful trends. Restricting the sample to pitchers with more than 100 IP (harmonic mean) over the two seasons:

  • Pitchers whose Dom exceeds xDom by more than 1.5 K/9 (the upper 10%) will see their Dom drop by 0.7 ± 1.3 K/9 on average.
  • Pitchers whose Dom falls short of xDom by more than 1.5 K/9 (the lower 10%) will see their Dom increase on average by 0.7 ± 1.1 K/9.
  • Because of the weakness of the relationship, there is a large spread in individual outcomes.
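
The outlier tendencies above amount to a simple lookup on the gap between Dom and xDom. The sketch below just encodes the averages quoted in the list; the function name and return format are hypothetical, and it applies only to pitchers who clear the 100 IP filter.

```python
def expected_dom_change(dom, xdom, threshold=1.5):
    """Average next-year Dom change for xDom outliers, per the findings above.

    Returns (average change in K/9, quoted spread in K/9), or None when the
    pitcher is within the threshold and no trend was reported.
    """
    gap = dom - xdom
    if gap > threshold:
        return (-0.7, 1.3)  # Dom well above xDom: tends to fall
    if gap < -threshold:
        return (0.7, 1.1)   # Dom well below xDom: tends to rise
    return None


# Hypothetical pitchers.
print(expected_dom_change(10.5, 8.5))  # overperformer
print(expected_dom_change(6.0, 8.0))   # underperformer
print(expected_dom_change(8.0, 8.5))   # within 1.5 K/9 of xDom
```

Because the quoted spread is larger than the average change in both directions, a sizeable share of flagged pitchers will move the other way.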

So, while this seems like a good within-season model, let’s see whether we can build one that is better at predicting next year’s Dom.

Next Year’s Dom

Let’s see if we can use xDom along with Dom in Year 0 to predict Dom in Year 1. For a baseline, Dom by itself is well-correlated with next year’s Dom with R2 of 0.52. [Note: this is again weighted by the harmonic mean of IP in the two seasons.]

If we also include xDom, we only get a little better, with an R2 value of 0.53.

Finally, if we throw all the swing parameters we have at this, we can still only get up to 0.54. As we found last month with batters, adding in the swing metrics only slightly increases correlations and is probably not worth the work.

So, as we did last month, we’ll create one more model for next year’s Dom for the cases when we have three years of history:

The formula is:

Dom(Y1) = 1.51 + 0.49*Dom(Y0) + 0.24*Dom(Y-1) + 0.11*Dom(Y-2) - 0.039*Age(Y0) + 2.68*O-Swing%(Y0)

This gives a somewhat improved R2 value of 0.63.
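
The three-year formula can be applied as in the sketch below. It assumes O-Swing% is entered as a fraction (0.30 for 30%) and age in years; the function name, argument names, and sample pitcher are hypothetical.

```python
def next_year_dom(dom_y0, dom_ym1, dom_ym2, age_y0, o_swing_y0):
    """Projected Dom for Year 1 from three years of Dom, age, and O-Swing%.

    dom_y0, dom_ym1, dom_ym2: Dom (K/9) in Year 0, Year -1, Year -2
    age_y0: age in Year 0
    o_swing_y0: Year 0 O-Swing% as a fraction (assumed units)
    """
    return (
        1.51
        + 0.49 * dom_y0
        + 0.24 * dom_ym1
        + 0.11 * dom_ym2
        - 0.039 * age_y0
        + 2.68 * o_swing_y0
    )


# Hypothetical 28-year-old whose Dom has risen from 8.0 to 8.5 to 9.0.
projection = next_year_dom(9.0, 8.5, 8.0, 28, 0.30)
print(f"Projected Dom: {projection:.2f} K/9")
```

For this made-up pitcher the projection is about 8.55 K/9, below his most recent 9.0. The three Dom coefficients sum to 0.84, so the formula builds in some regression, and the most recent season carries roughly twice the weight of the one before it.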

Conclusions

For correlation with Dom within a season:

  • We reviewed the previous expected Dom formula, which used an exponential function of Contact%, and found that a linear fit now appears to work better.
  • Using all available swing metrics, we can improve our model’s correlation with this year's Dom.
  • Using the improved xDom, the difference between xDom and Dom was found to be a weak predictor of Dom change in the following year.

For predicting next year’s Dom:

  • A model that makes use of a single year of additional swing metrics doesn’t significantly improve the prediction of next year’s Dom.
  • With three years of Dom results, plus Age and last year’s O-Swing%, we can do about 20% better than using one year of data (R2 of 0.63 versus 0.52).

Many prediction systems already make use of three years of data. The addition of the swing metrics in this case gives only marginal improvements. As with last month, we conclude that a pitcher’s Dom tells us 99% of what we need to know about their skill.

For analysis of pitchers who can expect Dom improvements in 2019, check out Stephen Nickrand’s recent piece on expected Ctl & Dom gainers.



  For more information about the terms used in this article, see our Glossary Primer.