MASTER NOTES: Statcast and me, part 1 of many (I hope)

In the July 22nd edition of the BaseballHQ Radio podcast, I talked at length with Cory Schwartz, the Director of Stats for MLB Advanced Media. Cory is part of the team that has developed and installed a lot of the advanced systems for gathering ever more granular and detailed info about player performance.

The latest system is Statcast, which uses a series of ultra-high-res optical cameras and radar equipment to accurately track the location and movements of every player on the field, including batter, pitcher, all defenders and baserunners. They don’t do the drunk guy who jumps out of the bleachers and staggers around the outfield, or the security guys who run him down.

(Not yet, anyway.)

Statcast provides the 30 clubs, MLB broadcasters and, increasingly, everyday fans and baseball researchers with a vast amount of information. Much of it is proprietary—the clubs do their own data analysis and keep their results to themselves.

But there’s a lot of info we can use, too. Data from an earlier data-collection system, PITCHf/x, is widely available. And we can all access some early data from Statcast through the BaseballSavant.com website.


I’m pretty new to that site and still just nosing around, but I’ve already dug out some interesting data on “exit velocity” (EV), the speed of the ball off the bat once it is struck.

I wanted to see how strongly EV explains HR performance. I started by downloading exit velocity data on every hitter with more than 30 “Batted Ball Events” (BBE), which MLB defines as “any batted ball that produces ... an out, a hit, or an error.” I narrowed the resulting list of more than 500 hitters down to the 270 whose HR/600PA (HR600) this season is 20 or more.
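For readers who would rather script that filtering step than do it in a spreadsheet, here is a minimal Python sketch. The field names (`bbe`, `pa`, `hr`) and the sample rows are hypothetical, not Savant's actual export columns, and HR600 is assumed to mean home runs scaled to 600 plate appearances.

```python
def hr_per_600(hr, pa):
    """Home runs scaled to a 600-plate-appearance season."""
    return 600.0 * hr / pa


def qualifying_hitters(hitters, min_bbe=30, min_hr600=20.0):
    """Keep hitters with more than min_bbe BBE and an HR600 of at least min_hr600."""
    kept = []
    for h in hitters:
        if h["bbe"] > min_bbe and h["pa"] > 0:
            rate = hr_per_600(h["hr"], h["pa"])
            if rate >= min_hr600:
                kept.append({**h, "hr600": rate})
    return kept


# Made-up rows for illustration only; not real player data.
sample = [
    {"name": "Hitter A", "bbe": 210, "pa": 400, "hr": 20},  # HR600 = 30.0
    {"name": "Hitter B", "bbe": 250, "pa": 450, "hr": 9},   # HR600 = 12.0
    {"name": "Hitter C", "bbe": 25, "pa": 60, "hr": 4},     # too few BBE
]

for h in qualifying_hitters(sample):
    print(h["name"], round(h["hr600"], 1))
```

Only "Hitter A" survives both cuts in this toy sample.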

The first step many researchers use when assessing the connection between two variables (in this case, EV and HR) is to run a quick Excel correlation between them. Correlation values run from -1.0 to +1.0. A -1.0 is a perfect negative correlation: as one variable goes up, the other goes down in perfect proportion. A +1.0 is the opposite, a perfect positive correlation where the two variables rise proportionately together. A 0.0 is no correlation: the two variables show no linear relationship to each other.

The Statcast batter data available at Savant offer three EV measures for each hitter: Maximum, Minimum, and Average. I ran correlations for all three against HR600. Average EV had a weakish correlation of 0.36. Maximum EV, surprisingly, was weaker than Average EV, at 0.24. And Minimum EV, not surprisingly, had almost no correlation at all, at 0.09.
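The correlation used here is the standard Pearson coefficient, which is what Excel's CORREL function returns. A minimal Python sketch of the same calculation follows; the four data points are made up for illustration and are not drawn from the Savant data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list has no variation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical average EVs (mph) and HR600 values for four hitters.
avg_ev = [86.0, 88.0, 90.0, 92.0]
hr600 = [22.0, 21.0, 27.0, 30.0]

print(round(pearson_r(avg_ev, hr600), 2))
```

Running the same function once per EV measure (Average, Maximum, Minimum) against HR600 would reproduce the kind of comparison described above.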

Fortunately, the data set also includes EV by batted-ball type, one for GBs and another for LDs and FBs combined. And the EV for this latter trajectory correlated with reasonable strength, at 0.53. Strictly speaking, the share of variation explained is the square of the correlation, so LD/FB EV accounts for about 28% of the differences in HR600 among these hitters. I tried to improve the correlation by multiplying in each hitter’s FB percentage and his FB+LD percentage, but the correlation did not improve.

From some external reading, I learned that deeper research into the relationships among Statcast batted-ball data has found that EV for HRs is usually above 95 mph. So it seems intuitive that batters with a lot of BBEs with FB/LD EV at or over 95 mph should generate more HRs than batters with fewer.

So I sorted the hitters high-to-low using their Average FB/LD EVs. And whaddya know: the top decile of Average FB/LD EV included sluggers like Nelson Cruz, Khris Davis, Chris Carter, Jake Lamb, Josh Donaldson, Giancarlo Stanton, Mark Trumbo, David Ortiz, Jose Bautista, Miguel Cabrera, Miguel Sano, Yoenis Cespedes and Joc Pederson.
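The sort-and-slice step can be sketched in a few lines of Python. The `fbld_ev` field name and the 20 generated hitters are hypothetical placeholders, and "decile" is taken here simply as the first or last tenth of the ranked list.

```python
def ev_decile(hitters, key="fbld_ev", top=True):
    """Return the top (or bottom) tenth of hitters ranked by the given EV field."""
    ranked = sorted(hitters, key=lambda h: h[key], reverse=top)
    size = max(1, len(ranked) // 10)  # always return at least one hitter
    return ranked[:size]


# Twenty made-up hitters with average FB/LD EVs from 85.0 to 94.5 mph.
hitters = [{"name": f"Hitter {i}", "fbld_ev": 85.0 + 0.5 * i} for i in range(20)]

print([h["name"] for h in ev_decile(hitters)])             # highest EVs
print([h["name"] for h in ev_decile(hitters, top=False)])  # lowest EVs
```

With 270 hitters, the same function would return 27 names at each end of the list.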

Given that, it seemed possible that we might expect more HR production from such high-EV hitters as Justin Bour, Eric Hosmer, Christian Yelich and Tommy Pham.

At the same time, we might want to be suspicious of the HR production of batters with low Average FB/LD EV, like Adam Duvall, Brandon Moss, Khris Davis and (gulp) Bryce Harper.

But I suspect you might have some questions about all this. Like, “Huh? What? Bryce Harper? Huh?”

And well you might ask. Remember that the 0.53 correlation means EV is telling only part of the HR story. Research has shown another Statcast metric called “launch angle” (LA) is also critical to HR production. LA is the vertical angle at which the ball leaves the bat. Parallel to the ground is 0˚, straight up is 90˚, and straight down is -90˚. The optimum LA for HRs is a matter of some dispute among analysts, but a consensus seems to have formed setting the LA range for HRs at 20˚ to 35˚ (some say it’s more like 25˚ to 30˚).

So it’s possible that there could be low-average-EV batters who hit a lot of optimal-angle FBs with gusto, and who, as a result, hit more HRs than their average EV would imply. Conversely, some of the high-average-EV batters with low HR production might be hitting too few of their high-EV flyballs at that 20˚-35˚ LA, or hitting too few of those optimally angled FB with enough EV.
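To make the velocity-angle combination concrete, here is a small Python sketch of a check for whether a single batted ball lands in that zone. The 95-mph floor and the 20-to-35-degree window come from the discussion above; the function name, the constants and the three sample batted balls are illustrative assumptions, and the thresholds are left as parameters because analysts disagree on them.

```python
MIN_EV_MPH = 95.0            # EV floor cited for most HRs
LA_WINDOW_DEG = (20.0, 35.0)  # the broader consensus launch-angle range


def in_hr_zone(ev_mph, la_deg, min_ev=MIN_EV_MPH, la_window=LA_WINDOW_DEG):
    """True if a batted ball has both enough exit velocity and a suitable launch angle."""
    low, high = la_window
    return ev_mph >= min_ev and low <= la_deg <= high


print(in_hr_zone(96.5, 29.0))   # hard enough, good angle
print(in_hr_zone(107.0, 39.0))  # plenty of EV, but the angle is too steep
print(in_hr_zone(90.0, 28.0))   # good angle, not enough EV
print(in_hr_zone(96.5, 22.0, la_window=(25.0, 30.0)))  # fails the narrower window
```

A ball outside the zone can still leave the park, of course; the check only flags the combination most likely to produce HRs.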

Unfortunately, the Statcast data at Savant don’t include aggregated launch-angle information on each player. (And even if they did, a count of batted balls in the optimal range would be more useful than an average.) You have to click on an individual player and dig into his BBE record to check his LAs. I did that, randomly selecting one seeming HR under-performer (Christian Yelich) and one seeming over-performer (Adam Duvall).

Here’s what I saw:

Yelich had 228 BBEs. Of those, 21 had both the 95-MPH EV and the optimum angle. Seven of those were HRs. Seven more were extra-base hits, six doubles and a triple, and six flew more than 350 feet. He had one HR that had the perfect 29-degree angle and a robust 96.5-MPH EV.

Duvall had 205 BBEs, of which 38 had the optimal velo-angle combo. Of those 38, 21 went yard. He had another HR that had a bit too high a LA (39˚) but made the seats because of its thunderous 107-MPH EV. His other optimals resulted in six doubles, a triple, and four flyouts of more than 370 feet (one of them a 409-foot moonshot to dead-center in Coors Field).
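Turning those counts into rates makes the gap between the two hitters easier to see. The short Python sketch below uses only the totals quoted above; the dictionary layout and function name are just for illustration.

```python
# Counts taken from the two player summaries above.
players = {
    "Yelich": {"bbe": 228, "optimal": 21, "optimal_hr": 7},
    "Duvall": {"bbe": 205, "optimal": 38, "optimal_hr": 21},
}


def zone_rates(bbe, optimal, optimal_hr):
    """Return (share of BBEs in the optimal zone, share of those that were HRs)."""
    return optimal / bbe, optimal_hr / optimal


for name, counts in players.items():
    zone_share, hr_share = zone_rates(**counts)
    print(f"{name}: {zone_share:.1%} of BBEs in the zone, {hr_share:.1%} of those went out")

# Yelich: 9.2% of BBEs in the zone, 33.3% of those went out
# Duvall: 18.5% of BBEs in the zone, 55.3% of those went out
```

Duvall put roughly twice as large a share of his batted balls in the zone, and a larger share of those left the park.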

These examples suggest that Duvall, as an individual hitter, is full value for his HRs, despite his low average EV, because he hits many balls in that optimal combination zone. Similarly, we might expect Yelich as an individual hitter to pick up the HR pace, although we’ve been waiting for that a while already.

Of course there are other factors at play: Yelich plays his home games in a terrible park for LHH HRs, and he also plays divisional games in WAS and ATL, which also suppress LH power. Duvall’s Cincinnati home is one of the most homeriffic in MLB, and he gets a healthy number of road ABs at Wrigley Field and Miller Park, which also boost RH power.

These data look to have tremendous potential to identify hidden or sleeper power sources. Regrettably, we can’t yet use the Statcast data from Savant to check all the hitters quickly. For the moment, we need to identify HR outliers based on aggregated FB/LD EV data, then go into the individual player records.

That is, until we can query the database for batters with lots of BBEs in that HR velocity-angle sweet spot, which would let us identify and assess outliers high and low more accurately and more quickly. I can hardly wait.
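If batted-ball-level records ever become available in bulk, that kind of query might look something like the Python sketch below. Everything about the input is an assumption: the `(batter, ev_mph, la_deg)` tuple layout is hypothetical, and the rows are made up.

```python
from collections import Counter


def sweet_spot_counts(batted_balls, min_ev=95.0, la_low=20.0, la_high=35.0):
    """Count each batter's BBEs in the velocity-angle zone, most first."""
    counts = Counter()
    for batter, ev_mph, la_deg in batted_balls:
        if ev_mph >= min_ev and la_low <= la_deg <= la_high:
            counts[batter] += 1
    return counts.most_common()


# Made-up batted balls: (batter, exit velocity in mph, launch angle in degrees).
rows = [
    ("Hitter A", 101.2, 27.0),
    ("Hitter A", 98.4, 31.0),
    ("Hitter A", 88.0, 25.0),   # too soft
    ("Hitter B", 104.0, 12.0),  # too flat
    ("Hitter B", 96.0, 22.0),
    ("Hitter C", 93.0, 30.0),   # too soft
]

for batter, n in sweet_spot_counts(rows):
    print(batter, n)
```

Batters with no qualifying batted balls simply drop out of the list. Comparing each batter's count against his actual HR total would then surface the outliers in both directions.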

In the meantime, even if you aren’t a number cruncher, go take a look at the data for yourself. The full URL is baseballsavant.mlb.com. Start with the Statcast Leaderboard link and go from there. Have fun!


  For more information about the terms used in this article, see our Glossary Primer.