Photo by Pierre-Etienne Vachon on Unsplash

2022 Machine Learning Baseball Projections: A Look Back at Things to Come

John Pette
6 min read · Apr 15, 2022

We’re back! Happy baseball season, everyone. Last year, I published my foray into building baseball projections using machine learning models. I did this in two parts:
1. Baseball and Machine Learning: A Data Science Approach to 2021 Hitting Projections
2. Baseball and Machine Learning Part 2: A Data Science Approach to 2021 Pitching Projections

I updated my models slightly this year, but mainly wanted to expand the input data set so that the most recent season was a normal(ish), full season, rather than the 60-game 2020 season that played out while the world was falling apart. So my models this year basically reflect the same methodology I discussed last year. I already have some thoughts about how to improve the models for next year. I’ll probably work on those during the year and will hopefully do a write-up before the season starts next time.

For now, I want to take a look at how my projections did vs. successful, published projections and then lay out where this year’s differences were with those same published projections.

2021 Hitter Outliers: Results

To recap, last year, I ran my projections through my scoring system, and did the same for THE BAT projections, in order to see where the major differences were. This is what I had. First, the players I projected to outperform THE BAT.

2021 Hitters Projected to Outperform THE BAT

Table by author
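For readers who want to try this kind of comparison themselves, here is a minimal sketch of the idea: score both sets of projections on the same scale, drop players under a playing-time threshold, and rank by the gap. It is not my actual pipeline. The `Projection` fields, the `find_outliers` helper, the list length of 15, and every number in the sample are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Projection:
    player: str
    playing_time: float  # projected plate appearances (hitters) or innings (pitchers)
    my_score: float      # score of the ML projection (hypothetical scale)
    bat_score: float     # score of THE BAT projection on the same scale


def find_outliers(projections, min_playing_time, top_n=15):
    """Return (outperformers, underperformers) ranked by score gap.

    Players below the playing-time threshold are dropped first, so
    small-sample projections cannot dominate either list.
    """
    eligible = [p for p in projections if p.playing_time >= min_playing_time]

    def gap(p):
        return p.my_score - p.bat_score

    # Largest positive gaps first: the model likes these players more than THE BAT does.
    over = sorted((p for p in eligible if gap(p) > 0), key=gap, reverse=True)
    # Most negative gaps first: the model likes these players less than THE BAT does.
    under = sorted((p for p in eligible if gap(p) < 0), key=gap)
    return over[:top_n], under[:top_n]


# Placeholder players and numbers, invented purely for illustration.
sample = [
    Projection("Hitter A", 600, 12.0, 9.5),
    Projection("Hitter B", 550, 7.0, 10.0),
    Projection("Hitter C", 200, 15.0, 8.0),  # below the playing-time threshold
    Projection("Hitter D", 400, 8.0, 7.5),
    Projection("Hitter E", 500, 5.0, 6.0),
]

over, under = find_outliers(sample, min_playing_time=350)
print([p.player for p in over])
print([p.player for p in under])
```

With a threshold in place, a part-time player like "Hitter C" never reaches either list, no matter how large the gap between the two projections is.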

Huh. Well, that doesn’t look very good at all. I’m calling that 2–12–1. Maybe we can be kind and give me the win on Crawford to make it 3–12. Not great. I was pretty sure a few of these would fall flat. I had zero faith in that Aquino projection. Many of the others at the top of the list had obvious playing time concerns, and the model was not going to pick those up. I felt pretty good about the Grichuk, Crawford, and Walker picks, though. Walker was hurt all year, so maybe we can give that one a mulligan.

I do have to remind myself that I am going up against projections that outperform most (or all) others. Still…that’s not satisfying. What about the hitters I projected to underperform?

2021 Hitters Projected to Underperform THE BAT

Table by author

A little better here: 5–8–2. As I wrote last year, the model punished poor 2020 performance fairly heavily. It punished 2020 injuries quite a bit, too. I thought it would do better with Stanton and Judge, but they ended up playing more than predicted. Neither of these was a particularly great performance, but the model was definitely better at predicting underperformers than it was at picking winners.

2021 Starting Pitcher Outliers: Results

Okay, full disclosure here…I screwed up this section of my pitching article from 2021. Kind of severely. My only explanation is that this was the last thing I did and I was fried by this point in the process. Basically, I pulled from the wrong column for THE BAT numbers. My scores were correct, but THE BAT scores were completely wrong, and, consequently, the differences between the two sets were also completely wrong. I have redone these tables below and added the results. OK, then. Here are the ones where my projections predicted pitchers would outperform THE BAT. For this analysis, I only considered pitchers who were projected to throw at least 100 innings.

2021 Starting Pitchers Projected to Outperform THE BAT

Table by author

I cut a few entries from this section for the pitchers who missed the whole year with injuries (mainly those we knew going into the year: Strasburg, Syndergaard, and Sale). If you think I am being too liberal calling some of these “pushes”, well…you may be right. The way I was thinking about them, though, was where the players outperformed THE BAT projections significantly, even if they were still substantially short of my numbers. Honestly, I could probably even call those wins by that logic, but I’m leaving them in the push column. (Also, Woodruff should totally be a “W”, but, ironically, his win total kept that from happening. I’ll own it, but I choose to be bitter.) Right. So, what’s that, 3–8–4? Again, not spectacular, but the projections certainly identified a few pitchers here who outperformed the published numbers.
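My win/loss/push calls were judgment calls, but the logic can be approximated with a simple rule. The sketch below is one hypothetical way to formalize it, not the exact method I used. It measures how far the actual result moved from THE BAT's number toward mine. The `grade` and `record` helpers and the 25% `push_band` cutoff are assumptions for illustration, and the scores are made up.

```python
from collections import Counter


def grade(my_score, bat_score, actual_score, push_band=0.25):
    """Grade one pick as 'W', 'L', or 'P' (push).

    progress is the fraction of the distance from THE BAT's score to my
    score that the actual result covered. It works the same way for
    outperform picks (my_score > bat_score) and underperform picks
    (my_score < bat_score).
    """
    gap = my_score - bat_score
    if gap == 0:
        raise ValueError("The two projections agree, so there is no pick to grade.")
    progress = (actual_score - bat_score) / gap
    if progress >= 1 - push_band:
        return "W"  # the result landed at or near my number
    if progress <= push_band:
        return "L"  # the result landed at or near (or beyond) THE BAT's number
    return "P"      # clearly past THE BAT, but well short of my number


def record(grades):
    """Format grades as a wins-losses-pushes record, e.g. 3-8-4 (with en dashes)."""
    counts = Counter(grades)
    return "\u2013".join(str(counts[key]) for key in ("W", "L", "P"))


# Made-up scores for illustration only.
grades = [
    grade(10, 6, 9.5),  # outperform pick that nearly hit my number
    grade(10, 6, 8),    # outperform pick that landed halfway
    grade(10, 6, 5),    # outperform pick that fell below THE BAT
    grade(4, 8, 4.5),   # underperform pick that nearly hit my number
]
print(record(grades))
```

A wider `push_band` makes the rule stricter about wins and losses and sends more picks to the push column, which is roughly the leniency question I was wrestling with above.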

The underperformer projections chart is a little tough. Due to the 100 projected inning stipulation, I did not really have enough pitchers whose scores were that far below those of THE BAT. Really, I had 15, but seven of those ended up missing too much of the year to qualify (for the record, those were Strahm, Porcello, Anibal Sanchez, Soroka, Kyle Wright, Leake, and Lucchesi). As a result, the final table is a little sparse, but this is where we landed:

2021 Starting Pitchers Projected to Underperform THE BAT

Table by author

4–4. Not bad at all. Except for Walker Buehler. That one was a pretty huge miss. I actually did not really buy into the projection for him, but I have to report it. The pattern we saw with hitters mostly held for pitching: the ML models were much better at flagging underperformers than they were at pinpointing exceptionally good performers.

2022 Projections

So how are we going to look this year? This is what my latest models say…

2022 Hitters Projected to Outperform THE BAT

Table by author

My models just loved Jeimer Candelario this year. Not totally sure what’s up there. They also loved a bunch of backup catchers, but I set the threshold for this analysis at 350 projected plate appearances, which keeps them off this list. Still, if Zack Collins, Victor Caratini, or Jonah Heim kill it this year, well, my models had an idea about that.

2022 Hitters Projected to Underperform THE BAT

Table by author

This list is terrifying to publish. There are some big-time names on there. The Trout projection is clearly leaning on his recent missed time. I have no idea how Juan Soto ended up here. I won’t lie: I thought about cutting this list off at 14 just to save face on that one, but it is all about integrity, dear reader.

2022 Starting Pitchers Projected to Outperform THE BAT

Table by author

My cutoff threshold for the pitching projections was 100 projected innings. Clearly, this was completed before deGrom’s injury. Actually, the original model projected him at about 90 innings and I adjusted it to 150 in my final projections. Oops. I probably should have used the final model numbers and dropped him out, but here we are. The model loves Woodruff again. Makes sense.

2022 Starting Pitchers Projected to Underperform THE BAT

Table by author

So there we go. Happy baseball season. Let’s see how it plays out.

John Pette

Data Scientist at NYU | Ex-U.S. Diplomat | Music Historian and Discographer | Armchair Sociologist | One-time Chemist. None of it makes sense to me either.