Projections Are Curvier Than Vampire Kate Beckinsale

Finally, a movie about Kate Beckinsale wearing a vinyl suit! Wait, what? That's not the plot of Underworld?
This article is only about projections, not curvy women, Kate Beckinsale, or vampires. I thought I should make that perfectly clear so as not to let anyone down.
My original intro post (which I deleted because it sucked) had some definitions of basic concepts that are easily overlooked when people present advanced statistics. One of those is the purpose, intent, and interpretation of projection systems. For the purpose of this article, a projection system is any scientific attempt to estimate future performance using past statistics. Some commonly referenced systems include CHONE, Marcels, Oliver, ZiPS, and PECOTA.
The purpose of a projection system is simple: we want to use all available data to get a feel for what to expect from a player. Projection systems have the advantage of internal consistency. What I mean by this is that all players are treated ‘objectively’ by the system and everything sums up correctly (e.g., expected runs scored equal expected runs allowed across the league). They also eliminate a number of biases an individual would suffer from. One of the biggest flaws with projection systems is that they are often blind to certain factors, usually injury, body type, or major changes that affect a player’s true talent level.
The intention of a projection system is often misunderstood. A projection system attempts to take a player’s estimated true talent level and parlay that into a distribution of future production. The numbers you see reported are merely the average expectation. Tom Tango had a fine illustration of this recently. His example demonstrates that if we project Albert Pujols to hit 31 home runs, what we’re really projecting is the average of a distribution of outcomes. Here’s a sample graph: the horizontal axis is home runs and the vertical axis is the likelihood of hitting that many, expressed as a percentage.
Recall, we’re working in a hypothetical world where Albert Pujols is predicted to hit 31 home runs. What our projection system would say if it could talk is “Albert Pujols might hit 10 home runs or he might hit 55. If I had to pick one number, I’m going to choose 31.” Therefore, the true intent of a projection system is to provide a distribution of possible outcomes and report its average, which for a roughly symmetric distribution is also about the most likely single outcome.
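To make the "distribution, not a number" idea concrete, here is a minimal sketch in Python. It is not how CHONE, PECOTA, or any real system works. It assumes a hitter with a known home run rate and a fixed 620 plate appearances (both numbers are made up so the average comes out to 31) and treats each plate appearance as an independent coin flip, which gives a binomial distribution.

```python
from math import comb

# Assumed inputs for illustration only, not taken from any real projection.
PLATE_APPEARANCES = 620
HR_RATE = 31 / 620  # chosen so the average works out to 31 home runs


def hr_distribution(plate_appearances, hr_rate):
    """Return a list whose entry k is the binomial probability of exactly k home runs."""
    return [
        comb(plate_appearances, k)
        * hr_rate**k
        * (1 - hr_rate) ** (plate_appearances - k)
        for k in range(plate_appearances + 1)
    ]


dist = hr_distribution(PLATE_APPEARANCES, HR_RATE)

# The single number a projection reports is the average of the distribution.
average_hr = sum(k * p for k, p in enumerate(dist))

# Even the "expected" total is individually unlikely.
chance_of_exactly_31 = dist[31]

# A band around the average captures far more of the probability.
chance_of_26_to_36 = sum(dist[26:37])

print(f"average home runs: {average_hr:.1f}")
print(f"chance of exactly 31: {chance_of_exactly_31:.1%}")
print(f"chance of 26 to 36: {chance_of_26_to_36:.1%}")
```

Even in this simplified world, hitting exactly the projected total is a long shot. A real projection has a wider spread than this sketch, because the player's true talent and playing time are themselves uncertain, which is how totals as far out as 10 or 55 stay on the table.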
Knowing that, you can probably already guess what I’m going to say about interpreting a projection. It’s important to realize that the numbers you look at when you read a FanGraphs CHONE projection are really the average of a distribution. The reported number is roughly the one the projection system thinks is most likely, but for every player there is a considerable range on either side that is also rather likely. For most players, the distribution behind a projection will look roughly like a normal bell curve. And remember, as I noted before, projection systems are often blind to certain things like injury status. As an obvious example, if you know that Joe Nathan blew out his UCL but the projection system doesn’t, then you know better than to expect 70 IP from him.
So what should we take away from this? You might be wondering at this point whether, and why, projections are useful if all they’re reporting is an average of a distribution. For us fans, the most important aspect of a projection system is that it helps inform our own internal expectations. Sometimes it can be used to confirm our own intuition and sometimes it can be used to temper our bullish opinions of our hometown heroes. The next time you look at a projection or someone quotes one to you, remember it’s simply the average of a range of possible outcomes. It’s a tool, not an answer.
___
A related aside:
Occasionally two fans of different teams get into an argument about which team or player is better. Sometimes these arguments are between what I’ll simply call a traditionalist and a stat-geek. In such arguments the stat-geek inevitably turns to projections to make his argument. Equally inevitably, the traditionalist retorts with a comment like “Oh, if you know all the outcomes then why do we play the game?” The fuel at the heart of this argument is a misunderstanding over what projections are. If you find yourself being the stat-geek in one of these arguments, take the time to explain what a projection is (assuming your adversary isn’t a 13-year-old troll).
___
Second aside:
If a projection system says Ryan Howard will hit 42 home runs and he hits 49, some people might say the projection was wrong. That is not necessarily true. From the graph above, we can intuit that 42 home runs might occur 18% of the time and 49 might occur 14% of the time. As long as those underlying percent likelihoods are accurate, then the projection was correct despite appearing to be off by 7 home runs.
We can tell if those percentages are right by looking league-wide, where individual misses in both directions should mostly cancel out. If the system expects 1,000 home runs and 993 are hit, then we know those percentages were pretty accurate. If 1,400 are hit, then we have to wonder whether the system is working as intended.
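The league-wide comparison is just arithmetic, sketched here in Python with the two totals from the example. The function name `league_total_error` is my own invention for illustration. Note this only checks whether the averages add up. It is a coarse sanity check and does not test the shape of each player's distribution.

```python
def league_total_error(projected_total, actual_total):
    """Relative gap between actual and projected league totals (0.0 is a perfect match)."""
    return (actual_total - projected_total) / projected_total


# The two scenarios from the text: 1,000 projected vs. 993 or 1,400 actually hit.
close_miss = league_total_error(1000, 993)
big_miss = league_total_error(1000, 1400)

print(f"993 hit:   {close_miss:+.1%} vs. projection")
print(f"1,400 hit: {big_miss:+.1%} vs. projection")
```

A gap under one percent is easy to write off as noise. A 40 percent gap suggests something systematic, such as a change in the run environment the system did not know about.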
___
Third and final aside:
I mentioned that most projections will have a roughly normal distribution. For those familiar with normal distributions, the following should make perfect sense. With an effective projection system, about 68% of players will finish within one standard deviation of their projection, about 95% within two standard deviations, and about 99.7% within three. The remaining handful will be waaay off. So when a projection misses on a player by a wide margin (think ’09 Ben Zobrist), that’s not a failure of the projection system, it’s a feature.
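The share of a normal distribution that falls within one, two, and three standard deviations can be computed directly with Python's standard library. The sketch below also shows how one might check a system against those shares. The `sample` list is entirely made up, and it assumes the system publishes a standard deviation for each player, which many do not.

```python
from statistics import NormalDist


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    standard_normal = NormalDist()
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


def coverage(results, k):
    """Fraction of players whose actual total landed within k standard deviations.

    results is a list of (projected, actual, standard_deviation) tuples.
    """
    hits = sum(
        1 for projected, actual, sd in results if abs(actual - projected) <= k * sd
    )
    return hits / len(results)


# Hypothetical (projected, actual, standard deviation) home run lines.
sample = [(31, 34, 6), (42, 49, 6), (20, 9, 5), (15, 16, 4)]

for k in (1, 2, 3):
    print(f"within {k} SD: expected {share_within(k):.1%}, observed {coverage(sample, k):.1%}")
```

With only four made-up players the observed shares mean nothing, of course. Over a full league of players, a well-calibrated system should see its observed shares drift toward the expected ones.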



And I thought elementary stats class was a waste… I just understood what you were saying!
TL;DR, not enough boobies.
I accepted the tradeoff.
With Ruckus on this one.
PECOTAAAAAAAAAAA!