The Greatest Danger…

Because Chris insists my posts begin with a picture

Using baseball analytics in a blog setting is hazardous work.  Hell, doing any statistical analysis in any setting is rife with danger.  No, not the palpable danger of an axe murderer on the loose.  Or the kind that arises when you’re stuck on a jungle island with a bunch of hungry dinosaurs during a hurricane and the power’s out because Newman from Seinfeld was being a dick.  No, the danger is much more mundane: it’s data presentation and the conclusions that are drawn from the data.  I’m currently putting the finishing touches on a post about Roy Halladay and Johan Santana.  There’s no doubt in my mind that Doc is better than Johan.  The numbers support me, scouting data supports me, many non-Mets/Twins/Phillies/Blue Jays fans agree with me (you know, the fans without any stake in the argument), yet the malaise of uncertainty lurks.

The purpose of my post is to show Doc’s superiority using a couple of popular metrics that are better than ERA and W-L.  The reason I chose to do that is not because of some vicious dislike of Johan Santana, but because I saw an opportunity to educate the Phillies fans that visit this site by means of disparaging the Mets.  We love bashing the Mets, right?  My goal is to give the reader another tool to do it while hopefully piquing their interest.  Some people have to be convinced that analytics are a good thing…

The first step in my analysis was to confirm my hypothesis that Doc was better than Johan.  I headed over to FanGraphs and put Johan and Roy side by side.  After perusing the page, I was satisfied with my hypothesis and plucked the data I intended to share on this blog.  The data as it’s presented is clear and is not intentionally misleading.  However, because I’m not trying to write a book or blow your mind, it is a simple, one-dimensional snapshot of the two players.  I picked out the pitching metrics that I think are a nice first step to working with advanced stats and I presented them.  And because it’s a blog, I drew a conclusion.  And it’s misleading.  So to summarize, it’s not intentionally misleading, but it is misleading.

It’s misleading because I take my one-dimensional data and I draw a conclusion from that data.  It looks like a strong conclusion.  It looks indisputable.  The numbers I have say that Roy Halladay is much better than Johan Santana.  That result scares me because it’s not true.  The numbers I presented don’t consider that Johan is friggin masterful at stranding baserunners.  Most pitchers, including Halladay, exhibit little to no control over the number of baserunners that score.  Johan does.  He also has a strong track record of suppressing opponents’ BABIP (batting average on balls in play).  This means that fewer runners will get on base via balls in play, and fewer of those runners will score, than the stats I presented would expect.  Which puts Johan an awful lot closer to Halladay (still worse, mind you; you just have to squint a little more to see it).
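To make that blind spot concrete, here is a rough Python sketch using the commonly published formulas for FIP, BABIP, and strand rate (LOB%).  The two stat lines are made up for illustration and are not Halladay’s or Santana’s real numbers, and the FIP constant of 3.10 is an assumed placeholder, since the real constant changes with the league and season.  The point is only that FIP never looks at hits or runs, so two pitchers can match on FIP while differing on exactly the skills described above.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching: built only from homers, walks,
    hit batters, and strikeouts. `constant` is an assumed league value."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def babip(h, hr, ab, k, sf):
    """Batting average on balls in play allowed."""
    return (h - hr) / (ab - k - hr + sf)


def lob_pct(h, bb, hbp, r, hr):
    """Strand rate: share of baserunners who did not come around to score."""
    return (h + bb + hbp - r) / (h + bb + hbp - 1.4 * hr)


# Hypothetical stat lines (NOT real player data). Both pitchers share the
# same HR, BB, HBP, K, and IP; pitcher_b simply allows fewer hits and runs.
pitcher_a = dict(hr=20, bb=40, hbp=5, k=200, ip=220, h=200, ab=830, sf=5, r=85)
pitcher_b = dict(hr=20, bb=40, hbp=5, k=200, ip=220, h=180, ab=810, sf=5, r=70)


def summarize(p):
    """Return (FIP, BABIP, LOB%) for one stat line."""
    return (
        fip(p["hr"], p["bb"], p["hbp"], p["k"], p["ip"]),
        babip(p["h"], p["hr"], p["ab"], p["k"], p["sf"]),
        lob_pct(p["h"], p["bb"], p["hbp"], p["r"], p["hr"]),
    )


fip_a, babip_a, lob_a = summarize(pitcher_a)
fip_b, babip_b, lob_b = summarize(pitcher_b)

for name, (f, b, l) in (("A", (fip_a, babip_a, lob_a)), ("B", (fip_b, babip_b, lob_b))):
    print(f"Pitcher {name}: FIP={f:.2f}  BABIP={b:.3f}  LOB%={l:.1%}")
```

Run it and the two FIP values come out identical, while pitcher B shows the lower BABIP and the higher strand rate.  A FIP-only comparison would call them equals, which is the kind of thing a one-dimensional snapshot hides.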

I guess the point of this post is this: be skeptical.  Whenever you read an analysis, ask yourself good questions.  Stuff like: Why does the author say that FIP, xFIP, and tERA are better than ERA?  What is being left out in an effort to simplify the post?  What might the author be completely missing?  Why don’t I want to look at Wins and Losses?  If you don’t have the answers to those questions, leave them in the comments.  If a blogger isn’t happy to answer those questions (or at least direct you to where they have been answered before), then (s)he probably isn’t worth reading.

My goal is to present statistics in an engaging and interesting way.  Be prepared to be unintentionally misled.
