Saturday, March 28, 2020

The Runs Saved Controversy at BillJames.com

This month, in a new series of statistical analyses, Bill James introduced some ideas about defensive measurement that were new to most of his readers, anyway.  His posts at BillJames.com and the comments on them can't be read unless you subscribe to the site, but I'm repeating a few excerpts for non-commercial use only (I hope he won't mind that), and in any case, this post is written mainly for other subscribers to the site.

In the post that kicked off this controversy, Bill wrote:

"It is not a perfect and unassailable truth that Offense and Defense are perfectly balanced, that Scoring Runs is half the game and preventing them is half the game.   It is not a perfect and unassailable truth, but it is a general and usable truth which can be validated in various ways.   If offense and defense are equal then, on a "league" basis—understanding that the league is no longer a completely self-contained entity—but on a league basis, runs prevented are equal to runs scored.  If there were 11,449 Runs SCORED by National League teams in 2019, there must also have been 11,449 Runs PREVENTED by National League teams—not perfectly, because the league winning percentage was not exactly .500, but we can adjust for that.  The question is, who prevented how many of those 11,449 Runs that were Prevented by Defensive Performance?"

In the comments section to that article, nine different people, including myself, stated either that they did not understand this argument or that they did not agree with it.  Several others said they did agree with it, but the majority of respondents certainly did not.

Bill replied a few days later with a long rant (there's really no other word) in another post, informing all of us who had questioned this conclusion, in essence, that he had done everything he could to help us understand this argument, about which he evidently had no second thoughts, and that if we didn't get it, it was our problem.  He also described any "argument" or "challenge to my work" as an "asshole question" to which he wasn't going to respond, and disclaimed any interest in any opposing arguments that we might make.

If Bill doesn't want to read this post of mine, that's his business.  I am writing it for the other posters on his site whom he essentially ordered not to continue the discussion there.  I think that the above statement is wrong, as is a follow-up statement he made in a later post that we will get to, and I want to explain why and solicit comments from other readers on what they think.

Let's start with some simple logic.  Let's look at the paragraph quoted above. First I want to clarify a point that the paragraph leaves vague, and which I have run into myself: the reason league runs scored don't equal league runs allowed nowadays is interleague play.  One league always scores more runs than the other in interleague play, and that unbalances each league's totals.  That's a minor point.

But what about the statement that runs prevented must equal runs scored?  That, it seems to me, is obviously wrong, for at least two reasons.  The first is the simplest and most important.  Runs allowed are not equal to runs prevented. Runs allowed are equal to RUNS NOT PREVENTED.  That, to me, is so obvious that any further arguments are extra.

Yet there are further arguments.  Bill also argues that what he is trying to do is to disaggregate runs prevented in the same way that the runs created formula disaggregates runs scored.  It is true that every run scored is scored as a result of hits, walks, stolen bases, and a few other miscellaneous things, and no runs would be scored in the absence of those contributing factors.  But runs prevented or runs saved are not comparable in that respect, particularly with respect to fielders.  One could argue that they are comparable for pitchers, since pitchers could indeed prevent every single run scored by the opposition by striking everyone out, or by inducing easy chances in the field. But for fielders it isn't true.  There are very large numbers of runs that no fielder could prevent, which are scored thanks to walks, to home runs, or--critically--to balls in play which no fielder could possibly turn into an out.  The offense is a factor in every run scored.  The defense (the fielders) is not a factor in every run allowed--if the offense was good enough, there wasn't a damn thing the defense could do about it.  (Come to think of it--the same argument does in a sense apply to pitchers as well as to fielders.  While pitchers could in theory strike everyone out or pitch nothing but no-hitters, no one has ever been that good, any more than any three outfielders have ever been good enough to turn every ball hit beyond the infield into an out.  But that's a side issue.)

Another fallacy in Bill's thinking emerges in a third, most recent post.  Analyzing the 2019 Houston Astros, Bill calculates that based on the league averages of runs scored/allowed, they could have been expected to allow 840.32 runs.  He then says:

"The 'zero point' for them is twice that number.  If they had allowed twice that number of runs, that would be 1680.64 runs allowed.   They actually allowed only 640 runs, or 1040.64 runs less than they theoretically might have allowed, had they had zero talent on their pitching staff and in their defensive play."

What Bill seems to be doing here is to find a baseline for calculating actual runs saved that is different from the average number of runs scored/allowed by every team in the league, a baseline which he repeatedly rejects.  (And, for the record, one which I, along with certain other sabermetricians, do use.)  The selection of a "zero point" that is twice the park-adjusted league average, however, seems completely arbitrary.  In fact, as one other commenter said on the first post, a team with zero defensive talent would never retire a batter and would allow an infinite number of runs.  A team whose pitchers and fielders were half as good as an average team--that is, that walked twice as many men, struck out half as many, and allowed twice as many hits of all kinds--would, it seems to me (I haven't tried to do the whole calculation), allow twice as many runs as the average.  That's a very bad team--I don't think there has ever been a major league team that bad--but it isn't an infinitely bad one, or a team with no talent at all.
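
For readers who want to check the arithmetic in the passage quoted above, here is a minimal sketch in Python. The 840.32 and 640 figures come from Bill's post; the function name and the `multiplier` parameter are my own illustrative choices, not anything Bill uses, and the multiplier of 3 is purely hypothetical. It only reproduces the subtraction and shows how much the total depends on where the "zero point" is placed.

```python
def runs_saved_from_zero_point(expected_allowed, actual_allowed, multiplier=2.0):
    """Return (zero_point, runs_saved) when the zero point is set at
    multiplier * expected runs allowed. multiplier=2.0 is the choice
    described in the quoted passage; other values are hypothetical."""
    zero_point = multiplier * expected_allowed
    return zero_point, zero_point - actual_allowed


EXPECTED_ALLOWED = 840.32  # park-adjusted expectation quoted for the 2019 Astros
ACTUAL_ALLOWED = 640       # runs the Astros actually allowed, per the quote

zero_point, runs_saved = runs_saved_from_zero_point(EXPECTED_ALLOWED, ACTUAL_ALLOWED)
print(f"zero point: {zero_point:.2f}")              # 1680.64
print(f"saved vs. zero point: {runs_saved:.2f}")    # 1040.64

# The same team measured against the league-average baseline instead.
runs_saved_vs_average = EXPECTED_ALLOWED - ACTUAL_ALLOWED
print(f"saved vs. average: {runs_saved_vs_average:.2f}")  # 200.32

# A hypothetical multiplier of 3 gives a very different total,
# which is the sense in which the choice of 2 looks arbitrary.
_, runs_saved_triple = runs_saved_from_zero_point(EXPECTED_ALLOWED, ACTUAL_ALLOWED, 3.0)
print(f"saved vs. tripled baseline: {runs_saved_triple:.2f}")
```

Nothing in this sketch says which baseline is right; it only shows that the headline "runs saved" figure is set almost entirely by the multiplier one picks.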

I am not going to comment on the way Bill has chosen to handle this controversy.  I have said many times in print that I understand a great many things thanks to him, that his work has given me many hours of pleasure, and that the baseball books I have written never would have been written without him.  I will say,  however, that from my own experience in my own career as an historian--which is quite comparable to his career as a sabermetrician, as you can see if you want at ALifeinHistory.com--I know that no level of skill, no amount of work, can exempt anyone, in any field, from criticism, particularly if one's work is genuinely original.  And no truly intelligent person should ever be afraid to admit that they might have been wrong, as Bill has many times in the past.

Feel free to comment!


3 comments:

  1. David,

    Thanks for this post. I have one million thoughts on this topic... so I'm going to try to stay organized here. I agree with you on this topic, particularly the following: "Runs allowed are not equal to runs prevented. Runs allowed are equal to RUNS NOT PREVENTED." That was my gut instinct. With that said, I will talk about a few other things.

    1. I'm willing to give Bill the benefit of the doubt, and see what the research says in the end. So I'm not really complaining about what he's done so far; at this point, he's just taking us through his thought process, which I'm fine with.

    2. I think that what he's done here is so similar to what he did with the idea of "marginal runs" in Win Shares that I'm not actually sure why there are people who defend Win Shares on his site who suddenly disagree with this approach. It's very, very similar... identical, one could say.

    3. The real fault of Win Shares is in its analysis of defense. The "worst" a defensive player could be was 0... that's obviously untrue. Defensive players can be FAR worse than 0, which leads to a systemic overrating of big-bat-no-glove players by that system. Bill sees that, I think, as a function of comparing things to average. The problem was not comparing to average; the problem was that anything below 0, Bill counted as a 0. You can actually SEE this problem all over his explanation of Win Shares, and it shows up in everything he's done with it. (It's the corollary to not publishing Loss Shares, which would completely fix the Win Shares methodology - because then you could compare players to replacement and/or average.)

    4. You point out, with that 2019 Astros example, that Bill has made a new fallacy - with the 840 and 1680 runs. Actually... that's not new. That's the same thing he's been talking about the whole time. First, you calculate average runs, given ballpark. Actual runs minus expected runs gives you runs scored (duh); 2 times expected runs is the "zero" for runs allowed, and you count DOWN from that one. Of course this doesn't really work for a million reasons (play around with the Pythagorean theorem at the extremes of runs scored and runs allowed, and you'll find it's true with those, too). But it probably WILL work in the end, because, for like 95% of teams in history, a run scored and a run saved are basically the same thing.

  2. 5. But THAT is where we get to the heart of the problem with Bill's reasoning. He hates average; he says average is bad. But literally this ENTIRE EXPERIMENT is based on the idea of average. How do you get expected runs? You first find average! How do you get the "zero point"? You double average! The fact of the matter is, you NEED average in order to find the "zero." So Bill thinks he's avoiding average, but he's just unnecessarily reinventing the wheel. He could just compare the whole thing to average. And then replacement. It would work fine. You can actually use average (or replacement) to find "zero," too... but Bill wants to do it this weird way. And you know what? It will probably work just fine. But I think Bill is fooling himself when he says he's avoiding "average," because literally all he's doing IS using average.

    6. This is a nitpick, but Bill also doesn't always convert things the way he should. In a number of these examples so far, he's talked about who saved the "most runs" or "least runs"... but the problem with that is that, while he tries to normalize a lot of things for era, he doesn't go back and normalize RUNS for the era. In yesterday's (3/27) article all but three of the top ten teams in runs saved are from very high-scoring leagues. He says we'll take that all out at the end... but that means that, so far, that article told us literally nothing. Let's put it this way: it's not how I'd present my findings at this point in the process. And it's why WINS is a better currency than RUNS. (Actually runs is the better currency, but you'd have to convert runs to wins, and then convert those back to a historically-normalized number of runs... but that's a whole different discussion.)

    End of the day, I think Bill's probably not going to do anything that's going to rock our collective world here when it's all said and done. It's probably not going to be all that different than the information we already get from WAR. But Bill insists on blazing his own trail, even when there's already a better trail somewhere else. It's what makes him lovable, and what makes him so intolerable. So for my part, I'm going to mostly just sit on my hands until the end, wait to see what we get, and then decide whether this new information is worth anything or not.

    Thanks for creating this thread. To say that my wife would have no interest in listening to me talk about these things is the understatement of all time. So I appreciate the forum in which to share my thoughts.

  3. I think the best explanation of why Bill's "runs scored = runs prevented" thing can't be true came in a comment made by dfan on the article where he first made the argument. Imagine a league that scores X runs in a season, so we're going to say that they prevented X runs. Now imagine an alternate reality where everything happens the exact same way in that league, with the only difference being that in one game an outfielder muffs a fly ball with 2 outs and the bases loaded. In the original timeline it was the 3rd out, inning over. In the alternate timeline, 3 extra runs scored. So we're saying that in the alternate timeline league, where the only difference was that one error, there were also 3 extra runs prevented? Taking the delta, that means the value of that error is that the league defense collectively prevented 3 more runs???
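
    The thought experiment above can be written out as a short Python sketch. The function below simply encodes the premise being tested (league runs prevented equal league runs scored); its name is made up for illustration, and the 11,449 figure is borrowed from the quoted 2019 National League total only to give X a concrete value.

```python
def league_runs_prevented(league_runs_scored):
    """The premise under test: on a league basis, runs prevented
    are taken to be equal to runs scored."""
    return league_runs_scored


X = 11449  # illustrative value for the league's runs scored

# Original timeline: the outfielder catches the fly ball for the third out.
prevented_original = league_runs_prevented(X)

# Alternate timeline: the ball is muffed and 3 extra runs score.
prevented_alternate = league_runs_prevented(X + 3)

delta = prevented_alternate - prevented_original
print(delta)  # 3
```

    Under the premise, the muffed fly ball raises the league's "runs prevented" by 3, which is the contradiction the comment describes.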

    Also, as a somewhat younger person who came to Bill's writings later than many others, I have been "catching up" on his books and online articles over the past few years, and I have been shocked several times by his intolerance for anyone questioning him. I knew from his old books that he could be an ornery cuss, and I honestly found it kind of amusing. But enter the online realm where people can talk back at him and he just can't seem to handle criticism at all. Oh well, none of us are perfect.
