On Twitter, pals and co-authors Tom Tango and Mitchel Lichtman had a fun little argument about two interesting baseball statistics, OPS (On Base Plus Slugging Percentage) and wOBA (Weighted On Base Average). Tango made the solid argument that OPS is a mathematical crime -- you are adding together fractions...

Does anyone know why the wOBA formula values an HBP (.72) more highly than an unintentional walk (.69)? The only difference I can think of is that an HBP is a dead ball and so no runner can advance, but that would only matter if first base is open. And the odds on any given walk of a runner getting thrown out trying to steal third seem minuscule — I can’t remember ever seeing it.

Perhaps, and I have no idea if this is remotely close, it has something to do with how an HBP can get the pitcher ejected and force a reliever to come in with little warm-up time, so it’s slightly more advantageous even though that scenario rarely happens.

Walks are given out disproportionately with 1B open. So walks have slightly less runners-moving-over value. HBP are fairly random.

What in the world is wrong with Jose Ramirez this season? Joe, we need 1000 words on this.

There have been steroid rumors mentioned, as well as his infatuation with launch angle. Whatever it is, he’s a mess, and he appears unfixable, at least for the Tribe. Amazing that a player has gone from being one of the five best players to one of the five worst regulars in less than 12 months.

One of my favorite days as a young fan was learning how to calculate batting average. It was like the world of baseball suddenly made sense, and I couldn’t wait to figure out Rico Carty’s average, as well as my little league average. Oh, and that Boog Powell card? Had several, and I just wish he was wearing the red pants too.

The reason many people find BA easy and wOBA hard is that BA is “LEGALLY” hard, whereas wOBA is MATHEMATICALLY hard.

I’m quite a mathematical person, and so (like Tango and MGL), the idea of wOBA is very simple to me. BA weights all hits equally, and walks not at all. SLG weights a home run four times as much as a single. Neither is quite right. Of course a home run is better than a single, but every three-run homer needs two men on base first. One way to think about it: say a single drives in about half of the runners on base, and the batter scores about half the time, whereas a home run drives in *every* runner and the batter scores *every* time. So a single is worth about half as much as a home run. The actual numbers are slightly different because my estimate was so rough, but that’s the basic idea: add up everything the hitter did, and give credit or blame based on how much each event helps score runs.
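To make that back-of-the-envelope estimate concrete, here is a minimal Python sketch. The function names and the 0.5 / 1.0 rates are only the rough guesses from the comment above, not real run values, and the model ignores outs, base states, and everything else a real linear-weights calculation accounts for.

```python
def rough_run_value(runners_on, drive_in_rate, batter_scores_rate):
    """Rough runs credited to one hit: runners driven in plus the batter's own chance to score.

    Both rates are illustrative guesses, not measured values.
    """
    return runners_on * drive_in_rate + batter_scores_rate


def single_to_homer_ratio(runners_on):
    # Guess: a single drives in half the runners and the batter scores half the time.
    single = rough_run_value(runners_on, 0.5, 0.5)
    # A home run drives in every runner and the batter always scores.
    homer = rough_run_value(runners_on, 1.0, 1.0)
    return single / homer


for runners_on in (0, 1, 2, 3):
    print(runners_on, single_to_homer_ratio(runners_on))
```

Under these guesses the ratio comes out to one half no matter how many runners are on, which is why the rough answer is "a single is worth about half a home run" rather than the one quarter that SLG implies.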

On the other hand, the rules for what counts as an AB (or a hit) seem arbitrary and ridiculous to me. It counts only if you hit it. Or if you don’t hit it but get out. And if you hit it and get out but it helps your team it’s not an AB, but only sometimes.

But the idea is the same. Add up everything the hitter did and give them credit or blame based on some system. The old system is “legal” with all of its rules and exceptions. The new, better system is mathematical: we look at how much singles, doubles, outs etc. have helped or hurt teams in the past.

The issue is that many people DO NOT like the mathematical way of doing things, and would rather have a long list of exceptions and exceptions to exceptions. It’s just how people’s minds work. You can see it in other contexts: the Supreme Court is famous for being 9 brilliant minds who are flummoxed by basic mathematics. There’s nothing WRONG with thinking this way, but it is true that the best statistics are created the other way.

So MGL and Joe are completely right. The challenge is to use stats that are not only accurate but also mathematically simple, even though they may be legally complicated. OPS is great for this. It’s a remarkably good stat for being somewhat arbitrary.

I would love to see a stat that involves very little math, like BA or maybe OPS, but that reverse engineers something like wOBA by working backwards from what you include/exclude so as to make the numbers work. OPS does this wonderfully: you take a stat where singles are overweighted and add it to a stat where they’re underweighted. Maybe with a careful accounting of what “counts” as a “hit/total base” and an “AB/PA”, you could make a ratio stat that tracked more with actual production.
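A quick way to see the overweight-plus-underweight effect is to ask how much one extra single or one extra home run adds to OPS. The Python sketch below is only illustrative: the AB and PA totals are hypothetical, OBP's denominator is simplified to PA, and the wOBA weights (.89 for a single, about 2.0 for a home run) are the figures quoted elsewhere in this thread.

```python
def ops_contribution(total_bases, at_bats, plate_appearances):
    """Approximate amount one hit adds to OPS: 1/PA via OBP plus total_bases/AB via SLG."""
    return 1 / plate_appearances + total_bases / at_bats


AB = 550  # hypothetical season at-bats
PA = 600  # hypothetical plate appearances (stand-in for OBP's true denominator)

single = ops_contribution(1, AB, PA)
homer = ops_contribution(4, AB, PA)

obp_ratio = 1 / 1          # OBP alone: a single counts the same as a home run
slg_ratio = 1 / 4          # SLG alone: a single counts one quarter of a home run
ops_ratio = single / homer
woba_ratio = 0.89 / 2.0    # weights quoted in this thread

print(f"OBP {obp_ratio:.2f}  SLG {slg_ratio:.2f}  OPS {ops_ratio:.2f}  wOBA {woba_ratio:.2f}")
```

With these made-up totals, the single-to-homer ratio implied by OPS lands around .39, much closer to the wOBA ratio of about .45 than either OBP (1.00) or SLG (.25) manages alone.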

Being mathematically inclined myself, I agree with Andy that the formula for wOBA is somewhat explainable in how it values 1B/2B/3B/HR (although the answer to Michael’s question about why uBB and HBP are valued slightly differently isn’t obvious at all).

That being said, NFL telecasts are constantly dropping QBR (quarterback rating), which is about as ad hoc as a statistic can be (see https://slate.com/news-and-politics/2001/08/how-does-the-nfl-s-quarterback-rating-system-work.html if you want to see for yourself). Somehow, the TV-watching public is accepting of that without making disparaging comments about geeks in their parents’ basements.

Thanks for the post, Joe. This conversation between Tango and Lichtman captures one of the hard truths about the use of statistics in baseball. Whether or not BA is an accurate measure of a player’s offensive prowess (we know that it isn’t), or whether or not OPS is an accurate measure of offensive value (better than BA, but problematic for the reasons stated above), is not really the point.

BA, and increasingly OPS, have acquired the power of language. Power that goes beyond the actual calculations. And once that happens, the complexities that underpin each statistic are less meaningful. We use those measures to signify excellence or lack thereof, like Joe’s table of OPS values between .600 and 1.000+, or what we think of when we see that someone has a “three hundred” batting average. Think of what a particular number signifies when you read it on the back of a baseball card; that significance is far more important than how the number is calculated. My dad doesn’t care about the mathematical calculations from which those numbers are derived. He cares about what those numbers communicate. That’s the power of language.

Bill James wrote a book about baseball managers. In that book he has an article called “the Manager’s Record” that discusses, among other things, what kinds of statistics you could put on the back of a manager’s baseball card. Among the other illuminating parts of that article is his description of what makes a useful statistic. He emphasized that the purpose of a stat on the back of a baseball card is not to provide data, but to provide helpful information. Information that everybody will understand, and everybody can use. He was looking for something accessible and simple that could communicate that information.

For better or for worse, BA was the vehicle for such communication for over a hundred years. OPS has, perhaps, acquired some of that communicative power the last decade. Maybe WAR too. I could be wrong, but I doubt that wOBA will acquire that kind of communicative power. Perhaps if we use it for long enough.

For those of us who are a certain age, we had to calculate Strat-O-Matic Batting Averages and ERAs by hand. Even before calculators were common. So yeah, we get ERAs and Batting Averages. I don’t think advanced metrics would have become a thing without computers. But since we have computers, we might as well use them.

To me, it is not just a matter of simplicity, but rather objectivity versus subjectivity. OPS (and its cousins BA, OBP, Slg., etc) tell a fan what objectively happened. wOBA (and its cousins bWAR, fWAR, WAA, etc.) tell a fan how the formula’s inventor, such as Tom Tango, subjectively values what happened.

My point is illustrated by Michael’s question about why wOBA values walks less than HBPs. The simple reason is that Tom Tango, as the inventor of wOBA, believes HBPs are roughly 4% more valuable than uBBs. Likewise, Tango apparently believes that a double is roughly 43% (1.27-.89=.38; .38/.89=.427) more valuable than a single. Tango might be right, but he might also be wrong. Perhaps a double is only worth 25% more than a single; perhaps a double is 100% more valuable than a single. Tango undoubtedly put a lot of thought and effort into arriving at these values and there is certainly a lot of merit in the wOBA formula. But regardless of how much thought and research went into creating the formula, it is based on Tango’s subjective viewpoint. And there is obviously a lot of difference in opinions about how to accurately measure all of the events that take place during a game and a season. If there were not differing opinions, then there would be no need for so many different formulas. Everyone would have accepted VORP (or whichever early formula you like) and we would not need fWAR, bWAR, WAA, wOBA, etc. Ultimately, these formulas tell us a lot about the players but they also tell us a lot about how their respective inventors view the game.
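For anyone who wants to check those percentages, the arithmetic is just (weight - baseline) / baseline. A small Python sketch, using only the weights quoted in this thread (the helper name `premium` is made up for illustration):

```python
def premium(weight, baseline):
    """Fractional amount by which `weight` exceeds `baseline`."""
    return (weight - baseline) / baseline


hbp_over_walk = premium(0.72, 0.69)       # HBP weight vs. unintentional walk weight
double_over_single = premium(1.27, 0.89)  # double weight vs. single weight

print(f"HBP over uBB: {hbp_over_walk:.1%}")
print(f"2B over 1B:   {double_over_single:.1%}")
```

That works out to roughly 4% and 43%, matching the figures above.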

Statistics like OPS and batting average, however, objectively measure exactly what they claim to measure. A player’s OPS is the same whether you look it up on MLB.com, fangraphs, or baseball-reference. That obviously is not true for these subjective formulas. wOBA is not even found on all baseball-related websites and we all know that WAR changes based on which website you are using. But OPS, BA, Slg. %, and OBP, regardless of how incomplete you think they may be, all give an objectively identifiable number on which everyone agrees.

I do not mean to disparage these “subjective” formulas like wOBA because they are quite helpful and I often review them. But when I want to know what kind of season a player had or is having, I will always first look at the objective numbers. And then I will see how various websites and formulas value those numbers.

It’s a bit misleading to call wOBA subjective. The weights for all the pertinent outcomes weren’t subjectively chosen at all, but were determined by statistical correlations with actual game results. This approach is actually a precursor to the “big data” training that is so common in the tech industry right now. All major machine learning approaches train their weights on data in somewhat analogous ways, so to call the wOBA weightings subjective is like saying that Siri or Google Translate are subjective.*

(* Of course, the choice of training data for machine learning can be subjective and leads to some of the concerns about biases when data population samples are not fully representative of the eventual subject set. That isn’t the case for baseball statistics that feed the advanced stats.)

With all due respect to Tango, who I have been reading for years, of course accessibility and simplicity matter! The delineation of the numbers by hundreds as Joe did above matters as well. There are boundaries that are easily recognized. I don’t know offhand what a good wOBA is. Looking it up on Fangraphs it seems as if the median for qualified players is .336, and half the players are between .313 and .368. Not nearly as different sounding. Yelich and Rendon (mentioned in the article) are 2nd and 3rd overall at .454 and .451, and Ramirez is 159th out of 166 at .269. (After finishing third in the majors last year. Has there ever been a more precipitous fall in a shorter time by a player who wasn’t injured or suddenly old? Not that I can remember. Also, when I bench him for a day in my fantasy league, he steals two bases.)

When OPS became mainstream, it led to OBP and Slugging being shown on the scoreboard at ballparks and on TV. It led to mainstream (not just stat geek guys) awareness of the value of walks and power. It indirectly led to fan interest in other deeper stats (including Tango’s) as well.

I have no doubt that wOBA is a much more accurate measure of a player’s prowess at the plate. Hell, I have a stat I developed that deals more with runs and outs (baseball functions on outs, not PAs) that blows OPS out of the water. (I have even used some of Tango’s work over the years to make it better.) But when I am just watching a game I still look at OPS to give me a basic idea of how a player is doing rather than breaking out a calculator (and looking up the rest of his more obscure stats on the computer) to run my formula on him. wOBA is, I am sure, more accurate in telling you how good someone is as a hitter. But I don’t think I’ll be seeing it on a scoreboard or a TV screen anytime soon, and until I do OPS will remain the more relevant measure, even though it may be the less accurate one.

I’d be curious if using OPS consistently and significantly overrates or underrates players compared to wOBA, or if it mostly evens out. For example, would wOBA have Joe’s list above in the same order as OPS? Maybe a couple players flipped, but mostly the same?

wOBA is scaled like OBP. That’s why I chose the name: weighted On Base Average. So, rather than everything in the numerator getting a “1”, they are weighted, but centered around 1. Walks are at 0.7, HR at 2.0, and overall, all the safe events average to 1.

So, if you know what is a good OBP, you now know what a good wOBA is.
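As a rough illustration of that scaling, here is a minimal Python sketch that computes wOBA and OBP for the same batting line. It is a sketch under stated assumptions, not the official calculation: the uBB, HBP, 1B and 2B weights are the ones quoted earlier in this thread, the HR weight is the rounded 2.0 mentioned above, the 3B weight of 1.6 is an assumption (no triple weight appears in this thread), the denominator AB + BB - IBB + SF + HBP is the commonly published one, and the season line itself is entirely made up.

```python
# Weights: uBB/HBP/1B/2B as quoted in this thread; HR rounded to 2.0; 3B = 1.6 is an assumption.
WEIGHTS = {"uBB": 0.69, "HBP": 0.72, "1B": 0.89, "2B": 1.27, "3B": 1.6, "HR": 2.0}


def woba(line):
    """Weighted on-base average for a dict of counting stats."""
    events = dict(line, uBB=line["BB"] - line["IBB"])  # unintentional walks only
    numerator = sum(weight * events[event] for event, weight in WEIGHTS.items())
    denominator = line["AB"] + line["BB"] - line["IBB"] + line["SF"] + line["HBP"]
    return numerator / denominator


def obp(line):
    """Ordinary on-base percentage: every safe event counts as exactly 1."""
    hits = line["1B"] + line["2B"] + line["3B"] + line["HR"]
    times_on_base = hits + line["BB"] + line["HBP"]
    return times_on_base / (line["AB"] + line["BB"] + line["HBP"] + line["SF"])


# Hypothetical season line, invented purely for illustration.
season = {"AB": 550, "BB": 70, "IBB": 5, "SF": 5, "HBP": 5,
          "1B": 100, "2B": 30, "3B": 3, "HR": 25}

print(f"wOBA {woba(season):.3f}  OBP {obp(season):.3f}")
```

For this invented line the two numbers land within a couple of points of each other (about .369 and .370), which is the sense in which a good OBP and a good wOBA look alike. A hitter with more power than this one would see wOBA pull ahead of OBP, and a singles-and-walks hitter would see it fall behind.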

I also prefer Standard wOBA, so that we don’t have the coefficients changing based on the run environment. I provided that here:

http://tangotiger.com/index.php/site/comments/standard-woba

However, when I periodically ask if we should make this the official version, I get push back. The majority (of my followers anyway) WANT the complexity.

But, we’ll see…

Also, the weighting system is correct. So, to the person who said: “Tango might be right, but he might also be wrong.” I can tell you: “Tango is totally right.” It’s math. And the beauty of math is that when you can prove something mathematically, you are automatically right!