Showing posts with label playing with stats. Show all posts

Wednesday, September 16, 2009

The Somethingiest Something of the Aughts: The Hitters

Funny thing about writing a daily blog with no remuneration and no one to hold you accountable: sometimes life gets in the way and you don't really feel like writing anything. Sorry about my absence on Tuesday, but real-life Monday sucked like you wouldn't believe. Things now are...not okay, but they're not getting any worse, so here's your new thing.

As he often does, Rob Neyer made me think of something today. He pointed out (via some link to somebody else) that there's something of a race for the batting champion of the decade, with Ichiro! and Pujols running pretty much neck-and-neck. Which left me wondering who led in all the various other categories, and by how much. And as long as I was wondering, I thought I might as well write about it. Verducci, the "somebody else" in Neyer's post, did much the same thing, but I don't care about that, and I'm going to look at some different categories and in a different way. So away we go, stats through Monday night:

Home Runs: Alex Rodriguez, 430
No surprise here. A-Rod led his league in homers five times this decade, and this is the first year he's likely to finish out of the top eight (and he's only four out of the top ten, with at least two of the dudes in front of him out for the rest of the season). What's a little surprising is by how much A-Rod leads: he's up by 62 over Jim Thome's 368, meaning he's hit about 17% more homers than anybody else this decade. The 1990s' leader was Mark McGwire, with 405. The 1980s? Mike Schmidt, with 313. Eight players have hit more than 313 homers from 2000 through 2009, and I suppose Andruw Jones or Lance Berkman could make it nine or ten with a couple hot weeks.

Runs Batted In: Rodriguez, 1227
That's right, the unclutchiest choker ever leads the decade in the lazy man's ultimate clutchy stat, by a comfy 125 over Pujols (approximately one season's worth, which is appropriate since Pujols didn't start playing until 2001). Your 1990s leader was Albert Belle (really?) with 1099, and 1980s was Eddie Murray with 996. Murray's total would place 10th in the 2000s, right between Big Papi and Bobby Abreu.

Runs Scored: Rodriguez, 1181
That A-Rod guy? He's a good player. And one who stayed pretty healthy for an entire numerological decade, which has at least as much to do with it. This is a closer contest than the ones above, with Johnny Damon close behind at 1110. Derek Jeter and Bobby Abreu mean that four of the top five have spent at least some of the decade as Yankees. 1990s: Barry Bonds, 1091. 1980s: Rickey Henderson, 1122. Hey, score one for the eighties, almost!

On-Base Percentage (min. 3000 PA): Barry Bonds, .517
What what what? Bonds OBP'ed over .500 for the whole decade? Somehow that shocks me. But I guess OBPing .559 in 2001-2004, four of his five full years in the decade, will do that. Todd Helton is a distant second with a Coors-aided .439, with only three other players within 100 points of Bonds. Frank Thomas led the nineties at .440 (Bonds just behind at a merely fantastic .434); 1980s, Wade Boggs at an equal but more dominant .440.

Slugging Percentage: Bonds, .724
Naturally, and well ahead of Pujols at .630 (though Pujols will end up with nearly 2000 more plate appearances in the decade). 1990s: McGwire, .615 (Bonds right behind again at .602); in the 1980s, Schmidt at .540. In the aughts, you'd have to go to #19 before you drop below .540; Schmidt slots between Teixeira at .542 and Bagwell at .534.

OPS+: Bonds, 221
Well, duh. Pujols second at 173, then Manny at 160. Theoretically, this should be pretty constant across the decades, and it almost works that way, but doesn't. Bonds paces the nineties again at 179, Schmidt the 80's at 153.

Stolen Bases: Juan Pierre, 455
That surprised me a little, but Pierre has played since 2000 and was a regular from 2001 until late 2008, while Carl Crawford (#2 but way behind at 359) didn't play full time until 2003 and missed about a third of 2008. 1990s: Otis Nixon, 478; 1980s: Rickey Henderson, 838. Rickey led that decade by a whopping 255 (over Tim Raines) and missed leading the 1990s by 15, coming in second place. He was #105 in the 2000s.

Hit By Pitch: Jason Kendall, 155
Up by 17 on Jason Giambi. I never thought of A-Rod or Jeter as guys who get plunked a lot, but they're both in the top ten; lots of plate appearances -> lots of stray inside fastballs, I guess. Chase Utley has been hit 104 times despite not becoming a regular until 2005. Craig Biggio was hit 147 times in the 90s (and was fourth in the 2000s at 132). Don Baylor crushed everyone else in the eighties with 160, 52 more than Chet Lemon and more than three times as many as #8 Lloyd Moseby.

Sacrifice Flies: Mike Lowell, 76
Now that's a surprise. One leadoff triple by Denard Span could mean that Lowell gets tied by the even more surprising Orlando Cabrera, now at 75, and don't count out the less surprising Carlos Lee (74). After that, you hit Abreu at 66, and I don't think he's getting ten sac flies in three weeks. Frank Thomas had 82 in the nineties, Andre Dawson 74 in the eighties.

Double Play Groundouts: Miguel Tejada, 222
Again, the identity of the leader is surprising, but even more surprising is the margin; Miggy is crushing Paul Konerko and his 193. Belle led the 1990s at 172, and Jim Rice predictably dominated the 1980s with 224. Rice's 224 trumps Tejada's 222 by more than it looks like, considering that (a) Julio Franco was second in the eighties at 166, which would've been seventh in the aughts, and (b) Tejada took over a thousand more plate appearances than Rice did to arrive at his total.

Plate Appearances: Bobby Abreu, 6864
This one could very easily change hands before the end of the decade, as Derek Jeter is only six behind Abreu and is batting leadoff for the best offense in the majors. Next is Tejada, a hundred behind Jeter. Biggio had 6794 in the nineties and Dale Murphy had 6540 in the eighties.

Hits: Ichiro!, 2005
He's 85 ahead of Jeter or anyone else for the decade, which is especially impressive when you consider that he was in Japan for the year 2000. Going down the rest of the list, Pujols is the next one you'll see that did not play at least a little big-league ball in both 2000 and 2009 (he's ninth at 1697), and to find the next such player, you'd have to go all the way down to #33 and Jeff Kent, who retired after last season and may end up 600 hits behind Ichiro for the decade.

Thursday, September 10, 2009

Three Comparisons

One: Half-Season MVP Division
Through their first 42 games with their new, National League teams:
Manny Ramirez, 2008: .395/.478/.743 (1.222 OPS), 29 R, 14 HR, 43 RBI
Matt Holliday, 2009: .379/.437/.702 (1.139 OPS), 33 R, 12 HR, 41 RBI
(thanks to the StL P-D for that one.)

Two: I Told You So Division
Orlando Cabrera, since August 1: .256/.283/.353 (.636 OPS), -6.2 UZR (yes, -6.2 runs in 34 games. I mean, what?)
Nick Punto, season: .220/.320/.275 (.595 OPS), +1.4 UZR

Three: Obviously, They're Just Being Cheap Again Division
Since June 3:
Nate McLouth: .264/.353/.439 (.792 OPS), -5.2 UZR
Andrew McCutchen: .278/.355/.470 (.826 OPS), +2.4 UZR

Tuesday, August 18, 2009

Prince Albert and the Crown

The other day, I opined in passing that, standing first in HR and RBI and (then) fourth in batting average, Albert Pujols had the best chance to win a Triple Crown that we'd seen in a good long while.

And, well, does he, really? I mean, it's obviously still not likely (it never is), but what are the chances? You probably know by now that I'm not going to sit here and give you precise mathematical odds, but let's look at the English major's version of the question: can we envision it actually happening?

Albert went 1-for-4 on Monday, so this morning he's batting .325. Leader Hanley Ramirez's Fish didn't play; he's been on fire lately and now stands at .356. Already not looking good. Pujols does have the HR lead by one over Mark Reynolds, though (39 to 38 after both hit one yesterday), and is just two behind Prince Fielder for the RBI lead, 105 to 107.

I'm going to commit a big no-no right off the bat and assume away HR and RBI. ZiPS calculated for the rest of the season thinks Albert ends up with 50 HR and 138 RBI, and that that will best Reynolds by two in the former and drop six behind Fielder in the latter. So even in the two categories he's closest in, he's only a favorite to hold one of them. But I'm going to assume he does get both; it just feels like the more likely result to me, and anyway, the bigger hurdle will obviously be the batting average. Also, if Pujols goes on the kind of hot streak he'll need to in order to win the batting title, odds are he'll be piling up the HR and RBI too. So in reality, I'm sure there's not even a 50% chance that Pujols ends up leading in both HR and RBI, but let's just say he does it.

Now. The Marlins have 44 games left, and Hanley has averaged 3.88 AB per game played. Say he starts every one of those 44 games; at that rate, he's got 170 more AB. This season, he's been BABIPing out of his head, with a .404 batting average on balls in play that's unsustainable by anybody; his pre-'09 career BABIP was approximately .340. So say he reverts back to that, and maintains his current HR and strikeout rates. He strikes out in 18% of his ABs (31 times), homers in 4.2% (7), and gets a hit in 34% of the remaining 132 (45). That makes him 52 for 170, a .305 BA over the rest of the season (seems unrealistically low, doesn't it? Wonder if I'm doing something wrong...oh well, pressing on). That still puts his overall 2009 batting average at a robust .340.

By the same AB/G * Games Remaining formula, Pujols ought to have 147 AB left in his season. He'll need 57 hits in those 147 AB--a .388 batting average the rest of the way--to put him at 192/563 = .341 for the year.
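The "hits needed" arithmetic is easy to double-check with a short sketch. It assumes Pujols currently stands at 135 hits in 416 at-bats, which is what the 192/563 season line implies once the 57 hits and 147 at-bats still to come are subtracted out; the function name is made up for illustration.

```python
import math


def hits_needed(current_hits, current_ab, ab_left, target_avg):
    """Smallest number of additional hits that lifts the season average to target_avg."""
    total_ab = current_ab + ab_left
    # Small epsilon guards against float noise when target_avg * total_ab is a whole number.
    total_hits_required = math.ceil(target_avg * total_ab - 1e-9)
    return max(0, total_hits_required - current_hits)


# Assumed current line, backed out of the 192/563 projection in the text.
CURRENT_HITS, CURRENT_AB, AB_LEFT = 135, 416, 147

to_reach_341 = hits_needed(CURRENT_HITS, CURRENT_AB, AB_LEFT, 0.341)
to_reach_356 = hits_needed(CURRENT_HITS, CURRENT_AB, AB_LEFT, 0.356)

print(to_reach_341, round(to_reach_341 / AB_LEFT, 3))
print(to_reach_356, round(to_reach_356 / AB_LEFT, 3))
```

Under those assumed totals, a .341 finish takes 57 more hits (a .388 clip), and matching a .356 leader takes 66 (roughly .450).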

Pujols has been a bit down on BABIP this year (.294), either because he's been unlucky or because he's hitting more flies and fewer liners. But let's assume, again, that he gets back to his career BABIP (.321) and keeps the other rates the same. 11.4% Ks (16), 9.2% HRs (14), and 32.1% of the remaining 117 ABs are hits (38). That makes him 52 for 147 (.354), and puts him at just .332 for the year...but if just five more hits fall in (or leave the yard) somewhere in there, he's right where he needs to be.
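The back-of-the-envelope projection used for both hitters can be written as one small function. This is only a sketch of the simplified model in the text: every remaining at-bat is a strikeout, a homer, or a ball in play, and sacrifice flies are ignored. The function name is hypothetical, and because it doesn't round at each step, it can land a hit or so away from the hand-rounded figures.

```python
def project_rest_of_season(ab_left, k_rate, hr_rate, babip):
    """Expected hits and batting average over ab_left at-bats.

    Simplified model: each at-bat is a strikeout, a home run, or a ball
    in play; balls in play become hits at the given BABIP.
    """
    strikeouts = ab_left * k_rate
    homers = ab_left * hr_rate
    balls_in_play = ab_left - strikeouts - homers
    hits = homers + balls_in_play * babip
    return hits, hits / ab_left


# Rates and at-bat totals as quoted in the text.
hanley_hits, hanley_avg = project_rest_of_season(170, 0.18, 0.042, 0.340)
pujols_hits, pujols_avg = project_rest_of_season(147, 0.114, 0.092, 0.321)

print(f"Ramirez: {hanley_hits:.1f} hits, {hanley_avg:.3f}")
print(f"Pujols:  {pujols_hits:.1f} hits, {pujols_avg:.3f}")
```

Without the intermediate rounding, this comes out to roughly 52 hits for Ramirez and roughly 51 for Pujols, so the overall picture is the same: Pujols needs a handful of extra hits beyond what his career rates would predict.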

Doesn't sound too bad, right? Not likely, sure, but with just five hits' worth of better-than-average luck and with a slide back toward the mean by Hanley, it could happen! And just last year, from July 10 to August 31, Pujols played in 45 games and hit .392. So I'm not sure there's anything Pujols can't do, but if there is, hitting .388 in 43 games ain't it.

So, sure, it can happen. If Hanley slips back to .340 or so (if he stays at .356, Albert has to put up a .450 average the rest of the way to catch him). And if the current #2 in average, Pablo Sandoval at .330, doesn't finish just as strong as Pujols does. And if Pujols holds off Reynolds for the HR title and Prince for the RBI one.

So the odds of this actually happening are probably tiny. Not statistically insignificant, not one in a million, but small enough for most of us lay folk to write it off more or less completely. Still, though, it's absolutely possible (certainly more likely than Mauer hitting .400, which we're still hearing a lot about), and probably the "best" odds at this point in the season that anybody has had in many years. I think it's something we should really keep an eye on for at least the next week or two (though if he goes 0-for-9 in the next two days or something, it's basically all over).

Thursday, June 18, 2009

Weird Wright

Hey, real baseball!

By any reasonable analysis you want to do, David Wright is having the best offensive year of his career. He has (through Tuesday) a career-high 161 OPS+, .430 wOBA, and already has 6 wins above replacement according to BP's WARP3 (which is insane). He's leading the NL with a .365 batting average (40 points over his career high) and a .458 OBP (42 points over his career high), while posting a .526 SLG that's right in line with his career average of .532. He's even stolen 18 bases, second in the NL (though he leads in CS with 8, already a career high in that category, so he's barely breaking even when he runs and probably should go back to being more selective).

The amazing thing you probably already know is this: Wright, who has a career full-season low of 26 HR, is doing all this while having hit just four homers all year. He's on pace to hit 11 all season, or three fewer than he hit in 283 PA as a 22 year old rookie in 2004. He's balancing some of that out with doubles, but he's only on pace for 8 more of those than in '08 (50 total, but he's always hit a lot of doubles), so his Isolated Power is down 70 points from '08; that SLG is being sustained mostly by that astronomical batting average.

Some have written that it's too hard to hit HR in the Mets' new park, so you might think that had something to do with it. Doesn't look like it, though; while overall scoring at Citi is pretty low, it's actually been the fifth most homer-happy park in the Majors so far, and in fact Wright has hit three of his four homers at home.

It gets weirder still. Look at these numbers (lifted straight from FanGraphs):
GB/FB: 0.95 (2008), 0.94 (2009)
LD%: 25.6% (2008), 25.9% (2009)
GB%: 36.2% (2008), 35.9% (2009)
FB%: 38.2% (2008), 38.2% (2009)

So Wright is hitting line drives, grounders and fly balls in almost exactly the same proportions as he did last year. Even fewer of those fly balls (4.6% this year, 7.6% last) are staying in the infield. We'd expect him to be hitting HR at more or less the same rate, even a tiny bit better...but, well, obviously, that ain't happening. You have to assume he's getting unlucky, homer-wise; he has to be hitting the ball pretty hard to maintain that BA, but the ones in the air just aren't carrying quite far enough.

So, we should expect the homers to come around. He's not likely to hit 30 again this year, but it's not unreasonable to expect him to hit 'em at a 30-HR pace from here on out (which would give him a total of about 22 for the season).

But there's a big, huge, flashing neon warning sign for Wright that has nothing to do with his HR power or batted ball types, and this is the incredible part to me: Wright is putting up that huge batting average not only while keeping the ball in the park when he does hit it, but while striking out once per game. He's struck out between 113 and 118 times in each of his four full seasons, but now he's already struck out 61 times in 61 games, which over a full season would top his career high strikeout total by 40+. His walk rate is up very marginally, while his strikeout rate is up by over a third. That's bad.

It's been a while since I've talked about BABIP, so let me just remind you: that sort of thing (a strikeout per game + a .365 BA) just doesn't happen. It varies a little based on the percentages of GB/LD/FB players hit, but when they don't hit a homer or strike out, we expect everybody to have a 30% or so chance of getting a hit (that is, a .300 BABIP). Wright's BABIP right now (well, through Tuesday) is .485. By comparison, Joe Mauer is hitting a ridiculous .429 right now, and his BABIP is "only" .443. Ichiro! is hitting .354, pretty close to Wright's BA, but with a BABIP of .374; he's done it by striking out about 1/3 as often as Wright.
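For anyone who wants the formula spelled out, here is a small sketch of the standard BABIP calculation, (H - HR) / (AB - K - HR + SF). The stat line fed into it is invented to roughly reproduce the .365 average and .485 BABIP quoted above with 61 strikeouts and 4 homers; the at-bat, hit and sac-fly totals are assumptions, not Wright's actual numbers.

```python
def babip(hits, homers, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - homers + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - homers) / balls_in_play


def average_at_babip(target_babip, homers, at_bats, strikeouts, sac_flies=0):
    """Batting average the same line would produce if BABIP were target_babip."""
    balls_in_play = at_bats - strikeouts - homers + sac_flies
    return (target_babip * balls_in_play + homers) / at_bats


# Hypothetical line: only the 61 K and 4 HR come from the text.
AB, H, HR, K = 230, 84, 4, 61

print(round(H / AB, 3))                               # batting average
print(round(babip(H, HR, AB, K), 3))                  # BABIP
print(round(average_at_babip(0.300, HR, AB, K), 3))   # average at a .300 BABIP
```

On these assumed totals, the same strikeout and homer counts with a .300 BABIP would mean an average somewhere in the .230s, which is the size of the gap this paragraph is pointing at.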

A different perspective: Wright's .485 BABIP leads the #2 (PA-qualified) guy in the majors in that category, Kevin Youkilis, by 76 points. There is no one within 76 points of Wright, and then there are 43 guys within 76 points after Youk. The 2008 leader BABIP'ed .396, 89 points below Wright's '09 number.

So you get the point by now: it's not going to last. Something's got to give--Wright has to start making better contact, or his batting average will start coming way, way down, and then if he doesn't also start hitting home runs (and playing better defense, which is another weird thing I haven't even touched on here), it'll take a huge chunk of his value right down with it.

Wright has had an amazing first 62 games, and is an amazing player. There's really no telling what this guy can do. But I'm pretty confident in this: whatever he does, he'll look like a very, very different player over these last 100 games than he did over the first 62.

Tuesday, June 9, 2009

If it's May 9 rather than June 9...

...and your team's MVP candidate is hitting .228/.343/.447, do you worry?

Because that's Ian Kinsler's line since May 6 (the season started on April 6, so if this were a month earlier that would take us back to about game 1). Fortunately, back in the real world, he hit .321 and slugged .652 for the first five weeks or so. So since May 6 he's lost 47 points of average, 14 points of OBP and 103 points of SLG, but he's still a .905 OPS second baseman, not some .228-hitting disappointment. For now.

Another one: his season numbers are still awe-inspiring, because he hit .400 for the first month or so. But do you think Miguel Cabrera would be getting ESPN.com feature stories right now if the first baseman had put up an .839 OPS with 3 homers through May 9, rather than from May 6 to June 9?

On the other hand, how do you suppose the New York media would react if Mark Teixeira had waltzed into the city and hit .350/.417/.761 with 12 HR in his first month-plus, rather than his second?

Do you think there would be any doubt about his All-Star chances if Ichiro! had hit .400/.439/.538 in April-May rather than May-June? Would the media get off David Wright's back a little bit if he had been hitting .388 with a .500 OBP on May 9?

One thing that drives me crazy is the way that, at least with regard to position players, each passing month is a little less important to us than the last, until you get to September (and that's assuming you're in a pennant race). If a guy hits .400 in April but then hits .200 in May, he's still a good bet to make the All-Star team, while if he hits .200 in April and .400 in May, he's probably still considered a disappointment come June (unless somebody noticed and gave him the Player of the Month Award or something). The April stats count for all the hype, and the October stats count for who's "clutch" and who's not, and all the stuff in the middle just kind of happens.

But if the Mets win by a game or two, Wright's enormous early-May-to-early-June will have been as big a part of it as anything Delgado or Reyes or Beltran could possibly do in August or September. With that decimated lineup, being only three games out at this point is a miracle you can attribute almost exclusively to the wonders that are Wright and Santana. Yet if Wright slips a bit in September (or even if he's his usual stellar self, but is perceived as being "not clutch"), he'll be widely regarded as a failure again. These games (and these stats) count too, people...

Saturday, May 23, 2009

In Defense of Compassionate Sabermetricism

If I'm going to have a horribly unhealthy, gut-busting, productivity-killing Friday lunch, I'm a big fan of Panda Express' Orange Chicken. And there's a decent copycat place a couple blocks from the office, but it was a nice day yesterday, and I was up for a walk, so I went for the real thing. To get that, you have to head to the James R. Thompson Center, a big gathering point for a lot of Chicago that, as I understand it, houses some government offices and whatnot. The Panda Express is really all I'm interested in.

So I get there, and there's this big protest going on right outside the building. Up close, people are waving signs about the right to life and how gay marriage is destroying our families, milling about in the general neighborhood of someone who is speaking ineffectively into a megaphone, while across the street is another group of people doing their best to drown out this first group with shouts like "What do we want?" "Abortion rights." "When do we want it?" "Now!" and "Fascists go home!" and I'm thinking to myself, what are these people (any of them) doing here, really? Do they expect to convince anyone by labeling the other side murderers or fascists, or by just being louder? Or do they just like to hear themselves talk? Is there just nothing better to do on a pleasant Friday leading into a holiday weekend?

That's basically how tHeMARKsMiTh sees the world of baseball fans and writers: the internet-savvy sabermetric crowd against the talk-radio-and-newsprint traditional crowd, both sides trying to shout each other down, never getting anywhere. (Of course, that doesn't even remotely do justice to his post. Read it yourself; I'll still be here when you're done. Ready now? Good.) A couple basic things to get out of the way:
  1. I agree with most of his main points. There's a lot of shouting into the abyss that goes on on both sides, a lot of name-calling and making fun, and it's hard to see how any of it does anything at all other than making people on the same side feel smug and superior at the other side's expense. (Okay, I have to make an exception for these guys, who were just too funny. And JoePoz, who's kind of a fence-straddler, anyway. But otherwise, I don't see the point.)

  2. I don't think traditional stats (or most of them, anyway; sorry, Holds and Fielding Percentage) are completely worthless. You've seen me use HR and RBI a bunch of times already. Stats like those give context; even if you believe that VORP or WAR or Win Shares are a perfect measure of player value, think of the traditional stats as the splash of color in the crystal-clear black-and-white picture. They tell the story: what kind of hitter he is, where he likely hit in the lineup, and so on. WAR will tell you that Mark Teixeira and Carlos Beltran were almost exactly as valuable as each other in 2008, but don't you want to know a little more than that? That's where I think runs, RBI, HR, SB, and so forth come in handy.

  3. Another main point of Mark's is that neither side has it completely right. I agree with that, too: there's not much "right" about picking an MVP based on who has the most HR or RBI or Saves, and sabermetric analysis is certainly far from perfect as well -- all you need to do is look at how much the various metrics (WARP vs. WAR, plus/minus vs. UZR) disagree with each other.
But where I disagree with Mark is: I don't see this as being like the abortion or gay marriage debate at all. In those debates, like in the "dialectic" Mark envisions, there are really only three plausible truths: (a) one side is correct; (b) the other side is correct; or (c) the answer is somewhere in the middle. If you have one side that believes that abortion should be legal in all circumstances and one side that believes it should be banned in all circumstances, that's as far as it goes; it can't be more legal than the first side wants it, and it can't be more illegal than the second side wants it. So the one true "right answer" has to be either one of those extremes or something between them.

Not so here. Our advanced metrics are flawed, but the answer isn't some compromise between them and the traditional stats; the answer is more research, and more metrics. The metrics we have have grown out of the more traditional statistics. Saying you prefer HR and RBI to VORP and WAR isn't at all like saying you prefer "Choice" to "Life" or vice-versa; rather, it's like saying you prefer Betamax to Blu-ray.

Here's how Mark defends the traditional crowd:
Those who follow counting numbers have a point (among many). Baseball revolves around the run. It determines who wins and who loses. Therefore, should you not pay attention more to runs, RBI’s, and home runs? Home runs automatically score a run (making them slightly important) and bring in whoever is on base (making them more important). If the point of the game is to score runs than the other team, home runs and RBI’s are awfully darn important, which gave Howard the edge [over Pujols for 2008 NL MVP].

But this ignores the critical weakness of run and RBI totals (and this isn't a criticism of Mark, who I know understands this: it's just that I don't think there's any way for anyone to successfully defend this position), which is that, in every instance in which you don't hit a home run, your runs and RBI are totally dependent upon your teammates either getting on base for you or driving you in.


This doesn't work well for the NL race, because Howard actually did do a phenomenal job of knocking runners in in 2008 (Pujols was still the clear MVP for other reasons), but take a look at this list (I hope). In 2008, Justin Morneau finished 2nd in the AL MVP voting, while his teammate Joe Mauer finished a distant 4th, based largely (or rather, entirely) on the fact that Morneau had 129 RBI and Mauer managed just 85. If that link went to the right place, though, you'll see that when they batted with runners on base, Mauer and Morneau drove in those runners at almost exactly the same percentage: 19.0% to 18.6%. Morneau gets that huge edge in RBI because he batted with 151 more runners on base than did Mauer. Morneau actually batted with the most runners on base of anyone in the league. Part of that, of course, is because he's not a catcher, and thus got to play every day. But a huge part of that is that he got to hit behind Joe Mauer, and his 2nd-in-the-AL OBP!
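A rough sketch of how much of that RBI gap opportunity alone could explain, using only the figures quoted above: 151 extra runners, the two drive-in rates, and the 129 and 85 RBI totals. It applies both rates to bracket the answer, so it doesn't matter which rate belongs to which player. It is an approximation, not a full accounting, since RBI also credit the batter for driving himself in on a home run and the runners-driven-in percentage does not.

```python
EXTRA_RUNNERS = 151       # Morneau's edge in runners on base, per the text
RATES = (0.186, 0.190)    # the two drive-in rates quoted in the text
RBI_GAP = 129 - 85        # Morneau's RBI minus Mauer's

# RBI the extra opportunities alone would be expected to produce.
low, high = (EXTRA_RUNNERS * rate for rate in RATES)
share_low, share_high = low / RBI_GAP, high / RBI_GAP

print(f"extra RBI from opportunity: {low:.1f} to {high:.1f}")
print(f"share of the {RBI_GAP}-RBI gap: {share_low:.0%} to {share_high:.0%}")
```

By this estimate, something like 28 of the 44 RBI separating the two, a bit under two-thirds, comes purely from batting with more men on base.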

So the RBI stat tells you who was at the plate for the final event resulting in the creation of a run, but it can actually distort your sense of how that run was created. Mauer was, hands down, a better hitter than Morneau in '08, and played a much bigger part in how the Twins' runs were scored. When you add in defense and adjust for position scarcity, it's not even close. They're very nice complementary pieces, but Morneau is the Scottie Pippen to Mauer's MJ.

So, yeah, runs are awfully important. On the team level, you could almost say they're all-important (almost). But to look at the HR, runs or RBI a single player has as a way of judging that player's value is never a good idea. Even with Howard: make him the MVP because he drove a bunch of guys in, and you're ignoring Pujols' 100+ points of OBP and 100+ points of SLG, amounting to 100+ fewer outs and many more runs for Pujols' team, and Pujols' vastly superior defense, all for the sake of (a) Howard's good fortune of having 50 more runners on base during his PAs than Pujols had in his and (b) a 2% edge in his success at driving those runners in. It doesn't add up, or even come close.

More to the point, every one of those traditional stats is totally encapsulated in some more advanced metric or other. Whatever skills you think RBI measures, that's also measured, and better, in SLG; or, if you think hitting with runners on base or "in the clutch" is a skill that's worth measuring, stats like WPA/LI do a better job with that. Batting average is a fun little stat for what it is, but OBP tells you the same thing and more. Fielding Percentage is totally encapsulated by all advanced fielding metrics, like UZR and Plus/Minus.

You might think that these things (well, save OBP) are less-than-perfectly accurate, but that's not an argument in favor of going back to the old things; it's an argument in favor of doing more research and finding better new things. UZR may not be perfectly accurate, but it's always, in every possible instance, going to do a better job of telling you who is the better fielder than fielding percentage will. FIP may not be perfect, but it's better than just comparing two players' ERAs. There may be slightly different ways to measure OPS+, but it's always going to be better than not adjusting for era or ballpark factors at all. And so on. We can argue about how good the new stuff really is, but it's just plain better than the old stuff (the well-grounded stuff that gains some level of acceptance, that is, not just any old thing someone thinks up).

So that's the point: I'm not going to use the term "flat-earthers" around here. I try to avoid mudslinging of all types. I have nothing against people who rely solely on traditional stats, and I think those stats have their place. But their place isn't in player analysis, not anymore. If you're going to argue that, say, Howard was the 2008 NL MVP, and base it on traditional stats, you're going to be wrong -- simply, objectively, obviously wrong. And I'm sincerely sorry to say that. But I'm not trading in my DVD player for a VCR, and I'm not giving up my numbers for a set that tells me the same stuff, but less of it, and with more static.

Friday, May 15, 2009

Luckiest and Unluckiest Pitchers So Far

One of the most interesting of many, many interesting things on FanGraphs is the pitching leaderboards' E-F stat, which is simply the pitcher's current ERA minus his FIP (Fielding Independent Pitching, which I've mentioned a few times--an attempt to measure what his ERA "should" be, with defense, park and luck taken out of the equation). A negative number means the pitcher has been lucky -- the ERA is lower than it "should" be -- while of course a positive number means the opposite. So here are your leaders on both ends of the spectrum so far:

AL's Luckiest: Trevor Cahill, A's.
Cahill has put up some awfully strong-looking numbers for a rookie on a terrible offensive team: 2-2 with a 3.69 ERA in seven starts. His FIP, though, is an astronomical 6.18. Why? Well, he's not striking anybody out, at just 3.23 per nine innings, and yet he's walking more than one batter for every two innings, which gives him an awful 0.70 K/BB ratio. He's getting by right now on some combination of luck, defense, and forgiving ballparks (he's made four of his seven starts at home in the pitcher-friendly McAfee Coliseum, and another one at Safeco), having held batters to a very lucky .256 BABIP.
Prognosis: the kid's 21 years old and a solid prospect, with a minor league history of very solid K rates (one of the best in the minors in '08), respectable walk rates and almost no homers allowed, which makes me think the current flyball rate is a little fluky. He's probably not really a 3.69 sort of pitcher right at the moment, but I doubt he's a 6.18 one either. He should be fine.

AL's Unluckiest: Gavin Floyd, White Sox.
Funny enough, Floyd was one of the luckiest in 2008, with a FIP of 4.77, essentially identical to this year's 4.63. But his ERA in 2008 was 3.84; in '09 to date, it's 7.32. What goes around, I guess. Floyd is having more control trouble this year (4.81 walks per 9 to 2008's 3.05), but is balancing it so far by giving up fewer HR (0.92 to 1.31). The big difference, natch, is the BABIP: he got unbelievably lucky last year at .268, and is unbelievably unlucky so far this year at .380.
Prognosis: Problem is, I don't think the Sox or their fans would have been happy with even just a 4.63 ERA this year after what he turned in last year. So if you were expecting that, you'll be awfully disappointed. Also, the HR rate drop doesn't seem real; he's giving up about the same percentage of line drives and fly balls and has an almost identical GB/FB ratio to '08, so the only difference is that fewer of those fly balls have gone over the fence so far. That's likely to regress, so if Floyd can't find the strike zone more often, he could be in for a very rough year indeed. Just not 7.32 rough.

NL's Luckiest: Jair Jurrjens, Braves.
3-2 with a 2.06 ERA in 8 starts (48 innings), Jurrjens' start has led at least one dude (the bald guy from Princess Bride again) to believe he's quietly becoming one of the best pitchers around. But Rob Neyer always points out that it's really, really tough to succeed while striking out less than five per nine, and Jair is at 4.5, with a very unsustainable .244 BABIP. Accordingly, his FIP is 4.09 -- still very respectable, but more than two runs higher than his current ERA.
Prognosis: Well, his opponent BABIP in 2008 was a very typical .311, but his strikeout rate was a much more palatable 6.64, and so he still posted a 3.68 ERA with a FIP that essentially matched it. And he's only 23, so there's reason to believe he'll improve on even those solid numbers. His pitch speed and selection are very similar to what they were in 2008. If he can get that strikeout rate back up and start getting grounders again when the ball is put into play (his GB/FB ratio is less than half what it was last year) -- and I don't see any immediate reason to believe he can't -- he should be totally fine, even considerably better than the above-average pitcher his current 4.09 FIP suggests he is. He just hasn't suddenly become Pedro Martinez or something.

NL's Unluckiest: Ricky Nolasco, Marlins.
Strikeouts are good (7.5 per 9). Walk rate is up, but still very good (2.6 per 9). But his ERA is 7.78. FIP says it "should" be 4.34. Problem is, when a batter doesn't strike out against him, he's hitting almost .400.
Prognosis: That BABIP obviously can't last, even with the Marlins', um, unspectacular defense behind him. He is getting hit quite a bit harder than he was in '08 -- 26% of balls put in play off of him are line drives, compared to just 19% in both 2007 and 2008 -- which is why that 4.34 FIP is up about six tenths from last year's. He'll be fine. I mean, he won't win a bunch of games with the way the Fish are going right now, and he might not be the potential ace he looked like last year, but he's at least an average pitcher, and is probably considerably better than that.

Thursday, May 14, 2009

There Goes the Only Reason to Pay Attention to the Nationals

A few words [on/tangentially related to/somehow inspired by] Ryan Zimmerman's just-ended 30-game hitting streak:
  • Not naming names (or linking links) here, but I can't stand it when my fellow sabermetrically-inclined folk say that they're bored by, or otherwise downplay, events like hitting streaks and no-hitters. Look, they're really just oddities, not statistically meaningful. I get all that, and I bet most non-statheads would too, on most levels. But if you can't get at least a little excited about or intrigued by this sort of thing, you're giving credence to the tired old refrain that we're all just misplaced accountants who don't really like to "watch the games." To each her own and all that, but if you can't bring yourself to appreciate the human interest angles of little stories like this, totally fine, but please do the rest of us a favor and shut the hell up about it. It's not like there aren't other things to talk about.

  • On the opposite end of the spectrum, David Pinto has been all over the streak these last few days, with pithy little tidbits like this and this (along with a bunch of other, more news-y updates). My favorite part is this, explaining why the league-wide "hit average" going up eight points has led to a hugely increased frequency of long hit streaks:
    So the probability of a player getting a hit in a four at bat game prior to 1996 was 0.646. In the later period, that’s up to 0.66. That doesn’t seem like much, but remember, we’re talking about long streaks here, so we’re multiplying. The chance of a player hitting in the next 29 games goes from .00000314 to .00000584, nearly double. Now, figure that over all possible players playing at least 29 batting games, and you can see how batting streaks would have increased.
  • I'd really like to be good with numbers.

  • There have been 199 hitting streaks of at least 20 games since 1980, by my count, which is probably six or seven times as many as I would've guessed. Zim's is just the fifteenth in that span, however, to last as long as 30 games. Of those fifteen, Zim's is the eighth to have ended at exactly 30 games. Kind of weird, right?

  • I just remembered that I was at one of those streak-snapping 31st games, Sandy Alomar's at the Metrodome in July of 1997. That's one of the least enjoyable notable games to be present for, since of course you're really there hoping he does get a hit (even when he's on the other team...especially when your own team sucks).

  • Of the fifteen thirty-plus-gamers, only three -- Hal Morris, Vladimir Guerrero, and George Brett -- had career batting averages of over .300 through the year of their streak, though four more of them were over .290. Zimmerman's career average sits at .288 (though, interestingly, he's never had a full season end that high). Anyway, they're all over the map. Eric Davis had the lowest career average at the time of his streak, at .269.

  • A more common thread connecting the 30-game-streak club is that they're all free swingers; you don't get a hit a day by walking a whole lot. None of the fifteen had ever walked 80 times in a year as of the season in which he had his streak (Vlad, Brett, and Luis Gonzalez did it in seasons coming after their streaks...but all with the aid of more than 20 intentional passes), and for most of them, even 70 walks was a pipe dream. Benito Santiago, for instance, hit .300 with a .324 on-base percentage (16 walks) in his "streak year" of 1987. Rollins, Guerrero, Morris, Alomar Jr., and Nomar have very little to talk about with the likes of Jack Cust and Adam Dunn at hitters' cocktail parties.

  • The best performance during a 30-game streak, predictably, was by the great George Brett; in the middle of his .390 season of 1980, Brett hit .467/.504/.746 (1.250 OPS) while hitting in 30 straight games from July 18 to August 18. Paul Molitor deserves a mention, too: he's had the longest streak in this time frame, a 39-gamer in 1987, and posted a 1.178 OPS throughout.

  • The "worst" performance during a 30-game streak, also predictably, was turned in by Jerome Walton. He won the Rookie of the Year Award in 1989, his only decent year with the bat, and hit in 30 straight from July 21 to August 20, putting up an .801 OPS that wasn't all that much better than his year-long .721 line. Dishonorable mention goes to Willy Taveras, he of the 74 career OPS+, who hit in 30 straight games while still managing only an .830 OPS (though that was a good sight better than his putrid year-long .672).

  • In one of his posts on the subject, Pinto wondered whether this year's Nationals were the worst team ever to have a hitter with a streak this long, and the answer, since 1980, is...well, probably. Vlad's 1999 Expos lost 94 games; at 11-21 entering today, the Nationals would have to play .438 ball the rest of the way (57-73) to lose only 94 games. Not a terribly lofty goal, but I don't see it happening, do you? [Edit: Benito's '87 Pads lost 97. So the Nats will have some fairly stiff competition for that title, actually, but I still have faith in them.]

  • The stat report I set up to look at all these streaks, if you're interested, is here.
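The multiplying in the Pinto quote above is easy to check for yourself. Here's a minimal sketch in Python; the function names are mine, and it assumes every at-bat and every game is independent, which is the same simplification the quoted numbers appear to rest on:

```python
def game_hit_prob(hit_avg, at_bats=4):
    """Chance of at least one hit in a game, given a per-at-bat hit rate."""
    return 1.0 - (1.0 - hit_avg) ** at_bats


def streak_prob(per_game_prob, games):
    """Chance of hitting in each of the next `games` games (independence assumed)."""
    return per_game_prob ** games


# Per-game probabilities taken straight from the quote
before_1996 = streak_prob(0.646, 29)
since_1996 = streak_prob(0.66, 29)

print(f"before 1996: {before_1996:.8f}")
print(f"since 1996:  {since_1996:.8f}")
print(f"ratio:       {since_1996 / before_1996:.2f}")
```

A small bump in the per-game probability compounds over 29 games, which is the whole point of the quote.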

Tuesday, May 5, 2009

Re-Projecting Youkilis

Content is going to (continue to) be a little light over here for the next couple days. Real work beckons.

Here's a fun little exercise. Everybody knows it's early...but it's not that early. Lots of guys are doing a lot better, or a lot worse, than anybody expected. What if we were (well, specifically in this case, PECOTA was) right about those guys all along...starting now? That is, from today forward, the hitter performs exactly as we expected. What does that end up looking like?

We're going to start with the guy they used to call the Greek God of Walks.

Kevin Youkilis' entry in Baseball Prospectus 2009 lauds Youk's sudden transformation "from an above-average, patient hitter into a legitimate power threat," but then hints pretty forcefully that it's all a mirage. The book notes that a number of his homers just barely cleared the wall, and that he put up an awfully high .347 BABIP that we can expect to come back down. Faced with his impressive .312/.390/.569, 29 HR, 91 R, 116 RBI from 2008, PECOTA saw this line from him in '09, which must've been awfully disappointing to The Nation:

AVG    OBP    SLG    HR   R    RBI
.275   .366   .475   21   81   84


To date, though (through Sunday, actually), Youkilis has put up this line, leading the league in average, OBP and SLG and in the top ten in just about everything else:
AVG    OBP    SLG    HR   R    RBI
.407   .519   .714   6    23   20


If we start with that line and then give him another 491 PA/441 AB (PECOTA's projected PA minus the ones he's already had) at exactly the rates that PECOTA projected for him above (so, he hits .275/.366/.475 the rest of the way), then we get this final combined line:
AVG    OBP    SLG    HR   R    RBI
.296   .393   .518   24   89   89

The runs and RBI still look a little low, and honestly, it's hard to see anybody hitting in the middle of that Red Sox lineup and not ending up with 100 of both. Otherwise, though, that line is a pretty gigantic jump from what PECOTA had him pegged at. If PECOTA was exactly right about his true talent and he performs exactly to that talent the rest of the way, his hot start nonetheless lets him coast to near-superstar-level numbers. On the other hand, if, as is at least equally likely, PECOTA was wrong and 2008 was a lot closer to his true talent, this start could propel him to a runaway MVP season. Amazing what one little month can do.
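For anyone who wants to try this re-projection on another hitter, here's a rough sketch of the arithmetic in Python. The function names are made up for illustration, and the at-bats-so-far figure is a placeholder assumption, since the post only gives the remaining 491 PA/441 AB. Rate stats get a playing-time-weighted average (AB for AVG and SLG, roughly PA for OBP); counting stats get the actual total plus the projection prorated to the remaining playing time.

```python
def blend_rate(rate_so_far, n_so_far, projected_rate, n_remaining):
    """Weighted average of a rate stat over actual and projected playing time."""
    total = n_so_far + n_remaining
    return (rate_so_far * n_so_far + projected_rate * n_remaining) / total


def blend_count(count_so_far, projected_full_season, n_remaining, n_projected_full):
    """Actual count so far plus the full-season projection prorated to what's left."""
    return count_so_far + projected_full_season * n_remaining / n_projected_full


AB_SO_FAR = 86      # ASSUMPTION: placeholder, not a figure from the post
AB_REMAINING = 441  # from the post

final_avg = blend_rate(0.407, AB_SO_FAR, 0.275, AB_REMAINING)
print(f"combined AVG: {final_avg:.3f}")
```

The combined line always lands between the hot start and the projection, pulled toward whichever stretch has more playing time behind it.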

Wednesday, April 29, 2009

The Importance of Catching Strikes

We're going Twins-related again (and graphics-free today), and then yet a third Twins post tomorrow, probably, and back to regularly scheduled programming with a non-Twins gameblog on Friday morn.

If you have Extra Innings, or MLB.TV, or live in Minnesota or central Florida, try to take some time out to catch an inning or two of the Twins-Rays game tonight. Not because I expect it to be a great game, really; they're two pretty interesting teams, I think, and Kazmir is on the hill, but I don't expect it'll be making Lar's Most Interesting this morning or anything.

But, see: Mauer is set to be back for Friday's game, and the Twins are off tomorrow, so this should be the last chance you get for quite a while to watch Jose Morales catch.

After a rough start, I've come around on Morales. He's a switch-hitting catcher, which is rare enough in itself (there's a chance he might move into a tie for 48th place tonight on the all-time-plate-appearances-by-a-switch-hitting-catcher list, with 50), and he can hit a little. But that's not why I want you to watch.

He might be the worst defensive catcher since Matt LeCroy, and that's kind of entertaining -- his throws to second seem to stop for cheese and crackers somewhere above the mound, and he's lost a couple of very routine foul pops -- but that's not it, either, not really.

No, I'd like you to watch part of this game because I'd like you to notice how Morales catches each pitch. That's it! See, as I'm sure you know, most professional (and college, and a lot of high school) catchers practice a technique called framing the pitch, whereby you subtly nudge your glove back toward the strike zone as a close pitch comes in, hoping to get your pitchers a few extra called strikes over the course of the game. (Little white lies make up about 40% of baseball, if you haven't noticed.)

Morales, I've convinced myself, does exactly the opposite, stabbing at pitches that should be strikes and effectively driving them well out of the umpire's idea of the strike zone. I've seen pitches that defined the very concept of "down the middle" called balls because Morales almost falls on the pitch, pushing it down toward the batter's ankles as he catches it. Just watch and see if you see what I see, I guess, because I can't believe I haven't heard anyone comment on it.

Like I said, I like Morales. But he's very likely going to be getting an all-expenses-paid trip to Rochester tomorrow, and this is something he's going to have to work on. Not only is it frustrating to watch, but an extra ball here and there can make a much bigger difference than most people realize.

Say you have an average AL hitter on an 0-1 count. If the next pitch is a strike (and called such), you have the hitter at a huge disadvantage; the American League as a whole hit .172 with a .245 SLG on PAs with the last pitch coming on 0-2 in 2008, and just .185 with a .274 SLG in PAs in which the count was 0-2 at any point in the at-bat! Meanwhile, the league hit a shocking .330 BA/.519 SLG swinging on 1-1 counts.

Look at those numbers again...I think everybody knows that the count is important, but that important? An average hitter becomes an average-hitting pitcher on an 0-2 count, and the same hitter becomes an MVP candidate when he swings on a 1-1 count. So if Morales stabs at an 0-1 pitch and turns what should have been a strike into a ball, he's essentially transformed the hitter from Roy Oswalt into Lance Berkman (if the hitter swings at that pitch, that is -- the stats after a 1-1 count are much closer to the overall league average, because the possibility of a strikeout comes back into play -- but still: would you rather face a league-average hitter or Oswalt?).

I don't really believe in the surpassing importance of catcher defense; I don't think having a guy with a cannon arm or superior wild-pitch-avoiding ability is going to win that many games for you. Matt LeCroy could have caught for my team just about any time, back when he could hit. But from watching Morales and looking at those stats, I'm starting to believe that whatever else he can or can't do, a catcher who doesn't know how to frame a pitch can lose his share of ballgames for you.

Do any other catchers do this? I feel like framing is such an ingrained practice that every single professional catcher does it without drawing attention, but maybe this sort of shortchanging one's own pitcher is more common than I think and I just haven't been paying attention? I'm sure there's a study to be done there (adjusted called strike percentage for catchers against average, or something)...

Tuesday, April 28, 2009

Thing Fifteen: Solving the Twins' Outfield

To this point the blog has, if nothing else, justified its name, with this being the fifteenth new thing in fifteen days. And yet, aside from the occasional cheap shot at Alexi Casilla or Delmon Young, I've completely avoided talking about my own favorite team. The main reason for that is that my goal is to write one relatively succinct, digestible thing per day, and as I'm sure you've seen, I've struggled with that a few times already; if I start writing about the Twins, odds are I'm going to just prattle on forever. But I'm afraid that's a chance I'm going to have to take today. It's just time.

The general thinking is that five outfield/DH types -- Young, Denard Span, Carlos Gomez, Michael Cuddyer and Jason Kubel -- are all good enough to be playing every day somewhere, but only four spots are open to them. So the question coming into the year was: who's the odd man out?

Well, so far, Gardy has done his best to answer that with: "well, nobody! Or everybody, depending on how you look at it!" Through the first 20 games, he's started the following combinations (left-center-right):

Young - Span - Cuddyer: seven times
Span - Gomez - Cuddyer: six times
Young - Gomez - Span: four times
Young - Gomez - Cuddyer: two times
Kubel - Span - Cuddyer: one time

All told, Span has started six in left, eight in center, and four in right; Gomez has started 12 games, all in center; Young has started 12 games in left and one at DH; and Cuddyer has started 15 in right and two at DH. Kubel has essentially been the full-time DH, starting against both righties and lefties, though two others have spelled him there in addition to Cuddyer and Young.

Let's take a look at who these guys are. Two career numbers for each player are given below; the first is wOBA, a system that's about as good as any for assigning one number to the offensive value of a player, and it works on essentially the same scale as OBP (.300 is bad, .340 fine, and .400 great); the second is UZR/150, which attempts to measure how many runs a player saves or costs his team per 150 games played against the average at his primary outfield position, relying on play-by-play data.

Michael Cuddyer (.339, -6.3): the elder statesman of this group (but still a week or so younger than me), Cuddyer had an excellent year with the bat in 2006 (.282/.362/.504, 24 HR, .370 wOBA), but slipped in 2007 and was hurt for most of 2008, and is off to a terribly slow start in 2009. He has a reputation among Twins fans as an excellent outfielder, but fans often confuse excellent arms with excellent outfielders; Cuddy has a cannon, but doesn't get around well at all. His defensive numbers through his first 15 starts this year are bizarrely good (26.4 UZR/150), but his real ability tops out at about a minus-five-run right fielder. He hits righties well enough to justify playing every day for most clubs, but his real talent is hitting lefties, against whom he's a career .280/.368/.439 hitter.

Carlos Gomez (.287, 18.7): Just 23 years old, Go-Go can be both a delight and absolute torture to watch. He swings from his heels (often falling to his knees off a particularly ambitious miss), never walks, is prone to mistakes on the bases, and, in 2008, would often bunt (often foul) with two strikes. But he might be the fastest player in baseball, and he absolutely is the best defensive centerfielder in baseball. As such, he needs only to get on base about 30% of the time, as he did in 2008, to be a useful everyday player. With his youth and talent (and he has a very nice swing on the rare occasion that he keeps it within reason), he still has the potential for much more than mere usefulness.

Jason Kubel (.338, -20.0): He's a better hitter than his career wOBA suggests; that's brought down by a poor first year back from surgery in 2006. He had a .345 wOBA last year and is tearing the cover off the ball in the early going this year, at .417. A typical lefty, Kubel has a career OPS 120 points higher against righties than against southpaws. With his reconstructed knee, he moves like he's about eighty. A team without Justin Morneau might try him at first base, but he has no business "running" around the outfield.

Denard Span (.364, 12.0*): a former first-round pick, Span had pretty much obtained "bust" status heading into 2008, and then suddenly exploded. With an excellent 2008 in both the minors and majors and a similar start to 2009, it seems safe to conclude that Span did suddenly become a player: great eye at the plate, good bat control, good instincts on the bases, some gap power. He can apparently hit left-handed pitching despite being a lefty himself. He's not quite the centerfielder Gomez is, but he can more than hold his own out there, and is an incredible asset in either corner.
* The 12 UZR/150 is a reasonable guess; he hasn't played enough games at any one position to really trust the numbers. What's clear is that he's an excellent defensive player at any of the outfield positions.

Delmon Young (.321, -15.8): Bill James recently wrote that Young must be the worst percentage player in baseball, and at this point, frankly, you could almost take "percentage" out of that label. Young, like Gomez, is just 23, but unlike Gomez, he has shown few flashes of potential and no currently useful Major League skills. He's hit around .290 in both of his two full seasons, was once considered the #1 prospect in baseball, and had 93 RBI in 2007. That's enough to convince some people that he's a useful or promising player. Watch him every day, though, and you see something different. In the field, Delmon looks uninterested at best, clueless at worst; he frequently misses routine plays and routinely makes even minimal challenges into adventures (or doubles, or triples). He hasn't balanced that by showing any power, hitting a total of just 23 HR in 1220 AB in 2007-08, and he's drawn just 52 unintentional walks (against 232 strikeouts) in that same period. Even his minor league stats are largely underwhelming. I tend to believe that any player the scouts loved as much as they once loved Delmon must have something going for him, and maybe Delmon will show that something someday. But right now he's here, and here is very, very, very, very far from there.

So what should Gardy be doing with these guys? I see a few things that should be just blindingly obvious:
  • Span should be starting somewhere every day. Not only is he the best overall player of these five, which he clearly is because of his defense; he might even be the best pure offensive player among them. Whatever else you do, if Span is healthy, pencil him in in the leadoff spot and one of the outfield positions. He's already sat out two of the first 20 games, and that's two too many.
  • Young should not be starting anywhere on a contending team. Look, I get the argument. He's a promising player, or people consider him as such, and needs to be playing every day. But if this team intends to compete in the Central -- and this year, every team in the Central figures to compete in the Central -- Delmon has no place on't. Let him start every day in Triple-A (where, it should be noted, he's never exactly proved himself), coach him heavily on defense and pitch recognition, and hope you don't have to call him up before he's ready because of an injury.
  • If you've got a flyball pitcher in the game, Gomez has to be in the game too. The thing about the Twins' five outfielders is that only two of them are good defensive outfielders. So if you've got a guy on the mound who gives up a lot of fly balls (and that's most of the Twins' rotation), your best chance to win is to have both Gomez and Span in the outfield, even if you take a hit on offense.
  • Kubel shouldn't DH versus lefties. Even in 2008, his best offensive year, Kubel had just a .704 OPS against left-handed pitching, worse than the overall OPS of Nick Punto and about equal to Casilla's. Unless Gardy has some reason to believe Kubel has completely come around in that area -- and I really don't think he does -- that's just not a designated hitter.
So here's what I'd do (assuming demoting Delmon isn't an option):

Against RHP: Span LF, Gomez CF, Cuddyer RF, Kubel DH
Against LHP: Span LF, Gomez CF, Young RF, Cuddyer DH

So yeah, first, I'd play Gomez every day, flyball pitcher or no. I really think his defense is just that good, and, like a lot of people do with Young, I want to see him play every day to see if his bat will come around. Moreover, Span blanketing left allows Gomez to shade toward right, minimizing the damage done by playing Cuddyer and/or Young, who can pretty much just guard the line.

Second, I'm never putting Young in left, where his numbers have been uniformly terrible (I've watched him miss a relatively easy foul fly against the Rays as I've been writing this). For some reason, his numbers from about a season's worth of playing right field with the Rays are above average (6.0 UZR/150 in 163 career G). That might just be a blip (and probably is), but it might also be that he had to depend less on his range and more on his strong arm in RF than he does in LF. There seems to be less foul territory in right field in the Dome, and the fence is closer. At least by putting him there you'd be giving him a chance of being a useful player, rather than just watching him flail helplessly around in left every day (as I have to currently).

Third, a Kubel/Cuddyer DH platoon is actually an above-average DH, whereas the current Kubel/Kubel setup is a serious weakness against lefties, especially in a lineup where your two best hitters are lefties.

Is this really worth spending all this time thinking over? ...Well, yes, by somebody (probably not by me, but what can you do?). A Span/Gomez LF-CF would save about 40 runs on defense over the course of a season compared to a Young/Span one, which makes about four wins. And you give a little bit of that back on offense, but honestly, until Delmon actually shows something, it's not all that much (and then Gomez takes a little back again on the basepaths). Four extra wins in the 2009 American League Central could very well mean the playoffs. To Gardy's credit, he knows what his best defensive outfield is, frequently subbing Gomez in for Young and shifting Span to left in the late innings of close games. Now someone needs to explain to him the kind of difference having that for nine innings could make.
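The runs-to-wins conversion above can be roughed out in a few lines of Python. This is only a sketch under loud assumptions: it uses the roughly-10-runs-per-win rule of thumb that the 40-runs-to-four-wins math implies, and it plugs in each player's career UZR/150 from the write-ups above regardless of which outfield spot he's playing, which is crude. The function names are mine.

```python
RUNS_PER_WIN = 10.0  # ASSUMPTION: common rule of thumb, not an exact constant

# Career UZR/150 figures as listed in the player write-ups above
UZR_150 = {
    "Cuddyer": -6.3,
    "Gomez": 18.7,
    "Kubel": -20.0,
    "Span": 12.0,   # the post itself calls this one a reasonable guess
    "Young": -15.8,
}


def defensive_runs(players):
    """Sum of career UZR/150 for a group of fielders (ignores position differences)."""
    return sum(UZR_150[name] for name in players)


def wins_gained(new_group, old_group):
    """Wins gained over a full season by swapping one group of fielders for another."""
    return (defensive_runs(new_group) - defensive_runs(old_group)) / RUNS_PER_WIN


gap = defensive_runs(["Span", "Gomez"]) - defensive_runs(["Young", "Span"])
print(f"run gap: {gap:.1f}, wins: {wins_gained(['Span', 'Gomez'], ['Young', 'Span']):.2f}")
```

Career numbers put the gap in the mid-30s; swapping in Young's left-field-only numbers, which are worse than his career mark, would presumably push it up toward 40.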

All that said, if Joe Mauer doesn't come back on May 1 and knock the ball all over the park for 130 games, it's not going to matter. But they might as well put their best lineup out there until we know for sure...

Thursday, April 23, 2009

Getting in His Head, Tangibly

On Monday night, in what I've determined looking back must have been the Pirates-Marlins broadcast, the local (almost certainly Marlins') broadcasters were chatting about the importance of the leadoff man getting on base, and the commentator said something very much like this: "Also, of course, you can never know how much impact that is going to have on the pitcher, because he's going to be distracted by the fast runner on first base, um, so you don't know what kind of effect that's going to have on his pitching." The play-by-play guy agreed, calling it "one of those intangibles that is just so important on the baseball diamond." That's totally paraphrased just to give you an idea, but I do know he used the word "intangibles."

Of course, if someone is talking to you about baseball and uses a form of the word "intangible," the odds are very, very good that you're in the process of being told a lie. Baseball people have this way of equating things that they do not know with things that cannot be known. And that's silly.

The impact that a good baserunner being on base might have on a pitcher's effectiveness at pitching to the next batter may be difficult to measure (and it is -- very, very difficult, or at least very, very time consuming), but it's as "intangible" as the chair I'm sitting in. Do pitchers pitch worse with Jose Reyes on first base than they would with Jose Molina on first base? Better with Carlos Delgado taking his lead than with Carlos Gomez? It's a pretty simple question. The fact that nobody has spent hundreds of hours digging through game logs to find the answer yet (as far as I know) doesn't mean that answer is "intangible"; it means the answer isn't important or interesting enough to spend all that time on.

So here's a little exercise; it's not at all meant to definitively answer the question, but just to show that the issue can in fact be, um, tanged (real word! ...but not at all the root of "intangible"). There's no way I'm going to go out and figure out how individual pitchers performed with various individual runners on base. But how about this: if speed on the bases makes pitchers less effective, shouldn't we expect guys who hit behind speedsters to do better with runners on first than batters who hit behind slow or average runners do with those base-cloggers on first?

Chase Utley is a good example. The Phillies' most common 2008 lineup had him batting behind Rollins and Victorino, both top ten in stolen bases. When one of them wasn't in front of Utley, they were replaced by Jayson Werth (who stole 20 out of 21) or fleet-footed backups Taguchi and Bruntlett. We can assume that very nearly every time Utley came up with men on base, those men were very good baserunners. If Utley did better with guys on base than without, and the difference is more than the typical difference between hitting with the bases empty and occupied (and there is usually some difference regardless; pitchers just do better throwing out of a windup), that might start to suggest that a "speed guy" getting on base really does impact the pitcher's effectiveness.

Of course, a pitcher has enough to worry about with Utley at the plate, and that's too small of a sample size anyway. So I'm going to take a bunch of guys in the National League who are likely to have hit with a lot of speed guys on first base in 2008 (a pitcher might be just as scared of a good runner on second base, but counting that performance would be more likely to catch other runners than the ones I have in mind), and see if any patterns emerge, though of course the sample size will still be way too small to tell. Batting orders change more than you might realize -- Victorino, for instance, usually hit behind Rollins, but also spent time hitting behind anti-Rollinses Pat Burrell and Ryan Howard; Castillo hit mostly behind Reyes but also behind Brian Schneider. So this isn't perfect, or even especially meaningful. But I'm confident that, when each of these players came up with a guy on first in 2008, there was a better than 50% chance that that runner was a threat to steal 20 or more bases.

Baseline: The 2008 National League put up a .731 OPS with the bases empty, and jumped up to .761 with a runner on first, so your average hitter will increase his OPS by about 30 points with a runner on first. If the fast-runner-distraction effect is real and significant, these guys should do considerably better than that as a group.


Utley: Batted second or third in every game he started, behind at least one stolen base threat approximately 100% of the time.
- Bases empty: .886
- Runner on first: 1.014
- Difference: +128 points


Victorino: Batted second behind Jimmy Rollins 81 times; led off (so rarely came up with a runner on) 14 times; batted 5th or 6th 37 times.
- Bases empty: .819
- Runner on first: .797
- Difference: -22 points


Jeremy Hermida: Batted second in 87 games, almost all behind Hanley Ramirez. Also batted third, sixth, seventh and eighth a handful of times each.
- Bases empty: .718
- Runner on first: .773
- Difference: +65 points


Luis Castillo: batted second behind Reyes in 58 starts; batted seventh or eighth about 20 times.
- Bases empty: .629
- Runner on first: .643
- Difference: +14 points

Andre Ethier: batted second 80 times and fifth a bunch of times; hit behind Furcal, Pierre, Kemp or Martin about 80% of the time, but also occasionally behind Kent or Manny.
- Bases empty: .935
- Runner on first: .623
- Difference: -312 points


Ryan Church: hit directly behind Beltran or Reyes about 80% of the time; Delgado occasionally came between him and Beltran.
- Bases empty: .703
- Runner on first: .971
- Difference: +268 points


Ryan Theriot: hit second 104 times, usually behind Soriano.
- Bases empty: .752
- Runner on first: .786
- Difference: +34 points


J.J. Hardy: seems to have batted behind either Weeks or Hart in about 3/4 of his starts.
- Bases empty: .832
- Runner on first: 1.052
- Difference: +220 points


Average [fake, unweighted average, just a basic add-and-divide of the above]: +49 points


So these eight guys, overall, did get a bigger bump from having a guy on first than the 30 points the league as a whole got. What does that tell us? Well, absolutely nothing. Take away Ethier, and it's a huge difference; take away Hardy or Church, and it's a smaller-than-average bump. But these guys as a whole could be benefitting from pesky baserunners getting on base in front of them. To figure out whether or not they are would take a much, much longer and more sophisticated study, one which neither you nor I have the time or inclination (or, in my case at least, skill) to get into.
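To see just how shaky that +49 is, here's a quick Python sketch of the same add-and-divide, plus what happens when you drop one hitter at a time. It uses the point differences exactly as listed above; the variable names are mine, and nothing here is weighted by plate appearances, so it's no more rigorous than the fake average it reproduces.

```python
LEAGUE_BUMP = 30  # NL OPS points gained with a runner on first, from the baseline above

# OPS-point differences (runner on first minus bases empty) as listed above
differences = {
    "Utley": 128,
    "Victorino": -22,
    "Hermida": 65,
    "Castillo": 14,
    "Ethier": -312,
    "Church": 268,
    "Theriot": 34,
    "Hardy": 220,
}


def mean(values):
    """Plain unweighted average."""
    values = list(values)
    return sum(values) / len(values)


overall = mean(differences.values())

# Average of the other seven hitters when each one is left out in turn
leave_one_out = {
    name: mean(v for other, v in differences.items() if other != name)
    for name in differences
}

print(f"overall: {overall:+.0f} (league: +{LEAGUE_BUMP})")
for name, value in leave_one_out.items():
    print(f"without {name}: {value:+.0f}")
```

One hitter in or out swings the group from well above the league bump to below it, which is the small-sample problem in a nutshell.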


But here's the point: it could be figured out. And someday it probably will be, if it hasn't been already in some study I'm not aware of. The day when people who are paid to write or talk about baseball generally stop referring to certain very tangible things as though they're mystical and unknowable just because those people can't be bothered to take the time to know (or even honestly think about) them will be a very happy day.

Wednesday, April 22, 2009

Fun with Small Sample Sizes

  1. The Yankees sit at 8-6, but are on pace to score 810 runs and allow 972. This would make their expected (Pythagorean) record about 66-96.
  2. Then again, if Chien-Ming Wang were allowed to make 30 starts at his current pace, he'd give up 230 runs (in just 60 innings). This would be a record since 1901, narrowly edging out Snake Wiltse's 1902 effort (in 300 innings). The record since 1950 is Phil Niekro's 166 in 1977 (in 330 innings).
  3. Miguel Cabrera (through Monday, prorated): .489/.538/.787, 635 AB, 149 R, 310 H, 54 HR, 162 RBI
  4. Carlos Quentin: 87 HR, 162 RBI, 150 R...12 2B, 0 3B
  5. Brian Giles is hitting .151/.211/.189 (through Monday) and is on pace for twelve runs scored, zero homers...and 87 RBI. That's how you know RBI is an awesome and totally not at all context-dependent stat.
  6. Washington Nationals (through Monday): 27-135 (.167), 770 RS, 1040 RA, Pythagorean W/L: 57-105.
  7. Raul Ibanez: .383/.442/.830, 176 R, 68 HR, 149 RBI, 14SB/0CS, about four defensive runs saved. Which totally makes sense considering the following hilarious evidence (from Lookout Landing): 1, 2, 3, 4, etc. So, yeah...it's a long season.
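The Pythagorean records in items 1 and 6 can be reproduced with a few lines of Python. This sketch assumes the classic exponent-2 version of Bill James's formula and a 162-game schedule; the list doesn't say which exponent was used, so treat that (and the function name) as my assumption.

```python
def pythagorean_record(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected (wins, losses) from runs scored and allowed.

    ASSUMPTION: exponent 2.0 is the classic formula; other versions use
    slightly different exponents and can give slightly different records.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    wins = round(games * rs / (rs + ra))
    return wins, games - wins


yankees = pythagorean_record(810, 972)     # on-pace totals from item 1
nationals = pythagorean_record(770, 1040)  # on-pace totals from item 6

print("Yankees:   %d-%d" % yankees)
print("Nationals: %d-%d" % nationals)
```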