TCS Daily

Baseball, Hotdogs and Apple π

By Bruce Bukiet - September 20, 2002 12:00 AM

How do you determine the best batting order for a baseball team? How much difference does it make?

Lots of statistics are collected in baseball. These statistics have been used mostly to rank individual players: who has the most home runs, the highest batting average, the highest slugging average, and so on. It's more difficult to use the stats to understand how an entire baseball lineup should perform. But with some clever thought and the simple mathematical tools of addition, subtraction, multiplication, division and elementary probability, one can answer interesting baseball questions.

Modeling a Lineup

Brute force attempts to compute the expected number of runs for a lineup are too inefficient. Even a simple method - considering only walks, singles, doubles, triples, home runs, and outs - could entail considering the number of runs scored by more than 10^31 possible sequences. And this would give the results for only one potential lineup! There are more than 360,000 possible batting orders for 9 players.
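To give a feel for these magnitudes, here is a small arithmetic sketch. The six outcomes are the ones listed above; the figure of 40 plate appearances per game is an assumption made here for illustration, not a number the article ties to this count. Under that assumption the number of outcome sequences comes to roughly 1.3 x 10^31, and the number of batting orders is simply 9 factorial.

```python
import math

OUTCOMES = 6             # walk, single, double, triple, home run, out
PLATE_APPEARANCES = 40   # ASSUMPTION: a rough number of batters per game

# Every batter can produce any of the six outcomes, so the number of
# possible outcome sequences grows exponentially with the batters faced.
sequences = OUTCOMES ** PLATE_APPEARANCES

# Any of the 9 players can fill slot one, any of the remaining 8 slot two, ...
batting_orders = math.factorial(9)

print(f"{sequences:.2e} outcome sequences")
print(f"{batting_orders:,} batting orders")
```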

A great improvement in efficiency comes from noting that only 25 situations can occur in a half inning in baseball. There are zero, one or two outs, and 8 baserunner situations: no runners, a man on first base, men on first and second ... or bases loaded. That makes 3 x 8 = 24 situations. The 25th is three outs, when the half inning is over.
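The count of 25 can be checked by listing the situations directly. This is only an illustrative enumeration; the names BASES, OUTS and STATES, and the "END" label for the three-out situation, are made up for this sketch.

```python
from itertools import product

# Each base is either empty (0) or occupied (1), as (first, second, third).
BASES = list(product((0, 1), repeat=3))   # 2 * 2 * 2 = 8 baserunner situations
OUTS = (0, 1, 2)

# 3 out counts x 8 baserunner situations = 24 live situations,
# plus one final "three outs" situation that ends the half inning.
STATES = [(outs, bases) for outs in OUTS for bases in BASES] + ["END"]

print(len(BASES), "baserunner situations")
print(len(STATES), "situations in a half inning")
```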

By realizing that a batter's plate appearance turns one of these situations into another, the growth of work reduces from exponential to linear. Considering 40 batters takes just twice as much work as 20 batters. Of course, one must keep track of the runs scored.

In a nutshell, this analysis yields the probability of a team being in any inning/baserunner/out/number-of-runs situation after a given number of batters have been up. The result is a lineup's expected number of runs and its run distribution: how often it could expect to score no runs, one run, etc.
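A minimal sketch of this bookkeeping follows. It is not the published model: the baserunner advancement rules in advance() are a simplifying assumption made here (walks move only forced runners, and on a hit every runner advances as many bases as the batter), and the event names and function names are hypothetical. Each batter is a dictionary of outcome probabilities, and the code pushes a probability distribution over (inning, outs, bases, runs) forward one batter at a time.

```python
from collections import defaultdict

# Bases gained by the batter on each hit type (hypothetical event names).
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "hr": 4}
EMPTY = (0, 0, 0)  # (first, second, third) occupancy


def advance(bases, event):
    """Return (new_bases, runs_scored) for a walk or a hit.

    ASSUMPTION: on a walk only forced runners move; on a hit every
    runner advances exactly as many bases as the batter.
    """
    first, second, third = bases
    if event == "walk":
        if first and second and third:
            return (1, 1, 1), 1
        if first and second:
            return (1, 1, 1), 0
        if first:
            return (1, 1, third), 0
        return (1, second, third), 0
    gained = HIT_BASES[event]
    # Position 0 is the batter; positions 1-3 are the occupied bases.
    runners = [0] + [i + 1 for i, occupied in enumerate(bases) if occupied]
    new_bases, runs = [0, 0, 0], 0
    for position in runners:
        destination = position + gained
        if destination >= 4:
            runs += 1
        else:
            new_bases[destination - 1] = 1
    return tuple(new_bases), runs


def run_distribution(lineup, innings=9, tol=1e-12):
    """Return {runs: probability} after `innings` half innings.

    lineup is a list of dicts mapping an event name ("out", "walk",
    "single", "double", "triple", "hr") to its probability.
    """
    live = {(1, 0, EMPTY, 0): 1.0}   # (inning, outs, bases, runs) -> probability
    finished = defaultdict(float)    # runs -> probability once the game is over
    batter = 0
    # One pass per batter: the work grows linearly with the batters considered.
    while live and sum(live.values()) > tol:
        probs = lineup[batter % len(lineup)]
        nxt = defaultdict(float)
        for (inning, outs, bases, runs), p in live.items():
            for event, q in probs.items():
                if q == 0:
                    continue
                if event == "out":
                    if outs < 2:
                        nxt[(inning, outs + 1, bases, runs)] += p * q
                    elif inning < innings:
                        nxt[(inning + 1, 0, EMPTY, runs)] += p * q
                    else:
                        finished[runs] += p * q
                else:
                    new_bases, scored = advance(bases, event)
                    nxt[(inning, outs, new_bases, runs + scored)] += p * q
        live = nxt
        batter += 1
    return dict(finished)


def expected_runs(distribution):
    """Average number of runs implied by a run distribution."""
    return sum(runs * p for runs, p in distribution.items())
```

As a sanity check on the sketch, a made-up batter who homers half the time and makes an out otherwise should average three runs per half inning, which is what expected_runs(run_distribution([{"out": 0.5, "hr": 0.5}], innings=1)) approaches.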

Making a Great Lineup

Now, the best lineup is, of course, the one that scores the most runs on average. By considering the 362,880 possible orders for the 9 players, as above, we can order the lineups from best to worst.

My colleagues and I studied the 1989 National League to ascertain principles common to optimal lineups and reduce the number of lineups we needed to test. We ranked players by Scoring Index - the number of runs a team would score on average if it had 9 copies of the given player. Interestingly, we found that the slugger - the player with the highest Scoring Index - should bat second or third on 3/4 of the teams and bat fourth on only 1/4 of the teams considered. We also found that the pitcher should almost never bat last. (These two findings are, of course, not in keeping with the way most managers construct their lineups.) The rules to consider in arranging a lineup are:

  • Place the best batter (by scoring index) second, third, or fourth.
  • Place the second best batter in the first five positions.
  • Place the third and fourth best batters in the first six slots.
  • Place the fifth best batter first, second, fifth, sixth or seventh.
  • The sixth best batter should bat in any position except eighth or ninth.
  • Place the seventh best batter either first or sixth through ninth.
  • The eighth and ninth best batters should bat in the last three positions.
  • Either the second or third best batter must be placed immediately before or immediately after the best batter.
  • The worst batter must be placed four through six positions after the best batter.
  • The second worst batter must be placed four through seven positions after the best batter.

This leads to fewer than 1000 batting orders to consider and yields a lineup that on average scores within 1/3 of a run of optimal for a 162-game season.
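The rules above can be turned into a filter over all 362,880 orders. The sketch below is one reading of them, with assumptions: players are identified only by their Scoring Index rank (1 = best, 9 = worst), "positions after" is taken as the difference in batting slot number, and the names ALLOWED_SLOTS and is_allowed are invented for this example. How many orders survive depends on that reading.

```python
from itertools import permutations

# Batting slots each rank may occupy (rank 1 = best by Scoring Index).
ALLOWED_SLOTS = {
    1: {2, 3, 4},
    2: {1, 2, 3, 4, 5},
    3: {1, 2, 3, 4, 5, 6},
    4: {1, 2, 3, 4, 5, 6},
    5: {1, 2, 5, 6, 7},
    6: {1, 2, 3, 4, 5, 6, 7},
    7: {1, 6, 7, 8, 9},
    8: {7, 8, 9},
    9: {7, 8, 9},
}


def is_allowed(lineup):
    """lineup[i] is the rank of the batter hitting in slot i + 1."""
    slot = {rank: i + 1 for i, rank in enumerate(lineup)}
    if any(slot[rank] not in ALLOWED_SLOTS[rank] for rank in slot):
        return False
    # The second or third best batter must hit right next to the best batter.
    if abs(slot[2] - slot[1]) != 1 and abs(slot[3] - slot[1]) != 1:
        return False
    # The worst batter hits four through six slots after the best batter.
    if not 4 <= slot[9] - slot[1] <= 6:
        return False
    # The second worst batter hits four through seven slots after the best.
    return 4 <= slot[8] - slot[1] <= 7


candidates = [p for p in permutations(range(1, 10)) if is_allowed(p)]
print(len(candidates), "batting orders left to evaluate")
```

Each surviving order would then be scored with a run model, and the highest-scoring one kept.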

How Important Is the Batting Order?

To compare lineups, it is not enough to know the average number of runs a team should score. A team that scores 5 runs on average cannot expect to win 100% of its games against a team that averages 4 runs. We use the run distribution for the two lineups and, taking extra inning games into account, find the probability of a team winning a game. The best lineup against the worst - where the pitcher leads off - should usually win about 85 games and lose 77 - a four game difference. The most extreme case found was 5-1/2 games.
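One way to turn two run distributions into a win probability is sketched below. It assumes the two teams score independently and that each extra inning behaves like a fresh single inning for both sides, repeated until one team outscores the other; these are simplifications chosen for the example, and the function names are hypothetical. For scale, winning 85 of 162 games corresponds to a per-game win probability of about 0.525.

```python
def p_outscore(a, b):
    """Return (P(A > B), P(A == B)) for independent {runs: probability} dicts."""
    win = sum(pa * pb for ra, pa in a.items() for rb, pb in b.items() if ra > rb)
    tie = sum(pa * b.get(ra, 0.0) for ra, pa in a.items())
    return win, tie


def win_probability(game_a, game_b, inning_a, inning_b):
    """Probability that team A beats team B.

    game_*   : run distributions over nine innings
    inning_* : run distributions over a single inning, used for extra innings
    """
    win, tie = p_outscore(game_a, game_b)
    extra_win, extra_tie = p_outscore(inning_a, inning_b)
    extra_loss = 1.0 - extra_win - extra_tie
    decisive = extra_win + extra_loss
    # Extra innings repeat until one side outscores the other, so only the
    # ratio of decisive outcomes matters. If neither side can ever score,
    # fall back to an even split.
    extra = extra_win / decisive if decisive > 0 else 0.5
    return win + tie * extra


def expected_wins(p_win, games=162):
    """Average wins over a season at a fixed per-game win probability."""
    return games * p_win
```

Two identical lineups should come out at exactly even odds under this sketch, which makes a convenient first check before comparing real distributions.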

Our model can be used to determine the influence of a trade on wins. Is it worth it to trade a home run hitter for a high average singles hitter? Trading Ichiro Suzuki of the Seattle Mariners for Sammy Sosa of the Cubs should lead to 5-1/2 more wins for Seattle and 6-1/2 more losses for the Cubs in a season.

Our method can rank Most Valuable Player candidates. It can be used to determine the single player who could give an average team the most additional wins. More information on how the method has been applied and the results is available here.

Why apply math to baseball? It demonstrates the power of math in the context of a great American game. Baseball, hotdogs and apple π are perfect together.

The writer is a professor of math sciences at the New Jersey Institute of Technology.
