In March of 2020, as the pandemic was shutting everything down and sending everyone home, I found myself with a lot more time on my hands. On a recommendation from my brother, I read Joe Peta’s “Trading Bases,” his personal story of, and instruction manual for, betting on baseball. I figured I would try to implement what he described to fill up the time. With no access to a sportsbook (Washington State still doesn’t allow sports gambling, and all casinos were closed anyway, even if there had been games being played), this was mostly a theoretical exercise. I back tested the system I developed and decided maybe there was something there, maybe there wasn’t, but the lack of access to a legal book, combined with the time and effort it took to come up with even a single pick, caused me to drop it.
Fast forward several months: I still had time on my hands, but my interest had shifted to machine learning. The best way for me to learn concepts is to apply them directly to problems I’m interested in, so rather than work on class assignments I applied what I learned to baseball betting. Using the same inputs as the Trading Bases approach, I tried several machine learning and deep learning techniques on my set of baseball data. This blog post is my attempt to explain, as best I can, the deep learning version of my Trading Bases-based system.
I will start with an overview of Joe Peta’s approach, and then give a very quick description of my first (i.e., non-ML) attempt. I will then dive into the latest version of my system, with some background on how I got where I am. I will finish up with results and a short discussion of future approaches.
Joe Peta’s System
Before I start, I should state this section should probably be titled “Joe Peta’s System As I Understand It”. I make no guarantee this is the correct interpretation. I encourage you to read his book if you want to learn how he did it: [ https://bookshop.org/books/trading-bases-how-a-wall-street-trader-made-a-fortune-betting-on-baseball/9780451415172 ] (not an affiliate link).
The Trading Bases system is based on one core concept: you can predict the probability of team A beating team B based on their respective winning percentages. The meat of the system is the series of steps involved in coming up with the “right” winning percentage to apply to each team. I’ll go through an example using final season numbers from 2019.
The first step is to build a probability table showing all the possible outcomes of any two teams playing on any given day. To keep this concrete, let’s look at the Chicago White Sox, who went 72-89 (a .447 winning percentage), and the Minnesota Twins, who went 101-61 (.624). On a day when both teams are playing (NOT necessarily against each other) there are 4 possible outcomes: both win, the Twins win and the Sox lose, the Twins lose and the Sox win, both lose. Putting this into tabular format we have:

Both win: .624 × .447 = .279
Twins win, White Sox lose: .624 × .553 = .345
Twins lose, White Sox win: .376 × .447 = .168
Both lose: .376 × .553 = .208
Of course when the two are playing each other, the first and last outcomes cannot occur, which leaves you with the middle two rows. Once those two are normalized to sum to 100%, you have these expected winning percentages against each other:

Twins: .345 / (.345 + .168) = .672
White Sox: .168 / (.345 + .168) = .328
So in predicting an individual game, you start with .672 and .328 respectively as the probability of a Twins or White Sox victory.
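The matchup calculation above is small enough to sketch in Python (the team records and the resulting .672/.328 split are the figures from the running example):

```python
def head_to_head_prob(wp_a: float, wp_b: float) -> float:
    """Probability that team A beats team B given each team's winning
    percentage: keep only the two mutually exclusive outcomes of a
    head-to-head game and normalize them to sum to 1."""
    a_wins = wp_a * (1 - wp_b)  # A wins, B loses
    b_wins = wp_b * (1 - wp_a)  # B wins, A loses
    return a_wins / (a_wins + b_wins)

# 2019 Twins (.624) vs. 2019 White Sox (.447)
print(round(head_to_head_prob(0.624, 0.447), 3))  # → 0.672
```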
As most Sox fans probably know, the actual results of their head-to-head matchup were not nearly so good for Chicago, with the Twins taking 13 of 16. But this is irrelevant for our example, and in the book Peta claims that when looking at major league games from 2006 to 2010 this approach lines up remarkably well with actual outcomes. My spot check of more recent seasons seems to give credence to this, though a more thorough investigation is left as an exercise for the reader.
With this as the basis for the system, the final probabilities are computed by first accounting for each team’s starting pitcher. Peta uses the advanced metric “wins above replacement,” or WAR, for this. The key insight is that while a team may play .400 ball over the course of the season, if it has one great pitcher it may be closer to a .500 or even a .600 team on the days when he starts. To make the appropriate winning percentage adjustment, we add the pitcher’s effect as if he were pitching every day. Most teams use a 5-man rotation, so the starter’s WAR value is multiplied by 5 and added to the team’s overall win total.
For our example let’s assume a particularly beneficial matchup for the White Sox. Let’s say the Sox starter is Lucas Giolito, who racked up an impressive 5.8 WAR in 2019, while the Twins are sending Martin Perez to the mound, who totaled 0.0 WAR. Zero makes the Twins’ adjustment easy. For the White Sox we take Giolito’s 5.8 WAR and multiply by 5 to get 29 more wins added to the Sox total, giving them a winning percentage of .623. We now have a new set of probabilities:

Twins win, White Sox lose: .624 × .377 = .235
Twins lose, White Sox win: .376 × .623 = .234
Which when normalized becomes:

Twins: .235 / .469 = .501
White Sox: .234 / .469 = .499
If this seems like an unreasonable shift, keep in mind how great a year Lucas Giolito had, and how great a team would be if it had 5 starters who all had years like that.
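A quick sketch of the pitching adjustment (I use a nominal 162-game denominator here, which reproduces the .623 from the example; dividing by the 161 games the Sox actually played shifts the third decimal):

```python
def war_adjusted_wp(wins: float, games: float, starter_war: float,
                    rotation_size: int = 5) -> float:
    """Winning percentage as if today's starter pitched every game:
    add WAR x rotation_size to the team's season win total."""
    return (wins + starter_war * rotation_size) / games

# White Sox: 72 wins, plus Giolito's 5.8 WAR x 5 = 29 extra wins
print(round(war_adjusted_wp(72, 162, 5.8), 3))  # → 0.623
```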
Peta also makes WAR-based adjustments for bullpen strength and lineup changes due to injury or resting a player. I will not go through these because 1) the process is more or less the same as for starting pitchers, and 2) I am not currently using either of these in my model.
Finally, we adjust the winning percentages to account for home-field advantage, which has remained surprisingly consistent in major league baseball for over 100 years (https://www.theringer.com/2020/9/30/21494861/playoffs-home-field-advantage-wild-card). Each probability is adjusted to account for the historic .540 winning percentage of home teams. Let’s say the Twins are in Chicago. Adjusting for this 8% difference in home vs. away, we now have:

White Sox (home): .539
Twins (away): .461
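A multiplicative version of the home-field adjustment (scaling by .540/.460 and renormalizing; this specific formula is my assumption, not something spelled out in Trading Bases) looks like:

```python
def home_field_adjust(p_home: float) -> float:
    """Scale the home team's matchup probability by the historic .540
    home winning percentage (and the road team's by .460), then
    renormalize so the two probabilities sum to 1. The exact form of
    this adjustment is an assumption, not taken from the book."""
    home = p_home * (0.540 / 0.500)        # boost the home team
    away = (1 - p_home) * (0.460 / 0.500)  # penalize the road team
    return home / (home + away)

# White Sox at home, starting from their ~.499 pitching-adjusted probability
print(round(home_field_adjust(0.499), 3))  # → 0.539
```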
All that’s left now is to convert those adjusted probabilities into odds, and compare to what’s available at your favorite sportsbooks. If you can find value, you place the bet!
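Converting a win probability into a fair (no-vig) American money line for comparison against the book can be sketched as:

```python
def prob_to_american(p: float) -> int:
    """Fair American money line implied by win probability p."""
    if p >= 0.5:
        return round(-100 * p / (1 - p))  # favorite: negative line
    return round(100 * (1 - p) / p)       # underdog: positive line

print(prob_to_american(0.539))  # → -117
print(prob_to_american(0.461))  # → 117
```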
My System V1
I’m only going to touch on this quickly, mostly for the sake of completeness, but also because some of the decisions I made on what statistics to use, and the sources for those stats, ended up being the same decisions I made for the deep learning version I’ll describe in the next section.
In early 2020, after reading Trading Bases, I implemented the strategy described above using Excel. I took advantage of historic odds from https://sportsbookreviewsonline.com (every few months I try to go to the site and click on a few ads as payback for providing this odds archive) and tried to back test the strategy for 2019. One of the pitfalls of back testing, of course, is accidentally “peering into the future”. In order to replicate what it would have been like to use this in real time, I tried as much as possible to use only information available at the time. This raises the question: how do you know how teams and players will do before they play the games? My solution then (and now) is to leverage PECOTA projections for players and teams from https://www.baseballprospectus.com. PECOTA standings are available every day for free on the website, and PECOTA player projections are available as part of their annual membership plan.
My back testing with flat bets ($100 for underdogs, odds for favorites, no adjustment for certainty) ended up being…inconclusive. One interesting observation was that the system seemed to do better earlier in the season. Of course, the results were all within two standard deviations of break-even, so I can’t say much. The primary reason I abandoned it was that it simply took too long to run the analysis for any individual game, and there didn’t appear to be enough there to get me to automate it at the time.
My System V2
Around the beginning of 2021 I started becoming more interested in machine learning, wholly independent of any wagering-related applications. To get familiar with the various techniques I needed a practical application to focus my efforts on. So I revived my baseball betting system, conceptually at least, and began trying various methods, such as logistic regression, random forests, and SVMs (and I am still going back to re-apply these and others as I become more familiar with both the tech and the nuances of money line betting), but I ended up focusing on non-convolutional artificial neural networks.
Conceptualizing The System
The first step to creating V2 was deciding what the inputs to the ANN were going to be. This is where my deep learning version and the “straight calculation” version described above share a fundamental concept, that being:
Team winning percentage, the day’s starting pitchers, and who has home field is enough to give us an edge
Keeping the model this simple has some obvious downsides, but my inclination has been to keep things as simple as possible and only add complexity when it becomes clearly necessary. Part of the magic of ANNs, in theory, is that they can figure out which inputs are important and which are not, but I felt it was more important to guard against overfitting and be sure my ANN could only take into account the variables I thought were conceptually important.
Two of the three inputs are straightforward but the third, starting pitcher, required some thought and a little bit of experimentation. Joe Peta uses wins above replacement (WAR) for his metric of player “goodness”, and initially I did as well. The problem with WAR is that it is a cumulative statistic, and I’m trying to predict the outcome of a single game. A star who is injured for half the season but returns fully healthy or a rookie phenom who is called up mid-season could have the same WAR by the end of the year as some serviceable fourth starter who plays the whole year. Just using WAR would give no advantage to a team sending the star or phenom to the mound versus the team sending their pedestrian fourth rotation man. This seemed obviously wrong.
A better statistic, and indeed one designed to capture something much closer to what I’m interested in (how much will this pitcher add to expected win probability today), is DRA. A full description of DRA can be found at Baseball Prospectus [ https://www.baseballprospectus.com/news/article/48108/dra-and-dra-a-starter-guide/ ], but in short, DRA is designed to give an accurate description of a pitcher’s effect on run prevention, accounting for contextual factors like opposing hitters, park, pitch framing effects, etc., some of which should already be captured in team winning percentage. For my model I actually use DRA-, which is simply DRA indexed to the league average. This normalization should better allow application across years as league-wide DRA fluctuates.
With my fundamentals in place, it was time to train my ANN!
Training the ANN
I trained the neural net on 2019 data. My data source for historic odds was the SportsbookReviewsOnline.com odds archive [ https://sportsbookreviewsonline.com/scoresoddsarchives/scoresoddsarchives.htm ], and BaseballProspectus.com supplied DRA- for pitchers. Team winning percentages are available in many places, but my go-to source for stats like this is baseball-reference.com. These data sources are processed into a single spreadsheet using Excel and VBA. Each game is represented by two lines in the spreadsheet (home and away), so to avoid the obvious dependence between a data point and its corresponding point from the same game, I take a random line from each game to build my training dataset. A sample from the spreadsheet can be found at [ http://dklo.com/deeplearning_mlb_moneyline/ann_input_example.csv ]
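The “one random line per game” step can be done in pandas. This is an illustrative sketch with made-up column names (a game_id column linking the paired rows is assumed; it is not necessarily what my spreadsheet actually uses):

```python
import pandas as pd

# Two rows per game (home and away); keep one at random so the paired,
# perfectly anti-correlated rows don't both end up in the training set.
df = pd.DataFrame({
    "game_id":  [1, 1, 2, 2],
    "is_home":  [1, 0, 1, 0],
    "wp_diff":  [0.05, -0.05, -0.10, 0.10],
    "dra_diff": [-3.0, 3.0, 8.0, -8.0],
    "won":      [1, 0, 0, 1],
})

# Shuffle the rows, then keep the first row seen for each game
train = df.sample(frac=1, random_state=0).drop_duplicates("game_id")
print(len(train))  # → 2 (one row per game)
```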
I’ve mostly moved over to Python for both ML and data processing, but because of the V1 work that was already done it was easier to just use Excel to build this training set, and then import into Python. Everything described from this point forward is work done using Python and freely available Python libraries.
In training the ANN I chose to use season totals for teams and pitchers, not any sort of rolling average for either. Part of this was practical (the work involved in building a day-by-day database of a past MLB season would have been substantial and costly), but the decision is also based on my belief that the best assessment of a team or player’s ability is the entirety of the major league season, not a subset of games. Much work has been done to test whether “streakiness” is real, and very little evidence exists to support it. Given this, I felt it was more appropriate to use season totals for teams and pitchers when training the neural net. Of course this doesn’t account for injuries or other short-term effects that might impact performance. I’ve accepted this, and in future work I hope to account for such things directly. But it is my belief that this downside isn’t enough to justify using a set of games smaller than the whole season to judge team and player performance.
With the data processed, building the ANN is fairly simple. The ANN I am currently using takes as its input a feature-scaled vector of [home/away, winning percentage difference, DRA- difference]. This is fed into an ANN with 3 hidden layers, each with 10 nodes, all using the sigmoid activation function. In experimenting with various activation functions, using sigmoid at all levels seemed to perform better than ReLU. Binary cross-entropy was used as the loss function. The code to build and train this ANN is below.
[ https://gist.github.com/daveklotz/212cef18a03a34fcfe584ed9d15ff7c2 ]
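For a framework-free view of what that code is doing, the forward pass of the network described (3-feature input, three hidden layers of 10 sigmoid units, sigmoid output) looks like the sketch below. The weights here are random placeholders; the real values come from training:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
sizes = [3, 10, 10, 10, 1]  # [home/away, wp diff, DRA- diff] -> P(win)
weights = [rng.normal(0.0, 0.5, size=(m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def predict(x):
    """Forward pass with sigmoid at every layer, per the text."""
    a = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)
    return float(a[0])

# Untrained, so the output is just *a* probability, not a useful one
print(0.0 < predict([1.0, 0.0, 0.0]) < 1.0)  # → True
```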
The full training, which you can play with yourself using the input sample file above, can be found at:
[ https://gist.github.com/daveklotz/a798c85edb64352d104921a1521c3238 ]
One very quick, very crude check for fitness I run on the ANN is to see what its prediction is for a situation in which the only difference is home field. As has been mentioned, the winning percentage for home teams has stayed surprisingly consistent in major league baseball around .540. If my input vector [1, 0, 0] doesn’t produce a predicted probability of about 54%, and [0, 0, 0] doesn’t produce close to 46%, I immediately throw that ANN out. This particular ANN produced 54.3% and 46.3%.
Now that the ANN has been built, the next step is to run it!
Running the System
Due to all the weirdness around the Covid-shortened 2020 season, I have chosen to completely disregard it. My back testing was therefore limited to the current season (2021) and the two seasons preceding 2019, my training year (i.e., 2018 and 2017). I will start by describing my approach to applying the ANN to today’s games, and then go through the differences when applied to back testing.
The 2021 MLB season started on April 1st. Since my system relies on team and player performance, an obvious question is where I get this on day 1 (and 2, and 3…) of the season. The answer is, mostly, Baseball Prospectus. Every year BP releases its PECOTA projections for players and teams. At the beginning I use the 100% PECOTA projections for pitchers and the PECOTA standings for winning percentage. As the season progresses the published PECOTA standings are updated to account for in-season results. Player projections are not, so every few weeks I create a new DRA- spreadsheet by combining the projections with current season stats, weighting by projected games and actual games played. These projections and weighted projections provide the input to the PREDICT method of the ANN.
As an aside, Baseball Prospectus says this approach to weighted averaging is exactly what NOT to do, and has provided its own approach and weighted updates [ https://www.baseballprospectus.com/news/article/67888/prospectus-feature-information-based-updates-to-projections/ ]. I have only recently become aware of this and have not had time to incorporate this information into my system. At least they do say that weighting by IPs or PAs is “directionally correct”.
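My games-weighted blend is roughly the following (function and argument names are illustrative; as noted, BP considers this kind of playing-time weighting only “directionally correct”):

```python
def blended_dra_minus(proj_dra: float, proj_games: float,
                      actual_dra: float, actual_games: float) -> float:
    """Blend preseason PECOTA DRA- with in-season DRA-, weighting each
    by its games. A simple playing-time weight, not BP's own update."""
    total = proj_games + actual_games
    return (proj_dra * proj_games + actual_dra * actual_games) / total

# e.g. a 95 DRA- projection over 30 projected starts, 110 actual over 10
print(round(blended_dra_minus(95, 30, 110, 10), 2))  # → 98.75
```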
Having bailed on the first version of my system mainly because of the workload involved in running it, I knew it was going to be important to remove as much of the manual process as possible. So my first step after training the ANN was to create a Jupyter notebook that applies it to the day’s games with little to no input from me.
At this point I’m still trying to leverage free resources as much as possible, but good, free sports data is difficult to come by. Thankfully MLB (as of this writing) still provides a free API, which can be accessed through a couple of Python libraries. I used the mlbgame library [ https://github.com/panzarino/mlbgame ], which provides a Python interface to the MLB API. (I am slowly transitioning to MLB-StatsAPI [ https://github.com/toddrob99/MLB-StatsAPI ] because it appears to be better maintained.) This allowed me to get both the day’s games and the scheduled starting pitchers. Live money lines were acquired through The Odds API [ https://the-odds-api.com ], which is a paid service but offers a free tier of up to 500 requests per month. This was more than enough for my needs.
Between these two data sources I was able to retrieve the live information I needed to run the system daily. The code to produce a daily report is shown below, but the process is as follows:
- Get the day’s money lines from The Odds API
- Get the day’s games and probable starters from MLB using mlbgame
- Build a Pandas dataframe with a row for each team playing by:
  a. Looking up the team’s and opponent’s winning percentages from a hand-built spreadsheet
  b. Looking up the team’s and opponent’s probable starters’ DRA- from either PECOTA projections or a merged spreadsheet of projections and current season DRA-
  c. Feeding the winning percentage difference and DRA- difference, along with home/away, into the ANN to get the predicted win probability
  d. Comparing the predicted probability with the implied probability of the money line to decide whether to bet or not
  e. Doing a little bit more to make visual inspection easier, like adding columns to explicitly show when the difference in (d) is greater than 0% or 5%, how much the stake is, etc.
- Export the dataframe to an Excel spreadsheet for easier reading
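The money line comparison at the heart of the process amounts to the following sketch (function names are illustrative; the 5% threshold mirrors the report columns mentioned above):

```python
def implied_prob(american_odds: int) -> float:
    """Win probability implied by an American money line (vig included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def should_bet(model_prob: float, american_odds: int,
               min_edge: float = 0.05) -> bool:
    """Bet only when the model's probability beats the line's implied
    probability by at least min_edge."""
    return model_prob - implied_prob(american_odds) >= min_edge

print(round(implied_prob(-117), 3))  # → 0.539
print(should_bet(0.60, -117))        # edge of about 6% → True
```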
One thing to note here is each team-opponent combo is treated as an individual event. In theory the ANN could predict that a bet should be placed on BOTH teams, which is obviously wrong (the system is not designed to find arbitrage or middling opportunities and any that did show up would be purely by accident), but in practice this has rarely happened.
The main loop to generate this report is shown in the following Gist:
[ https://gist.github.com/daveklotz/83f0da28919fcd383d81707255544225 ]
The notebook to back test previous seasons is fundamentally the same, with some additional calculations to make the results easier to see. Flat betting results are tallied for several different confidence levels, as well as Kelly Criterion stakes, assuming a $1000 bankroll. $1000 is somewhat arbitrary, but was chosen as a number close to the amount wagered per day under the flat betting strategies.
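The Kelly stake for a single money line bet is the standard f = p − (1 − p)/b, with b the net decimal payout per unit staked; a sketch using the $1000 bankroll from the text:

```python
def kelly_stake(p: float, american_odds: int,
                bankroll: float = 1000.0) -> float:
    """Kelly criterion stake for win probability p at the given money
    line. b is the net payout per unit staked; stake 0 with no edge."""
    b = (100 / -american_odds) if american_odds < 0 else (american_odds / 100)
    f = p - (1 - p) / b
    return max(0.0, f * bankroll)

# 55% model probability on a +110 underdog
print(round(kelly_stake(0.55, 110), 2))  # → 140.91
```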
I also compute something I’ve called, for lack of a better phrase, “market support”. This is set to true whenever the difference between my prediction and the implied probability of the closing line is smaller than the difference between my prediction and the implied probability of the opening line, and false otherwise. In short, if the line moves toward my prediction rather than away from it, I consider that a signal of support.
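Concretely, using the implied probabilities of the opening and closing lines, the check is:

```python
def market_support(pred: float, open_implied: float,
                   close_implied: float) -> bool:
    """True when the closing line sits closer to the model's prediction
    than the opening line did, i.e. the market moved toward the model."""
    return abs(pred - close_implied) < abs(pred - open_implied)

print(market_support(0.60, 0.52, 0.55))  # line moved toward .60 → True
```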
Evaluating the System
In back testing the 2021 season I have been able to leverage the live information that I’ve collected for daily running. As mentioned previously I periodically update team winning percentages using the latest Baseball Prospectus PECOTA standings, and pitcher DRA- by a weighted merge of PECOTA predictions and the latest stats from the season. For previous seasons I don’t have access to such fine-grained updates. I have chosen to compromise by breaking the season up into three parts, using the following information for each:
Opening day through May 31st: Use only PECOTA projections
June 1st to July 31st: Use an average of PECOTA standings and final standings, and a weighted average of PECOTA and final season stats
August 1st to the end of the regular season: Final standings and stats
This of course falls into the trap of “peering into the future”, at least after the first two months. It also fails to take advantage of timely data I would have access to in real time. I don’t have a good solution to this problem right now.
The back testing results are provided here mostly for completeness, and because it would raise an immediate question if they were not. The results themselves are mixed, though for all three seasons tested (2021 up to July 27th, 2018, and 2017) the model produces a positive ROI both with flat bets and using Kelly staking as outlined above. My “market support” metric is less rosy, coming in under 50% in two of the three years, though it is over 50% for 2021.
The tables below show results from 2021 (up to July 27th), 2018, and 2017. The first four lines show success for bets with increasing differences between the predicted probability and the implied probability from closing lines. The fifth line shows results for Kelly staking as described above. My “market support” statistic is in its own table below the main results for each year.
[ https://gist.github.com/daveklotz/604742e1f400afb50140ea9cd5518382 ]
Summary and Future Work
While the success of my system might be mixed with respect to actually winning bets, it has been successful as a tool for becoming more familiar with both deep learning and sports betting concepts. Having said that, I am still just scratching the surface of both subjects, and there are many concepts I still plan to apply to the domain of baseball money line betting.
One area where my system is lacking is data. At a minimum, accounting for single-game lineup changes would seem to be imperative. Holding a highly productive player out of a lineup may not affect the overall winning percentage, but it clearly has an impact on the day’s probabilities. Going even further, adapting the system to account for full lineups, rather than using team winning percentage as a stand-in, seems worthy of investigation. These, as well as other approaches, should fill up the 2021 off-season and hopefully get me ready to make 2022 a profitable wagering year.
Contact David at:
Email: klotz (at) pobox (dot) com