Analytics, Basic Stats, and Recent Historic Context: The ABC’s of Penn State Football — Forecasting the Nittany Lions’ 2021 Season

Introduction

Do you hate math? Do you also hate crude attempts to over-quantify something with tremendous uncertainty? Well, you’re in the wrong place then. But, since you’re already here, might as well stick around and see what my computer says is going to happen in each Penn State game during the 2021 season.

Last year, I developed a system (WAR) that allows us to estimate a team’s offensive, defensive, and overall efficiencies based on previous game-by-game performances. For the most part, it’s a useful retrospective tool that can also effectively predict the outcome of future games. In some unpublished work, I used this method to predict the entire 2019-2020 bowl calendar. Out of 39 games, my system correctly picked 31 straight-up winners (79%), 25 winners against-the-spread (64%), and 20 O/U (51%). Yes, it was a really small sample size, but that trial run showed there’s some potential in this system as a prognosticating tool.

Now, the obvious benefit of using this system during bowl season is that you have at least 12 games’ worth of recent data. In this exercise – forecasting each 2021 Penn State game – all our data is pretty dusty. Except for injuries, opt-outs, or suspensions, the bowl team is THE team. In August, the team is some amount of last year’s squad and coaching staff blended with new recruits, new transfers, and new schemes. Throw in COVID-related schedule and roster variances throughout college football from 2020, and this exercise gets even more difficult.

So, before we start, an ask of you, dear reader: don’t nitpick this. Teams go up and down every year. Some meet expectations, some exceed expectations, and some are Michigan (i.e., constantly underperforming). The dog days of August aren’t the best time to try and predict what will happen in November or December, but we’re doing it anyway. So don’t bust open your piggybanks just yet. This is nothing more than some fodder as we sweat through summer and await the fall.

The Methodology

The foundation for WAR is Offensive Efficiency (OE) and Defensive Efficiency (DE). The equation for each metric is pretty much the same and involves a combination of two calculations: scoring efficiency, (Points / min-TOP), and ball control, [(Yards / Possession * min-TOP)^0.5]. For OE, we’re measuring points scored and yards gained; for DE, it’s points allowed and yards allowed.

Using stats, we can easily calculate OE or DE during a season and after a season, but how do we estimate how a team will do going into the season? Well, this is where it gets a bit complicated. In the past, I would’ve simply taken a weighted average of the last 3 seasons, giving the most weight to the most recent season, discounting the season before that by 50%, and discounting the third season by 75%. Then I would’ve made an adjustment for returning production (using Bill Connelly’s estimate) and for overall talent change (using the 247 talent composite versus the previous seasons).

But the weighting doesn’t feel right this year. Should 2020 be given the same weight as a “normal” season? I don’t think so. Look at the cliff that PSU fell off last year, for example, versus the previous four years. Then, there were teams like Coastal Carolina and Liberty that finished in the Top 20. So they boomed in 2020. Therefore, going into 2021, I am weighting 2020 equally with 2019 (0.5 units each) and giving 2018 half of that weight (0.25 units). Also, as Bill C. pointed out recently, returning production is kind of bunk this year, so I’m leaving that out of the calculation. And there’s no updated 2021 talent composite, so that too is out. These changes allow us to calculate 2021 OE or DE as:

2021-OE = (0.5*2020-OE + 0.5*2019-OE + 0.25*2018-OE) / 1.25

To keep the year-over-year variability in the picture, though, I am also including each team’s standard deviations of OE and DE across 2018-2020 in the score calculations. This accounts for teams with large swings in efficiency and keeps the range tight for teams that are more consistent (Iowa’s defense, for example).
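For anyone who wants to follow along at home, here is a minimal Python sketch of the weighted blend and the 2018-2020 spread described above. The function names and the efficiency values are made up for illustration, and I’m assuming a sample standard deviation here; a population standard deviation would give slightly smaller numbers.

```python
import statistics

def blend_efficiency(eff_2020, eff_2019, eff_2018, weights=(0.5, 0.5, 0.25)):
    """Weighted average of three seasons of OE (or DE), newest season first."""
    w20, w19, w18 = weights
    total_weight = w20 + w19 + w18  # 1.25 with the default weights
    return (w20 * eff_2020 + w19 * eff_2019 + w18 * eff_2018) / total_weight

def efficiency_spread(eff_2020, eff_2019, eff_2018):
    """Sample standard deviation of the three seasons (assumed, see above)."""
    return statistics.stdev([eff_2020, eff_2019, eff_2018])

# Hypothetical efficiencies, not real team data.
blended = blend_efficiency(30.0, 40.0, 35.0)
spread = efficiency_spread(30.0, 40.0, 35.0)
print(blended, spread)
```

The same two helpers work for DE; just feed in the defensive efficiencies instead.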

Now we’re ready to set up a game prediction – Team 1 and Team 2. To do this, we first estimate the OE range for each team and generate an expected minimum and maximum. This is done with the following equations:

Team 1 OE-minimum = 10 * (Team 1 2021-OE + Team 2 2021-DE) / 2 - [2 * (Team 1 2021-OE-STDEV + Team 2 2021-DE-STDEV) / 2]

Team 1 OE-maximum = 10 * (Team 1 2021-OE + Team 2 2021-DE) / 2 + [2 * (Team 1 2021-OE-STDEV + Team 2 2021-DE-STDEV) / 2]

To summarize, the range is the average of Team 1’s expected offense and Team 2’s expected defense, plus and minus two times the average of their standard deviations. The min and max are multiplied by 10 to allow for an improved distribution in the simulation in the next step. This process is repeated for Team 2.
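Here is a rough Python sketch of that range calculation. One caveat: I’m following the summary sentence and applying the factor of 10 to the whole minimum and maximum, which is an assumption, since the equations as typed could also be read as scaling only the first term. The function name and the inputs are hypothetical.

```python
def oe_range(off_mean, def_mean, off_std, def_std, scale=10.0, n_std=2.0):
    """Return the scaled (minimum, maximum) OE range for one team's offense."""
    center = (off_mean + def_mean) / 2        # offense vs. opposing defense
    spread = n_std * (off_std + def_std) / 2  # two times the average std dev
    # Assumption: the x10 scaling applies to the full min and max.
    return scale * (center - spread), scale * (center + spread)

# Hypothetical inputs: Team 1 offense (35, std 5) vs. Team 2 defense (25, std 3).
low, high = oe_range(35.0, 25.0, 5.0, 3.0)
print(low, high)
```

Swap the arguments (Team 2’s offense against Team 1’s defense) to get the second team’s range.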

Each game is simulated with a Monte Carlo approach, using 5,000 trials per game. Monte Carlo is not the most sophisticated way to run this prognostication, but it is a typical, basic method for game simulation. In each trial, a random number within the team’s OE range from above is generated and converted to a score using the following equation:

SCORE = 1.9438 * (Team-OE / 10) ^ 0.7916

This relationship of OE to points has an R-squared value of 0.97 across the previous 12 seasons of FBS football, indicating that there’s a strong correlation.
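As a quick illustration, the conversion can be written as a one-line Python function. The name oe_to_score is my own, and the clamp at zero is an added assumption: a very wide range could produce a negative draw, and a negative number raised to a fractional power is not a usable score.

```python
def oe_to_score(scaled_oe):
    """Convert a scaled OE draw (the x10 value) into points."""
    # Assumption: negative draws are treated as zero efficiency.
    efficiency = max(scaled_oe, 0.0) / 10
    return 1.9438 * efficiency ** 0.7916

# A draw of 300 corresponds to an unscaled OE of 30.
print(round(oe_to_score(300.0), 1))
```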

Home teams are given a 3-point home-field advantage, added to their score in each simulation. Others have done research to produce customized home-field advantages. For instance, the home-field advantage of a Beaver Stadium Whiteout and a Noon Pitt game in front of friends and family at Heinz Field shouldn’t both be 3 points. But, for 2021, considering the uncertainty of COVID restrictions still looming, we’ll stick with the 3-point standard.

Following the 5,000 simulations, we charted the average score for each team, the probability of each team winning the game, and the margin of victory.
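Putting the pieces together, a bare-bones version of the simulation might look like the Python sketch below. Several details are my assumptions rather than anything spelled out above: the random draw is uniform within each range, scores are rounded to whole points (which is what makes ties possible), and the ranges in the example are invented.

```python
import random

def oe_to_score(scaled_oe):
    """Convert a scaled OE draw into points (negative draws clamp to zero)."""
    return 1.9438 * (max(scaled_oe, 0.0) / 10) ** 0.7916

def simulate_game(home_range, away_range, trials=5000, home_edge=3.0, seed=None):
    """Monte Carlo simulation of one game from two scaled (min, max) OE ranges."""
    rng = random.Random(seed)
    home_wins = away_wins = ties = 0
    home_total = away_total = 0
    for _ in range(trials):
        # Assumption: uniform draw inside each range, rounded to whole points.
        home_pts = round(oe_to_score(rng.uniform(*home_range)) + home_edge)
        away_pts = round(oe_to_score(rng.uniform(*away_range)))
        home_total += home_pts
        away_total += away_pts
        if home_pts > away_pts:
            home_wins += 1
        elif away_pts > home_pts:
            away_wins += 1
        else:
            ties += 1
    return {
        "home_win": home_wins / trials,
        "away_win": away_wins / trials,
        "tie": ties / trials,
        "home_avg": home_total / trials,
        "away_avg": away_total / trials,
    }

# Hypothetical ranges, not real team data.
result = simulate_game((220.0, 380.0), (150.0, 300.0), seed=2021)
print(result)
```

Passing a seed makes a run repeatable. Tracking the margin in each trial (home_pts minus away_pts) would be the natural next step for the win-by-how-much breakdown.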

Got it? If you don’t, just lie and say you do so we can move to the fun stuff.

2021 Predictions

We broke our 2021 Penn State football forecast into two tables – 1st and 2nd halves of the season.

Games 1-6

 

Games 7-12

Each column lists PSU’s and the opponent’s win probability by 10-plus points, 5-9 points, or 1-5 points, and then each team’s overall win probability. Note that the overall win percentages don’t typically add up to 100% because ties are possible in these simulations. Finally, we listed the average predicted score and predicted outcome.

So, what do we see? Using only the average scores, Penn State is favored in 9 games. The Nittany Lions are very close underdogs against Wisconsin in the season opener and slightly deeper dogs against Iowa. Not surprisingly, OSU is a heavy favorite vs. Penn State.

Assessing Penn State’s record on a win-% basis, the case is a bit worse with an expectation of 7.6 wins and 3.9 losses. Still though, this includes 75% chances to beat Ball State and Auburn and 65% to topple Michigan. And on the flip side, the Lions are predicted to beat Wisconsin 1 out of 3 times, Iowa 1 of 5 times, and OSU 1 out of 10 times.
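For the curious, the expected-wins figure is just the sum of the per-game win probabilities, and expected losses work the same way with the opponents’ win probabilities. The Python sketch below adds up only the six probabilities quoted in this section (treating “1 out of 3” and friends as approximate), so it is a partial sum rather than the full 7.6. The last line assumes a 12-game schedule, in which case whatever is left after wins and losses is the share of simulations that ended in ties.

```python
def expected_wins(win_probs):
    """Expected number of wins is the sum of per-game win probabilities."""
    return sum(win_probs)

# Only the six probabilities quoted in the text; the other six games are
# not listed here, so this is a partial total.
quoted_win_probs = {
    "Ball State": 0.75,
    "Auburn": 0.75,
    "Michigan": 0.65,
    "Wisconsin": 1 / 3,
    "Iowa": 1 / 5,
    "Ohio State": 1 / 10,
}

partial_wins = expected_wins(quoted_win_probs.values())

# With 12 games, 7.6 expected wins, and 3.9 expected losses,
# the remainder is the expected number of tied simulations.
expected_ties = 12 - 7.6 - 3.9

print(round(partial_wins, 2), round(expected_ties, 1))
```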

Again, all these numbers are based purely on the calculations above and have no SGF (Strong Gut Feeling) added. My gut tells me Penn State beats the Badgers in Week 1 and starts the season 5-0 before a tough test in Iowa City – a tough test that James Franklin, Sean Clifford and Noah Cain all passed in 2019.

The season is almost here and it will be a pivotal one for Penn State. Will they come back to 2016-2019 levels or was 2020 a harbinger of a new program trajectory? I firmly believe it’s the former and that this team will go at least 10-2. What say you?