IndyRealist

12-06-2017, 11:37 PM

Real plus-minus (RPM) was developed by Jeremias Engelmann, formerly of the Phoenix Suns, in consultation with Steve Ilardi, University of Kansas psychology professor and former NBA consultant.

It follows the development of adjusted plus-minus (APM) by several analysts and regularized adjusted plus-minus (RAPM) by Joe Sill.

RPM reflects enhancements to RAPM by Engelmann, among them the use of Bayesian priors, aging curves, score of the game and extensive out-of-sample testing to improve RPM's predictive accuracy.

This guide is going to be broken into two posts. The first will be some brief history on the statistic, and what it’s trying to accomplish. The second will be issues with RPM, and how we use it wrong. This is not meant to be exhaustive, or to detail how to actually calculate RPM, but rather to give everyone a good idea of what the stat is and what it can (and can’t) do. Almost all the math will be skipped, because no one really cares about that. There will be a link at the bottom if anyone really wants it.

1. What is RPM

RPM stands for Real Plus-Minus, which is really just marketing talk. It is a derivative of xRAPM, or Expected Regularized Adjusted Plus-Minus, which was created by Engelmann. xRAPM is, as you would expect, itself a derivative of RAPM, which is a derivative of APM, which was an attempt to fix the OG plus-minus.

PM in its most basic form is how much the score margin changes while a player is on the floor (his team’s points minus the opponent’s points). This can be looked at for individuals, combinations of multiple players, or entire lineups. The issue with PM at an individual level is noise. There are 9 other players on the floor at any given time (4 teammates and 5 opponents), all contributing varying amounts to the final score. How can you attribute that to one player? It becomes even more difficult when two or more players play so many of their minutes together that PM cannot tell them apart. This is called collinearity. The example I give is Kobe Bryant and Derek Fisher. Whenever Fisher was on the floor, Bryant almost always was as well. Essentially, PM said they were equally impactful, despite how absurd that statement is.
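A minimal sketch of raw plus-minus, using made-up stints (the lineups and margins below are hypothetical, not real Lakers data). When one player is almost never on the floor without the other, their PM totals come out nearly identical:

```python
# Hypothetical stints: (our players on the floor, point margin during the stint).
# Illustrative numbers only, not real Lakers data.
STINTS = [
    ({"Bryant", "Fisher"}, 8),
    ({"Bryant", "Fisher"}, 5),
    ({"Bryant", "Fisher"}, -2),
    ({"Bryant", "Fisher"}, 7),
    ({"Bryant"}, 3),  # the rare stretch with Bryant on the floor but not Fisher
]


def plus_minus(player, stints):
    """Raw plus-minus: total point margin over every stint the player was on the floor."""
    return sum(margin for lineup, margin in stints if player in lineup)


for name in ("Bryant", "Fisher"):
    print(name, plus_minus(name, STINTS))
```

Bryant comes out at 21 and Fisher at 18. Four of Bryant’s five stints are also Fisher stints, so PM has almost nothing to tell them apart with.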

APM uses linear regression over every lineup stint to estimate each player’s individual contribution while controlling for the other players on the floor; in effect, the minutes players spend apart are what let it separate them. Unfortunately, this does not resolve collinearity: the minutes Fisher and Bryant spent apart were so few that tiny samples swung the estimates wildly.
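A toy version of the APM idea, as a sketch: fit a possession-weighted linear regression where each stint’s margin per 100 possessions is modeled as the sum of the ratings of the players on the floor. Everything here is an assumption for illustration: only two players are modeled, everyone else is ignored, and the stint numbers are invented. The point is how much the answer depends on one 5-possession Fisher-only stint.

```python
# Each stint: (Bryant on floor?, Fisher on floor?, possessions, margin per 100 possessions).
# Hypothetical numbers: almost every possession has both players on the floor.
def apm_two_players(stints):
    """Weighted least squares for two players via the 2x2 normal equations.

    Solves (X^T W X) b = X^T W y, where each row of X is (x_bryant, x_fisher),
    W holds possession weights and y is the margin per 100 possessions.
    """
    a11 = a12 = a22 = c1 = c2 = 0.0
    for xb, xf, poss, margin in stints:
        a11 += poss * xb * xb
        a12 += poss * xb * xf
        a22 += poss * xf * xf
        c1 += poss * xb * margin
        c2 += poss * xf * margin
    det = a11 * a22 - a12 * a12
    bryant = (c1 * a22 - a12 * c2) / det
    fisher = (a11 * c2 - a12 * c1) / det
    return bryant, fisher


TOGETHER = (1, 1, 400, 6.0)
BRYANT_ONLY = (1, 0, 40, 4.0)

lucky = [TOGETHER, BRYANT_ONLY, (0, 1, 5, 20.0)]     # Fisher alone: +1 point in 5 possessions
unlucky = [TOGETHER, BRYANT_ONLY, (0, 1, 5, -20.0)]  # Fisher alone: -1 point in 5 possessions

print("lucky:   Bryant %+.2f, Fisher %+.2f" % apm_two_players(lucky))
print("unlucky: Bryant %+.2f, Fisher %+.2f" % apm_two_players(unlucky))
```

The lucky version rates Fisher +4.18 and Bryant +2.02; the unlucky version rates Bryant +6.42 and Fisher -0.66. A two-point difference over five possessions flips which player APM says is better, which is the collinearity problem in miniature.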

RAPM attempts to solve this problem by applying ridge regression, which pulls each player’s estimate toward a prior, or predetermined expectation, and pulls hardest on players with the least independent data. In the basic version the prior is 0 (a league-average player), but it can also be a player’s previous seasons. In that case, if a player rates higher or lower than previous seasons would indicate, RAPM assumes that is a fluke that will average out long term, and pulls the rating closer to previous seasons. This unfortunately has the effect of throwing out a lot of relevant data.
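Here is the same toy data run through ridge regression, as a sketch. The penalty strength `lam=100` is an arbitrary assumption; real RAPM implementations typically tune it with out-of-sample testing. The prior defaults to 0 for both players; swapping in previous-season ratings would just mean passing a different `prior`.

```python
# Each stint: (Bryant on floor?, Fisher on floor?, possessions, margin per 100 possessions).
def ridge_two_players(stints, lam, prior=(0.0, 0.0)):
    """Ridge regression for two players, shrinking both ratings toward `prior`.

    Minimizes sum(poss * (margin - xb*bryant - xf*fisher)**2)
    + lam * ((bryant - prior[0])**2 + (fisher - prior[1])**2),
    i.e. solves (X^T W X + lam*I) b = X^T W y + lam*prior.
    """
    a11, a12, a22 = lam, 0.0, lam
    c1, c2 = lam * prior[0], lam * prior[1]
    for xb, xf, poss, margin in stints:
        a11 += poss * xb * xb
        a12 += poss * xb * xf
        a22 += poss * xf * xf
        c1 += poss * xb * margin
        c2 += poss * xf * margin
    det = a11 * a22 - a12 * a12
    return (c1 * a22 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det


# Hypothetical stints, identical to the APM toy example.
TOGETHER = (1, 1, 400, 6.0)
BRYANT_ONLY = (1, 0, 40, 4.0)
lucky = [TOGETHER, BRYANT_ONLY, (0, 1, 5, 20.0)]
unlucky = [TOGETHER, BRYANT_ONLY, (0, 1, 5, -20.0)]

print("lucky:   Bryant %+.2f, Fisher %+.2f" % ridge_two_players(lucky, lam=100))
print("unlucky: Bryant %+.2f, Fisher %+.2f" % ridge_two_players(unlucky, lam=100))
```

The same two-point swing now moves Fisher from +2.89 to +1.93 instead of from +4.18 to -0.66, and with a prior of 0 both players land near +3. Ridge stops the wild swings by splitting the credit for the shared minutes rather than guessing who earned it.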

Consider a rookie. They haven’t played an 82-game season before, haven’t spent a lot of time in the weight room, are playing against grown men much bigger, stronger, and faster than any competition they’ve had before, and have to learn a whole new system. Rookies are usually bad. As a sophomore, that player is often substantially better, but RAPM discounts those improvements because it is skeptical of results that differ from the prior.

xRAPM (we’re almost there) is a more aggressive version of RAPM. Where basic RAPM uses a prior rating of 0 for everyone, xRAPM sets each player’s prior individually, reportedly using box-score information. Essentially, given the Bryant/Fisher example, if there is a large positive effect, xRAPM tends towards crediting the player we think is better (Bryant).
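A sketch of why a player-specific prior changes who gets credit. This is the same toy ridge regression with the same invented stints, but the priors (Bryant +5, Fisher -1) are hypothetical stand-ins for whatever information xRAPM actually uses; this is not Engelmann’s code or formula.

```python
# Each stint: (Bryant on floor?, Fisher on floor?, possessions, margin per 100 possessions).
def ridge_two_players(stints, lam, prior=(0.0, 0.0)):
    """Ridge regression for two players, shrinking ratings toward `prior`.

    Solves (X^T W X + lam*I) b = X^T W y + lam*prior for b = (bryant, fisher).
    """
    a11, a12, a22 = lam, 0.0, lam
    c1, c2 = lam * prior[0], lam * prior[1]
    for xb, xf, poss, margin in stints:
        a11 += poss * xb * xb
        a12 += poss * xb * xf
        a22 += poss * xf * xf
        c1 += poss * xb * margin
        c2 += poss * xf * margin
    det = a11 * a22 - a12 * a12
    return (c1 * a22 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det


# Hypothetical stints: mostly together, a little Bryant alone, 5 possessions of Fisher alone.
STINTS = [(1, 1, 400, 6.0), (1, 0, 40, 4.0), (0, 1, 5, 20.0)]

flat = ridge_two_players(STINTS, lam=100)                       # RAPM-style prior of 0
informed = ridge_two_players(STINTS, lam=100, prior=(5.0, -1.0))  # hypothetical priors

print("prior 0:        Bryant %+.2f, Fisher %+.2f" % flat)
print("informed prior: Bryant %+.2f, Fisher %+.2f" % informed)
```

With a prior of 0 the shared +6 is split roughly evenly (+2.60 and +2.89). With the informed priors nearly all of it goes to Bryant (+5.19 versus +0.64), because the data barely distinguishes the two and the prior breaks the tie.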

So how is RPM different? “RPM reflects enhancements to RAPM by Engelmann, among them the use of Bayesian priors, aging curves, score of the game and extensive out-of-sample testing to improve RPM's predictive accuracy.” What does that mean? I don’t know, because ESPN has not published how to calculate RPM. Presumably, aging curves account for things like the rookie/sophomore example above. “Bayesian priors” most likely refers to the prior the regression pulls toward, since ridge regression is equivalent to a Bayesian regression with a normal prior on each player’s rating. The rest is gibberish. If someone with more statistical background than me wants to take a crack at it, links are below:

https://cornerthreehoops.wordpress.com/2014/04/17/explaining-espns-real-plus-minus/

https://deadspin.com/just-what-the-hell-is-real-plus-minus-espns-new-nba-s-1560361469

https://www.poundingtherock.com/2014/4/8/5594238/problem-with-real-plus-minus

The math:

https://squared2020.com/2017/09/18/deep-dive-on-regularized-adjusted-plus-minus-i-introductory-example/
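For anyone who wants the one-line version of the ridge/Bayesian connection: ESPN hasn’t published RPM’s formula, so this is not RPM, just the standard identity that makes “Bayesian priors” a reasonable reading. Here y_i is a stint’s margin per 100 possessions, x_i marks which players were on the floor, w_i is the stint’s possessions, μ holds each player’s prior rating (0 for basic RAPM), and λ sets how hard ratings get pulled toward μ.

```latex
\hat{\beta} = \arg\min_{\beta} \sum_i w_i \left(y_i - x_i^\top \beta\right)^2 + \lambda \lVert \beta - \mu \rVert^2
            = \left(X^\top W X + \lambda I\right)^{-1} \left(X^\top W y + \lambda \mu\right)
```

If y_i ~ N(x_i^T β, σ²/w_i) and β ~ N(μ, τ² I), the most likely β given the data (the posterior mode) is exactly this estimate with λ = σ²/τ². A tighter prior (smaller τ²) means a bigger λ and more pull toward μ.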