Squaring Things Up with R²: What It Is and What It Can (and Cannot) Tell You

J Anal Toxicol. 2022 Apr 21;46(4):443-448. doi: 10.1093/jat/bkab036.

Abstract

The coefficient of correlation (r) and the coefficient of determination (R² or r²) have long been used in analytical chemistry, bioanalysis and forensic toxicology as figures demonstrating the linearity of calibration data in method validation. We clarify here what these two figures are and why they should not be used for this purpose in the context of model fitting for prediction. R² evaluates whether, and to what degree, the data are better explained by the regression model used than by no model at all (i.e., a flat line with slope 0 and intercept ȳ, the mean response). Hopefully, in the context of calibration curves, the fact that a linear regression explains the data better than no model at all should not be a point of contention. Upon closer examination, several restrictions appear in the interpretation of these coefficients. They cannot indicate whether the dataset at hand is linear, because they assume that the regression model used is an adequate model for the data. For the same reason, they cannot disprove the existence of another functional relationship in the data. By definition, they are influenced by the variability of the data. The slope of the calibration curve also changes their value. Finally, when heteroscedastic data are analyzed, the coefficients are influenced by the spacing of calibration levels within the dynamic range, unless a weighted version of the equations is used. With these considerations in mind, we suggest that r and R² no longer be used as figures of merit to demonstrate the linearity of calibration curves in method validation. Of course, this does not preclude their use in other contexts. Alternative approaches for evaluating linearity and calibration model validity are briefly presented.
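Several of these points can be checked numerically. The sketch below is not the authors' code. It assumes a straight-line model fitted by (optionally weighted) least squares and the common definition R² = 1 − SS_res/SS_tot, where SS_tot is the residual sum of squares of the "no model" flat line at the mean response. The function names (fit_line, residuals, r_squared) are hypothetical, and every dataset is synthetic and invented purely for illustration.

```python
# Illustrative sketch only (not the authors' code). Assumptions: a straight-line
# model y = a + b*x fitted by (optionally weighted) least squares, and
# R^2 = 1 - SS_res / SS_tot, where SS_tot is the residual sum of squares of the
# "no model" flat line y = mean(y). All data below are synthetic.

def fit_line(x, y, w=None):
    """Return intercept a and slope b of the (weighted) least-squares line."""
    w = [1.0] * len(x) if w is None else w
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ym - b * xm, b


def residuals(x, y, w=None):
    """Observed minus fitted responses for the (weighted) least-squares line."""
    a, b = fit_line(x, y, w)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def r_squared(x, y, w=None):
    """Compare the fitted line with a flat line at the (weighted) mean of y."""
    w = [1.0] * len(x) if w is None else w
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    ss_res = sum(wi * r ** 2 for wi, r in zip(w, residuals(x, y, w)))
    ss_tot = sum(wi * (yi - ym) ** 2 for wi, yi in zip(w, y))
    return 1.0 - ss_res / ss_tot


x = [float(i) for i in range(1, 11)]  # ten hypothetical calibration levels

# 1) Curved, noise-free data still give R^2 close to 1; only the residual
#    pattern (positive at both ends, negative in the middle) shows the curvature.
y_curved = [xi + 0.02 * xi ** 2 for xi in x]
r2_curved = r_squared(x, y_curved)
res_curved = residuals(x, y_curved)

# 2) Identical noise, different slopes: R^2 changes with the slope alone.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
r2_steep = r_squared(x, [1.0 * xi + e for xi, e in zip(x, noise)])
r2_shallow = r_squared(x, [0.2 * xi + e for xi, e in zip(x, noise)])

# 3) Same slope, three times the noise: R^2 drops with data variability.
r2_noisier = r_squared(x, [1.0 * xi + 3 * e for xi, e in zip(x, noise)])

print(f"curved data:         R^2 = {r2_curved:.4f}")
print("curved residuals:   ", [round(r, 3) for r in res_curved])
print(f"slope 1.0:           R^2 = {r2_steep:.4f}")
print(f"slope 0.2:           R^2 = {r2_shallow:.4f}")
print(f"slope 1.0, 3x noise: R^2 = {r2_noisier:.4f}")
```

With these inputs the curved data give R² of about 0.998 even though the residuals run positive, negative, positive, while the same noise yields R² of about 0.993 with slope 1.0 but only about 0.854 with slope 0.2. Passing a weight list w gives a weighted form of the same equations; which weights to use (for example 1/x² for roughly proportional error) is an assumption left to the analyst, not something this sketch takes from the article.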

MeSH terms

  • Calibration*
  • Linear Models