At Variance with Schmidt and Hunter (1998)
Part 1: Distinguishing utility gains from validity gains
When it comes to research findings, the devil is in the details. Here, I’d like to discuss Schmidt and Hunter (1998)[1], considered a classic research article in industrial-organizational (I-O) psychology by almost any standard (e.g., cited 6,816 times on Google Scholar at the time of writing). The authors bring together information reported across multiple meta-analyses to conclude that cognitive ability tests tend to demonstrate high levels of validity and utility when predicting employee performance. These levels are found to increase even further when cognitive ability tests are combined with other types of employment tests (e.g., work samples, tests of employee integrity, tests of conscientiousness).
The general conclusion of this article is hard to dispute, so long as the employment tests in question are conceptually and psychometrically well supported. However, I will put myself at variance with Schmidt and Hunter (hereafter S&H) in two ways. The first way is narrow and statistical; it is covered here (Part 1). The second way is much broader and will be covered in a forthcoming blog post (Part 2).
At Variance with the Variance
Appealing to fair use[2], let’s take a look at the often-referred-to Table 1 (S&H, p. 266):

[Table 1 of S&H (1998, p. 266), annotated, appears here.]

As my wonderful PowerPoint® flourishes on the table indicate, the second and third columns of numbers would be correct if these referred to utility, not validity. When estimating utility (e.g., predicting the dollar value of selecting employees), the word on the street since 1949 is that you should directly subtract the validity coefficients (correlations, multiple Rs) of each selection system from one another (Brogden, 1949, p. 183)[3]:
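In rough Brogden-Cronbach-Gleser form (the notation here is my own sketch: $\bar{z}_x$ is the mean standard score on the predictor among those selected, and $\sigma_y$ is the standard deviation of job performance in dollar terms; both are constant across the two systems being compared):

$$M_a - M_b = (R_a - R_b)\,\bar{z}_x\,\sigma_y$$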
The left side above is the change in value when moving from selection system b to selection system a. On the right side, the $(R_a - R_b)$ term is the difference in the systems’ respective validities (or multiple Rs). Thus, to obtain $R_{d(\text{utility})}$, the difference in utility that is due to validity, simply calculate:
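$$R_{d(\text{utility})} = R_a - R_b$$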
But in the context of validity, different math is needed, contrary to the footnotes in Tables 1 and 2 of S&H, both of which present the gain in validity as the simple difference between the two systems’ multiple Rs.
This is incorrect. If you want to express an increase in the validity of a selection system in terms of a multiple R, you cannot simply take the difference between the multiple Rs for each system, as we (correctly) did above for utility estimation. Instead, the $R^2$s for each system (the variance components) need to be subtracted from one another. You could then express the difference as an $R^2$ as well (similar to the $\Delta R^2$ in multiple linear regression), or take the square root to report the difference as a multiple R. The need to focus on variances in validity estimation is why I am ‘at variance’ with S&H. 😎
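$$R_{d(\text{validity})} = \sqrt{R_a^2 - R_b^2}$$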
Here $R_{d(\text{validity})}$ is the multiple R for the difference in validities of two selection systems a and b, with $R_a^2 > R_b^2$. Use of this formula results in validity gains that are much higher than those reported by S&H (again, S&H should be reporting their results as utility gains, not validity gains).
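To make the difference concrete, take the values S&H report for general mental ability (GMA) alone ($R_b = .51$) versus GMA plus an integrity test ($R_a = .65$). S&H report the gain as a simple difference, $.65 - .51 = .14$; expressed as a multiple R for the gain in validity, however, the figure is considerably larger:

$$R_{d(\text{validity})} = \sqrt{.65^2 - .51^2} = \sqrt{.4225 - .2601} = \sqrt{.1624} \approx .40$$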
Stay tuned for Part 2, the broader reason that I’m ‘at variance’ with S&H (insert suspenseful Halloween music here, if only because it might take me at least that long to post again…).
[1] Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274. https://doi.org/10.1037/0033-2909.124.2.262 [Issues raised here also apply to the unpublished working paper by Schmidt (2016).]
[2] https://www.copyright.gov/fair-use/
[3] Brogden, H. E. (1949). When testing pays off. Personnel Psychology, 2, 171–183. https://doi.org/10.1111/j.1744-6570.1949.tb01397.x