Physicians are frequently asked to make prognostic assessments but worry that these assessments will prove inaccurate. Prognostic systems were developed to enhance the accuracy of such assessments. This paper describes an approach for evaluating prognostic systems based on the accuracy (calibration and discrimination) and generalizability (reproducibility and transportability) of the system's predictions. Reproducibility is the ability to produce accurate predictions among patients not included in the development of the system but from the same population. Transportability is the ability to produce accurate predictions among patients drawn from a different but plausibly related population. Observing that the generalizability of a prognostic system is commonly limited to a single historical period, geographic location, methodologic approach, disease spectrum, or follow-up interval, we describe a working hierarchy of the cumulative generalizability of prognostic systems.
This approach is illustrated in a structured review of the Dukes and Jass staging systems for colon and rectal cancer and applied to a young man with colon cancer. Because it treats the development of the system as a “black box” and evaluates only the performance of the predictions, the approach can be applied to any system that generates predicted probabilities. Although the Dukes and Jass staging systems are discrete, the approach can also be applied to systems that generate continuous predictions and, with some modification, to systems that predict over multiple time periods. As with any scientific hypothesis, the generalizability of a prognostic system is established by testing it and finding it accurate across increasingly diverse settings. The more numerous and diverse the settings in which the system is tested and found accurate, the more likely it is to generalize to an untested setting.
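As an illustration of the "black box" evaluation described above, the following minimal Python sketch scores a set of predicted probabilities without reference to how they were produced. It is not the procedure used in the review. It assumes a binary outcome observed at a single fixed follow-up time with no censoring, and the function names `calibration_by_group` and `c_statistic` are hypothetical. Calibration is summarized as the observed event rate within each group of patients sharing a predicted probability (as a discrete staging system would produce), and discrimination as the c-statistic, the proportion of event/non-event patient pairs in which the patient with the event received the higher prediction.

```python
from collections import defaultdict
from itertools import product


def calibration_by_group(predicted, observed):
    """Group patients by predicted probability and return
    {predicted probability: (number of patients, observed event rate)}.

    A well-calibrated system has observed rates close to the predictions.
    """
    groups = defaultdict(list)
    for p, y in zip(predicted, observed):
        groups[p].append(y)
    return {p: (len(ys), sum(ys) / len(ys)) for p, ys in sorted(groups.items())}


def c_statistic(predicted, observed):
    """Proportion of (event, non-event) pairs in which the event patient
    has the higher predicted probability; tied predictions count as 0.5.

    0.5 indicates no discrimination, 1.0 perfect discrimination.
    """
    events = [p for p, y in zip(predicted, observed) if y == 1]
    nonevents = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event, p_nonevent in product(events, nonevents):
        if p_event > p_nonevent:
            score += 1.0
        elif p_event == p_nonevent:
            score += 0.5
    return score / (len(events) * len(nonevents))


# Toy values invented for illustration only; they are not data from any study.
predicted = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
observed = [0, 0, 0, 1, 1, 1, 1, 0]  # 1 = event occurred by the follow-up time

calibration = calibration_by_group(predicted, observed)
discrimination = c_statistic(predicted, observed)
```

Applying the same two functions to patients from the development population who were held out of development would bear on reproducibility; applying them to patients from a different period, location, or disease spectrum would bear on transportability.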