Michael H. Kutner, PhD; J. Sunil Rao, PhD
Kutner M, Rao JS. Predictions of Hospital Mortality Rates. Ann Intern Med. 1997;127(9):846-847. doi:10.7326/0003-4819-127-9-199711010-00017
TO THE EDITOR:
Pine and colleagues compared hospital mortality rates calculated by using administrative data alone; administrative plus laboratory data; and administrative, laboratory, and clinical data. The authors controlled for other risk factors and adjusted for disease severity. Although the statistical methods used are interesting, we have a few concerns about the validity and generalizability of the findings.
All of the model building and validation (prediction) is done by using the same set of data. This can substantially bias the predictions from the models (that is, make them overly optimistic). The reason is that the same data are used first to search for the best model through stepwise logistic regression and then to predict the responses from the model just built. Much work has been done to show that not only are the predictions themselves biased (here, the areas under the receiver-operating characteristic [ROC] curves), but differences between model predictions can be biased as well. One way around this problem is to set aside some of the data for validation only (a test data set of, for example, 10% to 20% of the original data set) and build the models on the remaining portion (the training data set). One could then average predictions over many random splits of the data into test and training partitions, as sketched below. The data set used is large enough to accommodate this. Another alternative would be to collect a second data set for validation purposes only.
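For concreteness, here is a minimal sketch of the split-sample validation described above, written in Python with scikit-learn. The data are synthetic stand-ins generated for illustration (not the study's hospital records), and a plain logistic regression stands in for the stepwise model-selection procedure; only the splitting and averaging scheme is the point.

```python
# Sketch of split-sample validation: fit on a training partition,
# estimate the ROC area on the held-out test partition, and average
# that estimate over many random splits of the data.
# Assumes scikit-learn; the data below are synthetic, not real records.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a binary mortality outcome with mixed predictors.
X, y = make_classification(n_samples=5000, n_features=20,
                           n_informative=8, random_state=0)

n_splits = 100        # many random splits, as suggested in the letter
test_fraction = 0.2   # hold out 10%-20% of the data for validation only
aucs = []
for seed in range(n_splits):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_fraction, random_state=seed, stratify=y)
    # Model fitting (and any model selection) uses ONLY the training set.
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    # The untouched test set gives an honest estimate of the ROC area.
    aucs.append(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

print(f"Mean held-out AUC over {n_splits} splits: {np.mean(aucs):.3f}")
```

An AUC computed this way will typically fall below the apparent AUC obtained by scoring the model on the same data used to build it; that gap is the optimism the letter warns about.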