Original Research

Comparison of Natural Language Processing Biosurveillance Methods for Identifying Influenza From Encounter Notes

Peter L. Elkin, MD; David A. Froehling, MD; Dietlind L. Wahner-Roedler, MD; Steven H. Brown, MD, MS; and Kent R. Bailey, PhD

From Mount Sinai School of Medicine, New York, New York; Mayo Clinic, Rochester, Minnesota; and Veterans Health Administration and Vanderbilt University, Nashville, Tennessee.

Disclaimer: All authors have had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. The contributors of this report have disclosed that they have no financial interest, relationship, affiliation, or other association with any organization that might represent a conflict of interest. In addition, this report does not contain any discussion of unlabeled use of commercial products or products for investigational use.

Acknowledgment: The authors thank Inna Gurewitz, MPH, for her assistance in preparing this manuscript.

Grant Support: By the CDC (grants PH00022 and HK00014) and a research contract from the Veterans Administration (contract V249P-0525; Biosurveillance SDR Project 330).

Potential Conflicts of Interest: Disclosures can be viewed at www.acponline.org/authors/icmje/ConflictOfInterestForms.do?msNum=M11-0732.

Reproducible Research Statement: Study protocol: Available from Dr. Elkin (e-mail, ontolimatics@gmail.com). Statistics code and data set: Not available.

Corresponding Author: Peter L. Elkin, MD, 212 East 95th Street, Suite 3B, New York, NY 10128.

Current Author Addresses: Dr. Elkin: Mount Sinai School of Medicine, Center for Biomedical Informatics, 212 East 95th Street, Suite 3B, New York, NY 10128.

Drs. Froehling, Wahner-Roedler, and Bailey: Mayo Clinic, 200 First Street, Rochester, MN 55905.

Dr. Brown: 2100 West End Avenue, Suite 840, Nashville, TN 37203.

Author Contributions: Conception and design: P.L. Elkin, D.L. Wahner-Roedler, K.R. Bailey.

Analysis and interpretation of the data: P.L. Elkin, D.A. Froehling, D.L. Wahner-Roedler, S.H. Brown, K.R. Bailey.

Drafting of the article: P.L. Elkin, K.R. Bailey.

Critical revision for important intellectual content: P.L. Elkin, D.A. Froehling, K.R. Bailey, S.H. Brown.

Final approval of the article: P.L. Elkin, D.A. Froehling, D.L. Wahner-Roedler, S.H. Brown, K.R. Bailey.

Provision of study materials or patients: D.L. Wahner-Roedler.

Statistical expertise: K.R. Bailey.

Obtaining of funding: P.L. Elkin.

Administrative, technical, or logistic support: P.L. Elkin, D.L. Wahner-Roedler.

Collection and assembly of data: P.L. Elkin, D.L. Wahner-Roedler.

Ann Intern Med. 2012;156(1_Part_1):11-18. doi:10.7326/0003-4819-156-1-201201030-00003

Background: An effective national biosurveillance system expedites outbreak recognition and facilitates response coordination at the federal, state, and local levels. The BioSense system, used at the Centers for Disease Control and Prevention, incorporates chief complaints but not data from the whole encounter note into its surveillance algorithms.

Objective: To evaluate whether biosurveillance by using data from the whole encounter note is superior to that using data from the chief complaint field alone.

Design: 6-year retrospective case–control cohort study.

Setting: Mayo Clinic, Rochester, Minnesota.

Participants: 17 243 persons tested for influenza A or B virus between 1 January 2000 and 31 December 2006.

Measurements: The accuracy of a model based on signs and symptoms to predict influenza virus infection in patients with upper respiratory tract symptoms, and the ability of a natural language processing technique to identify definitional clinical features from free-text encounter notes.

Results: Surveillance based on the whole encounter note was superior to the chief complaint field alone. For the case definition used by surveillance of the whole encounter note, the normalized partial area under the receiver-operating characteristic curve (specificity, 0.1 to 0.4) for surveillance using the whole encounter note was 92.9% versus 70.3% for surveillance with the chief complaint field (difference, 22.6%; P < 0.001). Comparison of the 2 models at the fixed specificity of 0.4 resulted in sensitivities of 89.0% and 74.4%, respectively (P < 0.001). The relative risk for missing a true case of influenza was 2.3 by using the chief complaint field model.
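The reported relative risk of 2.3 can be checked against the two sensitivities given above. The sketch below assumes, since the abstract does not state the formula, that the relative risk for missing a true case is the ratio of the two models' false-negative rates (1 minus sensitivity) at the fixed specificity of 0.4. The function and variable names are illustrative and do not come from the study.

```python
def miss_rate(sensitivity: float) -> float:
    """Fraction of true influenza cases a model fails to flag (1 - sensitivity)."""
    return 1.0 - sensitivity


def relative_risk_of_miss(sens_reference: float, sens_comparator: float) -> float:
    """Ratio of the comparator's miss rate to the reference model's miss rate.

    Assumption: this ratio is how the abstract's relative risk was derived.
    """
    return miss_rate(sens_comparator) / miss_rate(sens_reference)


# Sensitivities at specificity 0.4, as reported in the Results.
whole_note_sens = 0.890
chief_complaint_sens = 0.744

rr = relative_risk_of_miss(whole_note_sens, chief_complaint_sens)
print(f"Relative risk of a missed case (chief complaint vs. whole note): {rr:.1f}")

# Difference in normalized partial AUC, in percentage points.
pauc_gap = round(92.9 - 70.3, 1)
print(f"Partial AUC difference: {pauc_gap} percentage points")
```

Under that assumption, the chief complaint model misses 25.6% of true cases and the whole-note model misses 11.0%, which matches the reported ratio of about 2.3.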

Limitations: Participants were seen at 1 tertiary referral center. The cost of comprehensive biosurveillance monitoring was not studied.

Conclusion: A biosurveillance model for influenza using the whole encounter note is more accurate than a model that uses only the chief complaint field. Because case-defining signs and symptoms of influenza are commonly available in health records, the investigators believe that the national strategy for biosurveillance should be changed to incorporate data from the whole health record.

Primary Funding Source: Centers for Disease Control and Prevention.


Figure 1.
Study flow diagram.

PCR = polymerase chain reaction.

Figure 2.
Fully encoded encounter note using SNOMED CT.

Blue entities are positive assertions, red are negative, and green are uncertain. SNOMED CT = Systematized Nomenclature of Medicine–Clinical Terms.

Figure 3.
Receiver-operating characteristic curve comparing surveillance by using the whole encounter note with that of the chief complaint field in the high sensitivity range, with bootstrap analysis.


Response to the Editorial "Fortune Favors a Prepared Health Care System"
Posted on January 30, 2012
Peter L. Elkin, MD, MACP, FACMI, Steven H. Brown, MD, MS, FACMI
Conflict of Interest: None Declared

We would like to thank Drs. Palmore and Henderson for their thoughtful editorial (1). We agree that the prepared healthcare system includes the ability to quickly analyze clinical data and to respond rapidly to public health emergencies (2). Advances in health informatics over the last 20 years have given us tools that, if applied, can help us prepare for and face future healthcare crises (3). We believe that evaluation of informatics interventions, like that of other clinical or public health interventions, should be a prerequisite to changing clinical or public health practice. We agree with Drs. Palmore and Henderson that additional evaluations are necessary and thank them for recognizing our contribution.

Drs. Palmore and Henderson raise the important issue that healthcare costs are on the rise and that we need more affordable care. We could not agree more. Healthcare reform has put into place new payment strategies that realign incentives toward high-quality care. These include Accountable Care Organizations, in which payers and healthcare providers share in the savings that stem from lower-cost, higher-quality care. Many strategies used to systematize clinical practice while improving the quality and safety of care require clinical data monitoring. As with biosurveillance, the relevant clinical data are often in free text. The same technology that allows us to monitor our population for emerging infectious diseases and acts of bioterrorism can support fully automated electronic quality and safety monitoring (collectively referred to as the field of eQuality) (4-6).

Health informatics systems that can monitor clinical care are an integral and essential part of any plan to provide better healthcare value while improving healthcare quality and safety (7). Standards organizations such as HL7 and IHTSDO have made foundational contributions by developing standards for health record structure and for meaningful representation and exchange of structured data. Health informatics approaches that allow us to turn free-text health record content into codified knowledge are another important piece of the puzzle that can bring us closer to the goal of automated biosurveillance and automated quality and safety monitoring. Our study has contributed to achieving this goal by demonstrating the effective use of EHR data for secondary purposes such as biosurveillance. Although cost issues need to be discussed and an optimal strategy devised, we should not delay in our efforts to deploy and test strategies that have the potential to bring us closer to these important national goals. We encourage others to extend our results toward building a safer and more effective systematized healthcare system for the United States of America.

Peter L. Elkin, MD, MACP, FACMI

Steven H. Brown, MD, MS, FACMI


1. Palmore TN, Henderson DK. Fortune favors a prepared health care system. Ann Intern Med. 2012 Jan 3;156(1 Pt 1):54-5.

2. Elkin PL, Froehling DA, Wahner-Roedler DL, Brown SH, Bailey KR. Comparison of natural language processing biosurveillance methods for identifying influenza from encounter notes. Ann Intern Med. 2012 Jan 3;156(1 Pt 1):11-8.

3. Elkin PL, Brown SH, Husser CS, Bauer BA, Wahner-Roedler D, Rosenbloom ST, et al. Evaluation of the content coverage of SNOMED CT: ability of SNOMED clinical terms to represent clinical problem lists. Mayo Clin Proc. 2006 Jun;81(6):741-8.

4. Brown SH, Elkin PL, Rosenbloom ST, Fielstein E, Speroff T. eQuality for all: extending automated quality measurement of free text clinical narratives. AMIA Annu Symp Proc. 2008:71-5.

5. Brown SH, Speroff T, Fielstein EM, Bauer BA, Wahner-Roedler DL, Greevy R, et al. eQuality: electronic quality assessment from narrative clinical reports. Mayo Clin Proc. 2006 Nov;81(11):1472-81.

6. Murff HJ, FitzHenry F, Matheny ME, Gentry N, Kotter KL, Crimin K, et al. Automated identification of postoperative complications within an electronic medical record using natural language processing. JAMA. 2011 Aug 24;306(8):848-55.

7. Bates DW, Gawande AA. Improving safety with information technology. N Engl J Med. 2003 Jun 19;348(25):2526-34.
