Biosurveillance for invasive fungal infections via text mining

David Martinez*, Hanna Suominen, Michelle Ananda-Rajah, Lawrence Cavedon

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

Invasive fungal diseases (IFDs) cause more than 1,000 deaths in hospitals and cost the health system more than AUD 100m in Australia each year. The most common life-threatening IFD is aspergillosis, which typically prolongs a patient's hospital stay by 12 days and carries an 8% mortality rate. Surveillance and detection of IFDs irrespective of the stage of diagnosis (i.e., early or late in the disease) are important. We describe an application of text mining techniques, using machine learning over a range of features, to automatically detect patients with IFD from the text of their CT scan reports. We focus on detecting aspergillosis; however, we anticipate that the approach will transfer to other diseases or conditions by training the text mining component on appropriate reports. Previous language-technology systems have been deployed with significant success for processing radiology reports and for detecting hospital-acquired infections. Our approach differs in two ways: it uses a purely statistical (machine-learning) approach to the language technology, and it is trained and tested on data collected from several hospitals. We collected reports for 288 IFD and 291 control patients from three hospitals in Melbourne, Australia: Alfred Health, Melbourne Health, and Peter MacCallum Cancer Centre. From these, we sampled 69 IFD and 49 control patients for detailed analysis of the text with regard to IFD; because each patient could have multiple scans (and associated reports), this yielded a total of 398 scan reports from IFD-positive patients and 83 scan reports from control patients. Medical experts annotated all scan reports at both sentence and report level, deciding for each sentence and each report whether it was positive, neutral, or negative with regard to IFD.
We classify reports and patients as IFD-positive if they contain at least one positive sentence, and as negative otherwise. We used the Weka SVM implementation with a variety of text- and concept-based features, including bag-of-words, punctuation, UMLS concepts, and negated contexts extracted using MetaMap. We also automatically extracted high-value terms (as measured by log-likelihood ratio) and formulated multi-word concept descriptions. Our system achieved a sensitivity of 0.94 and a specificity of 0.76 for classifying individual reports as indicative of aspergillosis, and a sensitivity of 1.0 and a specificity of 0.51 for classifying patients as having contracted the infection.
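The report- and patient-level decision rule in the abstract (IFD-positive if at least one sentence is positive) is an any-positive aggregation. The minimal Python sketch below illustrates it; the sentence labels are hard-coded, hypothetical stand-ins for whatever a sentence classifier would output, and the function and patient names are illustrative, not taken from the original system.

```python
from typing import Dict, List

# Three-way sentence labels, mirroring the annotation scheme in the abstract.
POSITIVE, NEUTRAL, NEGATIVE = "positive", "neutral", "negative"


def classify_report(sentence_labels: List[str]) -> bool:
    """A report is IFD-positive if at least one of its sentences is positive."""
    return any(label == POSITIVE for label in sentence_labels)


def classify_patient(reports: List[List[str]]) -> bool:
    """A patient is IFD-positive if at least one of their reports is positive."""
    return any(classify_report(report) for report in reports)


# Hypothetical data: patient -> list of reports -> list of sentence labels.
patients: Dict[str, List[List[str]]] = {
    "patient_A": [[NEUTRAL, NEGATIVE], [NEUTRAL, POSITIVE]],
    "patient_B": [[NEUTRAL], [NEGATIVE, NEUTRAL]],
}

for pid, reports in patients.items():
    print(pid, classify_patient(reports))
```

Because a single positive sentence flips the whole report, and a single positive report flips the patient, this rule favours sensitivity over specificity as more text is aggregated. That is consistent with the patient-level figures (1.0 and 0.51) being more skewed than the report-level ones.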
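The text-based features named in the abstract (bag-of-words, punctuation, negated contexts) can be illustrated with a small feature extractor. This is a sketch under explicit assumptions: the original system obtained UMLS concepts and negation via MetaMap, whereas here negation is approximated by a crude NegEx-style heuristic that marks a fixed window of tokens after a trigger word as negated. The trigger list, window size, and feature-name prefixes are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, abbreviated negation triggers (not MetaMap's actual lexicon).
NEGATION_TRIGGERS = {"no", "not", "without", "negative", "absent"}
NEGATION_WINDOW = 5  # assumed number of tokens after a trigger treated as negated

# Word tokens are runs of lowercase letters/digits; any other non-space char is punctuation.
TOKEN_RE = re.compile(r"[a-z0-9]+|[^\sa-z0-9]")


def extract_features(sentence: str) -> Dict[str, int]:
    """Return sparse count features: bag-of-words, punctuation, and negated words."""
    tokens: List[str] = TOKEN_RE.findall(sentence.lower())
    features: Counter = Counter()
    negate_left = 0  # tokens remaining in the current negation window
    for tok in tokens:
        if tok.isalnum():
            features["bow=" + tok] += 1
            if negate_left > 0 and tok not in NEGATION_TRIGGERS:
                features["neg=" + tok] += 1
            if tok in NEGATION_TRIGGERS:
                negate_left = NEGATION_WINDOW  # open (or reset) the window
            elif negate_left > 0:
                negate_left -= 1
        else:
            features["punct=" + tok] += 1
    return dict(features)


print(extract_features("No evidence of invasive aspergillosis."))
```

The resulting sparse dictionaries could be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to a linear SVM; the original work used Weka's SVM implementation rather than this pipeline.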
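The abstract ranks high-value terms by log-likelihood ratio. One common formulation, assumed here rather than confirmed by the source, is Dunning's G² statistic over a 2x2 table of counts (the term vs. all other tokens, in IFD-positive vs. control text): G² = 2 Σ O ln(O / E), where E is the count expected under independence. The helper names `g2` and `rank_terms`, the over-representation filter, and the toy tokens are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def g2(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's log-likelihood ratio (G^2) for the 2x2 table [[k11, k12], [k21, k22]].

    Rows: the term vs. all other tokens; columns: IFD-positive vs. control text.
    """
    total = k11 + k12 + k21 + k22
    observed = ((k11, k12), (k21, k22))
    row_sums = (k11 + k12, k21 + k22)
    col_sums = (k11 + k21, k12 + k22)
    score = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # 0 * ln(0) is taken as 0
                expected = row_sums[i] * col_sums[j] / total
                score += o * math.log(o / expected)
    return 2.0 * score


def rank_terms(pos_tokens: List[str], neg_tokens: List[str]) -> List[Tuple[str, float]]:
    """Rank terms over-represented in positive text by G^2 (both lists must be non-empty)."""
    pos, neg = Counter(pos_tokens), Counter(neg_tokens)
    n_pos, n_neg = len(pos_tokens), len(neg_tokens)
    scored = []
    for term in set(pos) | set(neg):
        a, b = pos[term], neg[term]
        # Assumption: keep only terms relatively more frequent in positive text.
        if a / n_pos <= b / n_neg:
            continue
        scored.append((term, g2(a, b, n_pos - a, n_neg - b)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy, invented token lists.
positive = "halo sign nodules halo".split()
control = "nodules clear lungs clear".split()
print(rank_terms(positive, control))
```

G² is zero when a term is equally frequent in both classes and grows as its distribution diverges, so sorting by it surfaces candidate high-value terms.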
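The reported evaluation figures are sensitivity, TP / (TP + FN), and specificity, TN / (TN + FP). A minimal helper for computing both from gold and predicted binary labels is sketched below; the function name and example labels are invented for illustration and are not the study's data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[bool], y_pred: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (True = IFD-positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative gold label")
    return tp / (tp + fn), tn / (tn + fp)


# Invented labels for illustration: 4 positive and 4 negative cases.
gold = [True, True, True, True, False, False, False, False]
pred = [True, True, True, False, False, False, True, True]
print(sensitivity_specificity(gold, pred))  # (0.75, 0.5)
```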

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 1178
Publication status: Published - 2012
Externally published: Yes
Event: 2012 Cross Language Evaluation Forum Conference, CLEF 2012 - Rome, Italy
Duration: 17 Sept 2012 to 20 Sept 2012
