TY - JOUR
T1 - Biosurveillance for invasive fungal infections via text mining
AU - Martinez, David
AU - Suominen, Hanna
AU - Ananda-Rajah, Michelle
AU - Cavedon, Lawrence
PY - 2012
Y1 - 2012
N2 - Invasive fungal diseases (IFDs) cause more than 1,000 deaths in hospitals and cost the health system more than AUD100m in Australia each year. The most common life-threatening IFD is aspergillosis and a patient with this IFD typically has 12 days prolonged in-patient time in hospital and an 8% mortality rate. Surveillance and detection of IFDs irrespective of the stage of diagnosis (i.e., early or late in disease) is important. We describe an application of text mining techniques, using machine learning over a range of features, to automatically detect cases of patients with IFD from the text in the reports of CT scans performed on them. We focus on detecting the presence of aspergillosis; however, we anticipate the approach to be transferable to other diseases or conditions by training the text mining component over appropriate reports. Previous systems based on language technology have been deployed for processing radiology reports and for detecting hospital-acquired infection using language-processing technology, with significant success. Our approach differs by using a purely statistical/machine-learning approach to the language technology, and by being trained and tested on data collected from a number of hospitals. We collected reports for 288 IFD and 291 control patients from three different hospitals in Melbourne, Australia: Alfred Health, Melbourne Health, and Peter MacCallum Cancer Centre. We extracted a sample of 69 IFD and 49 control patients to perform detailed analysis of the text with regard to IFD; each patient had possibly multiple scans (and associated reports), resulting in a total of 398 scan reports from IFD-positive patients and 83 scan reports from control patients. We had medical experts annotate the patient-level classification on all scan reports at both sentence and report level: The annotators had to decide, for each sentence and report, whether it was positive, neutral, or negative with regard to IFD.
We classify reports and patients as IFD-positive if they contain at least one positive sentence, and as negative otherwise. We used the Weka SVM implementation and employed a variety of text- and concept-based features, including bag-of-words, punctuation, UMLS concepts and negated contexts extracted using MetaMap. We also automatically extracted high-value terms (as measured using log-likelihood ratio) and formulated multi-word concept descriptions. Our system showed Sensitivity of 0.94 and Specificity of 0.76 for classifying individual reports as being indicative of aspergillus, and 1.0 and 0.51 for classifying patients as having contracted the infection.
AB - Invasive fungal diseases (IFDs) cause more than 1,000 deaths in hospitals and cost the health system more than AUD100m in Australia each year. The most common life-threatening IFD is aspergillosis and a patient with this IFD typically has 12 days prolonged in-patient time in hospital and an 8% mortality rate. Surveillance and detection of IFDs irrespective of the stage of diagnosis (i.e., early or late in disease) is important. We describe an application of text mining techniques, using machine learning over a range of features, to automatically detect cases of patients with IFD from the text in the reports of CT scans performed on them. We focus on detecting the presence of aspergillosis; however, we anticipate the approach to be transferable to other diseases or conditions by training the text mining component over appropriate reports. Previous systems based on language technology have been deployed for processing radiology reports and for detecting hospital-acquired infection using language-processing technology, with significant success. Our approach differs by using a purely statistical/machine-learning approach to the language technology, and by being trained and tested on data collected from a number of hospitals. We collected reports for 288 IFD and 291 control patients from three different hospitals in Melbourne, Australia: Alfred Health, Melbourne Health, and Peter MacCallum Cancer Centre. We extracted a sample of 69 IFD and 49 control patients to perform detailed analysis of the text with regard to IFD; each patient had possibly multiple scans (and associated reports), resulting in a total of 398 scan reports from IFD-positive patients and 83 scan reports from control patients. We had medical experts annotate the patient-level classification on all scan reports at both sentence and report level: The annotators had to decide, for each sentence and report, whether it was positive, neutral, or negative with regard to IFD.
We classify reports and patients as IFD-positive if they contain at least one positive sentence, and as negative otherwise. We used the Weka SVM implementation and employed a variety of text- and concept-based features, including bag-of-words, punctuation, UMLS concepts and negated contexts extracted using MetaMap. We also automatically extracted high-value terms (as measured using log-likelihood ratio) and formulated multi-word concept descriptions. Our system showed Sensitivity of 0.94 and Specificity of 0.76 for classifying individual reports as being indicative of aspergillus, and 1.0 and 0.51 for classifying patients as having contracted the infection.
KW - Biosurveillance
KW - Clinical reports
KW - Machine learning
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=84922022489&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84922022489
SN - 1613-0073
VL - 1178
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2012 Cross Language Evaluation Forum Conference, CLEF 2012
Y2 - 17 September 2012 through 20 September 2012
ER -