TY - JOUR
T1 - Task 1 of the CLEF eHealth evaluation lab 2016
T2 - 2016 Working Notes of Conference and Labs of the Evaluation Forum, CLEF 2016
AU - Suominen, Hanna
AU - Zhou, Liyuan
AU - Goeuriot, Lorraine
AU - Kelly, Liadh
PY - 2016
Y1 - 2016
AB - Cascaded speech recognition (SR) and information extraction (IE) could support best practice for clinical handover and release clinicians' time from writing documents to patient interaction and education. However, the high requirements for processing correctness evoke methodological challenges, and hence processing correctness needs to be carefully evaluated against these requirements. This overview paper reports on how these issues were addressed in a shared task of the eHealth evaluation lab of the Conference and Labs of the Evaluation Forum (CLEF) in 2016. This IE task built on the 2015 CLEF eHealth Task on SR by using its 201 synthetic handover documents for training and validation (approx. 8,500 + 7,700 words) and releasing another 100 documents with over 6,500 expert-annotated words for testing. It attracted 25 team registrations and 3 team submissions with 2 methods each. When using the macro-averaged F1 over the 35 form headings present in the training documents for evaluation on the test documents, all participant methods outperformed all 4 baselines, including the organizers' method (F1 = 0.25), published in 2015 in a top-tier medical informatics journal and provided to the participants as an option to build on; a random classifier (F1 = 0.02); and majority classifiers for the two most common classes (i.e., NA to filter out text irrelevant to the form, and the most common form heading, both with F1 > 0.00). The top-2 methods (F1 = 0.38 and 0.37) had statistically significantly (p < 0.05, Wilcoxon signed-rank test) better performance than the third-best method (F1 = 0.35). In comparison, the top-3 methods and the organizers' method (7th) had F1 of 0.81, 0.80, 0.81, and 0.75 in the NA class, respectively.
KW - Computer Systems Evaluation
KW - Data Collection
KW - Information Extraction
KW - Medical Informatics
KW - Nursing Records
KW - Patient Handoff
KW - Patient Handover
KW - Records as Topic
KW - Software Design
KW - Speech Recognition
KW - Test-set Generation
KW - Text Classification
UR - http://www.scopus.com/inward/record.url?scp=84984820786&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84984820786
SN - 1613-0073
VL - 1609
SP - 1
EP - 14
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 5 September 2016 through 8 September 2016
ER -