TY - JOUR
T1 - Task 1a of the CLEF eHealth Evaluation Lab 2015
AU - Suominen, Hanna
AU - Hanlen, Leif
AU - Goeuriot, Lorraine
AU - Kelly, Liadh
AU - Jones, Gareth J.F.
PY - 2015
Y1 - 2015
AB - Best practice for clinical handover and its documentation recommends standardized, structured, and synchronous processes with patient involvement. Cascaded speech recognition (SR) and information extraction could support compliance with these processes and release clinicians' time from writing documents to patient interaction and education. However, high requirements for processing correctness evoke methodological challenges. First, multiple people speak clinical jargon in the presence of background noise, with limited possibilities for SR personalization. Second, errors multiply in the cascade, and hence SR correctness needs to be carefully evaluated against the requirements. This overview paper reports on how these issues were addressed in a shared task of the eHealth evaluation lab of the Conference and Labs of the Evaluation Forum in 2015. The task released 100 synthetic handover documents for training and another 100 documents for testing, in both verbal and written formats. It attracted 48 team registrations, 21 email confirmations, and four method submissions by two teams. The submissions were compared against a leading commercial SR engine and a simple majority baseline. Although this engine performed significantly better than any submission [i.e., a test error percentage of 38.5 vs. 52.8 for the best submission, with a Wilcoxon signed-rank test value of 302.5 (p < 10⁻¹²)], the releases of data, tools, and evaluations contribute to the body of knowledge on the task difficulty and method suitability.
KW - Computer systems evaluation
KW - Data collection
KW - Information extraction
KW - Medical informatics
KW - Nursing records
KW - Patient hand-over
KW - Patient handoff
KW - Records as topic
KW - Software design
KW - Speech recognition
KW - Test-set generation
UR - http://www.scopus.com/inward/record.url?scp=84982805922&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84982805922
SN - 1613-0073
VL - 1391
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 16th Conference and Labs of the Evaluation Forum, CLEF 2015
Y2 - 8 September 2015 through 11 September 2015
ER -