TY - JOUR
T1 - Combining Hidden Markov Models and Latent Semantic Analysis for Topic Segmentation and Labeling: Method and Clinical Application
AU - Ginter, Filip
AU - Suominen, Hanna
AU - Pyysalo, Sampo
AU - Salakoski, Tapio
PY - 2009/12
Y1 - 2009/12
AB - Motivation: Topic segmentation and labeling systems enable fine-grained information search. However, previously proposed methods require annotated data to adapt to different information needs and have limited applicability to texts with short segment length. Methods: We introduce an unsupervised method based on a combination of hidden Markov models and latent semantic analysis which allows the topics of interest to be defined freely, without the need for data annotation, and can identify short segments. Results: The method is evaluated on intensive care nursing narratives and motivated by information needs in this domain. The method is shown to considerably outperform a keyword-based heuristic baseline and to achieve a level of performance comparable to that of a related supervised method trained on 3600 manually annotated words.
KW - Computerized patient records
KW - Hidden Markov models
KW - Information retrieval
KW - Latent semantic analysis
KW - Nursing
KW - Topic classification
KW - Topic segmentation
UR - http://www.scopus.com/inward/record.url?scp=71849108574&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/record.url?scp=71849089213&partnerID=8YFLogxK
UR - https://www.utupub.fi/handle/10024/42045
U2 - 10.1016/j.ijmedinf.2009.02.003
DO - 10.1016/j.ijmedinf.2009.02.003
M3 - Conference article
SN - 1386-5056
VL - 78
SP - e1
EP - e6
JO - International Journal of Medical Informatics
JF - International Journal of Medical Informatics
IS - 12
T2 - 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008
Y2 - 1 September 2008 through 3 September 2008
ER -