TY - GEN
T1 - Big data small data, in domain out-of domain, known word unknown word
T2 - 19th Conference on Computational Natural Language Learning, CoNLL 2015
AU - Qu, Lizhen
AU - Ferraro, Gabriela
AU - Zhou, Liyuan
AU - Hou, Weiwei
AU - Schneider, Nathan
AU - Baldwin, Timothy
N1 - Publisher Copyright: © 2015 Association for Computational Linguistics.
PY - 2015
Y1 - 2015
N2 - Word embeddings (distributed word representations that can be learned from unlabelled data) have been shown to have high utility in many natural language processing applications. In this paper, we perform an extrinsic evaluation of four popular word embedding methods in the context of four sequence labelling tasks: part-of-speech tagging, syntactic chunking, named entity recognition, and multiword expression identification. A particular focus of the paper is analysing the effects of task-based updating of word representations. We show that when using word embeddings as features, as few as several hundred training instances are sufficient to achieve competitive results, and that word embeddings lead to improvements for out-of-vocabulary words and for out-of-domain data. Perhaps more surprisingly, our results indicate there is little difference between the different word embedding methods, and that simple Brown clusters are often competitive with word embeddings across all tasks we consider.
UR - http://www.scopus.com/inward/record.url?scp=85072761231&partnerID=8YFLogxK
U2 - 10.18653/v1/k15-1009
DO - 10.18653/v1/k15-1009
M3 - Conference contribution
T3 - CoNLL 2015 - 19th Conference on Computational Natural Language Learning, Proceedings
SP - 83
EP - 93
BT - CoNLL 2015 - 19th Conference on Computational Natural Language Learning, Proceedings
PB - Association for Computational Linguistics (ACL)
Y2 - 30 July 2015 through 31 July 2015
ER -