TY - JOUR
T1 - Towards advanced collocation error correction in Spanish learner corpora
AU - Ferraro, Gabriela
AU - Nazar, Rogelio
AU - Alonso Ramos, Margarita
AU - Wanner, Leo
PY - 2014/3
Y1 - 2014/3
N2 - Collocations in the sense of idiosyncratic binary lexical co-occurrences are one of the biggest challenges for any language learner. Even advanced learners make collocation mistakes in that they literally translate collocation elements from their native tongue, create new words as collocation elements, choose a wrong subcategorization for one of the elements, etc. Therefore, automatic collocation error detection and correction is increasingly in demand. However, while state-of-the-art models predict, with a reasonable accuracy, whether a given co-occurrence is a valid collocation or not, only a few of them manage to suggest appropriate corrections with an acceptable hit rate. Most often, a ranked list of correction options is offered from which the learner then has to choose. This is clearly unsatisfactory. Our proposal focuses on this critical part of the problem in the context of the acquisition of Spanish as a second language. For collocation error detection, we use a frequency-based technique. To improve on collocation error correction, we discuss three different metrics with respect to their capability to select the most appropriate correction of miscollocations found in our learner corpus.
AB - Collocations in the sense of idiosyncratic binary lexical co-occurrences are one of the biggest challenges for any language learner. Even advanced learners make collocation mistakes in that they literally translate collocation elements from their native tongue, create new words as collocation elements, choose a wrong subcategorization for one of the elements, etc. Therefore, automatic collocation error detection and correction is increasingly in demand. However, while state-of-the-art models predict, with a reasonable accuracy, whether a given co-occurrence is a valid collocation or not, only a few of them manage to suggest appropriate corrections with an acceptable hit rate. Most often, a ranked list of correction options is offered from which the learner then has to choose. This is clearly unsatisfactory. Our proposal focuses on this critical part of the problem in the context of the acquisition of Spanish as a second language. For collocation error detection, we use a frequency-based technique. To improve on collocation error correction, we discuss three different metrics with respect to their capability to select the most appropriate correction of miscollocations found in our learner corpus.
KW - CALL
KW - Collocation
KW - Collocation error
KW - Collocation error correction
KW - Collocation error detection
KW - Miscollocation
UR - http://www.scopus.com/inward/record.url?scp=84897029145&partnerID=8YFLogxK
U2 - 10.1007/s10579-013-9242-3
DO - 10.1007/s10579-013-9242-3
M3 - Article
SN - 1574-020X
VL - 48
SP - 45
EP - 64
JO - Language Resources and Evaluation
JF - Language Resources and Evaluation
IS - 1
ER -