TY - CPAPER
T1 - OpenWHO
T2 - 10th Conference on Machine Translation, WMT 2025
AU - Merx, Raphaël
AU - Suominen, Hanna
AU - Cohn, Trevor
AU - Vylomova, Ekaterina
N1 - Publisher Copyright:
© 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
AB - In machine translation (MT), health is a high-stakes domain characterised by widespread deployment and domain-specific vocabulary. However, there is a lack of MT evaluation datasets for low-resource languages in this domain. To address this gap, we introduce OpenWHO, a document-level parallel corpus of 2,978 documents and 26,824 sentences from the World Health Organization's e-learning platform. Sourced from expert-authored, professionally translated materials shielded from web-crawling, OpenWHO spans a diverse range of over 20 languages, of which nine are low-resource. Leveraging this new resource, we evaluate modern large language models (LLMs) against traditional MT models. Our findings reveal that LLMs consistently outperform traditional MT models, with Gemini 2.5 Flash achieving a +4.79 ChrF point improvement over NLLB-54B on our low-resource test set. Further, we investigate how LLM context utilisation affects accuracy, finding that the benefits of document-level translation are most pronounced in specialised domains like health. We release the OpenWHO corpus to encourage further research into low-resource MT in the health domain.
UR - https://www.scopus.com/pages/publications/105028853372
DO - 10.18653/v1/2025.wmt-1.8
M3 - Conference Paper
AN - SCOPUS:105028853372
T3 - Conference on Machine Translation - Proceedings
SP - 142
EP - 160
BT - WMT 2025 - 10th Conference on Machine Translation, Proceedings of the Conference
A2 - Haddow, Barry
A2 - Kocmi, Tom
A2 - Koehn, Philipp
A2 - Monz, Christof
PB - Association for Computational Linguistics (ACL)
Y2 - 8 November 2025 through 9 November 2025
ER -