TY - GEN
T1 - TULUN
T2 - 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
AU - Merx, Raphaël
AU - Suominen, Hanna
AU - Hong, Lois
AU - Thieberger, Nick
AU - Cohn, Trevor
AU - Vylomova, Ekaterina
N1 - Publisher Copyright:
©2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
N2 - Machine translation (MT) systems that support low-resource languages often struggle on specialized domains. While researchers have proposed various techniques for domain adaptation, these approaches typically require model fine-tuning, making them impractical for nontechnical users and small organizations. To address this gap, we propose TULUN, a versatile solution for terminology-aware translation, combining neural MT with large language model (LLM)-based post-editing guided by existing glossaries and translation memories. Our open-source web-based platform enables users to easily create, edit, and leverage terminology resources, fostering a collaborative human-machine translation process that respects and incorporates domain expertise while increasing MT accuracy. Evaluations show effectiveness in both real-world and benchmark scenarios: on medical and disaster relief translation tasks for Tetun and Bislama, our system achieves improvements of 16.90–22.41 ChrF++ points over baseline MT systems. Across six low-resource languages on the FLORES dataset, TULUN outperforms both standalone MT and LLM approaches, achieving an average improvement of 2.8 ChrF++ points over NLLB-54B. TULUN is publicly accessible at bislama-trans.rapha.dev.
AB - Machine translation (MT) systems that support low-resource languages often struggle on specialized domains. While researchers have proposed various techniques for domain adaptation, these approaches typically require model fine-tuning, making them impractical for nontechnical users and small organizations. To address this gap, we propose TULUN, a versatile solution for terminology-aware translation, combining neural MT with large language model (LLM)-based post-editing guided by existing glossaries and translation memories. Our open-source web-based platform enables users to easily create, edit, and leverage terminology resources, fostering a collaborative human-machine translation process that respects and incorporates domain expertise while increasing MT accuracy. Evaluations show effectiveness in both real-world and benchmark scenarios: on medical and disaster relief translation tasks for Tetun and Bislama, our system achieves improvements of 16.90–22.41 ChrF++ points over baseline MT systems. Across six low-resource languages on the FLORES dataset, TULUN outperforms both standalone MT and LLM approaches, achieving an average improvement of 2.8 ChrF++ points over NLLB-54B. TULUN is publicly accessible at bislama-trans.rapha.dev.
UR - https://www.scopus.com/pages/publications/105020384036
U2 - 10.18653/v1/2025.acl-demo.13
DO - 10.18653/v1/2025.acl-demo.13
M3 - Conference Paper
AN - SCOPUS:105020384036
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 129
EP - 139
BT - System Demonstrations
A2 - Mishra, Pushkar
A2 - Muresan, Smaranda
A2 - Yu, Tao
PB - Association for Computational Linguistics (ACL)
Y2 - 27 July 2025 through 1 August 2025
ER -