Identifying multilingual Wikipedia articles based on cross language similarity and activity

Khoi Nguyen Tran, Peter Christen

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


    Wikipedia is a free, open-access online encyclopedia available in many languages. Wikipedia articles in more than 280 languages are written by millions of editors. However, the growth of articles and their content is slowing, especially within the largest Wikipedia language: English. The stabilization of articles presents opportunities for multilingual Wikipedia editors to apply their translation skills to add articles and content to smaller Wikipedia languages. In this poster, we propose similarity and activity measures of Wikipedia articles across two languages: English and German. These measures allow us to evaluate the distribution of articles based on their knowledge coverage and their activity across languages. We show the state of Wikipedia articles as of June 2012 and discuss how these measures allow us to develop recommendation and verification models for multilingual editors to enrich articles and content in Wikipedia languages with relatively smaller knowledge coverage.

    Original language: English
    Title of host publication: CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
    Number of pages: 4
    Publication status: Published - 2013
    Event: 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013 - San Francisco, CA, United States
    Duration: 27 Oct 2013 - 1 Nov 2013

    Publication series

    Name: International Conference on Information and Knowledge Management, Proceedings


    Conference: 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
    Country/Territory: United States
    City: San Francisco, CA

