quanteda: An R package for the quantitative analysis of textual data

Kenneth Benoit, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, Akitaka Matsuo

    Research output: Contribution to journal › Article › peer-review

    Abstract

    quanteda is an R package providing a comprehensive workflow and toolkit for natural language processing tasks such as corpus management, tokenization, analysis, and visualization. It has extensive functions for applying dictionary analysis, exploring texts using keywords-in-context, computing document and feature similarities, and discovering multi-word expressions through collocation scoring. Based entirely on sparse operations, it provides highly efficient methods for compiling document-feature matrices and for manipulating these or using them in further quantitative analysis. Using C++ and multithreading extensively, quanteda is also considerably faster and more efficient than other R and Python packages in processing large textual data.
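To make the idea of a sparse document-feature matrix concrete, here is a minimal conceptual sketch in plain Python. It is not quanteda's API or implementation (quanteda is an R package backed by C++). The function names `tokenize`, `build_dfm`, and `cosine` are hypothetical and chosen only for illustration, and the tokenizer is deliberately naive. Each document is stored as a mapping from feature index to count, so zero entries are never materialized, and document similarity is computed directly on those sparse rows.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase, split on whitespace, trim punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


def build_dfm(docs):
    """Build a sparse document-feature matrix.

    Returns (features, rows), where features[j] is the name of column j and
    rows[i] is a dict {column index: count} holding only non-zero cells.
    """
    index = {}  # feature name -> column index, assigned in order of first appearance
    rows = []
    for text in docs:
        row = {}
        for feature, count in Counter(tokenize(text)).items():
            col = index.setdefault(feature, len(index))
            row[col] = count
        rows.append(row)
    features = sorted(index, key=index.get)
    return features, rows


def cosine(row_a, row_b):
    """Cosine similarity between two sparse rows; only stored cells are visited."""
    dot = sum(v * row_b.get(k, 0) for k, v in row_a.items())
    norm_a = math.sqrt(sum(v * v for v in row_a.values()))
    norm_b = math.sqrt(sum(v * v for v in row_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = ["The cat sat.", "The cat ran.", "Dogs bark"]
features, rows = build_dfm(docs)
similarity = cosine(rows[0], rows[1])
```

Storing only the non-zero cells is what keeps memory and similarity computations proportional to the number of observed tokens rather than to documents times vocabulary size, which is the property the abstract attributes to quanteda's sparse design.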
    Original language: English
    Pages (from-to): 1-4
    Journal: The Journal of Open Source Software
    Volume: 3
    Issue number: 30
    Publication status: Published - 2018
