TY - JOUR
T1 - quanteda: An R package for the quantitative analysis of textual data
AU - Benoit, Kenneth
AU - Watanabe, Kohei
AU - Wang, Haiyan
AU - Nulty, Paul
AU - Obeng, Adam
AU - Müller, Stefan
AU - Matsuo, Akitaka
PY - 2018
Y1 - 2018
AB - quanteda is an R package providing a comprehensive workflow and toolkit for natural language processing tasks such as corpus management, tokenization, analysis, and visualization. It has extensive functions for applying dictionary analysis, exploring texts using keywords-in-context, computing document and feature similarities, and discovering multi-word expressions through collocation scoring. Based entirely on sparse operations, it provides highly efficient methods for compiling document-feature matrices and for manipulating these or using them in further quantitative analysis. Using C++ and multithreading extensively, quanteda is also considerably faster and more efficient than other R and Python packages in processing large textual data.
U2 - 10.21105/joss.00774
DO - 10.21105/joss.00774
M3 - Article
VL - 3
SP - 1
EP - 4
JO - The Journal of Open Source Software
JF - The Journal of Open Source Software
IS - 30
ER -