TY - JOUR
T1 - Crowd-sourced Text Analysis
T2 - Reproducible and Agile Production of Political Data
AU - Benoit, Kenneth
AU - Conway, Drew
AU - Lauderdale, Benjamin E.
AU - Laver, Michael
AU - Mikhaylov, Slava
N1 - Publisher Copyright:
© 2016 American Political Science Association.
PY - 2016/5/1
Y1 - 2016/5/1
N2 - Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers' attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.
UR - http://www.scopus.com/inward/record.url?scp=84982918683&partnerID=8YFLogxK
U2 - 10.1017/S0003055416000058
DO - 10.1017/S0003055416000058
M3 - Article
SN - 0003-0554
VL - 110
SP - 278
EP - 295
JO - American Political Science Review
JF - American Political Science Review
IS - 2
ER -