Automatic classification of documents in cold-start scenarios

Ricardo Kawase, Marco Fisichella, Bernardo Pereira Nunes, Kyung Hun Ha, Markus Bick

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

3 Citations (Scopus)

Abstract

Document classification is key to ensuring the quality of any digital library. However, classifying documents is a very time-consuming task. In addition, few or none of the documents in a newly created repository are classified. The lack of classification not only prevents users from finding information but also hinders the system's ability to recommend relevant items. Moreover, the lack of classified documents prevents any kind of machine learning algorithm from automatically annotating these items. In this work, we propose a novel approach to automatically classifying documents that differs from previous work in that it exploits the wisdom of the crowds available on the Web. Our proposed strategy adapts an automatic tagging approach combined with a straightforward matching algorithm to classify documents in a given domain classification. To validate our findings, we compared our methods against existing approaches and performed a user evaluation with 61 participants to estimate the quality of the classifications. Results show that, in 72% of the cases, the automatic classification is relevant and well accepted by participants. In conclusion, automatic classification can facilitate access to relevant documents.
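
As a rough illustration of the "tagging plus matching" idea mentioned in the abstract, the following minimal sketch assigns a document to the category whose label overlaps most with the document's tags. It is not the authors' algorithm: the abstract does not specify the tagger or the matching rule, so the tags are taken as given input, Jaccard token overlap is an assumed scoring choice, and the names `tokenize` and `classify` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and return its alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify(tags, categories):
    """Return (category, score) for the best-matching category label.

    Assumption: similarity is the Jaccard overlap between the tokens of
    all tags and the tokens of each category label. Returns (None, 0.0)
    when no category shares a token with the tags.
    """
    tag_tokens = set()
    for tag in tags:
        tag_tokens |= tokenize(tag)

    best, best_score = None, 0.0
    for category in categories:
        label_tokens = tokenize(category)
        union = tag_tokens | label_tokens
        score = len(tag_tokens & label_tokens) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = category, score
    return best, best_score


# Hypothetical tags (as an automatic tagger might produce) and categories.
example_tags = ["machine learning", "text classification", "digital library"]
example_categories = [
    "Information Retrieval",
    "Machine Learning",
    "Human-Computer Interaction",
]
best_category, best_score = classify(example_tags, example_categories)
```

In a cold-start repository, a rule of this kind needs no previously classified documents, only the tags and the labels of the target classification; a real system would likely need stemming, synonyms, or hierarchy-aware matching on top of it.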

Original language: English
Title of host publication: 3rd International Conference on Web Intelligence, Mining and Semantics, WIMS 2013
Publication status: Published - 2013
Externally published: Yes
Event: 3rd International Conference on Web Intelligence, Mining and Semantics, WIMS 2013 - Madrid, Spain
Duration: 12 Jun 2013 to 14 Jun 2013

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 3rd International Conference on Web Intelligence, Mining and Semantics, WIMS 2013
Country/Territory: Spain
City: Madrid
Period: 12/06/13 to 14/06/13
