Abstract
We suggest a new class of metrics for measuring distances between documents, generalizing the well-known resemblance distance. We then show how to combine distance measures with statistical smoothing to develop techniques for imputing missing features of documents. We treat in detail the case where these features are continuous variates, but we note that our methods can be adapted to settings where the features are ordered or unordered categorical variates (e.g., the names of potential authors of the documents). The results of applying our ideas to the dating of medieval manuscripts are briefly summarized.
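As a rough illustration of how a document distance can be combined with smoothing to impute a continuous feature such as a date, the following minimal sketch may help. It does not reproduce the paper's generalized metrics or its smoothing procedure: it assumes the standard resemblance distance (one minus the Jaccard similarity of word-shingle sets) and swaps in a simple Gaussian kernel-weighted average. The function names, the shingle length `k`, and the `bandwidth` parameter are all hypothetical choices made for this example.

```python
import math


def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word k-grams) of text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts contribute a single shingle holding all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance_distance(a, b, k=3):
    """One minus the Jaccard similarity of the two shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    if not union:
        return 0.0  # two empty documents are treated as identical
    return 1.0 - len(sa & sb) / len(union)


def impute_date(target, dated_docs, bandwidth=0.5, k=3):
    """Kernel-weighted average of known dates (an assumed smoother).

    dated_docs is a list of (text, date) pairs; documents closer to the
    target in resemblance distance receive larger Gaussian weights.
    """
    if not dated_docs:
        raise ValueError("need at least one dated document")
    num = den = 0.0
    for text, date in dated_docs:
        d = resemblance_distance(target, text, k)
        w = math.exp(-((d / bandwidth) ** 2))
        num += w * date
        den += w
    return num / den
```

With a smaller `bandwidth` the estimate leans more heavily on the nearest dated documents; with a larger one it approaches the plain average of all known dates.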
Original language | English |
---|---|
Pages (from-to) | 255-262 |
Number of pages | 8 |
Journal | Journal of Computational and Graphical Statistics |
Volume | 14 |
Issue number | 2 |
Publication status | Published - Jun 2005 |