Distance measures and smoothing methodology for imputing features of documents

Andrey Feuerverger*, Peter Hall, Gelila Tilahun, Michael Gervers

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    3 Citations (Scopus)

    Abstract

    We suggest a new class of metrics for measuring distances between documents, generalizing the well-known resemblance distance. We then show how to combine distance measures with statistical smoothing to develop techniques for imputing missing features of documents. We treat in detail the case where these features are continuous variates, but we note that our methods can be adapted to settings where the features are ordered or unordered categorical variates (e.g., the names of potential authors of the documents). The results of applying our ideas to the dating of medieval manuscripts are briefly summarized.
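As a rough illustration of the two ingredients named in the abstract, the sketch below pairs the standard resemblance distance (one minus the Jaccard similarity of word-shingle sets) with a simple kernel-weighted average for imputing a continuous feature such as a date. It does not reproduce the generalized metrics or the smoothing method of the article. The function names, the Gaussian kernel, the shingle length `k`, and the bandwidth `h` are all assumptions made for this example.

```python
import math


def shingles(text, k=2):
    """Return the set of k-word shingles (contiguous word k-tuples) in text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance_distance(a, b, k=2):
    """Standard resemblance distance: 1 - |S(a) & S(b)| / |S(a) | S(b)|."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    if not union:
        # Both texts are shorter than k words; treat them as identical.
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def impute_feature(target, corpus, h=0.5, k=2):
    """Kernel-weighted average of known feature values.

    corpus is a non-empty list of (text, value) pairs with known values.
    The Gaussian kernel and bandwidth h are illustrative assumptions.
    """
    weights = [
        math.exp(-(resemblance_distance(target, text, k) / h) ** 2)
        for text, _ in corpus
    ]
    total = sum(weights)  # strictly positive, since distances lie in [0, 1]
    return sum(w * value for w, (_, value) in zip(weights, corpus)) / total


# Hypothetical toy corpus: (text, known year). Not data from the article.
dated = [
    ("the king grants land", 1100.0),
    ("witnessed by the bishop and the sheriff", 1300.0),
]
estimate = impute_feature("the king grants land to the abbey", dated)
```

Documents close to the target in resemblance distance receive the largest weights, so the imputed value is pulled toward the features of similar documents. A smaller `h` makes the estimate more local.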

    Original language: English
    Pages (from-to): 255-262
    Number of pages: 8
    Journal: Journal of Computational and Graphical Statistics
    Volume: 14
    Issue number: 2
    Publication status: Published - Jun 2005
