Estimating the support of a high-dimensional distribution

Bernhard Schölkopf*, John C. Platt, John Shawe-Taylor, Alex J. Smola, Robert C. Williamson

*Corresponding author for this work

    Research output: Contribution to journal > Article > peer-review

    4574 Citations (Scopus)

    Abstract

    Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.
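    As a rough illustration of the procedure the abstract describes, the sketch below fits a kernel expansion by repeatedly optimizing over one pair of coefficients at a time. It is not the authors' implementation. It assumes a Gaussian kernel and the dual problem as it is commonly stated for this method (minimize 1/2 a^T K a subject to 0 <= a_i <= 1/(nu*n) and sum(a) = 1), and it picks the maximally violating pair at each step, which is a simplification chosen here. The names `rbf`, `fit_one_class`, `decision`, `gamma`, `tol`, and `max_iter` are hypothetical, and the data are synthetic.

```python
import math
import random

def rbf(x, y, gamma=0.5):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2) (assumed choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def fit_one_class(X, nu=0.2, gamma=0.5, tol=1e-6, max_iter=100000):
    """Minimize 1/2 a^T K a  s.t. 0 <= a_i <= 1/(nu*n), sum(a) = 1,
    by pairwise updates. Assumes 0 < nu < 1. Returns (alpha, rho)."""
    n = len(X)
    C = 1.0 / (nu * n)                      # upper bound on each coefficient
    K = [[rbf(xi, xj, gamma) for xj in X] for xi in X]
    alpha = [1.0 / n] * n                   # feasible start: sums to 1, below C
    out = [sum(K[i]) / n for i in range(n)] # out[i] = sum_k alpha_k * K[i][k]
    eps = 1e-12
    for _ in range(max_iter):
        # i: largest output among coefficients that can still decrease;
        # j: smallest output among coefficients that can still increase.
        i = max((k for k in range(n) if alpha[k] > eps), key=lambda k: out[k])
        j = min((k for k in range(n) if alpha[k] < C - eps), key=lambda k: out[k])
        if out[i] - out[j] < tol:           # optimality conditions hold within tol
            break
        eta = max(K[i][i] + K[j][j] - 2.0 * K[i][j], eps)
        # Unconstrained optimum of the two-variable subproblem, clipped to the box.
        delta = min((out[i] - out[j]) / eta, alpha[i], C - alpha[j])
        alpha[i] -= delta
        alpha[j] += delta                   # the sum of alpha is preserved
        for k in range(n):
            out[k] += delta * (K[k][j] - K[k][i])
    rho = 0.5 * (out[i] + out[j])           # offset lies between the two outputs
    return alpha, rho

def decision(x, X, alpha, rho, gamma=0.5):
    """f(x) = sum_i alpha_i k(x_i, x) - rho; positive inside the estimated region."""
    return sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X) if a > 0.0) - rho

# Synthetic 2-D sample, for illustration only.
rng = random.Random(0)
X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(40)]
nu = 0.2
alpha, rho = fit_one_class(X, nu=nu)
n_sv = sum(a > 1e-12 for a in alpha)                            # nonzero coefficients
n_out = sum(decision(x, X, alpha, rho) < -1e-4 for x in X)      # training points outside
print(n_sv, n_out)
```

    In this formulation the box constraint forces at least nu*n coefficients to be nonzero and allows at most nu*n training points to fall strictly outside the estimated region, so `nu` plays the role of the a priori specified value mentioned in the abstract. Only the training points with nonzero coefficients enter `decision`, which is the "potentially small subset" of the kernel expansion.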

    Original language: English
    Pages (from-to): 1443-1471
    Number of pages: 29
    Journal: Neural Computation
    Volume: 13
    Issue number: 7
    Publication status: Published - Jul 2001
