From stochastic mixability to fast rates

Nishant A. Mehta, Robert C. Williamson

    Research output: Contribution to journalConference articlepeer-review

    11 Citations (Scopus)

    Abstract

    Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems where the data is generated according to some unknown distribution P and returns a hypothesis f chosen from a fixed class F with small loss l. In the parametric setting, depending upon (l, F, P) ERM can have slow (1/√ n) or fast (1/n) rates of convergence of the excess risk as a function of the sample size n. There exist several results that give sufficient conditions for fast rates in terms of joint properties of l, F, and P, such as the margin condition and the Bernstein condition. In the non-statistical prediction with expert advice setting, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss l (there being no role there for F or P). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of (l, F, P), and in so doing provides new insight into the fast-rates phenomenon. The proof exploits an old result of Kemperman on the solution to the general moment problem. We also show a partial converse that suggests a characterization of fast rates for ERM in terms of stochastic mixability is possible.

    Original languageEnglish
    Pages (from-to)1197-1205
    Number of pages9
    JournalAdvances in Neural Information Processing Systems
    Volume2
    Issue numberJanuary
    Publication statusPublished - 2014
    Event28th Annual Conference on Neural Information Processing Systems 2014, NIPS 2014 - Montreal, Canada
    Duration: 8 Dec 201413 Dec 2014

    Fingerprint

    Dive into the research topics of 'From stochastic mixability to fast rates'. Together they form a unique fingerprint.

    Cite this