(Almost) no label no cry

Giorgio Patrini, Richard Nock, Paul Rivera, Tiberio Caetano

    Research output: Contribution to journal › Conference article › peer-review

    80 Citations (Scopus)

    Abstract

    In Learning with Label Proportions (LLP), the objective is to learn a supervised classifier when, instead of labels, only label proportions for bags of observations are known. This setting has broad practical relevance, in particular for privacy-preserving data processing. We first show that the mean operator, a statistic which aggregates all labels, is minimally sufficient for the minimization of many proper scoring losses with linear (or kernelized) classifiers without using labels. We provide a fast learning algorithm that estimates the mean operator via a manifold regularizer with guaranteed approximation bounds. Then, we present an iterative learning algorithm that uses this estimate as its initialization. We ground this algorithm in Rademacher-style generalization bounds that fit the LLP setting, introducing a generalization of Rademacher complexity and a Label Proportion Complexity measure. This latter algorithm optimizes tractable bounds for the corresponding bag-empirical risk. Experiments are provided on fourteen domains, with sizes of up to ≈300K observations. They show that our algorithms are scalable and tend to consistently outperform the state of the art in LLP. Moreover, in many cases, our algorithms compete with, or are only a few percent of AUC away from, the Oracle that learns knowing all labels. On the largest domains, half a dozen proportions can suffice, i.e., roughly 40K times fewer than the total number of labels.
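The sufficiency claim can be checked numerically for one proper loss. For the logistic loss l(z) = log(1 + e^(-z)) we have l(z) - l(-z) = -z, so each term l(y_i θ·x_i) equals the label-free symmetric part (l(θ·x_i) + l(-θ·x_i))/2 minus y_i θ·x_i / 2; averaging over the sample, labels enter the empirical risk only through the mean operator μ = (1/m) Σ_i y_i x_i. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the logistic loss, plain gradient descent, and all function names are illustrative choices. In particular, `estimate_mean_operator` is a simplified least-squares estimator in the spirit of the earlier Mean Map approach, which assumes every bag shares the same class-conditional means; it is not the manifold-regularized estimator described above, and the iterative refinement algorithm is not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def logistic_loss(z: float) -> float:
    """Numerically stable log(1 + exp(-z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean_operator(X: Sequence[Vector], y: Sequence[int]) -> Vector:
    """mu = (1/m) * sum_i y_i * x_i (needs labels; used here only as a reference)."""
    m, d = len(X), len(X[0])
    return [sum(y[i] * X[i][k] for i in range(m)) / m for k in range(d)]


def risk_with_labels(theta: Vector, X: Sequence[Vector], y: Sequence[int]) -> float:
    """Empirical logistic risk (1/m) * sum_i loss(y_i * theta.x_i)."""
    return sum(logistic_loss(yi * dot(theta, xi)) for xi, yi in zip(X, y)) / len(X)


def risk_from_mean_operator(theta: Vector, X: Sequence[Vector], mu: Vector) -> float:
    """Same risk, with labels entering only through mu:
    (1/m) * sum_i [loss(z_i) + loss(-z_i)] / 2 - theta.mu / 2, where z_i = theta.x_i."""
    sym = sum(0.5 * (logistic_loss(dot(theta, xi)) + logistic_loss(-dot(theta, xi))) for xi in X)
    return sym / len(X) - 0.5 * dot(theta, mu)


def estimate_mean_operator(bags: Sequence[Sequence[Vector]], props: Sequence[float]) -> Vector:
    """Hypothetical Mean Map style estimate of mu from bags and positive proportions.

    Assumes all bags share class-conditional means c_pos and c_neg, so that
    bag_mean_j = p_j * c_pos + (1 - p_j) * c_neg; both are fitted by least squares
    over bags (requires >= 2 bags with distinct proportions).
    """
    d = len(bags[0][0])
    means = [[sum(x[k] for x in bag) / len(bag) for k in range(d)] for bag in bags]
    a = sum(p * p for p in props)
    b = sum(p * (1 - p) for p in props)
    c = sum((1 - p) ** 2 for p in props)
    det = a * c - b * b
    if abs(det) < 1e-12:
        raise ValueError("bag proportions are not diverse enough")
    c_pos, c_neg = [], []
    for k in range(d):
        # Solve the 2x2 normal equations [[a, b], [b, c]] [cp, cn] = [r1, r2].
        r1 = sum(p * bm[k] for p, bm in zip(props, means))
        r2 = sum((1 - p) * bm[k] for p, bm in zip(props, means))
        c_pos.append((c * r1 - b * r2) / det)
        c_neg.append((a * r2 - b * r1) / det)
    m = sum(len(bag) for bag in bags)
    # mu = (1/m) * sum_j m_j * (p_j * c_pos - (1 - p_j) * c_neg)
    return [
        sum(len(bag) * (p * c_pos[k] - (1 - p) * c_neg[k]) for bag, p in zip(bags, props)) / m
        for k in range(d)
    ]


def train_from_mean_operator(X: Sequence[Vector], mu: Vector, lam: float = 0.01,
                             lr: float = 0.1, steps: int = 500) -> Vector:
    """Gradient descent on risk_from_mean_operator(theta) + lam/2 * ||theta||^2."""
    d = len(X[0])
    theta = [0.0] * d
    for _ in range(steps):
        grad = [lam * t - 0.5 * mk for t, mk in zip(theta, mu)]
        for xi in X:
            # d/dz of [loss(z) + loss(-z)] / 2 is sigmoid(z) - 1/2.
            g = sigmoid(dot(theta, xi)) - 0.5
            for k in range(d):
                grad[k] += g * xi[k] / len(X)
        theta = [t - lr * gk for t, gk in zip(theta, grad)]
    return theta
```

In use, one would pass only the unlabeled observations, grouped into bags, together with each bag's positive proportion to `estimate_mean_operator`, then hand the estimate to `train_from_mean_operator`; the per-observation labels are never read. When the shared-means assumption holds exactly, the estimate coincides with the true mean operator, and the label-free risk matches the fully supervised one for every θ.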

    Original language: English
    Pages (from-to): 190-198
    Number of pages: 9
    Journal: Advances in Neural Information Processing Systems
    Volume: 1
    Issue number: January
    Publication status: Published - 2014
    Event: 28th Annual Conference on Neural Information Processing Systems 2014, NIPS 2014 - Montreal, Canada
    Duration: 8 Dec 2014 - 13 Dec 2014
