TY - GEN
T1 - Loss factorization, weakly supervised learning and label noise robustness
AU - Patrini, Giorgio
AU - Nielsen, Frank
AU - Nock, Richard
AU - Carioni, Marcello
PY - 2016
Y1 - 2016
N2 - We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label-free, and can further be expressed by sums of the same loss. This holds true even for non-smooth, non-convex losses and in any RKHS. The first term is a (kernel) mean operator - the focal quantity of this work - which we characterize as the sufficient statistic for the labels. The result tightens known generalization bounds and sheds new light on their interpretation. Factorization has a direct application to weakly supervised learning. In particular, we demonstrate that algorithms like SGD and proximal methods can be adapted with minimal effort to handle weak supervision, once the mean operator has been estimated. We apply this idea to learning with asymmetric noisy labels, connecting and extending prior work. Furthermore, we show that most losses enjoy a data-dependent (by the mean operator) form of noise robustness, in contrast with known negative results.
AB - We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label-free, and can further be expressed by sums of the same loss. This holds true even for non-smooth, non-convex losses and in any RKHS. The first term is a (kernel) mean operator - the focal quantity of this work - which we characterize as the sufficient statistic for the labels. The result tightens known generalization bounds and sheds new light on their interpretation. Factorization has a direct application to weakly supervised learning. In particular, we demonstrate that algorithms like SGD and proximal methods can be adapted with minimal effort to handle weak supervision, once the mean operator has been estimated. We apply this idea to learning with asymmetric noisy labels, connecting and extending prior work. Furthermore, we show that most losses enjoy a data-dependent (by the mean operator) form of noise robustness, in contrast with known negative results.
UR - http://www.scopus.com/inward/record.url?scp=84998567448&partnerID=8YFLogxK
M3 - Conference contribution
T3 - 33rd International Conference on Machine Learning, ICML 2016
SP - 1102
EP - 1126
BT - 33rd International Conference on Machine Learning, ICML 2016
A2 - Balcan, Maria Florina
A2 - Weinberger, Kilian Q.
PB - International Machine Learning Society (IMLS)
T2 - 33rd International Conference on Machine Learning, ICML 2016
Y2 - 19 June 2016 through 24 June 2016
ER -