TY - GEN
T1 - Making deep neural networks robust to label noise: A loss correction approach
T2 - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
AU - Patrini, Giorgio
AU - Rozza, Alessandro
AU - Menon, Aditya Krishna
AU - Nock, Richard
AU - Qu, Lizhen
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/11/6
Y1 - 2017/11/6
N2 - We present a theoretically grounded approach to train deep neural networks, including recurrent networks, subject to class-dependent label noise. We propose two procedures for loss correction that are agnostic to both application domain and network architecture. They simply amount to at most a matrix inversion and multiplication, provided that we know the probability of each class being corrupted into another. We further show how one can estimate these probabilities, adapting a recent technique for noise estimation to the multi-class setting, and thus providing an end-to-end framework. Extensive experiments on MNIST, IMDB, CIFAR-10, CIFAR-100 and a large-scale dataset of clothing images employing a diversity of architectures - stacking dense, convolutional, pooling, dropout, batch normalization, word embedding, LSTM and residual layers - demonstrate the noise robustness of our proposals. Incidentally, we also prove that, when ReLU is the only non-linearity, the loss curvature is immune to class-dependent label noise.
UR - http://www.scopus.com/inward/record.url?scp=85042632149&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2017.240
DO - 10.1109/CVPR.2017.240
M3 - Conference contribution
T3 - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
SP - 2233
EP - 2241
BT - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 21 July 2017 through 26 July 2017
ER -