TY - GEN

T1 - Fast on-line statistical learning on a GPGPU

AU - Xiao, Fang Zhou

AU - McCreath, Eric

AU - Webers, Christfried

PY - 2011

Y1 - 2011

N2 - On-line Machine Learning using Stochastic Gradient Descent is an inherently sequential computation. This makes it difficult to improve performance by simply employing parallel architectures. Langford et al. made a modification to the standard stochastic gradient descent approach which opens up the possibility of parallel computation. They also proved that there is no significant loss in accuracy in their approach. They empirically demonstrated the performance gain in speed for the case of a pipelined architecture with a few processing units. In this paper we report on applying the Langford et al. approach on a General Purpose Graphics Processing Unit (GPGPU) with a large number of processing units. We accelerate the learning speed by approximately 4.5 times compared to a standard single-threaded approach with comparable accuracy. We also evaluate the GPU performance for the sequential variant of the algorithm, which has not previously been reported. Finally, we investigate how changes in the number of threads, number of blocks, and amount of delay affect the overall performance and accuracy.

AB - On-line Machine Learning using Stochastic Gradient Descent is an inherently sequential computation. This makes it difficult to improve performance by simply employing parallel architectures. Langford et al. made a modification to the standard stochastic gradient descent approach which opens up the possibility of parallel computation. They also proved that there is no significant loss in accuracy in their approach. They empirically demonstrated the performance gain in speed for the case of a pipelined architecture with a few processing units. In this paper we report on applying the Langford et al. approach on a General Purpose Graphics Processing Unit (GPGPU) with a large number of processing units. We accelerate the learning speed by approximately 4.5 times compared to a standard single-threaded approach with comparable accuracy. We also evaluate the GPU performance for the sequential variant of the algorithm, which has not previously been reported. Finally, we investigate how changes in the number of threads, number of blocks, and amount of delay affect the overall performance and accuracy.

KW - Asynchronous optimisation

KW - GPGPU

KW - On-line learning

KW - Statistical machine learning

UR - http://www.scopus.com/inward/record.url?scp=84869074615&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9781920682989

T3 - Conferences in Research and Practice in Information Technology Series

SP - 35

EP - 46

BT - Parallel and Distributed Computing 2011 - Proceedings of the Ninth Australasian Symposium on Parallel and Distributed Computing, AusPDC 2011

T2 - 9th Australasian Symposium on Parallel and Distributed Computing, AusPDC 2011

Y2 - 17 January 2011 through 20 January 2011

ER -