TY - GEN
T1 - Fast on-line statistical learning on a GPGPU
AU - Xiao, Fang Zhou
AU - McCreath, Eric
AU - Webers, Christfried
PY - 2011
Y1 - 2011
N2 - On-line Machine Learning using Stochastic Gradient Descent is an inherently sequential computation. This makes it difficult to improve performance by simply employing parallel architectures. Langford et al. made a modification to the standard stochastic gradient descent approach that opens up the possibility of parallel computation. They also proved that their approach incurs no significant loss in accuracy, and they empirically demonstrated a speed gain for a pipelined architecture with a few processing units. In this paper we report on applying the Langford et al. approach on a General Purpose Graphics Processing Unit (GPGPU) with a large number of processing units. We accelerate learning by approximately 4.5 times compared to a standard single-threaded approach, with comparable accuracy. We also evaluate the GPU performance for the sequential variant of the algorithm, which has not previously been reported. Finally, we investigate how changes in the number of threads, the number of blocks, and the amount of delay affect overall performance and accuracy.
AB - On-line Machine Learning using Stochastic Gradient Descent is an inherently sequential computation. This makes it difficult to improve performance by simply employing parallel architectures. Langford et al. made a modification to the standard stochastic gradient descent approach that opens up the possibility of parallel computation. They also proved that their approach incurs no significant loss in accuracy, and they empirically demonstrated a speed gain for a pipelined architecture with a few processing units. In this paper we report on applying the Langford et al. approach on a General Purpose Graphics Processing Unit (GPGPU) with a large number of processing units. We accelerate learning by approximately 4.5 times compared to a standard single-threaded approach, with comparable accuracy. We also evaluate the GPU performance for the sequential variant of the algorithm, which has not previously been reported. Finally, we investigate how changes in the number of threads, the number of blocks, and the amount of delay affect overall performance and accuracy.
KW - Asynchronous optimisation
KW - GPGPU
KW - On-line learning
KW - Statistical machine learning
UR - http://www.scopus.com/inward/record.url?scp=84869074615&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9781920682989
T3 - Conferences in Research and Practice in Information Technology Series
SP - 35
EP - 46
BT - Parallel and Distributed Computing 2011 - Proceedings of the Ninth Australasian Symposium on Parallel and Distributed Computing, AusPDC 2011
T2 - 9th Australasian Symposium on Parallel and Distributed Computing, AusPDC 2011
Y2 - 17 January 2011 through 20 January 2011
ER -