Abstract
We propose and analyse a populational version of stepwise gradient descent suitable for a wide range of learning problems. The algorithm is motivated by genetic algorithms, which update a population of solutions rather than just a single representative, as is typical of gradient descent. This modification of traditional gradient descent (as used, for example, in the backpropagation algorithm) avoids getting trapped in local minima. We use an averaging analysis of the algorithm to relate its behaviour to an associated ordinary differential equation. We derive a result on how long one must wait so that, with a given high probability, the algorithm lies within a given neighbourhood of the global minimum. We also analyse the effect of different population sizes. An example is presented which corroborates our theory very well.
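For intuition, a minimal sketch of the idea in Python follows. It is an assumption-laden illustration, not the paper's exact algorithm: the objective `f`, the `1/k` gain schedule, and the Gaussian perturbation are all invented here for demonstration. A population of candidate solutions is driven by decreasing-gain, noise-perturbed gradient steps, and the best member of the final population is reported; it is iterations of this stochastic-approximation type whose small-gain behaviour the paper's averaging analysis relates to an associated ordinary differential equation.

```python
import numpy as np

# Illustrative multimodal objective (an assumption of this sketch, not from
# the paper): a shallow quadratic plus a sinusoid, giving several local
# minima but a unique global minimum.
def f(x):
    return 0.1 * (x - 2.0) ** 2 + np.sin(2.0 * x)

def grad_f(x, h=1e-5):
    # Central-difference approximation of f'(x); works elementwise on arrays.
    return (f(x + h) - f(x - h)) / (2.0 * h)

def population_gradient_descent(pop_size=20, n_steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    # Maintain a whole population of candidate solutions, not one iterate.
    pop = rng.uniform(-4.0, 4.0, size=pop_size)
    for k in range(1, n_steps + 1):
        gain = 1.0 / k  # decreasing step size (assumed schedule)
        noise = rng.normal(0.0, 1.0, size=pop_size) / np.sqrt(k)
        # Perturbed gradient step for every member; the perturbation is what
        # lets individual members escape local minima.
        pop = pop - gain * grad_f(pop) + gain * noise
    # Report the best member of the final population.
    return pop[np.argmin(f(pop))]

if __name__ == "__main__":
    # With enough members, some start in the global basin, so the best
    # final member should land near the global minimum (around x ~ 2.3).
    print(population_gradient_descent())
```

The design point the sketch tries to convey is that the population, rather than any single trajectory, is the object of analysis: one member may stall in a local minimum, but with high probability at least one member ends up near the global minimum, which is the flavour of the waiting-time result described in the abstract.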
| Original language | English |
| --- | --- |
| Pages (from-to) | 331-363 |
| Number of pages | 33 |
| Journal | Mathematics of Control, Signals, and Systems |
| Volume | 10 |
| Issue number | 4 |
| DOIs | |
| Publication status | Published - 1997 |