Communications of the ACM, June 2018, Vol. 61 No. 6, Pages 13-14
News: “Deep Learning Hunts for Signals Among the Noise”
By Chris Edwards
Over the past decade, advances in deep learning have transformed the fortunes of the artificial intelligence (AI) community. The neural network approach that researchers had largely written off by the end of the 1990s now seems likely to become the most widespread technology in machine learning. However, its proponents find it difficult to explain why deep learning often works well yet remains prone to seemingly bizarre failures.
The secret to deep learning’s success in avoiding the traps of poor local minima may lie in a decision taken primarily to reduce computation time. In principle, each time the backpropagation algorithm computes adjustments to the weights used by each neuron, it should analyze all of the training data. Instead, stochastic gradient descent (SGD) estimates those adjustments from a much smaller random sample of the data, which is far cheaper to compute. This simplification makes the process follow a noisier, more random path toward a minimum than full gradient descent does. One result seems to be that SGD can often skip over poor local minima.
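To make the contrast between full gradient descent and SGD concrete, here is a minimal sketch on a toy problem: fitting a straight line by minimizing mean squared error. The data generator, the function names (`make_data`, `gradient`, `full_gradient_descent`, `sgd`), the learning rate, and the batch size of 16 are all illustrative assumptions, not anything described in the article. The only difference between the two optimizers is which examples feed each gradient estimate.

```python
import random

def make_data(n=1000, w_true=3.0, b_true=-1.0, noise=0.1, seed=0):
    # Hypothetical toy dataset: y = w_true * x + b_true plus Gaussian noise.
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, w_true * x + b_true + rng.gauss(0.0, noise)) for x in xs]

def gradient(w, b, batch):
    # Gradient of the mean squared error over the given batch of (x, y) pairs.
    n = len(batch)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return gw, gb

def full_gradient_descent(data, lr=0.1, steps=200):
    # Every update looks at all of the training data.
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw, gb = gradient(w, b, data)
        w -= lr * gw
        b -= lr * gb
    return w, b

def sgd(data, lr=0.1, steps=200, batch_size=16, seed=1):
    # Every update looks only at a small random sample (a minibatch),
    # so each gradient is a cheaper but noisier estimate.
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        gw, gb = gradient(w, b, batch)
        w -= lr * gw
        b -= lr * gb
    return w, b

if __name__ == "__main__":
    data = make_data()
    print("full GD:", full_gradient_descent(data))
    print("SGD:    ", sgd(data))
```

With 1,000 examples and a batch size of 16, each SGD update touches about 1.6% of the data that a full-gradient update does. Both runs should land close to the true line (slope 3, intercept -1), but the SGD trajectory wanders because each step follows a noisy estimate of the gradient; on a non-convex loss, that noise is what may let it hop out of poor local minima.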
“We are looking for a minimum that is most tolerant to perturbation in parameters or inputs,” says Poggio. “I don’t know if SGD is the best we can do now, but I find [it] almost magical that it finds these degenerate solutions that work.”
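Tolerance to perturbation in parameters is often discussed in terms of “flat” versus “sharp” minima. The sketch below is a toy illustration under stated assumptions, not Poggio’s method: it uses a hypothetical one-dimensional loss with two equally deep minima, one narrow and one broad, and measures how much the loss rises on average when the parameter is jittered by Gaussian noise. The function names and the noise scale `sigma` are illustrative choices.

```python
import random

def loss(theta):
    # Hypothetical 1-D loss with two minima of equal depth (loss 0):
    # a sharp, narrow well at theta = -1 and a broad, flat well at theta = 2.
    sharp = 50.0 * (theta + 1.0) ** 2
    flat = 0.5 * (theta - 2.0) ** 2
    return min(sharp, flat)

def perturbation_sensitivity(theta, sigma=0.1, trials=1000, seed=0):
    # Average increase in loss when theta is perturbed by Gaussian noise
    # with standard deviation sigma; smaller means more tolerant.
    rng = random.Random(seed)
    base = loss(theta)
    total = 0.0
    for _ in range(trials):
        total += loss(theta + rng.gauss(0.0, sigma)) - base
    return total / trials

if __name__ == "__main__":
    for theta in (-1.0, 2.0):
        print(theta, loss(theta), perturbation_sensitivity(theta))
```

Both minima fit the training objective equally well, since each has zero loss, yet the same small perturbation raises the loss roughly a hundred times more in the sharp well than in the flat one. In this toy picture, the flat minimum is the kind of “tolerant” solution Poggio describes.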