Curiously fast convergence of some stochastic gradient descent algorithms

This document reports an empirical finding about the convergence speed of stochastic gradient algorithms applied to finite training set and challenges the attendees of SLDS 2009 with their theoretical implications. The stochastic gradient optimization theory states that the residual error decreases no faster than t-1 when the examples are picked randomly with replacement. The main observation is that picking the examples with replacement (i.e. shuffling the data before each epoch and going sequentially) achieves a speed curiously close to t-2. To my knowledge (as of 2014), the convergence speed for SGD sampled without replacement has not been theoretically established. Although the open problems were not published in the SLDS 2009 conference book, this finding was mentioned and replicated in the journal version of the Pegasos paper (2011, section 7.5). This open problem was also mentioned by Ben Recht during the IPAM Workshop on Stochastic Gradient Methods (2014) in relation with the much more impressive speeds achieved by recent algorithms for the optimization of finite sums (SAG, SVRG.)

<html><font color=red></html> New (Dec 2015): <html>&nbsp;</font></html> Asu Ozdaglar gave a proof during her NIPS 2015 invited talk. (video recording ) (ArXiV)


Léon Bottou: Curiously fast convergence of some stochastic gradient descent algorithms, Unpublished open problem offered to the attendance of the SLDS 2009 conference, 2009.

slds-2009.djvu slds-2009.pdf slds-2009.ps.gz

@unpublished{bottou-slds-open-problem-2009,
  author = {Bottou, L\'{e}on},
  title = {Curiously fast convergence of some stochastic gradient descent algorithms},
  note = {Unpublished open problem offered to the attendance of the SLDS 2009 conference},
  year = {2009},
  url = {http://leon.bottou.org/papers/bottou-slds-open-problem-2009},
}