During the first years of my thesis, my main topic
was the construction of speech recognition systems using neural networks.
Kevin Lang and Geoff Hinton had published a tech report describing Time-Delay Neural Networks (TDNN).
Alex Waibel and his team then demonstrated their effectiveness at discriminating
the Japanese phonemes /b/, /d/, and /g/. But their approach was very costly: training took weeks
on their Alliant supercomputer.
Using Stochastic Gradient Descent, I proposed a new and computationally efficient variant of Time-Delay Neural Networks. I was able to run speaker-independent word recognition systems on a regular workstation instead of a supercomputer. This was later extended to continuous speech recognition by combining a time-delay neural network with a Viterbi decoder. The combination was trained globally using a discriminant algorithm.
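For readers unfamiliar with the architecture, the sketch below illustrates the core idea of a time-delay layer: each output frame is computed from a small window of delayed input frames, with the same weights applied at every time step. This weight sharing keeps the parameter count small, which is what makes such a network cheap enough to train with stochastic gradient descent on a workstation. This is a minimal illustrative forward pass in NumPy, not the original implementation; the window size, frame dimensions, and tanh nonlinearity are arbitrary choices for the example.

```python
import numpy as np

def tdnn_layer(frames, weights, bias, window=3):
    """Forward pass of one time-delay layer.

    frames : (T, d_in) array of input feature frames (e.g. spectral vectors).
    weights: (window * d_in, d_out) weights shared across all time steps.
    bias   : (d_out,) bias vector.
    Returns a (T - window + 1, d_out) array of output frames.
    """
    T, d_in = frames.shape
    outputs = []
    for t in range(T - window + 1):
        # Concatenate the current frame with its delayed neighbours,
        # then apply the same weights at every position (weight sharing).
        x = frames[t:t + window].reshape(-1)
        outputs.append(np.tanh(x @ weights + bias))
    return np.stack(outputs)

# Illustrative sizes only: 40 input frames of 16 coefficients,
# a window of 3 delays, and 8 hidden units per output frame.
rng = np.random.default_rng(0)
frames = rng.standard_normal((40, 16))
weights = 0.1 * rng.standard_normal((3 * 16, 8))
bias = np.zeros(8)
hidden = tdnn_layer(frames, weights, bias, window=3)
print(hidden.shape)  # (38, 8)
```

Stacking such layers, so that higher layers see progressively longer temporal contexts, gives the time-delay architecture; the actual systems also involved a Viterbi decoder and global discriminant training, which this small sketch does not attempt to show.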
This work led to a long collaboration with Yann LeCun
on convolutional networks applied to a broad variety of problems in image recognition
and signal processing, using increasingly sophisticated
Structured Learning techniques. See also Yann's pages about
convolutional networks.