More news
Borges and AI
Léon Bottou and Bernhard Schölkopf https://arxiv.org/abs/2310.01425
We started this work mid-2022. AI was already turning into a mainstream topic. Both as a scientist and as a member of society, I was troubled by the ambient confusion between the actual AI technology and the AI of our dreams or nightmares. We seem unable to grasp this technology and its impact without referring to an AI mythology that perhaps starts with Homer's golden maiden and was popularized by modern science fiction.
Therefore we decided to instead interpret the advances of AI using a very different lens: the fiction of Jorge Luis Borges, whose subtly ironical stories illuminate how language works and relates to reality. This intellectual exercise turned out to be very fruitful, and it has reframed our outlook on AI:
- It clarifies the relation between AI and language models, or fiction machines.
- It explains how humans perceive these technologies, searching for vindications that comfort our preconceptions, vainly attempting to purify the fiction machine, or trusting this modern Pythia over our own reason.
- It also explains how fiction machines should be seen as tools to construct theories for both real and imagined worlds. The ability to create fictional stories (so-called “hallucinations”) is crucially important. For instance, to understand a factual story, say a historical battle, we must be able to imagine how different circumstances or decisions would have changed the events. This gives a new meaning to Pat Winston's claim about the centrality of story making and story telling.
- And finally, it shows the importance of understanding the world through the right story. For instance, understanding the weather patterns through the mood of the Gods only went so far. Yet it took centuries to readjust.
From Causal Graphs to Causal Invariance
Pointing out the very well written report Causality for Machine Learning recently published by Cloudera's Fast Forward Labs. Nisha Muktewar and Chris Wallace must have put a lot of work into this. This report stands out because it has a complete section about Causal Invariance and neatly summarizes the purpose of our own Invariant Risk Minimization with beautiful experimental results.
NYC Data Science Seminar
Alex Peysakhovich and I represent Facebook on the organizing committee of the NYC Data Science Seminar Series. This rotating seminar organized by Columbia, CornellTech, Facebook, Microsoft Research NYC, and New York University has featured a number of prominent speakers.
Graph Transducer Networks explained
I was scavenging my old emails a couple weeks ago and found a copy of an early technical report that not only describes Graph Transformer Networks in a couple pages but also explains why they are defined the way they are.
The infinite MNIST dataset
Why settle for 60000 MNIST training examples when you can have one trillion? The MNIST8M dataset was generated using the elastic deformation code originally written for (Loosli, Canu, and Bottou, 2007). Unfortunately the original MNIST8M files were accidentally deleted from the NEC servers a couple weeks ago. Instead of regenerating the files, I have repackaged the generation code in a convenient form.
Explore/Exploit = Correlation/Causation!
Our paper “Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising” has appeared in JMLR. This paper takes the example of ad placement to illustrate how one can leverage causal inference to understand the behavior of complex learning systems interacting with their environment.
Nips 2013
Nips just took place near Lake Tahoe. Many people have written about how things are changing in machine learning. There were also many interesting papers and invited talks. Thanks to the program chairs Max and Zoubin for producing this exciting conference program. Thanks to the workshop chairs Rich Caruana and Gunnar Rätsch for the stimulating workshops. Thanks to Terry Sejnowski for creating NIPS, and special thanks to Mary-Ellen Perry without whom nothing would happen.
Counterfactual Reasoning and Learning Systems
The report “Counterfactual Reasoning and Learning Systems” shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and predict the consequences of changes to the system. Such predictions allow both humans and algorithms to select changes that improve both the short-term and long-term performance of such systems. This work is illustrated by experiments carried out on the ad placement system associated with the Bing search engine.
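To make the idea of predicting the consequences of a change a little more concrete, here is a minimal sketch of one standard counterfactual estimator: importance sampling over logged randomized decisions. It is an illustration only, not the code behind the Bing experiments. The log format, the function name `counterfactual_estimate`, and the toy policies are all assumptions invented for this example.

```python
def counterfactual_estimate(logs, new_policy_prob):
    """Estimate the average reward a different policy would have obtained.

    logs: list of (action, reward, logging_prob) tuples, where logging_prob
          is the probability with which the deployed (randomized) policy
          chose that action. This format is hypothetical.
    new_policy_prob: function mapping an action to the probability that the
          policy under study would have chosen it.
    """
    total = 0.0
    for action, reward, logging_prob in logs:
        # Reweight each logged outcome by how much more (or less) likely
        # the new policy is to take the logged action.
        weight = new_policy_prob(action) / logging_prob
        total += weight * reward
    return total / len(logs)


# Hypothetical toy log: the deployed policy picks "a" or "b" with equal
# probability; action "a" always pays 1, action "b" always pays 0.
toy_logs = [("a", 1.0, 0.5), ("b", 0.0, 0.5), ("a", 1.0, 0.5), ("b", 0.0, 0.5)]

# Policy under study: would pick "a" 80% of the time.
new_probs = {"a": 0.8, "b": 0.2}

estimate = counterfactual_estimate(toy_logs, new_probs.get)
print(estimate)
```

On this toy log the estimate matches the reward the new policy would earn if it were actually deployed, without ever deploying it. The estimator only works when the logging policy gives nonzero probability to every action the new policy might take.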
SGD-2.0 released
Announcing version 2.0 of my Stochastic Gradient Descent package. This release provides implementations of the Stochastic Gradient Descent and Averaged Stochastic Gradient Descent algorithms for Linear SVMs and CRFs. The latter sometimes shows vastly superior performance. See the SGD package pages for details.
Natural Language Processing from Scratch
Ronan's masterpiece, "Natural Language Processing (Almost) from Scratch", has been published in JMLR. This paper describes how to use a unified neural network architecture to solve a collection of natural language processing tasks with near state-of-the-art accuracies and ridiculously fast processing speed. A couple thousand lines of C code process English sentences at more than 10000 words per second and output part-of-speech tags, named entity tags, chunk boundaries, semantic role labeling tags, and, in the latest version, syntactic parse trees. Download SENNA!
Learning Semantics
Learning Semantics, Nips 2011 Workshop, Saturday December 17, 2011. Melia Sierra Nevada & Melia Sol y Nieve, Sierra Nevada, Spain.
This workshop is organized in collaboration with Antoine Bordes, Jason Weston, and Ronan Collobert. This event should be very interesting: I believe that recent machine learning advances indicate new connections between machine learning and machine reasoning and lead to new opportunities for learning the semantics of the world.
From machine learning to machine reasoning
Over the last couple of years, I progressively formulated an unusual idea about the connection between machine learning and machine reasoning. I have discussed this idea with many friends and I even gave a seminar in Montreal in 2008. It is described in this technical report.
On the Vapnik-Chervonenkis-Sauer lemma
Many machine learning authors write that a certain fundamental combinatorial result was independently established by Vapnik and Chervonenkis (1971), Sauer (1972), Shelah (1972), and sometimes Perles and Shelah (reference unknown). Vapnik and Chervonenkis published a version of their results in the Proceedings of the USSR Academy of Sciences four years earlier in 1968. It also appears that Sauer and Shelah pursued this result for very different purposes.
Microsoft
Patrice Simard and I have been friends since the old AT&T Bell Labs times. He eventually convinced me to work for him at Microsoft. He told me to expect “interesting times”.
I can see several reasons for these interesting times.
- The scientific point of view. There are few places where I can find machine learning problems with similar scale, similar challenges, and similar impact. This practical experience will surely feed my future machine learning research. In fact I believe that such experiences are necessary to do research. One needs to see the world…
- The social point of view. The Internet is the largest encyclopedia of knowledge ever known to mankind, and this is great. On the other hand, everything you do on the Internet is recorded by someone somewhere. Large online services such as Google or Microsoft concentrate unprecedented amounts of such information. Our society is not ready for that. Very good things or very bad things can happen equally easily. They will affect all of us. We cannot just watch and count the points.
- The competitive point of view. Microsoft combines a difficult competitive position with considerable resources: it has both the will and the means to do new things on the scientific, engineering, economic, and social levels. How could I resist that? Of course nothing is ever certain…
Cos424
Rob Schapire and David Blei gave me the opportunity to teach the cos424 course at Princeton University for the spring 2010 semester. In fact Rob is on sabbatical leave at Yahoo! and David is parenting. Running the orphan course was a useful experience. One thousand slides later, I am really eager to see the student projects…
Semantic Extraction with a Neural Network Architecture
It is the nineties again. Ronan Collobert from NEC Labs just released a noncommercial version of his neural network system for semantic extraction. Given an input sentence in plain English, Senna outputs a host of Natural Language Processing (NLP) tags: part-of-speech (POS) tags, chunking (CHK), named entity recognition (NER), and semantic role labeling (SRL). Senna does this with state-of-the-art accuracies, roughly two hundred times faster than competing approaches.
The Senna source code represents about 2000 lines of C. This is probably one thousand times smaller than your usual natural language processing program. In fact all the Senna tagging tasks are performed using the same neural network simulation code.
Download Senna here. A Senna paper has been submitted to JMLR.
SGDQN
The SGDQN paper has been published on the JMLR site. This variant of stochastic gradient got very good results during the first PASCAL Large Scale Learning Challenge. The paper gives a lot of explanation on the design of the algorithm. Source code is available from Antoine's web site.
ICML 2009
ICML 2009 took place in June. Michael Littman and I were the program co-chairs. Since we were expecting a lot of work, we tried to make it interesting by experimenting with a number of changes in the review process. Read more for a little explanation and a few conclusions…
OLaRank Implementation Released
Antoine Bordes provides an implementation of the OLaRank algorithm.
OLaRank is an online solver of the dual formulation of support vector machines for structured output spaces. The algorithm can use exact or greedy inference. Its running time scales linearly with the data size, competitive with a perceptron based on the same inference procedure. Its accuracy however is much better as it replicates the accuracy of a structured SVM. See the ECML/PKDD paper "Sequence Labelling SVMs Trained in One Pass" for details.
LaRank Implementation Released
Antoine Bordes provides an implementation of the LaRank algorithm, together with the datasets. This new implementation runs slightly faster than the code we have used for the LaRank paper. In addition there is a special version for the case of linear kernels.
NIPS 2007: Learning with Large Datasets
A page has been allocated for my segment of the NIPS 2007 Tutorials. The second part of the tutorial Learning with Large Datasets was given by Alex Gray. Alex had to replace Andrew Moore on short notice because airplane delays conspired against our initial plans. The page contains the slides and a video recording of the lecture I gave at Microsoft Research a few days after NIPS.
Blavatnik Award
During the 4th Annual Gala of the New York Academy of Sciences, I became one of the happy winners of the first Blavatnik Award for Young Scientists. The other finalists were very impressive. Choosing the winners must have been difficult. Leonard Blavatnik told me he attended the Nobel ceremony a few years ago and thought that something similar should be done in New York for younger scientists. Apparently he plans to fund a similar award every year.
Talks online
The talks page contains pointers to my most significant lectures. Slides are available in both PDF and DjVu formats.
Stochastic Gradient for SVM and CRF
You can now download fast stochastic gradient optimizers for linear Support Vector Machines (SVMs) and Conditional Random Fields (CRFs). Stochastic Gradient Descent has been historically associated with back-propagation algorithms in multilayer neural networks. These nonlinear nonconvex problems can be very difficult. Therefore it is useful to see how Stochastic Gradient Descent performs on simple linear and convex problems. The benchmarks are very clear!
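For readers who want to see the shape of the algorithm, here is a minimal sketch of stochastic gradient descent on the regularized hinge loss of a linear SVM. It is not the code of the downloadable package. The function names, the hyperparameter values, the decreasing learning rate schedule, and the toy dataset are assumptions chosen for illustration.

```python
import random


def train_linear_svm_sgd(examples, lam=0.01, eta0=0.1, epochs=20, seed=0):
    """Minimize lam/2 * |w|^2 + average hinge loss, one example at a time.

    examples: list of (x, y) pairs with x a tuple of floats and y in {-1, +1}.
    lam, eta0, epochs: illustrative hyperparameters (assumed values).
    """
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    b = 0.0
    t = 0
    order = list(range(len(examples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in random order
        for i in order:
            x, y = examples[i]
            t += 1
            # Decreasing learning rate (an assumed, commonly used schedule).
            eta = eta0 / (1.0 + lam * eta0 * t)
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Gradient step on the regularization term: shrink the weights.
            w = [wi * (1.0 - eta * lam) for wi in w]
            # Gradient step on the hinge loss, active only inside the margin.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b


def predict(w, b, x):
    """Return the predicted label, +1 or -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data.
toy_data = [((2.0, 2.0), 1), ((3.0, 1.0), 1), ((2.0, 3.0), 1),
            ((-2.0, -2.0), -1), ((-1.0, -3.0), -1), ((-3.0, -2.0), -1)]

w, b = train_linear_svm_sgd(toy_data)
predictions = [predict(w, b, x) for x, _ in toy_data]
print(predictions)
```

Each update touches a single example, so the cost of one pass grows linearly with the number of examples. That is what makes the approach attractive on large datasets.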
Large-Scale Kernel Machines
MIT Press has announced the availability of the book Large-Scale Kernel Machines, edited by Léon Bottou, Olivier Chapelle, Dennis DeCoste, and Jason Weston. This book expands the theme of our NIPS 2005 workshop. The book homepage contains useful information. You can even find the complete BibTeX file that was used to generate the list of references.
Publication database updated
Thanks to a small Lush script to parse BibTeX files, all my publications are now indexed here. Most of them are available online. I still need to scan the oldest ones. My little BibTeX parser now lives in the Lush CVS repository.
Old website offline
Browsing http://leon.bottou.com now redirects you to this new website. There is still much work to be done.
Web site rewrite
Started to rewrite my web site using Dokuwiki.
The old home page can be found at http://leon.bottou.com.
The list of publications is still at http://leon.bottou.com/publications.