====== DjVu: Scanned Documents on the Web ======
[[http://djvu.org|{{:research:djvuorg.png?300 }}]]
This page describes the design of DjVu,
a document compression and imaging technology
that allows the efficient online distribution of
high resolution scanned documents.
More information can be found on [[http://www.djvu.org|DjVu.org]].
You can also see the page describing [[:projects:djvulibre|DjVuLibre]],
a free implementation of the DjVu system.
Finally, you can install the [[http://djvu.org/resources|DjVu browser plugin]],
and view the DjVu slides [[http://leon.bottou.org/slides/djvu/djvuslides.djvu|(djvu)]][[http://leon.bottou.org/slides/djvu/djvuslides.pdf|(pdf)]].
===== Overview =====
Despite the exponential growth of the internet,
most human knowledge is still preserved on paper
in the form of books, magazines, and journals.
Scanning these documents offers a cost-effective
way to put them online.
Unfortunately, high-resolution color scans are too large
to be distributed efficiently over the web.
The main DjVu innovation is a document compression technique
that reduces these high-resolution images to a size comparable
to that of a typical web page.
For instance, DjVu needs 40KB to 60KB to represent a typical magazine
page scanned in color with a resolution of 300 dpi.
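To put these figures in perspective, the short computation below estimates the raw size of such a scan and the implied compression ratio. It is a back-of-the-envelope sketch under stated assumptions: a US-letter page (8.5 by 11 inches) stands in for a magazine page, pixels are stored as 24-bit RGB, and 1KB is taken as 1000 bytes.

```python
# Back-of-the-envelope estimate of the raw size of a color scan.
# Assumptions: US-letter page, 24-bit RGB pixels, 1KB = 1000 bytes.

DPI = 300
WIDTH_IN, HEIGHT_IN = 8.5, 11.0
BYTES_PER_PIXEL = 3  # 24-bit RGB

width_px = round(WIDTH_IN * DPI)    # 2550 pixels
height_px = round(HEIGHT_IN * DPI)  # 3300 pixels
raw_bytes = width_px * height_px * BYTES_PER_PIXEL


def ratio(compressed_bytes: int) -> float:
    """Compression ratio of the raw scan relative to a compressed size."""
    return raw_bytes / compressed_bytes


if __name__ == "__main__":
    print(f"raw scan: {raw_bytes / 1e6:.1f} MB")
    for kb in (40, 60):
        print(f"{kb}KB -> about {ratio(kb * 1000):.0f}:1")
```

Under these assumptions the uncompressed scan occupies about 25MB, so 40KB to 60KB per page corresponds to compression ratios of roughly 420:1 to 630:1.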
Such large compression ratios are possible because DjVu
understands many aspects of document images.
* The //segmentation// step separates a //foreground image// from a //background image//.
* The //foreground image// contains the text and the line art. High compression ratios are achieved by collecting the repeated characters into a //shape dictionary// and simply coding the positions of their occurrences.
* The //background image// contains the paper texture and the photographs. High compression ratios are achieved with a simple wavelet coder because this part of the image can be rendered at a lower resolution.
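The shape dictionary idea can be illustrated with a toy encoder. This is a simplified sketch, not the DjVu foreground coder: it assumes segmentation has already produced a list of glyph occurrences, given as hypothetical `(x, y, bitmap)` tuples with bitmaps written as rows of `#` (ink) and `.` (paper), and it only merges pixel-identical shapes. Each distinct shape is stored once, and every occurrence is reduced to a shape index and a position.

```python
from typing import Dict, List, Tuple

# Hypothetical glyph representation: a tuple of row strings, "#" = ink, "." = paper.
Bitmap = Tuple[str, ...]
Placement = Tuple[int, int, int]  # (shape_index, x, y)


def encode_foreground(
    glyphs: List[Tuple[int, int, Bitmap]]
) -> Tuple[List[Bitmap], List[Placement]]:
    """Split glyph occurrences into a shape dictionary and a list of placements.

    Simplification: only pixel-identical bitmaps share a dictionary entry.
    """
    index_of: Dict[Bitmap, int] = {}
    shapes: List[Bitmap] = []
    placements: List[Placement] = []
    for x, y, bitmap in glyphs:
        if bitmap not in index_of:
            index_of[bitmap] = len(shapes)  # first occurrence: new dictionary entry
            shapes.append(bitmap)
        placements.append((index_of[bitmap], x, y))
    return shapes, placements


def decode_foreground(
    shapes: List[Bitmap], placements: List[Placement], width: int, height: int
) -> List[str]:
    """Rebuild the foreground mask by stamping each shape at its position."""
    canvas = [["."] * width for _ in range(height)]
    for index, x, y in placements:
        for dy, row in enumerate(shapes[index]):
            for dx, pixel in enumerate(row):
                if pixel == "#":
                    canvas[y + dy][x + dx] = "#"
    return ["".join(row) for row in canvas]


if __name__ == "__main__":
    T = ("###", ".#.", ".#.")
    C = ("###", "#..", "###")
    page = [(0, 0, T), (4, 0, C), (8, 0, C), (12, 0, T)]
    shapes, placements = encode_foreground(page)
    print(len(shapes), "shapes for", len(placements), "glyphs")
    print("\n".join(decode_foreground(shapes, placements, 15, 3)))
```

On a real page the dictionary stays small because a few dozen character shapes account for thousands of occurrences, so most of the foreground cost goes into the compact list of positions.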
DjVu documents used to be viewed with a sophisticated [[http://www.djvuzone.org/download|browser plugin]]. Unfortunately, modern browsers no longer support such plugins, so you have to use a standalone program, such as [[https://djvu.sourceforge.net/djview4.html|DjView]], to view DjVu documents.
The DjVu viewer efficiently represents high resolution images
with //limited memory// and implements very efficient //zooming and panning//.
The viewer automatically downloads the next pages
in the background to make reading smoother.
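The sketch below illustrates background prefetching combined with a bounded page cache. It is not the DjVu viewer's code: `fetch_page` is a hypothetical callable that downloads one page, and the `lookahead` and `capacity` parameters are illustrative. When a page is requested, the next few pages are queued on a small thread pool, and the least recently used entries are evicted so that memory use stays bounded.

```python
from collections import OrderedDict
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class PagePrefetcher:
    """Serve pages on demand and download the next few pages in the background.

    `fetch_page` (page number -> bytes) is a hypothetical stand-in for the
    network request; `lookahead` and `capacity` are illustrative parameters.
    """

    def __init__(self, fetch_page: Callable[[int], bytes], page_count: int,
                 lookahead: int = 2, capacity: int = 8):
        self._fetch = fetch_page
        self._page_count = page_count
        self._lookahead = lookahead
        self._capacity = capacity  # at most this many pages are kept in memory
        self._cache: "OrderedDict[int, Future]" = OrderedDict()
        self._pool = ThreadPoolExecutor(max_workers=2)

    def _schedule(self, n: int) -> Future:
        """Return the download of page n, starting it if it is not cached."""
        if n in self._cache:
            self._cache.move_to_end(n)  # mark as most recently used
        else:
            self._cache[n] = self._pool.submit(self._fetch, n)
            while len(self._cache) > self._capacity:
                self._cache.popitem(last=False)  # evict least recently used
        return self._cache[n]

    def get(self, n: int) -> bytes:
        """Return page n, queuing the following pages before blocking on it."""
        future = self._schedule(n)
        for k in range(n + 1, min(n + 1 + self._lookahead, self._page_count)):
            self._schedule(k)
        return future.result()

    def close(self) -> None:
        """Wait for pending downloads and release the worker threads."""
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    viewer = PagePrefetcher(lambda n: b"page %d" % n, page_count=5)
    print(viewer.get(0))  # pages 1 and 2 are now downloading in the background
    viewer.close()
```

Queuing the following pages before waiting on the requested one lets their downloads overlap with the reading of the current page.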
DjVu documents can be enriched with //annotations// and //hyperlinks//.
They can also contain a //searchable text version// of the document.
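A searchable text layer lets a viewer map a text query back to regions of the scanned image. The toy search below assumes a flat list of words, each with a hypothetical pixel bounding box; the actual DjVu text layer is organized differently, so the `Word` type and the `find_phrase` function are illustrative only. Each match returns the boxes a viewer could highlight on the page image.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (xmin, ymin, xmax, ymax) in page pixels


@dataclass(frozen=True)
class Word:
    text: str
    box: Box


def find_phrase(words: List[Word], phrase: str) -> List[List[Box]]:
    """Return the word boxes of every case-insensitive occurrence of `phrase`."""
    target = phrase.lower().split()
    n = len(target)
    hits: List[List[Box]] = []
    if n == 0:
        return hits
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if [w.text.lower() for w in window] == target:
            hits.append([w.box for w in window])
    return hits


if __name__ == "__main__":
    page = [Word("High", (10, 10, 50, 30)), Word("Quality", (55, 10, 120, 30)),
            Word("Document", (10, 40, 90, 60))]
    print(find_phrase(page, "high quality"))
```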
===== Main Publications =====
Léon Bottou, Patrick Haffner, Paul G. Howard, Patrice Simard, Yoshua Bengio and Yann Le Cun: **High Quality Document Image Compression with DjVu**, //Journal of Electronic Imaging//, 7(3):410-425, 1998.
[[:papers:bottou-98|more...]]
Léon Bottou and Steven Pigeon: **Lossy Compression of Partially Masked Still Images**, //Proceedings of IEEE Data Compression Conference//, Snowbird, UT, April 1998.
[[:papers:bottou-pigeon-98|more...]]
Léon Bottou, Paul G. Howard and Yoshua Bengio: **The Z-Coder Adaptive Binary Coder**, //Proceedings IEEE Data Compression Conference 1998//, IEEE, Snowbird, April 1998.
[[:papers:bottou-howard-bengio-98|more...]]
Léon Bottou, Patrick Haffner and Yann Le Cun: **Conversion of Digital Documents to Multilayer Raster Formats**, //Proceedings of the Sixth International Conference on Document Analysis and Recognition//, 444-448, IEEE, Seattle, September 2001.
[[:papers:bottou-2001|more...]]
Yann Le Cun, Léon Bottou, Andrei Erofeev, Patrick Haffner and Bill W. Riemers: **DjVu document browsing with on-demand loading and rendering of image components**, //Internet Imaging//, San Jose, January 2001.
[[:papers:lecun-2001|more...]]
Patrick Haffner, Léon Bottou, Yann Le Cun and Luc Vincent: **A General Segmentation Scheme for DjVu Document Compression**, //Proceedings of the International Symposium on Mathematical Morphology (ISMM'02)//, CSIRO publications, Sydney, Australia, April 2002.
[[:papers:haffner-2002|more...]]