DJVU(1)                          DjVuLibre-3.5                         DJVU(1)

NAME

       DjVu - DjVu and DjVuLibre.

INTRODUCTION

       Although the Internet has given us a worldwide infrastructure on which
       to build the universal library, much of the world's knowledge, history,
       and literature is still trapped on paper in the basements of the
       world's traditional libraries. Many libraries and content owners are in
       the process of digitizing their collections. While many such efforts
       involve the painstaking process of converting paper documents to a
       computer-friendly form, such as SGML-based formats, the high cost of
       such conversions limits their extent. Scanning documents and
       distributing the resulting images electronically is not only
       considerably cheaper, but also more faithful to the original document
       because it preserves its visual aspect.

       Despite the quickly improving speed of network connections and
       computers, the number of scanned document images accessible on the Web
       today is relatively small. There are several reasons for this.

       The first reason is the relatively high cost of scanning anything other
       than unbound sheets in black and white. This problem is slowly going
       away with the appearance of fast and low-cost color scanners with sheet
       feeders.

       The second reason is that long-established image compression standards
       and file formats have proved inadequate for distributing scanned
       documents at high resolution, particularly color documents. Not only
       are the file sizes and download times impractical, the decoding and
       rendering times are also prohibitive. A typical magazine page scanned
       in color at 100 dpi in JPEG would occupy 100 KB to 200 KB, but the text
       would be hardly readable: insufficient for screen viewing and totally
       unacceptable for printing. The same page at 300 dpi would have
       sufficient quality for viewing and printing, but the file size would be
       300 KB to 1000 KB at best, which is impractical for remote access.
       Another major problem is that a fully decoded 300 dpi color image of a
       letter-size page occupies 24 MB of memory and easily causes disk
       swapping.
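
       As a rough check of the 24 MB figure, the following Python sketch
       computes the size of an uncompressed page image. It assumes US letter
       paper (8.5 by 11 inches) and 3 bytes per pixel (24-bit RGB); neither
       value is stated above, and the function name is only illustrative.

```python
def decoded_size_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Size in bytes of a fully decoded (uncompressed) raster image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Assumed page: US letter, 300 dpi, 24-bit color.
size = decoded_size_bytes(8.5, 11, 300)
print(f"{size} bytes = {size / 2**20:.1f} MiB")  # 25245000 bytes = 24.1 MiB
```

       Under these assumptions the result is about 24 MiB, in line with the
       figure quoted above.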

       The third reason is that digital documents are more than just a
       collection of individual page images. Pages in a scanned document have
       a natural serial order. Special provision must be made to ensure that
       flipping pages is instantaneous and effortless so as to maintain a good
       user experience. Even more important, most existing document formats
       force users to download the entire document before displaying a chosen
       page. However, users often want to jump to individual pages of the
       document without waiting for the entire document to download.
       Efficient browsing requires efficient random page access, fast
       sequential page flipping, and quick rendering. This can be achieved
       with a combination of advanced compression, pre-fetching, pre-decoding,
       caching, and progressive rendering. DjVu decomposes each page into
       multiple components (text, backgrounds, images, libraries of common
       shapes, and so on) that may be shared by several pages and downloaded
       on demand. All these requirements call for a very sophisticated but
       parsimonious control mechanism to handle on-demand downloading,
       pre-fetching, decoding, caching, and progressive rendering of the page
       images. What is being considered here is not just a document image
       compression technique, but a whole platform for document delivery.

       DjVu is an image compression technique, a document format, and a
       software platform for delivering document images over the Internet
       that fulfills the above requirements.

DJVU IMAGE COMPRESSION

       The DjVu image compression is based on three technologies:

   DjVuPhoto
       DjVuPhoto, also known as IW44, is a wavelet-based continuous-tone image
       compression technique with progressive decoding/rendering. It is best
       used for encoding photographic images in color or in shades of gray.
       Images are typically half the size of JPEG images for the same
       distortion.

   DjVuBitonal
       DjVuBitonal, also known as JB2, is a bitonal image compression
       technique that takes advantage of repetitions of nearly identical
       shapes on the page (such as characters) to efficiently compress text
       images. It is best used to compress black and white images
       representing text and simple drawings. A typical 300 dpi page in
       DjVuBitonal occupies 5 to 25 KB (3 to 8 times smaller than with
       TIFF-G4 or PDF).

   DjVuDocument
       DjVuDocument is a compression technique specifically designed for
       color digital document images containing both pictures and text, such
       as a page of a magazine. DjVuDocument represents images as separately
       compressed layers. The foreground layer is usually compressed with
       DjVuBitonal and contains the text and drawings. The background layer
       is usually compressed with DjVuPhoto and contains the background
       texture and the pictures at lower resolution.
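
       The layered representation can be pictured with a toy Python sketch.
       This is not the DjVu decoder: it assumes a bitonal mask, a single
       foreground color, and a background stored at an integer fraction of
       the page resolution. The function and parameter names are
       hypothetical.

```python
def compose(mask, fg_color, background, scale):
    """Rebuild a page from a full-resolution bitonal mask and a background
    stored at 1/scale of the page resolution (toy model, not real DjVu)."""
    page = []
    for y, mask_row in enumerate(mask):
        row = []
        for x, is_foreground in enumerate(mask_row):
            if is_foreground:
                row.append(fg_color)                          # text or drawing pixel
            else:
                row.append(background[y // scale][x // scale])  # upsampled background
        page.append(row)
    return page


WHITE, GRAY, BLACK = (255, 255, 255), (200, 200, 200), (0, 0, 0)
mask = [[0, 1, 1, 0],
        [0, 1, 0, 0]]
background = [[WHITE, GRAY]]        # 1 x 2 pixels, half the page resolution
page = compose(mask, BLACK, background, scale=2)
for row in page:
    print(row)
```

       The sharp foreground shapes keep the full page resolution, while the
       smooth background can be stored with far fewer pixels.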

DJVU DOCUMENT DELIVERY PLATFORM

       The DjVu technology is designed from the ground up to support the
       efficient delivery of digital documents over the Internet. It provides
       various ways to deal with multi-page documents, and various ways to
       enrich the content with hyper-links, meta-data, searchable text, etc.

   MIME types
       The DjVu format has an official MIME type of image/vnd.djvu, which is
       the preferred content type to be given by HTTP servers for DjVu files.
       Unofficial MIME types used historically are image/x.djvu and
       image/x-djvu, which may still be encountered. Ideally, clients should
       be configured to handle all three. (For web server configuration help,
       see http://www.djvuzone.org/support/tutorial/chapter-authoring1.html.)
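
       A client-side check that accepts all three types might look like the
       Python sketch below. The helper name and the file extensions (.djvu
       and .djv) are assumptions made for illustration; only the three MIME
       types come from the text above.

```python
import mimetypes

# Official type first, then the two historical unofficial types.
DJVU_MIME_TYPES = ("image/vnd.djvu", "image/x.djvu", "image/x-djvu")


def is_djvu_content_type(header_value):
    """Return True if a Content-Type header value names a DjVu type."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in DJVU_MIME_TYPES


# Register the official type for the assumed file extensions.
mimetypes.add_type("image/vnd.djvu", ".djvu")
mimetypes.add_type("image/vnd.djvu", ".djv")

print(is_djvu_content_type("image/x-djvu"))
print(mimetypes.guess_type("book.djvu")[0])
```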

   Bundled multi-page documents
       A bundled multi-page DjVu document uses a single file to represent the
       entire document. This single file contains all the pages as well as
       ancillary information (e.g. the page directory, data shared by several
       pages, thumbnails, etc.). Using a single file is very convenient for
       storing documents or for sending email attachments.

       When you type the URL of a multi-page document, the DjVu browser plugin
       starts downloading the whole file, but displays the first page as soon
       as it is available. You can immediately navigate to other pages using
       the DjVu toolbar. Suppose however that the document is stored on a
       remote web server. You can easily access the first page and see that
       this is not the document you wanted. Although you will never display
       the other pages, the browser is transferring data for these pages and
       is wasting the bandwidth of your server (and the bandwidth of the
       Internet too). You could also see the summary of the document on the
       first page and jump to page 100. But page 100 cannot be displayed until
       data for pages 1 to 99 has been received. You may have to wait for the
       transmission of unnecessary page data. This second problem (the
       unnecessary wait) can be solved using the "byte serving" option of the
       HTTP/1.1 protocol. This option has to be supported by the web server,
       the proxies, the caches and the browser. Byte serving however does not
       solve the first problem (the waste of bandwidth).
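
       Byte serving corresponds to HTTP range requests. The Python sketch
       below builds, without sending, a request for one byte range. The URL
       and the byte offsets are hypothetical; a real client would first have
       to learn where the data for a page lies within the bundled file.

```python
import urllib.request


def range_request(url, first_byte, last_byte):
    """Build (but do not send) a request for an inclusive byte range."""
    request = urllib.request.Request(url)
    request.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return request


# Hypothetical location of the data for one page inside a bundled document.
req = range_request("http://example.org/book.djvu", 1_000_000, 1_049_999)
print(req.get_header("Range"))  # bytes=1000000-1049999
```

       A server that supports byte serving answers such a request with
       status 206 (Partial Content) and only the requested bytes.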

   Indirect multi-page documents
       Indirect multi-page DjVu documents solve both problems. An indirect
       multi-page DjVu document is composed of several files. The main file
       is named the index file. You can browse a document using the URL of
       the index file, just as you do with a bundled multi-page document.
       The index file however is very small. It simply contains the document
       directory and the URLs of secondary files containing the page data.
       When you browse an indirect multi-page document, the browser only
       accesses data for the pages you are viewing. This can be done at a
       reasonable speed because the browser maintains a cache of pages and
       sometimes pre-fetches a few pages ahead of the current page. This
       model uses the web serving bandwidth much more effectively. It also
       eliminates unnecessary delays when jumping ahead to pages located
       anywhere in a long document.
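
       The caching and pre-fetching behavior described above can be sketched
       in Python as follows. This is a simplified model, not the viewer's
       actual code: the class, the fetch_page callable, and the default cache
       size and lookahead are all assumptions, and pre-fetching is done
       synchronously for brevity.

```python
from collections import OrderedDict


class PageCache:
    """Least-recently-used page cache that pre-fetches a few pages ahead."""

    def __init__(self, fetch_page, page_count, capacity=8, lookahead=2):
        self.fetch_page = fetch_page    # hypothetical: page number -> page data
        self.page_count = page_count
        self.capacity = capacity
        self.lookahead = lookahead
        self.pages = OrderedDict()      # least recently used entry first

    def _load(self, number):
        if number in self.pages:
            self.pages.move_to_end(number)          # mark as recently used
        else:
            self.pages[number] = self.fetch_page(number)
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)      # evict the oldest page
        return self.pages[number]

    def view(self, number):
        data = self._load(number)
        last = min(number + self.lookahead, self.page_count)
        for ahead in range(number + 1, last + 1):   # pre-fetch following pages
            self._load(ahead)
        return data


fetched = []

def fetch(number):
    fetched.append(number)              # record which pages were downloaded
    return f"page {number}"

cache = PageCache(fetch, page_count=200)
cache.view(100)
cache.view(101)
print(fetched)  # [100, 101, 102, 103]
```

       Jumping to page 100 downloads only that page and the next two; pages
       1 to 99 are never transferred.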

   Annotations
       Every DjVu image optionally includes so-called annotation chunks. The
       annotation chunk is often used to define hyper-links to other document
       pages or to arbitrary web pages. Annotation chunks can also be used
       for other purposes such as setting the initial viewing mode of a page,
       defining highlighted zones, or storing arbitrary meta-data about the
       page or the document.

   Hidden text
       Every DjVu image optionally includes a hidden text layer that
       associates graphical features with the corresponding text. The hidden
       text layer is usually generated by running Optical Character
       Recognition (OCR) software. This textual information makes it possible
       to index DjVu documents and to copy and paste text from DjVu page
       images.

   Thumbnails
       DjVu documents sometimes contain pre-computed page thumbnails.

   Outline
       DjVu documents sometimes contain a navigation chunk containing an
       outline, that is, a hierarchical table of contents with pointers to
       the corresponding document pages.

DJVUZONE AND DJVULIBRE

       The DjVu technology was initially created by a few researchers at AT&T
       Labs between 1995 and 1999. Lizardtech, Inc.
       (http://www.lizardtech.com) then obtained a commercial license from
       AT&T and continued the development. They now have a variety of
       solutions for producing and distributing documents using the DjVu
       technology.

       The DjVuZone web site (http://www.djvuzone.org) is managed by the few
       AT&T Labs researchers who created the DjVu technology in the first
       place. We promote the DjVu technology by providing an independent
       source of information about DjVu.

       Understanding how little room there is for a proprietary document
       format, Lizardtech released the DjVu Reference Library under the GNU
       General Public License in December 2000. This library entirely defines
       the compression format and the elementary codecs. Six months later,
       Lizardtech released an updated DjVu Reference Library as well as the
       source code of the Unix viewer.

       These two releases form the basis of our initial DjVuLibre software.
       We modified the build system to comply with the expectations of the
       open source community. Various bugs and portability issues have been
       fixed. We also tried to make it simpler to use and install, while
       preserving the essential structure of the Lizardtech releases.

       The DjVuLibre software contains the following components:

       bzz(1) A general-purpose compression command line program. Many
              internal DjVu data structures are compressed using this
              technique.

       c44(1) A DjVuPhoto command line encoder. This state-of-the-art wavelet
              compressor produces DjVuPhoto images from PPM or JPEG images.

       cjb2(1)
              A DjVuBitonal command line encoder. This soft-pattern-matching
              compressor produces DjVuBitonal images from PBM images. It can
              encode images without loss, or introduce small changes in order
              to improve the compression ratio. The lossless encoding mode is
              competitive with that of the Lizardtech commercial encoders.

       cpaldjvu(1)
              A DjVuDocument command line encoder for images with few colors.
              This encoder is well suited to compressing images with a small
              number of distinct colors (e.g. screen-shots). The dominant
              color is encoded by the background layer. The other colors are
              encoded by the foreground layer.

       csepdjvu(1)
              A DjVuDocument command line encoder for separated images. This
              encoder takes a file containing pre-segmented foreground and
              background images and produces a DjVuDocument image.

       ddjvu(1)
              A command line decoder for DjVu images. This program produces a
              PNM image representing any segment of any page of a DjVu
              document at any resolution.

       djview(1)
              A stand-alone viewer for DjVu images. This sophisticated viewer
              displays DjVu documents. It implements document navigation as
              well as fast zooming and panning.

       nsdejavu(1)
              A web browser plugin for viewing DjVu images. This small plugin
              allows for viewing DjVu documents from web browsers. It
              internally uses djview to perform the actual work.

       djvups(1)
              A command line tool for converting DjVu documents into
              PostScript.

       djvm(1)
              A command line tool for manipulating bundled multi-page DjVu
              documents. This program is often used to collect individual
              pages and produce a bundled document.

       djvmcvt(1)
              A command line tool for converting bundled documents to indirect
              documents and conversely.

       djvused(1)
              A powerful command line tool for manipulating multi-page
              documents, creating or editing annotation chunks, creating or
              editing hidden text layers, pre-computing thumbnail images, and
              more.

       djvutxt(1)
              A command line tool to extract the hidden text from DjVu
              documents.

       djvudump(1)
              A command line tool for inspecting DjVu files and displaying
              their internal structure.

       djvuextract(1)
              A command line tool for disassembling DjVu image files.

       djvumake(1)
              A command line tool for assembling DjVu image files.

       djvuserve(1)
              A CGI program for generating indirect multi-page DjVu documents
              on the fly.

       djvutoxml(1), djvuxmlparser(1)
              Command line tools to edit DjVu metadata as XML files.
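
       As an illustration of how these tools combine, the Python sketch
       below builds the command lines for bundling pages with djvm and for
       converting the result to an indirect document with djvmcvt. The
       helper names and file names are hypothetical, and the option letters
       (-c and -i) should be checked against djvm(1) and djvmcvt(1) for the
       installed version. The sketch only prints the commands.

```python
import shutil
import subprocess


def bundle_command(output, pages):
    """Command line for collecting single-page files into a bundled document."""
    return ["djvm", "-c", output, *pages]


def to_indirect_command(bundled, out_dir, index_name):
    """Command line for converting a bundled document to an indirect one."""
    return ["djvmcvt", "-i", bundled, out_dir, index_name]


def run_if_available(argv):
    """Run a command only if its program is installed (not called here)."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, check=True)


# Hypothetical file names.
commands = [
    bundle_command("book.djvu", ["p0001.djvu", "p0002.djvu"]),
    to_indirect_command("book.djvu", "book_pages", "index.djvu"),
]
for argv in commands:
    print(" ".join(argv))
```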

DJVU ENCODERS AND ANY2DJVU

       DjVuLibre comes with a variety of specialized encoders: c44(1) for
       photographic images, cjb2(1) for bitonal images, and cpaldjvu(1) for
       images with few distinct colors. Although these encoders perform well
       in their specialized domains, they cannot handle complex tasks
       involving segmentation and multi-page encoding.

       The Lizardtech commercial products (see
       http://www.lizardtech.com/solutions/document) can perform these
       complex encoding tasks.

       Another solution is provided by the compression server at
       http://any2djvu.djvuzone.org. This machine uses pre-Lizardtech
       prototype encoders from AT&T Labs and performs almost as well as the
       commercial Lizardtech encoders. Please note that the Any2DjVu
       compression server comes with no guarantee, that nothing is done to
       ensure that your documents will remain confidential, and that there is
       only one computer working for the whole planet.

CREDITS

       Numerous people have contributed to the DjVu source code during the
       last five years. Please submit a SourceForge bug report to update the
       following list.

          Yoshua Bengio, Léon Bottou, Chakradhar Chandaluri, Regis M. Chaplin,
          Ming Chen, Parag Deshmukh, Royce Edwards, Andrew Erofeev, Praveen
          Guduru, Patrick Haffner, Paul G. Howard, Orlando Keise, Yann Le Cun,
          Artem Mikheev, Florin Nicsa, Joseph M. Orost, Steven Pigeon, Bill
          Riemers, Patrice Simard, Jeffery Triggs, Luc Vincent, Pascal
          Vincent.

DjVuLibre-3.5                     10/11/2001                           DJVU(1)