DJVU(1)                          DjVuLibre-3.5                         DJVU(1)

NAME

       DjVu - DjVu and DjVuLibre.

INTRODUCTION

       Although the Internet has given us a worldwide infrastructure on which
       to build the universal library, much of the world's knowledge, history,
       and literature is still trapped on paper in the basements of the
       world's traditional libraries. Many libraries and content owners are in
       the process of digitizing their collections. While many such efforts
       involve the painstaking process of converting paper documents to a
       computer-friendly form, such as SGML-based formats, the high cost of
       such conversions limits their extent. Scanning documents and
       distributing the resulting images electronically is not only
       considerably cheaper, but also more faithful to the original document
       because it preserves its visual aspect.

       Despite the quickly improving speed of network connections and
       computers, the number of scanned document images accessible on the Web
       today is relatively small. There are several reasons for this.

       The first reason is the relatively high cost of scanning anything
       other than unbound sheets in black and white. This problem is slowly
       going away with the appearance of fast and low-cost color scanners
       with sheet feeders.

       The second reason is that long-established image compression standards
       and file formats have proved inadequate for distributing scanned
       documents at high resolution, particularly color documents. Not only
       are the file sizes and download times impractical, but the decoding
       and rendering times are also prohibitive. A typical magazine page
       scanned in color at 100 dpi in JPEG would occupy 100 KB to 200 KB, but
       the text would be hardly readable: insufficient for screen viewing and
       totally unacceptable for printing. The same page at 300 dpi would have
       sufficient quality for viewing and printing, but the file size would be
       300 KB to 1000 KB at best, which is impractical for remote access.
       Another major problem is that a fully decoded 300 dpi color image of a
       letter-size page occupies 24 MB of memory and easily causes disk
       swapping.
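
       As a rough check of the 24 MB figure, the following Python sketch
       computes the size of an uncompressed page image. It assumes a US
       letter page (8.5 x 11 inches) and 3 bytes per pixel (24-bit RGB);
       neither assumption is spelled out in the text, and the function name
       is purely illustrative.

```python
def decoded_page_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Size in bytes of a fully decoded (uncompressed) page image."""
    width_px = round(width_in * dpi)    # pixels across the page
    height_px = round(height_in * dpi)  # pixels down the page
    return width_px * height_px * bytes_per_pixel


# Assumed page geometry: US letter, 8.5 x 11 inches, scanned at 300 dpi.
size = decoded_page_bytes(8.5, 11, 300)
print(f"{size} bytes = {size / 2**20:.1f} MiB")
# prints "25245000 bytes = 24.1 MiB"
```

       Under these assumptions the page is 2550 x 3300 pixels, which is
       consistent with the 24 MB quoted above.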

       The third reason is that digital documents are more than just a
       collection of individual page images. Pages in a scanned document have
       a natural serial order. Special provision must be made to ensure that
       page flipping is instantaneous and effortless so as to maintain a good
       user experience. Even more important, most existing document formats
       force users to download the entire document before displaying a chosen
       page. However, users often want to jump to individual pages of the
       document without waiting for the entire document to download.
       Efficient browsing requires efficient random page access, fast
       sequential page flipping, and quick rendering. This can be achieved
       with a combination of advanced compression, pre-fetching,
       pre-decoding, caching, and progressive rendering. DjVu decomposes each
       page into multiple components (text, backgrounds, images, libraries of
       common shapes, etc.) that may be shared by several pages and
       downloaded on demand. All these requirements call for a very
       sophisticated but parsimonious control mechanism to handle on-demand
       downloading, pre-fetching, decoding, caching, and progressive
       rendering of the page images. What is being considered here is not
       just a document image compression technique, but a whole platform for
       document delivery.

       DjVu is an image compression technique, a document format, and a
       software platform for delivering document images over the Internet
       that fulfills the above requirements.

DJVU IMAGE COMPRESSION

       The DjVu image compression is based on three technologies:

   DjVuPhoto
       DjVuPhoto, also known as IW44, is a wavelet-based continuous-tone image
       compression technique with progressive decoding/rendering. It is best
       used for encoding photographic images in color or in shades of gray.
       Files are typically half the size of JPEG files for the same
       distortion.

   DjVuBitonal
       DjVuBitonal, also known as JB2, is a bitonal image compression
       technique that takes advantage of repetitions of nearly identical
       shapes on the page (such as characters) to efficiently compress text
       images. It is best used to compress black and white images
       representing text and simple drawings. A typical 300 dpi page in
       DjVuBitonal occupies 5 to 25 KB (3 to 8 times better than TIFF-G4 or
       PDF).

   DjVuDocument
       DjVuDocument is a compression technique specifically designed for color
       digital document images containing both pictures and text, such as a
       page of a magazine. DjVuDocument represents images as separately
       compressed layers. The foreground layer is usually compressed with
       DjVuBitonal and contains the text and drawings. The background layer
       is usually compressed with DjVuPhoto and contains the background
       texture and the pictures at lower resolution.

DJVU DOCUMENT DELIVERY PLATFORM

       The DjVu technology is designed from the ground up to support the
       efficient delivery of digital documents over the Internet. It provides
       various ways to deal with multi-page documents, and various ways to
       enrich the content with hyper-links, meta-data, searchable text, etc.

   MIME types
       The DjVu format has an official MIME type of image/vnd.djvu, which is
       the preferred content-type to be given by HTTP servers for DjVu files.
       Unofficial MIME types used historically are image/x.djvu and
       image/x-djvu, which may still be encountered. Ideally, clients should
       be configured to handle all three. (For web server configuration help,
       see http://www.djvuzone.org/support/tutorial/chapter-authoring1.html.)
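
       A client that wants to accept all three types could test the
       Content-Type header along the following lines. This is only a minimal
       Python sketch: the function and constant names are made up for
       illustration, and ignoring header parameters and letter case is an
       assumption about what a tolerant client would do.

```python
# The official MIME type plus the two historical, unofficial ones.
DJVU_MIME_TYPES = {"image/vnd.djvu", "image/x.djvu", "image/x-djvu"}


def is_djvu_content_type(header_value):
    """Return True if a Content-Type header value names a DjVu MIME type."""
    # Drop any parameters after ';' and compare case-insensitively.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in DJVU_MIME_TYPES


print(is_djvu_content_type("image/vnd.djvu"))  # official type
print(is_djvu_content_type("image/x-djvu"))    # historical type
print(is_djvu_content_type("image/jpeg"))      # not a DjVu type
```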

   Bundled multi-page documents
       A bundled multi-page DjVu document uses a single file to represent the
       entire document. This single file contains all the pages as well as
       ancillary information (e.g. the page directory, data shared by several
       pages, thumbnails, etc.). Using a single file format is very
       convenient for storing documents or for sending email attachments.

       When you type the URL of a multi-page document, the DjVu browser plugin
       starts downloading the whole file, but displays the first page as soon
       as it is available. You can immediately navigate to other pages using
       the DjVu toolbar. Suppose, however, that the document is stored on a
       remote web server. You can easily access the first page and see that
       this is not the document you wanted. Although you will never display
       the other pages, the browser is transferring data for these pages and
       is wasting the bandwidth of your server (and the bandwidth of the
       Internet too). You could also see the summary of the document on the
       first page and jump to page 100. But page 100 cannot be displayed until
       data for pages 1 to 99 has been received. You may have to wait for the
       transmission of unnecessary page data. This second problem (the
       unnecessary wait) can be solved using the "byte serving" option of the
       HTTP/1.1 protocol. This option has to be supported by the web server,
       the proxies, the caches and the browser. Byte serving, however, does
       not solve the first problem (the waste of bandwidth).
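
       In HTTP/1.1, byte serving is requested with a Range header. The
       Python sketch below builds such a request without sending it. The
       URL and the byte offsets are hypothetical; a real viewer would take
       the offsets of a page from the page directory of the bundled file.

```python
import urllib.request


def range_request(url, first_byte, last_byte):
    """Build (but do not send) an HTTP/1.1 request for an inclusive byte range."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    request = urllib.request.Request(url)
    # Both ends of an HTTP byte range are inclusive.
    request.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return request


# Hypothetical example: ask for 4096 bytes starting at offset 4096.
req = range_request("http://example.org/book.djvu", 4096, 8191)
print(req.get_header("Range"))  # prints "bytes=4096-8191"
```

       A server that supports byte serving answers such a request with
       status 206 (Partial Content); one that does not may simply return
       the whole file with status 200.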

   Indirect multi-page documents
       Indirect multi-page DjVu documents solve both problems. An indirect
       multi-page DjVu document is composed of several files. The main file
       is named the index file. You can browse a document using the URL of
       the index file, just like you do with a bundled multi-page document.
       The index file, however, is very small. It simply contains the
       document directory and the URLs of secondary files containing the page
       data. When you browse an indirect multi-page document, the browser
       only accesses data for the pages you are viewing. This can be done at
       a reasonable speed because the browser maintains a cache of pages and
       sometimes pre-fetches a few pages ahead of the current page. This
       model uses the web serving bandwidth much more effectively. It also
       eliminates unnecessary delays when jumping ahead to pages located
       anywhere in a long document.

   Annotations
       Every DjVu image optionally includes so-called annotation chunks. The
       annotation chunk is often used to define hyper-links to other document
       pages or to arbitrary web pages. Annotation chunks can also be used
       for other purposes such as setting the initial viewing mode of a page,
       defining highlighted zones, or storing arbitrary meta-data about the
       page or the document.

   Hidden text
       Every DjVu image optionally includes a hidden text layer that
       associates graphical features with the corresponding text. The hidden
       text layer is usually generated by running Optical Character
       Recognition (OCR) software. This textual information makes it possible
       to index DjVu documents and to copy and paste text from DjVu page
       images.

   Thumbnails
       DjVu documents sometimes contain pre-computed page thumbnails.

DJVUZONE AND DJVULIBRE

       The DjVu technology was initially created by a few researchers in AT&T
       Labs between 1995 and 1999. Lizardtech, Inc.
       ( http://www.lizardtech.com ) then obtained a commercial license from
       AT&T and continued the development. They now have a variety of
       solutions for producing and distributing documents using the DjVu
       technology.

       The DjVuZone web site ( http://www.djvuzone.org ) is managed by the few
       AT&T Labs researchers who created the DjVu technology in the first
       place. We promote the DjVu technology by providing an independent
       source of information about DjVu.

       Understanding how little room there is for a proprietary document
       format, Lizardtech released the DjVu Reference Library under the GNU
       General Public License in December 2000. This library entirely defines
       the compression format and the elementary codecs. Six months later,
       Lizardtech released an updated DjVu Reference Library as well as the
       source code of the Unix viewer.

       These two releases form the basis of our initial DjVuLibre software.
       We modified the build system to comply with the expectations of the
       open source community. Various bugs and portability issues have been
       fixed. We also tried to make it simpler to use and install, while
       preserving the essential structure of the Lizardtech releases.

       The DjVuLibre software contains the following components:

       bzz(1) A general purpose compression command line program. Many
              internal DjVu data structures are compressed using this
              technique.

       c44(1) A DjVuPhoto command line encoder. This state-of-the-art wavelet
              compressor produces DjVuPhoto images from PPM or JPEG images.

       cjb2(1)
              A DjVuBitonal command line encoder. This soft-pattern-matching
              compressor produces DjVuBitonal images from PBM images. It can
              encode images without loss, or introduce small changes in order
              to improve the compression ratio. The lossless encoding mode is
              competitive with that of the Lizardtech commercial encoders.

       cpaldjvu(1)
              A DjVuDocument command line encoder for images with few colors.
              This encoder is well suited to compressing images with a small
              number of distinct colors (e.g. screen-shots). The dominant
              color is encoded by the background layer. The other colors are
              encoded by the foreground layer.

       csepdjvu(1)
              A DjVuDocument command line encoder for separated images. This
              encoder takes a file containing pre-segmented foreground and
              background images and produces a DjVuDocument image.

       ddjvu(1)
              A command line decoder for DjVu images. This program produces a
              PNM image representing any segment of any page of a DjVu
              document at any resolution.

       djview(1)
              A stand-alone viewer for DjVu images. This sophisticated viewer
              displays DjVu documents. It implements document navigation as
              well as fast zooming and panning.

       nsdejavu(1)
              A web browser plugin for viewing DjVu images. This small plugin
              allows for viewing DjVu documents from web browsers. It
              internally uses djview to perform the actual work.

       djvups(1)
              A command line tool for converting DjVu documents into
              PostScript.

       djvm(1)
              A command line tool for manipulating bundled multi-page DjVu
              documents. This program is often used to collect individual
              pages and produce a bundled document.

       djvmcvt(1)
              A command line tool for converting bundled documents to indirect
              documents and conversely.

       djvused(1)
              A powerful command line tool for manipulating multi-page
              documents, creating or editing annotation chunks, creating or
              editing hidden text layers, pre-computing thumbnail images, and
              more.

       djvutxt(1)
              A command line tool to extract the hidden text from DjVu
              documents.

       djvudump(1)
              A command line tool for inspecting DjVu files and displaying
              their internal structure.

       djvuextract(1)
              A command line tool for disassembling DjVu image files.

       djvumake(1)
              A command line tool for assembling DjVu image files.

       djvuserve(1)
              A CGI program for generating indirect multi-page DjVu documents
              on the fly.

       djvutoxml(1), djvuxmlparser(1)
              Command line tools to edit DjVu metadata as XML files.
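
       As an illustration of how djvm(1) and djvmcvt(1) fit together, the
       Python sketch below builds the argument lists for bundling single-page
       files and then converting the bundle to an indirect document. It
       assumes the -c (create) option of djvm and the -i (indirect) option of
       djvmcvt as described in their own manual pages; the helper names and
       all file names are hypothetical, and nothing is executed here.

```python
def bundle_command(output, pages):
    """Argument list for collecting single-page files into a bundled document."""
    if not pages:
        raise ValueError("at least one page file is required")
    return ["djvm", "-c", output, *pages]


def to_indirect_command(bundled, out_dir, index_name):
    """Argument list for converting a bundled document into an indirect one."""
    return ["djvmcvt", "-i", bundled, out_dir, index_name]


# Hypothetical file names, for illustration only.
print(bundle_command("book.djvu", ["p1.djvu", "p2.djvu"]))
print(to_indirect_command("book.djvu", "book-dir", "index.djvu"))
```

       With DjVuLibre installed, either list could be passed to
       subprocess.run(cmd, check=True) to run the tool.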

DJVU ENCODERS AND ANY2DJVU

       DjVuLibre comes with a variety of specialized encoders: c44(1) for
       photographic images, cjb2(1) for bitonal images, and cpaldjvu(1) for
       images with few distinct colors. Although these encoders perform well
       in their specialized domain, they cannot handle complex tasks involving
       segmentation and multipage encoding.

       The Lizardtech commercial products (see
       http://www.lizardtech.com/solutions/document) can perform these
       complex encoding tasks.

       Another solution is provided by the compression server at
       http://any2djvu.djvuzone.org. This machine uses pre-Lizardtech
       prototype encoders from AT&T Labs and performs almost as well as the
       commercial Lizardtech encoders. Please note that the Any2DjVu
       compression server comes with no guarantee, that nothing is done to
       ensure that your documents will remain confidential, and that there is
       only one computer working for the whole planet.

CREDITS

       Numerous people have contributed to the DjVu source code during the
       last five years. Please submit a SourceForge bug report to update the
       following list.

          Yoshua Bengio, Léon Bottou, Chakradhar Chandaluri, Regis M. Chaplin,
          Ming Chen, Parag Deshmukh, Royce Edwards, Andrew Erofeev, Praveen
          Guduru, Patrick Haffner, Paul G. Howard, Orlando Keise, Yann Le Cun,
          Artem Mikheev, Florin Nicsa, Joseph M. Orost, Steven Pigeon, Bill
          Riemers, Patrice Simard, Jeffery Triggs, Luc Vincent, Pascal
          Vincent.

DjVuLibre-3.5                     10/11/2001                           DJVU(1)