DJVU(1)                          DjVuLibre-3.5                         DJVU(1)

NAME

       DjVu - DjVu and DjVuLibre.

INTRODUCTION

       Although the Internet has given us a worldwide infrastructure on which
       to build the universal library, much of the world's knowledge, history,
       and literature is still trapped on paper in the basements of the
       world's traditional libraries. Many libraries and content owners are in
       the process of digitizing their collections. While many such efforts
       involve the painstaking process of converting paper documents to
       computer-friendly form, such as SGML-based formats, the high cost of
       such conversions limits their extent. Scanning documents and
       distributing the resulting images electronically is not only
       considerably cheaper, but also more faithful to the original document
       because it preserves its visual aspect.

       Despite the quickly improving speed of network connections and
       computers, the number of scanned document images accessible on the Web
       today is relatively small. There are several reasons for this.

       The first reason is the relatively high cost of scanning anything
       other than unbound sheets in black and white. This problem is slowly
       going away with the appearance of fast and low-cost color scanners with
       sheet feeders.

       The second reason is that long-established image compression standards
       and file formats have proved inadequate for distributing scanned
       documents at high resolution, particularly color documents. Not only
       are the file sizes and download times impractical, the decoding and
       rendering times are also prohibitive. A magazine page scanned in color
       at 100 dpi in JPEG would typically occupy 100 KB to 200 KB, but the
       text would be hardly readable: insufficient for screen viewing and
       totally unacceptable for printing. The same page at 300 dpi would have
       sufficient quality for viewing and printing, but the file size would be
       300 KB to 1000 KB at best, which is impractical for remote access.
       Another major problem is that a fully decoded 300 dpi color image of a
       letter-size page occupies 24 MB of memory and easily causes disk
       swapping.

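       The 24 MB figure for a decoded page can be checked with a little
       arithmetic. The Python sketch below assumes a letter-size page of
       8.5 by 11 inches and 3 bytes per pixel (8-bit RGB); neither value is
       stated above, and the function name is only illustrative.

```python
# Size of a fully decoded (uncompressed) color page image.
# Assumptions: 8.5 x 11 inch page, 3 bytes per pixel (8-bit RGB).

def decoded_size_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Return the size in bytes of the fully decoded raster image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel

size = decoded_size_bytes(8.5, 11, 300)
print(size, "bytes")                   # 25245000 bytes
print(round(size / 2**20, 1), "MiB")   # 24.1 MiB
```

       At 2550 by 3300 pixels the page holds about 8.4 million pixels, which
       is where the roughly 24 MB comes from.
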
       The third reason is that digital documents are more than just a
       collection of individual page images. Pages in a scanned document have
       a natural serial order. Special provision must be made to ensure that
       flipping pages is instantaneous and effortless so as to maintain a good
       user experience. Even more important, most existing document formats
       force users to download the entire document before displaying a chosen
       page. However, users often want to jump to individual pages of the
       document without waiting for the entire document to download.
       Efficient browsing requires efficient random page access, fast
       sequential page flipping, and quick rendering. This can be achieved
       with a combination of advanced compression, pre-fetching, pre-decoding,
       caching, and progressive rendering. DjVu decomposes each page into
       multiple components (text, backgrounds, images, libraries of common
       shapes...) that may be shared by several pages and downloaded on
       demand. All these requirements call for a very sophisticated but
       parsimonious control mechanism to handle on-demand downloading,
       pre-fetching, decoding, caching, and progressive rendering of the page
       images. What is being considered here is not just a document image
       compression technique, but a whole platform for document delivery.

       DjVu is an image compression technique, a document format, and a
       software platform for delivering document images over the Internet
       that fulfills the above requirements.

DJVU IMAGE COMPRESSION

       The DjVu image compression is based on three technologies:

   DjVuPhoto
       DjVuPhoto, also known as IW44, is a wavelet-based continuous-tone image
       compression technique with progressive decoding/rendering. It is best
       used for encoding photographic images in color or in shades of gray.
       Images are typically half the size of JPEG files for the same
       distortion.

   DjVuBitonal
       DjVuBitonal, also known as JB2, is a bitonal image compression
       technique that takes advantage of repetitions of nearly identical
       shapes on the page (such as characters) to efficiently compress text
       images. It is best used to compress black and white images
       representing text and simple drawings. A typical 300 dpi page in
       DjVuBitonal occupies 5 to 25 KB (3 to 8 times better than TIFF-G4 or
       PDF).

   DjVuDocument
       DjVuDocument is a compression technique specifically designed for color
       digital document images containing both pictures and text, such as a
       page of a magazine. DjVuDocument represents images as separately
       compressed layers. The foreground layer is usually compressed with
       DjVuBitonal and contains the text and drawings. The background layer
       is usually compressed with DjVuPhoto and contains the background
       texture and the pictures at lower resolution.

DJVU DOCUMENT DELIVERY PLATFORM

       The DjVu technology is designed from the ground up to support the
       efficient delivery of digital documents over the Internet. It provides
       various ways to deal with multi-page documents, and various ways to
       enrich the content with hyper-links, meta-data, searchable text, etc.

   MIME types
       The DjVu format has an official MIME type of image/vnd.djvu, which is
       the preferred content type to be given by HTTP servers for DjVu files.
       Unofficial MIME types used historically are image/x.djvu and
       image/x-djvu, which may still be encountered. Ideally, clients should
       be configured to handle all three.

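       As an illustration of handling all three types on the client side,
       the Python sketch below tests a Content-Type header value against
       them. The function and constant names are hypothetical; only the
       three MIME type strings come from this page.

```python
# The three MIME types named above; only the first one is official.
DJVU_MIME_TYPES = {"image/vnd.djvu", "image/x.djvu", "image/x-djvu"}

def is_djvu_content_type(header_value):
    """Return True if a Content-Type header value denotes a DjVu file.

    Anything after ';' (parameters) is ignored, and the comparison is
    case-insensitive because MIME type names are not case-sensitive.
    """
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in DJVU_MIME_TYPES

print(is_djvu_content_type("image/vnd.djvu"))        # True
print(is_djvu_content_type("Image/X-DjVu; q=0.8"))   # True
print(is_djvu_content_type("application/pdf"))       # False
```
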
   Bundled multi-page documents
       A bundled multi-page DjVu document uses a single file to represent the
       entire document. This single file contains all the pages as well as
       ancillary information (e.g. the page directory, data shared by several
       pages, thumbnails, etc.). Using a single file is very convenient for
       storing documents or for sending email attachments.

       When you type the URL of a multi-page document, the DjVu browser plugin
       starts downloading the whole file, but displays the first page as soon
       as it is available. You can immediately navigate to other pages using
       the DjVu toolbar. Suppose however that the document is stored on a
       remote web server. You can easily access the first page and see that
       this is not the document you wanted. Although you will never display
       the other pages, the browser is transferring data for these pages and
       is wasting the bandwidth of your server (and the bandwidth of the
       Internet too). You could also see the summary of the document on the
       first page and jump to page 100. But page 100 cannot be displayed until
       data for pages 1 to 99 has been received. You may have to wait for the
       transmission of unnecessary page data. This second problem (the
       unnecessary wait) can be solved using the ``byte serving'' option of
       the HTTP/1.1 protocol. This option has to be supported by the web
       server, the proxies, the caches and the browser. Byte serving however
       does not solve the first problem (the waste of bandwidth).

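       Byte serving relies on the HTTP/1.1 Range request header. The Python
       sketch below builds such a request without sending it. The URL and
       the byte offsets are hypothetical; a real viewer would take the
       offsets of a page from the directory stored in the bundled file.

```python
import urllib.request

def build_range_request(url, first_byte, last_byte):
    """Build (but do not send) a request for bytes first..last, inclusive."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    request = urllib.request.Request(url)
    # HTTP/1.1 byte ranges are inclusive on both ends.
    request.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return request

# Hypothetical URL and offsets, for illustration only.
req = build_range_request("http://example.org/doc.djvu", 1_000_000, 1_049_999)
print(req.get_header("Range"))   # bytes=1000000-1049999
```

       A server that honors the header answers with status 206 (Partial
       Content) and only the requested bytes. A server that ignores it
       answers with status 200 and the whole file, which is why every hop
       between the browser and the server has to support the option.
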
   Indirect multi-page documents
       Indirect multi-page DjVu documents solve both problems. An indirect
       multi-page DjVu document is composed of several files. The main file
       is named the index file. You can browse a document using the URL of
       the index file, just like you do with a bundled multi-page document.
       The index file however is very small. It simply contains the document
       directory and the URLs of secondary files containing the page data.
       When you browse an indirect multi-page document, the browser only
       accesses data for the pages you are viewing. This can be done at a
       reasonable speed because the browser maintains a cache of pages and
       sometimes pre-fetches a few pages ahead of the current page. This model
       uses the web serving bandwidth much more effectively. It also
       eliminates unnecessary delays when jumping ahead to pages located
       anywhere in a long document.

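       The combination of a page cache and pre-fetching can be pictured
       with the Python sketch below. It is not the viewer's actual
       implementation: the class, the least-recently-used eviction policy,
       and the capacity and lookahead values are all assumptions made for
       illustration.

```python
from collections import OrderedDict

class PageCache:
    """LRU cache of page data that also fetches a few pages ahead."""

    def __init__(self, fetch_page, page_count, capacity=8, lookahead=2):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.fetch_page = fetch_page    # callable: page number -> page data
        self.page_count = page_count    # pages are numbered from 1
        self.capacity = capacity
        self.lookahead = lookahead
        self.pages = OrderedDict()      # page number -> data, oldest first

    def _load(self, number):
        if number in self.pages:
            self.pages.move_to_end(number)        # mark as recently used
            return self.pages[number]
        data = self.fetch_page(number)            # one request per page file
        self.pages[number] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)        # evict least recently used
        return data

    def view(self, number):
        """Return the data for a page, then pre-fetch the pages after it."""
        data = self._load(number)
        last = min(number + self.lookahead, self.page_count)
        for ahead in range(number + 1, last + 1):
            self._load(ahead)
        return data

fetched = []

def fetch(number):
    """Stand-in for downloading the file that holds one page."""
    fetched.append(number)
    return f"page-{number}"

cache = PageCache(fetch, page_count=200)
cache.view(100)    # fetches pages 100, 101 and 102 only
cache.view(101)    # 101 and 102 are cached; only 103 is fetched
print(fetched)     # [100, 101, 102, 103]
```

       Jumping to page 100 transfers three page files instead of the first
       hundred pages, which is the bandwidth saving described above.
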
   Annotations
       Every DjVu image optionally includes so-called annotation chunks. The
       annotation chunk is often used to define hyper-links to other document
       pages or to arbitrary web pages. Annotation chunks can also be used
       for other purposes such as setting the initial viewing mode of a page,
       defining highlighted zones, or storing arbitrary meta-data about the
       page or the document.

   Hidden text
       Every DjVu image optionally includes a hidden text layer that
       associates graphical features with the corresponding text. The hidden
       text layer is usually generated by running Optical Character
       Recognition (OCR) software. This textual information makes it possible
       to index DjVu documents and to copy and paste text from DjVu page
       images.

   Thumbnails
       DjVu documents sometimes contain pre-computed page thumbnails.

   Outline
       DjVu documents sometimes contain a navigation chunk containing an
       outline, that is, a hierarchical table of contents with pointers to
       the corresponding document pages.

DJVUZONE AND DJVULIBRE

       The DjVu technology was initially created by a few researchers at AT&T
       Labs between 1995 and 1999. Lizardtech, Inc. then obtained a
       commercial license from AT&T and continued the development. The
       current owner of the DjVu commercial rights is Cuminas
       ( https://www.cuminas.jp/en/about_djvu ), which offers solutions for
       producing and distributing documents using the DjVu technology, as
       well as a DjVu viewer packaged as a Chrome extension.

       The DjVu.org web site ( http://www.djvu.org ) is managed by the few
       AT&T Labs researchers who created the DjVu technology in the first
       place. We promote the DjVu technology by providing an independent
       source of information about DjVu.

       Understanding how little room there is for a proprietary document
       format, Lizardtech released the DjVu Reference Library under the GNU
       General Public License in December 2000. This library entirely defines
       the compression format and the elementary codecs. Six months later,
       Lizardtech released an updated DjVu Reference Library as well as the
       source code of the Unix viewer.

       These two releases form the basis of our initial DjVuLibre software.
       We modified the build system to comply with the expectations of the
       open source community. Various bugs and portability issues have been
       fixed. We also tried to make it simpler to use and install, while
       preserving the essential structure of the Lizardtech releases.

       The DjVuLibre software contains the following components:

       bzz(1) A general purpose compression command line program. Many
              internal DjVu data structures are compressed using this
              technique.

       c44(1) A DjVuPhoto command line encoder. This state-of-the-art wavelet
              compressor produces DjVuPhoto images from PPM or JPEG images.

       cjb2(1)
              A DjVuBitonal command line encoder. This soft-pattern-matching
              compressor produces DjVuBitonal images from PBM images. It can
              encode images without loss, or introduce small changes in order
              to improve the compression ratio. The lossless encoding mode is
              competitive with that of the Lizardtech commercial encoders.

       cpaldjvu(1)
              A DjVuDocument command line encoder for images with few colors.
              This encoder is well suited to compressing images with a small
              number of distinct colors (e.g. screen-shots). The dominant
              color is encoded by the background layer. The other colors are
              encoded by the foreground layer.

       csepdjvu(1)
              A DjVuDocument command line encoder for separated images. This
              encoder takes a file containing pre-segmented foreground and
              background images and produces a DjVuDocument image.

       ddjvu(1)
              A command line decoder for DjVu images. This program produces a
              PNM image representing any segment of any page of a DjVu
              document at any resolution.

       djview(1)
              A stand-alone viewer for DjVu images. This sophisticated viewer
              displays DjVu documents. It implements document navigation as
              well as fast zooming and panning.

       nsdejavu(1)
              A web browser plugin for viewing DjVu images. This small plugin
              allows for viewing DjVu documents from web browsers. It
              internally uses djview to perform the actual work.

       djvups(1)
              A command line tool for converting DjVu documents into
              PostScript.

       djvm(1)
              A command line tool for manipulating bundled multi-page DjVu
              documents. This program is often used to collect individual
              pages and produce a bundled document.

       djvmcvt(1)
              A command line tool for converting bundled documents to indirect
              documents and conversely.

       djvused(1)
              A powerful command line tool for manipulating multi-page
              documents, creating or editing annotation chunks, creating or
              editing hidden text layers, pre-computing thumbnail images, and
              more...

       djvutxt(1)
              A command line tool to extract the hidden text from DjVu
              documents.

       djvudump(1)
              A command line tool for inspecting DjVu files and displaying
              their internal structure.

       djvuextract(1)
              A command line tool for disassembling DjVu image files.

       djvumake(1)
              A command line tool for assembling DjVu image files.

       djvuserve(1)
              A CGI program for generating indirect multi-page DjVu documents
              on the fly.

       djvutoxml(1), djvuxmlparser(1)
              Command line tools to edit DjVu metadata as XML files.
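
       As a sketch of how some of these components fit together, the Python
       code below prints the commands that would encode bitonal pages with
       cjb2(1), collect them into a bundled document with djvm(1), and
       convert the result to an indirect document with djvmcvt(1). The file
       names are hypothetical, and the argument order and the -c and -i
       options are assumptions to be checked against the respective manual
       pages. The code only builds the command lines; it does not run them.

```python
from pathlib import Path

def build_commands(pbm_pages, bundled, index_dir, index_name="index.djvu"):
    """Return command lines that encode, bundle, and split a document.

    pbm_pages lists bitonal PBM page images in reading order.
    """
    commands = []
    page_files = []
    for pbm in pbm_pages:
        page = str(Path(pbm).with_suffix(".djvu"))
        commands.append(["cjb2", str(pbm), page])    # one DjVuBitonal file per page
        page_files.append(page)
    # Collect the page files into a single bundled document.
    commands.append(["djvm", "-c", bundled, *page_files])
    # Convert the bundled document into an index file plus page files.
    commands.append(["djvmcvt", "-i", bundled, index_dir, index_name])
    return commands

for command in build_commands(["p1.pbm", "p2.pbm"], "book.djvu", "book_indirect"):
    print(" ".join(command))
```

       On a system with DjVuLibre installed, each list could be passed to
       subprocess.run(command, check=True) to run the step.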

DJVU ENCODERS AND ANY2DJVU

       DjVuLibre comes with a variety of specialized encoders: c44(1) for
       photographic images, cjb2(1) for bitonal images, and cpaldjvu(1) for
       images with few distinct colors. Although these encoders perform well
       in their specialized domains, they cannot handle complex tasks
       involving segmentation and multipage encoding.

       The Lizardtech commercial products (see
       http://www.lizardtech.com/solutions/document) can perform these
       complex encoding tasks.

       Another solution is provided by the Any2DjVu compression server
       ( http://any2djvu.djvu.org ). This machine uses pre-Lizardtech
       prototype encoders from AT&T Labs and performs almost as well as the
       commercial Lizardtech encoders. Please note that the Any2DjVu
       compression server comes with no guarantee, that nothing is done to
       ensure that your documents will remain confidential, and that there is
       only one computer working for the whole planet.

CREDITS

       Numerous people have contributed to the DjVu source code during the
       last five years. Please submit a SourceForge bug report to update the
       following list.

          Yoshua Bengio, Léon Bottou, Chakradhar Chandaluri, Regis M. Chaplin,
          Ming Chen, Parag Deshmukh, Royce Edwards, Andrew Erofeev, Praveen
          Guduru, Patrick Haffner, Paul G. Howard, Orlando Keise, Yann Le Cun,
          Artem Mikheev, Florin Nicsa, Joseph M. Orost, Steven Pigeon, Bill
          Riemers, Patrice Simard, Jeffery Triggs, Luc Vincent, Pascal
          Vincent.


DjVuLibre-3.5                     10/11/2001                           DJVU(1)