DJVUSED(1)                    DjVuLibre-3.5                   DJVUSED(1)



NAME
djvused - Multi-purpose DjVu document editor.


SYNOPSIS
djvused [options] djvufile


DESCRIPTION
Program djvused is a powerful command line tool for manipulating
multi-page documents, creating or editing annotation chunks, creating
or editing hidden text layers, pre-computing thumbnail images, and
more. The program first reads the DjVu document djvufile and then
executes a number of djvused commands.

Djvused commands can be read from a specific file (when option -f is
specified), read from the command line (when option -e is specified),
or read from the standard input (the default).


OPTIONS
-v
    Cause djvused to print a command line prompt before reading
    commands and a brief message describing how each command was
    executed. This option is very useful for debugging djvused scripts
    and also for interactively entering djvused commands on the
    standard input.

-f scriptfile
    Cause djvused to read commands from file scriptfile.

-e command
    Cause djvused to execute the commands specified by the option
    argument command. It is advisable to surround the djvused commands
    with single quotes in order to prevent unwanted shell expansion.

-s
    Cause djvused to save the file djvufile after executing the
    specified commands. This is similar to executing command save
    immediately before terminating the program.

-n
    Cause djvused to disregard save commands. This is useful for
    debugging djvused scripts without overwriting files on your disk.


EXAMPLES
There are many ways to use program djvused. The following examples
illustrate some common uses of this program.


Obtaining the size of a page
Command size outputs the width and height of the selected pages using
an HTML-friendly syntax. For instance, the following command prints
the size of page 3 of document myfile.djvu.

    djvused myfile.djvu -e 'select 3; size'

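When the page size is needed inside a program, the output of size can be captured and parsed. The sketch below assumes that each output line consists of whitespace-separated key=value fields such as width=2550 height=3300; the helper names parse_size_output and page_size are hypothetical and not part of DjVuLibre. Passing an argument list to subprocess bypasses the shell, so no quoting is needed.

```python
import subprocess

# Hypothetical helpers (not part of DjVuLibre).


def parse_size_output(text):
    """Parse djvused `size` output, one dict per non-empty line.

    Assumes lines of whitespace-separated key=value fields such as
    'width=2550 height=3300'; fields without an integer value are skipped.
    """
    pages = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = {}
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep and value.isdigit():
                fields[key] = int(value)
        pages.append(fields)
    return pages


def page_size(djvu_path, page):
    """Run djvused (assumed to be on PATH) and return the size of one page."""
    result = subprocess.run(
        ["djvused", djvu_path, "-e", f"select {page}; size"],
        capture_output=True, text=True, check=True,
    )
    return parse_size_output(result.stdout)[0]
```

For instance, page_size("myfile.djvu", 3) would return a dict with width and height keys, provided the output format matches the assumption above.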
Extracting the hidden text
Command print-pure-txt outputs the text associated with a page or a
document. For instance, the following shell command outputs the text
for the entire document. Lines and pages are delimited by the usual
control characters.

    djvused myfile.djvu -e 'print-pure-txt'

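When this output is consumed by another program, the delimiters can be used to recover the page and line structure. A minimal sketch, assuming that form feeds separate pages and newlines separate lines as stated in the print-pure-txt entry of the command reference; the function name split_pure_text is hypothetical. It also strips the optional vertical tab, group separator, and unit separator characters from the edges of each line.

```python
# Hypothetical helper (not part of DjVuLibre).
# Optional structure markers: VT (\013), GS (\035), and US (\037).
SEPARATORS = "\x0b\x1d\x1f"


def split_pure_text(text):
    """Split print-pure-txt output into a list of pages, each a list of lines."""
    pages = []
    for page in text.split("\f"):           # form feed delimits pages
        lines = page.split("\n")             # newline delimits lines
        if lines and lines[-1] == "":
            lines.pop()                      # drop the empty tail after a final newline
        pages.append([line.strip(SEPARATORS) for line in lines])
    return pages
```

Separators that occur in the middle of a line are left in place; a caller that cares about columns or paragraphs can split on them explicitly.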
Command print-txt produces a more extensive output describing the
structure and the location of the text components. The syntax of this
output is described later in this man page. For instance, the
following shell command outputs extended text information for page 3
of document myfile.djvu.

    djvused myfile.djvu -e 'select 3; print-txt'

Extracting the annotations
Annotation data can be extracted using command print-ant. The syntax
of the annotation data is described later in this man page. For
instance, the following shell command outputs the annotation data for
the first page of document myfile.djvu.

    djvused myfile.djvu -e 'select 1; print-ant'

Command print-ant only prints the annotations stored in the selected
component file. Command print-merged-ant also retrieves annotations
from all the component files referenced by the current page (using
INCL chunks) and prints the merged information.


Dumping/restoring annotations and text
Three commands, output-txt, output-ant, and output-all, produce djvused
scripts. For instance, the following shell command produces a djvused
script, myfile.dsed, that recreates all the text and annotation data
in document myfile.djvu.

    djvused myfile.djvu -e 'output-all' > myfile.dsed

Script myfile.dsed is a text file that can be easily edited. The
following shell command then recreates the text and annotation
information in file myfile.djvu.

    djvused myfile.djvu -f myfile.dsed -s


Extracting a page
Both commands save-page and save-page-with create a DjVu file
representing the selected component file of a document. The following
shell command, for instance, creates a file p05.djvu containing page 5
of document myfile.djvu.

    djvused myfile.djvu -e 'select 5; save-page p05.djvu'

Each page of a document might import data from another component file
using the so-called inclusion (INCL) chunks. Command save-page then
produces a file with unresolved references to imported data. Such a
file should then be made part of a multi-page document containing the
required data in other component files. On the other hand, command
save-page-with copies all the imported data into the output file. This
file is directly usable. Yet collecting several such files into a
multi-page document might lead to unnecessary data duplication.


Pre-computing thumbnails
Command set-thumbnails constructs thumbnails that can later be
displayed by DjVu viewers. The following shell command, for instance,
computes thumbnails of size 64x64 pixels for all pages of file
myfile.djvu.

    djvused myfile.djvu -e 'set-thumbnails 64' -s


Command lines might contain zero, one, or more djvused commands and an
optional comment. Multiple djvused commands must be separated by a
semicolon character ';'. Comments are introduced by the '#' character
and extend until the end of the command line.

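Because commands on one line are separated by semicolons, several commands can be combined into a single -e argument. The sketch below (the function djvused_argv is hypothetical) builds an argument vector for Python's subprocess module; a list of arguments bypasses the shell entirely, so the single-quote advice given for option -e does not apply. It conservatively rejects commands containing ';' or '#', since those characters would split the command or start a comment.

```python
# Hypothetical helper (not part of DjVuLibre).


def djvused_argv(djvu_path, commands, save=False, verbose=False):
    """Return an argv list that runs `commands` on `djvu_path` via option -e."""
    for command in commands:
        # Conservative check: ';' separates commands and '#' starts a comment.
        if ";" in command or "#" in command:
            raise ValueError(f"command contains ';' or '#': {command!r}")
    argv = ["djvused", djvu_path, "-e", "; ".join(commands)]
    if save:
        argv.append("-s")   # save djvufile after executing the commands
    if verbose:
        argv.append("-v")   # prompt and report how each command was executed
    return argv
```

For example, subprocess.run(djvused_argv("myfile.djvu", ["set-thumbnails 64"], save=True), check=True) mirrors the thumbnail example given earlier.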

Selection commands
Multi-page DjVu documents are composed of a number of component files.
Most component files describe a specific page of a document. Some
component files contain information shared by several pages, such as
shared image data, shared annotations, or thumbnails. Many djvused
commands operate on selected component files. All component files are
initially selected. The following commands are useful for changing the
selection.

ls
    List all component files in the document. Each line contains an
    optional page number, a letter describing the component file type,
    the size of the component file, and the identifier of the component
    file. Component file type letters P, I, A, and T respectively stand
    for page data, shared image data, shared annotation data, and
    thumbnail data. Page numbers are only listed for component files
    containing page data.

select [fileid]
    Select the component file identified by argument fileid. Argument
    fileid must be either a page number or a component file identifier.
    The select command selects all component files when the argument
    fileid is omitted.

select-shared-ant
    Select a component file containing shared annotations. Only one
    such component file is supported by the current DjVu software. This
    component file usually contains annotations pertaining to the whole
    document as opposed to specific pages. An error message is
    displayed if there is no such component file.

create-shared-ant
    Create and select a component file containing shared annotations.
    This command only selects the shared annotation component file if
    such a component file already exists. Otherwise it creates a new
    shared annotation component file and makes sure that it is
    imported by all pages in the document.


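A script that needs the mapping between pages and component files can parse the output of ls. The sketch below assumes whitespace-separated fields in the order described for ls (optional page number, type letter, size, identifier) and keeps any trailing text as part of the identifier; the exact column layout is an assumption, so adjust it if your djvused version prints extra fields. Component, parse_ls_line, and parse_ls are hypothetical names.

```python
from typing import NamedTuple, Optional

# Hypothetical helpers (not part of DjVuLibre); the ls layout is assumed.


class Component(NamedTuple):
    page: Optional[int]   # only present for page data (type P)
    kind: str             # P, I, A, or T
    size: int             # size of the component file
    fileid: str           # component file identifier


def parse_ls_line(line):
    fields = line.split()
    page = None
    if fields and fields[0].isdigit():
        page = int(fields.pop(0))
    if len(fields) < 3 or not fields[1].isdigit():
        raise ValueError(f"unexpected ls line: {line!r}")
    return Component(page, fields[0], int(fields[1]), " ".join(fields[2:]))


def parse_ls(text):
    """Parse every non-empty line of ls output."""
    return [parse_ls_line(line) for line in text.splitlines() if line.strip()]
```

The identifiers obtained this way can be passed back to select, which accepts either a page number or a component file identifier.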
Miscellaneous commands
help
    Display a help message listing all commands supported by djvused.

n
    Print the total number of pages in the document.

dump
    Display the EA IFF 85 structure of the document or of the selected
    component file. A similar capability is offered by program
    djvudump.

size
    Display the width and the height of the selected pages. The
    dimensions of each page are displayed using a syntax suitable for
    direct insertion into the <EMBED...></EMBED> tags.


Text and annotation commands
print-pure-txt
    Print the text stored in the hidden text layer of the selected
    pages. A similar capability is offered by program djvutxt.
    Structural information is sometimes represented by control
    characters. Text from different pages is delimited by form feed
    characters ("\f"). Lines are delimited by newline characters
    ("\n"). Columns, regions, and paragraphs are sometimes delimited by
    vertical tab ("\013"), group separator ("\035"), and unit separator
    ("\037") characters respectively.

print-txt
    Print extensive hidden text information for the selected pages.
    This information describes the structure of the text on the
    document page and locates the structural elements in the page
    image. The syntax of this output is described later in this man
    page.

remove-txt
    Remove the hidden text information from the selected component
    files. For instance, executing commands select and remove-txt
    removes all hidden text information from the DjVu document.

set-txt [djvusedtxtfile]
    Insert hidden text information into the selected pages. The
    optional argument djvusedtxtfile names a file containing the hidden
    text information. This file must contain data similar to what is
    produced by command print-txt. When the optional argument is
    omitted, the program reads the hidden text information from the
    djvused script until reaching an end-of-file or a line containing a
    single period.

output-txt
    Print a djvused script that reconstructs the hidden text
    information for the selected pages. This script can later be edited
    and executed by invoking program djvused with option -f.

print-ant
    Print the annotations of the selected component file. The
    annotation data is represented using a simple syntax described
    later in this document.

print-merged-ant
    Merge the annotations stored in the selected component files with
    the annotations imported from other component files, such as the
    shared annotation component file, and print the result. The
    annotation data is represented using a simple syntax described
    later in this document.

remove-ant
    Remove the annotation information from the selected component
    files. For instance, executing commands select and remove-ant
    removes all annotation information from the DjVu document.

set-ant [djvusedantfile]
    Insert annotations into the selected component file. The optional
    argument djvusedantfile names a file containing the annotation
    data. This file must contain data similar to what is produced by
    command print-ant. When the optional argument is omitted, the
    program reads the annotation data from the djvused script itself
    until reaching an end-of-file or a line containing a single period.

output-ant
    Print a djvused script that reconstructs the annotation information
    for the selected pages. This script can later be edited and
    executed by invoking program djvused with option -f.

print-meta
    Print the meta-data part of the annotations for the selected
    component file. This command displays a subset of the information
    printed by command print-ant using a different syntax. Meta-data are
    organized as key-value pairs. Each printed line contains the key
    name (such as author, title, etc.), followed by a tab character
    ("\t") and a double-quoted string representing the UTF-8 encoded
    meta-data value.

set-meta [djvusedmetafile]
    Set the meta-data part of the annotations of the selected component
    file. The remaining part of the annotations is left unchanged. The
    optional argument djvusedmetafile names a file containing the
    meta-data. This file must contain data similar to what is produced
    by command print-meta. When the optional argument is omitted, the
    program reads the meta-data from the djvused script itself until
    reaching an end-of-file or a line containing a single period.

output-all
    Print a djvused script that reconstructs both the hidden text and
    the annotation information for the selected pages. This script can
    later be edited and executed by invoking program djvused with
    option -f.

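Commands set-txt, set-ant, and set-meta can all read their data inline, terminated by a line containing a single period, which makes it straightforward to generate scripts for option -f. A minimal sketch with hypothetical helper names: it assumes the hidden text is already written in the print-txt syntax and that meta-data keys are valid symbols. Document-wide meta-data is placed in the shared annotation component file, selected or created with create-shared-ant.

```python
# Hypothetical helpers (not part of DjVuLibre).


def quote(text):
    """Quote a djvused string: escape double quote and backslash,
    and write every other non-printable byte as three octal digits."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in (0x22, 0x5C):              # double quote, backslash
            out.append("\\" + chr(byte))
        elif 0x20 <= byte < 0x7F:             # printable ASCII
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)       # octal escape
    return '"' + "".join(out) + '"'


def inline_command(command, data):
    """Emit `command`, its inline data lines, and the terminating '.' line."""
    lines = data.splitlines()
    if any(line.strip() == "." for line in lines):
        raise ValueError("a line with a single period would end the data early")
    return "\n".join([command, *lines, "."])


def meta_data(entries):
    """Format key/value pairs like print-meta output: key<TAB>"value"."""
    return "\n".join(f"{key}\t{quote(value)}" for key, value in entries.items())


def build_script(page, hidden_text, meta):
    return "\n".join([
        f"select {page}",
        inline_command("set-txt", hidden_text),
        "create-shared-ant",                  # document-wide annotations
        inline_command("set-meta", meta_data(meta)),
    ]) + "\n"
```

The generated text can be written to a file such as myfile.dsed and applied with djvused myfile.djvu -f myfile.dsed -s.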
Outline/bookmarks commands
print-outline
    Print the outline of the document. Nothing is printed if the
    document contains no outline.

set-outline [djvusedoutlinefile]
    Insert outline information into the document. The optional
    argument djvusedoutlinefile names a file containing the outline
    information. This file must contain data similar to what is
    produced by command print-outline. When the optional argument is
    omitted, the program reads the outline information from the djvused
    script until reaching an end-of-file or a line containing a single
    period.

Thumbnail commands
set-thumbnails sz
    Compute thumbnails of size sz x sz pixels and insert them into the
    document. DjVu viewers can later display these thumbnails very
    efficiently without needing to download the data for each page.
    Typical thumbnail sizes range from 48 to 128 pixels.

remove-thumbnails
    Remove the pre-computed thumbnails from the DjVu document. New
    thumbnails can then be computed using command set-thumbnails.


Save commands
The above commands only modify the memory image of the DjVu document.
The following commands provide means to save the modified data into
the file system.

save
    Save the modified DjVu document back into the input file djvufile
    specified by the arguments of the program djvused. Nothing is done
    if the DjVu file was not modified. Passing option -s to program
    djvused is equivalent to executing command save before exiting the
    program.

save-bundled filename
    Save the current DjVu document as a bundled multi-page DjVu document
    named filename. A similar capability is offered by program djvmcvt.

save-indirect filename
    Save the current DjVu document as an indirect multi-page DjVu
    document. The index file of the indirect document will be named
    filename. All other files composing the indirect document will be
    saved into the same directory as the index file. A similar
    capability is offered by program djvmcvt.

save-page filename
    Save the selected component file into DjVu file filename. The
    selected component file might import data from another component
    file using the so-called inclusion (INCL) chunks. This command then
    produces a file with unresolved references to imported data. Such
    a file should then be made part of a multi-page document containing
    the required data in other component files.

save-page-with filename
    Save the selected component file into DjVu file filename. All data
    imported from other component files is copied into the output file
    as well. This command always produces a usable DjVu file. On the
    other hand, collecting several such files into a multi-page document
    might lead to unnecessary data duplication.

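Commands n, select, and save-page-with can be combined to split a document into standalone single-page files. The sketch below generates such a script; the output name pattern p001.djvu and the helper names page_count and split_pages_script are hypothetical, and file names are assumed to contain no spaces or other characters that might need quoting. It defaults to save-page-with because, as described above, each resulting file is then directly usable.

```python
import subprocess

# Hypothetical helpers (not part of DjVuLibre).


def page_count(djvu_path):
    """Ask djvused (assumed to be on PATH) for the number of pages."""
    result = subprocess.run(["djvused", djvu_path, "-e", "n"],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def split_pages_script(count, pattern="p{:03d}.djvu", with_imports=True):
    """One command line per page: select it, then save it to its own file."""
    command = "save-page-with" if with_imports else "save-page"
    return "".join(f"select {page}; {command} {pattern.format(page)}\n"
                   for page in range(1, count + 1))
```

Since djvused reads commands from the standard input by default, the script can be fed directly, for example subprocess.run(["djvused", "myfile.djvu"], input=split_pages_script(page_count("myfile.djvu")), text=True, check=True). Option -s is not needed because the document itself is not modified.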


Djvused uses a simple parenthesized syntax to represent both
annotations and hidden text.

* This syntax is the native syntax used by DjVu for storing
  annotations. Program djvused simply compresses the annotation data
  using the bzz(1) algorithm.

* This syntax differs from the native syntax used by DjVu for storing
  the hidden text. Program djvused performs the translations between
  the compact binary representation used by DjVu and the easily
  modifiable parenthesized syntax.

General syntax
Djvused files are ASCII text files. The legal characters in djvused
files are the printable ASCII characters and the space, tab, cr, and nl
characters. Using other characters has undefined results.

Djvused files are composed of a sequence of expressions separated by
blank characters (space, tab, cr, or nl). There are four kinds of
expressions, namely integers, symbols, strings, and lists.

Integers:
    Integer numbers are represented by one or more digits, with the
    usual interpretation.

Symbols:
    Symbols, or identifiers, are sequences of printable ASCII
    characters representing a name or a keyword. Acceptable characters
    are the alphanumeric characters, the underscore "_", the minus
    character "-", and the hash character "#". Names should not begin
    with a digit or a minus character.

Strings:
    Strings denote an arbitrary sequence of bytes, usually interpreted
    as a sequence of UTF-8 encoded characters. Strings in djvused files
    are similar to strings in the C language. They are surrounded by
    double quote characters. Certain sequences of characters starting
    with a backslash ("\") have a special meaning. A backslash followed
    by one of the characters "a", "b", "t", "n", "v", "f", "r", "\", or
    a double quote stands for the ASCII character BEL(007), BS(010),
    HT(011), LF(012), VT(013), FF(014), CR(015), BACKSLASH(134), or
    DOUBLEQUOTE(042) respectively, where the numbers are octal codes. A
    backslash followed by one to three octal digits stands for the byte
    whose octal code is expressed by the digits. All other backslash
    sequences are illegal. All non-printable ASCII characters must be
    escaped.

Lists:
    Lists are sequences of expressions separated by blanks and
    surrounded by parentheses. All expression types are acceptable
    within a list, including sub-lists.


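The string rules are easy to get wrong when djvused files are generated or read by other programs. A minimal sketch of a writer and a reader for djvused strings, plus a serializer for nested lists; the names quote, unquote, Symbol, and to_sexpr are hypothetical. The writer always emits three octal digits for non-printable bytes, so a digit that follows an escape can never be mistaken for part of it.

```python
# Hypothetical helpers (not part of DjVuLibre).
_ESCAPES = {7: "\\a", 8: "\\b", 9: "\\t", 10: "\\n", 11: "\\v",
            12: "\\f", 13: "\\r", 0x5C: "\\\\", 0x22: '\\"'}
_UNESCAPES = {seq[1]: byte for byte, seq in _ESCAPES.items()}


def quote(text):
    """Encode text as UTF-8 and write it as a djvused string literal."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in _ESCAPES:
            out.append(_ESCAPES[byte])
        elif 0x20 <= byte < 0x7F:            # printable ASCII passes through
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)      # three octal digits
    return '"' + "".join(out) + '"'


def unquote(literal):
    """Decode a djvused string literal back to text (assumes UTF-8 bytes)."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("not a double-quoted string")
    body, out, i = literal[1:-1], bytearray(), 0
    while i < len(body):
        ch = body[i]
        i += 1
        if ch != "\\":
            out += ch.encode("utf-8")
        elif i < len(body) and body[i] in _UNESCAPES:
            out.append(_UNESCAPES[body[i]])
            i += 1
        elif i < len(body) and body[i] in "01234567":
            j = i
            while j < len(body) and j < i + 3 and body[j] in "01234567":
                j += 1                       # one to three octal digits
            value = int(body[i:j], 8)
            if value > 0xFF:
                raise ValueError("octal escape out of byte range")
            out.append(value)
            i = j
        else:
            raise ValueError("illegal backslash sequence")
    return out.decode("utf-8")


class Symbol(str):
    """A str written bare, e.g. page, maparea, or #FFFFFF."""


def to_sexpr(value):
    """Serialize nested lists of Symbols, ints, and strs."""
    if isinstance(value, Symbol):
        return str(value)
    if isinstance(value, bool):
        raise TypeError("booleans have no djvused representation")
    if isinstance(value, int):
        if value < 0:
            raise ValueError("integers are unsigned digit sequences")
        return str(value)
    if isinstance(value, str):
        return quote(value)
    if isinstance(value, (list, tuple)):
        return "(" + " ".join(to_sexpr(item) for item in value) + ")"
    raise TypeError(f"unsupported value: {value!r}")
```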
Hidden text syntax
The building blocks of the hidden text syntax are lists representing
each structural component of the hidden text. Structural components
have the following form:

    (type xmin ymin xmax ymax ... )

The symbol type must be one of page, column, region, para, line, word,
or char, listed here in decreasing order of importance. The integers
xmin, ymin, xmax, and ymax represent the coordinates of a rectangle
indicating the position of the structural component in the page.
Coordinates are measured in pixels and have their origin at the bottom
left corner of the page. The remaining expressions in the list are
either a single string representing the encoded text associated with
this structural component, or a sequence of structural components with
a lesser type.

The hidden text for each page is simply represented by a single
structural element of type page. Various levels of structural
information are acceptable. For instance, the page level component
might only specify a page level string, or might only provide a list
of lines, or might provide a full hierarchy down to the individual
characters.

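Hidden text often comes from OCR engines that report word boxes with the origin at the top left corner and y increasing downward. A minimal sketch that flips such boxes into the bottom-left convention and assembles a page/line/word hierarchy, computing each line rectangle as the union of its word rectangles. Rectangles are written as xmin ymin xmax ymax, the order found in print-txt output, where a page of width W and height H starts with (page 0 0 W H). The input layout and the names flip_box and page_expr are hypothetical.

```python
# Hypothetical helpers (not part of DjVuLibre).


def quote(text):
    """Minimal djvused string quoting (escapes '"', backslash, non-printables)."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in (0x22, 0x5C):             # double quote, backslash
            out.append("\\" + chr(byte))
        elif 0x20 <= byte < 0x7F:            # printable ASCII
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)      # three octal digits
    return '"' + "".join(out) + '"'


def flip_box(left, top, right, bottom, page_height):
    """Top-left-origin box -> (xmin, ymin, xmax, ymax) with bottom-left origin."""
    return left, page_height - bottom, right, page_height - top


def page_expr(width, height, lines):
    """lines: list of lines, each a list of (text, left, top, right, bottom)."""
    line_exprs = []
    for words in lines:
        if not words:
            continue                         # skip lines without words
        boxes = [flip_box(left, top, right, bottom, height)
                 for _, left, top, right, bottom in words]
        word_exprs = ["(word %d %d %d %d %s)" % (*box, quote(word[0]))
                      for box, word in zip(boxes, words)]
        # The line rectangle is the union of its word rectangles.
        xmin = min(b[0] for b in boxes)
        ymin = min(b[1] for b in boxes)
        xmax = max(b[2] for b in boxes)
        ymax = max(b[3] for b in boxes)
        line_exprs.append("(line %d %d %d %d %s)"
                          % (xmin, ymin, xmax, ymax, " ".join(word_exprs)))
    return "(page 0 0 %d %d%s)" % (width, height,
                                   "".join(" " + e for e in line_exprs))
```

The resulting expression can be given to set-txt, either from a file or inline after the command and terminated by a line containing a single period.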

Outline/Bookmark syntax
The outline syntax is a single list of the form

    (bookmarks ...)

The first element of the list is the symbol bookmarks. The subsequent
elements are lists representing the top-level outline entries. Each
outline entry is represented by a list with the following form:

    (title url ... )

The string title is the title of the outline entry. The string url is
composed of the hash character ("#") followed by either the component
file identifier or the page number corresponding to the outline entry.
The remaining expressions describe subentries of this outline entry.

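A nested table of contents maps directly onto this syntax. The sketch below (hypothetical function names, with the same minimal string quoting used elsewhere) turns (title, target, children) tuples into a bookmarks list, where target is a page number or a component file identifier, and wraps the result as inline data for set-outline.

```python
# Hypothetical helpers (not part of DjVuLibre).


def quote(text):
    """Minimal djvused string quoting (escapes '"', backslash, non-printables)."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in (0x22, 0x5C):             # double quote, backslash
            out.append("\\" + chr(byte))
        elif 0x20 <= byte < 0x7F:            # printable ASCII
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)      # three octal digits
    return '"' + "".join(out) + '"'


def outline_entry(title, target, children=()):
    """(title url subentry ...) with url = '#' + page number or file id."""
    parts = [quote(title), quote("#%s" % target)]
    parts += [outline_entry(*child) for child in children]
    return "(" + " ".join(parts) + ")"


def bookmarks(entries):
    return "(" + " ".join(["bookmarks"] + [outline_entry(*e) for e in entries]) + ")"


def set_outline_script(entries):
    """set-outline reads inline data until a line containing a single period."""
    return "set-outline\n" + bookmarks(entries) + "\n.\n"
```

Saved to a file, the script can be applied with djvused myfile.djvu -f outline.dsed -s, where outline.dsed is a hypothetical file name.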

Annotation syntax
Annotations are represented by a sequence of annotation expressions.
The following annotation expressions are recognized:

(background color)
    Specify the color of the viewer area surrounding the DjVu image.
    Colors are represented with the X11 hexadecimal syntax #RRGGBB. For
    instance, #000000 is black and #FFFFFF is white.

(zoom zoomvalue)
    Specify the initial zoom factor of the image. Argument zoomvalue
    can be one of stretch, one2one, width, or page, or can be composed
    of the letter d followed by a number in the range 1 to 999
    representing a zoom factor (for instance, d300 or d150).

(mode modevalue)
    Specify the initial display mode of the image. Argument modevalue
    is one of color, bw, fore, or back.

(align horzalign vertalign)
    Specify how the image should be aligned on the viewer surface. By
    default the image is located in the center. Argument horzalign can
    be one of left, center, or right. Argument vertalign can be one of
    top, center, or bottom.

(maparea url comment area ...)
    Define a hyperlink for the specified destination.

    Argument url can have one of the following forms:

        href
        (url href target)

    where href is a string representing the destination and target is a
    string representing the target frame for the hyperlink, as defined
    by the HTML anchor tag <A>. The destination string href can be an
    arbitrary URL or can be composed of the hash character ("#")
    followed by either a component file identifier or a page number.
    Page numbers may be prefixed with an optional sign to represent a
    page displacement. For instance, the strings "#-1" and "#+1" can be
    used to access the previous page and the next page.

    Argument comment is a string that might be displayed by the viewer
    when the user moves the mouse over the hyperlink.

    Argument area defines the shape and the location of the hyperlink.
    The following forms are recognized:

        (rect xmin ymin width height)
        (oval xmin ymin width height)
        (poly x0 y0 x1 y1 ... )
        (text xmin ymin width height) - Not implemented.
        (line x0 y0 x1 y1) - Not implemented.

    All parameters are numbers representing coordinates. Coordinates
    are measured in pixels and have their origin at the bottom left
    corner of the page.

    The remaining expressions in the maparea list represent the visual
    effect associated with the hyperlink.

    A first set of options defines how borders are drawn for rect, oval,
    poly, or text hyperlink areas.

        (none)
        (xor)
        (border color)
        (shadow_in [thickness])
        (shadow_out [thickness])
        (shadow_ein [thickness])
        (shadow_eout [thickness])

    where parameter color has syntax #RRGGBB as described above, and
    parameter thickness is an integer in the range 1 to 32. The last
    four border options are only supported for rect hyperlink areas.
    The default border is a simple black line. Border options do not
    apply to line areas.

    When a border option is specified, the border becomes visible when
    the user moves the mouse over the hyperlink. The border may be made
    always visible by using the following option:

        (border_avis)

    The following two options may be used with rect hyperlink areas.
    The complete area will be highlighted using the specified color at
    the specified opacity (0-100, default 50).

        (hilite color)
        (opacity op) - Not implemented.

    This is often used with an empty URL for simply emphasizing a
    specific segment of an image.

    The following three options may be used with line areas to specify
    an optional ending arrow, the line width, and the line color. The
    default is a black line with width 1 and without an arrow.

        (arrow) - Not implemented.
        (width w) - Not implemented.
        (lineclr color) - Not implemented.

    Finally, the following three options can be used with text areas.
    The default background color is transparent. The default text color
    is black. The pushpin option indicates that the text is symbolized
    by a small pushpin icon. Clicking the icon reveals the text.

        (backclr bkcolor) - Not implemented.
        (textclr txtcolor) - Not implemented.
        (pushpin) - Not implemented.

(metadata ... (key value) ... )
    Define meta-data entries. Each entry is identified by a symbol key
    representing the nature of the meta-data entry. Typical keys include
    year, booktitle, editor, author, etc. It is suggested to use the
    same key names as the BibTeX bibliography system. String value
    represents the value associated with the corresponding key.

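A minimal sketch that assembles a rect hyperlink from the forms above: it converts a box measured from the top left corner into the bottom-left convention, formats colors with the #RRGGBB syntax, and writes either a plain href or a (url href target) pair. The function names are hypothetical, and only the implemented options border and hilite are covered. The text above does not say whether set-ant merges with or replaces existing annotations, so a cautious workflow appends the new expression to the output of print-ant before calling set-ant.

```python
# Hypothetical helpers (not part of DjVuLibre).


def quote(text):
    """Minimal djvused string quoting (escapes '"', backslash, non-printables)."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in (0x22, 0x5C):             # double quote, backslash
            out.append("\\" + chr(byte))
        elif 0x20 <= byte < 0x7F:            # printable ASCII
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)      # three octal digits
    return '"' + "".join(out) + '"'


def rgb(red, green, blue):
    """Format a color with the #RRGGBB syntax."""
    if not all(0 <= c <= 255 for c in (red, green, blue)):
        raise ValueError("color components must be in 0..255")
    return "#%02X%02X%02X" % (red, green, blue)


def rect_from_top_left(left, top, width, height, page_height):
    """Box measured from the top left corner -> (xmin, ymin, width, height)."""
    return left, page_height - top - height, width, height


def rect_maparea(href, comment, rect, target=None, border=None, hilite=None):
    url = quote(href) if target is None else "(url %s %s)" % (quote(href), quote(target))
    parts = ["maparea", url, quote(comment), "(rect %d %d %d %d)" % rect]
    if border is not None:
        parts.append("(border %s)" % border)   # shown when the mouse is over the link
    if hilite is not None:
        parts.append("(hilite %s)" % hilite)   # fill color for rect areas
    return "(" + " ".join(parts) + ")"
```

For example, a link to the next page drawn over a 200x30 pixel box whose top left corner sits at (100, 50) on a 1000-pixel-high page would be rect_maparea("#+1", "Next page", rect_from_top_left(100, 50, 200, 30, 1000), border=rgb(255, 0, 0)).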

The current version of program djvused only supports selecting one
component file or all component files. There is no way to select only
a few component files.


This program was initially written by Léon Bottou
<leonb@users.sourceforge.net> and was improved by Yann Le Cun
<profshadoko@users.sourceforge.net>, Florin Nicsa, Bill Riemers
<docbill@sourceforge.net>, and many others.


SEE ALSO
djvu(1), djvutxt(1), djvmcvt(1), djvudump(1), bzz(1)



DjVuLibre-3.5                    5/22/2005                    DJVUSED(1)