DJVUSED(1)                    DjVuLibre-3.5                    DJVUSED(1)



       djvused - Multi-purpose DjVu document editor.


       djvused [options] djvufile


       Program djvused is a powerful command line tool for manipulating
       multi-page documents, creating or editing annotation chunks, creating
       or editing hidden text layers, pre-computing thumbnail images, and
       more. The program first reads the DjVu document djvufile and executes
       a number of djvused commands.

       Djvused commands can be read from a specific file (when option -f is
       specified), read from the command line (when option -e is specified),
       or read from the standard input (the default).


       -v     Cause djvused to print a command line prompt before reading
              commands and a brief message describing how each command was
              executed. This option is very useful for debugging djvused
              scripts and also for interactively entering djvused commands on
              the standard input.

       -f scriptfile
              Cause djvused to read commands from file scriptfile.

       -e command
              Cause djvused to execute the commands specified by the option
              argument command. It is advisable to surround the djvused
              commands with single quotes in order to prevent unwanted shell
              expansion.

       -s     Cause djvused to save the file djvufile after executing the
              specified commands. This is similar to executing command save
              immediately before terminating the program.

       -u     Cause djvused to print hidden text and annotations as UTF-8
              instead of encoding non-ASCII characters with octal escape
              sequences for maximal portability. This option is convenient
              for manually editing or viewing the djvused output. This option
              also causes the emission of a UTF-8 BOM under Windows.

       -n     Cause djvused to disregard save commands. This is useful for
              debugging djvused scripts without overwriting files on your
              disk.


       There are many ways to use program djvused. The following examples
       illustrate some common uses of this program.


   Obtaining the size of a page
       Command size outputs the width and height of the selected pages using
       an HTML-friendly syntax. For instance, the following command prints
       the size of page 3 of document myfile.djvu.

           djvused myfile.djvu -e 'select 3; size'
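       When the size output feeds another script, a small filter can pull
       out the two numbers. The helper below, page_dims, is hypothetical and
       not part of DjVuLibre. It assumes that each line of size output has
       the form width=W height=H, possibly followed by other attributes. That
       form is inferred from the EMBED-friendly description and is not
       guaranteed.

```shell
# page_dims: hypothetical helper, not part of DjVuLibre.
# Reads `size` output on stdin and prints "W H" for each matching line.
# Assumption: lines look like "width=W height=H ..." (EMBED attribute style).
page_dims() {
  sed -n 's/.*width=\([0-9][0-9]*\) height=\([0-9][0-9]*\).*/\1 \2/p'
}

# Demonstration on a literal line rather than real djvused output.
echo 'width=2550 height=3300' | page_dims
# prints: 2550 3300
```

       With a real document, the filter would be used as
       djvused myfile.djvu -e 'select 3; size' | page_dims.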

   Extracting the hidden text
       Command print-pure-txt outputs the text associated with a page or a
       document. For instance, the following shell command outputs the text
       for the entire document. Lines and pages are delimited by the usual
       control characters.

           djvused myfile.djvu -e 'print-pure-txt'

       Command print-txt produces a more extensive output describing the
       structure and the location of the text components. The syntax of this
       output is described later in this man page. For instance, the
       following shell command outputs extended text information for page 3
       of document myfile.djvu.

           djvused myfile.djvu -e 'select 3; print-txt'
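       print-pure-txt separates pages with form feed characters, so a
       standard awk can split its output into one file per page. The helper
       split_pages and its PREFIX-NNN.txt naming scheme are hypothetical.
       This sketch assumes that pages appear in document order and that form
       feeds occur only as page separators.

```shell
# split_pages PREFIX: hypothetical helper, not part of DjVuLibre.
# Reads print-pure-txt output on stdin and writes the text of the Nth page
# to PREFIX-NNN.txt, using the form feed character as the record separator.
split_pages() {
  awk -v prefix="$1" '
    BEGIN { RS = "\f" }
    {
      file = sprintf("%s-%03d.txt", prefix, NR)
      printf "%s", $0 > file
      close(file)
    }'
}
```

       A possible invocation is
       djvused myfile.djvu -e 'print-pure-txt' | split_pages page,
       which would write page-001.txt, page-002.txt, and so on.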

   Extracting the annotations
       Annotation data can be extracted using command print-ant. The syntax
       of the annotation data is described later in this man page. For
       instance, the following shell command outputs the annotation data for
       the first page of document myfile.djvu.

           djvused myfile.djvu -e 'select 1; print-ant'

       Command print-ant only prints the annotations stored in the selected
       component file. Command print-merged-ant also retrieves annotations
       from all the component files referenced by the current page (using
       INCL chunks) and prints the merged information.


   Dumping/restoring annotations and text
       Three commands, output-txt, output-ant, and output-all, produce
       djvused scripts. For instance, the following shell command produces a
       djvused script, myfile.dsed, that recreates all the text and
       annotation data in document myfile.djvu.

           djvused myfile.djvu -e 'output-all' > myfile.dsed

       Script myfile.dsed is a text file that can be easily edited. The
       following shell command then recreates the text and annotation
       information in file myfile.djvu.

           djvused myfile.djvu -f myfile.dsed -s


   Extracting a page
       Both commands save-page and save-page-with create a DjVu file
       representing the selected component file of a document. The following
       shell command, for instance, creates a file p05.djvu containing page 5
       of document myfile.djvu.

           djvused myfile.djvu -e 'select 5; save-page p05.djvu'

       Each page of a document might import data from another component file
       using the so-called inclusion (INCL) chunks. Command save-page then
       produces a file with unresolved references to imported data. Such a
       file should then be made part of a multi-page document containing the
       required data in other component files. On the other hand, command
       save-page-with copies all the imported data into the output file. This
       file is directly usable. Yet collecting several such files into a
       multi-page document might lead to useless data replication.
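       Because select picks one page at a time, one way to split a whole
       document into standalone pages is to generate one select and
       save-page-with pair per page, then feed the result to djvused on its
       standard input. The generator emit_split_script and the pNNN.djvu
       file names are hypothetical.

```shell
# emit_split_script NPAGES: hypothetical helper, not part of DjVuLibre.
# Prints a djvused script that saves every page as a self-contained file,
# one "select N; save-page-with pNNN.djvu" line per page.
emit_split_script() {
  i=1
  while [ "$i" -le "$1" ]; do
    printf 'select %d; save-page-with p%03d.djvu\n' "$i" "$i"
    i=$((i + 1))
  done
}
```

       Assuming that command n prints only the page count, a possible
       invocation is n=$(djvused myfile.djvu -e n) followed by
       emit_split_script "$n" | djvused myfile.djvu. Here save-page-with is
       preferred over save-page so that each output file is usable on its
       own, at the price of duplicated shared data.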


   Pre-computing thumbnails
       Command set-thumbnails constructs thumbnails that can later be
       displayed by DjVu viewers. The following shell command, for instance,
       computes thumbnails of size 64x64 pixels for all pages of file
       myfile.djvu.

           djvused myfile.djvu -e 'set-thumbnails 64' -s


       Command lines might contain zero, one, or more djvused commands and an
       optional comment. Multiple djvused commands must be separated by a
       semicolon character ';'. Comments are introduced by the '#' character
       and extend until the end of the command line.


   Selection commands
       Multi-page DjVu documents are composed of a number of component files.
       Most component files describe a specific page of a document. Some
       component files contain information shared by several pages, such as
       shared image data, shared annotations or thumbnails. Many djvused
       commands operate on selected component files. All component files are
       initially selected. The following commands are useful for changing
       the selection.

       n      Print the total number of pages in the document.

       ls     List all component files in the document. Each line contains an
              optional page number, a letter describing the component file
              type, the size of the component file, and the identifier of the
              component file. Component file type letters P, I, A, and T
              respectively stand for page data, shared image data, shared
              annotation data, and thumbnail data. Page numbers are only
              listed for component files containing page data. When it is
              set, the optional page title (see command set-page-title below)
              is displayed after the component file identifier.

       select [fileid]
              Select the component file identified by argument fileid.
              Argument fileid must be either a page number or a component file
              identifier. The select command selects all component files when
              the argument fileid is omitted.

       select-shared-ant
              Select a component file containing shared annotations. Only one
              such component file is supported by the current DjVu software.
              This component file usually contains annotations pertaining to
              the whole document as opposed to specific pages. An error
              message is displayed if there is no such component file.

       create-shared-ant
              Create and select a component file containing shared
              annotations. This command only selects the shared annotation
              component file if such a component file already exists.
              Otherwise it creates a new shared annotation component file and
              makes sure that it is imported by all pages in the document.

       showsel
              Show the currently selected component files with the same
              format as command ls.


   Text and annotation commands
       print-pure-txt
              Print the text stored in the hidden text layer of the selected
              pages. A similar capability is offered by program djvutxt.
              Structural information is sometimes represented by control
              characters. Text from different pages is delimited by form feed
              characters ("\f"). Lines are delimited by newline characters
              ("\n"). Columns, regions, and paragraphs are sometimes
              delimited by vertical tab ("\013"), group separator ("\035") and
              unit separator ("\037") characters respectively.

       print-txt
              Print extensive hidden text information for the selected pages.
              This information describes the structure of the text on the
              document page and locates the structural elements in the page
              image. The syntax of this output is described later in this man
              page.

       remove-txt
              Remove the hidden text information from the selected component
              files. For instance, executing commands select and remove-txt
              removes all hidden text information from the DjVu document.

       set-txt [djvusedtxtfile]
              Insert hidden text information into the selected pages. The
              optional argument djvusedtxtfile names a file containing the
              hidden text information. This file must contain data similar to
              what is produced by command print-txt. When the optional
              argument is omitted, the program reads the hidden text
              information from the djvused script until reaching an
              end-of-file or a line containing a single period.

       output-txt
              Print a djvused script that reconstructs the hidden text
              information for the selected pages. This script can later be
              edited and executed by invoking program djvused with option -f.

       print-ant
              Print the annotations of the selected component file. The
              annotation data is represented using a simple syntax described
              later in this document.

       print-merged-ant
              Merge the annotations stored in the selected component files
              with the annotations imported from other component files, such
              as the shared annotation component file. The annotation data is
              represented using a simple syntax described later in this
              document.

       remove-ant
              Remove the annotation information from the selected component
              files. For instance, executing commands select and remove-ant
              removes all annotation information from the DjVu document.

       set-ant [djvusedantfile]
              Insert annotations into the selected component file. The
              optional argument djvusedantfile names a file containing the
              annotation data. This file must contain data similar to what is
              produced by command print-ant. When the optional argument is
              omitted, the program reads the annotation data from the djvused
              script itself until reaching an end-of-file or a line
              containing a single period.

       output-ant
              Print a djvused script that reconstructs the annotation
              information for the selected pages. This script can later be
              edited and executed by invoking program djvused with option -f.

       print-meta
              Print the metadata part of the annotations for the selected
              component file. This command displays a subset of the
              information printed by command print-ant using a different
              syntax. Metadata are organized as key-value pairs. Each printed
              line contains the key name (such as author, title, etc.),
              followed by a tab character ("\t") and a double-quoted string
              representing the UTF-8 encoded metadata value.

       remove-meta
              Remove the metadata part of the annotations of the selected
              component files.

       set-meta [djvusedmetafile]
              Set the metadata part of the annotations of the selected
              component file. The remaining part of the annotations is left
              unchanged. The optional argument djvusedmetafile names a file
              containing the metadata. This file must contain data similar to
              what is produced by command print-meta. When the optional
              argument is omitted, the program reads the metadata from the
              djvused script itself until reaching an end-of-file or a line
              containing a single period.

       print-xmp
              Print the XMP metadata string contained in the annotation chunk
              of the selected component file. In fact, this command displays
              a subset of the information printed by command print-ant.

       remove-xmp
              Remove the XMP tag from the annotation chunk of the selected
              component file.

       set-xmp [xmpfile]
              Set the XMP metadata part of the annotations of the selected
              component file. The remaining part of the annotations is left
              unchanged. The optional argument xmpfile names a file
              containing the XMP metadata in a format similar to that
              produced by command print-xmp. When the optional argument is
              omitted, the program reads the XMP annotation data from the
              djvused script itself until reaching an end-of-file or a line
              containing a single period.

       output-all
              Print a djvused script that reconstructs both the hidden text
              and the annotation information for the selected pages. This
              script can later be edited and executed by invoking program
              djvused with option -f.
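       The inline form of set-meta, where the data follows the command and
       ends with a line containing a single period, is convenient for
       scripts. The generator emit_meta_script below is a hypothetical
       sketch. It assumes that document-wide metadata belongs in the shared
       annotation component, so it starts with create-shared-ant, and that
       the values are printable ASCII. It writes each entry in the
       print-meta format: key, tab, then a double-quoted value.

```shell
# emit_meta_script KEY=VALUE...: hypothetical helper, not part of DjVuLibre.
# Prints a djvused script that selects (or creates) the shared annotation
# component and sets its metadata inline, terminated by a single period.
# Assumption: keys and values are printable ASCII.
emit_meta_script() {
  printf 'create-shared-ant\nset-meta\n'
  for kv in "$@"; do
    key=${kv%%=*}
    # Escape backslashes and double quotes as the djvused string syntax requires.
    value=$(printf '%s' "${kv#*=}" | sed 's/[\\"]/\\&/g')
    printf '%s\t"%s"\n' "$key" "$value"
  done
  printf '.\n'
}
```

       Since djvused reads commands from the standard input by default, the
       output could be applied with
       emit_meta_script 'Title=My Book' 'Author=Jane Doe' | djvused myfile.djvu -s.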

   Outline/bookmarks commands
       print-outline
              Print the outline of the document. Nothing is printed if the
              document contains no outline.

       remove-outline
              Remove the outline from the document.

       set-outline [djvusedoutlinefile]
              Insert outline information into the document. The optional
              argument djvusedoutlinefile names a file containing the outline
              information. This file must contain data similar to what is
              produced by command print-outline. When the optional argument
              is omitted, the program reads the outline information from the
              djvused script until reaching an end-of-file or a line
              containing a single period.

   Thumbnail commands
       set-thumbnails sz
              Compute thumbnails of size sz x sz pixels and insert them into
              the document. DjVu viewers can later display these thumbnails
              very efficiently without the need to download the data for each
              page. Typical thumbnail sizes range from 48 to 128 pixels.

       remove-thumbnails
              Remove the pre-computed thumbnails from the DjVu document. New
              thumbnails can then be computed using command set-thumbnails.


   Save commands
       The above commands only modify the memory image of the DjVu document.
       The following commands provide means to save the modified data into
       the file system.

       save   Save the modified DjVu document back into the input file
              djvufile specified by the arguments of the program djvused.
              Nothing is done if the DjVu file was not modified. Passing
              option -s to program djvused is equivalent to executing command
              save before exiting the program.

       save-bundled filename
              Save the current DjVu document as a bundled multi-page DjVu
              document named filename. A similar capability is offered by
              program djvmcvt.

       save-indirect filename
              Save the current DjVu document as an indirect multi-page DjVu
              document. The index file of the indirect document will be named
              filename. All other files composing the indirect document will
              be saved into the same directory as the index file. A similar
              capability is offered by program djvmcvt.

       save-page filename
              Save the selected component file into DjVu file filename. The
              selected component file might import data from another
              component file using the so-called inclusion (INCL) chunks.
              This command then produces a file with unresolved references to
              imported data. Such a file should then be made part of a
              multi-page document containing the required data in other
              component files.

       save-page-with filename
              Save the selected component file into DjVu file filename. All
              data imported from other component files is copied into the
              output file as well. This command always produces a usable DjVu
              file. On the other hand, collecting several such files into a
              multi-page document might lead to useless data replication.


   Miscellaneous commands
       help   Display a help message listing all commands supported by
              djvused.

       dump   Display the EA IFF 85 structure of the document or of the
              selected component file. A similar capability is offered by
              program djvudump.

       size   Display the width and the height of the selected pages. The
              dimensions of each page are displayed using a syntax suitable
              for direct insertion into the <EMBED...></EMBED> tags. This
              command also displays the default page orientation when it is
              different from zero.

       set-rotation [+-]rot
              Change the default orientation of the selected pages. The
              orientation is expressed as an integer in range 0..3
              representing a number of 90 degree counter-clockwise rotations.
              When the argument is preceded by a sign + or -, argument rot
              counts how many additional 90 degree counter-clockwise rotations
              should be applied to the page. Otherwise, argument rot
              represents the desired absolute page orientation. Only DjVu
              pages can be rotated. Pages represented as a raw IW44 image
              cannot be rotated.

       set-dpi dpi
              Set the resolution of the page image in dots per inch. Argument
              dpi should be in range 25..6000.

       set-page-title title
              Set a page title for the selected page. When page titles are
              available, recent versions of the DjVuLibre viewers display
              these page titles instead of page numbers and also accept them
              in page selection options. Command ls can be used to see both
              the page titles and page identifiers. To unset a page title,
              simply make it equal to the page identifier.
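       The orientation arithmetic of set-rotation and the range of set-dpi
       can be written out as below. The functions next_rotation and valid_dpi
       are hypothetical. They only restate the documented rules for
       illustration. Wrapping a relative result into 0..3 with modulo 4
       arithmetic is an assumption about how out-of-range sums are handled.

```shell
# next_rotation CURRENT ARG: hypothetical helper, not part of DjVuLibre.
# A signed ARG (+n or -n) adds n 90-degree counter-clockwise turns to
# CURRENT; an unsigned ARG is the absolute orientation.
# Assumption: relative results wrap modulo 4 into the range 0..3.
next_rotation() {
  case $2 in
    [+-]*) r=$(( ($1 + $2) % 4 )); [ "$r" -lt 0 ] && r=$((r + 4)) ;;
    *)     r=$2 ;;
  esac
  if [ "$r" -ge 0 ] && [ "$r" -le 3 ]; then
    echo "$r"
  else
    echo "rotation must be in range 0..3" >&2
    return 1
  fi
}

# valid_dpi DPI: succeeds when DPI lies in the documented range 25..6000.
valid_dpi() {
  [ "$1" -ge 25 ] && [ "$1" -le 6000 ]
}
```

       Under this model, a page at orientation 3 ends up at orientation 1
       after set-rotation +2, and set-rotation -3 applied to orientation 1
       yields orientation 2.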


       Djvused uses a simple parenthesized syntax to represent both
       annotations and hidden text.

       *  This syntax is the native syntax used by DjVu for storing
          annotations. Program djvused simply compresses the annotation data
          using the bzz(1) algorithm.

       *  This syntax differs from the native syntax used by DjVu for storing
          the hidden text. Program djvused performs the translations between
          the compact binary representation used by DjVu and the easily
          modifiable parenthesized syntax.

   General syntax
       Djvused files are ASCII text files. The legal characters in djvused
       files are the printable ASCII characters and the space, tab, cr, and
       nl characters. Using other characters has undefined results.

       Djvused files are composed of a sequence of expressions separated by
       blank characters (space, tab, cr, or nl). There are four kinds of
       expressions, namely integers, symbols, strings and lists.

       Integers:
              Integer numbers are represented by one or more digits, with the
              usual interpretation.

       Symbols:
              Symbols, or identifiers, are sequences of printable ASCII
              characters representing a name or a keyword. Acceptable
              characters are the alphanumeric characters, the underscore "_",
              the minus character "-", and the hash character "#". Names
              should not begin with a digit or a minus character.

       Strings:
              Strings denote an arbitrary sequence of bytes, usually
              interpreted as a sequence of UTF-8 encoded characters. Strings
              in djvused files are similar to strings in the C language. They
              are surrounded by double quote characters. Certain sequences of
              characters starting with a backslash ("\") have a special
              meaning. A backslash followed by letter "a", "b", "t", "n", "v",
              "f", or "r", by a backslash, or by a double quote stands for the
              ASCII character BEL(007), BS(010), HT(011), LF(012), VT(013),
              FF(014), CR(015), BACKSLASH(134) or DOUBLEQUOTE(042)
              respectively, with codes given in octal. A backslash followed by
              one to three digits stands for the byte whose octal code is
              expressed by the digits. All other backslash sequences are
              illegal. All non-printable ASCII characters must be escaped.

       Lists: Lists are sequences of expressions separated by blanks and
              surrounded by parentheses. All expression types are acceptable
              within a list, including sub-lists.
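       The string escaping rules above can be applied mechanically. The
       helper djvused_quote is a hypothetical POSIX sh sketch. It dumps the
       bytes of its argument with od and escapes the backslash and the
       double quote. Printable ASCII characters are copied as they are.
       Every other byte, including each byte of a UTF-8 sequence, is written
       as a three-digit octal escape, which the rules above allow.

```shell
# djvused_quote STRING: hypothetical helper, not part of DjVuLibre.
# Prints STRING as a double-quoted djvused string literal.
djvused_quote() {
  printf '"'
  # od prints each byte as a three-digit octal number, e.g. 101 for "A".
  for byte in $(printf '%s' "$1" | od -An -v -to1); do
    case $byte in
      134) printf '\\\\' ;;                 # backslash    -> \\
      042) printf '\\"' ;;                  # double quote -> \"
      *)
        if [ $((0$byte)) -ge 32 ] && [ $((0$byte)) -le 126 ]; then
          printf '%b' "\\0$byte"            # printable ASCII: copy as is
        else
          printf '\\%s' "$byte"             # other bytes: \ooo octal escape
        fi ;;
    esac
  done
  printf '"\n'
}

djvused_quote 'Say "hi"'
# prints: "Say \"hi\""
```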


   Hidden text syntax
       The building blocks of the hidden text syntax are lists representing
       each structural component of the hidden text. Structural components
       have the following form:

           (type xmin ymin xmax ymax ... )

       The symbol type must be one of page, column, region, para, line, word,
       or char, listed here by decreasing order of importance. The integers
       xmin, ymin, xmax, and ymax represent the coordinates of a rectangle
       indicating the position of the structural component in the page.
       Coordinates are measured in pixels and have their origin at the bottom
       left corner of the page. The remaining expressions in the list are
       either a single string representing the encoded text associated with
       this structural component, or a sequence of structural components with
       a lesser type.

       The hidden text for each page is simply represented by a single
       structural element of type page. Various levels of structural
       information are acceptable. For instance, the page level component
       might only specify a page level string, or might only provide a list
       of lines, or might provide a full hierarchy down to the individual
       characters.
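       A page that provides only a list of lines is the simplest multi-level
       form. The hypothetical helper txt_page builds one from caller-supplied
       line boxes. The page box is assumed to span the whole page,
       (0, 0)-(W, H). Each input line is assumed to have the form
       "xmin ymin xmax ymax text", where the text contains no double quotes
       or backslashes.

```shell
# txt_page W H: hypothetical helper, not part of DjVuLibre.
# Reads "xmin ymin xmax ymax text" lines on stdin and prints a page-level
# hidden text component whose children are line components.
# Note: runs of blanks inside the text are collapsed to single spaces.
txt_page() {
  awk -v w="$1" -v h="$2" '
    BEGIN { printf "(page 0 0 %d %d", w, h }
    NF >= 5 {
      text = $5
      for (i = 6; i <= NF; i++) text = text " " $i
      printf "\n (line %d %d %d %d \"%s\")", $1, $2, $3, $4, text
    }
    END { print ")" }'
}
```

       The result could be placed after select N and set-txt in a djvused
       script, followed by a line containing a single period.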


   Outline/Bookmark syntax
       The outline syntax is a single list of the form

           (bookmarks ...)

       The first element of the list is symbol bookmarks. The subsequent
       elements are lists representing the top-level outline entries. Each
       outline entry is represented by a list with the following form:

           (title url ... )

       The string title is the title of the outline entry. The destination
       string url can be either an arbitrary percent encoded URL, or composed
       of the hash character ("#") followed by a page name or number, or
       composed of the question mark character ("?") followed by cgi-style
       arguments interpreted by the djvu viewer. The remaining expressions in
       the list describe subentries of this outline entry.
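       A flat outline, with no subentries, can be generated from a simple
       table of contents. The helper outline_from_toc and its input format,
       "title<TAB>page" per line, are hypothetical. The titles are assumed to
       contain no double quotes or backslashes.

```shell
# outline_from_toc: hypothetical helper, not part of DjVuLibre.
# Reads "title<TAB>page" lines on stdin and prints a (bookmarks ...) list
# with one ("title" "#page") entry per input line and no subentries.
outline_from_toc() {
  awk '
    BEGIN { FS = "\t"; printf "(bookmarks" }
    NF >= 2 { printf "\n (\"%s\" \"#%s\")", $1, $2 }
    END { print ")" }'
}
```

       A possible use is outline_from_toc < toc.txt > outline.dsed, then
       djvused myfile.djvu -e 'set-outline outline.dsed' -s, where toc.txt
       is a hypothetical input file.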


   Annotation syntax
       Annotations are represented by a sequence of annotation expressions.
       The following annotation expressions are recognized:

       (background color)
              Specify the color of the viewer area surrounding the DjVu image.
              Colors are represented with the X11 hexadecimal syntax #RRGGBB.
              For instance, #000000 is black and #FFFFFF is white.

       (zoom zoomvalue)
              Specify the initial zoom factor of the image. Argument
              zoomvalue can be one of stretch, one2one, width, page, or
              composed of the letter d followed by a number in range 1 to 999
              representing a zoom factor (such as d300 or d150).

       (mode modevalue)
              Specify the initial display mode of the image. Argument
              modevalue is one of color, bw, fore, or back.

       (align horzalign vertalign)
              Specify how the image should be aligned on the viewer surface.
              By default the image is located in the center. Argument
              horzalign can be one of left, center, or right. Argument
              vertalign can be one of top, center, or bottom.

       (maparea url comment area ...)
              Define a hyperlink for the specified destination.

              Argument url can have one of the following forms:

                     href
                     (url href target)

              where href is a string representing the destination and target
              is a string representing the target frame for the hyperlink, as
              defined by the HTML anchor tag <A>. The destination string href
              can be either an arbitrary percent encoded URL, or composed of
              the hash character ("#") followed by a page name or number, or
              composed of the question mark character ("?") followed by
              cgi-style arguments interpreted by the djvu viewer. Page numbers
              may be prefixed with an optional sign to represent a page
              displacement. For instance, the strings "#-1" and "#+1" can be
              used to access the previous page and the next page.

              Argument comment is a string that might be displayed by the
              viewer when the user moves the mouse over the hyperlink.

              Argument area defines the shape and the location of the
              hyperlink. The following forms are recognized:

                     (rect xmin ymin width height)
                     (oval xmin ymin width height)
                     (poly x0 y0 x1 y1 ... )
                     (text xmin ymin width height)
                     (line x0 y0 x1 y1)

              All parameters are numbers representing coordinates.
              Coordinates are measured in pixels and have their origin at the
              bottom left corner of the page.

              The remaining expressions in the maparea list represent the
              visual effect associated with the hyperlink.

              A first set of options defines how borders are drawn for rect,
              oval, poly, or text hyperlink areas.

                     (none)
                     (xor)
                     (border color)
                     (shadow_in [thickness])
                     (shadow_out [thickness])
                     (shadow_ein [thickness])
                     (shadow_eout [thickness])

              where parameter color has syntax #RRGGBB as described above, and
              parameter thickness is an integer in range 1 to 32. The last
              four border options are only supported for rect hyperlink
              areas. Although the border mode defaults to (xor), it is wise to
              always specify the border mode. Border options do not apply to
              line areas.

              When a border option is specified, the border becomes visible
              when the user moves the mouse over the hyperlink. The border may
              be made always visible by using the following option:

                     (border_avis)

              The following two options may be used with rect hyperlink
              areas. The complete area will be highlighted using the
              specified color at the specified opacity (0-100, default 50).
              Some viewers (e.g., djview4) support opacities in range 0-200,
              with 200 representing a fully opaque color.

                     (hilite color)
                     (opacity op)

              This is often used with an empty URL for simply emphasizing a
              specific segment of an image.

              The following three options may be used with line areas to
              specify an optional ending arrow, the line width and color. The
              default is a black line with width 1 and without arrow.

                     (arrow)
                     (width w)
                     (lineclr color)

              Finally, the following three options can be used with text
              areas. The default background color is transparent. The default
              text color is black. The pushpin option indicates that the text
              is symbolized by a small pushpin icon. Clicking the icon reveals
              the text.

                     (backclr bkcolor)
                     (textclr txtcolor)
                     (pushpin)

       (metadata ... (key value) ... )
              Define metadata entries. Each entry is identified by a symbol
              key representing the nature of the metadata entry. The string
              value represents the value associated with the corresponding
              key. Two sets of keys are noteworthy: keys borrowed from the
              BibTeX bibliography system, and keys borrowed from the PDF
              DocInfo metadata. BibTeX keys are always expressed in
              lowercase, such as year, booktitle, editor, author, etc. DocInfo
              keys start with an uppercase letter, such as Title, Author,
              Subject, Creator, Producer, Trapped, CreationDate, and ModDate.
              The values associated with the last two keys should be dates
              expressed according to RFC 3339.
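       A maparea expression is easy to generate from a script. The helper
       maparea_rect is a hypothetical sketch. It prints one rectangular
       hyperlink with an explicit (xor) border, following the advice above
       to always specify the border mode. The URL and comment are assumed to
       contain no double quotes or backslashes.

```shell
# maparea_rect URL COMMENT XMIN YMIN WIDTH HEIGHT: hypothetical helper,
# not part of DjVuLibre. Prints a (maparea ...) expression for a rect area
# with an explicit (xor) border mode.
maparea_rect() {
  printf '(maparea "%s" "%s" (rect %d %d %d %d) (xor))\n' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

maparea_rect '#+1' 'Next page' 2300 100 200 80
# prints: (maparea "#+1" "Next page" (rect 2300 100 200 80) (xor))
```

       To keep the annotations already present on a page, one approach is
       to start from the print-ant output, append the new expression, and
       load the result back with set-ant:

           djvused myfile.djvu -e 'select 1; print-ant' > ant.txt
           maparea_rect '#+1' 'Next page' 2300 100 200 80 >> ant.txt
           djvused myfile.djvu -e 'select 1; set-ant ant.txt' -s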
640
641
643 The current version of program djvused only supports selecting one com‐
644 ponent file or all component files. There is no way to select only a
645 few component files.
646
647
649 This program was initially written by Léon Bottou <leonb@users.source‐
650 forge.net> and was improved by Yann Le Cun <profshadoko@users.source‐
651 forge.net>, Florin Nicsa, Bill Riemers <docbill@sourceforge.net> and
652 many others.
653
654
656 djvu(1), djvutxt(1), djvmcvt(1), djvudump(1), bzz(1), Emacs djvused
657 front end djvu.el on GNU Elpa repository.
658
659
660
661DjVuLibre-3.5 5/22/2005 DJVUSED(1)