DJVUSED(1)                      DjVuLibre-3.5                      DJVUSED(1)

NAME
djvused - Multi-purpose DjVu document editor.

SYNOPSIS
djvused [options] djvufile

DESCRIPTION
Program djvused is a powerful command line tool for manipulating multi-page
documents, creating or editing annotation chunks, creating or editing hidden
text layers, pre-computing thumbnail images, and more. The program first
reads the DjVu document djvufile and then executes a number of djvused
commands.

Djvused commands can be read from a specific file (when option -f is
specified), read from the command line (when option -e is specified), or
read from the standard input (the default).

OPTIONS
-v  Cause djvused to print a command line prompt before reading commands and
    a brief message describing how each command was executed. This option is
    very useful for debugging djvused scripts and also for interactively
    entering djvused commands on the standard input.

-f scriptfile
    Cause djvused to read commands from file scriptfile.

-e commands
    Cause djvused to execute the commands specified by the option argument
    commands. It is advisable to surround the djvused commands with single
    quotes in order to prevent unwanted shell expansion.

-s  Cause djvused to save the file djvufile after executing the specified
    commands. This is similar to executing command save immediately before
    terminating the program.

-u  Cause djvused to print hidden text and annotations as UTF-8 instead of
    encoding non-ASCII characters with octal escape sequences for maximal
    portability. This option is convenient for manually editing or viewing
    the djvused output. It also causes the emission of a UTF-8 BOM under
    Windows.

-n  Cause djvused to disregard save commands. This is useful for debugging
    djvused scripts without overwriting files on your disk.
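
As a minimal sketch of how options -f, -v, and -n can be combined for a dry
run: the file names demo.dsed and myfile.djvu are placeholders, and the guard
around the djvused call is only there so the snippet does nothing harmful
when the program or the document is missing.

```shell
#!/bin/sh
# Placeholder names, not part of djvused itself.
doc=myfile.djvu
script=demo.dsed

# Write a small djvused script: select page 1, drop its hidden text, save.
cat > "$script" <<'EOF'
select 1
remove-txt
save
EOF

# Dry run: -v reports each command, -n makes djvused disregard "save".
if command -v djvused >/dev/null 2>&1 && [ -f "$doc" ]; then
    djvused -v -n -f "$script" "$doc"
else
    echo "djvused or $doc not available; script written to $script"
fi
```

Once the script behaves as expected, dropping -n lets the save command take
effect.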

USAGE EXAMPLES
There are many ways to use program djvused. The following examples
illustrate some common uses of this program.

Obtaining the size of a page
Command size outputs the width and height of the selected pages using an
HTML-friendly syntax. For instance, the following command prints the size of
page 3 of document myfile.djvu.

    djvused myfile.djvu -e 'select 3; size'

Extracting the hidden text
Command print-pure-txt outputs the text associated with a page or a
document. For instance, the following shell command outputs the text for the
entire document. Lines and pages are delimited by the usual control
characters.

    djvused myfile.djvu -e 'print-pure-txt'

Command print-txt produces a more extensive output describing the structure
and the location of the text components. The syntax of this output is
described later in this man page. For instance, the following shell command
outputs extended text information for page 3 of document myfile.djvu.

    djvused myfile.djvu -e 'select 3; print-txt'

Extracting the annotations
Annotation data can be extracted using command print-ant. The syntax of the
annotation data is described later in this man page. For instance, the
following shell command outputs the annotation data for the first page of
document myfile.djvu.

    djvused myfile.djvu -e 'select 1; print-ant'

Command print-ant only prints the annotations stored in the selected
component file. Command print-merged-ant also retrieves annotations from all
the component files referenced by the current page (using INCL chunks) and
prints the merged information.

Dumping/restoring annotations and text
Three commands, output-txt, output-ant, and output-all, produce djvused
scripts. For instance, the following shell command produces a djvused
script, myfile.dsed, that recreates all the text and annotation data in
document myfile.djvu.

    djvused myfile.djvu -e 'output-all' > myfile.dsed

Script myfile.dsed is a text file that can be easily edited. The following
shell command then recreates the text and annotation information in file
myfile.djvu.

    djvused myfile.djvu -f myfile.dsed -s

Extracting a page
Commands save-page and save-page-with both create a DjVu file representing
the selected component file of a document. The following shell command, for
instance, creates a file p05.djvu containing page 5 of document myfile.djvu.

    djvused myfile.djvu -e 'select 5; save-page p05.djvu'

Each page of a document might import data from another component file using
the so-called inclusion (INCL) chunks. Command save-page then produces a
file with unresolved references to imported data. Such a file should then be
made part of a multi-page document containing the required data in other
component files. On the other hand, command save-page-with copies all the
imported data into the output file. This file is directly usable. Yet
collecting several such files into a multi-page document might lead to
useless data replication.

Pre-computing thumbnails
Command set-thumbnails constructs thumbnails that can later be displayed by
DjVu viewers. The following shell command, for instance, computes thumbnails
of size 64x64 pixels for all pages of file myfile.djvu.

    djvused myfile.djvu -e 'set-thumbnails 64' -s

DJVUSED COMMANDS
Command lines might contain zero, one, or more djvused commands and an
optional comment. Multiple djvused commands must be separated by a semicolon
character ';'. Comments are introduced by the '#' character and extend until
the end of the command line.
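
The command line rules above can be illustrated with a short script file. This
is only a sketch: report.dsed and myfile.djvu are placeholder names, and the
commands used (n, select, size, print-ant) are the ones described in this man
page.

```shell
#!/bin/sh
# Placeholder names, not part of djvused itself.
doc=myfile.djvu
script=report.dsed

cat > "$script" <<'EOF'
# A line may hold only a comment.
n                      # one command: total number of pages
select 1; size         # two commands separated by a semicolon
select 1; print-ant    # annotations of page 1
EOF

# Run the script only when both the program and the document exist.
if command -v djvused >/dev/null 2>&1 && [ -f "$doc" ]; then
    djvused -f "$script" "$doc"
fi
```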

Selection commands
Multi-page DjVu documents are composed of a number of component files. Most
component files describe a specific page of a document. Some component files
contain information shared by several pages such as shared image data,
shared annotations or thumbnails. Many djvused commands operate on selected
component files. All component files are initially selected. The following
commands are useful for changing the selection.

n   Print the total number of pages in the document.

ls  List all component files in the document. Each line contains an optional
    page number, a letter describing the component file type, the size of
    the component file, and the identifier of the component file. Component
    file type letters P, I, A, and T respectively stand for page data,
    shared image data, shared annotation data, and thumbnail data. Page
    numbers are only listed for component files containing page data. When
    it is set, the optional page title (see command set-page-title below) is
    displayed after the component file identifier.

select [fileid]
    Select the component file identified by argument fileid. Argument fileid
    must be either a page number or a component file identifier. The select
    command selects all component files when the argument fileid is omitted.

select-shared-ant
    Select a component file containing shared annotations. Only one such
    component file is supported by the current DjVu software. This component
    file usually contains annotations pertaining to the whole document as
    opposed to specific pages. An error message is displayed if there is no
    such component file.

create-shared-ant
    Create and select a component file containing shared annotations. This
    command only selects the shared annotation component file if such a
    component file already exists. Otherwise it creates a new shared
    annotation component file and makes sure that it is imported by all
    pages in the document.

showsel
    Show the currently selected component files in the same format as
    command ls.
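
Because the selection is either a single component file or all of them,
per-page processing is usually driven from the shell. The following sketch
loops over the pages reported by command n; the helper page_cmd and the file
name myfile.djvu are illustrative, not part of djvused.

```shell
#!/bin/sh
doc=myfile.djvu   # placeholder input file

# Build the djvused command string for one page (illustrative helper).
page_cmd() {
    printf 'select %d; size' "$1"
}

if command -v djvused >/dev/null 2>&1 && [ -f "$doc" ]; then
    pages=$(djvused -e 'n' "$doc")
    i=1
    while [ "$i" -le "$pages" ]; do
        djvused -e "$(page_cmd "$i")" "$doc"
        i=$((i + 1))
    done
fi
```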

Text and annotation commands
print-pure-txt
    Print the text stored in the hidden text layer of the selected pages. A
    similar capability is offered by program djvutxt. Structural information
    is sometimes represented by control characters. Text from different
    pages is delimited by form feed characters ("\f"). Lines are delimited
    by newline characters ("\n"). Columns, regions, and paragraphs are
    sometimes delimited by vertical tabs ("\013"), group separators
    ("\035"), and unit separators ("\037") respectively.
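
The form feed delimiter makes it easy to post-process this output with
standard tools. The sketch below counts form-feed-delimited records with awk;
the function name count_pages is illustrative, and the count may be off by
one on real output depending on whether the last page is followed by a
delimiter, which this man page does not specify.

```shell
#!/bin/sh
# Count form-feed-delimited records read from standard input.
count_pages() {
    awk 'BEGIN { RS = "\f" } END { print NR }'
}

# Self-contained demonstration on synthetic text holding two pages.
printf 'page one\nsecond line\fpage two\n' | count_pages

# With a real document (myfile.djvu is a placeholder):
#   djvused myfile.djvu -e 'print-pure-txt' | count_pages
```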

print-txt
    Print extensive hidden text information for the selected pages. This
    information describes the structure of the text on the document page and
    locates the structural elements in the page image. The syntax of this
    output is described later in this man page.

remove-txt
    Remove the hidden text information from the selected component files.
    For instance, executing commands select and remove-txt removes all
    hidden text information from the DjVu document.

set-txt [djvusedtxtfile]
    Insert hidden text information into the selected pages. The optional
    argument djvusedtxtfile names a file containing the hidden text
    information. This file must contain data similar to what is produced by
    command print-txt. When the optional argument is omitted, the program
    reads the hidden text information from the djvused script until reaching
    an end-of-file or a line containing a single period.

output-txt
    Print a djvused script that reconstructs the hidden text information for
    the selected pages. This script can later be edited and executed by
    invoking program djvused with option -f.

print-ant
    Print the annotations of the selected component file. The annotation
    data is represented using a simple syntax described later in this
    document.

print-merged-ant
    Merge the annotations stored in the selected component files with the
    annotations imported from other component files, such as the shared
    annotation component file, and print the result. The annotation data is
    represented using a simple syntax described later in this document.

remove-ant
    Remove the annotation information from the selected component files. For
    instance, executing commands select and remove-ant removes all
    annotation information from the DjVu document.

set-ant [djvusedantfile]
    Insert annotations into the selected component file. The optional
    argument djvusedantfile names a file containing the annotation data.
    This file must contain data similar to what is produced by command
    print-ant. When the optional argument is omitted, the program reads the
    annotation data from the djvused script itself until reaching an
    end-of-file or a line containing a single period.

output-ant
    Print a djvused script that reconstructs the annotation information for
    the selected pages. This script can later be edited and executed by
    invoking program djvused with option -f.

print-meta
    Print the metadata part of the annotations for the selected component
    file. This command displays a subset of the information printed by
    command print-ant using a different syntax. Metadata are organized as
    key-value pairs. Each printed line contains the key name, such as
    author, title, etc., followed by a tab character ("\t") and a
    double-quoted string representing the UTF-8 encoded metadata value.

remove-meta
    Remove the metadata part of the annotations of the selected component
    files.

set-meta [djvusedmetafile]
    Set the metadata part of the annotations of the selected component file.
    The remaining part of the annotations is left unchanged. The optional
    argument djvusedmetafile names a file containing the metadata. This file
    must contain data similar to what is produced by command print-meta.
    When the optional argument is omitted, the program reads the metadata
    from the djvused script itself until reaching an end-of-file or a line
    containing a single period.
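
As an illustration of inline data terminated by a single period, the sketch
below builds a script that sets two metadata entries. The file names and the
metadata values are placeholders, and storing document-wide metadata in the
shared annotation component (via create-shared-ant) is an assumption of this
example rather than a requirement stated here.

```shell
#!/bin/sh
doc=myfile.djvu     # placeholder input file
script=meta.dsed    # placeholder script name

# Each metadata line is: key, a tab character, then a double-quoted value.
{
    printf 'create-shared-ant\n'
    printf 'set-meta\n'
    printf 'author\t"Jane Doe"\n'
    printf 'title\t"An Example Document"\n'
    printf '.\n'
    printf 'print-meta\n'
} > "$script"

if command -v djvused >/dev/null 2>&1 && [ -f "$doc" ]; then
    # No save command is issued, so the file on disk is left untouched.
    djvused -f "$script" "$doc"
fi
```

Adding option -s (or a final save command) would write the metadata back to
the document.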

print-xmp
    Print the XMP metadata string contained in the annotation chunk of the
    selected component file. This command in fact displays a subset of the
    information printed by command print-ant.

remove-xmp
    Remove the XMP tag from the annotation chunk of the selected component
    file.

set-xmp [xmpfile]
    Set the XMP metadata part of the annotations of the selected component
    file. The remaining part of the annotations is left unchanged. The
    optional argument xmpfile names a file containing the XMP metadata in a
    format similar to that produced by command print-xmp. When the optional
    argument is omitted, the program reads the XMP annotation data from the
    djvused script itself until reaching an end-of-file or a line containing
    a single period.

output-all
    Print a djvused script that reconstructs both the hidden text and the
    annotation information for the selected pages. This script can later be
    edited and executed by invoking program djvused with option -f.

Outline/bookmarks commands
print-outline
    Print the outline of the document. Nothing is printed if the document
    contains no outline.

remove-outline
    Remove the outline from the document.

set-outline [djvusedoutlinefile]
    Insert outline information into the document. The optional argument
    djvusedoutlinefile names a file containing the outline information. This
    file must contain data similar to what is produced by command
    print-outline. When the optional argument is omitted, the program reads
    the outline information from the djvused script until reaching an
    end-of-file or a line containing a single period.

Thumbnail commands
set-thumbnails sz
    Compute thumbnails of size sz x sz pixels and insert them into the
    document. DjVu viewers can later display these thumbnails very
    efficiently without needing to download the data for each page. Typical
    thumbnail sizes range from 48 to 128 pixels.

remove-thumbnails
    Remove the pre-computed thumbnails from the DjVu document. New
    thumbnails can then be computed using command set-thumbnails.

Save commands
The above commands only modify the memory image of the DjVu document. The
following commands provide means to save the modified data into the file
system.

save
    Save the modified DjVu document back into the input file djvufile
    specified by the arguments of the program djvused. Nothing is done if
    the DjVu file was not modified. Passing option -s to program djvused is
    equivalent to executing command save before exiting the program.

save-bundled filename
    Save the current DjVu document as a bundled multi-page DjVu document
    named filename. A similar capability is offered by program djvmcvt.

save-indirect filename
    Save the current DjVu document as an indirect multi-page DjVu document.
    The index file of the indirect document will be named filename. All
    other files composing the indirect document will be saved into the same
    directory as the index file. A similar capability is offered by program
    djvmcvt.

save-page filename
    Save the selected component file into DjVu file filename. The selected
    component file might import data from another component file using the
    so-called inclusion (INCL) chunks. This command then produces a file
    with unresolved references to imported data. Such a file should then be
    made part of a multi-page document containing the required data in other
    component files.

save-page-with filename
    Save the selected component file into DjVu file filename. All data
    imported from other component files is copied into the output file as
    well. This command always produces a usable DjVu file. On the other
    hand, collecting several such files into a multi-page document might
    lead to useless data replication.

Miscellaneous commands
help
    Display a help message listing all commands supported by djvused.

dump
    Display the EA IFF 85 structure of the document or of the selected
    component file. A similar capability is offered by program djvudump.

size
    Display the width and the height of the selected pages. The dimensions
    of each page are displayed using a syntax suitable for direct insertion
    into the <EMBED...></EMBED> tags. This command also displays the default
    page orientation when it is different from zero.
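
Scripts often need the page dimensions as plain numbers. The sketch below
assumes that each line printed by command size looks like
"width=2550 height=3300"; this exact layout is an assumption of the example
and should be checked against the output of your djvused version. The
function name parse_size is illustrative.

```shell
#!/bin/sh
# Extract "W H" from a line assumed to look like: width=2550 height=3300
parse_size() {
    sed -n 's/.*width=\([0-9][0-9]*\) height=\([0-9][0-9]*\).*/\1 \2/p'
}

# Self-contained demonstration on a synthetic line.
echo 'width=2550 height=3300' | parse_size

# With a real document (myfile.djvu is a placeholder):
#   djvused myfile.djvu -e 'select 3; size' | parse_size
```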

set-rotation [+-]rot
    Change the default orientation of the selected pages. The orientation is
    expressed as an integer in range 0..3 representing a number of 90-degree
    counter-clockwise rotations. When the argument is preceded by a sign +
    or -, argument rot counts how many additional 90-degree
    counter-clockwise rotations should be applied to the page. Otherwise,
    argument rot represents the desired absolute page orientation. Only DjVu
    pages can be rotated. Pages represented as a raw IW44 image cannot be
    rotated.
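
The difference between absolute and relative rotations can be worked out with
a little arithmetic. The sketch below computes the orientation that results
from a signed argument, assuming that relative rotations wrap around modulo 4
so that the result stays in range 0..3; the wrap-around and the function name
rotate are assumptions of this example.

```shell
#!/bin/sh
# Orientation after applying a signed relative rotation.
# $1: current orientation (0..3), $2: signed number of extra quarter turns.
rotate() {
    echo $(( ( ($1 + $2) % 4 + 4 ) % 4 ))
}

rotate 3 1     # one more counter-clockwise quarter turn starting from 3
rotate 0 -1    # one quarter turn back starting from 0
```

For instance, under this assumption 'set-rotation +1' on a page whose
orientation is 3 would yield orientation 0, whereas 'set-rotation 1' always
yields orientation 1.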

set-dpi dpi
    Set the resolution of the page image in dots per inch. Argument dpi
    should be in range 25..6000.

set-page-title title
    Set a page title for the selected page. When page titles are available,
    recent versions of the DjVuLibre viewers display these page titles
    instead of page numbers and also accept them in page selection options.
    Command ls can be used to see both the page titles and page identifiers.
    To unset a page title, simply make it equal to the page identifier.

DJVUSED FILE FORMATS
Djvused uses a simple parenthesized syntax to represent both annotations and
hidden text.

*   This syntax is the native syntax used by DjVu for storing annotations.
    Program djvused simply compresses the annotation data using the bzz(1)
    algorithm.

*   This syntax differs from the native syntax used by DjVu for storing the
    hidden text. Program djvused performs the translations between the
    compact binary representation used by DjVu and the easily modifiable
    parenthesized syntax.

General syntax
Djvused files are ASCII text files. The legal characters in djvused files
are the printable ASCII characters and the space, tab, cr, and nl
characters. Using other characters has undefined results.

Djvused files are composed of a sequence of expressions separated by blank
characters (space, tab, cr, or nl). There are four kinds of expressions,
namely integers, symbols, strings and lists.

Integers:
    Integer numbers are represented by one or more digits, with the usual
    interpretation.

Symbols:
    Symbols, or identifiers, are sequences of printable ASCII characters
    representing a name or a keyword. Acceptable characters are the
    alphanumeric characters, the underscore "_", the minus character "-",
    and the hash character "#". Names should not begin with a digit or a
    minus character.

Strings:
    Strings denote an arbitrary sequence of bytes, usually interpreted as a
    sequence of UTF-8 encoded characters. Strings in djvused files are
    similar to strings in the C language. They are surrounded by double
    quote characters. Certain sequences of characters starting with a
    backslash ("\") have a special meaning. A backslash followed by the
    letter "a", "b", "t", "n", "v", "f", or "r", by a backslash, or by a
    double quote stands for the ASCII character BEL(007), BS(010), HT(011),
    LF(012), VT(013), FF(014), CR(015), BACKSLASH(134), or DOUBLEQUOTE(042)
    respectively, where the codes are given in octal. A backslash followed
    by one to three digits stands for the byte whose octal code is expressed
    by the digits. All other backslash sequences are illegal. All
    non-printable ASCII characters must be escaped.

Lists:
    Lists are sequences of expressions separated by blanks and surrounded by
    parentheses. All expression types are acceptable within a list,
    including sub-lists.
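
The octal escape rule for strings can be reproduced with standard tools. The
following sketch turns every input byte into a backslash followed by its
three-digit octal code, which is a legal (if verbose) way to write any byte
inside a djvused string; the function name to_octal_escapes is illustrative.

```shell
#!/bin/sh
# Read bytes on stdin; print each as a backslash plus its octal code.
to_octal_escapes() {
    od -An -v -to1 | tr -s ' ' '\n' | sed -e '/^$/d' -e 's/^/\\/' | tr -d '\n'
    echo
}

# The two bytes of UTF-8 "e with acute accent" (0xC3 0xA9), written in
# octal so that this snippet itself stays pure ASCII.
printf '\303\251' | to_octal_escapes
```

Escaping only the non-ASCII and non-printable bytes, as djvused does without
option -u, keeps the output more readable; this sketch escapes everything for
simplicity.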

Hidden text syntax
The building blocks of the hidden text syntax are lists representing each
structural component of the hidden text. Structural components have the
following form:

    (type xmin ymin xmax ymax ... )

The symbol type must be one of page, column, region, para, line, word, or
char, listed here by decreasing order of importance. The integers xmin,
ymin, xmax, and ymax represent the coordinates of a rectangle indicating the
position of the structural component in the page. Coordinates are measured
in pixels and have their origin at the bottom left corner of the page. The
remaining expressions in the list are either a single string representing
the encoded text associated with this structural component, or a sequence of
structural components with a lesser type.

The hidden text for each page is simply represented by a single structural
element of type page. Various levels of structural information are
acceptable. For instance, the page level component might only specify a page
level string, or might only provide a list of lines, or might provide a full
hierarchy down to the individual characters.
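
As a sketch of this syntax, the following snippet writes a hidden text file
with one page, one line, and two words, and then loads it with set-txt. The
file names and all coordinates are made-up values chosen for illustration.

```shell
#!/bin/sh
doc=myfile.djvu     # placeholder input file
txt=page1.txt       # placeholder hidden text file

# Coordinates are illustrative pixel values, origin at the bottom left.
cat > "$txt" <<'EOF'
(page 0 0 2550 3300
 (line 300 3000 1200 3060
  (word 300 3000 620 3060 "Hello")
  (word 650 3000 1200 3060 "world")))
EOF

if command -v djvused >/dev/null 2>&1 && [ -f "$doc" ]; then
    # No save command is issued, so the document on disk is not modified.
    djvused -e "select 1; set-txt $txt; print-txt" "$doc"
fi
```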

Outline/Bookmark syntax
The outline syntax is a single list of the form

    (bookmarks ...)

The first element of the list is symbol bookmarks. The subsequent elements
are lists representing the top-level outline entries. Each outline entry is
represented by a list with the following form:

    (title url ... )

The string title is the title of the outline entry. The destination string
url can be either an arbitrary percent-encoded URL, or composed of the hash
character ("#") followed by a page name or number, or composed of the
question mark character ("?") followed by cgi-style arguments interpreted by
the DjVu viewer. The remaining expressions in the list describe subentries
of this outline entry.
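
A small outline with one level of nesting might be written as follows. This is
a sketch only: the file names, titles, and page numbers are placeholders.

```shell
#!/bin/sh
doc=myfile.djvu          # placeholder input file
outline=outline.txt      # placeholder outline file

cat > "$outline" <<'EOF'
(bookmarks
 ("Introduction" "#1")
 ("Chapter 1" "#5"
  ("Section 1.1" "#6")
  ("Section 1.2" "#9")))
EOF

if command -v djvused >/dev/null 2>&1 && [ -f "$doc" ]; then
    # No save command is issued, so the document on disk is not modified.
    djvused -e "set-outline $outline; print-outline" "$doc"
fi
```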

Annotation syntax
Annotations are represented by a sequence of annotation expressions. The
following annotation expressions are recognized:

(background color)
    Specify the color of the viewer area surrounding the DjVu image. Colors
    are represented with the X11 hexadecimal syntax #RRGGBB. For instance,
    #000000 is black and #FFFFFF is white.

(zoom zoomvalue)
    Specify the initial zoom factor of the image. Argument zoomvalue can be
    one of stretch, one2one, width, page, or composed of the letter d
    followed by a number in range 1 to 999 representing a zoom factor (such
    as d300 or d150).

(mode modevalue)
    Specify the initial display mode of the image. Argument modevalue is one
    of color, bw, fore, or back.

(align horzalign vertalign)
    Specify how the image should be aligned on the viewer surface. By
    default the image is located in the center. Argument horzalign can be
    one of left, center, or right. Argument vertalign can be one of top,
    center, or bottom.

(maparea url comment area ...)
    Define a hyperlink for the specified destination.

    Argument url can have one of the following forms:

        href
        (url href target)

    where href is a string representing the destination and target is a
    string representing the target frame for the hyperlink, as defined by
    the HTML anchor tag <A>. The destination string href can be either an
    arbitrary percent-encoded URL, or composed of the hash character ("#")
    followed by a page name or number, or composed of the question mark
    character ("?") followed by cgi-style arguments interpreted by the DjVu
    viewer. Page numbers may be prefixed with an optional sign to represent
    a page displacement. For instance the strings "#-1" and "#+1" can be
    used to access the previous page and the next page.

    Argument comment is a string that might be displayed by the viewer when
    the user moves the mouse over the hyperlink.

    Argument area defines the shape and the location of the hyperlink. The
    following forms are recognized:

        (rect xmin ymin width height)
        (oval xmin ymin width height)
        (poly x0 y0 x1 y1 ... )
        (text xmin ymin width height)
        (line x0 y0 x1 y1)

    All parameters are numbers representing coordinates. Coordinates are
    measured in pixels and have their origin at the bottom left corner of
    the page.

    The remaining expressions in the maparea list represent the visual
    effect associated with the hyperlink.

    A first set of options defines how borders are drawn for rect, oval,
    poly, or text hyperlink areas.

        (none)
        (xor)
        (border color)
        (shadow_in [thickness])
        (shadow_out [thickness])
        (shadow_ein [thickness])
        (shadow_eout [thickness])

    where parameter color has syntax #RRGGBB as described above, and
    parameter thickness is an integer in range 1 to 32. The last four border
    options are only supported for rect hyperlink areas. Although the border
    mode defaults to (xor), it is wise to always specify the border mode.
    Border options do not apply to line areas.

    When a border option is specified, the border becomes visible when the
    user moves the mouse over the hyperlink. The border may be made always
    visible by using the following option:

        (border_avis)

    The following two options may be used with rect hyperlink areas. The
    complete area will be highlighted using the specified color at the
    specified opacity (0-100, default 50).

        (hilite color)
        (opacity op)

    This is often used with an empty URL for simply emphasizing a specific
    segment of an image.

    The following three options may be used with line areas to specify an
    optional ending arrow, the line width and color. The default is a black
    line with width 1 and without arrow.

        (arrow)
        (width w)
        (lineclr color)

    Finally the following three options can be used with text areas. The
    default background color is transparent. The default text color is
    black. The pushpin option indicates that the text is symbolized by a
    small pushpin icon. Clicking the icon reveals the text.

        (backclr bkcolor)
        (textclr txtcolor)
        (pushpin)

(metadata ... (key value) ... )
    Define metadata entries. Each entry is identified by a symbol key
    representing the nature of the metadata entry. The string value
    represents the value associated with the corresponding key. Two sets of
    keys are noteworthy: keys borrowed from the BibTeX bibliography system,
    and keys borrowed from the PDF DocInfo metadata. BibTeX keys are always
    expressed in lowercase, such as year, booktitle, editor, author, etc.
    DocInfo keys start with an uppercase letter, such as Title, Author,
    Subject, Creator, Producer, Trapped, CreationDate, and ModDate. The
    values associated with the last two keys should be dates expressed
    according to RFC 3339.
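
The annotation expressions above can be combined in a single file and loaded
with set-ant. The sketch below is illustrative only: the file names, colors,
coordinates, and metadata values are placeholders, and the second maparea
uses an empty URL to highlight a region as described above.

```shell
#!/bin/sh
doc=myfile.djvu     # placeholder input file
ant=page1.ant       # placeholder annotation file

cat > "$ant" <<'EOF'
(background #FFFFFF)
(zoom page)
(mode color)
(align center top)
(maparea "#+1" "Go to the next page"
 (rect 100 100 400 60) (border #FF0000) (border_avis))
(maparea "" "Highlighted region"
 (rect 100 300 400 200) (none) (hilite #FFFF00) (opacity 30))
(metadata (author "Jane Doe") (year "2005"))
EOF

if command -v djvused >/dev/null 2>&1 && [ -f "$doc" ]; then
    # No save command is issued, so the document on disk is not modified.
    djvused -e "select 1; set-ant $ant; print-ant" "$doc"
fi
```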

LIMITATIONS
The current version of program djvused only supports selecting one component
file or all component files. There is no way to select only a few component
files.

CREDITS
This program was initially written by Léon Bottou
<leonb@users.sourceforge.net> and was improved by Yann Le Cun
<profshadoko@users.sourceforge.net>, Florin Nicsa, Bill Riemers
<docbill@sourceforge.net> and many others.

SEE ALSO
djvu(1), djvutxt(1), djvmcvt(1), djvudump(1), bzz(1), Emacs djvused front
end djvu.el on the GNU ELPA repository.

DjVuLibre-3.5                     5/22/2005                        DJVUSED(1)