DJVUSED(1) DjVuLibre-3.5 DJVUSED(1)
2
3
4
NAME
6 djvused - Multi-purpose DjVu document editor.
7
8
SYNOPSIS
10 djvused [options] djvufile
11
12
13
DESCRIPTION
15 Program djvused is a powerful command line tool for manipulating multi-
16 page documents, creating or editing annotation chunks, creating or
17 editing hidden text layers, pre-computing thumbnail images, and more.
18 The program first reads the DjVu document djvufile and executes a num‐
19 ber of djvused commands.
20
21 Djvused commands can be read from a specific file (when option -f is
22 specified), read from the command line (when option -e is specified),
23 or read from the standard input (the default).
24
25
OPTIONS
27 -v Cause djvused to print a command line prompt before reading com‐
28 mands and a brief message describing how each command was exe‐
29 cuted. This option is very useful for debugging djvused scripts
30 and also for interactively entering djvused commands on the
31 standard input.
32
33 -f scriptfile
34 Cause djvused to read commands from file scriptfile.
35
36 -e command
Cause djvused to execute the commands specified by the option
argument command. It is advisable to surround the djvused commands
with single quotes in order to prevent unwanted shell expansion.
41
42 -s Cause djvused to save the file djvufile after executing the
43 specified commands. This is similar to executing command save
44 immediately before terminating the program.
45
46 -n Cause djvused to disregard save commands. This is useful for
47 debugging djvused scripts without overwriting files on your
48 disk.
49
50
USAGE EXAMPLES
52 There are many ways to use program djvused. The following examples
53 illustrate some common uses of this program.
54
55
56 Obtaining the size of a page
Command size outputs the width and height of the selected pages using
an HTML-friendly syntax. For instance, the following command prints
the size of page 3 of document myfile.djvu.
60
61 djvused myfile.djvu -e 'select 3; size'
62
63 Extracting the hidden text
64 Command print-pure-txt outputs the text associated with a page or a
65 document. For instance, the following shell command outputs the text
66 for the entire document. Lines and pages are delimited by the usual
67 control characters.
68
69 djvused myfile.djvu -e 'print-pure-txt'
70
71 Command print-txt produces a more extensive output describing the
72 structure and the location of the text components. The syntax of this
73 output is described later in this man page. For instance, the follow‐
74 ing shell command outputs extended text information for page 3 of docu‐
75 ment myfile.djvu.
76
77 djvused myfile.djvu -e 'select 3; print-txt'
78
79 Extracting the annotations
80 Annotation data can be extracted using command print-ant. The syntax
81 of the annotation data is described later in this man page. For
82 instance, the following shell command outputs the annotation data for
83 the first page of document myfile.djvu.
84
85 djvused myfile.djvu -e 'select 1; print-ant'
86
87 Command print-ant only prints the annotations stored in the selected
88 component file. Command print-merged-ant also retrieves annotations
89 from all the component files referenced by the current page (using INCL
90 chunks) and prints the merged information.
91
92
93 Dumping/restoring annotations and text
94 Three commands, output-txt, output-ant, and output-all, produce djvused
95 scripts. For instance, the following shell command produces a djvused
96 script, myfile.dsed, that recreates all the text and annotation data in
97 document myfile.djvu.
98
99 djvused myfile.djvu -e 'output-all' > myfile.dsed
100
101 Script myfile.dsed is a text file that can be easily edited. The fol‐
102 lowing shell command then recreates the text and annotation information
103 in file myfile.djvu.
104
105 djvused myfile.djvu -f myfile.dsed -s
106
107
108 Extracting a page
109 Both commands save-page and save-page-with create a DjVu file repre‐
110 senting the selected component file of a document. The following shell
111 command, for instance, creates a file p05.djvu containing page 5 of
112 document myfile.djvu.
113
114 djvused myfile.djvu -e 'select 5; save-page p05.djvu'
115
Each page of a document might import data from another component file
using the so-called inclusion (INCL) chunks. Command save-page then
produces a file with unresolved references to imported data. Such a
file should then be made part of a multi-page document containing the
required data in other component files. On the other hand, command
save-page-with copies all the imported data into the output file. This
file is directly usable. Yet collecting several such files into a
multi-page document might lead to useless data replication.
124
125
126 Pre-computing thumbnails
Command set-thumbnails constructs thumbnails that DjVu viewers can
display later. The following shell command, for instance, computes
thumbnails of size 64x64 pixels for all pages of file myfile.djvu.
131
132 djvused myfile.djvu -e 'set-thumbnails 64' -s
133
134
DJVUSED COMMANDS
136 Command lines might contain zero, one, or more djvused commands and an
137 optional comment. Multiple djvused commands must be separated by a
138 semicolon character ';'. Comments are introduced by the '#' character
139 and extend until the end of the command line.
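As an illustration of these rules, the sketch below writes a small script file and runs it as a dry run. The names cleanup.dsed and myfile.djvu are placeholders chosen for this example, and the guard around the djvused call is only there so that the sketch does nothing when the program or the file is missing.

```shell
# Write a djvused script: each line holds zero or more commands
# separated by semicolons, optionally followed by a '#' comment.
cat > cleanup.dsed <<'EOF'
# A line may hold only a comment.
select 1; remove-ant      # two commands separated by a semicolon
remove-thumbnails
save                      # disregarded when option -n is given
EOF

# Dry run: -v reports how each command was executed, -n disregards save.
if command -v djvused >/dev/null 2>&1 && [ -f myfile.djvu ]; then
    djvused myfile.djvu -f cleanup.dsed -n -v
fi
```

Dropping option -n makes the final save command take effect.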
140
141
142 Selection commands
143 Multi-page DjVu documents are composed of a number of component files.
144 Most component files describe a specific page of a document. Some com‐
145 ponent files contain information shared by several pages such as shared
146 image data, shared annotations or thumbnails. Many djvused commands
147 operate on selected component files. All component files are initially
148 selected. The following commands are useful for changing the selec‐
149 tion.
150
151 n Print the total number of pages in the document.
152
ls     List all component files in the document. Each line contains an
optional page number, a letter describing the component file type,
the size of the component file, and the identifier of the component
file. Component file type letters P, I, A, and T respectively stand
for page data, shared image data, shared annotation data, and
thumbnail data. Page numbers are only listed for component files
containing page data. When it is set, the optional page title (see
command set-page-title below) is displayed after the component file
identifier.
162
163 select [fileid]
164 Select the component file identified by argument fileid. Argu‐
165 ment fileid must be either a page number or a component file
166 identifier. The select command selects all component files when
167 the argument fileid is omitted.
168
169 select-shared-ant
170 Select a component file containing shared annotations. Only one
171 such component file is supported by the current DjVu software.
172 This component file usually contains annotations pertaining to
173 the whole document as opposed to specific pages. An error mes‐
174 sage is displayed if there is no such component file.
175
176 create-shared-ant
177 Create and select a component file containing shared annota‐
178 tions. This command only selects the shared annotation compo‐
179 nent file if such a component file already exists. Otherwise it
180 creates a new shared annotation component file and makes sure
181 that it is imported by all pages in the document.
182
183
184 Text and annotation commands
185 print-pure-txt
186 Print the text stored in the hidden text layer of the selected
187 pages. A similar capability is offered by program djvutxt.
188 Structural information is sometimes represented by control char‐
189 acters. Text from different pages is delimited by form feed
190 characters ("\f"). Lines are delimited by newline characters
191 ("\n"). Columns, regions, and paragraphs are sometimes delim‐
192 ited by vertical tab ("\013"), group separators ("\035") and
193 unit separators ("\037") respectively.
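The page delimiter described above can be used to split the output in a pipeline. The following sketch assumes an awk that accepts a single-character record separator given as an escape sequence; the function name first_lines and the canned sample text are made up for this illustration.

```shell
# Print the first line of every page found in print-pure-txt output.
# Pages are awk records (separated by form feeds); lines are fields.
first_lines() {
    awk -v RS='\f' -F '\n' '{ printf "page %d: %s\n", NR, $1 }'
}

# Canned two-page sample standing in for real djvused output.
printf 'Title\nbody text\n\fChapter 1\nmore text\n' | first_lines
```

With a real document the function would be fed by a command such as djvused myfile.djvu -e 'print-pure-txt' | first_lines.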
194
195 print-txt
Print extensive hidden text information for the selected pages.
This information describes the structure of the text on the document
page and locates the structural elements in the page image. The
syntax of this output is described later in this man page.
201
202 remove-txt
203 Remove the hidden text information from the selected component
204 files. For instance, executing commands select and remove-txt
205 removes all hidden text information from the DjVu document.
206
207 set-txt [djvusedtxtfile]
208 Insert hidden text information into the selected pages. The
209 optional argument djvusedtxtfile names a file containing the
210 hidden text information. This file must contain data similar to
211 what is produced by command print-txt. When the optional argu‐
212 ment is omitted, the program reads the hidden text information
213 from the djvused script until reaching an end-of-file or a line
214 containing a single period.
215
216 output-txt
Print a djvused script that reconstructs the hidden text information
for the selected pages. This script can later be edited and executed
by invoking program djvused with option -f.
220
221 print-ant
Print the annotations of the selected component file. The
annotation data is represented using a simple syntax described
later in this document.
225
226 print-merged-ant
Merge the annotations stored in the selected component files
with the annotations imported from other component files such as
the shared annotation component file. The annotation data is
represented using a simple syntax described later in this document.
232
233 remove-ant
234 Remove the annotation information from the selected component
235 files. For instance, executing commands select and remove-ant
236 removes all annotation information from the DjVu document.
237
238 set-ant [djvusedantfile]
239 Insert annotations into the selected component file. The
240 optional argument djvusedantfile names a file containing the
241 annotation data. This file must contain data similar to what is
242 produced by command print-ant. When the optional argument is
243 omitted, the program reads the annotation data from the djvused
244 script itself until reaching an end-of-file or a line containing
245 a single period.
246
247 output-ant
248 Print a djvused script that reconstructs the annotation informa‐
249 tion for the selected pages. This script can later be edited
250 and executed by invoking program djvused with option -f.
251
252 print-meta
Print the meta-data part of the annotations for the selected
component file. This command displays a subset of the information
printed by command print-ant using a different syntax. Meta-data
are organized as key-value pairs. Each printed line contains the
key name, such as author, title, etc., followed by a tab character
("\t") and a double-quoted string representing the UTF-8 encoded
meta-data value.
260
261 set-meta [djvusedmetafile]
Set the meta-data part of the annotations of the selected component
file. The remaining part of the annotations is left unchanged. The
optional argument djvusedmetafile names a file containing the
meta-data. This file must contain data similar to what is produced
by command print-meta. When the optional argument is omitted, the
program reads the meta-data from the djvused script itself until
reaching an end-of-file or a line containing a single period.
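A meta-data file in the format printed by print-meta can be written by hand. The sketch below is only an illustration: the file name meta.txt and the values are invented, and attaching the entries to the shared annotations is one possible choice for document-wide meta-data.

```shell
# One entry per line: key, tab character, double-quoted UTF-8 value.
printf 'author\t"%s"\n' 'Jane Doe'        >  meta.txt
printf 'title\t"%s"\n'  'A Sample Report' >> meta.txt
printf 'year\t"%s"\n'   '2005'            >> meta.txt

# Option -s overwrites myfile.djvu in place; skipped if anything is missing.
if command -v djvused >/dev/null 2>&1 && [ -f myfile.djvu ]; then
    djvused myfile.djvu -e 'create-shared-ant; set-meta meta.txt' -s
fi
```

Replacing option -s with -n -v turns this into a dry run.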
270
271 output-all
272 Print a djvused script that reconstructs both the hidden text
273 and the annotation information for the selected pages. This
274 script can later be edited and executed by invoking program
275 djvused with option -f.
276
277 Outline/bookmarks commands
278 print-outline
279 Print the outline of the document. Nothing is printed if the
280 document contains no outline.
281
282 set-outline [djvusedoutlinefile]
Insert outline information into the document. The optional
argument djvusedoutlinefile names a file containing the outline
information. This file must contain data similar to what is
produced by command print-outline. When the optional argument
is omitted, the program reads the outline information from the
djvused script until reaching an end-of-file or a line containing
a single period.
290
291 Thumbnail commands
292 set-thumbnails sz
Compute thumbnails of size sz x sz pixels and insert them into the
document. DjVu viewers can later display these thumbnails very
efficiently without needing to download the data for each page.
Typical thumbnail sizes range from 48 to 128 pixels.
297
298 remove-thumbnails
299 Remove the pre-computed thumbnails from the DjVu document. New
300 thumbnails can then be computed using command set-thumbnails.
301
302
303 Save commands
304 The above commands only modify the memory image of the DjVu document.
305 The following commands provide means to save the modified data into the
306 file system.
307
save   Save the modified DjVu document back into the input file
djvufile specified by the arguments of the program djvused. Nothing
is done if the DjVu file was not modified. Passing option -s to
program djvused is equivalent to executing command save before
exiting the program.
313
314 save-bundled filename
315 Save the current DjVu document as a bundled multi-page DjVu doc‐
316 ument named filename. A similar capability is offered by pro‐
317 gram djvmcvt.
318
319 save-indirect filename
320 Save the current DjVu document as an indirect multi-page DjVu
321 document. The index file of the indirect document will be named
322 filename. All other files composing the indirect document will
323 be saved into the same directory as the index file. A similar
324 capability is offered by program djvmcvt.
325
326 save-page filename
Save the selected component file into DjVu file filename. The
selected component file might import data from another component
file using the so-called inclusion (INCL) chunks. This command
then produces a file with unresolved references to imported data.
Such a file should then be made part of a multi-page document
containing the required data in other component files.
333
334 save-page-with filename
335 Save the selected component file into DjVu file filename. All
336 data imported from other component files is copied into the out‐
337 put file as well. This command always produces a usable DjVu
338 file. On the other hand, collecting several such files into a
339 multi-page document might lead to useless data replication.
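As a sketch of how these commands combine, the loop below saves every page of a document as a standalone file. The helper page_file and the pNN.djvu naming are assumptions modelled on the p05.djvu example earlier in this page; the page count comes from command n.

```shell
# Build an output name such as p05.djvu for page 5 (hypothetical naming).
page_file() {
    printf 'p%02d.djvu' "$1"
}

if command -v djvused >/dev/null 2>&1 && [ -f myfile.djvu ]; then
    pages=$(djvused myfile.djvu -e 'n')   # total number of pages
    p=1
    while [ "$p" -le "$pages" ]; do
        # save-page-with copies imported data, so each file is usable alone.
        djvused myfile.djvu -e "select $p; save-page-with $(page_file "$p")"
        p=$((p + 1))
    done
fi
```

Using save-page instead would produce smaller files that may carry unresolved references.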
340
341
342 Miscellaneous commands
343 help Display a help message listing all commands supported by
344 djvused.
345
346 dump Display the EA IFF 85 structure of the document or of the
347 selected component file. A similar capability is offered by
348 program djvudump.
349
350 size Display the width and the height of the selected pages. The
351 dimensions of each page are displayed using a syntax suitable
352 for direct insertion into the <EMBED...></EMBED> tags.
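The output of command size can be picked apart in a script. The sketch below assumes each line looks like width=W height=H, which is an assumption about the exact format rather than something stated above; the function name parse_size is likewise made up.

```shell
# Extract the numeric width and height from a line of the assumed form
#   width=2550 height=3300
parse_size() {
    sed -n 's/.*width=\([0-9][0-9]*\).*height=\([0-9][0-9]*\).*/\1 \2/p'
}

# Canned sample line; with a real file, pipe in the output of
#   djvused myfile.djvu -e 'select 3; size'
echo 'width=2550 height=3300' | parse_size
```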
353
354 set-page-title title
Set a page title for the selected page. When page titles are
available, recent versions of the DjVuLibre viewers display
these page titles instead of page numbers and also accept them
in page selection options. Command ls can be used to see both
the page titles and page identifiers. To unset a page title,
simply make it equal to the page identifier.
361
362
DJVUSED FILE FORMATS
364 Djvused uses a simple parenthesized syntax to represent both annota‐
365 tions and hidden text.
366
367 * This syntax is the native syntax used by DjVu for storing annota‐
368 tions. Program djvused simply compresses the annotation data using
369 the bzz(1) algorithm.
370
371 * This syntax differs from the native syntax used by DjVu for storing
372 the hidden text. Program djvused performs the translations between
373 the compact binary representation used by DjVu and the easily modi‐
374 fiable parenthesized syntax.
375
376 General syntax
377 Djvused files are ASCII text files. The legal characters in djvused
378 files are the printable ASCII characters and the space, tab, cr, and nl
379 characters. Using other characters has undefined results.
380
Djvused files are composed of a sequence of expressions separated by
blank characters (space, tab, cr, or nl). There are four kinds of
expressions, namely integers, symbols, strings and lists.
384
385 Integers:
386 Integer numbers are represented by one or more digits, with the
387 usual interpretation.
388
389 Symbols:
Symbols, or identifiers, are sequences of printable ASCII characters
representing a name or a keyword. Acceptable characters are the
alpha-numeric characters, the underscore "_", the minus character
"-", and the hash character "#". Names should not begin with a
digit or a minus character.
395
396 Strings:
Strings denote an arbitrary sequence of bytes, usually interpreted
as a sequence of UTF-8 encoded characters. Strings in djvused
files are similar to strings in the C language. They are
surrounded by double quote characters. Certain sequences of
characters starting with a backslash ("\") have a special meaning.
A backslash followed by "a", "b", "t", "n", "v", "f", "r", "\",
or a double quote stands for the ASCII character BEL(007),
BS(010), HT(011), LF(012), VT(013), FF(014), CR(015),
BACKSLASH(134), or DOUBLEQUOTE(042) respectively, where the
codes are octal. A backslash followed by one to three digits
stands for the byte whose octal code is expressed by the digits.
All other backslash sequences are illegal. All non-printable
ASCII characters must be escaped.
410
Lists: Lists are sequences of expressions separated by blanks and
surrounded by parentheses. All expression types are acceptable
within a list, including sub-lists.
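The string rules above matter when a script generates djvused files. The following sketch quotes a printable ASCII string by escaping backslashes and double quotes; the function name dsed_quote is hypothetical, and non-printable or non-ASCII bytes would additionally need octal escapes, which this sketch does not produce.

```shell
# Quote a printable ASCII string as a djvused string expression.
# Backslashes are doubled first, then double quotes are escaped.
dsed_quote() {
    printf '"%s"\n' "$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
}

dsed_quote 'He said "hi" in C:\docs'
```

The result can be pasted wherever the syntax expects a string, for example as a meta-data value or an outline title.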
414
415
416 Hidden text syntax
417 The building blocks of the hidden text syntax are lists representing
418 each structural component of the hidden text. Structural components
419 have the following form:
420
421 (type xmin ymin xmax ymax ... )
422
The symbol type must be one of page, column, region, para, line, word,
or char, listed here in decreasing order of importance. The integers
xmin, ymin, xmax, and ymax represent the coordinates of a rectangle
indicating the position of the structural component in the page.
Coordinates are measured in pixels and have their origin at the bottom
left corner of the page. The remaining expressions in the list are
either a single string representing the encoded text associated with
this structural component, or a sequence of structural components of
a lesser type.
432
The hidden text for each page is simply represented by a single
structural element of type page. Various levels of structural
information are acceptable. For instance, the page level component
might only specify a page level string, or might only provide a list
of lines, or might provide a full hierarchy down to the individual
characters.
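A small hidden text file following this syntax might look like the sketch below. The file name page1.txt, the page dimensions, the rectangles and the words are all invented for illustration; only the page, line and word keywords and the coordinate order come from the description above.

```shell
# A page containing one line made of two words.
# Rectangles are xmin ymin xmax ymax, origin at the bottom left corner.
cat > page1.txt <<'EOF'
(page 0 0 2550 3300
 (line 300 3000 1200 3060
  (word 300 3000 620 3060 "Hello")
  (word 660 3000 1200 3060 "world")))
EOF

# Attach the text to page 1 and save; skipped if anything is missing.
if command -v djvused >/dev/null 2>&1 && [ -f myfile.djvu ]; then
    djvused myfile.djvu -e 'select 1; set-txt page1.txt' -s
fi
```

A coarser file holding only (page 0 0 2550 3300 "Hello world") would be equally acceptable.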
438
439
440 Outline/Bookmark syntax
441 The outline syntax is a single list of the form
442
443 (bookmarks ...)
444
The first element of the list is symbol bookmarks. The subsequent
elements are lists representing the top-level outline entries. Each
outline entry is represented by a list with the following form:
448
449 (title url ... )
450
451 The string title is the title of the outline entry. The destination
452 string url can be an arbitrary URL or can be composed of the hash char‐
453 acter ("#") followed by either the component file identifier or the
454 page number corresponding to the outline entry. The remaining expres‐
455 sions describe subentries of this outline entry.
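The sketch below writes a two-level outline in this syntax and installs it with set-outline. The file name outline.txt, the titles, the page numbers and the example.org URL are placeholders.

```shell
# Top-level entries are lists; nested lists are subentries.
cat > outline.txt <<'EOF'
(bookmarks
 ("Preface" "#1")
 ("Chapter 1" "#5"
  ("Section 1.1" "#6")
  ("Section 1.2" "#9"))
 ("Project page" "http://example.org/"))
EOF

# Install the outline and save; skipped if anything is missing.
if command -v djvused >/dev/null 2>&1 && [ -f myfile.djvu ]; then
    djvused myfile.djvu -e 'set-outline outline.txt' -s
fi
```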
456
457
458 Annotation syntax
459 Annotations are represented by a sequence of annotation expressions.
460 The following annotation expressions are recognized:
461
462 (background color)
463 Specify the color of the viewer area surrounding the DjVu image.
464 Colors are represented with the X11 hexadecimal syntax #RRGGBB.
465 For instance, #000000 is black and #FFFFFF is white.
466
467 (zoom zoomvalue)
468 Specify the initial zoom factor of the image. Argument zoom‐
469 value can be one of stretch, one2one, width, page, or composed
470 of the letter d followed by a number in range 1 to 999 repre‐
471 senting a zoom factor (such as in d300 or d150 for instance.)
472
473 (mode modevalue)
474 Specify the initial display mode of the image. Argument mode‐
475 value is one of color, bw, fore, or back.
476
477 (align horzalign vertalign)
478 Specify how the image should be aligned on the viewer surface.
479 By default the image is located in the center. Argument horza‐
480 lign can be one of left, center, or right. Argument vertalign
481 can be one of top, center, or bottom.
482
483 (maparea url comment area ...)
Define a hyper-link for the specified destination.
485
486 Argument url can have one of the following forms:
487
488 href
489 (url href target)
490
491 where href is a string representing the destination and target
492 is a string representing the target frame for the hyper-link, as
493 defined by the HTML anchor tag <A>. The destination string href
494 can be an arbitrary URL or can be composed of the hash character
495 ("#") followed by either a component file identifier or a page
496 number. Page numbers may be prefixed with an optional sign to
497 represent a page displacement. For instance the strings "#-1"
498 and "#+1" can be used to access the previous page and the next
499 page.
500
501 Argument comment is a string that might be displayed by the
502 viewer when the user moves the mouse over the hyper-link.
503
504 Argument area defines the shape and the location of the hyper‐
505 link. The following forms are recognized:
506
507 (rect xmin ymin width height)
508 (oval xmin ymin width height)
509 (poly x0 y0 x1 y1 ... )
510 (text xmin ymin width height)
511 (line x0 y0 x1 y1)
512
513 All parameters are numbers representing coordinates. Coordi‐
514 nates are measured in pixels and have their origin at the bottom
515 left corner of the page.
516
517 The remaining expressions in the maparea list represent the vis‐
518 ual effect associated with the hyper-link.
519
A first set of options defines how borders are drawn for rect,
oval, poly, or text hyperlink areas.
522
523 (none)
524 (xor)
525 (border color)
526 (shadow_in [thickness])
527 (shadow_out [thickness])
528 (shadow_ein [thickness])
529 (shadow_eout [thickness])
530
531 where parameter color has syntax #RRGGBB as described above, and
532 parameter thickness is an integer in range 1 to 32. The last
533 four border options are only supported for rect hyperlink areas.
534 The default border is a simple black line. Border options do
535 not apply to line areas.
536
537 When a border option is specified, the border becomes visible
538 when the user moves the mouse over the hyperlink. The border may
539 be made always visible by using the following option:
540
541 (border_avis)
542
543 The following two options may be used with rect hyperlink areas.
544 The complete area will be highlighted using the specified color
545 at the specified opacity (0-100, default 50).
546
547 (hilite color)
548 (opacity op)
549
550 This is often used with an empty URL for simply emphasizing a
551 specific segment of an image.
552
553 The following three options may be used with line areas to spec‐
554 ify an optional ending arrow, the line width and color. The
555 default is a black line with width 1 and without arrow.
556
557 (arrow)
558 (width w)
559 (lineclr color)
560
561 Finally the following three options can be used with text areas.
562 The default background color is transparent. The default text
563 color is black. The pushpin option indicates that the text is
564 symbolized by a small pushpin icon. Clicking the icon reveals
565 the text.
566
567 (backclr bkcolor)
568 (textclr txtcolor)
569 (pushpin)
570
571 (metadata ... (key value) ... )
Define meta-data entries. Each entry is identified by a symbol
key representing the nature of the meta-data entry. The string
value represents the value associated with the corresponding
key. Two sets of keys are noteworthy: keys borrowed from the
BibTeX bibliography system, and keys borrowed from the PDF
DocInfo metadata. BibTeX keys are always expressed in lowercase,
such as year, booktitle, editor, author, etc. DocInfo keys start
with an uppercase letter, such as Title, Author, Subject, Creator,
Producer, Trapped, CreationDate, and ModDate. The values
associated with the last two keys should be dates expressed
according to RFC 3339.
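Several of the annotation expressions above can be combined in one file. The following sketch is an illustration only: the file name page1.ant, the colors, the rectangles and the meta-data values are invented, while the keywords are the ones listed in this section.

```shell
# Viewer settings, a link to the next page, a highlighted region,
# and two meta-data entries.
cat > page1.ant <<'EOF'
(background #FFFFFF)
(zoom width)
(mode color)
(align center top)
(maparea "#+1" "Go to the next page"
 (rect 100 100 400 80)
 (border #0000FF) (border_avis))
(maparea "" "Highlighted region"
 (rect 100 400 800 120)
 (hilite #FFFF00) (opacity 40))
(metadata
 (title "A Sample Report")
 (year "2005"))
EOF

# Replace the annotations of page 1 and save; skipped if anything is missing.
if command -v djvused >/dev/null 2>&1 && [ -f myfile.djvu ]; then
    djvused myfile.djvu -e 'select 1; set-ant page1.ant' -s
fi
```

The second maparea uses an empty URL, so it only emphasizes a region of the image.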
583
584
LIMITATIONS
586 The current version of program djvused only supports selecting one com‐
587 ponent file or all component files. There is no way to select only a
588 few component files.
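One way to work around this limitation is to select the wanted pages one after another within a single invocation. The sketch below builds such a command string for pages 2, 3 and 4; the page numbers and the variable name cmds are arbitrary choices for this example.

```shell
# Build "select 2; print-pure-txt; select 3; ..." without a trailing separator.
cmds=''
for p in 2 3 4; do
    cmds="${cmds:+$cmds; }select $p; print-pure-txt"
done
echo "$cmds"

# Run all commands in one pass; skipped if anything is missing.
if command -v djvused >/dev/null 2>&1 && [ -f myfile.djvu ]; then
    djvused myfile.djvu -e "$cmds"
fi
```

Double quotes are needed around the -e argument here because the command string lives in a shell variable.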
589
590
CREDITS
This program was initially written by Léon Bottou
<leonb@users.sourceforge.net> and was improved by Yann Le Cun
<profshadoko@users.sourceforge.net>, Florin Nicsa, Bill Riemers
<docbill@sourceforge.net> and many others.
596
597
SEE ALSO
599 djvu(1), djvutxt(1), djvmcvt(1), djvudump(1), bzz(1)
600
601
602
DjVuLibre-3.5 5/22/2005 DJVUSED(1)