DJVUTXT(1) DjVuLibre-3.5 DJVUTXT(1)

NAME
       djvutxt - Extract the hidden text from DjVu documents.

SYNOPSIS
       djvutxt [options] inputdjvufile [outputtxtfile]

DESCRIPTION
       The djvutxt program decodes the hidden text layer of the DjVu document
       inputdjvufile and writes it to the file outputtxtfile, or to standard
       output if no output file is given. The hidden text layer is usually
       generated by optical character recognition (OCR) software.

       Without the options --detail and --escape, the program simply outputs
       the UTF-8 text. Option --detail causes it to output S-expressions
       describing the text and its location. Option --escape uses C-style
       escape sequences to represent non-printable and non-ASCII characters.
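
       The invocation forms can be driven from a script. The Python sketch
       below assembles a djvutxt command line from the options documented in
       this manual page. The helper names build_djvutxt_command and
       extract_text and the file name book.djvu are hypothetical, and the
       djvutxt program is assumed to be installed and on the PATH.

```python
import subprocess
from typing import List, Optional


def build_djvutxt_command(
    input_file: str,
    output_file: Optional[str] = None,
    pages: Optional[str] = None,
    detail: Optional[str] = None,
    escape: bool = False,
) -> List[str]:
    """Build an argument list for djvutxt (hypothetical helper)."""
    cmd = ["djvutxt"]
    if pages is not None:
        cmd.append(f"--page={pages}")      # e.g. "1-10"
    if detail is not None:
        cmd.append(f"--detail={detail}")   # e.g. "word"
    if escape:
        cmd.append("--escape")
    cmd.append(input_file)
    if output_file is not None:
        cmd.append(output_file)            # omitted: text goes to stdout
    return cmd


def extract_text(input_file: str, **options) -> str:
    """Run djvutxt and return what it prints on standard output.

    Assumes djvutxt is installed; raises CalledProcessError on failure.
    """
    cmd = build_djvutxt_command(input_file, **options)
    result = subprocess.run(cmd, check=True, capture_output=True)
    return result.stdout.decode("utf-8")
```

       For example, build_djvutxt_command("book.djvu", pages="1-10") yields
       the argument list for "djvutxt --page=1-10 book.djvu".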

OPTIONS
       --page=pagespec
              Specify which pages should be processed. When this option is
              not specified, the text of all pages of the document is
              concatenated into the output file. The page specification
              pagespec contains one or more comma-separated page ranges. A
              page range is either a single page number or two page numbers
              separated by a dash. For instance, specification 1-10 outputs
              pages 1 to 10, and specification 1,3,99999-4 outputs pages 1
              and 3, followed by all the document pages in reverse order
              from the last page down to page 4.

       --detail=keyword
              This option causes djvutxt to output S-expressions specifying
              the position of the text on the page. See the manual page
              djvused(1) for a description of the output format. Argument
              keyword specifies the maximum level of detail for which text
              location is reported. The recognized values are page, column,
              region, para, line, word, and char. All other values are
              interpreted as char.

       --escape
              Output escape sequences of the form "\ooo" for all non-ASCII or
              non-printable UTF-8 characters and for the backslash character.
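
       The page specification syntax of --page can be modelled in a few
       lines. The Python sketch below expands a pagespec into the ordered
       list of pages it selects. It is not the djvutxt implementation: the
       function name expand_pagespec is hypothetical, and clamping page
       numbers beyond the end of the document to the last page is an
       assumption inferred from the 1,3,99999-4 example.

```python
from typing import List


def expand_pagespec(pagespec: str, page_count: int) -> List[int]:
    """Expand a pagespec such as "1,3,99999-4" into page numbers.

    Assumption: numbers outside 1..page_count are clamped to that range.
    """
    def clamp(n: int) -> int:
        return max(1, min(n, page_count))

    pages: List[int] = []
    for part in pagespec.split(","):
        # A range is "N" or "N-M"; partition leaves dash empty for "N".
        start_text, dash, end_text = part.strip().partition("-")
        start = clamp(int(start_text))
        end = clamp(int(end_text)) if dash else start
        # A range whose end precedes its start is walked in reverse order.
        step = 1 if end >= start else -1
        pages.extend(range(start, end + step, step))
    return pages
```

       With a six-page document, expand_pagespec("1,3,99999-4", 6) selects
       pages 1 and 3, then pages 6, 5, and 4.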
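       Text produced with --escape can be turned back into ordinary UTF-8
       text. The Python sketch below does so under two assumptions that this
       manual page does not state: that "ooo" stands for three octal digits,
       and that each escape sequence encodes one byte of the UTF-8 encoding.
       The function name unescape_octal is hypothetical.

```python
import re

# Backslash followed by three octal digits, limited to one byte (000-377).
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def unescape_octal(escaped: str) -> str:
    """Decode "\\ooo" sequences back into UTF-8 text.

    Assumes the input is pure ASCII, as escaped output would be, and that
    every escape represents a single byte.
    """
    raw = escaped.encode("ascii")
    decoded = _OCTAL_ESCAPE.sub(
        lambda match: bytes([int(match.group(1), 8)]), raw
    )
    return decoded.decode("utf-8")
```

       Under these assumptions the escaped backslash \134 decodes to a
       single backslash, and the two-byte sequence \303\251 decodes to the
       character U+00E9.
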
REMARKS
       Use program djvused(1) for more control over the text layer.

CREDITS
       This program was initially written by Andrei Erofeev
       <andrew_erofeev@yahoo.com> and was then improved by Bill Riemers
       <docbill@sourceforge.net> and many others. It was then rewritten to
       use the ddjvuapi by Leon Bottou <leonb@sourceforge.net>.

SEE ALSO
       djvu(1), djvused(1)
DjVuLibre-3.5 10/11/2001 DJVUTXT(1)