DJVUTXT(1)                       DjVuLibre-3.5                      DJVUTXT(1)

NAME

       djvutxt - Extract the hidden text from DjVu documents.

SYNOPSIS

       djvutxt [options] inputdjvufile [outputtxtfile]

DESCRIPTION

       Program djvutxt decodes the hidden text layer of the DjVu document
       inputdjvufile and prints it to the file outputtxtfile or, when
       outputtxtfile is omitted, to the standard output.  The hidden text
       layer is usually generated by optical character recognition (OCR)
       software.

       Without the options --detail and --escape, this program simply
       outputs the UTF-8 text.  Option --detail causes the output of
       S-expressions describing the text and its location.  Option --escape
       uses C-style escape sequences to represent non-ASCII and nonprintable
       characters.

OPTIONS

       --page=pagespec
              Specify which pages should be processed.  When this option is
              not specified, the text of all pages of the document is
              concatenated into the output file.  The page specification
              pagespec contains one or more comma-separated page ranges.  A
              page range is either a page number or two page numbers
              separated by a dash.  For instance, specification 1-10 outputs
              pages 1 to 10, and specification 1,3,99999-4 outputs pages 1 and
              3, followed by all the document pages in reverse order down to
              page 4.

       --detail=keyword
              This option causes djvutxt to output S-expressions specifying
              the position of the text in the page.  See the manual page
              djvused(1) for a description of the output format.  Argument
              keyword specifies the maximum level of detail for which text
              location is reported.  The recognized values are page, column,
              region, para, line, word, and char.  All other values are
              interpreted as char.

       --escape
              Output C-style octal escape sequences of the form "\ooo" for all
              non-ASCII or nonprintable UTF-8 characters and for the backslash
              character.
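
       The following Python sketch illustrates how a pagespec given to
       --page can be expanded into an ordered list of page numbers.  It is
       not taken from the djvutxt sources: the helper expand_pagespec is
       hypothetical, and it assumes, as the 99999-4 example suggests, that
       numbers larger than the page count are clamped to the last page and
       that a range whose first number is larger than its second is
       traversed in reverse.

```python
def expand_pagespec(spec: str, npages: int) -> list[int]:
    """Expand a comma-separated page specification into 1-based page numbers.

    Hypothetical helper, not djvutxt's own code.  Assumptions: numbers beyond
    the page count are clamped to the last page, and a descending range such
    as 99999-4 is walked in reverse.
    """
    if npages < 1:
        raise ValueError("document must have at least one page")
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        # A range is either "N" or "N-M".
        if "-" in part:
            first_text, last_text = part.split("-", 1)
        else:
            first_text = last_text = part
        # int() raises ValueError on empty or non-numeric fields.
        first = min(max(int(first_text), 1), npages)
        last = min(max(int(last_text), 1), npages)
        step = 1 if first <= last else -1
        pages.extend(range(first, last + step, step))
    return pages


if __name__ == "__main__":
    # A hypothetical 12-page document.
    print(expand_pagespec("1-10", 12))
    print(expand_pagespec("1,3,99999-4", 12))
```

       With a 12-page document, the second call yields
       [1, 3, 12, 11, 10, 9, 8, 7, 6, 5, 4] under these assumptions.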
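
       The following Python sketch shows one way to consume --detail
       output.  It assumes the nested layout documented in djvused(1), where
       each zone is written as (type xmin ymin xmax ymax ...) and contains
       either sub-zones or a quoted string.  The sample expression is made up
       for illustration, and the helpers normalize_detail, parse_sexpr and
       leaf_zones are hypothetical; normalize_detail mirrors the rule that
       unrecognized keywords are interpreted as char, and parse_sexpr reads
       a single expression (one page) only.

```python
import re

# Zone types accepted by --detail, from coarsest to finest.
DETAIL_LEVELS = ("page", "column", "region", "para", "line", "word", "char")

# Tokens: "(", ")", double-quoted strings (backslash escapes kept), bare atoms.
TOKEN_RE = re.compile(r'\s*(\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+)')


def normalize_detail(keyword: str) -> str:
    """Map a --detail keyword to the level used; unknown values mean "char"."""
    return keyword if keyword in DETAIL_LEVELS else "char"


def parse_sexpr(text: str):
    """Parse a single S-expression (e.g. one page) into nested lists."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(parse())
            pos += 1  # consume the closing ")"
            return items
        if tok.startswith('"'):
            return tok[1:-1]  # strip quotes; escapes are not decoded here
        return int(tok) if tok.lstrip("-").isdigit() else tok

    return parse()


def leaf_zones(zone):
    """Yield (type, (xmin, ymin, xmax, ymax), text) for zones that hold text."""
    kind, box = zone[0], tuple(zone[1:5])
    for child in zone[5:]:
        if isinstance(child, list):
            yield from leaf_zones(child)
        else:
            yield kind, box, child


if __name__ == "__main__":
    # Made-up sample in the assumed djvused(1) layout.
    sample = ('(page 0 0 2550 3300 (line 100 100 900 140 '
              '(word 100 100 300 140 "Hello") (word 320 100 900 140 "world")))')
    for kind, box, text in leaf_zones(parse_sexpr(sample)):
        print(kind, box, text)
```

       Run on the sample, the loop prints one line per word, for example
       word (100, 100, 300, 140) Hello.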
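
       The following Python sketch models the transformation described for
       --escape, together with its inverse for readers who need the original
       text back.  Both helpers are hypothetical.  Escaping each byte of the
       UTF-8 encoding separately, treating only bytes 0x20 through 0x7E as
       printable, and therefore also escaping newlines, are assumptions
       rather than documented djvutxt behavior.

```python
import re

# Assumption: escapes are emitted per UTF-8 byte as a backslash plus three
# octal digits, and only bytes 0x20..0x7E (minus the backslash) are kept.
ESCAPE_RE = re.compile(r"\\([0-3][0-7]{2})")


def escape_text(text: str) -> str:
    """Hypothetical model of --escape: octal-escape non-printable bytes."""
    out = []
    for byte in text.encode("utf-8"):
        if 0x20 <= byte <= 0x7E and byte != ord("\\"):
            out.append(chr(byte))
        else:
            out.append(f"\\{byte:03o}")
    return "".join(out)


def unescape_text(escaped: str) -> str:
    """Inverse of escape_text: turn each \\ooo back into a byte, then decode."""
    raw = bytearray()
    pos = 0
    for match in ESCAPE_RE.finditer(escaped):
        raw += escaped[pos:match.start()].encode("utf-8")
        raw.append(int(match.group(1), 8))
        pos = match.end()
    raw += escaped[pos:].encode("utf-8")
    return raw.decode("utf-8")


if __name__ == "__main__":
    sample = "caf\u00e9 \\ na\u00efve\n"
    escaped = escape_text(sample)
    print(escaped)
    assert unescape_text(escaped) == sample
```

       For example, the string "café" followed by a backslash becomes
       caf\303\251\134 under this model.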

REMARKS

       Use the program djvused(1) for more control over the text layer.

CREDITS

       This program was initially written by Andrei Erofeev
       <andrew_erofeev@yahoo.com> and was then improved by Bill Riemers
       <docbill@sourceforge.net> and many others.  It was then rewritten to
       use the ddjvuapi by Leon Bottou <leonb@sourceforge.net>.

SEE ALSO

       djvu(1), djvused(1)

DjVuLibre-3.5                     10/11/2001                        DJVUTXT(1)