DJVUXML(1)                    DjVuLibre XML Tools                   DJVUXML(1)

NAME

       djvutoxml, djvuxmlparser - DjVuLibre XML Tools.

SYNOPSIS

       djvutoxml [options] inputdjvufile [outputxmlfile]
       djvuxmlparser [ -o djvufile ] inputxmlfile

DESCRIPTION

       The DjVuLibre XML Tools provide for editing the metadata, hyperlinks,
       and hidden text associated with DjVu files.  Unlike djvused(1), the
       DjVuLibre XML Tools rely on XML technology and can take advantage of
       XML editors and verifiers.

DJVUTOXML

       Program djvutoxml creates an XML file outputxmlfile containing a
       reference to the original DjVu document inputdjvufile as well as tags
       describing the metadata, hyperlinks, and hidden text associated with
       the DjVu file.

       The following options are supported:

       --page pagenum
              Select a page in a multi-page document.  Without this option,
              djvutoxml outputs the XML corresponding to all pages of the
              document.

       --with-text
              Specifies that the HIDDENTEXT element for each page should be
              included in the output.  If specified without the --with-anno
              flag, then the --without-anno flag is implied.  If none of the
              --with-text, --without-text, --with-anno, or --without-anno
              flags is specified, then the --with-text and --with-anno flags
              are implied.

       --without-text
              Specifies that the HIDDENTEXT element for each page should not
              be included in the output.  If specified without the
              --without-anno flag, then the --with-anno flag is implied.

       --with-anno
              Specifies that the area MAP element for each page should be
              included in the output.  If specified without the --with-text
              flag, then the --without-text flag is implied.

       --without-anno
              Specifies that the area MAP element for each page should not be
              included in the output.  If specified without the
              --without-text flag, then the --with-text flag is implied.
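
       The implication rules above can be summarized in a few lines of
       Python.  This is only a sketch of the rules as stated in this page,
       not the tool's actual implementation; the function name resolve_flags
       is hypothetical, and rejecting contradictory pairs (such as
       --with-text together with --without-text) is an assumption, since
       this page does not say how djvutoxml treats them.

```python
def resolve_flags(with_text=False, without_text=False,
                  with_anno=False, without_anno=False):
    """Return (emit_hidden_text, emit_area_map) per the documented rules."""
    # Assumption: contradictory pairs are treated as an error in this sketch.
    if with_text and without_text:
        raise ValueError("--with-text conflicts with --without-text")
    if with_anno and without_anno:
        raise ValueError("--with-anno conflicts with --without-anno")

    # No flag at all: both --with-text and --with-anno are implied.
    if not (with_text or without_text or with_anno or without_anno):
        return True, True

    # None means "not stated on the command line".
    text = True if with_text else False if without_text else None
    anno = True if with_anno else False if without_anno else None

    # A flag given alone implies the opposite setting for the other element.
    if text is None:
        text = not anno
    if anno is None:
        anno = not text
    return text, anno
```

       For example, resolve_flags(with_text=True) yields hidden text without
       area maps, which matches the description of --with-text.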

DJVUXMLPARSER

       Files produced by djvutoxml can then be modified using either a text
       editor or an XML editor.  Program djvuxmlparser parses the XML file
       inputxmlfile in order to modify the metadata of the corresponding DjVu
       file.

       -o djvufile
              By default, the target DjVu file is the file referenced by the
              OBJECT element of the XML file.  This option provides the means
              to override the filename specified in the OBJECT element.
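
       A typical export, edit, and re-import cycle could be scripted as in
       the following Python sketch.  It uses only the options documented on
       this page; the helper names (djvutoxml_cmd, djvuxmlparser_cmd,
       round_trip) and the edit callback are hypothetical, and both programs
       are assumed to be on the PATH.

```python
import subprocess


def djvutoxml_cmd(djvu_path, xml_path=None, page=None, extra_flags=()):
    """Build the argument list for: djvutoxml [options] in.djvu [out.xml]."""
    cmd = ["djvutoxml"]
    if page is not None:
        cmd += ["--page", str(page)]
    cmd += list(extra_flags)  # e.g. ["--with-text"]
    cmd.append(djvu_path)
    if xml_path is not None:
        cmd.append(xml_path)
    return cmd


def djvuxmlparser_cmd(xml_path, target_djvu=None):
    """Build the argument list for: djvuxmlparser [-o djvufile] in.xml."""
    cmd = ["djvuxmlparser"]
    if target_djvu is not None:
        # Override the file named by the OBJECT element of the XML file.
        cmd += ["-o", target_djvu]
    cmd.append(xml_path)
    return cmd


def round_trip(djvu_path, xml_path, edit, target_djvu=None):
    """Export to XML, let edit(xml_path) modify the file, then re-import."""
    subprocess.run(djvutoxml_cmd(djvu_path, xml_path), check=True)
    edit(xml_path)
    subprocess.run(djvuxmlparser_cmd(xml_path, target_djvu), check=True)
```

       Passing target_djvu applies the edited XML to a different file than
       the one referenced in the XML, which is one way to try changes on a
       copy first.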

DJVUXML DOCUMENT TYPE DEFINITION

       The document type definition (DTD) file

         /usr/share/djvu/pubtext/DjVuXML-s.dtd

       defines the input and output of the DjVu XML tools.

       The DjVuXML-s DTD is a simplification of the HTML DTD:

         http://www.w3c.org/TR/1998/REC-html40-19980424/sgml/dtd.html

       with a few new attributes added specific to DjVu.  Each of the
       specified pages of a DjVu document is represented as an OBJECT element
       within the BODY element of the XML file.  Each OBJECT element may
       contain multiple PARAM elements to specify attributes like page name,
       resolution, and gamma factor.  Each OBJECT element may also contain one
       HIDDENTEXT element to specify the hidden text (usually generated with
       an OCR engine) within the DjVu page.  In addition, each OBJECT element
       may reference a single area MAP element, which contains multiple AREA
       elements to represent all the hyperlink and highlight areas within the
       DjVu document.

   PARAM Elements
       Legal PARAM elements of a DjVu OBJECT include, but are not limited to,
       PAGE for specifying the page name, GAMMA for specifying the gamma
       correction factor (normally 2.2), and DPI for specifying the page
       resolution.
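
       As an illustration, the PARAM elements of one OBJECT can be collected
       into a dictionary with Python's standard XML parser.  The sample
       fragment below is hand-written for this sketch: the name/value
       attribute layout follows the HTML PARAM element, and the OBJECT
       attributes and all values are assumptions rather than actual
       djvutoxml output.

```python
import xml.etree.ElementTree as ET

# Hypothetical OBJECT fragment; attribute values are placeholders.
SAMPLE_OBJECT = """\
<OBJECT data="file:example.djvu" type="image/x.djvu" width="2550" height="3300">
<PARAM name="PAGE" value="p0001.djvu" />
<PARAM name="GAMMA" value="2.2" />
<PARAM name="DPI" value="300" />
</OBJECT>"""


def object_params(obj):
    """Map each PARAM name to its value for a single OBJECT element."""
    return {p.get("name"): p.get("value") for p in obj.findall("PARAM")}


params = object_params(ET.fromstring(SAMPLE_OBJECT))
```

       Values arrive as strings, so a caller would convert them as needed,
       for example float(params["GAMMA"]) or int(params["DPI"]).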

   HIDDENTEXT Elements
       The HIDDENTEXT element consists of nested PAGECOLUMNS, REGION,
       PARAGRAPH, LINE, and WORD elements.  The most deeply nested element
       specified should give the bounding coordinates of the element in
       top-down orientation, and its body should contain the text.  Most DjVu
       documents use either LINE or WORD as the lowest-level element, but any
       element is legal as the lowest-level element.  A white space is always
       added between WORD elements and a line feed is always added between
       LINE elements.  Since languages such as Japanese do not use spaces
       between words, it is quite common for Asian OCR engines to use WORD
       elements for individual characters instead.
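
       The joining rule described above (a space between WORD elements, a
       line feed between LINE elements) can be sketched in Python as
       follows.  The sample fragment is hand-written; the coords attribute,
       its values, and the spelling PAGECOLUMN for the column element are
       assumptions made for illustration and should be checked against the
       DjVuXML-s DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical HIDDENTEXT fragment; coords values are placeholders.
SAMPLE_HIDDENTEXT = """\
<HIDDENTEXT>
<PAGECOLUMN><REGION><PARAGRAPH>
<LINE>
<WORD coords="100,220,180,200">Hello</WORD>
<WORD coords="190,220,290,200">DjVu</WORD>
</LINE>
<LINE>
<WORD coords="100,190,200,170">world</WORD>
</LINE>
</PARAGRAPH></REGION></PAGECOLUMN>
</HIDDENTEXT>"""


def line_text(line):
    """Join WORD bodies with spaces; fall back to the LINE body itself."""
    words = [(w.text or "").strip() for w in line.iter("WORD")]
    if words:
        return " ".join(words)
    # LINE is the lowest-level element here, so it carries the text directly.
    return (line.text or "").strip()


def hidden_text(hidden):
    """Return the page text with one output line per LINE element."""
    return "\n".join(line_text(line) for line in hidden.iter("LINE"))


text = hidden_text(ET.fromstring(SAMPLE_HIDDENTEXT))
# text == "Hello DjVu\nworld"
```

       The sketch only handles documents whose lowest-level element is LINE
       or WORD, which the text above describes as the common case.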

   MAP Elements
       The body of a MAP element consists of AREA elements.  In addition to
       the attributes listed in

         http://www.w3.org/TR/1998/REC-html40-19980424/struct/objects.html#edef-AREA,

       the attributes bordertype, bordercolor, border, and highlight have been
       added to specify border type, border color, border width, and highlight
       color, respectively.  Legal values for each of these attributes are
       listed in the DjVuXML-s DTD.  In addition, the shape oval has been
       added to the legal list of shapes.  An oval uses a rectangular bounding
       box.
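
       To show how the DjVu-specific attributes sit alongside the standard
       HTML ones, the following Python sketch lists the AREA elements of a
       MAP.  The sample fragment is hand-written, and the attribute values in
       it (including the bordertype value and the color notation) are
       assumptions; the legal values are those listed in the DjVuXML-s DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical MAP fragment; all attribute values are placeholders.
SAMPLE_MAP = """\
<MAP name="p0001.djvu">
<AREA shape="rect" coords="10,10,200,40" href="http://example.org/"
      bordertype="solid" bordercolor="#FF0000" border="1" />
<AREA shape="oval" coords="50,60,150,120" href="#2" highlight="#FFFF00" />
</MAP>"""

# Attributes that this page says were added on top of the HTML AREA element.
DJVU_ATTRS = ("bordertype", "bordercolor", "border", "highlight")


def describe_areas(map_elem):
    """Return (shape, coords, djvu_specific_attributes) for each AREA."""
    result = []
    for area in map_elem.iter("AREA"):
        extras = {k: area.get(k) for k in DJVU_ATTRS if area.get(k) is not None}
        result.append((area.get("shape"), area.get("coords"), extras))
    return result


areas = describe_areas(ET.fromstring(SAMPLE_MAP))
```

       In the sample, the oval area carries four coordinates because an oval
       is described by its rectangular bounding box.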

BUGS

       Perhaps it would have been better to use CSS2 style sheets with
       standard HTML elements instead of defining the HIDDENTEXT element.

CREDITS

       The DjVu XML tools and DTD were written by Bill C. Riemers
       <docbill@sourceforge.net> and Fred Crary.

SEE ALSO

       djvu(1), djvused(1), and utf8(7).



DjVuLibre XML Tools               11/15/2002                        DJVUXML(1)