DJVUXML(1)                    DjVuLibre XML Tools                   DJVUXML(1)

NAME

       djvutoxml, djvuxmlparser - DjVuLibre XML Tools.

SYNOPSIS

       djvutoxml [options] inputdjvufile [outputxmlfile]
       djvuxmlparser [inputxmlfile]

DESCRIPTION

       The DjVuLibre XML Tools provide for editing the metadata, hyperlinks,
       and hidden text associated with DjVu files. Unlike djvused(1), the
       DjVuLibre XML Tools rely on XML and can take advantage of XML editors
       and verifiers.

DJVUTOXML

       Program djvutoxml creates an XML file outputxmlfile containing a
       reference to the original DjVu document inputdjvufile as well as tags
       describing the metadata, hyperlinks, and hidden text associated with
       the DjVu file.

       The following options are supported:

       --page pagenum
              Select a page in a multi-page document. Without this option,
              djvutoxml outputs the XML corresponding to all pages of the
              document.

       --with-text
              Specifies that the HIDDENTEXT element for each page should be
              included in the output. If specified without the --with-anno
              flag, then the --without-anno flag is implied. If none of the
              --with-text, --without-text, --with-anno, or --without-anno
              flags are specified, then the --with-text and --with-anno flags
              are implied.

       --without-text
              Specifies that the HIDDENTEXT element for each page should not
              be included in the output. If specified without the
              --without-anno flag, then the --with-anno flag is implied.

       --with-anno
              Specifies that the area MAP element for each page should be
              included in the output. If specified without the --with-text
              flag, then the --without-text flag is implied.

       --without-anno
              Specifies that the area MAP element for each page should not be
              included in the output. If specified without the --without-text
              flag, then the --with-text flag is implied.
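
       The way the four --with/--without flags imply one another can be
       easier to follow as code. The following Python sketch models only
       the rules stated above; it is not taken from the djvutoxml sources.
       The function name resolve_output_flags is hypothetical, and raising
       an error for contradictory flags (for example --with-text together
       with --without-text) is an assumption, since the behavior in that
       case is not described here.

```python
SELECTION_FLAGS = {"--with-text", "--without-text", "--with-anno", "--without-anno"}


def resolve_output_flags(flags):
    """Return (include_text, include_anno) for a list of command-line flags.

    Hypothetical helper that mirrors the implication rules in the option
    descriptions; it does not call djvutoxml.
    """
    given = set(flags) & SELECTION_FLAGS

    # Assumption: contradictory flags are rejected. The manual page does
    # not say what djvutoxml itself does in this case.
    for element in ("text", "anno"):
        if f"--with-{element}" in given and f"--without-{element}" in given:
            raise ValueError(f"conflicting flags for {element}")

    # No selection flag at all: both elements are included.
    if not given:
        return True, True

    def decide(name, other):
        if f"--with-{name}" in given:
            return True
        if f"--without-{name}" in given:
            return False
        # Not mentioned: included only if the other element was excluded.
        return f"--without-{other}" in given

    return decide("text", "anno"), decide("anno", "text")


if __name__ == "__main__":
    for args in ([], ["--with-text"], ["--without-text"], ["--without-anno"]):
        print(args, resolve_output_flags(args))
```

       Under these rules a single flag always selects exactly one of the
       two elements; both are produced only when no flag is given or when
       --with-text and --with-anno are given together.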

DJVUXMLPARSER

       Files produced by djvutoxml can then be modified using either a text
       editor or an XML editor. Program djvuxmlparser parses the XML file
       inputxmlfile and modifies the metadata of the DjVu files referenced by
       the OBJECT elements.
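
       The export, edit, and re-import cycle described above can be
       scripted. The Python sketch below only assembles the command lines
       shown in the SYNOPSIS and runs them with subprocess; the helper
       names, the file names in the usage comment, and the edit callback
       are all hypothetical, and both programs are assumed to be on the
       PATH.

```python
import subprocess


def djvutoxml_argv(djvu_path, xml_path=None, page=None, options=()):
    """Build a djvutoxml command line following the SYNOPSIS order."""
    argv = ["djvutoxml", *options]
    if page is not None:
        argv += ["--page", str(page)]
    argv.append(djvu_path)
    if xml_path is not None:
        argv.append(xml_path)
    return argv


def djvuxmlparser_argv(xml_path):
    """Build a djvuxmlparser command line for one XML file."""
    return ["djvuxmlparser", xml_path]


def round_trip(djvu_path, xml_path, edit):
    """Export to XML, let `edit` modify the file, then apply the changes.

    `edit` is a caller-supplied function taking the XML file path.
    """
    subprocess.run(djvutoxml_argv(djvu_path, xml_path), check=True)
    edit(xml_path)
    subprocess.run(djvuxmlparser_argv(xml_path), check=True)


# Example use (hypothetical file names):
#   round_trip("book.djvu", "book.xml", my_edit_function)
```

       Passing the arguments as a list avoids shell quoting problems with
       file names that contain spaces.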

DJVUXML DOCUMENT TYPE DEFINITION

       The document type definition (DTD) file

         /usr/share/djvu/pubtext/DjVuXML-s.dtd

       defines the input and output of the DjVu XML tools.

       The DjVuXML-s DTD is a simplification of the HTML DTD:

         http://www.w3c.org/TR/1998/REC-html40-19980424/sgml/dtd.html

       with a few DjVu-specific attributes added. Each of the specified pages
       of a DjVu document is represented as an OBJECT element within the BODY
       element of the XML file. Each OBJECT element may contain multiple
       PARAM elements to specify attributes such as page name, resolution,
       and gamma factor. Each OBJECT element may also contain one HIDDENTEXT
       element to specify the hidden text (usually generated with an OCR
       engine) within the DjVu page. In addition, each OBJECT element may
       reference a single area MAP element, which contains multiple AREA
       elements to represent all the hyperlink and highlight areas within
       the DjVu document.

   PARAM Elements
       Legal PARAM elements of a DjVu OBJECT include, but are not limited to,
       PAGE for specifying the page name, GAMMA for specifying the gamma
       correction factor (normally 2.2), and DPI for specifying the page
       resolution.
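
       As an illustration of the OBJECT and PARAM layout, the Python
       sketch below collects the PARAM settings of each page with the
       standard xml.etree.ElementTree module. The sample document is
       hypothetical: the DjVuXML root element, the data attribute, and
       the name/value attribute pair on PARAM (borrowed from the HTML 4
       PARAM element) are assumptions and should be checked against the
       DjVuXML-s DTD and real djvutoxml output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real djvutoxml output may differ in detail.
SAMPLE = """\
<DjVuXML>
  <BODY>
    <OBJECT data="file:example.djvu">
      <PARAM name="PAGE" value="p0001.djvu"/>
      <PARAM name="GAMMA" value="2.2"/>
      <PARAM name="DPI" value="300"/>
    </OBJECT>
  </BODY>
</DjVuXML>
"""


def page_params(xml_text):
    """Return one {name: value} dictionary per OBJECT element."""
    root = ET.fromstring(xml_text)
    return [
        {param.get("name"): param.get("value") for param in obj.iter("PARAM")}
        for obj in root.iter("OBJECT")
    ]


if __name__ == "__main__":
    for number, params in enumerate(page_params(SAMPLE), start=1):
        print(number, params)
```

       All values are returned as strings; converting DPI or GAMMA to
       numbers is left to the caller.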

   HIDDENTEXT Elements
       The HIDDENTEXT element consists of nested PAGECOLUMNS, REGION,
       PARAGRAPH, LINE, and WORD elements. The most deeply nested element
       specified should give the bounding coordinates of the element in
       top-down orientation, and its body should contain the text. Most DjVu
       documents use either LINE or WORD as the lowest-level element, but any
       element is legal as the lowest-level element. A space is always added
       between WORD elements and a line feed is always added between LINE
       elements. Since languages such as Japanese do not use spaces between
       words, it is quite common for Asian OCR engines to use WORD elements
       for individual characters instead.
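
       The joining rules above (a space between WORD elements, a line
       feed between LINE elements) can be sketched in Python as follows.
       This is a simplified illustration, not the tools' own algorithm:
       it handles only documents whose lowest-level element is LINE or
       WORD, and the sample fragment is hypothetical and omits the
       bounding coordinates.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; bounding coordinates are left out for brevity.
SAMPLE = """\
<HIDDENTEXT>
  <LINE><WORD>Hello</WORD><WORD>DjVu</WORD></LINE>
  <LINE>plain line</LINE>
</HIDDENTEXT>
"""


def hidden_text(xml_text):
    """Rebuild plain text from LINE and WORD elements."""
    root = ET.fromstring(xml_text)
    lines = []
    for line in root.iter("LINE"):
        words = [word.text or "" for word in line.iter("WORD")]
        if words:
            # WORD is the lowest level: join words with a single space.
            lines.append(" ".join(words))
        else:
            # LINE is the lowest level: its body holds the text.
            lines.append((line.text or "").strip())
    # LINE elements are separated by a line feed.
    return "\n".join(lines)


if __name__ == "__main__":
    print(hidden_text(SAMPLE))
```

       For documents that use one WORD per character, this joining rule
       inserts a space between every character, which is why such output
       usually needs post-processing.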

   MAP Elements
       The body of a MAP element consists of AREA elements. In addition to
       the attributes listed in

         http://www.w3.org/TR/1998/REC-html40-19980424/struct/objects.html#edef-AREA,

       the attributes bordertype, bordercolor, border, and highlight have been
       added to specify border type, border color, border width, and highlight
       color, respectively. Legal values for each of these attributes are
       listed in the DjVuXML-s DTD. In addition, the shape oval has been
       added to the list of legal shapes. An oval uses a rectangular bounding
       box.
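
       To make the oval shape concrete, the Python sketch below tests
       whether a point lies inside an oval given by its bounding box. It
       assumes that the oval is the ellipse inscribed in the box and that
       the box is given as left, top, right, bottom, as for the HTML 4
       rect shape; neither assumption is stated in this manual page, and
       the function name is hypothetical.

```python
def point_in_oval(x, y, left, top, right, bottom):
    """Return True if (x, y) lies inside the ellipse inscribed in the box."""
    radius_x = (right - left) / 2
    radius_y = (bottom - top) / 2
    if radius_x <= 0 or radius_y <= 0:
        # Degenerate or inverted box: treat as containing nothing.
        return False
    center_x = left + radius_x
    center_y = top + radius_y
    # Standard ellipse equation, normalized to the unit circle.
    dx = (x - center_x) / radius_x
    dy = (y - center_y) / radius_y
    return dx * dx + dy * dy <= 1


if __name__ == "__main__":
    print(point_in_oval(50, 25, 0, 0, 100, 50))  # center of the box
    print(point_in_oval(1, 1, 0, 0, 100, 50))    # near a corner of the box
```

       Points near the corners of the bounding box fall outside the oval
       even though they are inside the rectangle.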

BUGS

       Perhaps it would have been better to use CSS2 style sheets with standard
       HTML elements instead of defining the HIDDENTEXT element.

CREDITS

       The DjVu XML tools and DTD were written by Bill C. Riemers
       <docbill@sourceforge.net> and Fred Crary.

SEE ALSO

       djvu(1), djvused(1), and utf8(7).

DjVuLibre XML Tools               11/15/2002                        DJVUXML(1)