DJVUXML(1)                  DjVuLibre XML Tools                  DJVUXML(1)

djvutoxml, djvuxmlparser - DjVuLibre XML Tools.

djvutoxml [options] inputdjvufile [outputxmlfile]
djvuxmlparser [inputxmlfile]

The DjVuLibre XML Tools provide for editing the metadata, hyperlinks,
and hidden text associated with DjVu files. Unlike djvused(1), the
DjVuLibre XML Tools rely on XML and can take advantage of XML editors
and verifiers.

Program djvutoxml creates an XML file outputxmlfile containing a
reference to the original DjVu document inputdjvufile as well as tags
describing the metadata, hyperlinks, and hidden text associated with
the DjVu file.

The following options are supported:

--page pagenum
    Select a page in a multi-page document. Without this option,
    djvutoxml outputs the XML corresponding to all pages of the
    document.

--with-text
    Specifies that the HIDDENTEXT element for each page should be
    included in the output. If specified without the --with-anno
    flag, then --without-anno is implied. If none of the
    --with-text, --without-text, --with-anno, or --without-anno
    flags is specified, then --with-text and --with-anno are
    implied.

--without-text
    Specifies that the HIDDENTEXT element for each page should not
    be included in the output. If specified without the
    --without-anno flag, then --with-anno is implied.

--with-anno
    Specifies that the area MAP element for each page should be
    included in the output. If specified without the --with-text
    flag, then --without-text is implied.

--without-anno
    Specifies that the area MAP element for each page should not be
    included in the output. If specified without the --without-text
    flag, then --with-text is implied.
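
The implication rules above can be modelled in a few lines. The sketch
below is only a reading of the option descriptions, not the djvutoxml
source. The function names are hypothetical, and rejecting contradictory
flags is an assumption, since the text does not say how such
combinations are handled.

```python
def _explicit(flags, name):
    """Return True/False if --with-NAME/--without-NAME was given, else None."""
    if f"--with-{name}" in flags:
        return True
    if f"--without-{name}" in flags:
        return False
    return None


def resolve_output(flags):
    """Return (include_text, include_anno) for a list of djvutoxml flags."""
    flags = set(flags)
    # Assumption: contradictory pairs are rejected; the man page is silent here.
    for name in ("text", "anno"):
        if {f"--with-{name}", f"--without-{name}"} <= flags:
            raise ValueError(f"contradictory {name} flags")

    text = _explicit(flags, "text")
    anno = _explicit(flags, "anno")
    if text is None and anno is None:
        # No selection flags at all: both elements are output.
        return True, True
    if text is None:
        # Only an anno flag was given: the opposite text flag is implied.
        text = not anno
    if anno is None:
        # Only a text flag was given: the opposite anno flag is implied.
        anno = not text
    return text, anno
```

Under this reading, giving only --with-text yields the hidden text
without the area map, and giving only --without-text yields the area map
without the hidden text.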

Files produced by djvutoxml can then be modified using either a text
editor or an XML editor. Program djvuxmlparser parses the XML file
inputxmlfile and modifies the metadata of the DjVu files referenced by
the OBJECT elements.
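
The export, edit, and re-import cycle can be scripted. The following is
a minimal sketch that only assembles the two command lines from the
synopsis; the helper names and the file names in the usage note are
hypothetical.

```python
def djvutoxml_command(djvu_file, xml_file=None, page=None, extra_flags=()):
    """Build the argument list for: djvutoxml [options] inputdjvufile [outputxmlfile]."""
    cmd = ["djvutoxml"]
    if page is not None:
        # --page selects a single page; without it all pages are exported.
        cmd += ["--page", str(page)]
    cmd += list(extra_flags)
    cmd.append(djvu_file)
    if xml_file is not None:
        cmd.append(xml_file)
    return cmd


def djvuxmlparser_command(xml_file):
    """Build the argument list for: djvuxmlparser [inputxmlfile]."""
    return ["djvuxmlparser", xml_file]
```

On a system with DjVuLibre installed, each list could be passed to
subprocess.run(cmd, check=True), for example
djvutoxml_command("book.djvu", "book.xml", page=3), followed, after
editing book.xml, by djvuxmlparser_command("book.xml").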

The document type definition file (DTD)

    /usr/share/djvu/pubtext/DjVuXML-s.dtd

defines the input and output of the DjVu XML tools.

The DjVuXML-s DTD is a simplification of the HTML DTD:

    http://www.w3c.org/TR/1998/REC-html40-19980424/sgml/dtd.html

with a few new attributes added specific to DjVu. Each of the
specified pages of a DjVu document is represented as an OBJECT element
within the BODY element of the XML file. Each OBJECT element may
contain multiple PARAM elements to specify attributes like page name,
resolution, and gamma factor. Each OBJECT element may also contain one
HIDDENTEXT element to specify the hidden text (usually generated with
an OCR engine) within the DjVu page. In addition, each OBJECT element
may reference a single area MAP element, which contains multiple AREA
elements to represent all the hyperlink and highlight areas within the
DjVu document.
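
The structure described above can be walked with any XML parser. The
sketch below uses Python's xml.etree.ElementTree on a hand-written
sample. The root element name, the usemap and name attributes (borrowed
from HTML), and all attribute values are assumptions made for
illustration; the DTD remains the authority on what is legal.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real djvutoxml output carries more attributes.
SAMPLE_DOC = """\
<DjVuXML>
  <BODY>
    <OBJECT data="file://localhost/tmp/book.djvu" usemap="p1">
      <PARAM name="PAGE" value="p1.djvu" />
      <PARAM name="DPI" value="300" />
      <HIDDENTEXT />
    </OBJECT>
    <MAP name="p1">
      <AREA shape="rect" coords="10,10,100,40" href="http://example.org/" />
    </MAP>
  </BODY>
</DjVuXML>
"""


def summarize(xml_text):
    """Return one summary dict per OBJECT element found in BODY."""
    body = ET.fromstring(xml_text).find("BODY")
    # Index MAP elements by name so each OBJECT can look up its area map.
    maps = {m.get("name"): m for m in body.iter("MAP")}
    pages = []
    for obj in body.iter("OBJECT"):
        # Assumption: usemap may or may not carry a leading '#', as in HTML.
        usemap = (obj.get("usemap") or "").lstrip("#")
        area_map = maps.get(usemap)
        pages.append({
            "params": len(obj.findall("PARAM")),
            "has_hidden_text": obj.find("HIDDENTEXT") is not None,
            "areas": len(area_map.findall("AREA")) if area_map is not None else 0,
        })
    return pages
```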

PARAM Elements
Legal PARAM elements of a DjVu OBJECT include, but are not limited to,
PAGE for specifying the page name, GAMMA for specifying the gamma
correction factor (normally 2.2), and DPI for specifying the page
resolution.
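
As an illustration, the three parameters named above could be read into
typed values as follows. This is a sketch that assumes PARAM carries
HTML-style name and value attributes; the sample values are invented,
and falling back to a gamma of 2.2 when GAMMA is absent is a choice made
here, not documented behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical OBJECT element with the three PARAMs named in the text.
SAMPLE_OBJECT = """\
<OBJECT data="file://localhost/tmp/book.djvu">
  <PARAM name="PAGE" value="p0001.djvu" />
  <PARAM name="GAMMA" value="2.2" />
  <PARAM name="DPI" value="300" />
</OBJECT>
"""


def page_params(object_xml):
    """Collect PAGE, GAMMA and DPI from the PARAM children of one OBJECT."""
    obj = ET.fromstring(object_xml)
    params = {p.get("name"): p.get("value") for p in obj.findall("PARAM")}
    return {
        "page": params.get("PAGE"),
        # Assumption: use the customary 2.2 when no GAMMA param is present.
        "gamma": float(params.get("GAMMA", "2.2")),
        "dpi": int(params["DPI"]) if "DPI" in params else None,
    }
```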

HIDDENTEXT Elements
The HIDDENTEXT element consists of nested PAGECOLUMN, REGION,
PARAGRAPH, LINE, and WORD elements. The most deeply nested element
specified should give its bounding coordinates in top-down
orientation, and its body should contain the text. Most DjVu documents
use either LINE or WORD as the lowest-level element, but any element is
legal as the lowest-level element. A white space is always added
between WORD elements and a line feed is always added between LINE
elements. Since languages such as Japanese do not use spaces between
words, it is quite common for Asian OCR engines to use WORD elements
for individual characters instead.
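
The joining rules (a space between WORD elements, a line feed between
LINE elements) can be sketched as below. The sample markup, including
the coords attribute and its values, is hypothetical, and the sketch
only handles LINE or WORD as the lowest-level element, which the text
says is the common case.

```python
import xml.etree.ElementTree as ET

# Hypothetical hidden text: one line split into words, one line left whole.
SAMPLE_TEXT = """\
<HIDDENTEXT>
  <PAGECOLUMN><REGION><PARAGRAPH>
    <LINE>
      <WORD coords="100,200,180,170">Hello</WORD>
      <WORD coords="190,200,260,170">DjVu</WORD>
    </LINE>
    <LINE coords="100,160,300,130">second line</LINE>
  </PARAGRAPH></REGION></PAGECOLUMN>
</HIDDENTEXT>
"""


def hidden_text(hidden_xml):
    """Rebuild plain text: spaces between WORDs, line feeds between LINEs."""
    root = ET.fromstring(hidden_xml)
    lines = []
    for line in root.iter("LINE"):
        words = [w.text or "" for w in line.iter("WORD")]
        if words:
            lines.append(" ".join(words))
        else:
            # LINE is the lowest-level element here, so its body is the text.
            lines.append((line.text or "").strip())
    return "\n".join(lines)
```

For documents where WORD holds single characters, the same joining rule
would insert a space between every character, which is worth keeping in
mind when post-processing such text.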

MAP Elements
The body of a MAP element consists of AREA elements. In addition to
the attributes listed in

    http://www.w3.org/TR/1998/REC-html40-19980424/struct/objects.html#edef-AREA,

the attributes bordertype, bordercolor, border, and highlight have been
added to specify border type, border color, border width, and highlight
color, respectively. Legal values for each of these attributes are
listed in the DjVuXML-s DTD. In addition, the shape oval has been
added to the legal list of shapes. An oval uses a rectangular bounding
box.
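
To make the oval shape concrete, the sketch below tests whether a point
falls inside the ellipse inscribed in an oval's bounding box. It assumes
the coords attribute holds "x1,y1,x2,y2" as for an HTML rect; that
format and the function names are assumptions, so check the DTD before
relying on them.

```python
def parse_box(coords):
    """Parse 'x1,y1,x2,y2' into (left, top, right, bottom), normalising order."""
    x1, y1, x2, y2 = (float(v) for v in coords.split(","))
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


def point_in_oval(coords, x, y):
    """Return True if (x, y) lies in the ellipse inscribed in the bounding box."""
    left, top, right, bottom = parse_box(coords)
    rx = (right - left) / 2.0
    ry = (bottom - top) / 2.0
    if rx == 0 or ry == 0:
        # A degenerate box encloses no area.
        return False
    cx = left + rx
    cy = top + ry
    # Standard ellipse equation, with the boundary counted as inside.
    return ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0
```

With coords "0,0,100,50", the centre (50, 25) is inside the oval while
the point (1, 1), although inside the bounding box, is not.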

Perhaps it would have been better to use CSS2 style sheets with
standard HTML elements instead of defining the HIDDENTEXT element.

The DjVu XML tools and DTD were written by Bill C. Riemers
<docbill@sourceforge.net> and Fred Crary.

djvu(1), djvused(1), and utf8(7).

DjVuLibre XML Tools              11/15/2002              DJVUXML(1)