DJVUXML(1)                  DjVuLibre XML Tools                  DJVUXML(1)

NAME
djvutoxml, djvuxmlparser - DjVuLibre XML Tools.

SYNOPSIS
djvutoxml [options] inputdjvufile [outputxmlfile]
djvuxmlparser [ -o djvufile ] inputxmlfile

DESCRIPTION
The DjVuLibre XML Tools provide for editing the metadata, hyperlinks,
and hidden text associated with DjVu files. Unlike djvused(1), the
DjVuLibre XML Tools rely on XML technology and can take advantage of
XML editors and verifiers.

DJVUTOXML
Program djvutoxml creates an XML file outputxmlfile containing a
reference to the original DjVu document inputdjvufile as well as tags
describing the metadata, hyperlinks, and hidden text associated with
the DjVu file.

The following options are supported:

--page pagenum
    Select a page in a multi-page document. Without this option,
    djvutoxml outputs the XML corresponding to all pages of the
    document.

--with-text
    Specifies that the HIDDENTEXT element for each page should be
    included in the output. If specified without the --with-anno
    flag, then the --without-anno flag is implied. If none of the
    --with-text, --without-text, --with-anno, or --without-anno
    flags is specified, then the --with-text and --with-anno flags
    are implied.

--without-text
    Specifies that the HIDDENTEXT element for each page should not
    be included in the output. If specified without the
    --without-anno flag, then the --with-anno flag is implied.

--with-anno
    Specifies that the area MAP element for each page should be
    included in the output. If specified without the --with-text
    flag, then the --without-text flag is implied.

--without-anno
    Specifies that the area MAP element for each page should not be
    included in the output. If specified without the --without-text
    flag, then the --with-text flag is implied.

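The implication rules for the four flags above can be summarised in a few lines of code. The following Python sketch is illustrative only: resolve_output is a hypothetical helper and not part of DjVuLibre, and raising an error for contradictory flags is an assumption, since that case is not described here.

```python
def resolve_output(flags):
    """Return (include_text, include_anno) for a list of djvutoxml flags."""
    flags = set(flags)
    for kind in ("text", "anno"):
        if f"--with-{kind}" in flags and f"--without-{kind}" in flags:
            # Assumption: this case is not defined by the manual page.
            raise ValueError(f"contradictory flags for {kind}")

    def explicit(kind):
        # True / False if the user said so, None if neither flag was given.
        if f"--with-{kind}" in flags:
            return True
        if f"--without-{kind}" in flags:
            return False
        return None

    text, anno = explicit("text"), explicit("anno")
    if text is None and anno is None:
        return True, True      # no flag given: both elements are output
    if text is None:
        text = not anno        # --with-anno implies --without-text, and vice versa
    if anno is None:
        anno = not text        # --with-text implies --without-anno, and vice versa
    return text, anno
```

Under these rules, resolve_output(["--with-text"]) yields (True, False): the hidden text is written and the area MAP is omitted.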

DJVUXMLPARSER
Files produced by djvutoxml can then be modified using either a text
editor or an XML editor. Program djvuxmlparser parses the XML file
inputxmlfile in order to modify the metadata of the corresponding DjVu
file.

-o djvufile
    In principle, the target DjVu file is the file referenced by the
    OBJECT element of the XML file. This option provides the means
    to override the filename specified in the OBJECT element.

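A typical workflow exports the XML, edits it, and feeds it back. The Python sketch below only assembles the command lines from the synopsis; the helper names and the file names used with them are hypothetical, and round_trip assumes both tools are installed and on the PATH.

```python
import subprocess

def djvutoxml_argv(djvu_file, xml_file=None, page=None, extra_flags=()):
    """Build: djvutoxml [options] inputdjvufile [outputxmlfile]"""
    argv = ["djvutoxml"]
    if page is not None:
        argv += ["--page", str(page)]
    argv += list(extra_flags)          # e.g. ["--with-text"]
    argv.append(djvu_file)
    if xml_file is not None:
        argv.append(xml_file)
    return argv

def djvuxmlparser_argv(xml_file, target_djvu=None):
    """Build: djvuxmlparser [ -o djvufile ] inputxmlfile"""
    argv = ["djvuxmlparser"]
    if target_djvu is not None:
        argv += ["-o", target_djvu]    # override the file named in OBJECT
    argv.append(xml_file)
    return argv

def round_trip(djvu_file, xml_file, edit):
    """Export to XML, let the caller's edit(xml_file) modify it, re-import."""
    subprocess.run(djvutoxml_argv(djvu_file, xml_file), check=True)
    edit(xml_file)
    subprocess.run(djvuxmlparser_argv(xml_file, djvu_file), check=True)
```

Passing the target through -o in round_trip is a design choice for the sketch: it keeps the result independent of whatever path the OBJECT element happens to reference.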

DJVUXML DTD
The document type definition file (DTD)

    /usr/share/djvu/pubtext/DjVuXML-s.dtd

defines the input and output of the DjVu XML tools.

The DjVuXML-s DTD is a simplification of the HTML DTD:

    http://www.w3c.org/TR/1998/REC-html40-19980424/sgml/dtd.html

with a few new attributes added specific to DjVu. Each of the
specified pages of a DjVu document is represented as an OBJECT element
within the BODY element of the XML file. Each OBJECT element may
contain multiple PARAM elements to specify attributes like page name,
resolution, and gamma factor. Each OBJECT element may also contain one
HIDDENTEXT element to specify the hidden text (usually generated with
an OCR engine) within the DjVu page. In addition, each OBJECT element
may reference a single area MAP element which contains multiple AREA
elements to represent all the hyperlink and highlight areas within the
DjVu document.

PARAM Elements
Legal PARAM elements of a DjVu OBJECT include, but are not limited to,
PAGE for specifying the page name, GAMMA for specifying the gamma
correction factor (normally 2.2), and DPI for specifying the page
resolution.

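As an illustration, the PARAM elements of one OBJECT can be collected into a dictionary with Python's standard XML parser. The fragment below is a hand-written sample, not actual djvutoxml output: the name/value attribute pair follows the HTML 4 PARAM element and is assumed to carry over to the DjVuXML-s DTD, and the file name, page name, and dimensions are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical OBJECT fragment; attribute values are placeholders.
OBJECT_SAMPLE = """
<OBJECT data="file://localhost/tmp/book.djvu" type="image/x.djvu"
        width="2550" height="3300">
  <PARAM name="PAGE" value="p0001.djvu" />
  <PARAM name="GAMMA" value="2.2" />
  <PARAM name="DPI" value="300" />
</OBJECT>
"""

def object_params(object_element):
    """Map each PARAM name to its value for one OBJECT element."""
    return {p.get("name"): p.get("value") for p in object_element.iter("PARAM")}

params = object_params(ET.fromstring(OBJECT_SAMPLE.strip()))
```

All values come back as strings, so a caller that needs the resolution as a number would convert it explicitly, for example with int(params["DPI"]).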

HIDDENTEXT Elements
A HIDDENTEXT element consists of nested PAGECOLUMNS, REGION,
PARAGRAPH, LINE, and WORD elements. The most deeply nested element
specified should give its bounding coordinates in top-down
orientation. The body of the most deeply nested element should contain
the text. Most DjVu documents use either LINE or WORD as the lowest
level element, but any element is legal as the lowest level element. A
space is always added between WORD elements and a line feed is always
added between LINE elements. Since languages such as Japanese do not
use spaces between words, it is quite common for Asian OCR engines to
use WORD elements for individual characters instead.

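The spacing rules above (a space between WORD elements, a line feed between LINE elements) can be sketched as follows. The sample is a simplified, hand-written fragment that has not been validated against the DTD; the coords attribute name and its four comma-separated values are assumptions, and hidden_text is a hypothetical helper that only handles the common case of WORD elements nested inside LINE elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; coordinates are placeholders.
HIDDENTEXT_SAMPLE = """
<HIDDENTEXT>
  <PARAGRAPH>
    <LINE>
      <WORD coords="100,200,180,170">Hello</WORD>
      <WORD coords="190,200,260,170">DjVu</WORD>
    </LINE>
    <LINE>
      <WORD coords="100,250,200,220">world</WORD>
    </LINE>
  </PARAGRAPH>
</HIDDENTEXT>
"""

def hidden_text(hiddentext):
    """Join WORD bodies with spaces and LINE elements with line feeds."""
    lines = []
    for line in hiddentext.iter("LINE"):
        words = [(w.text or "").strip() for w in line.iter("WORD")]
        lines.append(" ".join(words))
    return "\n".join(lines)

text = hidden_text(ET.fromstring(HIDDENTEXT_SAMPLE.strip()))
```

A document whose lowest level element is LINE rather than WORD would need the text taken from the LINE bodies instead; the sketch does not cover that case.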

MAP Elements
The body of a MAP element consists of AREA elements. In addition to
the attributes listed in

    http://www.w3.org/TR/1998/REC-html40-19980424/struct/objects.html#edef-AREA,

the attributes bordertype, bordercolor, border, and highlight have been
added to specify border type, border color, border width, and highlight
color respectively. Legal values for each of these attributes are
listed in the DjVuXML-s DTD. In addition, the shape oval has been
added to the legal list of shapes. An oval uses a rectangular bounding
box.

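To make the extra attributes concrete, the Python sketch below reads the AREA elements of a MAP and separates the DjVu-specific attributes from the standard HTML ones. The sample is hand-written: the attribute values shown (solid, #FF0000, and so on) are placeholders rather than values confirmed against the DTD, and describe_areas is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical MAP fragment; consult DjVuXML-s.dtd for the legal values.
MAP_SAMPLE = """
<MAP name="page1">
  <AREA shape="oval" coords="100,100,300,200" href="http://www.example.com/"
        bordertype="solid" bordercolor="#FF0000" border="2"
        highlight="#FFFF00" />
  <AREA shape="rect" coords="50,400,250,450" href="#p0002.djvu" />
</MAP>
"""

# Attributes added on top of the HTML 4 AREA element.
DJVU_ATTRS = ("bordertype", "bordercolor", "border", "highlight")

def describe_areas(map_element):
    """Return one dict per AREA with its shape, bounding box, and DjVu extras."""
    areas = []
    for area in map_element.iter("AREA"):
        info = {
            "shape": area.get("shape"),
            # An oval, like a rect, is given by a rectangular bounding box.
            "bbox": tuple(int(v) for v in area.get("coords").split(",")),
        }
        info.update({k: area.get(k) for k in DJVU_ATTRS if area.get(k) is not None})
        areas.append(info)
    return areas

areas = describe_areas(ET.fromstring(MAP_SAMPLE.strip()))
```

Here the first entry carries all four DjVu-specific keys, while the second contains only shape and bbox because the sample AREA sets none of them.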

BUGS
Perhaps it would have been better to use CSS2 style sheets with
standard HTML elements instead of defining the HIDDENTEXT element.

CREDITS
The DjVu XML tools and DTD were written by Bill C. Riemers
<docbill@sourceforge.net> and Fred Crary.

SEE ALSO
djvu(1), djvused(1), and utf8(7).

DjVuLibre XML Tools               11/15/2002                    DJVUXML(1)