CSEPDJVU(1)                       DjVuLibre-3.5                      CSEPDJVU(1)

NAME
csepdjvu - DjVu encoder for separated data files.

SYNOPSIS
csepdjvu [options] [sepfiles]... outputdjvufile

DESCRIPTION
This program creates a DjVuDocument file outputdjvufile from separated
data files sepfiles. It can read separated data from the standard
input when given a single dash instead of the separated data file
names. This feature is intended for pre-processing programs that push
separated data into csepdjvu via a pipe.

Each separated data file represents one or more page images. When the
program arguments specify multiple pages, all the pages are encoded and
saved as a bundled multi-page document. When the program arguments
specify a single page, the page is encoded and saved as a single-page
file.

OPTIONS
-d n   Specify the resolution information encoded into the output file,
       expressed in dots per inch. The resolution information encoded
       in DjVu files determines how the decoder scales the image on a
       particular display. Meaningful resolutions range from 25 to
       6000. The default value is 300 dpi.

-q n,...,n
-q n+...+n
       Specify the encoding quality of the IW44 encoded background
       layer. The option argument contains several integers (one per
       chunk) separated by either commas or pluses. This option is
       similar to option -slice of program c44. Please refer to the
       c44(1) man page for additional details. The default quality
       specification is -q 72,83,93,103.

       This option does not apply to uniformly white backgrounds that
       were not specified by the separated data but are called for by
       the DjVu specification. Such background images always come at
       the lowest possible resolution and with a standard quality
       setting that ensures color uniformity.

-t     Program csepdjvu interprets certain comments in the separated
       file to construct a hidden text layer in the DjVu file. By
       default, this layer records the location of each word for
       highlighting purposes. This option reduces the file size by
       recording only the location of each line.

-v     Display a brief message describing each page.

-vv    Display extensive informational messages during encoding.
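
The options above can be combined with the single-dash convention from
the description when csepdjvu is driven by another program. The sketch
below is only an illustration: the helper names build_command and
encode_from_pipe are hypothetical, and it assumes a csepdjvu executable
is available on the search path.

```python
import subprocess


def build_command(output, sepfiles=("-",), dpi=300, verbose=False,
                  exe="csepdjvu"):
    """Return the argument list for one csepdjvu run (hypothetical helper)."""
    if not 25 <= dpi <= 6000:
        raise ValueError("meaningful resolutions range from 25 to 6000 dpi")
    cmd = [exe, "-d", str(dpi)]
    if verbose:
        cmd.append("-v")
    cmd.extend(sepfiles)   # a single "-" means: read separated data from stdin
    cmd.append(output)     # the output DjVu file always comes last
    return cmd


def encode_from_pipe(sep_data, output, **kwargs):
    """Push separated data (bytes) into csepdjvu through its standard input."""
    subprocess.run(build_command(output, **kwargs), input=sep_data, check=True)
```

Keeping the command construction separate from the call to
subprocess.run makes the argument order easy to inspect without
running the encoder.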

SEPARATED DATA FILES
Each separated data file contains a concatenation of one or more
separated page images. Each page is logically represented by a foreground
image with a transparent color and by a background image visible
through the transparent pixels. The data for each separated page image
is the concatenation of the following data blocks:

*  A foreground image encoded using either the "Color RLE format" or
   the "Bitonal RLE format". These formats are described later in this
   section.

*  An optional background image encoded as a "Portable Pixmap" (PPM).
   This well-known format is summarized later in this section. The
   absence of a background image simply indicates that a uniformly
   white background should be assumed.

*  An arbitrary number of comment lines starting with character "#" and
   terminated by a linefeed character. Comment lines whose first word
   starts with a capital letter have special meanings documented later
   in this document.

The dimensions (width and height) of the background image must be
obtained by rounding up the quotient of the foreground image dimensions
by an integer reduction factor ranging from 1 to 12. Assume, for
instance, that the width of the foreground is 2507 and the reduction
factor is 3. The width of the background image will be the integer
quotient (2507+2)/3, that is, 836.
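
The rounding rule above can be sketched as follows. The function names
are hypothetical, and the foreground height of 3300 used in the usage
note is an invented value for illustration only.

```python
def background_size(fg_width, fg_height, reduction):
    """Background dimensions for a given integer reduction factor (1 to 12)."""
    if not 1 <= reduction <= 12:
        raise ValueError("reduction factor must range from 1 to 12")

    def round_up(n):
        # Integer form of ceil(n / reduction), e.g. (2507 + 2) // 3
        return (n + reduction - 1) // reduction

    return round_up(fg_width), round_up(fg_height)


def find_reduction(fg_size, bg_size):
    """Return the smallest factor that relates the two sizes, or None."""
    for reduction in range(1, 13):
        if background_size(fg_size[0], fg_size[1], reduction) == tuple(bg_size):
            return reduction
    return None
```

For example, background_size(2507, 3300, 3) yields (836, 1100), and
find_reduction returns None when no factor from 1 to 12 matches, which
is a convenient check before writing a page.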

Color RLE format
The Color RLE format is a simple run-length encoding scheme for color
images with a limited number of distinct colors. The data always begin
with a text header composed of the two characters "R6", the number of
columns, the number of rows, and the number of color palette entries.
All numbers are expressed in decimal ASCII. These four items are
separated by blank characters (space, tab, carriage return, or linefeed) or
by comment lines introduced by character "#". The last number is
followed by exactly one character, which usually is a linefeed character.

The header is followed by the color palette containing three bytes per
color entry. The bytes represent the red, green, and blue components
of the color.

The palette is followed by a collection of four-byte integers (most
significant byte first) representing runs of pixels with an identical
color. The twelve upper bits of each integer indicate the index of the
run color in the palette. The twenty lower bits of the integer
indicate the run length. Color indices greater than 0xff0 are
reserved. Color index 0xfff is used for transparent runs. Each row is
represented by a sequence of runs whose lengths add up to the image
width. Rows are encoded starting with the top row and progressing
toward the bottom row.
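
As an illustration of the layout described above, here is a minimal
encoder sketch. The function name and the representation of rows as
lists of (palette index, run length) pairs are assumptions made for the
example, not part of csepdjvu.

```python
import struct

TRANSPARENT = 0xFFF  # color index used for transparent runs


def encode_color_rle(width, height, palette, rows):
    """Encode a page as Color RLE data.

    palette: list of (red, green, blue) byte triples.
    rows: one list of (color_index, run_length) pairs per row, top row first.
    """
    if len(palette) > 0xFF1:
        raise ValueError("color indices greater than 0xff0 are reserved")
    if len(rows) != height:
        raise ValueError("expected one list of runs per row")
    # Text header: "R6", columns, rows, palette size, then one linefeed.
    out = bytearray(b"R6 %d %d %d\n" % (width, height, len(palette)))
    for red, green, blue in palette:
        out += bytes((red, green, blue))
    for runs in rows:
        if sum(length for _, length in runs) != width:
            raise ValueError("run lengths must add up to the image width")
        for index, length in runs:
            if index != TRANSPARENT and not 0 <= index < len(palette):
                raise ValueError("color index outside the palette")
            if not 0 <= length < (1 << 20):
                raise ValueError("run length must fit in twenty bits")
            # Twelve upper bits: color index. Twenty lower bits: run length.
            out += struct.pack(">I", (index << 20) | length)
    return bytes(out)
```

A run longer than the twenty-bit limit would have to be split by the
caller into several consecutive runs of the same color.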

Bitonal RLE format
The Bitonal RLE format is a simple run-length encoding scheme for
bitonal images. The data always begin with a text header composed of
the two characters "R4", the number of columns, and the number of rows.
All numbers are expressed in decimal ASCII. These three items are
separated by blank characters (space, tab, carriage return, or linefeed)
or by comment lines introduced by character "#". The last number is
followed by exactly one character, which usually is a linefeed
character.

The rest of the file encodes a sequence of numbers representing the
lengths of alternating runs of transparent (white) and black pixels.
Lines are encoded starting with the top line and progressing toward the
bottom line. Each line starts with a transparent run. The decoder knows
that a line is finished when the sum of the run lengths for that line is
equal to the number of columns in the image. Numbers in range 0 to 191
are represented by a single byte in range 0x00 to 0xbf. Numbers in range
192 to 16383 are represented by a two-byte sequence: the first byte, in
range 0xc0 to 0xff, encodes the six most significant bits of the
number, and the second byte encodes the remaining eight bits of the
number. This scheme allows for runs of length zero, which are useful
when a line starts with a black pixel, and when a very long run (whose
length exceeds 16383) must be split into smaller runs.
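
The run-length coding above can be sketched as a small encoder and
decoder. The function names are hypothetical; rows are assumed to be
given as lists of run lengths that start with the transparent run.

```python
def encode_run(n):
    """Encode one run length, splitting runs longer than 16383."""
    out = bytearray()
    while n > 16383:
        out += b"\xff\xff"  # a run of 16383 pixels
        out += b"\x00"      # zero-length run of the other color
        n -= 16383
    if n < 192:
        out.append(n)                 # single byte, 0x00 to 0xbf
    else:
        out.append(0xC0 + (n >> 8))   # six most significant bits
        out.append(n & 0xFF)          # remaining eight bits
    return bytes(out)


def encode_bitonal_rle(width, height, rows):
    """Encode rows of alternating transparent/black run lengths."""
    if len(rows) != height:
        raise ValueError("expected one list of runs per line")
    out = bytearray(b"R4 %d %d\n" % (width, height))
    for runs in rows:
        if sum(runs) != width:
            raise ValueError("run lengths must add up to the image width")
        for n in runs:
            out += encode_run(n)
    return bytes(out)


def decode_rows(payload, width, height):
    """Decode the bytes that follow the header back into run lengths."""
    rows, pos = [], 0
    for _ in range(height):
        runs, total = [], 0
        while total < width:  # a line ends when its runs fill the width
            n = payload[pos]
            pos += 1
            if n >= 0xC0:
                n = ((n - 0xC0) << 8) | payload[pos]
                pos += 1
            runs.append(n)
            total += n
        rows.append(runs)
    return rows
```

A line that starts with a black pixel is written with a leading run of
length zero, for example [0, 5] for a fully black line of width 5.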

Portable Pixmap (PPM) format
The Portable Pixmap format is a well-known format for representing
color images. Check the ppm(5) man page for complete information.

The data always begin with a text header composed of the two characters
"P6", the number of columns, the number of rows, and the maximal value
of a color component (usually 255). All numbers are expressed in
decimal ASCII. These four items are separated by blank characters (space,
tab, carriage return, or linefeed) or by comment lines introduced by
character "#". The last number is followed by exactly one character,
which usually is a linefeed character.

The rest of the file encodes all the pixels. Each pixel is represented
by three bytes representing the red, green, and blue components of the
pixel. Pixels are ordered left to right, top to bottom.
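
A background image in this format can be produced with a few lines of
code. The sketch below uses a hypothetical function name and assumes a
maximal component value of at most 255, so that each component fits in
one byte.

```python
def encode_ppm(width, height, pixels, maxval=255):
    """Encode pixels as raw PPM ("P6") data.

    pixels: (red, green, blue) triples ordered left to right, top to bottom.
    """
    pixels = list(pixels)
    if len(pixels) != width * height:
        raise ValueError("expected width * height pixels")
    if not 0 < maxval <= 255:
        raise ValueError("this sketch only handles one byte per component")
    # Text header: "P6", columns, rows, maximal value, then one linefeed.
    out = bytearray(b"P6 %d %d %d\n" % (width, height, maxval))
    for red, green, blue in pixels:
        out += bytes((red, green, blue))
    return bytes(out)
```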

Comments in separated files
Each page is followed by an arbitrary number of comment lines starting
with character "#" and terminated by a linefeed character. Comment
lines whose first word starts with a capital letter have special
meanings. The following constructs are currently defined:

*  # T px:py dx:dy wxh+x+y (string)
   This construct indicates that the piece of text string must be
   associated with an area of size wxh at position x,y relative to the
   lower left corner of the page. The string is UTF-8 encoded. Special
   characters can be escaped as in PostScript using the backslash
   character. Integers px and py represent the position of the current
   point on the text baseline before the text was drawn. The drawing
   operation then moves the current point by dx and dy pixels. When
   such comments are present, csepdjvu produces a hidden text layer for
   the corresponding pages.

*  # L wxh+x+y (url)
   This construct indicates that a hyperlink to URL url should be
   associated with the area of size wxh at position x,y. When such
   comments are present, csepdjvu produces pages with an annotation chunk
   containing the specified hyperlinks.

*  # B count (string) (#pageno)
   This construct provides outline information for the document. An
   outline entry entitled string is associated with page pageno.
   Integer count indicates how many of the following outline entries must
   be attached to the current entry as subentries. When such comments
   are present in the first page, csepdjvu produces a navigation chunk
   with the specified outline.
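
The three constructs above can be generated with small formatting
helpers. This is a sketch under stated assumptions: the function names
are hypothetical, coordinates are assumed to be non-negative integers,
and the escaping covers only parentheses, backslashes, and control
characters (written as PostScript octal escapes).

```python
def ps_escape(text):
    """Escape a string for use between parentheses, PostScript style."""
    out = bytearray()
    for byte in text.encode("utf-8"):
        if byte in b"()\\":
            out += b"\\" + bytes((byte,))   # backslash-escape ( ) and \
        elif byte < 0x20:
            out += b"\\%03o" % byte         # octal escape, keeps one line
        else:
            out.append(byte)                # UTF-8 bytes are kept as-is
    return bytes(out)


def text_comment(px, py, dx, dy, w, h, x, y, text):
    """Build a "# T" comment describing one piece of text."""
    return b"# T %d:%d %d:%d %dx%d+%d+%d (%s)\n" % (
        px, py, dx, dy, w, h, x, y, ps_escape(text))


def link_comment(w, h, x, y, url):
    """Build a "# L" comment attaching a hyperlink to an area."""
    return b"# L %dx%d+%d+%d (%s)\n" % (w, h, x, y, ps_escape(url))


def outline_comment(count, title, pageno):
    """Build a "# B" comment for one outline entry."""
    return b"# B %d (%s) (#%d)\n" % (count, ps_escape(title), pageno)
```

Escaping control characters matters because each comment is terminated
by a linefeed, so a raw linefeed inside the string would end the
comment early.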

CREDITS
This program was initially written by Léon Bottou
<leonb@users.sourceforge.net> and was improved by Bill Riemers
<docbill@sourceforge.net> and many others.

SEE ALSO
djvu(1), ppm(5), c44(1)

DjVuLibre-3.5                      10/11/2001                      CSEPDJVU(1)