UPMENDEX(1) General Commands Manual UPMENDEX(1)

upmendex - Multilingual index processor

upmendex [-ilqrcgf] [-s sty] [-d dic] [-o ind] [-t log] [-p no] [--] [idx0 idx1 idx2 ...]
upmendex --help

The program upmendex is a general-purpose multilingual hierarchical
index generator that works with upLaTeX, XeLaTeX and LuaLaTeX. It
accepts one or more input files (.idx, typically produced by a text
formatter of the LaTeX family), sorts the entries, and produces an
output file which can be formatted. It supports Latin (including
non-English), Greek, Cyrillic, Korean Hangul and Han (Hanzi ideographs)
scripts, as well as Japanese Kana. It is almost compatible with
makeindex and mendex, and an additional feature for handling readings
of kanji words is also available.
The formats of the input and output files are specified in a style
file. The readings of kanji words can be specified in a dictionary
file.
The index can have up to three levels (0, 1, and 2) of subitem nesting.

-i Take input from stdin, even when index files are specified.

-l Set 'sort by character order'. By default, 'sort by word order' is
used. Details are described below.

-q Quiet mode; send no messages to stderr, except error messages and
warnings.

-r Disable implicit page range formation. By default, three or more
successive pages are automatically abbreviated as a range (e.g. 1–5).

-c Compress each sequence of intermediate blanks (spaces and/or tabs)
into a single space, and ignore leading and trailing blanks. By
default, blanks in the index key are retained.

-g Use only the A-line (A, Ka, Sa, ...; 10 characters) of the gojuon
table (Japanese syllabary) for Japanese index headings. By default,
all 48 characters in the gojuon table are used.

-f Force output of characters even if their scripts are not supported
by upmendex.

-s sty Employ sty as the style file.

-d dic Employ dic as the dictionary file. The dictionary file consists
of a list of <index_word reading> pairs.

-o ind Employ ind as the output index file. By default, the file name
is created by appending the extension ind to the base name of the
first input file.

-t log Employ log as the transcript file. By default, the file name is
created by appending the extension ilg to the base name of the first
input file.

-p no Set the starting page number of the output index list to no. The
argument no may be a number or one of the following: any (the page
following the end of the contents), odd (the next odd page after the
end of the contents), even (the next even page after the end of the
contents).

--help Show a summary of options.

-- Arguments after -- are not taken as options. This is useful when an
input file name starts with '-'.
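
The implicit page range formation that -r disables can be pictured with
the following Python sketch. It is an illustration only and not the
code used by upmendex: the function name format_pages and its
implicit_ranges argument are hypothetical, page numbers are assumed to
be plain integers, and the separators simply mirror the defaults of the
style parameters delim_n (", ") and delim_r ("--").

```python
def format_pages(pages, implicit_ranges=True, delim_n=", ", delim_r="--"):
    """Join page numbers, abbreviating runs of three or more pages."""
    pages = sorted(set(pages))
    parts = []
    i = 0
    while i < len(pages):
        j = i
        # Extend j to the end of the run of successive pages starting at i.
        while j + 1 < len(pages) and pages[j + 1] == pages[j] + 1:
            j += 1
        if implicit_ranges and j - i >= 2:
            # Three or more successive pages: write "first--last".
            parts.append(f"{pages[i]}{delim_r}{pages[j]}")
        else:
            # Shorter runs (or -r behaviour): list every page.
            parts.extend(str(p) for p in pages[i:j + 1])
        i = j + 1
    return delim_n.join(parts)


print(format_pages([1, 2, 3, 4, 5, 8, 9, 12]))
print(format_pages([1, 2, 3, 4, 5, 8, 9, 12], implicit_ranges=False))
```

Calling the function with implicit_ranges=False corresponds to running
with -r: every page is listed individually.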

The style file informs upmendex about the format of the .idx input
files and the intended format of the final output file. The format is
upward compatible with the one used by makeindex and mendex. The style
file contains a list of <specifier attribute> pairs. There are two
types of specifiers: input and output. Pairs do not have to appear in
any particular order. A line beginning with '%' is a comment.

Input file style parameter

keyword <string> "\\indexentry"
Command whose argument is the index entry to be processed.

arg_open <char> '{'
Opening delimiter marking the beginning of an index entry.

arg_close <char> '}'
Closing delimiter marking the end of an index entry.

range_open <char> '('
Opening delimiter marking the beginning of a page range.

range_close <char> ')'
Closing delimiter marking the end of a page range.

level <char> '!'
Delimiter introducing a lower level.

actual <char> '@'
Symbol indicating that the following sequence is to appear as the index
string in the output file.

encap <char> '|'
Symbol indicating that the following sequence is to be used as the name
of a command attached to the page number.

page_compositor <string> "-"
Separator between page levels for a style with multi-level page
numbers.

page_precedence <string> "rnaRA"
Precedence of page number formats. 'R' and 'r' correspond to Roman
numerals, 'n' corresponds to Arabic numerals, and 'A' and 'a'
correspond to the Latin alphabet.

quote <char> '"'
Escape character for upmendex parameters.

escape <char> '\\'
Escape character for general scripts.
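
How the default input parameters divide an .idx line can be sketched in
Python as follows. This is a simplified illustration, not the parser
used by upmendex: the function name parse_index_line and the returned
dictionary layout are hypothetical, the quote and escape characters are
ignored, and a range marker is assumed to follow the encap symbol
directly, as is conventional with makeindex.

```python
import re

# \indexentry{<key>}{<page>} with the default keyword, arg_open and arg_close.
ENTRY = re.compile(r"\\indexentry\{(?P<key>.*)\}\{(?P<page>[^{}]*)\}\s*$")


def parse_index_line(line, level="!", actual="@", encap="|",
                     range_open="(", range_close=")"):
    """Split one .idx line into levels, encapsulating command and page."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    key, _, command = match.group("key").partition(encap)
    range_mark = None
    if command[:1] in (range_open, range_close):
        range_mark, command = command[0], command[1:]
    levels = []
    for part in key.split(level):
        # "sort@shown": text before `actual` is sorted, text after is printed.
        sort_key, sep, shown = part.partition(actual)
        levels.append({"sort": sort_key, "shown": shown if sep else sort_key})
    return {
        "levels": levels,
        "encap": command or None,
        "range": range_mark,
        "page": match.group("page"),
    }


print(parse_index_line(r"\indexentry{ひょう@表!index sort|textbf}{12}"))
```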

Output file style parameter

preamble <string> "\\begin{theindex}\n"
Preamble of the output file.

postamble <string> "\n\n\\end{theindex}\n"
Postamble of the output file.

setpage_prefix <string> "\n \\setcounter{page}{"
Prefix of the page number if a start page is designated.

setpage_suffix <string> "}\n"
Suffix of the page number if a start page is designated.

group_skip <string> "\n\n \\indexspace\n"
String inserted as vertical space before a new section of the index.

lethead_prefix <string> ""
Prefix of the heading for a newly appearing heading letter.

heading_prefix <string> ""
Same as lethead_prefix. (compatible with makeindex)

lethead_suffix <string> ""
Suffix of the heading for a newly appearing heading letter.

heading_suffix <string> ""
Same as lethead_suffix. (compatible with makeindex)

lethead_flag <number> 0
Flag controlling the output of heading letters in Latin, Greek and
Cyrillic scripts. '0', '1', '-1' and '2' denote no output, uppercase,
lowercase and titlecase, respectively.

heading_flag <number> 0
Same as lethead_flag. (Note: makeindex uses a different name,
headings_flag.)

headings_flag <number> 0
Same as lethead_flag. (compatible with makeindex)

kana_head <string> ""
Heading characters of Kana specified by a string. By default, they are
controlled by letter_head and the command line option -g. (Extended by
upmendex)

hangul_head <string> "ㄱㄴㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎ"
Heading characters of Hangul specified by a string. (Extended by
upmendex)

tumunja <string> "ㄱㄴㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎ"
Heading characters of Hangul specified by a string. (Deprecated,
Extended by upmendex)

hanzi_head <string> ""
Heading strings of hanzi (Kanji, Hanja) specified by a string, which is
a concatenation of items separated by ';'. (Extended by upmendex)

devanagari_head <string> "ऄअआइईउऊऋऌऍऎएऐऑऒओऔकखगघङचछजझञटठडढणतथदधनपफबभमयरलळवशषसह"
Heading characters of Devanagari specified by a string. (Experimental,
Extended by upmendex)

thai_head <string> "กขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮ"
Heading characters of Thai script specified by a string.
(Experimental, Extended by upmendex)

item_0 <string> "\n \\item "
Command sequence inserted between primary level entries.

item_1 <string> "\n \\subitem "
Command sequence inserted between sub level entries.

item_2 <string> "\n \\subsubitem "
Command sequence inserted between subsub level entries.

item_01 <string> "\n \\subitem "
Command sequence inserted between primary and sub level entries.

item_x1 <string> "\n \\subitem "
Command sequence inserted between primary and sub level entries when
the main entry does not have a page number.

item_12 <string> "\n \\subsubitem "
Command sequence inserted between sub and subsub level entries.

item_x2 <string> "\n \\subsubitem "
Command sequence inserted between sub and subsub level entries when the
sub level entry does not have a page number.

delim_0 <string> ", "
Delimiter string between a primary level entry and its first page
number.

delim_1 <string> ", "
Delimiter string between a sub level entry and its first page number.

delim_2 <string> ", "
Delimiter string between a subsub level entry and its first page
number.

delim_n <string> ", "
Delimiter string between page numbers, used for any entry level.

delim_r <string> "--"
Delimiter string between the two pages of a page range.

delim_t <string> ""
Delimiter string output at the end of a page number list.

suffix_2p <string> ""
String inserted in place of delim_n and the next page number when two
pages are contiguous. It takes effect only when the parameter is
defined.

suffix_3p <string> ""
String inserted in place of delim_r and the third page number when
three pages are contiguous. This parameter takes precedence over
suffix_mp. It takes effect only when the parameter is defined.

suffix_mp <string> ""
String inserted in place of delim_r and the last page number when three
or more pages are contiguous. It takes effect only when the parameter
is defined.

encap_prefix <string> "\\"
Prefix for the encapsulating command when an encapsulating command is
added to the page number.

encap_infix <string> "{"
Prefix placed just before the page number when an encapsulating command
is added to the page number.

encap_suffix <string> "}"
Suffix placed after the page number when an encapsulating command is
added to the page number.

line_max <number> 72
Maximum length of a line. Lines exceeding this length are folded.

indent_space <string> ""
Indentation string inserted at the beginning of a folded line.

indent_length <number> 16
Length of the indentation inserted at the beginning of a folded line.

symhead_positive <string> "Symbols"
String output as the heading for symbols when lethead_flag,
heading_flag or headings_flag is a positive number.

symhead_negative <string> "symbols"
String output as the heading for symbols when lethead_flag,
heading_flag or headings_flag is a negative number.

symbol <string> ""
String output as the heading for symbols when symbol_flag is nonzero.
If specified, this option takes precedence over symhead_positive and
symhead_negative. (Extended by (up)mendex)

numhead_positive <string> "Numbers"
String output as the heading for numbers when lethead_flag,
heading_flag or headings_flag is a positive number and symbol_flag is
2.

numhead_negative <string> "numbers"
String output as the heading for numbers when lethead_flag,
heading_flag or headings_flag is a negative number and symbol_flag is
2.

symbol_flag <number> 1
Flag controlling the output of headings for symbols. If '0', do not
output headings for symbols and numbers. If '1', output symbols and
numbers as one group of symbols. If '2', output symbols and numbers
separately. (Extended by (up)mendex)

letter_head <number> 1
Flag selecting the heading letters for Japanese Kana. '1' selects
Katakana and '2' selects Hiragana. (Extended by (up)mendex)

priority <number> 0
Flag selecting the sorting method for index words composed of Japanese
and non-Japanese (e.g. Latin) scripts. If nonzero, one space (U+0020)
is inserted between a Japanese sequence and a non-Japanese sequence in
the sorting procedure. (Extended by (up)mendex)

character_order <string> "SNLGCJKHDTah"
Order of scripts and symbols. 'S', 'N', 'L', 'G', 'C', 'J', 'K', 'H',
'D', 'T', 'a' and 'h' denote symbol, number, Latin, Greek, Cyrillic,
Japanese Kana, Korean Hangul, Hanzi, Devanagari, Thai, Arabic and
Hebrew script, respectively. '@' denotes scripts that are not
explicitly designated; their order is determined by icu_rules or
icu_locale. Make sure that 'S' and 'N' are next to each other if
symbol_flag=1, since numbers are then classified as a part of symbols.
(Extended by upmendex)

script_preamble <string 1> <string 2> ""
Preamble of a script block in the output file, specified by string 2.
One of the following script names must be specified in string 1:
'latin', 'cyrillic', 'greek', 'kana', 'hangul', 'hanzi', 'devanagari',
'thai', 'arabic', or 'hebrew'. (Extended by upmendex)

script_postamble <string 1> <string 2> ""
Postamble of a script block in the output file, specified by string 2.
One of the following script names must be specified in string 1:
'latin', 'cyrillic', 'greek', 'kana', 'hangul', 'hanzi', 'devanagari',
'thai', 'arabic', or 'hebrew'. (Extended by upmendex)

icu_locale <string> ""
Locale of the ICU collator. By default, "root sort order" is set.
(Extended by upmendex)

icu_rules <string> ""
Customized collation rules for the ICU collator. Unicode characters in
UTF-8 encoding and the following escape sequences are accepted:
\Uhhhhhhhh (8-digit hexadecimal [0-9A-Fa-f]), \uhhhh (4-digit
hexadecimal), \xhh (2-digit hexadecimal), \x{h...} (1..8-digit
hexadecimal), and \ooo (3-digit octal [0-7]). If icu_rules and
icu_locale are specified simultaneously, the collation rules specified
by icu_rules are added to the collation rules specified by icu_locale.
By default, the locale is used. (Extended by upmendex)
Ref. <https://unicode-org.github.io/icu/userguide/collation/customization/>,
<http://www.unicode.org/reports/tr35/tr35-collation.html#Rules>

icu_attributes <string> ""
Attributes of the ICU collator. The following are available:
"alternate:shifted", "alternate:non-ignorable", "strength:primary",
"strength:secondary", "strength:tertiary", "strength:quaternary",
"strength:identical", "french-collation:on", "french-collation:off",
"case-first:off", "case-first:upper-first", "case-first:lower-first",
"case-level:on", "case-level:off", "normalization-mode:on",
"normalization-mode:off", "numeric-ordering:on",
"numeric-ordering:off". (Extended by upmendex)
Ref. <https://unicode-org.github.io/icu/userguide/collation/customization/#default-options>,
<http://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options>
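
The escape sequences accepted in icu_rules can be illustrated with a
short Python sketch that expands them into characters. The function
name decode_rule_escapes is hypothetical, and the sketch assumes that
every form, including \xhh and \ooo, names a Unicode code point; how
upmendex itself interprets them may differ in detail.

```python
import re

# One alternative per escape form listed for icu_rules.
ESCAPE = re.compile(
    r"\\U([0-9A-Fa-f]{8})"          # \Uhhhhhhhh
    r"|\\u([0-9A-Fa-f]{4})"         # \uhhhh
    r"|\\x\{([0-9A-Fa-f]{1,8})\}"   # \x{h...}
    r"|\\x([0-9A-Fa-f]{2})"         # \xhh
    r"|\\([0-7]{3})"                # \ooo
)


def decode_rule_escapes(text):
    """Replace the escape sequences in a rule string by the characters they name."""
    def replace(match):
        hex8, hex4, braced, hex2, octal = match.groups()
        if octal is not None:
            return chr(int(octal, 8))
        # chr() raises ValueError for values above U+10FFFF.
        return chr(int(hex8 or hex4 or braced or hex2, 16))
    return ESCAPE.sub(replace, text)


print(decode_rule_escapes(r"&a < \u00E4 < \x{1F600} < \101"))
```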

Compared to makeindex, upmendex has an additional feature that
simplifies the handling of Japanese indexes. Users can save the effort
of manually specifying a reading for every kanji word.
Japanese kanji words are usually sorted by the syllables of their
readings ('Yomi'), which can be represented in kana (Hiragana,
Katakana) scripts. upmendex accepts index words written directly in
kana in an input file, and can also convert index words written in
kanji or symbols to phonetic scripts by referring to Japanese
dictionaries.

Examples of how syllables are simplified internally are shown below.

かぶしきがいしゃ    かふしきかいしや
マッキントッシュ    まつきんとつしゆ
ワープロ            わあふろ
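
The three examples above are consistent with the following Python
sketch. It is inferred from those examples only and is not the actual
upmendex routine: the function name simplify_reading and its rules
(Katakana to Hiragana, voicing marks dropped, small kana enlarged, the
long vowel mark replaced by the vowel of the preceding kana) are
assumptions.

```python
import unicodedata

# Small kana and their full-size counterparts.
SMALL_TO_LARGE = str.maketrans("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ")

# Unvoiced, full-size Hiragana grouped by vowel, used to expand "ー".
VOWEL_ROWS = {
    "あ": "あかさたなはまやらわ",
    "い": "いきしちにひみり",
    "う": "うくすつぬふむゆる",
    "え": "えけせてねへめれ",
    "お": "おこそとのほもよろを",
}
VOWEL_OF = {kana: vowel for vowel, row in VOWEL_ROWS.items() for kana in row}


def simplify_reading(text):
    """Reduce a kana reading to plain full-size Hiragana for sorting."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x30A1 <= code <= 0x30F6:
            # Katakana -> Hiragana (the two blocks are 0x60 apart).
            ch = chr(code - 0x60)
        if 0x3041 <= ord(ch) <= 0x3096:
            # NFD splits off the (semi-)voiced mark; keep the base kana.
            ch = unicodedata.normalize("NFD", ch)[0]
            ch = ch.translate(SMALL_TO_LARGE)
        elif ch == "ー" and out:
            # Long vowel mark: repeat the vowel of the preceding kana.
            ch = VOWEL_OF.get(out[-1], ch)
        out.append(ch)
    return "".join(out)


for word in ("かぶしきがいしゃ", "マッキントッシュ", "ワープロ"):
    print(word, simplify_reading(word))
```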

The dictionary file consists of a list of <index_word reading> pairs.
The index word can be written in any script (kanji, kana, etc.), and
the reading can be written in any phonetic script such as Hiragana or
Katakana. The delimiter between the index word and its reading is one
or more tabs or spaces.
An example of a Japanese dictionary is shown below.

漢字 かんじ
読み よみ
環境 かんきょう
$ ドル

Each index word can have only one Yomi in the dictionary. Although
some kanji words (e.g. 「表」) have more than one Yomi (e.g. 「ひょう」
and 「おもて」), only one of them can be registered. When different
Yomi are needed, specify them explicitly in kana (e.g. \index{ひょう@表}
or \index{おもて@表}) in the input file.
In addition, a dictionary file is read automatically when its file name
is set in the environment variable INDEXDEFAULTDICTIONARY. The
dictionary set by the environment variable can be used together with
file(s) specified by the -d option.
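
The dictionary format described above can be read with a few lines of
Python. This is only a sketch: the names parse_dictionary and
with_reading are hypothetical, and keeping the first reading when a
word is listed twice is an assumption, since the behaviour of upmendex
in that case is not stated here.

```python
def parse_dictionary(text):
    """Map each index word to its single registered reading."""
    readings = {}
    for line in text.splitlines():
        # The word and its reading are separated by tabs and/or spaces.
        fields = line.split(None, 1)
        if len(fields) < 2:
            continue
        word, reading = fields[0], fields[1].strip()
        # Assumption: a later duplicate does not replace the first reading.
        readings.setdefault(word, reading)
    return readings


def with_reading(word, readings):
    """Build an explicit 'reading@word' key, or return the word unchanged."""
    reading = readings.get(word)
    return f"{reading}@{word}" if reading else word


SAMPLE = "漢字\tかんじ\n読み  よみ\n環境 かんきょう\n$ ドル\n"
table = parse_dictionary(SAMPLE)
print(with_reading("環境", table))
print(with_reading("表", table))
```

A word that is missing from the dictionary, such as 「表」 in this
sample, is returned unchanged and needs an explicit reading in the
input file.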

By default, upmendex sorts index entries as they are ('sort by word
order'). With the -l option, spaces between words in an index entry
are removed before sorting ('sort by character order').
Even when sorting by character order, the entry written to the output
keeps its original form, including the spaces.
An example follows.

sort by word order    sort by character order
X Window              Xlib
Xlib                  XView
XView                 X Window
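
The two orders in the table above can be reproduced with a small Python
sketch. It is a rough model only: the function name sort_index is
hypothetical, and case-insensitive string comparison stands in for the
ICU collation that upmendex actually uses.

```python
def sort_index(entries, character_order=False):
    """Sort entries by word order, or by character order (like -l)."""
    def key(entry):
        text = entry.casefold()
        if character_order:
            # Spaces are dropped for comparison only; the entry is untouched.
            text = text.replace(" ", "")
        return text
    return sorted(entries, key=key)


entries = ["Xlib", "XView", "X Window"]
print(sort_index(entries))
print(sort_index(entries, character_order=True))
```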

In addition, two sorting methods can be applied to index entries which
contain both Japanese kana and other scripts (e.g. Latin script). With
priority set to 0 (default) in a style file, no space is inserted
between Japanese kana and other scripts; with priority set to 1, a
space is inserted between them prior to the sorting procedure.
An example follows.

priority=0      priority=1
index sort      indファイル
indファイル     index sort
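
The example table above matches the description of the priority
parameter (a space is inserted when it is nonzero), as the following
Python sketch shows. The names is_japanese and sort_key are
hypothetical, the script test covers only Hiragana, Katakana and common
kanji, and plain code point comparison stands in for the collation used
by upmendex.

```python
def is_japanese(ch):
    # Rough assumption: Hiragana/Katakana block or CJK unified ideographs.
    return "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"


def sort_key(entry, priority=0):
    """Return the string compared when sorting; the entry itself is unchanged."""
    if not priority:
        return entry
    out = [entry[:1]]
    for prev, ch in zip(entry, entry[1:]):
        # Insert one space where Japanese and non-Japanese text meet.
        if is_japanese(prev) != is_japanese(ch) and " " not in (prev, ch):
            out.append(" ")
        out.append(ch)
    return "".join(out)


entries = ["indファイル", "index sort"]
for priority in (0, 1):
    print(priority, sorted(entries, key=lambda e: sort_key(e, priority)))
```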

upmendex refers to the following environment variables.

INDEXSTYLE
Directory where index style files exist.

INDEXDEFAULTSTYLE
Index style file to be referred to as default.

INDEXDICTIONARY
Directory where dictionary files exist.

INDEXDEFAULTDICTIONARY
Dictionary file which is automatically read.

The detailed specification is compatible with makeindex.

When more than one page number format is used, the .idx files should be
specified in the order of their page numbers. Otherwise, wrong page
numbers might be output.

tex(1), latex(1), makeindex(1), mendex(1).
International Components for Unicode (ICU): <http://icu.unicode.org/>,
<https://unicode-org.github.io/icu/>

This manual page was written by Takuji Tanaka based on the mendex
manual page written by Japanese TeX Development Community.