UPMENDEX(1) General Commands Manual UPMENDEX(1)

6 upmendex - Multilingual index processor
7
upmendex [-ilqrcgf] [-s sty] [-d dic] [-o ind] [-t log] [-p no] [--] [idx0 idx1 idx2 ...]
upmendex --help
12
upmendex is a general-purpose multilingual hierarchical index generator
that works with upLaTeX, XeLaTeX and LuaLaTeX. It accepts one or more
input files (.idx, usually produced by a text formatter of the LaTeX
family), sorts the entries, and produces an output file that can be
typeset. It supports Latin (including non-English), Greek, Cyrillic,
Korean Hangul and Han (Hanzi ideographs) scripts, as well as Japanese
Kana. It is largely compatible with makeindex and mendex, and it adds a
feature for handling the readings of kanji words.
23 The formats of the input and output files are specified in a style
24 file. The readings of kanji words can be specified in a dictionary
25 file.
26 The index can have up to three levels (0, 1, and 2) of subitem nesting.
27
29 -i Take input from stdin, even when index files are specified.
30
31 -l Set ´sort by character order´. By default, ´sort by word or‐
32 der´ is used. Details are described below.
33
34 -q Quiet mode; send no message to stderr, except error messages
35 and warnings.
36
37 -r Disable implicit page range formation. By default, three or
38 more successive pages are automatically abbreviated as a
39 range (e.g. 1–5).
40
-c Compress each run of intermediate blanks (spaces and/or
tabs) into a single space and ignore leading and trailing
blanks. By default, blanks in the index key are retained.
44
-g Use only the A-line (A, Ka, Sa, ...; 10 characters) of the
gojuon table (Japanese syllabary) as Japanese index headings.
By default, all 48 characters in the gojuon table are used.
48
-f Force output of characters even if their scripts are not
supported by upmendex.
51
52 -s sty Employ sty as the style file.
53
54 -d dic Employ dic as the dictionary file. The dictionary file is
55 composed of lists of <index_word reading>.
56
57 -o ind Employ ind as the output index file. By default, the file
58 name is created by appending the extension ind to the base
59 name of the first input file.
60
61 -t log Employ log as the transcript file. By default, the file name
62 is created by appending the extension ilg to the base name of
63 the first input file.
64
-p no Set the starting page number of the output index to no. The
argument no may be a number or one of the following: any (the
page following the end of the contents), odd (the next odd
page after the end of the contents), even (the next even page
after the end of the contents).
70
71 --help Show summary of options.
72
73 -- Arguments after -- are not taken as options. This is useful
74 when the input file name starts with '-'.
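
The options above can be assembled programmatically. The following is a minimal Python sketch, not part of upmendex itself: the helper names build_argv and default_outputs and the file names are hypothetical. It uses only the options documented above and assumes that the "base name" is the input path with its extension removed.

```python
import os


def default_outputs(first_idx):
    """Return the (index, transcript) names derived from the first input file.

    Assumption: the base name is the path without its extension.
    """
    base, _ext = os.path.splitext(first_idx)
    return base + ".ind", base + ".ilg"


def build_argv(idx_files, flags="", style=None, dictionary=None,
               output=None, log=None, start_page=None):
    """Build an argument list using only the documented options."""
    unknown = set(flags) - set("ilqrcgf")
    if unknown:
        raise ValueError("unsupported flags: " + "".join(sorted(unknown)))
    argv = ["upmendex"]
    if flags:
        argv.append("-" + flags)
    for opt, value in (("-s", style), ("-d", dictionary), ("-o", output),
                       ("-t", log), ("-p", start_page)):
        if value is not None:
            argv += [opt, str(value)]
    # "--" keeps input names that start with '-' from being read as options.
    argv.append("--")
    argv += list(idx_files)
    return argv
```

The resulting list can be passed to subprocess.run() on a system where upmendex is installed.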
75
76
The style file informs upmendex about the format of the idx input files
and the intended format of the final output file. The format is upward
compatible with that of makeindex and mendex. The style file contains a
list of <specifier attribute> pairs. There are two types of specifiers:
input and output. Pairs do not have to appear in any particular order.
A line beginning with ´%´ is a comment.
84
85
86 Input file style parameter
87
88 keyword <string> "\\indexentry"
Command whose argument is the index entry
to be processed.
91
92 arg_open <char> ´{´
93 Opening delimiter which shows the begin‐
94 ning of index entry.
95
96 arg_close <char> ´}´
97 Closing delimiter which shows the end of
98 index entry.
99
100 range_open <char> ´(´
101 Opening delimiter which shows the begin‐
102 ning of page range.
103
104 range_close <char> ´)´
105 Closing delimiter which shows the end of
106 page range.
107
108 level <char> ´!´
109 Delimiter which shows lower level.
110
actual <char> ´@´
Symbol indicating that the text following
it is what appears as the index string in
the output file.

encap <char> ´|´
Symbol indicating that the text following
it is the name of a command to be applied
to the page number.
120
121 page_compositor <string> "-"
122 Separator between page levels for a style
123 with multi-levels of page numbers.
124
125 page_precedence <string> "rnaRA"
126 Priority of expression for page number.
127 ´R´ and ´r´ correspond to Roman. ´n´ cor‐
128 responds to arabic numeral. ´A´ and ´a´
129 correspond to Latin alphabet.
130
131 quote <char> ´"´
132 Escape character for upmendex parameters.
133
134 escape <char> ´\\´
135 Escape character for general scripts.
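
To illustrate how the default input delimiters fit together, here is a minimal Python sketch that splits one \indexentry line into its levels, sort keys, displayed strings and encapsulating command. It is an illustration only and not the parser used by upmendex: the function name parse_indexentry is hypothetical, and the quote, escape, range_open and range_close characters are deliberately not handled.

```python
import re

# keyword, arg_open and arg_close at their default values.
ENTRY_RE = re.compile(r"\\indexentry\{(.*)\}\{([^{}]*)\}$")


def parse_indexentry(line, level="!", actual="@", encap="|"):
    """Split one input line into levels, encapsulating command and page."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError("not an \\indexentry line: " + line)
    key, page = match.group(1), match.group(2)
    # Text after the encap symbol names the command applied to the page.
    key, _, encap_cmd = key.partition(encap)
    levels = []
    for part in key.split(level):
        # Text before 'actual' is the sort key, text after it is displayed.
        sort_key, sep, display = part.partition(actual)
        levels.append((sort_key, display if sep else sort_key))
    return {"levels": levels, "encap": encap_cmd or None, "page": page}
```

For example, the line \indexentry{animal!cat|textbf}{3} yields two levels, the command name textbf and the page string 3.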
136
137 Output file style parameter
138
139 preamble <string> "\\begin{theindex}\n"
140 Preamble of output file.
141
142 postamble <string> "\n\n\\end{theindex}\n"
143 Postamble of output file.
144
145 setpage_prefix <string> "\n \\setcounter{page}{"
146 Prefix of page number if start page is
147 designated.
148
149 setpage_suffix <string> "}\n"
150 Suffix of page number if start page is
151 designated.
152
153 group_skip <string> "\n\n \\indexspace\n"
154 Strings to insert vertical space before
155 new section of index.
156
157 lethead_prefix <string> ""
158 Prefix of heading for newly appeared
159 heading letter.
160
161 heading_prefix <string> ""
162 Same as lethead_prefix. (compatible with
163 makeindex)
164
165 lethead_suffix <string> ""
166 Suffix of heading for newly appeared
167 heading letter.
168
169 heading_suffix <string> ""
170 Same as lethead_suffix. (compatible with
171 makeindex)
172
173 lethead_flag <number> 0
174 Flag to control output of heading letters
175 in Latin, Greek and Cyrillic scripts.
´0´, ´1´, ´-1´ and ´2´ denote no output,
uppercase, lowercase and titlecase,
respectively.
179
180 heading_flag <number> 0
181 Same as lethead_flag. (Note: makeindex
182 uses a different name headings_flag)
183
184 headings_flag <number> 0
185 Same as lethead_flag. (compatible with
186 makeindex)
187
188 kana_head <string> ""
189 Heading characters of Kana specified by a
190 string. By default, it is controlled by
191 letter_head and command line option -g.
192 (Extended by upmendex)
193
194 hangul_head <string> "ㄱㄴㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎ"
195 Heading characters of Hangul specified by
196 a string. (Extended by upmendex)
197
198 tumunja <string> "ㄱㄴㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎ"
199 Heading characters of Hangul specified by
200 a string. (Deprecated, Extended by up‐
201 mendex)
202
203 hanzi_head <string> ""
Heading strings of hanzi (Kanji, Hanja)
specified by a string in which the items
are concatenated with the separator ´;´.
(Extended by upmendex)
208
209 devanagari_head <string> "ऄअआइईउऊऋऌऍऎएऐऑऒओऔकखगघङचछजझञटठडढणतथदधनपफबभमयरलळवशषसह"
210 Heading characters of Devanagari speci‐
211 fied by a string. (Experimental, Ex‐
212 tended by upmendex)
213
214 thai_head <string> "กขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮ"
215 Heading characters of Thai script speci‐
216 fied by a string. (Experimental, Ex‐
217 tended by upmendex)
218
219 item_0 <string> "\n \\item "
220 Command sequence inserted between primary
221 level entries.
222
223 item_1 <string> "\n \\subitem "
224 Command sequence inserted between sub
225 level entries.
226
227 item_2 <string> "\n \\subsubitem "
228 Command sequence inserted between subsub
229 level entries.
230
231 item_01 <string> "\n \\subitem "
Command sequence inserted between primary
and sub level entries.
234
235 item_x1 <string> "\n \\subitem "
236 Command sequence inserted between primary
237 and sub level entries when main entry
238 does not have page number.
239
240 item_12 <string> "\n \\subsubitem "
241 Command sequence inserted between sub and
242 subsub level entries.
243
244 item_x2 <string> "\n \\subsubitem "
245 Command sequence inserted between sub and
246 subsub level entries when sub level entry
247 does not have page number.
248
249 delim_0 <string> ", "
250 Delimiter string between primary level
251 entry and first page number.
252
253 delim_1 <string> ", "
254 Delimiter string between sub level entry
255 and first page number.
256
257 delim_2 <string> ", "
258 Delimiter string between subsub level en‐
259 try and first page number.
260
261 delim_n <string> ", "
262 Delimiter string between page numbers
263 commonly used for any entry level.
264
265 delim_r <string> "--"
266 Delimiter string between pages to show
267 page range.
268
269 delim_t <string> ""
270 Delimiter string output at the end of
271 page number list.
272
273 suffix_2p <string> ""
274 String to be inserted in place of delim_n
275 and the next page number when the two
276 pages are contiguous.
277 It works only when the parameter is defined.
278
279 suffix_3p <string> ""
String to be inserted in place of delim_r
and the third page number when the three
pages are contiguous. This parameter takes
precedence over suffix_mp.
It takes effect only when the parameter is defined.
285
286 suffix_mp <string> ""
287 String to be inserted in place of delim_r
288 and the last page number when the three
289 or more pages are contiguous.
290 It works only when the parameter is defined.
291
292 encap_prefix <string> "\\"
293 Prefix for an encapsulating command when
294 the encapsulating command is added to the
295 page number.
296
297 encap_infix <string> "{"
298 Prefix just before the page number when
299 the encapsulating command is added to the
300 page number.
301
encap_suffix <string> "}"
303 Suffix after the page number when the en‐
304 capsulating command is added to the page
305 number.
306
line_max <number> 72
Maximum length of an output line. Longer
lines are folded.

indent_space <string> ""
String used to indent the beginning of a
folded line.

indent_length <number> 16
Length of the indent inserted at the
beginning of a folded line.
318
symhead_positive <string> "Symbols"
String output as the heading for symbols
when lethead_flag, heading_flag or
headings_flag is a positive number.

symhead_negative <string> "symbols"
String output as the heading for symbols
when lethead_flag, heading_flag or
headings_flag is a negative number.

symbol <string> ""
String output as the heading for symbols
when symbol_flag is non-zero. If specified,
this option takes precedence over
symhead_positive and symhead_negative.
(Extended by (up)mendex)
334
numhead_positive <string> "Numbers"
String output as the heading for numbers
when lethead_flag, heading_flag or
headings_flag is a positive number and
symbol_flag is 2.

numhead_negative <string> "numbers"
String output as the heading for numbers
when lethead_flag, heading_flag or
headings_flag is a negative number and
symbol_flag is 2.
346
347 symbol_flag <number> 1
Flag controlling the output of headings
for symbols and numbers. If ´0´, no
headings are output for symbols and
numbers. If ´1´, symbols and numbers are
output as a single group of symbols. If
´2´, symbols and numbers are output
separately. (Extended by (up)mendex)
354
355 letter_head <number> 1
Flag selecting the heading letters for
Japanese Kana. ´1´ selects Katakana and
´2´ selects Hiragana. (Extended by
(up)mendex)
360
361 priority <number> 0
Flag selecting the sorting method for
index words composed of Japanese and
non-Japanese (e.g. Latin) scripts. If
non-zero, one space (U+0020) is inserted
between each Japanese sequence and
non-Japanese sequence during sorting.
(Extended by (up)mendex)
369
370 character_order <string> "SNLGCJKHDTah"
Order of scripts and symbols. ´S´, ´N´,
´L´, ´G´, ´C´, ´J´, ´K´, ´H´, ´D´, ´T´,
´a´ and ´h´ denote symbol, number, Latin,
Greek, Cyrillic, Japanese Kana, Korean
Hangul, Hanja, Devanagari, Thai, Arabic
and Hebrew script, respectively. Make
sure that ´S´ and ´N´ are adjacent if
symbol_flag=1, since numbers are then
classified as symbols.
(Extended by upmendex)
381
382 script_preamble <string 1> <string 2>
383 ""
384 Preamble of script block in output file,
385 specified by string 2. One of script
386 names must be specified in the string 1:
387 ´latin´, ´cyrillic´, ´greek´, ´kana´,
388 ´hangul´, ´hanzi´, ´devanagari´, ´thai´,
389 ´arabic´, or ´hebrew´. (Extended by up‐
390 mendex)
391
392 script_postamble <string 1> <string 2>
393 ""
394 Postamble of script block in output file,
395 specified by string 2. One of script
396 names must be specified in the string 1:
397 ´latin´, ´cyrillic´, ´greek´, ´kana´,
398 ´hangul´, ´hanzi´, ´devanagari´, ´thai´,
399 ´arabic´, or ´hebrew´. (Extended by up‐
400 mendex)
401
402 icu_locale <string> ""
403 Locale in ICU collator. By default,
404 "root sort order" is set. (Extended by
405 upmendex)
406
407 icu_rules <string> ""
408 Customized collation rules in ICU colla‐
409 tor. Unicode characters in UTF-8 encod‐
410 ing and following escape sequences are
411 accepted: \Uhhhhhhhh (8-digit hexadecimal
412 [0-9A-Fa-f]), \uhhhh (4-digit hexadeci‐
413 mal), \xhh (2-digit hexadecimal),
414 \x{h...} (1..8-digit hexadecimal), and
415 \ooo (3-digit octal [0-7]). If icu_rules
416 and icu_locale are simultaneously speci‐
417 fied, collation rules specified by
418 icu_rules are added on collation rules
419 specified by icu_locale. By default, lo‐
420 cale is used. (Extended by upmendex)
421 Ref. <https://unicode-org.github.io/icu/userguide/collation/customiza‐
422 tion/>, <http://www.unicode.org/reports/tr35/tr35-collation.html#Rules>
423
424 icu_attributes <string> ""
Attributes of the ICU collator. The
following are available:
"alternate:shifted",
"alternate:non-ignorable",
"strength:primary", "strength:secondary",
"strength:tertiary",
"strength:quaternary",
"strength:identical",
"french-collation:on",
"french-collation:off",
"case-first:off",
"case-first:upper-first",
"case-first:lower-first",
"case-level:on", "case-level:off",
"normalization-mode:on",
"normalization-mode:off",
"numeric-ordering:on",
"numeric-ordering:off".
(Extended by upmendex)
438 Ref. <https://unicode-org.github.io/icu/userguide/collation/customiza‐
439 tion/#default-options>, <http://www.unicode.org/reports/tr35/tr35-col‐
440 lation.html#Setting_Options>
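
The interaction of delim_n, delim_r and the -r option can be sketched in a few lines of Python. This is an approximation under stated assumptions, not the upmendex implementation: the function name format_pages is hypothetical, pages are plain Arabic integers, and suffix_2p, suffix_3p, suffix_mp, delim_t and encapsulated pages are not modeled.

```python
def format_pages(pages, delim_n=", ", delim_r="--", form_ranges=True):
    """Join page numbers, abbreviating runs of three or more pages.

    form_ranges=False corresponds to running with the -r option.
    """
    runs = []
    for page in sorted(set(pages)):
        if runs and page == runs[-1][-1] + 1:
            runs[-1].append(page)   # extends the current successive run
        else:
            runs.append([page])     # starts a new run
    parts = []
    for run in runs:
        if form_ranges and len(run) >= 3:
            parts.append(f"{run[0]}{delim_r}{run[-1]}")
        else:
            parts.extend(str(page) for page in run)
    return delim_n.join(parts)
```

With the default delimiters, pages 1 to 5 collapse into one range, while two adjacent pages such as 10 and 11 stay separate because a range needs at least three successive pages.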
441
443 upmendex has an additional feature to simplify the procedure of han‐
444 dling Japanese indexes, compared to makeindex. Users can save the ef‐
445 fort of manually specifying a reading for every kanji word.
446 Japanese kanji words are usually sorted by the syllables of their read‐
447 ings (´Yomi´), which can be represented by kana (Hiragana, Katakana)
448 scripts. upmendex accepts index words specified in kana expression di‐
449 rectly on an input file, and also accepts conversion from index words
450 in Kanji or symbols to phonogram scripts by referring to Japanese dic‐
451 tionaries.
452
453
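Before sorting, readings are simplified internally. As the examples
below show, voiced sounds become unvoiced, small kana become full-size,
Katakana becomes Hiragana, and a long-vowel mark becomes the preceding
vowel (original on the left, simplified form on the right).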
455
456 かぶしきがいしゃ かふしきかいしや
457 マッキントッシュ まつきんとつしゆ
458 ワープロ わあふろ
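
The simplification shown in these examples can be approximated in Python as follows. This is a sketch with rules inferred from the three examples only; the function name simplify_reading is hypothetical and the actual rules applied by upmendex may differ in other cases.

```python
import unicodedata

# Small kana and their full-size counterparts (both strings have 10 items).
SMALL_TO_LARGE = str.maketrans("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ")

# Vowel of each plain hiragana, used to resolve the long-vowel mark.
VOWEL_ROWS = {
    "あ": "あかさたなはまやらわ",
    "い": "いきしちにひみり",
    "う": "うくすつぬふむゆる",
    "え": "えけせてねへめれ",
    "お": "おこそとのほもよろを",
}
VOWEL_OF = {kana: vowel for vowel, row in VOWEL_ROWS.items() for kana in row}


def simplify_reading(text):
    """Return an unvoiced, full-size, hiragana-only form of a kana reading."""
    out = []
    # NFD separates the (semi-)voiced marks from their base kana.
    for ch in unicodedata.normalize("NFD", text):
        if ch in "\u3099\u309a":          # combining dakuten / handakuten
            continue
        if "\u30a1" <= ch <= "\u30f6":    # Katakana -> Hiragana
            ch = chr(ord(ch) - 0x60)
        ch = ch.translate(SMALL_TO_LARGE)
        if ch == "\u30fc" and out:        # long-vowel mark
            ch = VOWEL_OF.get(out[-1], ch)
        out.append(ch)
    return "".join(out)
```

Applied to the left-hand column above, the function reproduces the right-hand column.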
459
The dictionary file consists of a list of <´index_word´ ´reading´>
pairs. The index word can be written in any script (kanji, kana, etc.),
and the reading can be written in any phonogram such as Hiragana or
Katakana. The delimiter between the index word and its reading is one
or more tabs or spaces.
An example of a Japanese dictionary is shown below.
466
467 漢字 かんじ
468 読み よみ
469 環境 かんきょう
470 $ ドル
471
Here, each index word is allowed to have only one Yomi. Although some
kanji words (e.g. 「表」) have more than one Yomi (e.g. 「ひょう」 and
「おもて」), only one of them can be registered in the dictionary. When
different Yomi are needed, specify them explicitly in kana (e.g.
\index{ひょう@表} or \index{おもて@表}) in the input file.
In addition, a dictionary file is read automatically when its file name
is set in the environment variable INDEXDEFAULTDICTIONARY. The
dictionary set by the environment variable can be used together with
the file(s) specified by the -d option.
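
The dictionary format described above is simple enough to read with a few lines of Python. The sketch below is illustrative only: the names parse_dictionary and entry_for are hypothetical, index words are assumed to contain no blanks, and letting a later line override an earlier one for the same word is an assumption, not documented behaviour.

```python
def parse_dictionary(text):
    """Map each index word to its single reading (Yomi)."""
    readings = {}
    for line in text.splitlines():
        # One or more tabs or spaces separate the word from its reading.
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue  # skip blank or incomplete lines
        word, reading = fields
        readings[word] = reading.strip()  # assumption: the last entry wins
    return readings


def entry_for(word, readings):
    """Return the explicit reading@word key one would otherwise type by hand."""
    reading = readings.get(word)
    return f"{reading}@{word}" if reading else word
```

For a word with several Yomi, such as 「表」, the dictionary can hold only one of them, so the other still has to be written explicitly in the input file.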
482
upmendex sorts index entries as they are (´sort by word order´) by
default. With the -l option, spaces between words in an entry are
removed before sorting (´sort by character order´).
Even with sort by character order, each entry is output in its original
form, with the spaces intact.
An example follows.
490
491 sort by word order sort by character order
492 X Window Xlib
493 Xlib XView
494 XView X Window
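
The difference between the two orders can be reproduced with a small Python sketch. It is an approximation: upmendex collates with ICU, whereas this sketch compares case-folded code points, which happens to give the same result for the example above. The function name sort_entries is hypothetical.

```python
def sort_entries(entries, character_order=False):
    """Sort entries by word order (default) or by character order (-l)."""
    def key(entry):
        # Character order ignores the spaces between words while sorting;
        # the entries themselves are returned unchanged.
        text = entry.replace(" ", "") if character_order else entry
        return text.casefold()
    return sorted(entries, key=key)
```

Only the sort key changes, so "X Window" keeps its space in the output in both modes.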
495
In addition, two sorting methods are available for entries that contain
both Japanese kana and other scripts (e.g. Latin script). With priority
set to 0 (default) in the style file, no space is inserted between
Japanese kana and other scripts before sorting; with priority set to a
non-zero value such as 1, one space is inserted.
An example follows.
502
503 priority=0 priority=1
504 index sort indファイル
505 indファイル index sort
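
The effect of the priority parameter on this example can be sketched in Python. The sketch follows the parameter description (a non-zero value inserts one space between Japanese and non-Japanese sequences for sorting) and rests on two assumptions: "Japanese" is approximated by the Hiragana, Katakana and CJK ideograph ranges, and code point comparison stands in for ICU collation. The names sort_key and sort_mixed are hypothetical.

```python
import re

# Assumption: Hiragana, Katakana and CJK ideographs count as Japanese.
JAPANESE = "\u3040-\u30ff\u4e00-\u9fff"
BOUNDARY = re.compile(
    f"(?<=[{JAPANESE}])(?=[^{JAPANESE}\\s])"
    f"|(?<=[^{JAPANESE}\\s])(?=[{JAPANESE}])"
)


def sort_key(entry, priority=0):
    """Return the string that is compared when sorting an entry."""
    if priority != 0:
        # Insert one space at every Japanese / non-Japanese boundary.
        entry = BOUNDARY.sub(" ", entry)
    return entry.casefold()


def sort_mixed(entries, priority=0):
    """Sort entries that mix Japanese and other scripts."""
    return sorted(entries, key=lambda entry: sort_key(entry, priority))
```

With priority=1 the key of "indファイル" becomes "ind ファイル", and the inserted space sorts before the "e" of "index sort", which reverses the order of the two entries.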
506
upmendex refers to the following environment variables.
509
510 INDEXSTYLE
511 Directory where index style files exist.
512
513 INDEXDEFAULTSTYLE
514 Index style file to be referred to as default.
515
516 INDEXDICTIONARY
517 Directory where dictionary files exist.
518
519 INDEXDEFAULTDICTIONARY
520 Dictionary file which is automatically read.
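
As a rough illustration of how the two style-related variables could be combined, the Python sketch below lists candidate paths for a style file. The lookup order shown (the name as given, then the INDEXSTYLE directory) is an assumption made for this sketch and is not documented above; the function name style_candidates is hypothetical.

```python
import os


def style_candidates(style=None, environ=None):
    """List paths where a style file might be looked up.

    Assumption: the name is tried as given first, then inside the
    directory named by INDEXSTYLE. The real search order may differ.
    """
    env = os.environ if environ is None else environ
    # Fall back to the default style when none is given explicitly.
    name = style or env.get("INDEXDEFAULTSTYLE")
    if not name:
        return []
    candidates = [name]
    style_dir = env.get("INDEXSTYLE")
    if style_dir and not os.path.isabs(name):
        candidates.append(os.path.join(style_dir, name))
    return candidates
```

The same pattern would apply to INDEXDICTIONARY and INDEXDEFAULTDICTIONARY for dictionary files.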
521
The detailed specification is compatible with that of makeindex.
524
When more than one page number format is used, .idx files should be
specified in the order of their page numbers. Otherwise, wrong page
numbers might be output.
529
531 tex(1), latex(1), makeindex(1), mendex(1).
532 International Components for Unicode (ICU): <http://icu.unicode.org/>,
533 <https://unicode-org.github.io/icu/>
534
This manual page was written by Takuji Tanaka based on the mendex
manual page written by the Japanese TeX Development Community.