GENDICT(1) ICU 73.2 Manual GENDICT(1)

gendict - Compiles word list into ICU string trie dictionary

gendict [ --uchars | --bytes --transform transform ] [ -h, -?, --help ]
[ -V, --version ] [ -c, --copyright ] [ -v, --verbose ]
[ -i, --icudatadir directory ] input-file output-file

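The invocation can also be assembled from a script. The following is a minimal Python sketch, not part of ICU: build_gendict_args is a hypothetical helper, and it only enforces the constraints this page documents for --uchars, --bytes and --transform. The transform spelling in the usage note (offset-0x0e00) is an assumption about how the hex number is written.

```python
def build_gendict_args(input_file, output_file, trie="uchars",
                       transform=None, icu_data_dir=None):
    """Return an argument list for gendict (hypothetical helper)."""
    if trie not in ("uchars", "bytes"):
        raise ValueError("trie must be 'uchars' or 'bytes'")
    # A transform must be specified for a bytes trie.
    if trie == "bytes" and transform is None:
        raise ValueError("--bytes requires --transform")
    # --transform should only be specified together with --bytes.
    if trie == "uchars" and transform is not None:
        raise ValueError("--transform should only be given with --bytes")

    args = ["gendict", "--" + trie]
    if transform is not None:
        args += ["--transform", transform]
    if icu_data_dir is not None:
        args += ["--icudatadir", icu_data_dir]
    args += [input_file, output_file]
    return args
```

The returned list can be passed to subprocess.run, for example subprocess.run(build_gendict_args("words.txt", "words.dict", trie="bytes", transform="offset-0x0e00"), check=True), provided gendict is on the PATH.
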
gendict reads the word list from input-file and creates a string trie
dictionary file. Normally this data file has the .dict extension.

Words begin at the beginning of a line and are terminated by the first
whitespace. Lines that begin with whitespace are ignored.

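The line rule above can be illustrated with a short Python sketch. It is not gendict's own parser: extract_words is a hypothetical name, empty lines are skipped as an assumption, and Python's notion of whitespace may differ from the one ICU uses.

```python
def extract_words(lines):
    """Collect the leading word of each line, per the rule described above."""
    words = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Assumption: empty lines carry no word and are skipped.
        # Lines that begin with whitespace are ignored.
        if not line or line[0].isspace():
            continue
        # The word runs from the start of the line to the first whitespace.
        words.append(line.split(None, 1)[0])
    return words
```

Anything after the first whitespace on a line, such as a value, is left to the caller.
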
-h, -?, --help
    Print help about usage and exit.

-V, --version
    Print the version of gendict and exit.

-c, --copyright
    Embeds the standard ICU copyright into the output-file.

-v, --verbose
    Display extra informative messages during execution.

-i, --icudatadir directory
    Look for any necessary ICU data files in directory. For example,
    the file pnames.icu must be located when ICU's data is not built
    as a shared library. The default ICU data directory is specified
    by the environment variable ICU_DATA. Most configurations of ICU
    do not require this argument.

--uchars
    Set the output trie type to UChar. Mutually exclusive with --bytes.

--bytes
    Set the output trie type to Bytes. Mutually exclusive with --uchars.

--transform transform
    Set the transform type. Should only be specified with --bytes.
    The only transform currently supported is offset-<hex-number>,
    which specifies an offset to subtract from all input characters.
    The offset transform also maps U+200D to 0xFF and U+200C to 0xFE,
    for compatibility with languages that require these characters. A
    transform must be specified for a bytes trie, and when applied to
    the non-value characters in the input-file it must produce output
    between 0x00 and 0xFF.

input-file
    The source file to read.

output-file
    The file to write the output dictionary to.

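The offset transform described under --transform can be sketched in Python as follows. This illustrates the documented mapping only and is not ICU's implementation; offset_transform and transform_word are hypothetical names, and the sketch does not check whether a subtracted value collides with the two reserved bytes 0xFE and 0xFF.

```python
def offset_transform(ch, offset):
    """Map one input character to a byte value using the offset transform."""
    cp = ord(ch)
    # U+200D and U+200C are mapped to fixed byte values.
    if cp == 0x200D:
        return 0xFF
    if cp == 0x200C:
        return 0xFE
    # All other characters have the offset subtracted.
    value = cp - offset
    if not 0x00 <= value <= 0xFF:
        raise ValueError(
            "U+%04X minus offset 0x%X is outside 0x00..0xFF" % (cp, offset))
    return value


def transform_word(word, offset):
    """Apply the offset transform to every character of a word."""
    return bytes(offset_transform(ch, offset) for ch in word)
```

With an offset of 0x0E00, for instance, the characters U+0E01 and U+0E02 become the bytes 0x01 and 0x02, while an ASCII letter falls outside the permitted range and raises an error.
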
The input-file is assumed to be encoded in UTF-8. The integers in the
input-file that are used as values must be made up of ASCII digits.
They may be specified either in hex, by using a 0x prefix, or in
decimal. Either --bytes or --uchars must be specified.

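The value syntax above can be illustrated with a small Python sketch. parse_value is a hypothetical helper rather than gendict's own code, and two details are assumptions: only a lowercase 0x prefix is accepted, and hex digits may be in either case.

```python
import re

# Explicit ASCII character classes, so non-ASCII digits are rejected.
_HEX_VALUE = re.compile(r"0x([0-9A-Fa-f]+)")
_DEC_VALUE = re.compile(r"[0-9]+")


def parse_value(token):
    """Parse a value written in hex (0x prefix) or in decimal."""
    match = _HEX_VALUE.fullmatch(token)
    if match:
        return int(match.group(1), 16)
    if _DEC_VALUE.fullmatch(token):
        return int(token, 10)
    raise ValueError("not a decimal or 0x-prefixed hex value: %r" % token)
```

A regular expression is used instead of calling int() directly because int() also accepts signs, underscores and non-ASCII digits, which the text above does not allow.
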
ICU_DATA Specifies the directory containing ICU data. Defaults to
/usr/share/icu/73.2/. Some tools in ICU depend on the presence of the
trailing slash. It is thus important to make sure that it is present
if ICU_DATA is set.

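When ICU_DATA is set from a script, the trailing slash can be enforced before the tool is launched. The following Python sketch shows one way to do so; icu_data_env is a hypothetical helper, not something ICU provides.

```python
import os


def icu_data_env(directory, base_env=None):
    """Return a copy of the environment with ICU_DATA ending in a slash."""
    env = dict(os.environ if base_env is None else base_env)
    # Some ICU tools depend on the trailing slash being present.
    if not directory.endswith("/"):
        directory += "/"
    env["ICU_DATA"] = directory
    return env
```

The result can be passed as the env argument of subprocess.run when invoking gendict.
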
Maxime Serrano

1.0

Copyright (C) 2012 International Business Machines Corporation and others

http://www.icu-project.org/userguide/boundaryAnalysis.html

ICU MANPAGE 1 June 2012 GENDICT(1)