GENDICT(1)                      ICU 67.1 Manual                     GENDICT(1)

NAME

       gendict - compiles a word list into an ICU string trie dictionary

SYNOPSIS

       gendict [ --uchars | --bytes --transform transform ] [ -h, -?, --help ]
       [ -V, --version ] [ -c, --copyright ] [ -v, --verbose ]
       [ -i, --icudatadir directory ] input-file output-file

DESCRIPTION

       gendict reads the word list from input-file and writes a string trie
       dictionary to output-file. Normally this data file has the .dict
       extension.

       Words begin at the beginning of a line and are terminated by the first
       whitespace. Lines that begin with whitespace are ignored.
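
       The line rule above can be sketched in Python as follows. This is a
       minimal illustration, not gendict's implementation: the function name
       extract_words is hypothetical, and Python's str.isspace() is assumed
       to stand in for whatever whitespace test gendict applies.

```python
def extract_words(lines):
    """Yield the leading word of each line (hypothetical helper)."""
    for line in lines:
        # Empty lines and lines that begin with whitespace carry no word.
        # Assumption: str.isspace() approximates gendict's whitespace test.
        if not line or line[0].isspace():
            continue
        # The word runs up to, but does not include, the first whitespace.
        yield line.split(None, 1)[0]
```

       Anything after the first whitespace on a line (such as an integer
       value) is not part of the word and is dropped by this sketch.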

OPTIONS

       -h, -?, --help
              Print help about usage and exit.

       -V, --version
              Print the version of gendict and exit.

       -c, --copyright
              Embed the standard ICU copyright into the output-file.

       -v, --verbose
              Display extra informative messages during execution.

       -i, --icudatadir directory
              Look for any necessary ICU data files in directory. For example,
              the file pnames.icu must be located when ICU's data is not built
              as a shared library. The default ICU data directory is specified
              by the environment variable ICU_DATA. Most configurations of ICU
              do not require this argument.

       --uchars
              Set the output trie type to UChar. Mutually exclusive with
              --bytes.

       --bytes
              Set the output trie type to Bytes. Mutually exclusive with
              --uchars.

       --transform transform
              Set the transform type. Should only be specified with --bytes.
              The currently supported transform is offset-<hex-number>, which
              specifies an offset to subtract from all input characters. The
              offset transform also maps U+200D to 0xFF and U+200C to 0xFE,
              for compatibility with languages that require these characters.
              A transform must be specified for a bytes trie, and when applied
              to the non-value characters in the input-file it must produce
              output between 0x00 and 0xFF.

       input-file
              The source file to read.

       output-file
              The file to write the output dictionary to.
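
       The offset transform described under --transform can be sketched in
       Python as follows. This illustrates the documented mapping only and is
       not gendict's implementation: the function name offset_transform is
       hypothetical, and handling U+200D and U+200C before the subtraction is
       an assumption about ordering.

```python
ZWJ = 0x200D   # ZERO WIDTH JOINER, mapped to 0xFF
ZWNJ = 0x200C  # ZERO WIDTH NON-JOINER, mapped to 0xFE


def offset_transform(word, offset):
    """Map each character of word to one byte (hypothetical helper)."""
    out = bytearray()
    for ch in word:
        cp = ord(ch)
        if cp == ZWJ:
            value = 0xFF
        elif cp == ZWNJ:
            value = 0xFE
        else:
            # Subtract the offset from the code point.
            value = cp - offset
        # Every transformed character must fit in a single byte.
        if not 0x00 <= value <= 0xFF:
            raise ValueError(f"U+{cp:04X} is out of range for offset 0x{offset:X}")
        out.append(value)
    return bytes(out)
```

       With an offset of 0x0E00 (an example value), U+0E01 would map to byte
       0x01, while an ASCII letter would fall outside the 0x00 to 0xFF range
       and be rejected by this sketch.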

CAVEATS

       The input-file is assumed to be encoded in UTF-8. The integers in the
       input-file that are used as values must be made up of ASCII digits.
       They may be specified either in hexadecimal, using a 0x prefix, or in
       decimal. Either --bytes or --uchars must be specified.
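
       The value format above can be sketched in Python as follows. This is
       an illustration under assumptions, not gendict's parser: the function
       name parse_value is hypothetical, and accepting both upper-case and
       lower-case hexadecimal digits is an assumption. The explicit ASCII
       check matters because Python's int() also accepts non-ASCII digits.

```python
DECIMAL_DIGITS = "0123456789"
HEX_DIGITS = DECIMAL_DIGITS + "abcdefABCDEF"  # assumption: either case allowed


def parse_value(text):
    """Parse a value written in decimal or with a 0x prefix (hypothetical)."""
    if text.startswith("0x"):
        digits, base, allowed = text[2:], 16, HEX_DIGITS
    else:
        digits, base, allowed = text, 10, DECIMAL_DIGITS
    # Reject empty input and anything that is not an ASCII digit of the base.
    if not digits or any(c not in allowed for c in digits):
        raise ValueError(f"not an ASCII integer: {text!r}")
    return int(digits, base)
```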

ENVIRONMENT

       ICU_DATA  Specifies the directory containing ICU data. Defaults to
                 /usr/share/icu/67.1/. Some tools in ICU depend on the
                 presence of the trailing slash, so make sure it is present
                 whenever ICU_DATA is set.
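
       A wrapper script that sets ICU_DATA could guard the trailing slash as
       sketched below. The helper name with_trailing_slash is hypothetical,
       and the sketch assumes a POSIX-style path.

```python
def with_trailing_slash(path):
    """Return path with exactly the trailing slash added if it is missing."""
    return path if path.endswith("/") else path + "/"
```

       The result could then be assigned to ICU_DATA in the environment
       passed to the tool.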

AUTHORS

       Maxime Serrano

VERSION

       1.0

COPYRIGHT

       Copyright (C) 2012 International Business Machines Corporation and
       others

SEE ALSO

       http://www.icu-project.org/userguide/boundaryAnalysis.html

ICU MANPAGE                       1 June 2012                       GENDICT(1)