TRIETOOL(1)                 General Commands Manual                TRIETOOL(1)

NAME
       trietool - trie manipulation tool

SYNOPSIS
       trietool [ options ] trie command arg ...

DESCRIPTION
       trietool is the command-line tool for manipulating double-array trie
       data. It can be used to query, add and remove words in a trie.

   The Trie
       The trie argument specifies the name of the trie to manipulate. A trie
       is stored in a file with the `.tri' extension. To create a new trie,
       however, one must first prepare a file with the `.abm' extension that
       describes the Unicode ranges of the trie's alphabet set. The ABM
       defines a set of vectors that map Unicode characters into a continuous
       range of integers. The mapped integers are used as the internal
       alphabet of the trie. This mapping can improve space allocation within
       the trie data even when the character set in use is not continuous,
       because the mapped range always is.
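
       The mapping idea can be illustrated with a minimal Python sketch. It
       is not the libdatrie implementation: the helper name
       build_alphabet_map is hypothetical, and numbering from 0 is an
       assumption made only for illustration.

```python
from typing import Dict, List, Tuple


# Hypothetical helper, not part of trietool or libdatrie.
def build_alphabet_map(ranges: List[Tuple[int, int]]) -> Dict[str, int]:
    """Map each character in the inclusive code-point ranges to consecutive
    integers, in the order the ranges are listed."""
    mapping: Dict[str, int] = {}
    next_code = 0  # assumption: numbering starts at 0; the real tool may differ
    for begin, end in ranges:
        for code_point in range(begin, end + 1):
            ch = chr(code_point)
            if ch not in mapping:  # a character in overlapping ranges is mapped once
                mapping[ch] = next_code
                next_code += 1
    return mapping


# Two disjoint ranges (A-Z and a-z) collapse into one gap-free range 0..51.
alphabet = build_alphabet_map([(0x41, 0x5A), (0x61, 0x7A)])
print(alphabet["A"], alphabet["Z"], alphabet["a"], alphabet["z"])  # prints: 0 25 26 51
```

       Although the code points between `Z' and `a' are skipped, the mapped
       values contain no gap, which is the property the paragraph above
       relies on.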

       The ABM file is a plain text file in which each line lists a range of
       32-bit Unicode code points to be added to the alphabet set, in the
       format:

              [0xSSSS,0xTTTT]

       where `0xSSSS' and `0xTTTT' are the hexadecimal values of the starting
       and ending character codes of the range, respectively.

       For example, for a dictionary that contains only English words without
       any punctuation, one may prepare `trie.abm' as:

              [0x0041,0x005a]
              [0x0061,0x007a]

       The first line lists the ASCII codes for A-Z, and the second those for
       a-z.

       No more than 255 alphabet characters are allowed in a trie.

       The created `.tri' file incorporates the ABM data, so the `.abm' file
       is not required once the trie has been created, and is ignored from
       then on.
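
       Before creating a trie, it can be useful to check an ABM file
       against the format and the 255-character limit described above. The
       following Python sketch does so. The function names are
       hypothetical, and three behaviours are assumptions rather than
       documented trietool behaviour: blank lines are skipped, spaces
       inside the brackets are tolerated, and a character covered by
       overlapping ranges is counted once.

```python
import re
from typing import List, Tuple

MAX_ALPHABET_SIZE = 255  # limit stated in this manual page

# One ABM line: [0xSSSS,0xTTTT]. Tolerating inner spaces is an assumption.
_RANGE_RE = re.compile(r"^\[\s*0x([0-9a-fA-F]+)\s*,\s*0x([0-9a-fA-F]+)\s*\]$")


def parse_abm(text: str) -> List[Tuple[int, int]]:
    """Parse ABM text into a list of inclusive (begin, end) code-point ranges."""
    ranges: List[Tuple[int, int]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are skipped
        match = _RANGE_RE.match(line)
        if match is None:
            raise ValueError(f"line {line_no}: not a range: {line!r}")
        begin, end = int(match.group(1), 16), int(match.group(2), 16)
        if begin > end:
            raise ValueError(f"line {line_no}: range start exceeds range end")
        ranges.append((begin, end))
    return ranges


def alphabet_size(ranges: List[Tuple[int, int]]) -> int:
    """Count the distinct code points covered by the ranges."""
    covered = set()
    for begin, end in ranges:
        covered.update(range(begin, end + 1))
    return len(covered)


abm_text = "[0x0041,0x005a]\n[0x0061,0x007a]\n"
ranges = parse_abm(abm_text)
size = alphabet_size(ranges)
print(ranges, size, size <= MAX_ALPHABET_SIZE)
```

       For the English-only `trie.abm' example, the alphabet holds 52
       characters, well within the limit. A single range such as
       [0x0000,0x00ff] would already cover 256 characters and exceed it.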

COMMANDS
       Available commands are:

       add word data ...
              Add word to trie, associated with integer data. An arbitrary
              number of word-data pairs can be given. Arguments are read two
              at a time: the first is treated as the word and the second as
              its data.

       add-list [ options ] list-file
              Add the words listed in list-file, with their associated data,
              to trie. The list-file must be a text file listing one word per
              line. The associated data can follow the word on the same line,
              separated by a tab (`\t') character. If the data field is
              omitted, a default value (-1) is used instead.

              The following option is available for this command:

              -e, --encoding enc
                     Specify the character encoding of the list-file
                     contents, such as `UTF-8'. If omitted, the current
                     locale codeset is assumed.

       delete word ...
              Delete word from trie. An arbitrary number of words to delete
              can be given.

       delete-list [ options ] list-file
              Delete the words listed in list-file from trie. The list-file
              must be a text file listing one word per line.

              The following option is available for this command:

              -e, --encoding enc
                     Specify the character encoding of the list-file
                     contents, such as `UTF-8'. If omitted, the current
                     locale codeset is assumed.

       query word
              Search for word in trie. If word exists, its associated data is
              printed to standard output. Otherwise, an error message is
              printed to standard error and nothing is printed to standard
              output.

       list   List all words in trie to standard output. The output lists one
              word-data pair per line, separated by a tab (`\t') character,
              a format suitable for use as the list-file of the add-list
              command.
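
       The argument pairing of add and the list-file format shared by
       add-list and list can be sketched in Python as follows. This is an
       illustration only, not trietool's code: the function names are
       hypothetical, and the handling of an odd argument count, of empty
       lines, and of data as decimal integers are assumptions.

```python
from typing import Iterable, List, Tuple

DEFAULT_DATA = -1  # value add-list uses when the data field is omitted


def pair_arguments(args: List[str]) -> List[Tuple[str, int]]:
    """Group `add` arguments two at a time into (word, data) pairs."""
    if len(args) % 2 != 0:
        # Assumption: how trietool treats a trailing unpaired word is not
        # documented here, so this sketch simply rejects it.
        raise ValueError("arguments must come in word/data pairs")
    return [(args[i], int(args[i + 1])) for i in range(0, len(args), 2)]


def parse_list_file(text: str) -> List[Tuple[str, int]]:
    """Parse list-file text: one word per line, optional tab-separated data."""
    entries: List[Tuple[str, int]] = []
    for line in text.splitlines():
        if not line:
            continue  # assumption: empty lines are skipped
        word, sep, data = line.partition("\t")
        has_data = bool(sep) and bool(data.strip())
        entries.append((word, int(data) if has_data else DEFAULT_DATA))
    return entries


def format_list(entries: Iterable[Tuple[str, int]]) -> str:
    """Format entries the way `list` is described: word, tab, data per line."""
    return "".join(f"{word}\t{data}\n" for word, data in entries)


pairs = pair_arguments(["apple", "1", "banana", "2"])
entries = parse_list_file("apple\t1\nbanana\ncherry\t30\n")
round_trip = parse_list_file(format_list(entries))
print(pairs)
print(entries)
print(round_trip == entries)
```

       The round trip at the end mirrors the statement that the output of
       list is suitable as input to add-list: the word without a data field
       receives -1 on the first pass and keeps it on the second.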

OPTIONS
       This program follows the usual GNU command line syntax, with long
       options starting with two dashes (`--'). A summary of options is
       included below.

       -p, --path dir
              Set trie directory to dir [default=`.']

       -h, --help
              Show summary of options.

       -V, --version
              Show version of program.

AUTHOR
       libdatrie was written by Theppitak Karoonboonyanan.

       This manual page was written by Theppitak Karoonboonyanan
       <theppitak@gmail.com>.

DECEMBER 2008                                                      TRIETOOL(1)