DICTFMT(1)

dictfmt - formats a DICT protocol dictionary database

dictfmt -c5|-t|-e|-f|-h|-j|-p [options] basename
dictfmt -i|-I [options]

dictfmt takes a file, FILE, on stdin, and creates a dictionary database
named basename.dict that conforms to the DICT protocol. It also creates
an index file named basename.index. By default, the index is sorted
according to the C locale, and only alphanumeric characters and spaces
are used in sorting; this may be changed with the --locale and
--allchars options. (basename is commonly chosen to correspond to the
basename of FILE, but this is not mandatory.)

Unless the database is extremely small, it is highly recommended that
basename.dict be compressed with /usr/bin/dictzip to create
basename.dict.dz. (dictzip is included in the dictd source package.)

FILE may be in any of the several formats described by the format
options -c5, -t, -e, -f, -h, -j, -p, -i or -I. Exactly one of these
options must be given.

dictfmt prepends several headers to the .dict file. The 00-database-url
header gives the value of the -u option as the URL of the site from
which the original database was obtained. The 00-database-short header
gives the value of the -s option as the short name of the dictionary.
(This "short name" is the identifying name given by the "dict -D"
option.) If the -u and/or -s options are omitted, these values will be
shown as "unknown", which is undesirable for a publicly distributed
database.

The date of conversion (formatting) is given in the 00-database-info
header. All text in the input file prior to the first headword (as
defined by the appropriate formatting option) is appended to this
header. All text in the input file following a headword, up to the
next headword, is copied unchanged to the .dict file.

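As an illustration of the workflow described above, the following Python
sketch assembles the two command lines involved: one dictfmt run (which
would read FILE on stdin) and one dictzip run on the resulting .dict
file. The helper name build_commands and the sample URL, short name and
basename are hypothetical; only the option letters are taken from this
page. The sketch builds argument lists and does not execute anything.

```python
import shlex

# Format options that produce basename.dict and basename.index.
FORMAT_OPTIONS = {"-c5", "-t", "-e", "-f", "-h", "-j", "-p"}


def build_commands(format_option, basename, url, short_name):
    """Return (dictfmt_argv, dictzip_argv) for one conversion.

    Hypothetical helper: it only assembles argument lists. FILE itself
    would be supplied to dictfmt on stdin by the caller.
    """
    if format_option not in FORMAT_OPTIONS:
        raise ValueError("exactly one format option must be given")
    dictfmt_argv = ["dictfmt", format_option, "-u", url, "-s", short_name, basename]
    # dictzip compresses basename.dict into basename.dict.dz.
    dictzip_argv = ["dictzip", basename + ".dict"]
    return dictfmt_argv, dictzip_argv


# Sample values are placeholders, not real database metadata.
fmt_cmd, zip_cmd = build_commands(
    "-j", "jargon", "http://www.example.org/", "Jargon File (sample)"
)
print(shlex.join(fmt_cmd))
print(shlex.join(zip_cmd))
```

To run the conversion for real, the first list could be passed to
subprocess.run with FILE opened as its stdin, followed by the second
list once basename.dict exists.
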
-c5 FILE is formatted with headwords preceded by 5 or more underscore
characters (_) and a blank line. All text until the next headword is
considered the definition. Any leading '@' characters are stripped out,
but the file is otherwise unchanged. This option was written to format
the CIA WORLD FACTBOOK 1995.

-t Implies the -c5, --without-info and --without-headword options. Use
this option if the input database comes from the dictunformat utility.

-e FILE is in HTML format, with the headword tagged as bold
(<B>headword - </B>). This option was written to format EASTON'S 1897
BIBLE DICTIONARY. A typical entry from Easton is:

    <A NAME="T0000005">
    <B>Abagtha - </B>
    one of the seven eunuchs in Ahasuerus's court (Esther 1:10;
    2:21).

This is converted to:

    Abagtha
    one of the seven eunuchs in Ahasuerus's court (Esther 1:10;
    2:21).

The heading <A NAME="T0000005"> is omitted, and the headword 'Abagtha'
is indexed.

NOTE: This option should be used with caution. It removes several HTML
tags (enough to format Easton properly), but not all. The Makefile that
was originally written to format dict-easton uses sed scripts to modify
certain cross reference tags. It may be necessary to pipe the input
file through a sed script, or hack the source of dictfmt, in order to
properly format other HTML databases.

-f FILE is formatted with the headwords starting in column 0, with the
definition indented at least one space (or tab character) on subsequent
lines. The third line starting in column 0 is taken as the first
headword, and the first two lines starting in column 0 are treated as
part of the 00-database-info header. This option was written to format
the F.O.L.D.O.C.

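The layout expected by -f can be sketched in Python as follows. The
function name render_f_format, the four-space indent and the sample
entries are illustrative assumptions; the description above only
requires two header lines in column 0 and definitions indented by at
least one space or tab.

```python
def render_f_format(info_lines, entries):
    """Render entries in the layout described for the -f option.

    info_lines: exactly two lines placed in column 0 ahead of the first
    headword; per the description they become part of 00-database-info.
    entries: iterable of (headword, definition) pairs.
    """
    if len(info_lines) != 2:
        raise ValueError("this sketch expects exactly two header lines")
    out = list(info_lines)
    for headword, definition in entries:
        out.append(headword)  # headword starts in column 0
        for line in definition.splitlines():
            out.append("    " + line)  # definition is indented
    return "\n".join(out) + "\n"


sample = render_f_format(
    ["Sample dictionary", "Prepared for illustration"],
    [("alpha", "first letter\nof the Greek alphabet"), ("beta", "second letter")],
)
print(sample, end="")
```

Text produced this way would be fed to dictfmt -f on stdin; the third
column-0 line ("alpha" here) is the first one treated as a headword.
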
-h FILE is formatted with the headwords starting in column 0, followed
by a comma, with the definition continuing on the same line. All text
before the first single-character line is included in the
00-database-info header, and lines with only one character are omitted
from the .dict file. The first headword is on the line following the
first single-character line. The headword is indexed; the text of the
file is not changed. This option was written to format HITCHCOCK'S
BIBLE NAMES DICTIONARY.

-j FILE is formatted with headwords starting in column 0, enclosed in
colons, followed by the definition. The colons surrounding the headword
are removed, and the headword is indexed. Lines beginning with '*',
'=', or '-' are also removed. All text before the first headword is
included in the headers. This option was written to format the JARGON
FILE.

NOTE: Some recent versions of the JARGON FILE had three blanks inserted
before the first colon at each headword. These must be removed before
processing with dictfmt. (sed scripts have been used for this purpose.
ed, awk, or perl scripts are also possible.)

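The clean-up mentioned in the note above could also be done with a
short Python filter. This is a minimal sketch, assuming the three
blanks are plain spaces at the start of the line directly before the
colon; the function name and the sample text are hypothetical.

```python
import re

# Exactly three spaces at the start of a line, only when a colon
# follows immediately. Other indented lines are left untouched.
_LEADING_BLANKS = re.compile(r"^ {3}(?=:)", re.MULTILINE)


def strip_headword_indent(text):
    """Remove the three leading blanks in front of ':headword:' lines."""
    return _LEADING_BLANKS.sub("", text)


sample = "   :foo: n. A sample entry.\n   continuation line\n:bar: n. Another.\n"
cleaned = strip_headword_indent(sample)
print(cleaned, end="")
```

The cleaned text can then be passed to dictfmt -j on stdin.
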
-p FILE is formatted with '%h' in column 0, followed by a blank,
followed by the headword, optionally followed by a line containing '%d'
in column 0. The definition starts on the following line. The first
line beginning '%h' and any lines beginning '%d' are stripped from the
.dict file, and '%h ' is stripped from in front of the headword. All
text before the first headword is included in the headers. The second
line beginning '%h' is taken as the first headword. This option was
written to format Jay Kominek's elements database.

-i -I These two options differ from all the other formatting options.
They re-sort an .index file given on stdin according to the
requirements of dictd. No .dict file is generated; only the re-sorting
is performed. Input in the form of a three- or four-column .index file
is expected. -i expects decimal offsets and lengths, while -I expects
them in base64 format.

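The difference between the decimal and base64 forms can be sketched in
Python. This assumes the number encoding commonly described for dictd
index files: each number is written digit by digit in base 64 using the
alphabet A-Z, a-z, 0-9, '+', '/', and columns are separated by tabs.
Neither detail is stated on this page, so treat both as assumptions;
the function names are hypothetical.

```python
# Assumed digit alphabet for dictd-style base64 numbers.
B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode_number(value):
    """Encode a non-negative integer as base-64 digits."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return B64_ALPHABET[0]
    digits = []
    while value:
        value, remainder = divmod(value, 64)
        digits.append(B64_ALPHABET[remainder])
    return "".join(reversed(digits))


def b64_decode_number(text):
    """Decode base-64 digits back into an integer."""
    value = 0
    for ch in text:
        value = value * 64 + B64_ALPHABET.index(ch)
    return value


def decimal_index_line_to_b64(line):
    """Convert the offset and length columns of one index line.

    Expects tab-separated columns: headword, offset, length and an
    optional fourth column, which is passed through unchanged.
    """
    fields = line.rstrip("\n").split("\t")
    fields[1] = b64_encode_number(int(fields[1]))
    fields[2] = b64_encode_number(int(fields[2]))
    return "\t".join(fields)


converted = decimal_index_line_to_b64("apple\t64\t130")
print(converted)
```

Under these assumptions, a line in the decimal form would suit -i and
the converted line would suit -I.
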
-u url Specifies the URL of the site from which the raw database was
obtained. If this option is specified, the 00-database-url headword and
the corresponding definition will be ignored.

-s name
Specifies the name and, optionally, the version and date, of the
database. (If this contains spaces, it must be quoted.) If this option
is specified, the 00-database-short headword and the corresponding
definition will be ignored.

-L display license and copyright information

-V display version information

-D output debugging information

--help display a help message

--locale locale
Specifies the locale used for sorting. If no locale is specified, the
"C" locale is used. For UTF-8 mode, --utf8 is also needed.

--8bit Generates the database in 8-bit mode; see also the --locale
option. Note: This option is deprecated. Use it only for creating 8-bit
(non-UTF-8) dictionaries. To create a UTF-8 dictionary, use the --utf8
option instead.

--utf8 If specified, a UTF-8 database is created.

--allchars
Specifies that all characters should be used for the search. By
default, only alphabetic and numeric characters and spaces are put into
the .index file and are therefore used in the search. Creates the
special entry 00-database-allchars.

--case-sensitive
Makes the search case sensitive. Creates the special entry
00-database-case-sensitive.

--headword-separator sep
Sets the headword separator, which allows several words to have the
same definition. For example, if '--headword-separator %%%' is given,
and the input file contains 'autumn%%%fall', both 'autumn' and 'fall'
will be indexed as headwords, with the same definition.

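The effect described above can be sketched in a few lines of Python.
The function names are hypothetical, and dropping empty pieces is an
assumption of this sketch rather than documented dictfmt behaviour.

```python
def split_headwords(raw_headword, separator):
    """Split one input headword into the individual headwords."""
    return [part for part in raw_headword.split(separator) if part]


def index_pairs(raw_headword, separator, definition):
    """Pair every individual headword with the shared definition."""
    return [(hw, definition) for hw in split_headwords(raw_headword, separator)]


pairs = index_pairs("autumn%%%fall", "%%%", "the season after summer")
for headword, definition in pairs:
    print(headword, "->", definition)
```
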
--index-data-separator sep
Sets the index/data separator, which allows the first and fourth
columns of the .index file to be set independently. That is, the first
column can be treated as an index column (which the MATCH command
searches) and the fourth column as a result column (from which MATCH
takes what it returns); the two columns are completely independent of
each other. The default value for this separator is the ASCII character
"\034".

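A rough Python sketch of the idea follows. It assumes that "\034" is an
octal escape (the single ASCII character with code 28) and that the
text before the separator feeds the first column while the text after
it feeds the fourth; neither point is spelled out on this page, and the
function name is hypothetical.

```python
# Assumption: "\034" is read as an octal escape, i.e. chr(28).
DEFAULT_SEPARATOR = "\034"


def split_index_and_data(raw_headword, separator=DEFAULT_SEPARATOR):
    """Return (index_column, result_column) for one raw headword.

    result_column is None when the separator is absent, meaning no
    separate fourth column is needed in this sketch.
    """
    index_part, found, data_part = raw_headword.partition(separator)
    if not found:
        return raw_headword, None
    return index_part, data_part


columns = split_index_and_data("att" + DEFAULT_SEPARATOR + "AT&T")
print(columns)
```
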
--break-headwords
Multiple headwords will be written on separate lines in the .dict file.
For use with '--headword-separator'.

--index-keep-headword
When --utf8 is specified, headwords are lowercased and non-alphanumeric
characters are removed from them before they are saved to the .index
file, in order to simplify the search. When the --index-keep-headword
option is used, a fourth column is created (if necessary) in the .index
file; it contains the original headword, which is returned by the MATCH
command. This option may be useful to prevent "AT&T" from being
converted to "ATT", or to keep proper nouns with an uppercase first
letter.

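The simplification described above can be approximated in Python as
shown here. This is a sketch and not dictfmt's actual code: keeping
spaces is an assumption based on the default sorting rules mentioned
earlier on this page, and both function names are hypothetical.

```python
def simplify_headword(headword):
    """Lowercase a headword and drop non-alphanumeric characters.

    Spaces are kept (an assumption of this sketch).
    """
    lowered = headword.lower()
    return "".join(ch for ch in lowered if ch.isalnum() or ch == " ")


def index_columns(headword):
    """Return (first_column, fourth_column) for one headword.

    The fourth column holds the original headword and is only needed
    when simplification changed it; otherwise it is None.
    """
    key = simplify_headword(headword)
    return (key, headword) if key != headword else (key, None)


for word in ("AT&T", "London", "cat"):
    print(word, "->", index_columns(word))
```
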
--without-headword
Headwords will not be included in the .dict file.

--without-header
The header will not be copied to the DB info entry.

--without-url
The URL will not be copied to the DB info entry.

--without-time
The time of creation will not be copied to the DB info entry.

--without-ver
By default, dictfmt creates a special entry 00-database-dictfmt-X.Y.Z
that contains (in the .dict file) the dictfmt version in the format
dictfmt-X.Y.Z. This option suppresses this.

--without-info
The DB info entry will not be created. This may be useful if the
00-database-info headword is expected from stdin (dictunformat outputs
it).

--columns columns
By default, dictfmt wraps strings read from stdin to 72 columns. This
option changes this default. If it is set to zero or a negative value,
wrapping is turned off.

--default-strategy strategy
Sets the default search strategy for the database. It will be used
instead of strategy '.'. The special entry 00-database-default-strategy
is created for this purpose. This option may be useful, for example,
for dictionaries containing mainly phrases rather than single words. In
any case, use this option only if you are absolutely sure of what you
are doing.

--mime-header mime_header
When a client sends the OPTION MIME command to dictd, definitions found
in this database are prepended with the specified MIME header. Creates
the special entry 00-database-mime-header.

dictfmt was written by Rik Faith (faith@cs.unc.edu) as part of the
dict-misc package. dictfmt is distributed under the terms of the GNU
General Public License. If you need to distribute under other terms,
write to the author.

This manual page was written by Robert D. Hilliard
<hilliard@debian.org>.

dict(1), dictd(8), dictzip(1), dictunformat(1), http://www.dict.org,
RFC 2229

25 December 2000 DICTFMT(1)