DICTFMT(1) DICTFMT(1)

dictfmt - formats a DICT protocol dictionary database

dictfmt -c5|-t|-e|-f|-h|-j|-p [options] basename

dictfmt takes a file, FILE, on stdin, and creates a dictionary database
named basename.dict that conforms to the DICT protocol. It also creates
an index file named basename.index. By default, the index is sorted
according to the C locale, and only alphanumeric characters and spaces
are used in sorting; however, this may be changed with the --locale and
--allchars options. (basename is commonly chosen to correspond to the
basename of FILE, but this is not mandatory.)

Unless the database is extremely small, it is highly recommended that
basename.dict be compressed with /usr/bin/dictzip to create
basename.dict.dz. (dictzip is included in the dictd source package.)

FILE may be in any of the several formats described by the format
options -c5, -t, -e, -f, -h, -j, or -p. Exactly one of these options
must be given.

dictfmt prepends several headers to the .dict file. The
00-database-url header gives the value of the -u option as the URL of
the site from which the original database was obtained. The
00-database-short header gives the value of the -s option as the short
name of the dictionary. (This "short name" is the identifying name
given by the "dict -D" option.) If the -u and/or -s options are
omitted, these values will be shown as "unknown", which is undesirable
for a publicly distributed database.
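
As an illustration of supplying both headers, the following Python
sketch assembles a dictfmt command line from the options described in
this page. The helper name build_dictfmt_argv, the example URL, and the
example short name are hypothetical; only the flags themselves come
from this page.

```python
FORMAT_FLAGS = {"-c5", "-t", "-e", "-f", "-h", "-j", "-p"}


def build_dictfmt_argv(format_flag, basename, url=None, short_name=None):
    """Return an argument list for dictfmt (hypothetical helper).

    Exactly one format flag is required; -u and -s are optional but
    recommended so the headers do not read "unknown".
    """
    if format_flag not in FORMAT_FLAGS:
        raise ValueError(f"unknown format option: {format_flag}")
    argv = ["dictfmt", format_flag]
    if url is not None:
        argv += ["-u", url]
    if short_name is not None:
        # Passed as one list element, so embedded spaces need no quoting.
        argv += ["-s", short_name]
    argv.append(basename)
    return argv


# Example values are invented for illustration.
argv = build_dictfmt_argv(
    "-f", "foldoc",
    url="http://example.org/foldoc",
    short_name="FOLDOC sample",
)
print(argv)
```

Assuming dictfmt is installed, such a list could be handed to
subprocess.run with the input file opened as stdin.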

The date of conversion (formatting) is given in the 00-database-info
header. All text in the input file prior to the first headword (as
defined by the appropriate formatting option) is appended to this
header. All text in the input file following a headword, up to the
next headword, is copied unchanged to the .dict file.

-c5 FILE is formatted with headwords preceded by 5 or more underscore
characters (_) and a blank line. All text until the next headword is
considered the definition. Any leading `@' characters are stripped
out, but the file is otherwise unchanged. This option was written to
format the CIA WORLD FACTBOOK 1995.

-t Implies the -c5, --without-info and --without-headword options. Use
this option if the input database comes from the dictunformat utility.

-e FILE is in html format, with the headword tagged as bold.
(<B>headword - </B>)
This option was written to format EASTON'S 1897 BIBLE DICTIONARY. A
typical entry from Easton is:

<A NAME="T0000005">
<B>Abagtha - </B>
one of the seven eunuchs in Ahasuerus's court (Esther 1:10;
2:21).

This is converted to:
Abagtha
one of the seven eunuchs in Ahasuerus's court (Esther 1:10;
2:21).

The heading <A NAME="T0000005"> is omitted, and the headword
`Abagtha' is indexed.

NOTE: This option should be used with caution. It removes several html
tags (enough to format Easton properly), but not all. The Makefile
that was originally written to format dict-easton uses sed scripts to
modify certain cross reference tags. It may be necessary to pipe the
input file through a sed script, or hack the source of dictfmt in
order to properly format other html databases.

-f FILE is formatted with the headwords starting in column 0, with the
definition indented at least one space (or tab character) on
subsequent lines. The third line starting in column 0 is taken as the
first headword, and the first two lines starting in column 0 are
treated as part of the 00-database-info header. This option was
written to format the F.O.L.D.O.C.
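
The -f layout rules can be sketched in Python as follows. This is an
illustration of the rules as stated above, not dictfmt's actual code;
the function name split_f_format and the sample text are made up.

```python
def split_f_format(text):
    """Split -f style text into (header_lines, entries).

    entries is a list of (headword, definition_lines) pairs.
    """
    header, entries = [], []
    col0_seen = 0  # how many lines starting in column 0 so far
    for line in text.splitlines():
        starts_in_col0 = bool(line) and line[0] not in " \t"
        if starts_in_col0:
            col0_seen += 1
            if col0_seen >= 3:
                # The third and later column-0 lines are headwords.
                entries.append((line.rstrip(), []))
                continue
        if entries:
            entries[-1][1].append(line)  # part of current definition
        else:
            header.append(line)  # text before the first headword
    return header, entries


# Invented sample input: two header lines, then two entries.
sample = (
    "Sample Dictionary\n"
    "Version 0.1\n"
    "\n"
    "apple\n"
    "\ta fruit\n"
    "banana\n"
    "  a long yellow fruit\n"
)
header, entries = split_f_format(sample)
print([headword for headword, _ in entries])
```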

-h FILE is formatted with the headwords starting in column 0, followed
by a comma, with the definition continuing on the same line. All text
before the first single-character line is included in the
00-database-info header, and lines with only one character are omitted
from the .dict file. The first headword is on the line following the
first single-character line. The headword is indexed; the text of the
file is not changed. This option was written to format HITCHCOCK'S
BIBLE NAMES DICTIONARY.

-j FILE is formatted with headwords starting in column 0, enclosed in
colons, followed by the definition. The colons surrounding the
headword are removed, and the headword is indexed. Lines beginning
with '*', '=', or '-' are also removed. All text before the first
headword is included in the headers. This option was written to format
the JARGON FILE.
NOTE: Some recent versions of the JARGON FILE had three blanks
inserted before the first colon at each headword. These must be
removed before processing with dictfmt. (sed scripts have been used
for this purpose. ed, awk, or perl scripts are also possible.)
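
The cleanup described in the NOTE could also be done in Python. The
sketch below assumes the affected headword lines look like three
blanks followed by a colon-enclosed headword; the exact layout of those
JARGON FILE versions is not given here, and the sample text and the
name strip_headword_indent are invented.

```python
import re

# Exactly three blanks at the start of a line, but only when a
# colon-enclosed headword follows immediately.
HEADWORD_INDENT = re.compile(r"^ {3}(?=:[^:\n]+:)", re.MULTILINE)


def strip_headword_indent(text):
    """Remove the three leading blanks from headword lines only."""
    return HEADWORD_INDENT.sub("", text)


# Invented sample: one indented headword line, one continuation line.
sample = (
    "   :hacker: n. A person who enjoys exploring systems.\n"
    "   See also {cracker}.\n"
)
cleaned = strip_headword_indent(sample)
print(cleaned)
```

Continuation lines keep their indentation because the pattern requires
a colon-enclosed headword right after the three blanks.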

-p FILE is formatted with '%h' in column 0, followed by a blank,
followed by the headword, optionally followed by a line containing
'%d' in column 0. The definition starts on the following line. The
first line beginning '%h' and any lines beginning '%d' are stripped
from the .dict file, and '%h ' is stripped from in front of the
headword. All text before the first headword is included in the
headers. The second line beginning '%h' is taken as the first
headword. This option was written to format Jay Kominek's elements
database.
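
A minimal Python sketch of the -p rules follows. It mirrors the
description above rather than dictfmt's source; the function name
split_p_format and the sample data are hypothetical.

```python
def split_p_format(text):
    """Split -p style text into (header_lines, entries).

    entries is a list of (headword, definition_lines) pairs.
    """
    header, entries = [], []
    h_seen = 0  # number of '%h' lines encountered so far
    for line in text.splitlines():
        if line.startswith("%h"):
            h_seen += 1
            if h_seen == 1:
                continue  # the first '%h' line is dropped
            # Strip the '%h ' marker; the rest is the headword.
            entries.append((line[2:].strip(), []))
        elif line.startswith("%d"):
            continue  # '%d' marker lines are dropped
        elif entries:
            entries[-1][1].append(line)  # definition text
        else:
            header.append(line)  # text before the first headword
    return header, entries


# Invented sample input.
sample = (
    "%h Sample elements list\n"
    "%d\n"
    "A made-up header line.\n"
    "%h hydrogen\n"
    "%d\n"
    "Symbol: H\n"
    "%h helium\n"
    "Symbol: He\n"
)
header, entries = split_p_format(sample)
print(entries)
```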

-u url Specifies the URL of the site from which the raw database was
obtained. If this option is specified, any
00-database-url/00databaseurl headword in the input, together with its
definition, is ignored.

-s name
Specifies the name and, optionally, the version and date, of the
database. (If this contains spaces, it must be quoted.) If this option
is specified, any 00-database-short/00databaseshort headword in the
input, together with its definition, is ignored.

-L display license and copyright information

-V display version information

-D output debugging information

--help display a help message

--locale locale
specifies the locale used for sorting. If no locale is specified, the
"C" locale is used.

--allchars
use all characters (not only alphanumeric and space) in sorting the
index

--headword-separator sep
sets the headword separator, which allows several words to have the
same definition. For example, if '--headword-separator %%%' is given,
and the input file contains 'autumn%%%fall', both 'autumn' and 'fall'
will be indexed as headwords, with the same definition.
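
The effect of the separator can be sketched in Python. This only
illustrates the behaviour described above; the helper names and the
sample definition are invented, and trimming whitespace around each
word is an assumption.

```python
def split_headwords(raw_headword, separator):
    """Split one input headword into the individual headwords."""
    parts = raw_headword.split(separator)
    # Assumption: surrounding whitespace and empty parts are dropped.
    return [part.strip() for part in parts if part.strip()]


def index_entries(raw_headword, definition, separator):
    """Map every headword to the same shared definition."""
    words = split_headwords(raw_headword, separator)
    return {word: definition for word in words}


# Invented definition text for the example from this page.
entries = index_entries(
    "autumn%%%fall",
    "the season between summer and winter",
    "%%%",
)
print(sorted(entries))
```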

--break-headwords
multiple headwords will be written on separate lines in the .dict
file. For use with '--headword-separator'.

--without-headword
headwords will not be included in the .dict file

--without-header
header will not be copied to DB info entry

--without-url
URL will not be copied to DB info entry

--without-time
time of creation will not be copied to DB info entry

--without-info
DB info entry will not be created. This may be useful if a
00-database-info headword is expected from stdin (dictunformat outputs
it).

--columns columns
By default dictfmt wraps strings read from stdin to 72 columns. This
option changes this default. If it is set to zero or a negative value,
wrapping is turned off.

--default-strategy strategy
Sets the default search strategy for the database. It will be used
instead of strategy '.'. A special entry,
00-database-default-strategy, is created for this purpose. This option
may be useful, for example, for dictionaries containing mainly phrases
rather than single words. In any case, use this option only if you are
absolutely sure of what you are doing.

dictfmt was written by Rik Faith (faith@cs.unc.edu) as part of the
dict-misc package. dictfmt is distributed under the terms of the GNU
General Public License. If you need to distribute under other terms,
write to the author.

This manual page was written by Robert D. Hilliard
<hilliard@debian.org>.

dict(1), dictd(8), dictzip(1), dictunformat(1), http://www.dict.org,
RFC 2229

25 December 2000 DICTFMT(1)