DICTFMT(1)                                                          DICTFMT(1)

NAME

       dictfmt - formats a DICT protocol dictionary database

SYNOPSIS

       dictfmt  -c5|-t|-e|-f|-h|-j|-p [options]  basename
       dictfmt  -i|-I [options]

DESCRIPTION

       dictfmt takes a file, FILE, on stdin, and creates a dictionary database
       named basename.dict that conforms to the DICT protocol. It also creates
       an index file named basename.index. By default, the index is sorted
       according to the C locale, and only alphanumeric characters and spaces
       are used in sorting; however, this may be changed with the --locale and
       --allchars options. (basename is commonly chosen to correspond to the
       basename of FILE, but this is not mandatory.)

       Unless the database is extremely small, it is highly recommended that
       basename.dict be compressed with /usr/bin/dictzip to create
       basename.dict.dz. (dictzip is included in the dictd source package.)

       FILE may be in any of the several formats described by the format
       options -c5, -t, -e, -f, -h, -j, -p, -i or -I. Exactly one of these
       options must be given.

       dictfmt prepends several headers to the .dict file. The 00-database-url
       header gives the value of the -u option as the URL of the site from
       which the original database was obtained. The 00-database-short header
       gives the value of the -s option as the short name of the dictionary.
       (This "short name" is the identifying name given by the "dict -D"
       option.) If the -u and/or -s options are omitted, these values will be
       shown as "unknown", which is undesirable for a publicly distributed
       database.

       The date of conversion (formatting) is given in the 00-database-info
       header. All text in the input file prior to the first headword (as
       defined by the appropriate formatting option) is appended to this
       header. All text in the input file following a headword, up to the
       next headword, is copied unchanged to the .dict file.
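
       The workflow described above can be sketched as follows. This is a
       minimal illustration, not a definitive recipe: the file name
       sample.txt, the basename "sample", the example.org URL and the
       glossary text are all hypothetical, and the input uses the -f layout
       described under FORMATTING OPTIONS. The dictfmt and dictzip steps run
       only if the tools are installed.

```shell
# Scratch directory so that no existing files are overwritten.
workdir=$(mktemp -d)

# Hypothetical input in the -f layout: the first two column-0 lines go to
# the 00-database-info header, the third column-0 line is the first headword.
cat > "$workdir/sample.txt" <<'EOF'
Sample Glossary
Converted for demonstration purposes
apple
   A round fruit.
banana
   A long yellow fruit.
EOF

# Format and compress only when the tools are available.
if command -v dictfmt >/dev/null 2>&1; then
    (
        cd "$workdir" &&
        dictfmt -f -u "http://example.org/sample" -s "Sample Glossary" sample < sample.txt
        # Compress sample.dict to sample.dict.dz, as recommended above.
        if command -v dictzip >/dev/null 2>&1 && [ -f sample.dict ]; then
            dictzip sample.dict
        fi
    )
fi
```

       Passing both -u and -s avoids the "unknown" values mentioned above.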

FORMATTING OPTIONS

       -c5    FILE is formatted with headwords preceded by 5 or more
              underscore characters (_) and a blank line. All text until the
              next headword is considered the definition. Any leading '@'
              characters are stripped out, but the file is otherwise
              unchanged. This option was written to format the CIA WORLD
              FACTBOOK 1995.

       -t     Implies the -c5, --without-info and --without-headword options.
              Use this option if the input database comes from the
              dictunformat utility.

       -e     FILE is in HTML format, with the headword tagged as bold
              (<B>headword - </B>). This option was written to format
              EASTON'S 1897 BIBLE DICTIONARY. A typical entry from Easton is:

              <A NAME="T0000005">
              <B>Abagtha - </B>
              one of the seven eunuchs in Ahasuerus's court (Esther 1:10;
              2:21).

              This is converted to:

              Abagtha
                 one of the seven eunuchs in Ahasuerus's court (Esther 1:10; 2:21).

              The heading <A NAME="T0000005"> is omitted, and the headword
              'Abagtha' is indexed.

              NOTE: This option should be used with caution. It removes
              several HTML tags (enough to format Easton properly), but not
              all. The Makefile that was originally written to format
              dict-easton uses sed scripts to modify certain cross-reference
              tags. It may be necessary to pipe the input file through a sed
              script, or to hack the source of dictfmt, in order to properly
              format other HTML databases.

       -f     FILE is formatted with the headwords starting in column 0, with
              the definition indented at least one space (or tab character)
              on subsequent lines. The third line starting in column 0 is
              taken as the first headword, and the first two lines starting
              in column 0 are treated as part of the 00-database-info header.
              This option was written to format the F.O.L.D.O.C.

       -h     FILE is formatted with the headwords starting in column 0,
              followed by a comma, with the definition continuing on the same
              line. All text before the first single-character line is
              included in the 00-database-info header, and lines with only
              one character are omitted from the .dict file. The first
              headword is on the line following the first single-character
              line. The headword is indexed; the text of the file is not
              changed. This option was written to format HITCHCOCK'S BIBLE
              NAMES DICTIONARY.

       -j     FILE is formatted with headwords starting in column 0, enclosed
              in colons, followed by the definition. The colons surrounding
              the headword are removed, and the headword is indexed. Lines
              beginning with '*', '=', or '-' are also removed. All text
              before the first headword is included in the headers. This
              option was written to format the JARGON FILE.

              NOTE: Some recent versions of the JARGON FILE had three blanks
              inserted before the first colon at each headword. These must be
              removed before processing with dictfmt. (sed scripts have been
              used for this purpose; ed, awk, or perl scripts are also
              possible.)

       -p     FILE is formatted with '%h' in column 0, followed by a blank,
              followed by the headword, optionally followed by a line
              containing '%d' in column 0. The definition starts on the
              following line. The first line beginning '%h' and any lines
              beginning '%d' are stripped from the .dict file, and '%h ' is
              stripped from the front of the headword. All text before the
              first headword is included in the headers. The second line
              beginning '%h' is taken as the first headword. This option was
              written to format Jay Kominek's elements database.

       -i -I  These two options differ from all other formatting options.
              They re-sort (according to dictd requirements) an .index file
              given on stdin. No .dict file is generated; only the re-sorting
              is performed. Three- or four-column .index-like input is
              expected. -i expects decimal offset and length, while -I
              expects them in base64 format.
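
       The NOTE under -j says that three stray blanks before the first colon
       must be removed before processing. One possible sed script for this
       is sketched below. The sample entries, the file names, the basename
       "jargon" and the example.org URL are hypothetical; the dictfmt step
       runs only if the tool is installed.

```shell
# Scratch directory holding a hypothetical excerpt with the stray blanks.
workdir=$(mktemp -d)
cat > "$workdir/jargon.txt" <<'EOF'
Sample header text
   :foo: /foo/ A sample metasyntactic variable.
   :bar: /bar/ The second metasyntactic variable.
EOF

# Remove exactly three leading blanks, but only when a colon follows them,
# so that other indented lines are left untouched.
sed 's/^   :/:/' "$workdir/jargon.txt" > "$workdir/jargon.clean"

# Format the cleaned file with -j only when dictfmt is available.
if command -v dictfmt >/dev/null 2>&1; then
    (
        cd "$workdir" &&
        dictfmt -j -u "http://example.org/jargon" -s "Sample Jargon" jargon < jargon.clean
    )
fi
```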

OPTIONS

       -u url Specifies the URL of the site from which the raw database was
              obtained. If this option is specified, the 00-database-url
              headword and its corresponding definition will be ignored.

       -s name
              Specifies the name and, optionally, the version and date, of
              the database. (If this contains spaces, it must be quoted.) If
              this option is specified, the 00-database-short headword and
              its corresponding definition will be ignored.

       -L     display license and copyright information

       -V     display version information

       -D     output debugging information

       --help display a help message

       --locale locale
              Specifies the locale used for sorting. If no locale is
              specified, the "C" locale is used. To use UTF-8 mode, --utf8 is
              needed.

       --8bit generates the database in 8-bit mode; see also the --locale
              option.
              Note: This option is deprecated. Use it only for creating 8-bit
              (non-UTF-8) dictionaries. To create a UTF-8 dictionary, use the
              --utf8 option instead.

       --utf8 If specified, a UTF-8 database is created.

       --allchars
              Specifies that all characters should be used for the search. By
              default, only alphabetic and numeric characters and spaces are
              put into the .index file and are therefore used in the search.
              Creates the special entry 00-database-allchars.

       --case-sensitive
              makes the search case sensitive. Creates the special entry
              00-database-case-sensitive.

       --headword-separator sep
              sets the headword separator, which allows several words to have
              the same definition. For example, if '--headword-separator %%%'
              is given, and the input file contains 'autumn%%%fall', both
              'autumn' and 'fall' will be indexed as headwords, with the same
              definition.

       --index-data-separator sep
              sets the index/data separator, which allows the first and
              fourth columns of the .index file to be set independently. That
              is, the first column can be treated as an index column (where
              the MATCH command searches) and the fourth column as a result
              column (from which MATCH takes what it returns); the two
              columns are completely independent of each other. The default
              value for this separator is the ASCII character "\034".

       --break-headwords
              multiple headwords will be written on separate lines in the
              .dict file. For use with --headword-separator.

       --index-keep-orig
              When --utf8 is specified, headwords are lowercased and
              non-alphanumeric characters are removed from them before they
              are saved to the .index file, in order to simplify the search.
              When the --index-keep-orig option is used, a fourth column is
              created (if necessary) in the .index file, containing the
              original headword, which is returned by the MATCH command. This
              option may be useful to prevent converting "AT&T" to "ATT", or
              to keep proper nouns with an uppercased first letter.

       --without-headword
              headwords will not be included in the .dict file

       --without-header
              header will not be copied to the DB info entry

       --without-url
              URL will not be copied to the DB info entry

       --without-time
              time of creation will not be copied to the DB info entry

       --without-ver
              By default dictfmt creates a special entry
              00-database-dictfmt-X.Y.Z that contains (in the .dict file) the
              dictfmt version in the format dictfmt-X.Y.Z. This option
              suppresses that entry.

       --without-info
              DB info entry will not be created. This may be useful if the
              00-database-info headword is expected from stdin (dictunformat
              outputs it).

       --columns columns
              By default dictfmt wraps strings read from stdin to 72 columns.
              This option changes this default. If it is set to zero or a
              negative value, wrapping is off.

       --default-strategy strategy
              Sets the default search strategy for the database. It will be
              used instead of strategy '.'. The special entry
              00-database-default-strategy is created for this purpose. This
              option may be useful, for example, for dictionaries containing
              mainly phrases rather than single words. In any case, use this
              option only if you are absolutely sure of what you are doing.

       --mime-header mime_header
              When a client sends the OPTION MIME command to dictd,
              definitions found in this database are prefixed with the
              specified MIME header. Creates the special entry
              00-database-mime-header.
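
       As an illustration of --headword-separator, the sketch below builds a
       small input in the -f layout and previews which headwords the '%%%'
       separator would yield. The preview uses awk and only imitates the
       splitting; it is not dictfmt's own logic. The file name, the basename
       "seasons", the example.org URL and the glossary text are
       hypothetical, and the dictfmt step runs only if the tool is installed.

```shell
# Scratch directory with a hypothetical -f style input file.
workdir=$(mktemp -d)
cat > "$workdir/seasons.txt" <<'EOF'
Sample Seasons Glossary
Hypothetical input for demonstration
autumn%%%fall
   The season between summer and winter.
EOF

# Preview: split the third line (the first headword line in the -f layout)
# on the separator, one headword per output line.
headwords=$(sed -n '3p' "$workdir/seasons.txt" |
    awk -F'%%%' '{ for (i = 1; i <= NF; i++) print $i }')
printf '%s\n' "$headwords"

# Build the database with both headwords sharing one definition.
if command -v dictfmt >/dev/null 2>&1; then
    (
        cd "$workdir" &&
        dictfmt -f --headword-separator '%%%' \
            -u "http://example.org/seasons" -s "Sample Seasons" seasons < seasons.txt
    )
fi
```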

CREDITS

       dictfmt was written by Rik Faith (faith@cs.unc.edu) as part of the
       dict-misc package. dictfmt is distributed under the terms of the GNU
       General Public License. If you need to distribute under other terms,
       write to the author.

AUTHOR

       This manual page was written by Robert D. Hilliard
       <hilliard@debian.org>.

SEE ALSO

       dict(1), dictd(8), dictzip(1), dictunformat(1), http://www.dict.org,
       RFC 2229



                               25 December 2000                     DICTFMT(1)