DICTFMT(1)                                                          DICTFMT(1)

NAME

       dictfmt - formats a DICT protocol dictionary database

SYNOPSIS

       dictfmt -c5|-t|-e|-f|-h|-j|-p [options] basename

DESCRIPTION

       dictfmt takes a file, FILE, on stdin, and creates a dictionary database
       named basename.dict that conforms to the DICT protocol. It also creates
       an index file named basename.index. By default, the index is sorted
       according to the C locale, and only alphanumeric characters and spaces
       are used in sorting; this may be changed with the --locale and
       --allchars options. (basename is commonly chosen to correspond to the
       basename of FILE, but this is not mandatory.)

       Unless the database is extremely small, it is highly recommended that
       basename.dict be compressed with /usr/bin/dictzip to create
       basename.dict.dz. (dictzip is included in the dictd source package.)

       FILE may be in any of the several formats described by the format
       options -c5, -t, -e, -f, -h, -j, or -p. Exactly one of these options
       must be given.

       dictfmt prepends several headers to the .dict file. The
       00-database-url header gives the value of the -u option as the URL of
       the site from which the original database was obtained. The
       00-database-short header gives the value of the -s option as the short
       name of the dictionary. (This "short name" is the identifying name
       given by the "dict -D" option.) If the -u and/or -s options are
       omitted, these values will be shown as "unknown", which is undesirable
       for a publicly distributed database.

       The date of conversion (formatting) is given in the 00-database-info
       header. All text in the input file prior to the first headword (as
       defined by the appropriate formatting option) is appended to this
       header. All text in the input file following a headword, up to the
       next headword, is copied unchanged to the .dict file.
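
       As a rough illustration of the workflow described above, the following
       Python sketch builds a dictfmt command line and, if dictfmt is
       installed, pipes a source text to it on stdin. The helper names, the
       URL, the short name and the basename "mydict" are hypothetical; only
       the -f, -u and -s options and the argument order are taken from this
       page.

```python
import shutil
import subprocess


def build_dictfmt_command(basename, url, short_name, format_flag="-f"):
    """Return an argv list: format option first, then options, then basename."""
    return ["dictfmt", format_flag, "-u", url, "-s", short_name, basename]


def run_dictfmt(source_text, basename, url, short_name, format_flag="-f"):
    """Pipe source_text to dictfmt on stdin; return False if dictfmt is absent."""
    if shutil.which("dictfmt") is None:
        return False
    command = build_dictfmt_command(basename, url, short_name, format_flag)
    # dictfmt reads FILE from stdin and writes basename.dict and basename.index.
    subprocess.run(command, input=source_text, text=True, check=True)
    return True


# Hypothetical values, for illustration only.
command = build_dictfmt_command(
    "mydict", "http://www.example.org/", "My Dictionary 0.1"
)
print(command)
```

       Because the arguments are passed as a list rather than through a
       shell, a short name containing spaces needs no extra quoting here.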

FORMATTING OPTIONS

       -c5    FILE is formatted with headwords preceded by 5 or more
              underscore characters (_) and a blank line. All text until the
              next headword is considered the definition. Any leading `@'
              characters are stripped out, but the file is otherwise
              unchanged. This option was written to format the CIA WORLD
              FACTBOOK 1995.

       -t     Implies the -c5, --without-info and --without-headword options.
              Use this option if the input database comes from the
              dictunformat utility.

       -e     FILE is in HTML format, with the headword tagged as bold
              (<B>headword - </B>). This option was written to format
              EASTON'S 1897 BIBLE DICTIONARY. A typical entry from Easton is:

              <A NAME="T0000005">
              <B>Abagtha - </B>
              one of the seven eunuchs in Ahasuerus's court (Esther 1:10;
              2:21).

              This is converted to:

              Abagtha
                 one of the seven eunuchs in Ahasuerus's court (Esther 1:10;
              2:21).

              The anchor tag <A NAME="T0000005"> is omitted, and the headword
              `Abagtha' is indexed.

              NOTE: This option should be used with caution. It removes
              several HTML tags (enough to format Easton properly), but not
              all. The Makefile that was originally written to format
              dict-easton uses sed scripts to modify certain cross reference
              tags. It may be necessary to pipe the input file through a sed
              script, or hack the source of dictfmt, in order to properly
              format other HTML databases.

       -f     FILE is formatted with the headwords starting in column 0, with
              the definition indented at least one space (or tab character)
              on subsequent lines. The third line starting in column 0 is
              taken as the first headword, and the first two lines starting
              in column 0 are treated as part of the 00-database-info header.
              This option was written to format the F.O.L.D.O.C.
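
              The -f layout can be sketched by generating a small input file.
              The Python function below is a minimal illustration, not part
              of dictfmt: the function name, the three-space indent and the
              sample entries are assumptions. It emits two column-0 header
              lines first, so that the third column-0 line is the first
              headword.

```python
def render_f_format(header_lines, entries, indent="   "):
    """Render (headword, definition) pairs in the layout described for -f."""
    if len(header_lines) != 2:
        raise ValueError("expected exactly two column-0 header lines")
    lines = list(header_lines)
    for headword, definition in entries:
        # Headwords start in column 0.
        lines.append(headword)
        # Definition lines are indented by at least one space.
        for line in definition.splitlines():
            lines.append(indent + line)
    return "\n".join(lines) + "\n"


# Hypothetical sample data, for illustration only.
text = render_f_format(
    ["Sample Dictionary", "Version 0.1"],
    [("apple", "A fruit."), ("bee", "An insect.\nMakes honey.")],
)
print(text, end="")
```

              The resulting text would be passed to dictfmt on stdin together
              with the -f option.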

       -h     FILE is formatted with the headwords starting in column 0,
              followed by a comma, with the definition continuing on the same
              line. All text before the first single-character line is
              included in the 00-database-info header, and lines with only one
              character are omitted from the .dict file. The first headword
              is on the line following the first single-character line. The
              headword is indexed; the text of the file is not changed. This
              option was written to format HITCHCOCK'S BIBLE NAMES DICTIONARY.

       -j     FILE is formatted with headwords starting in column 0, enclosed
              in colons, followed by the definition. The colons surrounding
              the headword are removed, and the headword is indexed. Lines
              beginning with '*', '=', or '-' are also removed. All text
              before the first headword is included in the headers. This
              option was written to format the JARGON FILE.

              NOTE: Some recent versions of the JARGON FILE had three blanks
              inserted before the first colon at each headword. These must be
              removed before processing with dictfmt. (sed scripts have been
              used for this purpose; ed, awk, or perl scripts are also
              possible.)
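
              The preprocessing step mentioned in the note can also be done
              in Python. The sketch below assumes that an affected headword
              line starts with exactly three blanks followed by a
              colon-delimited headword; the pattern and the function name are
              illustrative and should be checked against the actual file.

```python
import re

# Assumed shape: three blanks, then ":headword:" at the start of a line.
INDENTED_HEADWORD = re.compile(r"^ {3}(?=:[^:\n]+:)", re.MULTILINE)


def strip_headword_indent(text):
    """Remove the three blanks that precede a colon-delimited headword."""
    return INDENTED_HEADWORD.sub("", text)


# Hypothetical sample input, for illustration only.
sample = "   :hack: n. A quick job.\n   More text that is not a headword.\n"
cleaned = strip_headword_indent(sample)
print(cleaned, end="")
```

              Indented lines that do not begin with a colon-delimited
              headword are left untouched.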

       -p     FILE is formatted with `%h' in column 0, followed by a blank,
              followed by the headword, optionally followed by a line
              containing `%d' in column 0. The definition starts on the
              following line. The first line beginning with `%h' and any
              lines beginning with `%d' are stripped from the .dict file, and
              `%h ' is stripped from in front of the headword. All text
              before the first headword is included in the headers. The
              second line beginning with `%h' is taken as the first headword.
              This option was written to format Jay Kominek's elements
              database.
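
              Read literally, the description above means that an input file
              needs a leading `%h' line that is discarded before the first
              real headword. The Python sketch below generates such a file
              under that reading; the function name, the placeholder text and
              the sample entries are assumptions, not taken from dictfmt.

```python
def render_p_format(info_lines, entries):
    """Render (headword, definition) pairs in the %h / %d layout."""
    # The first %h line is stripped, so a placeholder line is emitted here;
    # the info lines after it come before the first real headword.
    lines = ["%h placeholder"]
    lines.extend(info_lines)
    for headword, definition in entries:
        lines.append("%h " + headword)
        lines.append("%d")
        lines.extend(definition.splitlines())
    return "\n".join(lines) + "\n"


# Hypothetical sample data, for illustration only.
text = render_p_format(
    ["Elements database (sample)"],
    [("hydrogen", "Symbol: H\nAtomic number: 1"), ("helium", "Symbol: He")],
)
print(text, end="")
```

              The resulting text would be passed to dictfmt on stdin together
              with the -p option.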

OPTIONS

       -u url Specifies the URL of the site from which the raw database was
              obtained. If this option is specified, any 00-database-url or
              00databaseurl headword in the input, together with its
              definition, will be ignored.

       -s name
              Specifies the name and, optionally, the version and date, of the
              database. (If this contains spaces, it must be quoted.) If this
              option is specified, any 00-database-short or 00databaseshort
              headword in the input, together with its definition, will be
              ignored.

       -L     Display license and copyright information.

       -V     Display version information.

       -D     Output debugging information.

       --help Display a help message.

       --locale locale
              Specifies the locale used for sorting. If no locale is
              specified, the "C" locale is used.

       --allchars
              Use all characters (not only alphanumeric and space) in sorting
              the index.

       --headword-separator sep
              Sets the headword separator, which allows several words to have
              the same definition. For example, if `--headword-separator %%%'
              is given, and the input file contains `autumn%%%fall', both
              `autumn' and `fall' will be indexed as headwords, with the same
              definition.
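
              The effect of the separator can be pictured with a few lines of
              Python. This is only a model of the behaviour described above,
              not dictfmt's implementation; the function name and the sample
              definition are hypothetical.

```python
def index_headwords(raw_headword, definition, separator="%%%"):
    """Map every headword in raw_headword to the one shared definition."""
    words = [word for word in raw_headword.split(separator) if word]
    return {word: definition for word in words}


# Hypothetical sample entry, for illustration only.
index = index_headwords("autumn%%%fall", "the season between summer and winter")
print(sorted(index))
```

              Both keys refer to the same definition text, which mirrors two
              index entries pointing at one definition.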

       --break-headwords
              Multiple headwords will be written on separate lines in the
              .dict file. For use with `--headword-separator'.

       --without-headword
              Headwords will not be included in the .dict file.

       --without-header
              The header will not be copied to the DB info entry.

       --without-url
              The URL will not be copied to the DB info entry.

       --without-time
              The time of creation will not be copied to the DB info entry.

       --without-info
              The DB info entry will not be created. This may be useful if a
              00-database-info headword is expected from stdin (dictunformat
              outputs it).

       --columns columns
              By default dictfmt wraps strings read from stdin to 72 columns.
              This option changes this default. If it is set to zero or a
              negative value, wrapping is off.

       --default-strategy strategy
              Sets the default search strategy for the database. It will be
              used instead of strategy '.'. The special entry
              00-database-default-strategy is created for this purpose. This
              option may be useful, for example, for dictionaries containing
              mainly phrases rather than single words. In any case, use this
              option only if you are absolutely sure what you are doing.

CREDITS

       dictfmt was written by Rik Faith (faith@cs.unc.edu) as part of the
       dict-misc package. dictfmt is distributed under the terms of the GNU
       General Public License. If you need to distribute under other terms,
       write to the author.

AUTHOR

       This manual page was written by Robert D. Hilliard
       <hilliard@debian.org>.

SEE ALSO

       dict(1), dictd(8), dictzip(1), dictunformat(1), http://www.dict.org,
       RFC 2229

                               25 December 2000                     DICTFMT(1)