1DICTD(8) DICTD(8)
2
3
4
6 dictd - a dictionary database server
7
9 dictd [options]
10
12 dictd is a server for the Dictionary Server Protocol (DICT), a TCP
13 transaction based query/response protocol that allows a client to
14 access dictionary definitions from a set of natural language dictionary
15 databases.
16
17 For security reasons, dictd drops root permissions after startup. If
18 user dictd exists on the system, the daemon will run as that user,
19 group dictd, otherwise it will run as user nobody, group nobody or
20 nogroup (depending on the operating system distribution).
21
22 Since startup time is significant, the server is designed to run con‐
23 tinuously, and should not be run from inetd(8). (However, with a fast
24 processor, it is feasible to do so.)
25
26 Databases are distributed separately from the server.
27
28 By default, dictd assumes that the index files are sorted alphabeti‐
29 cally, and only alphanumeric characters from the 7-bit ASCII character
30 set are used for search. This default may be overridden by a header in
31 the data file. The only such features implemented at this time are the
32 headers "00-database-allchars" which tells dictd that non-alphanumeric
33 characters may also be used for search, the header "00-database-utf8"
34 which indicates that the database uses utf8 encoding, and the "00-data‐
35 base-8bit-new" which indicates that the database is encoded and sorted
36 according to a locale that uses an 8-bit encoding.
37
38 A header "00-database-plugin" may also be present and is used for inte‐
39 grating plugins into dictd. See "dictfmt_plugin --help" and "dictdplu‐
40 gin.h" for more information.
41
42 A header "00-database-virtual" identifies "virtual dictionaries", which
43 are lists of real dictionaries to be searched by dictd.
44
46 For many years, the Internet community has relied on the "webster" pro‐
47 tocol for access to natural language definitions. The webster protocol
48 supports access to a single dictionary and (optionally) to a single
49 thesaurus. In recent years, the number of publicly available webster
50 servers on the Internet has dramatically decreased.
51
52 Fortunately, several freely-distributable dictionaries and lexicons
53 have recently become available on the Internet. However, these freely-
54 distributable databases are not accessible via a uniform interface, and
55 are not accessible from a single site. They are often small and incom‐
56 plete individually, but would collectively provide an interesting and
57 useful database of English words. Examples include the Jargon file,
58 the WordNet database, MICRA's version of the 1913 Webster's Revised
59 Unabridged Dictionary, and the Free Online Dictionary of Computing.
60 (See the DICT protocol specification (RFC) for references.) Translat‐
61 ing and non-English dictionaries are also becoming available (for exam‐
62 ple, the FOLDOC dictionary is being translated into Spanish).
63
64 The webster protocol is not suitable for providing access to a large
65 number of separate dictionary databases, and extensions to the current
66 webster protocol were not felt to be a clean solution to the dictionary
67 database problem.
68
69 The DICT protocol is designed to provide access to multiple databases.
70 Word definitions can be requested, the word index can be searched
71 (using an easily extended set of algorithms), information about the
72 server can be provided (e.g., which index search strategies are sup‐
73 ported, or which databases are available), and information about a
74 database can be provided (e.g., copyright, citation, or distribution
75 information). Further, the DICT protocol has hooks that can be used to
76 restrict access to some or all of the databases.
77
78 dictd(8) is a server that implements the DICT protocol. Bret Martin
79 implemented another server, and several people (including Bret and
80 myself) have implemented clients in a variety of languages.
81
83 -V or --version
84 Display version information.
85
86 --license
87 Display copyright and license information.
88
89 -h or --help
90 Display help information.
91
92 -v or --verbose or -dverbose
93 Be verbose.
94
95 -c file or --config file
96 Specify configuration file. The default is /etc/dictd.conf ,
97 but may be changed in the defs.h file at compile time
98 (DICTD_CONFIG_FILE).
99
100 -p port or --port port
101 Specifies the port (e.g., 2628). The default is 2628, as speci‐
102 fied in the DICT Protocol RFC, but may be changed in the defs.h
103 file at compile time (DICT_DEFAULT_SERVICE).
104
105 -i or --inetd
106 Communicate on standard input/output, suitable for use from
107 inetd. Although, due to its rather large startup time, this
108 daemon was not intended to run from inetd, with a fast processor
109 it is feasible to do so. This option also implies --fast-start.
110
111 --pp prog
112 Sets a preprocessor for the configuration file, such as m4 or
113 cpp. See the example_complex.conf file from the distribution. By
114 default, the configuration file is parsed without a preprocessor.
115
116 --depth length
117 Specify the queue length for listen(2). Specifies the number of
118 pending socket connections which are queued by the operating
119 system. Some operating systems may silently limit this value to
120 5 (older BSD systems) or 128 (Linux). The default is 10 but may
121 be changed in the defs.h file at compile time
122 (DICT_QUEUE_DEPTH).
123
124 --delay seconds
125 Specifies the number of seconds a client may be idle before the
126 server will close the connection. Idle time is defined to be
127 the time the server is waiting for input and does not include
128 the time the server spends searching the database. Connections
129 are closed without warning since no provision for premature con‐
130 nection termination is specified in the DICT protocol RFC. The
131 default is 600 seconds (10 minutes), but may be changed in the
132 defs.h file at compile time (DICT_DEFAULT_DELAY).
133
134 --facility facility
135 Specifies the syslog facility to use. The use of this option
136 implies the -s option to turn on logging via syslog. When the
137 operating system libraries support SYSLOG_NAMES, the names used
138 for this option should be those listed in syslog.conf(5). Oth‐
139 erwise, the following names are used (assuming the particular
140 facility is defined in the header files): auth, authpriv, cron,
141 daemon, ftp, kern, lpr, mail, news, syslog, user, uucp, local0,
142 local1, local2, local3, local4, local5, local6, and local7.
143
144 -f or --force
145 Force the daemon to start even if an instance of the daemon is
146 already running. (This is of little value unless a non-default
147 port is specified with -p, since, if one instance is bound to a
148 port, the second one fails when it can not bind to the port.)
149
150 --limit children
151 Specifies the number of daemons that may be running simultane‐
152 ously. Each daemon services a single connection. If the limit
153 is exceeded, a (serialized) connection will be made by the
154 server process, and a response code 420 (server temporarily
155 unavailable) will be sent to the client. This parameter should
156 be adjusted to prevent the server machine from being overloaded
157 by dict clients, but should not be set so low that many clients
158 are denied useful connections. The default is 100, but may be
159 changed in the defs.h file at compile time (DICT_DAEMON_LIMIT).
160
161 --listen-to address
162 Binds socket to the specified address. If you want to allow
163 connections to dict server from localhost only, apply --lis‐
164 ten-to 127.0.0.1
165
166 --locale locale
167 Specifies the locale used for searching. If no locale is speci‐
168 fied, the "C" locale is used. The locale used for the server
169 should be the same as that used for dictfmt when the database
170 was built (specifically, the locale under which the index was
171 sorted). The locale should be specified for both 8-bit and UTF-8
172 formats. If the locale name contains the substring utf8 or utf-8,
173 UTF-8 format is expected. Note that if your database is not in
174 7-bit ASCII or UTF-8 format, then the dictd server will not comply
175 with RFC 2229.
176
177 NOTE If utf-8 or 8-bit dictionaries are included in the configu‐
178 ration file, and the appropriate --locale has not been speci‐
179 fied, dictd will fail to start. This implies that dictd will
180 not run with both utf-8 and 8-bit dictionaries in the configura‐
181 tion file.
182
183 -s Log using the syslog(3) facility.
184
185 -L file or --logfile file
186 Specify the file for logging. The filename specified is recom‐
187 puted on each use using the strftime(3) call. For example, a
188 filename ending in ".%Y%m%d" will write to log files ending in
189 the year, month, and date that the log entry was written. NOTE:
190 If dictd does not have write permission for this file, it will
191 silently fail.
192
193 -m minutes or --mark minutes
194 How often a timestamp should be logged. (This is effective only
195 if logging has been enabled with the -s or -L option, or with a
196 debugging option.)
197
198 --default-strategy strategy
199 Set the server's default search strategy for MATCH search type.
200 The default is 'lev'. It is also possible to set default strat‐
201 egy per database. See default_strategy keyword in Database
202 specification section.
203
204 --without-strategy strat1,strat2,...
205 Disable specified strategies. By default all search strategies
206 are enabled.
207
208 --add-strategy strat:descr
209 Adds strategy 'strat' with the description 'descr'. A new
210 search strategy may be implemented with the help of plugins.
211
212 --test word or -t word
213 self test -- lookup word
214
215 --test-file file or --ftest file
216 self test -- lookup all words in file
217
218 --test-strategy strategy
219 self test -- set search strategy for --test and --ftest. The
220 default is 'exact'.
221
222 --test-db database
223 self test -- set dictionary to be searched. The default is '*'.
224
225 --test-match
226 self test -- set search type to MATCH. The default is DEFINE.
227
228 --fast-start
229 By default, dictd creates an additional in-memory index to make
230 searches faster. This option disables that behaviour and makes
231 startup faster.
232
233 --without-mmap
234 Do not use the mmap() function; read entire files into memory
235 instead. Use this option only if you know exactly what you are
236 doing.
237
238
239 -l option or --log option
240 Specify a logging option. This is effective only if logging has
241 been enabled with the -s or -L option, or logging to the console
242 has been activated with a debugging option (e.g., --debug
243 nodetach). Only one option may be set with each invocation of this
244 option; however, multiple invocations of this option may be made
245 in one dictd command line. For instance:
246 dictd -s --log stats --log found --log notfound
247 is a valid command line, and sets three logging options.
248
249 Some of the more verbose logging options are used primarily for
250 debugging the server code, and are not practical for normal use.
251
252 server Log server diagnostics. This is extremely verbose.
253
254 connect
255 Log all connections.
256
257 stats Log all child terminations.
258
259 command
260 Log all commands. This is extremely verbose.
261
262 client Log results of CLIENT command.
263
264 found Log all words found in the databases.
265
266 notfound
267 Log all words not found in the databases.
268
269 timestamp
270 When logging to a file, use a full timestamp like that
271 which syslog would produce. Otherwise, no timestamp is
272 made, making the files shorter.
273
274 host Log name of foreign host.
275
276 auth Log authentication failures.
277
278 min Set a minimal number of options. If logging is activated
279 (to a file, or via syslog), and no options are set, then
280 the minimal set of options will be used. If options are
281 set, then only those options specified will be used.
282
283 all Set all of the options.
284
285 none Clear all of the options.
286
287 To facilitate location of interesting information in the log
288 file, entries are marked with initial letters indicating the
289 class of the line being logged:
290
291 I Information about the server, connections, or termination
292 statistics. These lines are generally not designed to be
293 parsed automatically.
294
295 E Error messages.
296
297 C CLIENT command information.
298
299 D Definitions found in the databases searched.
300
301 M Matches found in the database searched.
302
303 N Matches which were not found in the databases searched.
304
305 T Trace of exact line sent by client.
306
307 A Authentication information.
308
309 To preserve anonymity of the client, do not use the connect or
310 host options. Clients may or may not send host information
311 using the CLIENT command, but this should be an option that is
312 selectable on the client side.
313
314 -d option
315 Activate a debugging option. There are several, all of which
316 are only useful to developers. They are documented here for
317 completeness. A list can be obtained interactively by using -d
318 with an illegal option.
319
320 verbose
321 The same as -v or --verbose. Adds verbosity to other
322 options.
323
324 scan Debug the scanner for the configuration file.
325
326 parse Debug the parser for the configuration file.
327
328 search Debug the character folding and binary search routines.
329
330 init Report database initialization.
331
332 port Log client-side port number to the log file.
333
334 lev Debug Levenshtein search algorithm.
335
336 auth Debug the authorization routines.
337
338 nodetach
339 Do not detach as a background process. Implies that a
340 copy of the log file will appear on the standard output.
341
342 nofork Do not fork daemons to service requests. Be a single-
343 threaded server. This option implies nodetach, and is
344 most useful for using a debugger to find the point at
345 which daemon processes are dumping core.
346
347 alt Debugs altcompare in index.c.
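
The -L option described above recomputes the log file name with strftime(3) on each use. The following Python sketch shows how such a pattern expands for one fixed moment in time. The path /var/log/dictd.log and the helper name log_filename are assumptions made for illustration; they are not taken from dictd itself.

```python
import time

def log_filename(pattern, when):
    """Expand strftime(3) conversions in a log file name for a given time."""
    return time.strftime(pattern, when)

# A fixed timestamp keeps the example reproducible: 5 March 2024, 12:00:00.
stamp = time.strptime("2024-03-05 12:00:00", "%Y-%m-%d %H:%M:%S")

# Hypothetical log path ending in ".%Y%m%d", as in the -L description.
name = log_filename("/var/log/dictd.log.%Y%m%d", stamp)
print(name)
```

Because the name is recomputed for every entry, a pattern like this one produces a new file each day without restarting the server.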
348
350 Introduction
351 The configuration file defaults to /etc/dictd.conf but can be
352 specified on the command line with the -c option (see above).
353
354 The configuration file is read into memory at startup, and is
355 not referenced again by dictd unless a signal 1 (SIGHUP) is
356 received, which will cause dictd to reread the configuration
357 file.
358
359 The file is divided into sections. The Site Section should come
360 first, followed by the Access Section, the Database Section, and
361 the User Section. The Database Section is required; the others
362 are optional, but they must be in the order listed here.
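
The section ordering can be illustrated with a small Python sketch that renders a configuration text in the order given above (site, access, database, user). The helper render_config, the file paths, the database name, and the user entry are all hypothetical; only the keywords and the brace layout follow this manual page.

```python
def render_config(databases, site=None, allow=(), users=()):
    """Render dictd.conf text with the sections in the documented order."""
    lines = []
    if site:
        # Site Section: must come first if present.
        lines.append('site "%s"' % site)
    if allow:
        # Access Section: global restrictions for the server.
        lines.append("access {")
        lines.extend("    allow %s" % pattern for pattern in allow)
        lines.append("}")
    for name, (data, index) in databases.items():
        # Database Section: the only required section.
        lines.append("database %s {" % name)
        lines.append('    data "%s"' % data)
        lines.append('    index "%s"' % index)
        lines.append("}")
    for user, secret in users:
        # User Section: must come last if present.
        lines.append('user %s "%s"' % (user, secret))
    return "\n".join(lines) + "\n"

text = render_config(
    {"wn": ("wn.dict.dz", "wn.index")},   # hypothetical database files
    site="/etc/dictd/site.txt",           # hypothetical site file
    allow=["*"],
    users=[("u1", "secret1")],            # hypothetical user and secret
)
print(text)
```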
363
364 Syntax The following keywords are valid in a configuration file:
365 access, allow, deny, group, database, data, index, filter, pre‐
366 filter, postfilter, name, include, user, authonly, site. Key‐
367 words are case sensitive. String arguments that contain spaces
368 should be surrounded by double quotes. Without quoting, strings
369 may contain alphanumeric characters and _, -, ., and *, but not
370 spaces. Strings can be continued across lines. The sequences \",
371 \\, \n, and \<NL> are treated as a double quote, a backslash, a
372 newline, and nothing, respectively. Comments start with # and
373 extend to the end of the line.
374
375 Site Section
376
377 site string
378 Used to specify the filename for the site information
379 file, a flat text file which will be displayed in
380 response to the SHOW SERVER command. This section, if
381 present, must be first.
382
383 Access Section
384
385 access { access specification }
386 This section, the second if the Site Section is present,
387 contains access restrictions for the server and all of
388 the databases collectively. Per-database control is
389 specified in the Database Section.
390
391 Database Section
392
393 database string { database specification }
394 The string specifies the name of the database (e.g., wn
395 or web1913). (This is an arbitrary name selected by the
396 administrator, and is not necessarily related to the file
397 name or any name listed in the data file. A short, easy
398 to type name is often selected for easy use with dict
399 -d.)
400
401 NOTE: If the files specified in the database specifica‐
402 tion do not exist on the system, dictd may silently fail.
403
404 database_virtual string { virtual database specification }
405 This section specifies the virtual database. The string
406 specifies the name of the database (e.g., en-ru or fren).
407
408 database_plugin string { plugin specification }
409 This section specifies the plugin. The string specifies
410 the name of the database.
411
412 database_exit
413 Excludes the databases that follow from the '*' database. By
414 default, '*' means all available databases. See the
415 'example_virtual.conf' file for an example configuration.
416
417 NOTE: If you use 'virtual' dictionaries, you should use
418 this directive, otherwise you will search the same dic‐
419 tionary twice.
420
421 User Section
422
423 user string string
424 The first string specifies the username, and the second
425 string specifies the shared secret for this username.
426 When the AUTH command is used, the client will provide
427 the username and a hashed version of the shared secret.
428 If the shared secret matches, the user is said to have
429 authenticated, and will have access to databases whose
430 access specifications allow that user (by name, or by
431 wildcard). If present, this section must appear last in
432 the configuration file. There may be many user entries.
433 The shared secret should be kept secret, as anyone who
434 has access to it can access the shared databases (assum‐
435 ing access is not denied by domain name).
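
The "hashed version of the shared secret" can be sketched in Python as follows. This assumes the scheme described in RFC 2229 for the AUTH command, in which the client sends the hexadecimal MD5 digest of the message id from the server banner (angle brackets included) concatenated with the shared secret. The function name, message id, and secret below are hypothetical; consult the RFC before relying on the details.

```python
import hashlib

def auth_string(msg_id, secret):
    """Return the hex MD5 digest of the banner msg-id followed by the secret."""
    return hashlib.md5((msg_id + secret).encode("utf-8")).hexdigest()

# Hypothetical values: the msg-id would come from the server's 220 banner.
msg_id = "<12345.6789.1700000000@dict.example.org>"
secret = "secret1"

digest = auth_string(msg_id, secret)
print("AUTH u1 " + digest)
```

Since the message id differs for each connection, the digest sent over the wire changes every time even though the secret stays the same.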
436
437 Access Specification
438 Access specifications may occur in the Access Section or in the
439 Database Section. The access specification will be described
440 here.
441
442 For allow, deny, and authonly, a star (*) may be used as a wild
443 card that matches any number of characters. A question mark (?)
444 may be used as a wildcard that matches a single character. For
445 example, 10.0.0.* and *.edu are valid strings.
446
447 Further, a range of IP addresses and an IP address followed by a
448 netmask may be specified. For example, 10.0.0.0:10.0.0.255,
449 10.0.0.0/24, and 10.0.0.* all specify the same range of IP num‐
450 bers. Notation cannot be combined on the same line. If the
451 notation does not make sense, access will be denied by default.
452 Use the --debug auth option to debug related problems.
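
The three notations can be compared with a short Python sketch. It is an approximation, not dictd's own matching code: fnmatch is used for the * and ? wildcards (it also accepts [...] sets, which the text above does not mention), and the ipaddress module is used for ranges and netmasks. All function names are hypothetical.

```python
import fnmatch
import ipaddress

def wildcard_match(pattern, text):
    """Match * (any run of characters) and ? (one character), ignoring case."""
    return fnmatch.fnmatchcase(text.lower(), pattern.lower())

def in_range(spec, addr):
    """Check an address against a 'first:last' IP range."""
    first, last = (ipaddress.ip_address(part) for part in spec.split(":"))
    return first <= ipaddress.ip_address(addr) <= last

def in_netmask(spec, addr):
    """Check an address against an 'address/bits' specification."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network(spec)

def check_all(addr):
    """Apply the three equivalent notations from the text to one address."""
    return (
        wildcard_match("10.0.0.*", addr),
        in_range("10.0.0.0:10.0.0.255", addr),
        in_netmask("10.0.0.0/24", addr),
    )

inside = check_all("10.0.0.17")
outside = check_all("10.0.1.1")
print(inside, outside)
```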
453
454 Note that these specifications take only one string per specifi‐
455 cation line. However, you can have multiple lines of each type.
456
457 The syntax is as follows:
458
459 allow string
460 The string specifies a domain name or IP address which is
461 allowed access to the server (in the Access Section) or
462 to a database (in the Database Section). Note that more
463 than one string is not permitted on a single "allow"
464 line, but more than one "allow" line is permitted in
465 the configuration file.
466
467 deny string
468 The string specifies a domain name or IP address which is
469 denied access to the server (in the Access Section) or to
470 a database (in the Database Section). Note that if
471 reverse DNS is not working, then only the IP number will
472 be checked. Therefore, it is essential to deny networks
473 based on IP number, since a denial based on domain name
474 may not always be checked.
475
476 authonly string
477 This form is only useful in the Access Section. The
478 string specifies a domain name or IP address which is
479 allowed access to the server but not to any of the data‐
480 bases. All commands are valid except DEFINE, MATCH, and
481 SHOW DB. More specifically AUTH is a valid command, and
482 commands which access the databases are not allowed.
483
484 user string
485 This form is only useful in the Database Section. The
486 string specifies a username that is allowed to access
487 this database after a successful AUTH command is exe‐
488 cuted.
489
490 Database Specification
491 The database specification describes the database:
492
493 data string
494 Specifies the filename for the flat text database. If
495 the filename does not begin with '.' or '/', $datadir/ is
496 prepended to it. $datadir is set at compile time; you can
497 change it by editing the Makefile or by running
498 ./configure --datadir=...
499
500 index string
501 Specifies the filename for the index file. The path is
502 resolved as described above for the "data" option.
503
504 index_suffix string
505 This is an optional index file that makes the 'suffix' search
506 strategy faster (binary search). It is generated by
507 'dictfmt_index2suffix'. Run "dictfmt_index2suffix --help"
508 for more information. The path is resolved as described
509 above for the "data" option.
510
511 index_word string
512 This is an optional index file that makes the 'word' search
513 strategy faster (binary search). It is generated by
514 'dictfmt_index2word'. Run "dictfmt_index2word --help" for
515 more information. The path is resolved as described above
516 for the "data" option.
517
518 prefilter string
519 Specifies the prefilter command. When a chunk of the
520 compressed database is read, it will be filtered with
521 this filter before being decompressed. This may be used
522 to provide some additional compression that knows about
523 the data and can provide better compression than the LZ77
524 algorithm used by zlib.
525
526 postfilter string
527 Specifies the postfilter command. When a chunk of the
528 compressed database is read, it will be filtered with
529 this filter before the offset and length for the entry
530 are used to access data. This is provided for symmetry
531 with the prefilter command, and may also be useful for
532 providing additional database compression.
533
534 filter string
535 Specifies the filter command. After the entry is
536 extracted from the database, it will be filtered with
537 this filter. This may be used to provide formatting for
538 the entry (e.g., for html).
539
540 name string
541 Specifies the short name of the database (e.g., "1913
542 Webster's"). If the string begins with @, then it speci‐
543 fies the headword to look up in the dictionary to find
544 the short name of the database. The default is
545 "@00-database-short", but this may be changed in the
546 defs.h file at compile time (DICT_SHORT_ENTRY_NAME).
547
548 info string
549 Specifies the information about database. If the string
550 begins with @, then it specifies the headword to look up
551 in the dictionary to find information. The default is
552 "@00-database-info", but this may be changed in the
553 defs.h file at compile time (DICT_INFO_ENTRY_NAME).
554
555 invisible
556 Makes the dictionary invisible to clients, i.e. this
557 dictionary will not be recognized or shown by the DEFINE,
558 MATCH, SHOW INFO, SHOW SERVER and SHOW DB commands. If
559 definitions or matches are found in an invisible
560 dictionary, the name of the upper visible virtual
561 dictionary is returned. Dictionaries '*' and '!' don't
562 include invisible ones. NOTE: It makes no sense to make a
563 dictionary invisible unless it is included in a virtual
564 dictionary.
565
566 disable_strategy string
567 Disables the specified strategy for the database. This may
568 be useful for slow dictionaries (plugins) or for
569 dictionaries included in virtual ones. For an example, see
570 the file example_complex.conf.
571
572 default_strategy string
573 Specifies the strategy that will be used if the database
574 is accessed using the strategy '.', i.e. this directive
575 sets the preferred search strategy per database. For
576 example, instead of strategy lev, the strategy word may be
577 preferred for databases that mainly contain multiword
578 phrases rather than single words.
579
580
581 Virtual Database Specification
582 The virtual database specification describes the virtual data‐
583 base:
584
585 database_list string
586 Specifies a list of databases which are included in the
587 virtual database. Database names are given in the string
588 and are separated by commas.
589
590 name string
591 Specifies the short name of the database. See Database
592 Specification.
593
594 info string
595 Specifies information about the database. See Database
596 Specification.
597
598 invisible
599 Makes the dictionary invisible to clients. See Database
600 Specification.
601
602 disable_strategy string
603 Disables the specified strategy for the database. See
604 Database Specification.
605
606 NOTE: Another way to implement a virtual database is to create
607 the database files with the dictfmt_virtual executable.
608
609 Plugin Specification
610
611 plugin string
612 Specifies a filename of the plugin.
613
614 data string
615 Specifies data for initializing plugin.
616
617 name string
618 Specifies the short name of the database. See Database
619 Specification for more information.
620
621 info string
622 Specifies the information about database. See Database
623 Specification for more information.
624
625 invisible
626 Makes dictionary invisible to the clients. See Database
627 Specification for more information.
628
629 disable_strategy string
630 Disables the specified strategy for database. See Data‐
631 base Specification for more information.
632
633 default_strategy string
634 Sets the default search strategy for database. See Data‐
635 base Specification for more information.
636
637 NOTE: Another way to configure a plugin is to create the
638 database files with the dictfmt_plugin executable.
639
640 include string
641 The text of the file "string" (usually a database specification)
642 will be read as if it appeared at this location in the configu‐
643 ration file. Nested includes are not permitted.
644
645
647 When a client connects, the global access specification is scanned, in
648 order, until a specification matches. If no access specification
649 exists, all access is allowed (e.g., the action is the same as if
650 "allow *" was the only item in the specification). For each item, both
651 the hostname and IP are checked. For example, consider the following
652 access specification:
653 allow 10.42.*
654 authonly *.edu
655 deny *
656 With this specification, all clients in the 10.42 network will be
657 allowed access to unrestricted databases; all clients from *.edu sites
658 will be allowed to authenticate, but will be denied access to all data‐
659 bases, even those which are otherwise unrestricted; and all other
660 clients will have their connection terminated immediately. The 10.42
661 network clients can send an AUTH command and gain access to restricted
662 databases. The *.edu clients must send an AUTH command to gain access
663 to any databases, restricted or unrestricted.
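
The first-match scan described above can be sketched in Python. This is an illustration only: the function access_level is hypothetical, wildcard matching is approximated with fnmatch, and the result for a non-empty list in which nothing matches is returned as None because the text does not state what dictd does in that case.

```python
import fnmatch

def access_level(rules, hostname, ip):
    """Return the action of the first rule matching the hostname or the IP."""
    if not rules:
        return "allow"  # no specification: same as a single "allow *"
    for action, pattern in rules:
        # Both the hostname and the IP are checked against each item.
        # The hostname may be missing if reverse DNS is not working.
        for candidate in (hostname, ip):
            if candidate and fnmatch.fnmatchcase(candidate.lower(), pattern.lower()):
                return action
    return None  # behaviour not covered by the text above

rules = [
    ("allow", "10.42.*"),
    ("authonly", "*.edu"),
    ("deny", "*"),
]

print(access_level(rules, None, "10.42.1.1"))
print(access_level(rules, "host.example.edu", "192.0.2.7"))
print(access_level(rules, "example.com", "192.0.2.9"))
```

The second call shows why order matters: the *.edu client never reaches the final "deny *" item because the authonly item matches first.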
664
665 When the AUTH command is sent, the access list for each database is
666 scanned, in order, just as the global access list is scanned. However,
667 after authentication, the client has an associated username. For exam‐
668 ple, consider the following access specification:
669 user u1
670 deny *.com
671 user u2
672 allow *
673 If the client authenticated as u1, then the client will have access to
674 this database, even if the client comes from a *.com site. In con‐
675 trast, if the client authenticated as u2, the client will only have
676 access if it does not come from a *.com site. In this case, the "user
677 u2" is redundant, since that client would also match "allow *".
678
679 Warning: Checks are performed for domain names and for IP addresses.
680 However, if reverse DNS for a specific site is not working, it is pos‐
681 sible that a domain name may not be available for checking. Make sure
682 that all denials use IP addresses. (And consider a future enhancement:
683 if a domain name is not available, should denials that depend on a
684 domain name match anything? This is the more conservative viewpoint,
685 but it is not currently implemented.)
686
688 The DICT standard specifies a few search algorithms that must be imple‐
689 mented, and permits others to be supported on a server-dependent basis.
690 The following search strategies are supported by this server. Note
691 that all strategies are case insensitive. Most ignore non-alphanu‐
692 meric, non-whitespace characters.
693
694 exact An exact match. This algorithm uses a binary search and is one
695 of the fastest search algorithms available.
696
697 lev The Levenshtein algorithm (string edit distance of one). This
698 algorithm searches for all words which are within an edit dis‐
699 tance of one from the target word. An "edit" means an inser‐
700 tion, deletion, or transposition. This is a rapid algorithm for
701 correcting spelling errors, since many spelling errors are
702 within a Levenshtein distance of one from the original word.
703
704 prefix Prefix match. This algorithm also uses a binary search and is
705 very fast.
706
707 re POSIX 1003.2 (modern) regular expression search. Modern regular
708 expressions are the ones used by egrep(1). These regular
709 expressions allow predefined character classes (e.g.,
710 [[:alnum:]], [[:alpha:]], [[:digit:]], and [[:xdigit:]] are
711 useful for this application); use * to match a sequence of 0 or
712 more matches of the previous atom; use + to match a sequence of 1
713 or more matches of the previous atom; use ? to match a sequence
714 of 0 or 1 matches of the previous atom; use ^ to match the
715 beginning of a word; use $ to match the end of a word; and allow
716 nested subexpressions and alternation with () and |. For
717 example, "(foo|bar)" matches all words that contain either "foo"
718 or "bar". To match these special characters, they must be quoted
719 with two backslashes (due to the quoting characteristics of the
720 server). Warning: Regular expression matches can take 10 to 300
721 times longer than substring matches. On a busy server, with
722 many databases, this can require more than 5 minutes of waiting
723 time, depending on the complexity of the regular expression.
724
725 regexp Old (basic) regular expressions. These regular expressions
726 don't support |, +, or ?. Groups use escaped parentheses.
727 While modern regular expressions are generally easier to use,
728 basic regular expressions have a back reference feature. This
729 can be used to match a second occurrence of something that was
730 already matched. For example, the following expression finds
731 all words that begin and end with the same three letters:
732 ^\\(...\\).*\\1$
733
734 Note the use of the double backslashes to escape the special
735 characters. This is required by the DICT protocol string speci‐
736 fication (a single backslash quotes the next character -- we use
737 two to get a single backslash through to the regular expression
738 engine). Warning: Note that the use of backtracking is even
739 slower than the use of general regular expressions.
740
741 soundex
742 The Soundex algorithm, a classic algorithm for finding words
743 that sound similar to each other. The algorithm encodes each
744 word using the first letter of the word and up to three digits.
745 Since the first letter is known, this search is relatively fast,
746 and it is sometimes good for correcting spelling errors when the
747 Levenshtein algorithm doesn't help.
748
749 substring
750 Match a substring anywhere in the headword. This search strat‐
751 egy uses a modified Boyer-Moore-Horspool algorithm. Since it
752 must search the whole index file, it is not as fast as the exact
753 and prefix matches.
754
755 suffix Suffix match. This search strategy also uses a modified Boyer-
756 Moore-Horspool algorithm, and is as fast as the substring
757 search. If the optional index_suffix file is listed in the
758 configuration file, this search is much faster.
759
760 word Match any single word, even if part of a multi-word entry. If
761 the optional index_word string file is listed in the configuration
762 file, this search is much faster.
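
The double-backslash quoting described for the regexp strategy above can be sketched as follows. This is only an illustration: the helper name dict_quote_backslashes is hypothetical, and because Python's re module uses the modern syntax (unescaped parentheses), the matching step is a translation of the basic regular expression rather than the expression dictd itself evaluates.

```python
import re

def dict_quote_backslashes(pattern: str) -> str:
    """Double every backslash so that one survives DICT string unquoting.

    Hypothetical helper; it handles only the backslash rule described above.
    """
    return pattern.replace("\\", "\\\\")

# The basic regular expression as the regex engine should finally see it.
bre = r"^\(...\).*\1$"

# What a client would send; this is the form shown in the regexp entry.
wire_form = dict_quote_backslashes(bre)

# Python equivalent of the same expression (modern syntax, plain parentheses).
same_three = re.compile(r"^(...).*\1$")
words = ["entertainment", "underground", "hotshot", "dictionary"]
matches = [w for w in words if same_three.match(w)]
```

Here matches holds the words that begin and end with the same three letters, and wire_form is the doubled-backslash string a client would place in its query.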
763
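The Soundex encoding mentioned under the soundex strategy above (first letter plus up to three digits) can be sketched as follows. This is the commonly described American Soundex, written for illustration; the implementation inside dictd may differ in details such as its handling of non-ASCII letters.

```python
# Letter groups of the commonly described American Soundex.
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}

def soundex(word: str) -> str:
    """Return the first letter of word followed by three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        # Emit a digit only when it differs from the previous coded letter.
        if code and code != prev:
            digits.append(code)
        # 'h' and 'w' do not separate letters that share a code; vowels do.
        if c not in "hw":
            prev = code
    # Pad with zeros, then keep the letter and three digits.
    return (letters[0].upper() + "".join(digits) + "000")[:4]
```

Words that sound alike, such as "Robert" and "Rupert", receive the same code, which is why comparing codes is cheap once the first letter is known.
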
765 Databases for dictd are distributed separately. A database consists of
766 two files. One is a flat text file, the other is the index.
767
768 The flat text file contains dictionary entries (or any other suitable
769 data), and the index contains tab-delimited tuples consisting of the
770 headword, the byte offset at which this entry begins in the flat text
771 file, and the length of the entry in bytes. The offset and length are
772 encoded in base 64 using the 64-character subset of International
773 Alphabet IA5 discussed in RFC 1421 (printable encoding) and
774 RFC 1522 (base64 MIME). Encoding the offsets in base 64 saves consid‐
775 erable space when compared with the usual base 10 encoding, while still
776 permitting tab characters (ASCII 9) to be used for delimiting fields in
777 a record. Each record ends with a newline (ASCII 10), so the index
778 file is human readable.
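
The index record layout described above can be sketched as follows. The sketch assumes that each offset and length is a single number written in base-64 digits, most significant digit first, using the RFC 1421 alphabet (so "B" is 1 and "BA" is 64); the function names are hypothetical and not part of dictd.

```python
# The 64-character alphabet from RFC 1421, in digit order 0..63.
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)
B64_VALUE = {ch: i for i, ch in enumerate(B64_ALPHABET)}

def b64_to_int(digits: str) -> int:
    """Decode a base-64 number (assumed most significant digit first)."""
    value = 0
    for ch in digits:
        value = value * 64 + B64_VALUE[ch]
    return value

def parse_index_line(line: str):
    """Split one tab-delimited index record into (headword, offset, length)."""
    headword, offset, length = line.rstrip("\n").split("\t")[:3]
    return headword, b64_to_int(offset), b64_to_int(length)

def read_entry(path: str, offset: int, length: int) -> bytes:
    """Read one entry from an uncompressed flat text file."""
    with open(path, "rb") as data:
        data.seek(offset)
        return data.read(length)
```

For an uncompressed data file, the decoded offset and length feed directly into a seek and a read, as read_entry shows.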
779
780 The flat text file may be compressed using gzip(1) (not recommended) or
781 dictzip(1) (highly recommended). Optimal speed will be obtained using
782 an uncompressed file. However, the gzip compression algorithm works
783 very well on plain text, and can result in space savings typically
784 between 60 and 80%. Using a file compressed with gzip(1) is not recom‐
785 mended, however, because random access on the file can only be accom‐
786 plished by serially decompressing the whole file, a process which is
787 prohibitively slow. dictzip(1) uses the same compression algorithm and
788 file format as does gzip(1), but provides a table that can be used to
789 randomly access compressed blocks in the file. The use of 50-64kB
790 blocks for compression typically degrades compression by less than 10%,
791 while maintaining acceptable random access capabilities for all data in
792 the file. As an added benefit, files compressed with dictzip(1) can be
793 decompressed with gzip(1) or zcat(1). (Note: recompressing a dictzip'd
794 file using, for example, znew(1) will destroy the random access charac‐
795 teristics of the file. Always compress data files using dictzip(1).)
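
The idea behind block-wise compression with a lookup table can be sketched as follows. This is not the dictzip file format: it only illustrates why compressing fixed-size blocks independently lets a reader decompress just the blocks that cover a requested byte range. The block size and function names are assumptions chosen for the example.

```python
import zlib

CHUNK_SIZE = 60 * 1024  # assumed block size, within the 50-64kB range above

def compress_chunked(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Compress each fixed-size block independently; the list is the table."""
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]

def read_range(chunks: list, offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return data[offset:offset+length], decompressing only needed blocks."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    # Only the blocks overlapping the requested range are decompressed.
    buf = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk_size
    return buf[start:start + length]
```

A plain gzip stream offers no such table, so reaching an entry near the end of the file means decompressing everything before it.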
796
797
799 SIGHUP causes dictd to reread the configuration file and reinitialize
800 databases.
801
802 SIGUSR1 causes dictd to unload its databases; dictd then returns status
803 420 (instead of 220). To load the databases again, send SIGHUP. Because
804 database files are mapped with mmap(2), they cannot be updated while dictd
805 has them loaded. So, to update database files and reread the configuration
806 file, first send SIGUSR1 to dictd to unload the databases, then update the
807 files, and finally send SIGHUP to load them again.
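
The update sequence above can be sketched as a small script. How the process ID of dictd is obtained is left to the caller, since that depends on the installation; the function name and the update_files callback are hypothetical, and a real script might also want to confirm that the databases are unloaded before touching the files.

```python
import os
import signal

def update_databases(pid: int, update_files, send=os.kill) -> None:
    """Unload the databases, update the files, then load them again.

    pid          -- process ID of the running dictd (supplied by the caller)
    update_files -- callable that replaces the database files (hypothetical)
    send         -- signal sender, os.kill by default; replaceable for testing
    """
    send(pid, signal.SIGUSR1)   # ask dictd to unload its databases
    update_files()              # files are no longer mapped, so replace them
    send(pid, signal.SIGHUP)    # reread the configuration and load databases
```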
808
809
811 Special thanks to Jean-loup Gailly and Mark Adler for writing the zlib
812 general purpose data compression library. The version contained with
813 dictd is not necessarily an original version and may have been modified
814 (unnecessary files may have been deleted to make the distribution
815 smaller; makefiles may have been modified to ease compilation; see
816 zlib/README.DICT for any significant changes). For more information on
817 zlib, please see the zlib home page at
818 http://www.gzip.org/zlib/
819
820 The key features of the dictzip random-access compression algorithm
821 utilize a documented extension of the gzip format, and do not require
822 any modifications to zlib.
823
824 Special thanks to Henry Spencer for his regex package. The package
825 contained with dictd is not necessarily an original version and may
826 have been modified (unnecessary files may have been deleted to make the
827 distribution smaller; makefiles may have been modified to ease compila‐
828 tion; see regex/README.DICT for any significant changes). For more
829 information on regex, please see
830 ftp://zoo.toronto.edu/pub/regex.shar
831
833 The main source files for the dictd server and the dictzip compression
834 program were written by Rik Faith (faith@dict.org) and are distributed
835 under the terms of the GNU General Public License. If you need to dis‐
836 tribute under other terms, write to the author.
837
838 The main libraries used by these programs (zlib, regex, libmaa) are
839 distributed under different terms, so you may be able to use the
840 libraries for applications which are incompatible with the GPL --
841 please see the copyright notices and license information that come with
842 the libraries for more information, and consult with your attorney to
843 resolve these issues.
844
846 The regular expression searches do not ignore non-whitespace, non-
847 alphanumeric characters as do the other searches. In practice, this
848 isn't much of a problem.
849
850 The 'lev' strategy doesn't work with utf8 dictionaries.
851
853 Conformance of regular expressions (used by the 're' and 'regexp' search
854 strategies) to ERE and BRE depends on the library dictd is built with.
855
856 Whether the 're' and 'regexp' strategies support utf8 depends on the
857 library dictd is built with.
858
859
861 /etc/dictd.conf
862 /usr/sbin/dictd
863
865 dictfmt(1), dictfmt_virtual(1), dict(1), dictzip(1), gunzip(1),
866 zcat(1), webster(1), RFC 2229
867
868
869
870 29 March 2002 DICTD(8)