htdump(1)                   General Commands Manual                  htdump(1)

NAME

       htdump - write out an ASCII-text version of the document database

SYNOPSIS

       htdump [options]

DESCRIPTION

       Htdump writes out an ASCII-text version of the document database in
       the same format that htdig produces with its -t option.

OPTIONS

       -a     Use alternate work files. Tells htdump to append .work to
              database file names, allowing it to operate on a second set of
              databases.

       -c configfile
              Use the specified configfile instead of the default.

       -v     Verbose mode. This has little effect.

FILE FORMATS

       Document Database
              Each line in the file starts with the document ID, followed by
              a tab-separated list of fieldname:value pairs. The fields
              always appear in the order listed below:

       u      URL

       t      Title

       a      State (0 = normal, 1 = not found, 2 = not indexed, 3 = obsolete)

       m      Last modification time as reported by the server

       s      Size in bytes

       H      Excerpt

       h      Meta description

       l      Time of last retrieval

       L      Count of the links in the document (outgoing links)

       b      Count of the links to the document (incoming links or backlinks)

       c      Hop count of this document

       g      Signature of the document, used for duplicate detection

       e      E-mail address to use for a notification message from htnotify

       n      Date to send out a notification e-mail message

       S      Subject for a notification e-mail message

       d      The text of links pointing to this document (e.g. the
              description in <a href="docURL">description</a>)

       A      Anchors in the document (e.g. <A NAME=...>)

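       The field layout above is regular enough to read back mechanically.
       The following is a minimal sketch, not part of ht://Dig: the helper
       parse_doc_line and the long field names in FIELD_NAMES are
       hypothetical, and it assumes each field is written as name:value with
       no spaces around the colon. It splits on the first colon only, so
       URLs and titles that contain colons survive intact.

```python
from __future__ import annotations

# Hypothetical long names for the one-letter field codes listed above.
FIELD_NAMES = {
    "u": "url", "t": "title", "a": "state", "m": "modified", "s": "size",
    "H": "excerpt", "h": "meta_description", "l": "last_retrieved",
    "L": "outgoing_links", "b": "backlinks", "c": "hop_count",
    "g": "signature", "e": "notify_email", "n": "notify_date",
    "S": "notify_subject", "d": "link_text", "A": "anchors",
}


def parse_doc_line(line: str) -> tuple[int, dict[str, list[str]]]:
    """Split one document-database line into (doc_id, {field: [values]}).

    Assumes the layout described above: the document ID, then
    tab-separated name:value fields.
    """
    doc_id, *fields = line.rstrip("\n").split("\t")
    record: dict[str, list[str]] = {}
    for field in fields:
        name, sep, value = field.partition(":")  # split on the first colon only
        if not sep:
            continue  # ignore anything that is not in name:value form
        # Values go into a list in case a code occurs more than once;
        # this page does not say whether that can happen.
        record.setdefault(FIELD_NAMES.get(name, name), []).append(value)
    return int(doc_id), record
```

       For example, parse_doc_line("42\tu:http://www.example.com/\tt:Example")
       returns (42, {"url": ["http://www.example.com/"], "title": ["Example"]}).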
       Word Database
              While htdump and htload don't deal with the word database
              directly, it is worth mentioning here because you need to deal
              with it when copying the ASCII databases from one system to
              another. The initial word database produced by htdig is
              already in ASCII format; htmerge produces a binary version of
              it for use by htsearch. So when you copy the ASCII version of
              the document database produced by htdump, you need to copy the
              word list as well, then run htload on the target system to
              build the binary document database, followed by htmerge to
              build the word index.

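       The order of operations for moving the databases can be written down
       as plain command lists. This is a hedged sketch: the helpers
       source_steps, target_steps and run are hypothetical, it assumes the
       programs are on PATH, and it assumes htload and htmerge accept
       -c configfile as described in htload(1) and htmerge(1). The copy of
       db.docs and db.wordlist between systems is left out because it
       depends on your setup.

```python
from __future__ import annotations

import subprocess


def source_steps(config: str) -> list[list[str]]:
    """Commands for the source system: dump the document database as ASCII."""
    return [["htdump", "-c", config]]


def target_steps(config: str) -> list[list[str]]:
    """Commands for the target system, run after db.docs and db.wordlist
    have been copied over: rebuild the binary document database with
    htload, then build the word index with htmerge."""
    return [["htload", "-c", config], ["htmerge", "-c", config]]


def run(steps: list[list[str]]) -> None:
    """Run each command in order, stopping at the first non-zero exit."""
    for argv in steps:
        subprocess.run(argv, check=True)
```

       Keeping htload ahead of htmerge in target_steps mirrors the order
       given above: the document database is rebuilt before the word index.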
              Each line in the word list file starts with the word, followed
              by a tab-separated list of fieldname:value pairs. The fields
              always appear in the order listed below, with the last two
              being optional:

       i      Document ID

       l      Location of word in document (1 to 1000)

       w      Weight of word based on scoring factors

       c      Count of word's appearances in document, if more than 1

       a      Anchor number if word occurred after a named anchor

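       A word list line can be parsed the same way, with defaults for the
       two optional fields. This is a minimal sketch: WordEntry and
       parse_word_line are hypothetical names, it assumes the name:value
       layout described above, and it parses the weight as a float because
       this page does not state whether weights are integers.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class WordEntry:
    """One line of the ASCII word list (hypothetical container)."""
    word: str
    doc_id: int                    # i
    location: int                  # l, 1 to 1000
    weight: float                  # w, parsed as float to accept any numeric text
    count: int = 1                 # c, present only when the count exceeds 1
    anchor: Optional[int] = None   # a, present only after a named anchor


def parse_word_line(line: str) -> WordEntry:
    """Parse 'word<TAB>i:..<TAB>l:..<TAB>w:..[<TAB>c:..][<TAB>a:..]'."""
    word, *fields = line.rstrip("\n").split("\t")
    values: dict[str, str] = {}
    for field in fields:
        name, _, value = field.partition(":")
        values[name] = value
    return WordEntry(
        word=word,
        doc_id=int(values["i"]),
        location=int(values["l"]),
        weight=float(values["w"]),
        count=int(values.get("c", "1")),
        anchor=int(values["a"]) if "a" in values else None,
    )
```

       Looking fields up by name rather than by position means a line that
       carries a but not c is still parsed correctly.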

FILES

       /etc/htdig/htdig.conf
              The default configuration file.

       /var/lib/htdig/db.docs
              The default ASCII document database file.

       /var/lib/htdig/db.wordlist
              The default ASCII word database file.

SEE ALSO

       Please refer to the HTML pages (in the htdig-doc package)
       /usr/share/doc/htdig-doc/html/index.html and the manual pages htdig(1)
       and htload(1) for a detailed description of ht://Dig and its commands.

AUTHOR

       This manual page was written by Stijn de Bekker, based on the HTML
       documentation of ht://Dig.

                               15 October 2001                      htdump(1)