senseidx(5)

SENSEIDX(5)                   WordNet™ File Formats                  SENSEIDX(5)

NAME

       index.sense, sense.idx - WordNet's sense index

DESCRIPTION

       The WordNet sense index provides an alternate method for accessing
       synsets and word senses in the WordNet database. It is useful to
       applications that retrieve synsets or other information related to a
       specific sense in WordNet, rather than all the senses of a word or
       collocation. It can also be used with tools like grep and Perl to find
       all senses of a word in one or more parts of speech. A specific
       WordNet sense, encoded as a sense_key, can be used as an index into
       this file to obtain its WordNet sense number, the database byte offset
       of the synset containing the sense, and the number of times it has
       been tagged in the semantic concordance texts.

       Concatenating the lemma and lex_sense fields of a semantically tagged
       word (represented in a <wf ... > attribute/value pair) in a semantic
       concordance file, using % as the concatenation character, creates the
       sense_key for that sense, which can in turn be used to search the
       sense index file.
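
       The concatenation described above can be sketched in Python as
       follows. The function name make_sense_key is hypothetical, and the
       normalization step (lower-casing and replacing spaces with
       underscores) is an assumption added for convenience; a lemma taken
       directly from a concordance file should already be in that form.

```python
def make_sense_key(lemma: str, lex_sense: str) -> str:
    """Join a lemma and a lex_sense field with '%' to form a sense_key."""
    # Assumption: normalize the lemma to the form used in the index
    # (lower case, collocation words joined by underscores).
    normalized = lemma.strip().lower().replace(" ", "_")
    return f"{normalized}%{lex_sense}"


# Illustrative values only.
print(make_sense_key("dog", "1:05:00::"))
```

       The resulting string can then be used as the lookup key in the
       sense index file.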

       A sense_key is the best way to represent a sense in semantic tagging or
       other systems that refer to WordNet senses. sense_keys are independent
       of WordNet sense numbers and synset_offsets, which vary between
       versions of the database. Using the sense index and a sense_key, the
       corresponding synset (via the synset_offset) and WordNet sense number
       can easily be obtained. A mapping from noun sense_keys in WordNet 1.6
       to corresponding 2.0 sense_keys is provided with version 2.0, and is
       described in sensemap(5).

       See wndb(5) for a thorough discussion of the WordNet database files.

   File Format
       The sense index file lists all of the senses in the WordNet database,
       with each line representing one sense. The file is in alphabetical
       order, fields are separated by one space, and each line is terminated
       with a newline character.

       Each line is of the form:

              sense_key  synset_offset  sense_number  tag_cnt

       sense_key is an encoding of the word sense. Programs can construct a
       sense key in this format and use it as a binary search key into the
       sense index file. The format of a sense_key is described below.
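
       As a rough illustration of the binary search, the Python sketch
       below searches a list of index lines already read into memory. It
       does not seek within the file itself, and it assumes the file's
       alphabetical order agrees with Python's string comparison. The name
       find_sense and the sample lines (including their offsets and counts)
       are invented for the example.

```python
import bisect


def find_sense(lines, sense_key):
    """Return the index line whose first field is sense_key, or None."""
    # Every line begins with "sense_key ", so a probe ending in the field
    # separator sorts immediately before the matching line, if one exists.
    probe = sense_key + " "
    i = bisect.bisect_left(lines, probe)
    if i < len(lines) and lines[i].startswith(probe):
        return lines[i]
    return None


# Invented sample data in sorted order; not taken from a real index.sense.
sample_lines = [
    "cat%1:05:00:: 02121620 1 18",
    "dog%1:05:00:: 02084071 1 42",
    "dog%2:38:00:: 02001858 1 2",
]
print(find_sense(sample_lines, "dog%1:05:00::"))
```

       For a full-size index, the same comparison could be applied while
       seeking within the file instead of loading every line.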

       synset_offset is the byte offset at which the synset containing the
       sense is found in the database "data" file corresponding to the part
       of speech encoded in the sense_key. synset_offset is an 8 digit,
       zero-filled decimal integer, and can be used with fseek(3) to read a
       synset from the data file. When synset_offset is passed to the WordNet
       library function read_synset() along with the syntactic category,
       the function returns a data structure containing the parsed synset.
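
       The seek-and-read step can be sketched in Python, using the file
       object's seek() in place of fseek(3). The helper name
       read_synset_line is hypothetical, and the demonstration uses a
       throwaway file with invented contents rather than a real data file.

```python
import os
import tempfile


def read_synset_line(data_path, synset_offset):
    """Return the raw line that starts at synset_offset in a data file."""
    # Binary mode keeps the offset in bytes, as fseek(3) would treat it.
    with open(data_path, "rb") as data_file:
        data_file.seek(int(synset_offset))  # int() accepts zero-filled digits
        raw = data_file.readline()
    return raw.decode("utf-8", errors="replace").rstrip("\n")


# Stand-in for a data file: two invented records, the second at byte 13.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"first record\nsecond record\n")
demo_line = read_synset_line(tmp.name, "00000013")
os.remove(tmp.name)
print(demo_line)
```

       This returns the unparsed line only; see wndb(5) for the layout of
       a synset record.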

       sense_number is a decimal integer indicating the sense number of the
       word, within the part of speech encoded in sense_key, in the WordNet
       database. See wndb(5) for information about how sense numbers are
       assigned.

       tag_cnt is a decimal integer giving the number of times the sense is
       tagged in various semantic concordance texts. A tag_cnt of 0 indicates
       that the sense has not been semantically tagged.
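
       Taken together, the four fields can be read from a line with a
       short Python sketch such as the one below. The names SenseIndexEntry
       and parse_index_line are hypothetical, and the sample line is
       invented.

```python
from typing import NamedTuple


class SenseIndexEntry(NamedTuple):
    sense_key: str
    synset_offset: int
    sense_number: int
    tag_cnt: int


def parse_index_line(line: str) -> SenseIndexEntry:
    """Split one sense index line into its four fields."""
    # A sense_key contains no spaces (collocations use underscores),
    # so splitting on whitespace yields exactly four fields.
    sense_key, offset, number, count = line.split()
    return SenseIndexEntry(sense_key, int(offset), int(number), int(count))


# Invented sample line; not taken from a real index.sense.
entry = parse_index_line("dog%1:05:00:: 02084071 1 42\n")
print(entry)
```

       Converting synset_offset to an integer drops the zero fill; keep
       the original string if the 8 digit form is needed later.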

   Sense Key Encoding
       A sense_key is represented as:

              lemma%lex_sense

       where lex_sense is encoded as:

              ss_type:lex_filenum:lex_id:head_word:head_id

       lemma is the ASCII text of the word or collocation as found in the
       WordNet database index file corresponding to the sense's part of
       speech (pos). lemma is in lower case, and collocations are formed by
       joining individual words with an underscore (_) character.

       ss_type is a one digit decimal integer representing the synset type for
       the sense. See Synset Type below for a listing of the numbers
       corresponding to each synset type.

       lex_filenum is a two digit decimal integer representing the name of the
       lexicographer file containing the synset for the sense. See
       lexnames(5) for the list of lexicographer file names and their
       corresponding numbers.

       lex_id is a two digit decimal integer that, when appended onto lemma,
       uniquely identifies a sense within a lexicographer file. lex_id
       numbers usually start with 00, and are incremented as additional senses
       of the word are added to the same file, although there is no
       requirement that the numbers be consecutive or begin with 00. Note
       that a value of 00 is the default, and therefore is not present in
       lexicographer files. Only non-default lex_id values must be explicitly
       assigned in lexicographer files. See wninput(5) for information on the
       format of lexicographer files.

       head_word is only present if the sense is in an adjective satellite
       synset. It is the lemma of the first word of the satellite's head
       synset.

       head_id is a two digit decimal integer that, when appended onto
       head_word, uniquely identifies the sense of head_word within a
       lexicographer file, as described for lex_id. There is a value in this
       field only if head_word is present.

   Synset Type
       The synset type is encoded as follows:

              1    NOUN
              2    VERB
              3    ADJECTIVE
              4    ADVERB
              5    ADJECTIVE SATELLITE

NOTES

       For non-satellite senses the head_word and head_id fields have no
       values; however, the field separator character (:) is still present.
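
       Because the separators are always present, a sense_key can be split
       into a fixed number of fields. The Python sketch below illustrates
       this; the function name parse_sense_key, the dictionary layout, and
       both sample keys are illustrative assumptions rather than part of
       the WordNet library.

```python
SS_TYPES = {
    1: "NOUN",
    2: "VERB",
    3: "ADJECTIVE",
    4: "ADVERB",
    5: "ADJECTIVE SATELLITE",
}


def parse_sense_key(sense_key):
    """Split a sense_key into lemma and the five lex_sense fields."""
    lemma, lex_sense = sense_key.rsplit("%", 1)
    # All four ':' separators are present even when head_word and
    # head_id are empty, so the split always yields five fields.
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    return {
        "lemma": lemma,
        "ss_type": SS_TYPES[int(ss_type)],
        "lex_filenum": int(lex_filenum),
        "lex_id": int(lex_id),
        "head_word": head_word or None,
        "head_id": int(head_id) if head_id else None,
    }


# Illustrative keys: a non-satellite sense and a satellite sense.
noun_key = parse_sense_key("dog%1:05:00::")
satellite_key = parse_sense_key("fast%5:00:00:quick:00")
print(noun_key)
print(satellite_key)
```

       Empty head_word and head_id fields are reported as None here, which
       keeps them distinct from a head_id of 00.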

ENVIRONMENT VARIABLES (UNIX)

       WNHOME              Base directory for WordNet. Default is
                           /usr/local/WordNet-3.0.

       WNSEARCHDIR         Directory in which the WordNet database has been
                           installed. Default is WNHOME/dict.
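
       A program might resolve the location of index.sense from these
       variables roughly as in the Python sketch below. The function name
       sense_index_path is hypothetical, and the precedence shown
       (WNSEARCHDIR first, then WNHOME/dict, then the documented default)
       is an assumption based on the defaults listed above.

```python
import os


def sense_index_path(environ=os.environ):
    """Build the path to index.sense from WNSEARCHDIR or WNHOME."""
    # Fall back to the documented default base directory.
    home = environ.get("WNHOME", "/usr/local/WordNet-3.0")
    # Assumption: an explicit WNSEARCHDIR overrides WNHOME/dict.
    search_dir = environ.get("WNSEARCHDIR", os.path.join(home, "dict"))
    return os.path.join(search_dir, "index.sense")


print(sense_index_path())
```

       Passing a plain dictionary instead of os.environ makes the lookup
       easy to exercise without changing the real environment.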

REGISTRY (WINDOWS)

       HKEY_LOCAL_MACHINE\SOFTWARE\WordNet\3.0\WNHome
                           Base directory for WordNet. Default is
                           C:\Program Files\WordNet\3.0.

FILES

       index.sense         sense index

