WNSEARCH(3)              WordNet™ Library Functions              WNSEARCH(3)

NAME
findtheinfo, findtheinfo_ds, is_defined, in_wn, index_lookup,
parse_index, getindex, read_synset, parse_synset, free_syns,
free_synset, free_index, traceptrs_ds, do_trace - functions for
searching the WordNet database

SYNOPSIS
#include "wn.h"

char *findtheinfo(char *searchstr, int pos, int ptr_type,
                  int sense_num);

SynsetPtr findtheinfo_ds(char *searchstr, int pos, int ptr_type,
                         int sense_num);

unsigned int is_defined(char *searchstr, int pos);

unsigned int in_wn(char *searchstr, int pos);

IndexPtr index_lookup(char *searchstr, int pos);

IndexPtr parse_index(long offset, int dbase, char *line);

IndexPtr getindex(char *searchstr, int pos);

SynsetPtr read_synset(int pos, long synset_offset, char *searchstr);

SynsetPtr parse_synset(FILE *fp, int pos, char *searchstr);

void free_syns(SynsetPtr synptr);

void free_synset(SynsetPtr synptr);

void free_index(IndexPtr idx);

SynsetPtr traceptrs_ds(SynsetPtr synptr, int ptr_type, int pos,
                       int depth);

char *do_trace(SynsetPtr synptr, int ptr_type, int pos, int depth);

DESCRIPTION
These functions search the WordNet database. They fall into four
categories: functions for reading and parsing index file entries;
functions for reading and parsing synsets in data files; functions for
tracing pointers and hierarchies; and functions for freeing space
occupied by data structures allocated with malloc(3).

In the following function descriptions, pos is one of the following:

    1    NOUN
    2    VERB
    3    ADJECTIVE
    4    ADVERB

findtheinfo() is the primary search algorithm for use with database
interface applications. Search results are automatically formatted,
and a pointer to the text buffer is returned. All searches listed in
WNHOME/include/wn.h can be done by findtheinfo(). findtheinfo_ds() can
be used to perform most of the searches, with results returned in a
linked list data structure. This is for use with applications that
need to analyze the search results rather than just display them.

Both functions are passed the same arguments: searchstr is the word or
collocation to search for; pos indicates the syntactic category to
search in; ptr_type is one of the valid search types for searchstr in
pos. (Available searches can be obtained by calling is_defined(),
described below.) sense_num should be ALLSENSES if the search is to be
done on all senses of searchstr in pos, or a positive integer
indicating which sense to search.

findtheinfo_ds() returns a linked list of data structures representing
synsets. Senses are linked through the nextss field of a Synset data
structure. For each sense, synsets that match the search specified
with ptr_type are linked through the ptrlist field. See Synset
Navigation, below, for detailed information on the linked lists
returned.

is_defined() sets a bit for each search type that is valid for
searchstr in pos, and returns the resulting unsigned integer. Each bit
number corresponds to a pointer type constant defined in
WNHOME/include/wn.h. For example, if bit 2 is set, the HYPERPTR search
is valid for searchstr. There are 29 possible searches.

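The bit test described above can be sketched as follows. This is only
an illustration, not library code: EX_HYPERPTR and search_is_valid()
are hypothetical names local to the example, and the value 2 is copied
from the ptr_type table in this page. A real program would take the
constant from wn.h.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical local stand-in for HYPERPTR (value 2 in the ptr_type
 * table); real code would include "wn.h" instead. */
enum { EX_HYPERPTR = 2 };

/* Return nonzero if the bit numbered ptr_type is set in a mask of the
 * kind is_defined() is documented to return. */
static int search_is_valid(unsigned int mask, int ptr_type)
{
    /* Shifting by the width of the type or more is undefined in C. */
    assert(ptr_type >= 0 &&
           ptr_type < (int)(sizeof(unsigned int) * CHAR_BIT));
    return (mask & (1u << ptr_type)) != 0;
}
```

Given a mask obtained from is_defined(searchstr, pos), a call such as
search_is_valid(mask, EX_HYPERPTR) would report whether a hypernym
search is worth attempting.
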
in_wn() is used to find the syntactic categories in the WordNet
database that contain one or more senses of searchstr. If pos is
ALL_POS, all syntactic categories are checked. Otherwise, only the
part of speech passed is checked. An unsigned integer is returned with
a bit set corresponding to each syntactic category containing
searchstr. The bit number matches the number for the part of speech.
0 is returned if searchstr is not present in pos.

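A value of the kind in_wn() returns could be decoded along these
lines. The helper describe_pos_mask() and its name table are
hypothetical and exist only in this sketch; the bit numbers 1 to 4
follow the pos list given earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the names of the parts of speech whose bits are set in mask
 * (bit 1 = noun ... bit 4 = adverb) into out, separated by spaces.
 * Returns the number of names written. */
static int describe_pos_mask(unsigned int mask, char *out, size_t outsz)
{
    static const char *const names[] = {
        NULL, "noun", "verb", "adjective", "adverb"
    };
    int written = 0;
    int pos;

    assert(out != NULL);
    if (outsz == 0)
        return 0;
    out[0] = '\0';
    for (pos = 1; pos <= 4; pos++) {
        if ((mask & (1u << pos)) == 0)
            continue;
        /* Stop if the name, a separator and the NUL would not fit. */
        if (strlen(out) + strlen(names[pos]) + 2 > outsz)
            break;
        if (written > 0)
            strcat(out, " ");
        strcat(out, names[pos]);
        written++;
    }
    return written;
}
```
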
index_lookup() finds searchstr in the index file for pos and returns a
pointer to the parsed entry in an Index data structure. searchstr must
exactly match the form of the word (lower case only, hyphens and
underscores in the same places) in the index file. NULL is returned if
a match is not found.

parse_index() parses an entry from an index file and returns a pointer
to the parsed entry in an Index data structure. Passed the byte offset
and syntactic category, it reads the index entry at the desired
location in the corresponding file. If line is passed, it contains an
index file entry and the database index file is not consulted.
However, offset and dbase should still be passed so the information
can be stored in the Index structure.

getindex() is a "smart" search for searchstr in the index file
corresponding to pos. It transforms searchstr by replacing underscores
with hyphens, replacing hyphens with underscores, removing hyphens and
underscores, and removing periods, in an attempt to find a form of the
string that exactly matches an entry in the index file corresponding
to pos. index_lookup() is called on each transformed string until a
match is found or all the different strings have been tried. It
returns a pointer to the parsed Index data structure for searchstr, or
NULL if a match is not found.

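The string transformations listed above might be produced with helpers
like the ones below. This is a sketch under assumptions, not the
library's implementation: replace_char() and strip_chars() are
hypothetical names, and the order in which getindex() tries its
candidates is not specified on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src to dst, replacing every occurrence of from with to. */
static void replace_char(const char *src, char from, char to,
                         char *dst, size_t dstsz)
{
    size_t i;

    assert(dstsz > strlen(src));
    for (i = 0; src[i] != '\0'; i++)
        dst[i] = (src[i] == from) ? to : src[i];
    dst[i] = '\0';
}

/* Copy src to dst, dropping every character that appears in drop. */
static void strip_chars(const char *src, const char *drop,
                        char *dst, size_t dstsz)
{
    size_t i, j = 0;

    assert(dstsz > strlen(src));
    for (i = 0; src[i] != '\0'; i++)
        if (strchr(drop, src[i]) == NULL)
            dst[j++] = src[i];
    dst[j] = '\0';
}
```

A caller following the description would pass each candidate to
index_lookup() in turn and stop at the first non-NULL result.
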
read_synset() is used to read a synset from a byte offset in a data
file. It performs an fseek(3) to synset_offset in the data file
corresponding to pos, and calls parse_synset() to read and parse the
synset. A pointer to the Synset data structure containing the parsed
synset is returned.

parse_synset() reads the synset at the current offset in the file
indicated by fp. pos is the syntactic category, and searchstr, if not
NULL, indicates the word in the synset that the caller is interested
in. An attempt is made to match searchstr to one of the words in the
synset. If an exact match is found, the whichword field in the Synset
structure is set to that word's number in the synset (counting from
1).

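The exact-match rule for whichword can be pictured with a small
stand-alone function. find_whichword() is a hypothetical name, and the
array of words stands in for the words of a parsed synset. Returning 0
when nothing matches is this example's own convention; this page does
not say what the library stores in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the 1-based position of the first word equal to searchstr,
 * or 0 if searchstr is NULL or no word matches exactly. */
static int find_whichword(const char *const words[], int count,
                          const char *searchstr)
{
    int i;

    assert(count >= 0);
    if (searchstr == NULL)
        return 0;
    for (i = 0; i < count; i++)
        if (strcmp(words[i], searchstr) == 0)
            return i + 1;
    return 0;
}
```

Because the comparison is exact, "motor-car" does not match a synset
word spelled "motor_car".
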
free_syns() is used to free a linked list of Synset structures
allocated by findtheinfo_ds(). synptr is a pointer to the list to
free.

free_synset() frees the Synset structure pointed to by synptr.

free_index() frees the Index structure pointed to by idx.

traceptrs_ds() is a recursive search algorithm that traces pointers
matching ptr_type starting with the synset pointed to by synptr.
Setting depth to 1 when traceptrs_ds() is called indicates a recursive
search; 0 indicates a non-recursive call. synptr points to the data
structure representing the synset to search for a pointer of type
ptr_type. When a pointer type match is found, the synset pointed to is
read and linked onto the nextss chain. Levels of the tree generated by
a recursive search are linked via the ptrlist field until NULL is
found, indicating the top (or bottom) of the tree. This function is
usually called from findtheinfo_ds() for each sense of the word. See
Synset Navigation, below, for detailed information on the linked lists
returned.

do_trace() performs the search indicated by ptr_type on synset synptr
in syntactic category pos. depth is defined as above. do_trace()
returns the search results formatted in a text buffer.

Synset Navigation
Since the Synset structure is used to represent the synsets for both
word senses and pointers, the ptrlist and nextss fields have different
meanings depending on whether the structure is a word sense or
pointer. This can make navigation through the lists returned by
findtheinfo_ds() confusing.

Navigation through the returned list involves the following:

Following the nextss chain from the synset returned moves through the
various senses of searchstr. NULL indicates the end of the chain of
senses.

Following the ptrlist chain from a Synset structure representing a
sense traces the hierarchy of the search results for that sense.
Subsequent links in the ptrlist chain indicate the next level (up or
down, depending on the search) in the hierarchy. NULL indicates the
end of the chain of search result synsets.

If a synset pointed to by ptrlist has a value in the nextss field, it
represents another pointer of the same type at that level in the
hierarchy. For example, some noun synsets have two hypernyms.
Following this nextss pointer, and then the ptrlist chain from the
Synset structure pointed to, traces another, parallel, hierarchy,
until the end is indicated by NULL on that ptrlist chain. So, a synset
representing a pointer (versus a sense of searchstr) having a non-NULL
value in nextss has another chain of search results linked through the
ptrlist chain of the synset pointed to by nextss.

If searchstr contains more than one base form in WordNet (as in the
noun axes, which has base forms axe and axis), synsets representing
the search results for each base form are linked through the nextform
pointer of the Synset structure.

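The navigation rules above can be summarized in a short traversal
sketch. To stay self-contained it uses struct ex_synset, a
hypothetical stand-in that carries only the nextss and ptrlist links;
the real Synset structure in wn.h has many more members, and the
nextform chain is left out here.

```c
#include <stddef.h>

/* Hypothetical stand-in with only the two link fields discussed in
 * this section. */
struct ex_synset {
    struct ex_synset *nextss;
    struct ex_synset *ptrlist;
};

/* Count the senses on the top-level nextss chain. */
static int count_senses(const struct ex_synset *head)
{
    int n = 0;

    for (; head != NULL; head = head->nextss)
        n++;
    return n;
}

/* Count every search-result synset below one sense. The nextss chain
 * is walked for parallel pointers at the current level, and ptrlist
 * is followed recursively for the next level of each.
 * Call as count_results(sense->ptrlist). */
static int count_results(const struct ex_synset *p)
{
    int n = 0;

    for (; p != NULL; p = p->nextss)
        n += 1 + count_results(p->ptrlist);
    return n;
}
```

For a sense with two hypernyms, each of which has one further
hypernym, count_results() visits both parallel chains and returns 4.
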
WordNet Searches
There is no extensive description of what each search type is or the
results returned. Using the WordNet interface, examining the source
code, and reading wndb(5) are the best ways to see what types of
searches are available and the data returned for each.

Listed below are the valid searches that can be passed as ptr_type to
findtheinfo(). Passing a negative value (when applicable) causes a
recursive, hierarchical search by setting depth to 1 when traceptrs()
is called.

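The sign convention for ptr_type can be illustrated as below. The
struct ex_search type and decode_ptr_type() are hypothetical and only
restate the rule given above; how findtheinfo() handles the sign
internally is not documented on this page.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical pair holding the two values derived from a signed
 * ptr_type argument. */
struct ex_search {
    int ptr_type;   /* search type, always positive */
    int depth;      /* 1 = recursive search, 0 = non-recursive */
};

/* Split a signed ptr_type argument into search type and depth. */
static struct ex_search decode_ptr_type(int arg)
{
    struct ex_search s;

    /* 0 is not a search type, and -INT_MIN would overflow. */
    assert(arg != 0 && arg != INT_MIN);
    s.depth = (arg < 0) ? 1 : 0;
    s.ptr_type = (arg < 0) ? -arg : arg;
    return s;
}
```

Under this reading, passing -2 requests the HYPERPTR search
(value 2 in the table) with depth 1.
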
┌─────────────────┬───────┬─────────┬────────────────────────────────────────────┐
│ptr_type         │ Value │ Pointer │ Search                                     │
│                 │       │ Symbol  │                                            │
├─────────────────┼───────┼─────────┼────────────────────────────────────────────┤
│ANTPTR           │    1  │ !       │ Antonyms                                   │
│HYPERPTR         │    2  │ @       │ Hypernyms                                  │
│HYPOPTR          │    3  │ ~       │ Hyponyms                                   │
│ENTAILPTR        │    4  │ *       │ Entailment                                 │
│SIMPTR           │    5  │ &       │ Similar                                    │
│ISMEMBERPTR      │    6  │ #m      │ Member meronym                             │
│ISSTUFFPTR       │    7  │ #s      │ Substance meronym                          │
│ISPARTPTR        │    8  │ #p      │ Part meronym                               │
│HASMEMBERPTR     │    9  │ %m      │ Member holonym                             │
│HASSTUFFPTR      │   10  │ %s      │ Substance holonym                          │
│HASPARTPTR       │   11  │ %p      │ Part holonym                               │
│MERONYM          │   12  │ %       │ All meronyms                               │
│HOLONYM          │   13  │ #       │ All holonyms                               │
│CAUSETO          │   14  │ >       │ Cause                                      │
│PPLPTR           │   15  │ <       │ Participle of verb                         │
│SEEALSOPTR       │   16  │ ^       │ Also see                                   │
│PERTPTR          │   17  │ \       │ Pertains to noun or derived from adjective │
│ATTRIBUTE        │   18  │ =       │ Attribute                                  │
│VERBGROUP        │   19  │ $       │ Verb group                                 │
│DERIVATION       │   20  │ +       │ Derivationally related form                │
│CLASSIFICATION   │   21  │ ;       │ Domain of synset                           │
│CLASS            │   22  │ -       │ Member of this domain                      │
│SYNS             │   23  │ n/a     │ Find synonyms                              │
│FREQ             │   24  │ n/a     │ Polysemy                                   │
│FRAMES           │   25  │ n/a     │ Verb example sentences and generic frames  │
│COORDS           │   26  │ n/a     │ Noun coordinates                           │
│RELATIVES        │   27  │ n/a     │ Group related senses                       │
│HMERONYM         │   28  │ n/a     │ Hierarchical meronym search                │
│HHOLONYM         │   29  │ n/a     │ Hierarchical holonym search                │
│WNGREP           │   30  │ n/a     │ Find keywords by substring                 │
│OVERVIEW         │   31  │ n/a     │ Show all synsets for word                  │
│CLASSIF_CATEGORY │   32  │ ;c      │ Show domain topic                          │
│CLASSIF_USAGE    │   33  │ ;u      │ Show domain usage                          │
│CLASSIF_REGIONAL │   34  │ ;r      │ Show domain region                         │
│CLASS_CATEGORY   │   35  │ -c      │ Show domain terms for topic                │
│CLASS_USAGE      │   36  │ -u      │ Show domain terms for usage                │
│CLASS_REGIONAL   │   37  │ -r      │ Show domain terms for region               │
│INSTANCE         │   38  │ @i      │ Instance of                                │
│INSTANCES        │   39  │ ~i      │ Show instances                             │
└─────────────────┴───────┴─────────┴────────────────────────────────────────────┘

findtheinfo_ds() cannot perform the following searches:

    SEEALSOPTR
    PERTPTR
    VERBGROUP
    FREQ
    FRAMES
    RELATIVES
    WNGREP
    OVERVIEW

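An application that chooses between findtheinfo() and findtheinfo_ds()
might encode the list above as a simple check. The EX_ constants and
ds_search_supported() are hypothetical; the numeric values are copied
from the ptr_type table, and real code would use the names from wn.h.
The function does not verify that its argument is a valid search type.

```c
#include <assert.h>

/* Local copies of table values for the searches that findtheinfo_ds()
 * is documented not to perform. */
enum {
    EX_SEEALSOPTR = 16, EX_PERTPTR = 17, EX_VERBGROUP = 19,
    EX_FREQ = 24, EX_FRAMES = 25, EX_RELATIVES = 27,
    EX_WNGREP = 30, EX_OVERVIEW = 31
};

/* Return 0 if ptr_type is one of the searches listed above, nonzero
 * otherwise. Expects a positive (non-recursive) ptr_type. */
static int ds_search_supported(int ptr_type)
{
    assert(ptr_type > 0);
    switch (ptr_type) {
    case EX_SEEALSOPTR:
    case EX_PERTPTR:
    case EX_VERBGROUP:
    case EX_FREQ:
    case EX_FRAMES:
    case EX_RELATIVES:
    case EX_WNGREP:
    case EX_OVERVIEW:
        return 0;
    default:
        return 1;
    }
}
```
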
NOTES
Applications that use WordNet and/or the morphological functions must
call wninit() at the start of the program. See wnutil(3) for more
information.

In all function calls, searchstr may be either a word or a collocation
formed by joining individual words with underscore characters (_).

The SearchResults structure defines fields in the wnresults global
variable that are set by the various search functions. This is a way
to get additional information, such as the number of senses the word
has, from the search functions. The searchds field is set by
findtheinfo_ds().

The pos passed to traceptrs_ds() is not used.

SEE ALSO
wn(1), wnb(1), wnintro(3), binsrch(3), malloc(3), morph(3), wnutil(3),
wnintro(5).

WARNINGS
parse_synset() must find an exact match between the searchstr passed
and a word in the synset to set whichword. No attempt is made to
translate hyphens and underscores, as is done in getindex().

The WordNet database and exception list files must be opened with
wninit() prior to using any of the searching functions.

A large search may cause findtheinfo() to run out of buffer space. The
maximum buffer size is determined by the computer platform. If the
buffer size is exceeded, the following message is printed in the
output buffer: "Search too large. Narrow search and try again...".

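A caller could look for that message in the returned buffer with a
check like the one below. search_overflowed() is a hypothetical
helper. It matches only the leading words of the message, on the
assumption that the text appears as quoted above, and it treats a NULL
buffer as "no overflow notice".

```c
#include <stddef.h>
#include <string.h>

/* Return nonzero if buf contains the start of the overflow notice
 * quoted in this section, 0 otherwise (including when buf is NULL). */
static int search_overflowed(const char *buf)
{
    if (buf == NULL)
        return 0;
    return strstr(buf, "Search too large") != NULL;
}
```
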
Passing an invalid pos will probably result in a core dump.

WordNet 3.0                       Dec 2006                       WNSEARCH(3)