WNINPUT(5)                WordNet™ File Formats                WNINPUT(5)

NAME
noun.suffix, verb.suffix, adj.suffix, adv.suffix - WordNet
lexicographer files that are input to grind(1)

DESCRIPTION
WordNet's source files are written by lexicographers. They are the
product of a detailed relational analysis of lexical semantics: a
variety of lexical and semantic relations are used to represent the
organization of lexical knowledge. Two kinds of building blocks are
distinguished in the source files: word forms and word meanings. Word
forms are represented in their familiar orthography; word meanings are
represented by synonym sets (synsets) - lists of synonymous word forms
that are interchangeable in some context. Two kinds of relations are
recognized: lexical and semantic. Lexical relations hold between word
forms; semantic relations hold between word meanings.

Lexicographer files correspond to the syntactic categories implemented
in WordNet - noun, verb, adjective and adverb. All of the synsets in a
lexicographer file are in the same syntactic category. Each synset
consists of a list of synonymous words or collocations (e.g. "fountain
pen", "take in"), and pointers that describe the relations between this
synset and other synsets. These relations include (but are not limited
to) hypernymy/hyponymy, antonymy, entailment, and meronymy/holonymy. A
word or collocation may appear in more than one synset, and in more
than one part of speech. Each use of a word in a synset represents a
sense of that word in the part of speech corresponding to the synset.

Adjectives may be organized into clusters containing head synsets and
satellite synsets. Adverbs generally point to the adjectives from
which they are derived.

See wngloss(7) for a glossary of WordNet terminology and a discussion
of the database's content and logical organization.

Lexicographer File Names
The names of the lexicographer files are of the form:

    pos.suffix

where pos is either noun, verb, adj or adv. suffix may be used to
organize groups of synsets into different files, for example
noun.animal and noun.plant. See lexnames(5) for a list of
lexicographer file names that are used in building WordNet.

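The pos.suffix naming rule can be checked mechanically. The following is a minimal sketch only; split_lex_filename and VALID_POS are hypothetical names introduced for illustration and are not part of WordNet or grind(1).

```python
# Hypothetical helper: split a lexicographer file name of the form
# pos.suffix and verify that pos is one of the four syntactic categories.
VALID_POS = ("noun", "verb", "adj", "adv")

def split_lex_filename(name):
    pos, sep, suffix = name.partition(".")
    if pos not in VALID_POS or not sep or not suffix:
        raise ValueError(f"not a lexicographer file name: {name!r}")
    return pos, suffix
```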
Pointers
Pointers are used to represent the relations between the words in one
synset and another. Semantic pointers represent relations between word
meanings, and therefore pertain to all of the words in the source and
target synsets. Lexical pointers represent relations between word
forms, and pertain only to specific words in the source and target
synsets. The following pointer types are usually used to indicate
lexical relations: Antonym, Pertainym, Participle, Also See,
Derivationally Related. The remaining pointer types are generally used
to represent semantic relations.

A relation from a source to a target synset is formed by specifying a
word from the target synset in the source synset, followed by the
pointer_symbol indicating the pointer type. The location of a pointer
within a synset defines it as either lexical or semantic. The
Lexicographer File Format section describes the syntax for entering a
semantic pointer, and Word Syntax describes the syntax for entering a
lexical pointer.

Although there are many pointer types, only certain types of relations
are permitted between synsets of each syntactic category.

The pointer_symbols for nouns are:
    !    Antonym
    @    Hypernym
    @i   Instance Hypernym
    ~    Hyponym
    ~i   Instance Hyponym
    #m   Member holonym
    #s   Substance holonym
    #p   Part holonym
    %m   Member meronym
    %s   Substance meronym
    %p   Part meronym
    =    Attribute
    +    Derivationally related form
    ;c   Domain of synset - TOPIC
    -c   Member of this domain - TOPIC
    ;r   Domain of synset - REGION
    -r   Member of this domain - REGION
    ;u   Domain of synset - USAGE
    -u   Member of this domain - USAGE

The pointer_symbols for verbs are:
    !    Antonym
    @    Hypernym
    ~    Hyponym
    *    Entailment
    >    Cause
    ^    Also see
    $    Verb Group
    +    Derivationally related form
    ;c   Domain of synset - TOPIC
    ;r   Domain of synset - REGION
    ;u   Domain of synset - USAGE

The pointer_symbols for adjectives are:
    !    Antonym
    &    Similar to
    <    Participle of verb
    \    Pertainym (pertains to noun)
    =    Attribute
    ^    Also see
    ;c   Domain of synset - TOPIC
    ;r   Domain of synset - REGION
    ;u   Domain of synset - USAGE

The pointer_symbols for adverbs are:
    !    Antonym
    \    Derived from adjective
    ;c   Domain of synset - TOPIC
    ;r   Domain of synset - REGION
    ;u   Domain of synset - USAGE

Many pointer types are reflexive, meaning that if a synset contains a
pointer to another synset, the other synset should contain a
corresponding reflexive pointer. grind(1) automatically inserts
missing reflexive pointers for the following pointer types:

┌───────────────────────┬────────────────────────┐
│Pointer                │ Reflect                │
├───────────────────────┼────────────────────────┤
│Antonym                │ Antonym                │
│Hyponym                │ Hypernym               │
│Hypernym               │ Hyponym                │
│Instance Hyponym       │ Instance Hypernym      │
│Instance Hypernym      │ Instance Hyponym       │
│Holonym                │ Meronym                │
│Meronym                │ Holonym                │
│Similar to             │ Similar to             │
│Attribute              │ Attribute              │
│Verb Group             │ Verb Group             │
│Derivationally Related │ Derivationally Related │
│Domain of synset       │ Member of Domain       │
└───────────────────────┴────────────────────────┘
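
The table of reflexive pointers can be expressed as a lookup keyed by pointer_symbol. This is an illustrative sketch, not the logic grind(1) actually uses: REFLECT and reflexive_symbol are hypothetical names, the hyponym symbol is written as an ASCII tilde, and pairing each holonym subtype with the meronym of the same subtype (member, substance, part) is an assumption.

```python
# Hypothetical lookup from a pointer_symbol to its reflexive counterpart.
REFLECT = {
    "!": "!",                                # Antonym
    "~": "@", "@": "~",                      # Hyponym / Hypernym
    "~i": "@i", "@i": "~i",                  # Instance Hyponym / Hypernym
    "#m": "%m", "#s": "%s", "#p": "%p",      # Holonym -> Meronym
    "%m": "#m", "%s": "#s", "%p": "#p",      # Meronym -> Holonym
    "&": "&",                                # Similar to
    "=": "=",                                # Attribute
    "$": "$",                                # Verb Group
    "+": "+",                                # Derivationally Related
    ";c": "-c", ";r": "-r", ";u": "-u",      # Domain -> Member of Domain
}

def reflexive_symbol(symbol):
    """Return the reflexive pointer_symbol, or None if the table lists none."""
    return REFLECT.get(symbol)
```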

Verb Frames
Each verb synset contains a list of generic sentence frames
illustrating the types of simple sentences in which the verbs in the
synset can be used. For some verb senses, example sentences
illustrating actual uses of the verb are provided. (See Verb Example
Sentences in wndb(5).) Whenever there is no example sentence, the
generic sentence frames specified by the lexicographer are used. The
generic sentence frames are entered in a synset as a comma-separated
list of integer frame numbers. The following list is the text of the
generic frames, preceded by their frame numbers:

    1   Something ----s
    2   Somebody ----s
    3   It is ----ing
    4   Something is ----ing PP
    5   Something ----s something Adjective/Noun
    6   Something ----s Adjective/Noun
    7   Somebody ----s Adjective
    8   Somebody ----s something
    9   Somebody ----s somebody
    10  Something ----s somebody
    11  Something ----s something
    12  Something ----s to somebody
    13  Somebody ----s on something
    14  Somebody ----s somebody something
    15  Somebody ----s something to somebody
    16  Somebody ----s something from somebody
    17  Somebody ----s somebody with something
    18  Somebody ----s somebody of something
    19  Somebody ----s something on somebody
    20  Somebody ----s somebody PP
    21  Somebody ----s something PP
    22  Somebody ----s PP
    23  Somebody's (body part) ----s
    24  Somebody ----s somebody to INFINITIVE
    25  Somebody ----s somebody INFINITIVE
    26  Somebody ----s that CLAUSE
    27  Somebody ----s to somebody
    28  Somebody ----s to INFINITIVE
    29  Somebody ----s whether INFINITIVE
    30  Somebody ----s somebody into V-ing something
    31  Somebody ----s something with something
    32  Somebody ----s INFINITIVE
    33  Somebody ----s VERB-ing
    34  It ----s that CLAUSE
    35  Something ----s INFINITIVE

Lexicographer File Format
Synsets are entered one per line, and each line is terminated with a
newline character. A line containing a synset may be as long as
necessary, but no newlines can be entered within a synset. Within a
synset, spaces or tabs may be used to separate entities. Items
enclosed in italicized square brackets are optional.

The general synset syntax is:

    { words pointers ( gloss ) }

Synsets of this form are valid for all syntactic categories except
verb, and are referred to as basic synsets. At least one word and a
gloss are required to form a valid synset. Pointers entered following
all the words in a synset represent semantic relations between all the
words in the source and target synsets.

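As a rough illustration of the basic synset syntax, the sketch below separates the gloss from the rest of a one-line synset. split_synset is a hypothetical helper, not the parser used by grind(1). It relies on the observation that words end in a comma and pointers end in a pointer_symbol, so a closing parenthesis at the end of the synset body can only belong to the gloss; it also assumes parentheses inside the gloss are balanced, and it tolerates a missing gloss.

```python
def split_synset(line):
    """Split '{ ... ( gloss ) }' into (whitespace-separated tokens, gloss)."""
    text = line.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError(f"not a synset line: {line!r}")
    inner = text[1:-1].strip()
    gloss = None
    if inner.endswith(")"):
        # Walk backwards to the parenthesis that opens the gloss,
        # skipping over any balanced parentheses inside it.
        depth = 0
        for i in range(len(inner) - 1, -1, -1):
            if inner[i] == ")":
                depth += 1
            elif inner[i] == "(":
                depth -= 1
                if depth == 0:
                    gloss = inner[i + 1:-1].strip()
                    inner = inner[:i].strip()
                    break
    return inner.split(), gloss
```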
For verbs, the basic synset syntax is defined as follows:

    { words pointers frames ( gloss ) }

Adjectives may be organized into clusters containing one or more head
synsets and optional satellite synsets. Adjective clusters are of the
form:

    [
    head synset
    [satellite synsets]
    [-]
    [additional head/satellite synsets]
    ]

Each adjective cluster is enclosed in square brackets, and may have one
or more parts. Each part consists of a head synset and optional
satellite synsets that are conceptually similar to the head synset's
meaning. Parts of a cluster are separated by one or more hyphens (-)
on a line by themselves, with the terminating square bracket following
the last synset. Head and satellite synsets follow the syntax of basic
synsets; however, a "Similar to" pointer must be specified in a head
synset for each of its satellite synsets. Most adjective clusters
contain two antonymous parts. See wngloss(7) for a discussion of
adjective clusters, and Special Adjective Syntax for more information
on adjective cluster syntax.

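The cluster layout described above can be sketched as a small splitter that groups the synset lines of one cluster into parts. split_cluster is a hypothetical helper; it assumes the enclosing square brackets and the hyphen separators each sit on a line of their own, as in the form shown above, which may not hold for every real lexicographer file.

```python
def split_cluster(lines):
    """Group the synset lines of one adjective cluster into parts."""
    parts = [[]]
    for raw in lines:
        line = raw.strip()
        if not line or line in ("[", "]"):
            continue                 # cluster delimiters and blank lines
        if set(line) == {"-"}:
            parts.append([])         # one or more hyphens start a new part
            continue
        parts[-1].append(line)
    # The first synset of each part is the head; the rest are satellites.
    return [{"head": p[0], "satellites": p[1:]} for p in parts if p]
```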
Synsets for relational adjectives (pertainyms) and participial
adjectives do not adhere to the cluster structure. They use the basic
synset syntax.

Comments can be entered in a lexicographer file by enclosing the text
of the comment in parentheses. Note that comments cannot appear within
a synset, as parentheses within a synset have an entirely different
meaning (see Gloss Syntax). However, entire synsets (or adjective
clusters) can be "commented out" by enclosing them in parentheses.
Lexicographers often do this to verify the syntax of files under
development, or to leave themselves a note while working on entries.

Word Syntax
A synset must have at least one word, and the words of a synset must
appear after the opening brace and before any other synset constructs.
A word may be entered in either the simple word or word/pointer syntax.

A simple word is of the form:

    word[ ( marker ) ][lex_id] ,

word may be entered in any combination of upper and lower case unless
it is in an adjective cluster. A collocation is entered by joining the
individual words with an underscore character (_). Numbers (integer or
real) may be entered, either by themselves or as part of a word string,
by following the number with a double quote (").

See Special Adjective Syntax for a description of adjective clusters
and markers.

word may be followed by an integer lex_id from 1 to 15. The lex_id is
used to distinguish different senses of the same word within a
lexicographer file. The lexicographer assigns lex_id values, usually
in ascending order, although there is no requirement that the numbers
be consecutive. The default is 0, and does not have to be specified.
A lex_id must be used on pointers if the desired sense has a non-zero
lex_id in its synset specification.

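A simple word can be taken apart with a regular expression. The sketch below is illustrative only: parse_simple_word and SIMPLE_WORD are hypothetical names, and the pattern assumes the order stated in Special Adjective Syntax (the marker follows the lex_id), with no spaces inside the token. It does not handle numbers terminated by a double quote.

```python
import re

# Assumed shape: word, optional lex_id (1-15), optional syntactic marker,
# then the trailing comma.
SIMPLE_WORD = re.compile(
    r"^(?P<word>.+?)(?P<lex_id>1[0-5]|[1-9])?(?:\((?P<marker>ip|a|p)\))?,$"
)

def parse_simple_word(token):
    match = SIMPLE_WORD.match(token)
    if match is None:
        raise ValueError(f"not a simple word: {token!r}")
    lex_id = int(match.group("lex_id") or 0)   # lex_id defaults to 0
    return match.group("word"), lex_id, match.group("marker")
```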
Word/pointer syntax is of the form:

    [ word[ ( marker ) ][lex_id] , pointers ]

This syntax is used when one or more pointers correspond only to the
specific word in the word/pointer set, rather than all the words in the
synset, and represents a lexical relation. Note that a word/pointer
set appears within a synset, therefore the square brackets used to
enclose it are treated differently from those used to define an
adjective cluster. Only one word can be specified in each word/pointer
set, and any number of pointers may be included. A synset can have any
number of word/pointer sets. Each is treated by grind(1) essentially
as a word, so they all must appear before any synset pointers
representing semantic relations.

For verbs, the word/pointer syntax is extended in the following manner
to allow the user to specify generic sentence frames that, like
pointers, correspond only to a specific word, rather than all the words
in the synset. In this case, pointers are optional.

    [ word , [pointers] frames ]

Pointer Syntax
Pointers are optional in synsets. If a pointer is specified outside of
a word/pointer set, the relation is applied to all of the words in the
synset, including any words specified using the word/pointer syntax.
This indicates a semantic relation between the meanings of the words in
the synsets. If specified within a word/pointer set, the relation
corresponds only to the word in the set and represents a lexical
relation.

A pointer is of the form:

    [lex_filename: ]word[lex_id],pointer_symbol

or:

    [lex_filename: ]word[lex_id]^word[lex_id],pointer_symbol

For pointers, word indicates a word in another synset. When the second
form of a pointer is used, the first word indicates a word in a head
synset, and the second is a word in a satellite of that cluster. word
may be followed by a lex_id that is used to match the pointer to the
correct target synset. The synset containing word may reside in
another lexicographer file. In this case, word is preceded by
lex_filename as shown.

See Pointers for a list of pointer_symbols and their meanings.

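Both pointer forms can be split with ordinary string operations. The following is a sketch under the assumption that a pointer is a single whitespace-free token, as in the samples in this manual page; parse_pointer is a hypothetical name, and any lex_id is left attached to the word.

```python
def parse_pointer(token):
    """Return (lex_filename, word, satellite_word, pointer_symbol)."""
    # Split off the pointer_symbol first, so that a '^' used as the
    # "Also see" symbol is not mistaken for the head^satellite separator.
    target, comma, symbol = token.rpartition(",")
    if not comma or not target or not symbol:
        raise ValueError(f"not a pointer: {token!r}")
    lex_filename, _, words = target.rpartition(":")
    head, _, satellite = words.strip().partition("^")
    return (lex_filename or None, head, satellite or None, symbol)
```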
Verb Frame List Syntax
Frame numbers corresponding to generic sentence frames must be entered
in each verb synset. If a frame list is specified outside of a
word/pointer set, the verb frames in the list apply to all of the words
in the synset, including any words specified using the word/pointer
syntax. If specified within a word/pointer set, the verb frames in the
list correspond only to the word in the set.

A frame number list is entered as follows:

    frames: f_num[,f_num...]

Where f_num specifies a generic frame number. See Verb Frames for a
list of generic sentences and their corresponding frame numbers.

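A frame list can be read and mapped back to the generic sentences as sketched below. parse_frame_list and FRAME_TEXT are hypothetical names; FRAME_TEXT holds only three of the 35 frames from Verb Frames, and the range check simply reflects that numbering.

```python
# Excerpt of the generic frames listed under Verb Frames.
FRAME_TEXT = {
    1: "Something ----s",
    8: "Somebody ----s something",
    10: "Something ----s somebody",
}

def parse_frame_list(text):
    """Parse 'frames: f_num[,f_num...]' into a list of frame numbers."""
    label, colon, numbers = text.partition(":")
    if label.strip() != "frames" or not colon:
        raise ValueError(f"not a frame list: {text!r}")
    frames = [int(n) for n in numbers.split(",")]   # int() ignores spaces
    for f_num in frames:
        if not 1 <= f_num <= 35:
            raise ValueError(f"unknown frame number: {f_num}")
    return frames
```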
Gloss Syntax
A gloss is included in all synsets. The lexicographer may enter a text
string of any length desired. A gloss is simply a string enclosed in
parentheses with no embedded carriage returns. It provides a
definition of what the synset represents and/or example sentences.

Special Adjective Syntax
The syntax for representing antonymous adjective synsets requires
several additional conditions.

The first word of a head synset must be entered in upper case, and can
be thought of as the head word of the head synset. The word part of a
pointer from one head synset to another head synset within the same
cluster (usually an antonym) must also be entered in upper case.
Usually antonymous adjectives are entered using the word/pointer syntax
described in Word Syntax to indicate a lexical relation. There is no
restriction on the number of parts that a cluster may have, and some
clusters have three parts, representing antonymous triplets, such as
solid, liquid, and gas.

A cross-cluster pointer may be specified, allowing a head or satellite
synset to point to a head synset in a different cluster. A
cross-cluster pointer is indicated by entering the word part of the
pointer in upper case.

An adjective may be annotated with a syntactic marker indicating a
limitation on the syntactic position the adjective may have in relation
to the noun that it modifies. If so marked, the marker appears between
the word and its following comma. If a lex_id is specified, the marker
immediately follows it. The syntactic markers are:
    (p)    predicate position
    (a)    prenominal (attributive) position
    (ip)   immediately postnominal position

EXAMPLES
(Note that these are hypothetical examples not found in the WordNet
lexicographer files.)

Sample noun synsets:
    { canine, [ dog1, cat,! ] pooch, canid,@ }
    { collie, dog1,@ (large multi-colored dog with pointy nose) }
    { hound, hunting_dog, pack,#m dog1,@ }
    { dog, }

Sample verb synsets:
    { [ confuse, clarify,! frames: 1 ] blur, obscure, frames: 8, 10 }
    { [ clarify, confuse,! ] make_clear, interpret,@ frames: 8 }
    { interpret, construe, understand,@ frames: 8 }

Sample adjective clusters:
    [
    { [ HOT, COLD,! ] lukewarm(a), TEPID,^ (hot to the touch) }
    { warm, }
    -
    { [ COLD, HOT,! ] frigid, (cold to the touch) }
    { freezing, }
    ]

Sample adverb synsets:
    { [ basically, adj.all:essential^basic,\ ] [ essentially, adj.all:basic^fundamental,\ ] ( by one's very nature )}
    { pointedly, adj.all:pungent^pointed,\ }
    { [ badly, adj.all:bad,\ well,! ] ill, ("He was badly prepared") }

SEE ALSO
grind(1), wnintro(5), lexnames(5), wndb(5), uniqbeg(7), wngloss(7).

Fellbaum, C. (1998), ed. "WordNet: An Electronic Lexical Database".
MIT Press, Cambridge, MA.

WordNet 3.0                       Dec 2006                     WNINPUT(5)