MORPHY(7)                        WordNet™                        MORPHY(7)

NAME
       morphy - discussion of WordNet's morphological processing

DESCRIPTION
       Although only base forms of words are usually stored in WordNet,
       searches may be done on inflected forms.  A set of morphology
       functions, Morphy, is applied to the search string to generate a
       form that is present in WordNet.

       Morphology in WordNet uses two types of processes to convert the
       string passed in into one that can be found in the WordNet
       database.  There are lists of inflectional endings, based on
       syntactic category, that can be detached from individual words in
       an attempt to find a form of the word that is in WordNet.  There
       are also exception list files, one for each syntactic category, in
       which a search for an inflected form is done.  Morphy tries to use
       these two processes in an intelligent manner to translate the
       string passed to the base form found in WordNet: it first checks
       for exceptions, then applies the rules of detachment.  The Morphy
       functions are not independent of WordNet.  After each
       transformation, WordNet is searched for the resulting string in
       the syntactic category specified.

       The Morphy functions are passed a string and a syntactic category.
       A string is either a single word or a collocation.  Since some
       words, such as axes, can have more than one base form (axe and
       axis), Morphy works in the following manner.  The first time that
       Morphy is called with a specific string, it returns a base form.
       For each subsequent call to Morphy made with a NULL string
       argument, Morphy returns another base form.  Whenever Morphy cannot
       perform a transformation, whether on the first call for a word or
       on subsequent calls, NULL is returned.  Even a transformation that
       yields a valid English string returns NULL if the resulting base
       form is not in WordNet.

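       The calling convention above works like the C library's strtok():
       a non-NULL argument starts a new lookup and a NULL argument resumes
       the previous one.  The following is a minimal C sketch of that
       convention only, not of the WordNet library.  The function
       morphy_sketch() and its tiny in-memory exception table are
       hypothetical; the real entry points and their signatures are
       documented in morph(3).

```c
#include <stddef.h>
#include <string.h>

#define MAX_BASES 3

/* Hypothetical exception data for illustration only; the real lists
   are the exception files in the WordNet database. */
struct exception_entry {
    const char *inflected;
    const char *bases[MAX_BASES];   /* unused slots are NULL */
};

static const struct exception_entry demo_exceptions[] = {
    { "axes", { "axe", "axis", NULL } },
    { "mice", { "mouse", NULL, NULL } },
};

/* Hypothetical strtok()-style interface:
   - a non-NULL word starts a new lookup and returns its first base form;
   - NULL returns the next base form of the previous word;
   - NULL is returned when there is no (further) base form.
   The saved state is static, so this function is not reentrant. */
const char *morphy_sketch(const char *word)
{
    static const struct exception_entry *current = NULL;
    static size_t next = 0;

    if (word != NULL) {
        current = NULL;
        next = 0;
        for (size_t i = 0; i < sizeof demo_exceptions / sizeof demo_exceptions[0]; i++) {
            if (strcmp(demo_exceptions[i].inflected, word) == 0) {
                current = &demo_exceptions[i];
                break;
            }
        }
    }
    if (current == NULL || next >= MAX_BASES || current->bases[next] == NULL)
        return NULL;
    return current->bases[next++];
}
```

       A caller collects every base form of axes by calling
       morphy_sketch("axes") once and then morphy_sketch(NULL) until it
       returns NULL, receiving axe and then axis.  Because the state is
       static, a call with a different word restarts the sequence, just as
       with strtok().
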
       The morphological functions are found in the WordNet library.  See
       morph(3) for information on using these functions.

   Rules of Detachment
       The following table shows the rules of detachment used by Morphy.
       If a word ends with one of the suffixes listed for its syntactic
       category, the suffix is stripped from the word and the
       corresponding ending is added.  WordNet is then searched for the
       resulting string.  No rules are applicable to adverbs.

       POS  │ Suffix │ Ending
       ─────┼────────┼────────
       NOUN │ "s"    │ ""
       NOUN │ "ses"  │ "s"
       NOUN │ "xes"  │ "x"
       NOUN │ "zes"  │ "z"
       NOUN │ "ches" │ "ch"
       NOUN │ "shes" │ "sh"
       NOUN │ "men"  │ "man"
       NOUN │ "ies"  │ "y"
       VERB │ "s"    │ ""
       VERB │ "ies"  │ "y"
       VERB │ "es"   │ "e"
       VERB │ "es"   │ ""
       VERB │ "ed"   │ "e"
       VERB │ "ed"   │ ""
       VERB │ "ing"  │ "e"
       VERB │ "ing"  │ ""
       ADJ  │ "er"   │ ""
       ADJ  │ "est"  │ ""
       ADJ  │ "er"   │ "e"
       ADJ  │ "est"  │ "e"

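       The table can be applied mechanically.  The C sketch below encodes
       the rules in table order and returns the first candidate that a
       lookup function accepts.  The rule table is copied from above; the
       function in_lexicon() and its word list are hypothetical stand-ins
       for a search of the WordNet database, and detach() is likewise an
       illustrative name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum pos { NOUN = 1, VERB, ADJ, ADV };

struct rule {
    enum pos pos;
    const char *suffix;
    const char *ending;
};

/* The rules of detachment from the table above, in table order. */
static const struct rule rules[] = {
    {NOUN, "s", ""},      {NOUN, "ses", "s"},   {NOUN, "xes", "x"},
    {NOUN, "zes", "z"},   {NOUN, "ches", "ch"}, {NOUN, "shes", "sh"},
    {NOUN, "men", "man"}, {NOUN, "ies", "y"},
    {VERB, "s", ""},      {VERB, "ies", "y"},   {VERB, "es", "e"},
    {VERB, "es", ""},     {VERB, "ed", "e"},    {VERB, "ed", ""},
    {VERB, "ing", "e"},   {VERB, "ing", ""},
    {ADJ, "er", ""},      {ADJ, "est", ""},     {ADJ, "er", "e"},
    {ADJ, "est", "e"},
};

/* Hypothetical stand-in for searching WordNet in one syntactic category. */
static bool in_lexicon(const char *word, enum pos pos)
{
    static const struct { const char *word; enum pos pos; } lexicon[] = {
        {"box", NOUN},  {"church", NOUN}, {"woman", NOUN}, {"city", NOUN},
        {"make", VERB}, {"walk", VERB},   {"large", ADJ},
    };
    for (size_t i = 0; i < sizeof lexicon / sizeof lexicon[0]; i++)
        if (lexicon[i].pos == pos && strcmp(lexicon[i].word, word) == 0)
            return true;
    return false;
}

/* Strip each matching suffix, add its ending, and return true with the
   result in out as soon as the lexicon accepts a candidate.  There are
   no ADV rules, so adverbs always fail here. */
static bool detach(const char *word, enum pos pos, char *out, size_t outsz)
{
    size_t wlen = strlen(word);

    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        const struct rule *r = &rules[i];
        size_t slen = strlen(r->suffix);
        size_t elen = strlen(r->ending);

        if (r->pos != pos || slen >= wlen)
            continue;
        if (strcmp(word + wlen - slen, r->suffix) != 0)
            continue;
        if (wlen - slen + elen + 1 > outsz)
            continue;                                 /* would not fit */
        memcpy(out, word, wlen - slen);
        memcpy(out + wlen - slen, r->ending, elen + 1); /* copies the NUL */
        if (in_lexicon(out, pos))
            return true;
    }
    return false;
}
```

       With this lexicon, detach("churches", NOUN, ...) yields church and
       detach("larger", ADJ, ...) yields large, while detach("bigger",
       ADJ, ...) fails because no rule undoes the doubled consonant.
       Irregular forms of that kind are what the exception lists are for.
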
   Exception Lists
       There is one exception list file for each syntactic category.  The
       exception lists contain the morphological transformations for
       strings that are not regular and therefore cannot be processed in
       an algorithmic manner.  Each line of an exception list contains an
       inflected form of a word or collocation, followed by one or more
       base forms.  The list is kept in alphabetical order, and a binary
       search is used to find words in these lists.  See wndb(5) for
       information on the format of the exception list files.

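       Because each exception list is sorted, an inflected form can be
       located by binary search on the first field of each line.  This C
       sketch uses the standard bsearch() over a small in-memory array.
       The array contents and the names find_exception() and
       cmp_first_field() are hypothetical; the library itself searches
       the files on disk, presumably with the routines described in
       binsrch(3), and the exact file format is given in wndb(5).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical excerpt, sorted by first field.  Each entry mimics one
   line: an inflected form followed by one or more base forms. */
static const char *const exc_lines[] = {
    "axes axe axis",
    "children child",
    "geese goose",
    "mice mouse",
};

/* Compare the search key with the first space-delimited field of a
   line.  bsearch() passes the key first and the array element second. */
static int cmp_first_field(const void *key, const void *elem)
{
    const char *k = key;
    const char *line = *(const char *const *)elem;
    size_t n = strcspn(line, " ");          /* length of first field */
    int c = strncmp(k, line, n);

    if (c != 0)
        return c;
    return k[n] == '\0' ? 0 : 1;            /* longer key sorts after */
}

/* Return the whole line whose first field equals word, or NULL. */
static const char *find_exception(const char *word)
{
    const char *const *hit = bsearch(word, exc_lines,
                                     sizeof exc_lines / sizeof exc_lines[0],
                                     sizeof exc_lines[0], cmp_first_field);
    return hit != NULL ? *hit : NULL;
}
```

       find_exception("axes") returns the whole line "axes axe axis"; the
       base forms are the fields after the first.  Since only the first
       field is compared, a key such as axe does not falsely match the
       line for axes.
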
   Single Words
       In general, single words are relatively easy to process.  Morphy
       first looks for the word in the exception list.  If it is found,
       the first base form is returned.  Subsequent calls with a NULL
       argument return additional base forms, if present.  NULL is
       returned when there are no more base forms of the word.

       If the word is not found in the exception list corresponding to
       the syntactic category, an algorithmic process using the rules of
       detachment looks for a matching suffix.  If a matching suffix is
       found, the corresponding ending is applied (sometimes this ending
       is the empty string, so in effect the suffix is simply removed from
       the word), and WordNet is consulted to see if the resulting word is
       found in the desired part of speech.

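       Putting the two steps together for a single noun, the following C
       sketch consults a small exception table first and falls back to
       the noun rules of detachment only when the word is not an
       exception, checking each candidate against an index.  The tables
       and the names noun_base() and in_index() are hypothetical, and
       the sketch returns only the first base form rather than iterating
       over all of them.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define COUNT_OF(a) (sizeof (a) / sizeof (a)[0])

/* Hypothetical data for illustration only. */
static const char *const noun_exc[][2] = {     /* inflected, base form */
    {"geese", "goose"}, {"mice", "mouse"},
};
static const char *const noun_rules[][2] = {   /* suffix, ending */
    {"s", ""},      {"ses", "s"},   {"xes", "x"},   {"zes", "z"},
    {"ches", "ch"}, {"shes", "sh"}, {"men", "man"}, {"ies", "y"},
};
static const char *const noun_index[] = {"box", "dish", "goose", "lady", "mouse"};

static bool in_index(const char *w)
{
    for (size_t i = 0; i < COUNT_OF(noun_index); i++)
        if (strcmp(noun_index[i], w) == 0)
            return true;
    return false;
}

/* Single-word lookup: exception list first, then rules of detachment,
   searching the index after each transformation. */
static bool noun_base(const char *word, char *out, size_t outsz)
{
    size_t n = strlen(word);

    for (size_t i = 0; i < COUNT_OF(noun_exc); i++)
        if (strcmp(noun_exc[i][0], word) == 0) {
            snprintf(out, outsz, "%s", noun_exc[i][1]);
            return true;
        }

    for (size_t i = 0; i < COUNT_OF(noun_rules); i++) {
        size_t sl = strlen(noun_rules[i][0]);
        if (n > sl && strcmp(word + n - sl, noun_rules[i][0]) == 0) {
            snprintf(out, outsz, "%.*s%s", (int)(n - sl), word, noun_rules[i][1]);
            if (in_index(out))
                return true;
        }
    }
    return false;
}
```

       The order matters: mice matches no rule of detachment at all, so
       only the exception lookup can produce mouse, while dishes and
       ladies reach dish and lady through the rules.
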
   Collocations
       As opposed to single words, collocations can be quite difficult to
       transform into a base form that is present in WordNet.  In general,
       only base forms of words are stored in WordNet, even when the words
       make up a collocation, as in attorney general.  Transforming the
       collocation attorneys general is then simply a matter of finding
       the base forms of the individual words comprising the collocation.
       This usually works for nouns; non-conforming nouns, such as customs
       duty, are therefore presently entered in the noun exception list.

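       The word-by-word approach for noun collocations can be sketched in
       C as follows: each space-separated word is reduced to a base form
       when one is found and left unchanged otherwise, and the rejoined
       string is then looked up.  Only the plural "s" rule is applied, and
       both word lists as well as the names word_base() and colloc_base()
       are hypothetical, so this illustrates the idea rather than Morphy's
       own code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define COUNT_OF(a) (sizeof (a) / sizeof (a)[0])

/* Hypothetical indexes for illustration only. */
static const char *const words[] = {"attorney", "line", "product"};
static const char *const collocations[] = {"attorney general", "line of products"};

static bool listed(const char *s, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], s) == 0)
            return true;
    return false;
}

/* Base form of one word: drop a final "s" if that yields a known word,
   otherwise keep the word unchanged. */
static void word_base(const char *w, char *out, size_t outsz)
{
    size_t n;

    snprintf(out, outsz, "%s", w);
    n = strlen(out);
    if (n > 1 && out[n - 1] == 's') {
        out[n - 1] = '\0';
        if (!listed(out, words, COUNT_OF(words)))
            out[n - 1] = 's';                 /* not found: restore */
    }
}

/* Reduce each word of a collocation, rejoin with single spaces, and
   report whether the result is in the collocation index.  The rejoined
   string is left in out either way. */
static bool colloc_base(const char *s, char *out, size_t outsz)
{
    char buf[256], wb[64];
    size_t used = 0;

    if (outsz == 0)
        return false;
    snprintf(buf, sizeof buf, "%s", s);
    out[0] = '\0';
    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        word_base(tok, wb, sizeof wb);
        int k = snprintf(out + used, outsz - used, "%s%s", used ? " " : "", wb);
        if (k < 0 || (size_t)k >= outsz - used)
            return false;                     /* output buffer too small */
        used += (size_t)k;
    }
    return listed(out, collocations, COUNT_OF(collocations));
}
```

       colloc_base("attorneys general", ...) produces attorney general,
       which the index accepts.  The same procedure turns lines of
       products into line of product, which is not in the index; this is
       the limitation for noun collocations containing prepositions that
       is noted later in this page.
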
       Verb collocations that contain prepositions, such as ask for it,
       are more difficult.  As with single words, the exception list is
       searched first.  If the collocation is not found, special code in
       Morphy determines whether a verb collocation includes a
       preposition.  If it does, a function is called to try to find the
       base form in the following manner.  It is assumed that the first
       word in the collocation is a verb and that the last word is a noun.
       The algorithm then builds a search string with the base forms of
       the verb and noun, leaving the remainder of the collocation
       (usually just the preposition, but more words may be involved) in
       the middle.  For example, if Morphy is passed asking for it, the
       database search is performed with ask for it, which is found in
       WordNet and is therefore returned from Morphy.  If a verb
       collocation does not contain a preposition, then the base form of
       each word in the collocation is found and WordNet is searched for
       the resulting string.

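       The preposition case can be sketched in C: the first word is
       reduced as a verb, the last as a noun, and everything in between is
       kept verbatim before the result is looked up.  This is a minimal
       sketch assuming a tiny hypothetical preposition list, only a subset
       of the verb rules, and hypothetical word lists in place of WordNet;
       the names prep_colloc_base(), verb_base(), noun_base() and append()
       are illustrative, and the no-preposition branch is not sketched.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define COUNT_OF(a) (sizeof (a) / sizeof (a)[0])

/* Hypothetical data standing in for WordNet and a preposition list. */
static const char *const verbs[] = {"ask", "look"};
static const char *const nouns[] = {"hand", "it"};
static const char *const preps[] = {"about", "for", "in", "of", "on", "to"};
static const char *const verb_collocs[] = {"ask for it", "look for"};

static bool listed(const char *s, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], s) == 0)
            return true;
    return false;
}

/* Verb base using a subset of the verb rules, in table order; the word
   is kept unchanged when no candidate is a known verb. */
static void verb_base(const char *w, char *out, size_t outsz)
{
    static const char *const rule[][2] = {{"s", ""}, {"ing", "e"}, {"ing", ""}};
    size_t n = strlen(w);

    for (size_t i = 0; i < COUNT_OF(rule); i++) {
        size_t sl = strlen(rule[i][0]);
        if (n > sl && strcmp(w + n - sl, rule[i][0]) == 0) {
            snprintf(out, outsz, "%.*s%s", (int)(n - sl), w, rule[i][1]);
            if (listed(out, verbs, COUNT_OF(verbs)))
                return;
        }
    }
    snprintf(out, outsz, "%s", w);
}

/* Noun base: drop a final "s" only if that gives a known noun. */
static void noun_base(const char *w, char *out, size_t outsz)
{
    size_t n = strlen(w);

    if (n > 1 && w[n - 1] == 's') {
        snprintf(out, outsz, "%.*s", (int)(n - 1), w);
        if (listed(out, nouns, COUNT_OF(nouns)))
            return;
    }
    snprintf(out, outsz, "%s", w);
}

/* Append word to out, separated by a space; false if it does not fit. */
static bool append(char *out, size_t outsz, size_t *used, const char *word)
{
    int k = snprintf(out + *used, outsz - *used, "%s%s", *used ? " " : "", word);
    if (k < 0 || (size_t)k >= outsz - *used)
        return false;
    *used += (size_t)k;
    return true;
}

/* Verb collocation with a preposition (at most 8 words are read): base
   form of the first word (assumed verb) and of the last (assumed noun),
   with the middle words kept as-is. */
static bool prep_colloc_base(const char *s, char *out, size_t outsz)
{
    char buf[256], base[64], *w[8];
    size_t n = 0, used = 0;
    bool has_prep = false;

    if (outsz == 0)
        return false;
    snprintf(buf, sizeof buf, "%s", s);
    for (char *t = strtok(buf, " "); t != NULL && n < COUNT_OF(w); t = strtok(NULL, " "))
        w[n++] = t;
    for (size_t i = 1; i < n; i++)
        if (listed(w[i], preps, COUNT_OF(preps)))
            has_prep = true;
    if (n < 2 || !has_prep)
        return false;                 /* no-preposition case not sketched */

    out[0] = '\0';
    verb_base(w[0], base, sizeof base);
    if (!append(out, outsz, &used, base))
        return false;
    for (size_t i = 1; i + 1 < n; i++)
        if (!append(out, outsz, &used, w[i]))
            return false;
    noun_base(w[n - 1], base, sizeof base);
    if (!append(out, outsz, &used, base))
        return false;
    return listed(out, verb_collocs, COUNT_OF(verb_collocs));
}
```

       Given asking for it, the sketch searches for ask for it, matching
       the example in the text.
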
   Hyphenation
       Hyphenation also presents special difficulties when searching
       WordNet.  It is often a subjective decision whether a word is
       hyphenated, joined as one word, or written as a collocation of
       several words, and which of the various forms are entered into
       WordNet.  When Morphy breaks a string into "words", it looks for
       both spaces and hyphens as delimiters.  It also looks for periods
       in strings and removes them if an exact match is not found.  For
       example, a search for the abbreviation oct. returns the synset
       { October, Oct }.  Not every pattern of hyphenated and collocated
       strings is searched for properly, so it may be advantageous to
       specify several search strings if the results of a search attempt
       seem incomplete.

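       The two string-level steps described above, splitting on both
       spaces and hyphens and retrying without periods when the exact
       string is not found, are sketched below in C.  The index contents
       and the names split_words(), strip_periods() and lookup() are
       hypothetical; real lookups go to the WordNet database, and this
       sketch does not model how Morphy combines the split words with its
       other rules.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define WORD_MAX 32

/* Hypothetical lowercase index entries for illustration only. */
static const char *const index_words[] = {"oct", "october"};

static bool in_index(const char *w)
{
    for (size_t i = 0; i < sizeof index_words / sizeof index_words[0]; i++)
        if (strcmp(index_words[i], w) == 0)
            return true;
    return false;
}

/* Split s into words, treating both spaces and hyphens as delimiters.
   Words longer than WORD_MAX - 1 characters are truncated. */
static size_t split_words(const char *s, char words[][WORD_MAX], size_t max)
{
    size_t n = 0;

    while (n < max) {
        s += strspn(s, " -");                 /* skip delimiters */
        size_t full = strcspn(s, " -");       /* length of next word */
        if (full == 0)
            break;                            /* end of string */
        size_t len = full < WORD_MAX ? full : WORD_MAX - 1;
        memcpy(words[n], s, len);
        words[n][len] = '\0';
        n++;
        s += full;
    }
    return n;
}

/* Copy s into out with every period removed. */
static void strip_periods(const char *s, char *out, size_t outsz)
{
    size_t j = 0;

    if (outsz == 0)
        return;
    for (; *s != '\0' && j + 1 < outsz; s++)
        if (*s != '.')
            out[j++] = *s;
    out[j] = '\0';
}

/* Try the exact string first; only if it is not found, retry with the
   periods removed. */
static bool lookup(const char *s, char *found, size_t fsz)
{
    if (in_index(s)) {
        snprintf(found, fsz, "%s", s);
        return true;
    }
    strip_periods(s, found, fsz);
    return in_index(found);
}
```

       With this index, lookup("oct.", ...) fails on the exact string and
       then succeeds with oct, and split_words("jack-in-the-box", ...)
       yields four words.
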
   Special Processing for nouns ending with 'ful'
       Morphy contains code that searches for nouns ending with ful and
       performs a transformation on the substring preceding it.  It then
       appends 'ful' back onto the resulting string and returns it.  For
       example, if passed the noun boxesful, it returns boxful.

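       The 'ful' special case amounts to removing the trailing ful,
       transforming what precedes it, and appending ful again.  The C
       sketch below assumes that only the noun rules of detachment are
       applied to the prefix; the noun list and the names noun_base(),
       is_noun() and ful_base() are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define COUNT_OF(a) (sizeof (a) / sizeof (a)[0])

/* Hypothetical data for illustration only. */
static const char *const nouns[] = {"box", "cup"};
static const char *const noun_rules[][2] = {   /* suffix, ending */
    {"s", ""},      {"ses", "s"},   {"xes", "x"},   {"zes", "z"},
    {"ches", "ch"}, {"shes", "sh"}, {"men", "man"}, {"ies", "y"},
};

static bool is_noun(const char *w)
{
    for (size_t i = 0; i < COUNT_OF(nouns); i++)
        if (strcmp(nouns[i], w) == 0)
            return true;
    return false;
}

/* Apply the noun rules of detachment; true if a known noun results. */
static bool noun_base(const char *w, char *out, size_t outsz)
{
    size_t n = strlen(w);

    for (size_t i = 0; i < COUNT_OF(noun_rules); i++) {
        size_t sl = strlen(noun_rules[i][0]);
        if (n > sl && strcmp(w + n - sl, noun_rules[i][0]) == 0) {
            snprintf(out, outsz, "%.*s%s", (int)(n - sl), w, noun_rules[i][1]);
            if (is_noun(out))
                return true;
        }
    }
    return false;
}

/* Transform the part before a final "ful", then append "ful" again. */
static bool ful_base(const char *w, char *out, size_t outsz)
{
    char stem[64], base[64];
    size_t n = strlen(w);

    if (n <= 3 || strcmp(w + n - 3, "ful") != 0 || n - 3 >= sizeof stem)
        return false;
    memcpy(stem, w, n - 3);
    stem[n - 3] = '\0';
    if (!noun_base(stem, base, sizeof base))
        return false;
    snprintf(out, outsz, "%sful", base);
    return true;
}
```

       As in the text, ful_base("boxesful", ...) gives boxful.  A word
       whose prefix needs no transformation, such as spoonful, is simply
       not handled by this sketch.
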
BUGS
       Since many noun collocations contain prepositions, such as line of
       products, an algorithm similar to that used for verbs should be
       written for nouns.  In the present scheme, if Morphy is passed
       lines of products, the search string becomes line of product, which
       is not in WordNet.

       Morphy will allow non-words to be converted to words if they follow
       one of the rules described above.  For example, it will happily
       convert plantes to plant, using the verb rule that strips "es".

ENVIRONMENT VARIABLES (UNIX)
       WNHOME              Base directory for WordNet.  Default is
                           /usr/local/WordNet-3.0.

       WNSEARCHDIR         Directory in which the WordNet database has
                           been installed.  Default is WNHOME/dict.

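       The fallback order for these two variables can be written as a
       small C helper.  This is a minimal sketch: wn_search_dir() and
       wn_search_dir_from_env() are hypothetical names, the first takes
       the variable values as arguments (NULL meaning unset) so the logic
       can be tested without touching the environment, and the second
       reads them with getenv().  The Windows registry setting is not
       covered.

```c
#include <stdio.h>
#include <stdlib.h>

#define DEFAULT_WNHOME "/usr/local/WordNet-3.0"

/* WNSEARCHDIR wins if set; otherwise use WNHOME/dict, with WNHOME
   falling back to its documented default.  NULL means "unset". */
static void wn_search_dir(const char *wnsearchdir, const char *wnhome,
                          char *out, size_t outsz)
{
    if (wnsearchdir != NULL)
        snprintf(out, outsz, "%s", wnsearchdir);
    else
        snprintf(out, outsz, "%s/dict",
                 wnhome != NULL ? wnhome : DEFAULT_WNHOME);
}

/* Same computation using the process environment. */
static void wn_search_dir_from_env(char *out, size_t outsz)
{
    wn_search_dir(getenv("WNSEARCHDIR"), getenv("WNHOME"), out, outsz);
}
```

       With neither variable set, the result is
       /usr/local/WordNet-3.0/dict.
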
REGISTRY (WINDOWS)
       HKEY_LOCAL_MACHINE\SOFTWARE\WordNet\3.0\WNHome
                           Base directory for WordNet.  Default is
                           C:\Program Files\WordNet\3.0.

FILES
       pos.exc             morphology exception lists

SEE ALSO
       wn(1), wnb(1), binsrch(3), morph(3), wndb(5), wninput(7).

WordNet 3.0                      Dec 2006                      MORPHY(7)