hspell(3) Ivrix hspell(3)

NAME
hspell - Hebrew spellchecker (C API)

SYNOPSIS
#include <hspell.h>

int hspell_init(struct dict_radix **dictp, int flags);

void hspell_uninit(struct dict_radix *dictp);

int hspell_check_word(struct dict_radix *dict, const char *word,
    int *preflen);

void hspell_trycorrect(struct dict_radix *dict, const char *word,
    struct corlist *cl);

int corlist_init(struct corlist *cl);

int corlist_free(struct corlist *cl);

int corlist_n(struct corlist *cl);

char *corlist_str(struct corlist *cl, int i);

unsigned int hspell_is_canonic_gimatria(const char *word);

typedef int hspell_word_split_callback_func(const char *word,
    const char *baseword, int preflen, int prefspec);

int hspell_enum_splits(struct dict_radix *dict, const char *word,
    hspell_word_split_callback_func *enumf);

void hspell_set_dictionary_path(const char *path);

const char *hspell_get_dictionary_path(void);

DESCRIPTION
This manual describes the C API of the Hspell Hebrew spellchecker.
Please refer to hspell(1) for a description of the Hspell project, its
spelling standard, and how it works.

The hspell_init() function must be called first to initialize the
Hspell library. It sets up some global structures (see the CAVEATS
section) and then reads the necessary dictionary files, whose locations
are fixed when the library is built. The 'dictp' parameter is a pointer
to a struct dict_radix* object, which is modified to point to a newly
allocated dictionary. A typical hspell_init() call therefore looks like

    struct dict_radix *dict;
    hspell_init(&dict, flags);

Note that the (struct dict_radix*) type is an opaque pointer - the
library user has no access to the separate fields in this structure.

The 'flags' parameter can contain a bitwise OR of several flags that
modify Hspell's default behavior. Turning on HSPELL_OPT_HE_SHEELA
allows Hspell to recognize the interrogative He prefix (he ha-she'ela).
HSPELL_OPT_DEFAULT is a synonym for turning on no special flag, i.e.,
it evaluates to 0.

hspell_init() returns 0 on success, or negative numbers on errors.
Currently, the only error is -1, meaning the dictionary files could not
be read.

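The return-value convention above can be turned into readable
diagnostics. The following is only a sketch: describe_init_result() is
a hypothetical helper that is not part of Hspell, and the message
strings are illustrative.

```c
/* Hypothetical helper (not part of Hspell): map the return value of
 * hspell_init() to a message, following the convention that 0 means
 * success and -1 means the dictionary files could not be read.
 *
 * Possible use, with <hspell.h> and <stdio.h> included:
 *     struct dict_radix *dict;
 *     int rc = hspell_init(&dict, HSPELL_OPT_DEFAULT);
 *     if (rc < 0)
 *         fprintf(stderr, "hspell_init: %s\n", describe_init_result(rc));
 */
const char *describe_init_result(int rc)
{
    if (rc == 0)
        return "success";
    if (rc == -1)
        return "dictionary files could not be read";
    if (rc < 0)
        return "unknown initialization error";  /* reserved for future codes */
    return "unexpected positive return value";
}
```

Testing for rc < 0 rather than rc == -1 keeps the caller correct if
further negative error codes are added later.
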
The hspell_uninit() function undoes the effects of hspell_init(),
freeing any memory that was allocated during initialization.

The hspell_check_word() function checks whether a certain word is a
correct Hebrew word (possibly with prefix particles attached in a
syntactically correct manner). 1 is returned if the word is correct, or
0 if it is incorrect.

The 'word' parameter should be a single Hebrew word, in the iso8859-8
encoding, possibly containing the ASCII quote or double-quote
characters (signifying the geresh and gershayim used in Hebrew for
abbreviations, acronyms, and a few foreign sounds). If the calling
program works with other encodings, it must convert the word to
iso8859-8 first. In particular, the cp1255 (MS-Windows Hebrew encoding)
extensions to iso8859-8, such as niqqud characters, geresh, and
gershayim, are currently not recognized and must be removed from the
word prior to calling hspell_check_word().

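For callers that hold text in UTF-8, the conversion step might look
like the sketch below. utf8_to_iso8859_8() is a hypothetical helper,
not part of Hspell. It assumes well-formed UTF-8 input, drops niqqud
points, and (as a design choice of this sketch) maps the Unicode geresh
and gershayim to the ASCII quote and double-quote.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Hspell): convert a NUL-terminated
 * UTF-8 Hebrew word to iso8859-8. Returns the number of bytes written
 * (excluding the NUL), or -1 if the input contains an unsupported
 * character or the output buffer is too small. */
int utf8_to_iso8859_8(const char *in, char *out, size_t outsize)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    if (outsize == 0)
        return -1;
    while (*p != '\0') {
        unsigned char c;

        if (*p < 0x80) {
            c = *p++;                          /* ASCII, including ' and " */
        } else if (p[0] == 0xD7 && p[1] >= 0x90 && p[1] <= 0xAA) {
            /* U+05D0..U+05EA (alef..tav) map to 0xE0..0xFA */
            c = (unsigned char)(p[1] - 0x90 + 0xE0);
            p += 2;
        } else if (p[0] == 0xD7 && p[1] == 0xB3) {
            c = '\'';                          /* U+05F3 geresh */
            p += 2;
        } else if (p[0] == 0xD7 && p[1] == 0xB4) {
            c = '"';                           /* U+05F4 gershayim */
            p += 2;
        } else if ((p[0] == 0xD6 && ((p[1] >= 0xB0 && p[1] <= 0xBD) ||
                                     p[1] == 0xBF)) ||
                   (p[0] == 0xD7 && (p[1] == 0x81 || p[1] == 0x82 ||
                                     p[1] == 0x87))) {
            /* niqqud points (U+05B0..U+05BD, U+05BF, U+05C1, U+05C2,
             * U+05C7) are dropped */
            p += 2;
            continue;
        } else {
            return -1;                         /* anything else is rejected */
        }
        if (n + 1 >= outsize)                  /* keep room for the NUL */
            return -1;
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return (int)n;
}
```

The converted buffer can then be passed as the 'word' argument of
hspell_check_word().
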
Into the 'preflen' parameter, the function writes back the number of
characters it recognized as a prefix particle - the rest of the 'word'
is a stand-alone word. Because Hebrew words typically can be read in
several different ways, this feature (of getting just one prefix from
one possible reading) is usually not very useful, and it is likely to
be removed in a future version.

The hspell_enum_splits() function provides a way to get all possible
splits of the given 'word' into an optional prefix particle and a
stand-alone word. For each possible (and legal, as some words cannot
accept certain prefixes) split, a user-defined callback function is
called. This callback function is given the whole word, the
stand-alone word, the length of the prefix, and a bitfield which
describes what types of words this prefix can get. Note that in some
cases, a word beginning with the letter waw gets this waw doubled
before a prefix, so sometimes strlen(word)!=strlen(baseword)+preflen.

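A callback matching hspell_word_split_callback_func could be sketched
as follows. The names print_split() and split_lengths_differ() are
hypothetical, and because this manual does not document the meaning of
the callback's return value, returning 0 is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the prefix length and the
 * stand-alone word do not add up to the whole word, which the manual
 * attributes to a doubled waw, and 0 otherwise. */
int split_lengths_differ(const char *word, const char *baseword, int preflen)
{
    return strlen(word) != strlen(baseword) + (size_t)preflen;
}

/* Hypothetical callback with the same parameter list as
 * hspell_word_split_callback_func: prints one split per line.
 *
 * Possible use, with <hspell.h> included and 'dict' initialized:
 *     hspell_enum_splits(dict, word, print_split);
 */
int print_split(const char *word, const char *baseword, int preflen,
                int prefspec)
{
    /* %.*s prints only the first 'preflen' bytes of 'word' */
    printf("%.*s + %s (prefspec=0x%x)%s\n",
           preflen, word, baseword, (unsigned int)prefspec,
           split_lengths_differ(word, baseword, preflen)
               ? " [lengths differ]" : "");
    return 0;   /* assumption: the return value's meaning is undocumented */
}
```

The strings handed to the callback are in iso8859-8, so a real program
would normally convert them before printing to a terminal that uses a
different encoding.
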
The hspell_trycorrect() function tries to find a list of possible
corrections for an incorrect word. Because in Hebrew the word density
is high (a random string of letters, especially if short, has a high
probability of being a correct word), this function attempts
corrections based on the assumption of a spelling error (replacement of
letters that sound alike, missing or spurious immot qri'a) rather than
a typo (slipped finger on the keyboard, etc.) - see also CAVEATS.

hspell_trycorrect() returns the correction list into a structure of
type struct corlist. This structure must be first allocated with a call
to corlist_init() and subsequently freed with corlist_free(). The
corlist_n() macro returns the number of words held in an allocated
corlist, and corlist_str() returns the i'th word. Accordingly, here is
an example usage of hspell_trycorrect():

    struct corlist cl;
    int i;

    printf("Found misspelled word %s. Possible corrections:\n", w);
    corlist_init(&cl);                  /* allocate the list */
    hspell_trycorrect(dict, w, &cl);    /* fill it with suggestions */
    for (i = 0; i < corlist_n(&cl); i++) {
        printf("%s\n", corlist_str(&cl, i));
    }
    corlist_free(&cl);                  /* release the list */

The hspell_is_canonic_gimatria() function checks whether the given word
is a canonic gimatria - i.e., the proper way to write in gimatria the
number it represents. The caller might want to accept canonic gimatria
as proper Hebrew words, even if hspell_check_word() previously reported
such a word to be a non-existent word. hspell_is_canonic_gimatria()
returns the number represented as gimatria in 'word' if it is indeed
proper gimatria (in canonic form), or 0 otherwise.

hspell_init() normally reads the dictionary files from a path compiled
into the library. This makes sense when the library's code and the
dictionaries are distributed together, but in some scenarios the
library user might want to use Hspell dictionaries that are already
present on the system in an arbitrary path. The function
hspell_set_dictionary_path() can be used to set this path, and should
be called before hspell_init(). The given path is that of the word
list; the other input files use that path with a suffix appended.
hspell_get_dictionary_path() can be used to find the current path. On
many installations, this defaults to
"/usr/local/share/hspell/hebrew.wgz".

LINKING
On most systems, the Hspell library is compiled to use the Zlib library
for reading the compressed dictionaries. Therefore, a program linking
with the Hspell library must also be linked with the Zlib library
(usually, by adding "-lz" to the compilation line).

Programs that use autoconf to search for the Hspell library should
remember to tell AC_CHECK_LIB to also link with the -lz library when
checking for -lhspell.

CAVEATS
While the API described here has been stable for years, it may change
in the future. Users are encouraged to compare the values of the
integer macros HSPELL_VERSION_MAJOR and HSPELL_VERSION_MINOR to those
expected by the writer of the program. A third macro,
HSPELL_VERSION_EXTRA, contains a string which can describe subrelease
modifications (e.g., beta versions).

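One way to perform the version comparison described above is sketched
here. version_at_least() is a hypothetical helper, not part of Hspell,
and the choice of "at least" as the compatibility rule is an assumption
of this sketch; a program may prefer an exact match.

```c
/* Hypothetical helper (not part of Hspell): returns 1 if version
 * major.minor is at least want_major.want_minor, and 0 otherwise.
 *
 * Possible use, with <hspell.h> included, for a program written
 * against version 1.2:
 *     if (!version_at_least(HSPELL_VERSION_MAJOR,
 *                           HSPELL_VERSION_MINOR, 1, 2))
 *         fprintf(stderr, "Hspell headers are older than expected\n");
 */
int version_at_least(int major, int minor, int want_major, int want_minor)
{
    if (major != want_major)
        return major > want_major;   /* the major number decides first */
    return minor >= want_minor;      /* same major: compare the minor */
}
```

Because the macros are integer constants, the same comparison can also
be written as a preprocessor #if so that a mismatch is caught at
compile time.
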
The current Hspell C API is very low-level, in the sense that it leaves
the user to implement many features that some users take for granted in
a spell-checker. For example, it doesn't provide any facilities for a
user-defined personal dictionary. It also has separate functions for
checking valid Hebrew words and valid gimatria, and no function to do
both. It is assumed that the caller (a bigger spell-checking library or
a word processor, for example) will already have these facilities. If
not, you may wish to look at the sources of hspell(1) for an example
implementation.

Currently there is no concept of separate Hspell "contexts" in an
application. Some of the context is now global for the entire
application: currently, a single list of legal prefix-particles is
kept, and the dictionary read by hspell_init() is always read from the
global default place. This may be solved in a later version, e.g., by
switching to an API like:

    context = hspell_new_context();
    hspell_set_dictionary_path(context, "/some/path/hebrew.wgz");
    hspell_init(context, flags);
    ...
    hspell_check_word(context, word, preflenp);

Note that despite the global context mentioned above, after
initialization all functions described here are thread-safe, because
they only read the dictionary data and never write to it.

hspell_trycorrect() is not as powerful as it could have been, with
typos or certain kinds of spelling mistakes not giving useful
correction suggestions. Along with more types of corrections,
hspell_trycorrect() needs a better way to order the likelihood of the
corrections, as an unordered list of 100 corrections would be just as
useful (or rather, useless) as none.

In some cases of errors during hspell_init(), warning messages are
printed to the standard error stream. This is a bad thing for a library
to do.

There are too many CAVEATS in this manual.

VERSION
The version of hspell described by this manual page is 1.2.

COPYRIGHT
Copyright (C) 2000-2012, Nadav Har'El <nyh@math.technion.ac.il> and Dan
Kenigsberg <danken@cs.technion.ac.il>.

Hspell is free software, released under the GNU Affero General Public
License (AGPL) version 3. Note that not only the programs in the
distribution, but also the dictionary files and the generated word
lists, are licensed under the AGPL. There is no warranty of any kind.

See the LICENSE file for more information and the exact license terms.

The latest version of this software can be found at
http://hspell.ivrix.org.il/

SEE ALSO
hspell(1)

Hspell 1.2 28 February 2012 hspell(3)