hspell(3)                            Ivrix                           hspell(3)

NAME

       hspell - Hebrew spellchecker (C API)

SYNOPSIS

       #include <hspell.h>

       int hspell_init(struct dict_radix **dictp, int flags);

       void hspell_uninit(struct dict_radix *dictp);

       int hspell_check_word(struct dict_radix *dict, const char *word,
       int *preflen);

       void hspell_trycorrect(struct dict_radix *dict, const char *word,
       struct corlist *cl);

       int corlist_init(struct corlist *cl);

       int corlist_free(struct corlist *cl);

       int corlist_n(struct corlist *cl);

       char *corlist_str(struct corlist *cl, int i);

       int hspell_is_canonic_gimatria(const char *word);

       typedef int hspell_word_split_callback_func(const char *word,
       const char *baseword, int preflen, int prefspec);

       int hspell_enum_splits(struct dict_radix *dict, const char *word,
       hspell_word_split_callback_func *enumf);

       void hspell_set_dictionary_path(const char *path);

       const char *hspell_get_dictionary_path(void);

DESCRIPTION

       This manual describes the C API of the Hspell Hebrew spellchecker.
       Please refer to hspell(1) for a description of the Hspell project, its
       spelling standard, and how it works.

       The hspell_init() function must be called first to initialize the
       Hspell library. It sets up some global structures (see the CAVEATS
       section) and then reads the necessary dictionary files (whose default
       location is fixed when the library is built). The 'dictp' parameter is
       a pointer to a struct dict_radix* object, which is modified to point to
       a newly allocated dictionary. A typical hspell_init() call therefore
       looks like

          struct dict_radix *dict;
          hspell_init(&dict, flags);

       Note that the (struct dict_radix*) type is an opaque pointer: the
       library user has no access to the separate fields in this structure.

       The 'flags' parameter can contain a bitwise OR of several flags that
       modify Hspell's default behavior. Turning on HSPELL_OPT_HE_SHEELA
       allows Hspell to recognize the interrogative He prefix (he ha-she'ela).
       HSPELL_OPT_DEFAULT is a synonym for turning on no special flag, i.e.,
       it evaluates to 0.

       hspell_init() returns 0 on success, or a negative number on error.
       Currently, the only error is -1, meaning the dictionary files could not
       be read.

       The hspell_uninit() function undoes the effects of hspell_init(),
       freeing any memory that was allocated during initialization.

       The hspell_check_word() function checks whether a given word is a
       correct Hebrew word (possibly with prefix particles attached in a
       syntactically correct manner). It returns 1 if the word is correct, or
       0 if it is incorrect.

       The 'word' parameter should be a single Hebrew word, in the iso8859-8
       encoding, possibly containing the ASCII quote or double-quote
       characters (signifying the geresh and gershayim used in Hebrew for
       abbreviations, acronyms, and a few foreign sounds). If the calling
       program works with other encodings, it must convert the word to
       iso8859-8 first. In particular, cp1255 (the MS-Windows Hebrew encoding)
       extensions to iso8859-8, like niqqud characters, geresh or gershayim,
       are currently not recognized and must be removed from the word prior to
       calling hspell_check_word().
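
       For callers that hold their text in UTF-8, the conversion described
       above might be sketched as follows. This is only an illustration under
       stated assumptions, not part of Hspell: the function name
       utf8_to_iso8859_8() is hypothetical, the input is assumed to be a
       single NUL-terminated UTF-8 word, code points U+0591 to U+05C7 (niqqud
       and cantillation marks, plus a few punctuation marks such as maqaf) are
       simply dropped, and mapping the Unicode geresh (U+05F3) and gershayim
       (U+05F4) to the ASCII quote and double-quote is a choice made for this
       example.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Hspell): convert a UTF-8 word to
 * ISO-8859-8. Returns the output length, or -1 on an unsupported
 * character, malformed input, or a too-small buffer. */
int utf8_to_iso8859_8(const char *in, char *out, size_t outsize)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    if (outsize == 0)
        return -1;

    while (*p) {
        unsigned int cp;
        unsigned char c;

        if (*p < 0x80) {                      /* plain ASCII */
            cp = *p++;
        } else if ((*p & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            /* two-byte sequence: covers the whole Hebrew block */
            cp = ((unsigned int)(*p & 0x1F) << 6) | (p[1] & 0x3F);
            p += 2;
            if (cp < 0x80)                    /* reject overlong forms */
                return -1;
        } else {
            return -1;                        /* longer or malformed sequence */
        }

        if (cp < 0x80)
            c = (unsigned char)cp;
        else if (cp >= 0x05D0 && cp <= 0x05EA)    /* alef..tav -> 0xE0..0xFA */
            c = (unsigned char)(0xE0 + (cp - 0x05D0));
        else if (cp == 0x05F3)                    /* geresh (assumed mapping) */
            c = '\'';
        else if (cp == 0x05F4)                    /* gershayim (assumed mapping) */
            c = '"';
        else if (cp >= 0x0591 && cp <= 0x05C7)    /* niqqud etc.: drop */
            continue;
        else
            return -1;

        if (n + 1 >= outsize)                 /* keep room for the NUL */
            return -1;
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return (int)n;
}
```

       The converted buffer can then be passed as the 'word' argument of
       hspell_check_word(). Programs that already depend on iconv(3) may
       prefer it for the conversion, but would still need to strip niqqud
       themselves.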

       Into the 'preflen' parameter, the function writes back the number of
       characters it recognized as a prefix particle; the rest of the 'word'
       is a stand-alone word. Because Hebrew words can typically be read in
       several different ways, this feature (of getting just one prefix from
       one possible reading) is usually not very useful, and it is likely to
       be removed in a future version.

       The hspell_enum_splits() function provides a way to get all possible
       splits of the given 'word' into an optional prefix particle and a
       stand-alone word. For each possible (and legal, as some words cannot
       accept certain prefixes) split, a user-defined callback function is
       called. This callback function is given the whole word, the
       stand-alone word, the length of the prefix, and a bitfield which
       describes what types of words this prefix can be attached to. Note that
       in some cases, a word beginning with the letter waw gets this waw
       doubled when a prefix is attached, so sometimes
       strlen(word)!=strlen(baseword)+preflen.
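
       A callback for hspell_enum_splits() could look like the sketch below.
       The names report_split() and n_splits are hypothetical, and because
       this manual does not say how the callback's return value is used,
       returning 0 is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Number of splits reported so far; reset it before each enumeration. */
static int n_splits = 0;

/* Hypothetical callback with the hspell_word_split_callback_func
 * signature: prints one split and counts it. */
int report_split(const char *word, const char *baseword,
                 int preflen, int prefspec)
{
    /* When an initial waw was doubled in 'word', the lengths disagree. */
    int waw_doubled = strlen(word) != strlen(baseword) + (size_t)preflen;

    printf("%s = %d prefix chars + %s (prefspec 0x%x)%s\n",
           word, preflen, baseword, (unsigned int)prefspec,
           waw_doubled ? " [waw doubled]" : "");
    n_splits++;
    return 0;   /* assumption: the return value is not documented */
}
```

       With an initialized dictionary, the callback would be passed as
       hspell_enum_splits(dict, word, report_split). The strings it prints
       are in iso8859-8, like the word that was checked.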

       The hspell_trycorrect() function tries to find a list of possible
       corrections for an incorrect word. Because in Hebrew the word density
       is high (a random string of letters, especially if short, has a high
       probability of being a correct word), this function attempts
       corrections based on the assumption of a spelling error (replacement of
       letters that sound alike, missing or spurious immot qri'a), not a typo
       (slipped finger on the keyboard, etc.); see also CAVEATS.

       hspell_trycorrect() returns the correction list in a structure of type
       struct corlist. This structure must first be initialized with a call to
       corlist_init() and subsequently freed with corlist_free(). The
       corlist_n() macro returns the number of words held in an initialized
       corlist, and corlist_str() returns the i'th word. Accordingly, here is
       an example usage of hspell_trycorrect():

          struct corlist cl;
          int i;
          printf ("Found misspelled word %s. Possible corrections:\n", w);
          corlist_init (&cl);
          hspell_trycorrect (dict, w, &cl);
          for (i=0; i<corlist_n(&cl); i++) {
              printf ("%s\n", corlist_str(&cl, i));
          }
          corlist_free (&cl);

       The hspell_is_canonic_gimatria() function checks whether the given word
       is a canonic gimatria, i.e., the proper way to write in gimatria the
       number it represents. The caller might want to accept canonic gimatria
       as proper Hebrew words, even if hspell_check_word() previously reported
       such a word to be non-existent. hspell_is_canonic_gimatria() returns
       the number represented as gimatria in 'word' if it is indeed proper
       gimatria (in canonic form), or 0 otherwise.

       hspell_init() normally reads the dictionary files from a path compiled
       into the library. This makes sense when the library's code and the
       dictionaries are distributed together, but in some scenarios the
       library user might want to use Hspell dictionaries that are already
       present on the system in an arbitrary path. The function
       hspell_set_dictionary_path() can be used to set this path, and should
       be called before hspell_init(). The given path is that of the word
       list, and the other input files have that path with an appended suffix.
       hspell_get_dictionary_path() can be used to find the current path. On
       many installations, this defaults to
       "/usr/local/share/hspell/hebrew.wgz".

LINKING

       On most systems, the Hspell library is compiled to use the Zlib library
       for reading the compressed dictionaries. Therefore, a program linking
       with the Hspell library must also be linked with the Zlib library
       (usually by adding "-lz" to the compilation line).

       Programs that use autoconf to search for the Hspell library should
       remember to tell AC_CHECK_LIB to also link with the -lz library when
       checking for -lhspell.

CAVEATS

       While the API described here has been stable for years, it may change
       in the future. Users are encouraged to compare the values of the
       integer macros HSPELL_VERSION_MAJOR and HSPELL_VERSION_MINOR to those
       expected by the writer of the program. A third macro,
       HSPELL_VERSION_EXTRA, contains a string which can describe subrelease
       modifications (e.g., beta versions).
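
       The comparison suggested above could be expressed with a small helper
       like the following. Both the helper name and the compatibility policy
       (same major version, and a minor version at least as new as the one
       the program was written against) are assumptions of this sketch;
       Hspell itself does not define a compatibility rule.

```c
/* Hypothetical helper (not part of Hspell): returns 1 if the library
 * version (major.minor) is acceptable for a program written against
 * want_major.want_minor, under the assumed policy described above. */
int hspell_version_compatible(int major, int minor,
                              int want_major, int want_minor)
{
    return major == want_major && minor >= want_minor;
}
```

       A program written against the version described in this manual would
       call hspell_version_compatible(HSPELL_VERSION_MAJOR,
       HSPELL_VERSION_MINOR, 1, 1) after including <hspell.h>.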

       The current Hspell C API is very low-level, in the sense that it leaves
       the user to implement many features that some users take for granted in
       a spell-checker. For example, it doesn't provide any facilities for a
       user-defined personal dictionary. It also has separate functions for
       checking valid Hebrew words and valid gimatria, and no function to do
       both. It is assumed that the caller (a bigger spell-checking library or
       a word processor, for example) will already have these facilities. If
       not, you may wish to look at the sources of hspell(1) for an example
       implementation.

       Currently there is no concept of separate Hspell "contexts" in an
       application. Some of the context is global for the entire application:
       currently, a single list of legal prefix-particles is kept, and the
       dictionary read by hspell_init() is always read from the global default
       place. This may be solved in a later version, e.g., by switching to an
       API like:

          context = hspell_new_context();
          hspell_set_dictionary_path(context, "/some/path/hebrew.wgz");
          hspell_init(context, flags);
          ...
          hspell_check_word(context, word, preflenp);

       Note that despite the global context mentioned above, after
       initialization all functions described here are thread-safe, because
       they only read the dictionary data and never write to it.

       hspell_trycorrect() is not as powerful as it could be: typos and
       certain kinds of spelling mistakes do not yield useful correction
       suggestions. Along with more types of corrections, hspell_trycorrect()
       needs a better way to order the corrections by likelihood, as an
       unordered list of 100 corrections would be just as useful (or rather,
       useless) as none.

       In some cases of errors during hspell_init(), warning messages are
       printed to the standard error. This is a bad thing for a library to do.

       There are too many CAVEATS in this manual.

VERSION

       The version of hspell described by this manual page is 1.1 (December
       31, 2009).

COPYRIGHT

       Copyright (C) 2000-2009, Nadav Har'El <nyh@math.technion.ac.il> and Dan
       Kenigsberg <danken@cs.technion.ac.il>.

       Hspell is free software, released under the GNU General Public License
       (GPL). Note that not only the programs in the distribution, but also
       the dictionary files and the generated word lists, are licensed under
       the GPL. There is no warranty of any kind.

       See the LICENSE file for more information and the exact license terms.

       The latest version of this software can be found at
       http://hspell.ivrix.org.il/

SEE ALSO

       hspell(1)

Hspell 1.1                     31 December 2009                      hspell(3)