Lingua::Stem::EnBroken(3)   User Contributed Perl Documentation   Lingua::Stem::EnBroken(3)

NAME

       Lingua::Stem::EnBroken - Porter's stemming algorithm for 'generic'
       English

SYNOPSIS

           use Lingua::Stem::EnBroken;
           my $stems = Lingua::Stem::EnBroken::stem({
               -words      => $word_list_reference,
               -locale     => 'en',
               -exceptions => $exceptions_hash,
           });

DESCRIPTION

       This routine MIS-applies the Porter Stemming Algorithm to its
       parameters, returning the stemmed words. It is an intentionally broken
       version of Lingua::Stem::En for people who need backward compatibility
       with Lingua::Stem 0.30 and Lingua::Stem 0.40. Do not use it if you
       aren't one of those people.

       It is derived from the C program "stemmer.c" as found in freewais and
       elsewhere, which contains these notes:

          Purpose:    Implementation of the Porter stemming algorithm documented
                      in: Porter, M.F., "An Algorithm For Suffix Stripping,"
                      Program 14 (3), July 1980, pp. 130-137.
          Provenance: Written by B. Frakes and C. Cox, 1986.

       I have re-interpreted the areas that use Frakes and Cox's "WordSize"
       function. My version may misbehave on short words starting with "y",
       but I can't think of any examples.

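       As far as I can tell, WordSize in stemmer.c computes what Porter calls
       the measure m of a word: the number of vowel-consonant sequences once
       leading consonants and trailing vowels are ignored. The following is a
       minimal standalone sketch of that measure, written from Porter's
       published definition of a consonant rather than taken from this
       module; the helper names is_consonant and measure are hypothetical.
       It also shows the 'y' rule the note above worries about: 'y' is a
       consonant at the start of a word or after a vowel, and a vowel after a
       consonant.

```perl
use strict;
use warnings;

# Hypothetical helper: is the letter at position $i of $word a consonant?
# Follows Porter's definition and expects a lowercase word: a, e, i, o, u
# are vowels; 'y' is a consonant at the start of a word or after a vowel,
# and a vowel after a consonant; every other character is a consonant.
sub is_consonant {
    my ($word, $i) = @_;
    my $ch = substr($word, $i, 1);
    return 0 if $ch =~ /[aeiou]/;
    if ($ch eq 'y') {
        return 1 if $i == 0;
        return is_consonant($word, $i - 1) ? 0 : 1;
    }
    return 1;
}

# Hypothetical helper: Porter's measure m, where a word has the form
# [C](VC){m}[V]. Each run of vowels that is followed by a consonant adds
# one to m, so leading consonants and trailing vowels are ignored.
sub measure {
    my $word = lc shift;
    my $m = 0;
    my $after_vowel = 0;    # true while we are inside a run of vowels
    for my $i (0 .. length($word) - 1) {
        if (is_consonant($word, $i)) {
            $m++ if $after_vowel;    # a vowel run just ended: one more VC
            $after_vowel = 0;
        }
        else {
            $after_vowel = 1;
        }
    }
    return $m;
}
```

       With this definition measure('tree') is 0, measure('trouble') is 1 and
       measure('private') is 2, matching the examples in Porter's paper.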
       The step numbers correspond to those used by Frakes and Cox, and
       probably match those in Porter's article (which I've not seen).
       Porter's algorithm still has rough spots (e.g., current/currency,
       -ings words), which I've not attempted to cure, although I have added
       support for the British -ise suffix.

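       To show what a single numbered step looks like, here is a minimal
       sketch of Porter's Step 1a (removal of plural endings), written from
       the rules in Porter's paper rather than taken from this module; the
       function name step1a is hypothetical, and the sketch reproduces none
       of this module's deliberate misbehavior. The rules are tried in order
       and only the first match applies, which is why "caress" keeps its "ss"
       while "cats" loses its "s".

```perl
use strict;
use warnings;

# Hypothetical helper: Porter's Step 1a, which strips plural endings.
# The rules are tried in order and only the first one that matches fires:
#   SSES -> SS   (caresses -> caress)
#   IES  -> I    (ponies   -> poni)
#   SS   -> SS   (caress   -> caress)
#   S    -> ''   (cats     -> cat)
sub step1a {
    my ($word) = @_;
    return $word if $word =~ s/sses$/ss/;
    return $word if $word =~ s/ies$/i/;
    return $word if $word =~ /ss$/;
    $word =~ s/s$//;
    return $word;
}
```

       For example, step1a('caresses') returns 'caress' and
       step1a('ponies') returns 'poni'.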

CHANGES

        2003.09.28 -  Documentation fix

        2000.09.14 -  Forked from the Lingua::Stem::En.pm module to provide
                      a deliberately broken, backward-compatible version for
                      people who need behavior consistent with 0.30 and 0.40
                      more than they need accurate stemming.

METHODS

       stem({ -words => \@words, -locale => 'en', -exceptions => \%exceptions });
           Stems the list of words passed in -words using the rules of US
           English. Returns a reference to an anonymous array of the stemmed
           words.

           Example:

             my $stemmed_words = Lingua::Stem::EnBroken::stem({
                 -words      => \@words,
                 -locale     => 'en',
                 -exceptions => \%exceptions,
             });

       stem_caching({ -level => 0|1|2 });
           Sets the level of stem caching.

           '0' means 'no caching'. This is the default level.

           '1' means 'cache per run'. This caches stemming results during a
           single call to 'stem'.

           '2' means 'cache indefinitely'. This caches stemming results until
           either the process exits or the 'clear_stem_cache' method is
           called.

       clear_stem_cache;
           Clears the cache of stemmed words.

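       The three caching levels can be pictured with the following minimal
       sketch. It is not this module's implementation: set_cache_level,
       clear_cache, stem_words and the toy stemmer $stem_word are all
       hypothetical stand-ins for stem_caching, clear_stem_cache, stem and
       the real Porter rules. It only shows when results are stored and when
       they are thrown away.

```perl
use strict;
use warnings;

# Hypothetical sketch of the three cache levels; none of these names are
# the module's internals. $stem_word stands in for a real per-word
# stemmer: it only strips a trailing 's'.
my $stem_word = sub { my ($w) = @_; $w =~ s/s$//; return $w };

my $cache_level = 0;    # 0 = no caching (default), 1 = per call, 2 = indefinite
my %cache;              # word => stem

sub set_cache_level { $cache_level = shift }
sub clear_cache     { %cache = () }

sub stem_words {
    my @stems;
    for my $word (@_) {
        if ($cache_level > 0 && exists $cache{$word}) {
            push @stems, $cache{$word};    # cache hit: skip the stemmer
            next;
        }
        my $stem = $stem_word->($word);
        $cache{$word} = $stem if $cache_level > 0;
        push @stems, $stem;
    }
    # Level 1 keeps results only for the duration of this call.
    clear_cache() if $cache_level == 1;
    return \@stems;
}
```

       With the level set to 2, stem_words('cats', 'dogs') leaves two entries
       in %cache until clear_cache() is called; with level 1 the cache is
       empty again as soon as the call returns.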

NOTES

       This code is almost entirely derived from the Porter 2.1 module written
       by Jim Richardson.

SEE ALSO

        Lingua::Stem

AUTHOR

         Jim Richardson, University of Sydney
         jimr@maths.usyd.edu.au or http://www.maths.usyd.edu.au:8000/jimr.html

         Integrated into Lingua::Stem by
         Benjamin Franz, FreeRun Technologies,
         snowhare@nihongo.org or http://www.nihongo.org/snowhare/

       Jim Richardson, University of Sydney
       Benjamin Franz, FreeRun Technologies

       This code is freely available under the same terms as Perl.

BUGS

TODO

perl v5.12.1                      2010-09-14         Lingua::Stem::EnBroken(3)