Lingua::Stem::En(3)   User Contributed Perl Documentation  Lingua::Stem::En(3)

NAME

       Lingua::Stem::En - Porter's stemming algorithm for 'generic' English

SYNOPSIS

           use Lingua::Stem::En;
           my $stems = Lingua::Stem::En::stem({ -words      => $word_list_reference,
                                                -locale     => 'en',
                                                -exceptions => $exceptions_hash,
                                              });

DESCRIPTION

       This routine applies the Porter Stemming Algorithm to its parameters,
       returning the stemmed words.

       It is derived from the C program "stemmer.c" as found in freewais and
       elsewhere, which contains these notes:

          Purpose:    Implementation of the Porter stemming algorithm documented
                      in: Porter, M.F., "An Algorithm For Suffix Stripping,"
                      Program 14 (3), July 1980, pp. 130-137.
          Provenance: Written by B. Frakes and C. Cox, 1986.

       I have re-interpreted areas that use Frakes and Cox's "WordSize"
       function. My version may misbehave on short words starting with "y",
       but I can't think of any examples.

       The step numbers correspond to Frakes and Cox, and are probably in
       Porter's article (which I've not seen).  Porter's algorithm still has
       rough spots (e.g. current/currency, -ings words), which I've not
       attempted to cure, although I have added support for the British -ise
       suffix.
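       A minimal sketch for inspecting the rough spots mentioned above.  The
       word list is an assumption chosen for illustration (the
       realise/realize pair in particular is not from the original notes),
       and no particular output is claimed; run it to see how the installed
       version treats each word.

```perl
use strict;
use warnings;
use Lingua::Stem::En;

# Illustrative inputs: the current/currency pair named in the description,
# plus a hypothetical -ise/-ize pair to exercise the British suffix support.
my @words = qw(current currency realise realize);

# Stem 'by value': a new list is returned and @words is left as it was.
my $stems = Lingua::Stem::En::stem({ -words      => \@words,
                                     -locale     => 'en',
                                     -exceptions => {},
                                   });

# Print each input next to its stem for manual comparison.
for my $i (0 .. $#words) {
    printf "%-10s => %s\n", $words[$i], $stems->[$i];
}
```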

CHANGES

        1999.06.15 - Changed to '.pm' module, moved into Lingua::Stem namespace,
                     optionalized the export of the 'stem' routine
                     into the caller's namespace, added named parameters

        1999.06.24 - Switch core implementation of the Porter stemmer to
                     the one written by Jim Richardson <jimr@maths.usyd.edu.au>

        2000.08.25 - 2.11 Added stemming cache

        2000.09.14 - 2.12 Fixed *major* :( implementation error of Porter's algorithm
                     Error was entirely my fault - I completely forgot to include
                     rule sets 2, 3, and 4 starting with Lingua::Stem 0.30.
                     -- Jerilyn Franz

        2003.09.28 - 2.13 Corrected documentation error pointed out by Simon Cozens.

        2005.11.20 - 2.14 Changed rule declarations to conform to Perl style convention
                     for 'private' subroutines. Changed Exporter invocation to the more
                     portable 'require' instead of 'use'.

        2006.02.14 - 2.15 Added ability to pass word list by 'handle' for in-place stemming.

        2009.07.27 - 2.16 Documentation Fix

        2020.06.20 - 2.30 Version renumber for module consistency.

        2020.09.26 - 2.31 Fix for Latin1/UTF8 issue in documentation

METHODS

       stem({ -words => \@words, -locale => 'en', -exceptions => \%exceptions });
           Stems a list of passed words using the rules of US English. Returns
           an anonymous array reference to the stemmed words.

           Example:

             my @words         = ( 'wordy', 'another' );
             my $stemmed_words = Lingua::Stem::En::stem({ -words      => \@words,
                                                          -locale     => 'en',
                                                          -exceptions => \%exceptions,
                                                        });

           If the first element of the list passed as -words is itself a list
           reference, then the stemming is performed 'in place' on that list
           (modifying the passed list directly instead of copying it to a new
           array).

           This is only useful if you do not need to keep the original list.
           If you do need to keep it, use the normal semantics of having
           'stem' return a new list. That is faster than making your own copy
           and then stemming 'in place', since the primary difference between
           'in place' and 'by value' stemming is the creation of a copy of
           the original list.  If you don't need the original list, then
           'in place' stemming is about 60% faster.

           Example of 'in place' stemming:

             my $words         = [ 'wordy', 'another' ];
             my $stemmed_words = Lingua::Stem::En::stem({ -words      => [$words],
                                                          -locale     => 'en',
                                                          -exceptions => \%exceptions,
                                                        });

           The 'in place' mode returns a reference to the original list with
           the words stemmed.
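           The two calling conventions can be compared side by side.  This
           is a sketch under the semantics described above, not a benchmark;
           the variable names and the empty exceptions hash are assumptions
           made for illustration.

```perl
use strict;
use warnings;
use Lingua::Stem::En;

my %exceptions;    # assumed empty for this sketch

# 'By value': pass a reference to the list; a new list comes back and
# @original keeps its contents.
my @original = ('wordy', 'another');
my $by_value = Lingua::Stem::En::stem({ -words      => \@original,
                                        -locale     => 'en',
                                        -exceptions => \%exceptions,
                                      });

# 'In place': wrap the list reference in an outer anonymous array so that
# the first element of -words is itself a list reference.
my $words    = ['wordy', 'another'];
my $in_place = Lingua::Stem::En::stem({ -words      => [$words],
                                        -locale     => 'en',
                                        -exceptions => \%exceptions,
                                      });

print "by value: @$by_value (original still: @original)\n";
print "in place: @$words\n";
print "in-place call returned the list that was passed in\n"
    if $in_place == $words;
```

           Comparing the two references with '==' checks that they point at
           the same array, which is what the 'in place' mode is documented to
           return.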

       stem_caching({ -level => 0|1|2 });
           Sets the level of stem caching.

           '0' means 'no caching'. This is the default level.

           '1' means 'cache per run'. This caches stemming results during a
               single call to 'stem'.

           '2' means 'cache indefinitely'. This caches stemming results until
               either the process exits or the 'clear_stem_cache' method is
               called.

       clear_stem_cache;
           Clears the cache of stemmed words.
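       A minimal sketch of the caching calls around repeated stemming of the
       same vocabulary.  The word list and the choice of level 2 are
       assumptions for illustration; whether caching pays off depends on how
       often words repeat in your input.

```perl
use strict;
use warnings;
use Lingua::Stem::En;

# Hypothetical input with a repeated word.
my @words = ('wordy', 'another', 'wordy');

# Level 2: keep stemming results across calls to 'stem'.
Lingua::Stem::En::stem_caching({ -level => 2 });

my $first  = Lingua::Stem::En::stem({ -words      => \@words,
                                      -locale     => 'en',
                                      -exceptions => {},
                                    });

# A second call over the same words can reuse the cached results.
my $second = Lingua::Stem::En::stem({ -words      => \@words,
                                      -locale     => 'en',
                                      -exceptions => {},
                                    });

# Release the cached stems and return to the default of no caching.
Lingua::Stem::En::clear_stem_cache();
Lingua::Stem::En::stem_caching({ -level => 0 });

print "first:  @$first\n";
print "second: @$second\n";
```

       Clearing the cache explicitly matters mainly in long-running
       processes, where level 2 otherwise holds every stemmed word until the
       process exits.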

NOTES

       This code is almost entirely derived from the Porter 2.1 module written
       by Jim Richardson.

SEE ALSO

        Lingua::Stem

AUTHOR

         Jim Richardson, University of Sydney
         jimr@maths.usyd.edu.au or http://www.maths.usyd.edu.au:8000/jimr.html

         Integration in Lingua::Stem by
         Jerilyn Franz, FreeRun Technologies,
         <cpan@jerilyn.info>

COPYRIGHT

       Jim Richardson, University of Sydney
       Jerilyn Franz, FreeRun Technologies

       This code is freely available under the same terms as Perl.

BUGS

TODO

perl v5.32.1                      2021-01-27               Lingua::Stem::En(3)