Lingua::Stem::En(3) User Contributed Perl Documentation Lingua::Stem::En(3)

Lingua::Stem::En - Porter's stemming algorithm for 'generic' English

    use Lingua::Stem::En;
    my $stems = Lingua::Stem::En::stem({ -words      => $word_list_reference,
                                         -locale     => 'en',
                                         -exceptions => $exceptions_hash,
                                       });

This routine applies the Porter Stemming Algorithm to its parameters,
returning the stemmed words.

It is derived from the C program "stemmer.c" as found in freewais and
elsewhere, which contains these notes:

    Purpose:    Implementation of the Porter stemming algorithm documented
                in: Porter, M.F., "An Algorithm For Suffix Stripping,"
                Program 14 (3), July 1980, pp. 130-137.
    Provenance: Written by B. Frakes and C. Cox, 1986.

I have re-interpreted areas that use Frakes and Cox's "WordSize"
function. My version may misbehave on short words starting with "y",
but I can't think of any examples.

The step numbers correspond to Frakes and Cox, and are probably in
Porter's article (which I've not seen). Porter's algorithm still has
rough spots (e.g. current/currency, -ings words), which I've not
attempted to cure, although I have added support for the British -ise
suffix.

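The rough spots mentioned above can be inspected directly. The following is a minimal sketch that only prints whatever stems the module returns; the probe words are chosen for illustration and no particular output is claimed.

```perl
use strict;
use warnings;
use Lingua::Stem::En;

# Probe words picked for illustration: the current/currency pair from the
# notes above, plus a British -ise spelling next to its -ize counterpart.
my @probe = qw(current currency realise realize);

my $stems = Lingua::Stem::En::stem({ -words      => \@probe,
                                     -locale     => 'en',
                                     -exceptions => {},
                                   });

# Print each input word next to the stem returned for it.
for my $i (0 .. $#probe) {
    printf "%-10s => %s\n", $probe[$i], $stems->[$i];
}
```

Comparing the printed pairs shows whether two related words end up with the same stem.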
1999.06.15 - Changed to '.pm' module, moved into Lingua::Stem namespace,
             made the export of the 'stem' routine into the caller's
             namespace optional, added named parameters

1999.06.24 - Switched core implementation of the Porter stemmer to
             the one written by Jim Richardson <jimr@maths.usyd.edu.au>

2000.08.25 - 2.11 Added stemming cache

2000.09.14 - 2.12 Fixed *major* :( implementation error of Porter's algorithm
             Error was entirely my fault - I completely forgot to include
             rule sets 2, 3, and 4 starting with Lingua::Stem 0.30.
             -- Benjamin Franz

2003.09.28 - 2.13 Corrected documentation error pointed out by Simon Cozens.

2005.11.20 - 2.14 Changed rule declarations to conform to Perl style convention
             for 'private' subroutines. Changed Exporter invocation to the
             more portable 'require' instead of 'use'.

2006.02.14 - 2.15 Added ability to pass word list by 'handle' for in-place stemming.

2009.07.27 - 2.16 Documentation fix

stem({ -words => \@words, -locale => 'en', -exceptions => \%exceptions });
    Stems a list of passed words using the rules of US English. Returns
    a reference to an anonymous array of the stemmed words.

    Example:

        my @words = ( 'wordy', 'another' );
        my $stemmed_words = Lingua::Stem::En::stem({ -words      => \@words,
                                                     -locale     => 'en',
                                                     -exceptions => \%exceptions,
                                                   });

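The example above leaves %exceptions undefined. The sketch below fills it in under the assumption that the hash maps a lower-case input word to the stem that should be returned for it unchanged, which is how the parent Lingua::Stem module describes its exceptions; the entries themselves are made up for illustration.

```perl
use strict;
use warnings;
use Lingua::Stem::En;

# Assumption: keys are lower-case input words, values are the stems to
# return for them instead of running the Porter rules.
my %exceptions = ( 'emily' => 'emily', 'driven' => 'driven' );

my @words = ( 'wordy', 'another', 'emily' );
my $stemmed_words = Lingua::Stem::En::stem({ -words      => \@words,
                                             -locale     => 'en',
                                             -exceptions => \%exceptions,
                                           });

# One stem per input word, in input order. In this 'by value' mode the
# original @words array is left untouched.
print join(' ', @$stemmed_words), "\n";
```

Words that have no entry in the hash go through the normal stemming rules.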
    If the first element of the list passed as -words is itself a list
    reference, then the stemming is performed 'in place' on that list
    (modifying the passed list directly instead of copying it to a new
    array).

    This is only useful if you do not need to keep the original list.
    If you do need to keep it, use the normal semantics of having 'stem'
    return a new list instead. That is faster than making your own copy
    and then stemming 'in place', since the primary difference between
    'in place' and 'by value' stemming is the creation of a copy of the
    original list. If you don't need the original list, then 'in place'
    stemming is about 60% faster.

    Example of 'in place' stemming:

        my $words = [ 'wordy', 'another' ];
        my $stemmed_words = Lingua::Stem::En::stem({ -words      => [$words],
                                                     -locale     => 'en',
                                                     -exceptions => \%exceptions,
                                                   });

    The 'in place' mode returns a reference to the original list with
    the words stemmed.

stem_caching({ -level => 0|1|2 });
    Sets the level of stem caching.

    '0' means 'no caching'. This is the default level.

    '1' means 'cache per run'. This caches stemming results during a
    single call to 'stem'.

    '2' means 'cache indefinitely'. This caches stemming results until
    either the process exits or the 'clear_stem_cache' method is
    called.

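As an illustration of level 1, the sketch below stems a single list that contains many repeated words, which is the case per-run caching is described as covering. The word list is made up, and whether caching pays off in practice depends on the input.

```perl
use strict;
use warnings;
use Lingua::Stem::En;

# Cache results for the duration of each individual call to stem().
Lingua::Stem::En::stem_caching({ -level => 1 });

# A list with heavy repetition, built here purely for illustration.
my @words = ( ('wordy') x 3, ('another') x 3 );

my $stems = Lingua::Stem::En::stem({ -words      => \@words,
                                     -locale     => 'en',
                                     -exceptions => {},
                                   });
print join(' ', @$stems), "\n";

# Return to the documented default of no caching.
Lingua::Stem::En::stem_caching({ -level => 0 });
```

The caching level is a module-wide setting, so resetting it afterwards avoids surprising other code that calls 'stem'.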
clear_stem_cache;
    Clears the cache of stemmed words.

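With level 2 the cache lives until it is cleared, so a long-running process may want to empty it between unrelated batches. A minimal sketch, with made-up batches:

```perl
use strict;
use warnings;
use Lingua::Stem::En;

# Keep cached stems across calls.
Lingua::Stem::En::stem_caching({ -level => 2 });

my @batches = ( [ 'wordy', 'another' ], [ 'another', 'words' ] );
my @results;
for my $batch (@batches) {
    push @results, Lingua::Stem::En::stem({ -words      => $batch,
                                            -locale     => 'en',
                                            -exceptions => {},
                                          });
}

# Drop everything cached so far, then restore the default level.
Lingua::Stem::En::clear_stem_cache();
Lingua::Stem::En::stem_caching({ -level => 0 });
```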
This code is almost entirely derived from the Porter 2.1 module written
by Jim Richardson.

Lingua::Stem

Jim Richardson, University of Sydney
jimr@maths.usyd.edu.au or http://www.maths.usyd.edu.au:8000/jimr.html

Integration in Lingua::Stem by
Benjamin Franz, FreeRun Technologies,
snowhare@nihongo.org or http://www.nihongo.org/snowhare/

Jim Richardson, University of Sydney
Benjamin Franz, FreeRun Technologies

This code is freely available under the same terms as Perl.

perl v5.30.1 2020-01-30 Lingua::Stem::En(3)