Lingua::Stem::EnBroken(3)  User Contributed Perl Documentation  Lingua::Stem::EnBroken(3)

NAME
       Lingua::Stem::EnBroken - Porter's stemming algorithm for 'generic'
       English

SYNOPSIS
           use Lingua::Stem::EnBroken;
           my $stems = Lingua::Stem::EnBroken::stem({ -words      => $word_list_reference,
                                                      -locale     => 'en',
                                                      -exceptions => $exceptions_hash,
                                                    });

DESCRIPTION
       This routine MIS-applies the Porter Stemming Algorithm to its
       parameters, returning the stemmed words. It is an intentionally broken
       version of Lingua::Stem::En, kept for people who need backwards
       compatibility with Lingua::Stem 0.30 and Lingua::Stem 0.40. Do not use
       it unless you are one of those people.

       It is derived from the C program "stemmer.c" as found in freewais and
       elsewhere, which contains these notes:

          Purpose:    Implementation of the Porter stemming algorithm documented
                      in: Porter, M.F., "An Algorithm For Suffix Stripping,"
                      Program 14 (3), July 1980, pp. 130-137.
          Provenance: Written by B. Frakes and C. Cox, 1986.

       I have re-interpreted areas that use Frakes and Cox's "WordSize"
       function. My version may misbehave on short words starting with "y",
       but I can't think of any examples.

       The step numbers correspond to Frakes and Cox, and are probably in
       Porter's article (which I've not seen). Porter's algorithm still has
       rough spots (e.g. current/currency, -ings words), which I've not
       attempted to cure, although I have added support for the British -ise
       suffix.

CHANGES
        2003.09.28 - Documentation fix

        2000.09.14 - Forked from the Lingua::Stem::En.pm module to provide
                     a backward-compatibly broken version for people who
                     need behavior consistent with 0.30 and 0.40 more than
                     they need accurate stemming.

METHODS
       stem({ -words => \@words, -locale => 'en', -exceptions => \%exceptions
       });
           Stems a list of passed words using the rules of US English. Returns
           an anonymous array reference to the stemmed words.

           Example:

             my $stemmed_words = Lingua::Stem::EnBroken::stem({ -words      => \@words,
                                                                -locale     => 'en',
                                                                -exceptions => \%exceptions,
                                                              });
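
           A slightly fuller sketch of the call above, showing one way the
           inputs might be built and the result walked. The word list and the
           exceptions table are illustrative only. It assumes, as in
           Lingua::Stem, that the exceptions hash maps a word to the stem to
           return for it, and that the returned array holds one stem per input
           word in the same order. Because this module is deliberately
           inaccurate, no particular stems are promised for the other words.

```perl
use strict;
use warnings;
use Lingua::Stem::EnBroken;

# Illustrative input; any list of words would do.
my @words = qw(mice running stemmer flies);

# Hypothetical exceptions table: each key is a word that should bypass
# the algorithm, and each value is the stem to return for it.
# Keys are written in lower case on the assumption that the lookup
# happens after case folding.
my %exceptions = ( mice => 'mouse' );

my $stems = Lingua::Stem::EnBroken::stem({
    -words      => \@words,
    -locale     => 'en',
    -exceptions => \%exceptions,
});

# $stems is an array reference; pair each input word with its stem.
for my $i (0 .. $#words) {
    printf "%-10s => %s\n", $words[$i], $stems->[$i];
}
```

           Passing references keeps the call cheap for long word lists, and
           the original @words array is left for the caller to compare
           against.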

       stem_caching({ -level => 0|1|2 });
           Sets the level of stem caching.

           '0' means 'no caching'. This is the default level.

           '1' means 'cache per run'. This caches stemming results during a
           single call to 'stem'.

           '2' means 'cache indefinitely'. This caches stemming results until
           either the process exits or the 'clear_stem_cache' method is
           called.

       clear_stem_cache;
           Clears the cache of stemmed words.
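
           The caching calls might be combined as in the sketch below, which
           turns on level 2 for a batch of repeated work and then releases the
           cache. The word list is illustrative, and an empty exceptions hash
           is passed explicitly as a precaution rather than because the module
           is known to require it. Caching is assumed to affect speed only,
           not the stems returned.

```perl
use strict;
use warnings;
use Lingua::Stem::EnBroken;

# Illustrative batch; repeated words are where a cache can pay off.
my @batch = qw(running runner runs running runs);

# Baseline with caching off (level 0 is the documented default).
Lingua::Stem::EnBroken::stem_caching({ -level => 0 });
my $uncached = Lingua::Stem::EnBroken::stem({
    -words => \@batch, -locale => 'en', -exceptions => {},
});

# Level 2: results are kept until the process exits or the cache is cleared.
Lingua::Stem::EnBroken::stem_caching({ -level => 2 });
my $first = Lingua::Stem::EnBroken::stem({
    -words => \@batch, -locale => 'en', -exceptions => {},
});
my $second = Lingua::Stem::EnBroken::stem({
    -words => \@batch, -locale => 'en', -exceptions => {},
});

# Release the cached stems and return to the default level.
Lingua::Stem::EnBroken::clear_stem_cache();
Lingua::Stem::EnBroken::stem_caching({ -level => 0 });

print join(' ', @$second), "\n";
```

           Level 2 trades memory for speed, so clearing the cache after a
           large batch keeps a long-running process from holding every stem
           it has ever computed.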

NOTES
       This code is almost entirely derived from the Porter 2.1 module written
       by Jim Richardson.

SEE ALSO
        Lingua::Stem

AUTHOR
         Jim Richardson, University of Sydney
         jimr@maths.usyd.edu.au or http://www.maths.usyd.edu.au:8000/jimr.html

         Integration in Lingua::Stem by
         Benjamin Franz, FreeRun Technologies,
         snowhare@nihongo.org or http://www.nihongo.org/snowhare/

COPYRIGHT
       Jim Richardson, University of Sydney
       Benjamin Franz, FreeRun Technologies

       This code is freely available under the same terms as Perl.

perl v5.12.1                      2010-09-14         Lingua::Stem::EnBroken(3)