Lingua::StopWords(3pm)   User Contributed Perl Documentation   Lingua::StopWords(3pm)

NAME

       Lingua::StopWords - Stop words for several languages.

SYNOPSIS

           use Lingua::StopWords qw( getStopWords );
           my $stopwords = getStopWords('en');

           my @words = qw( i am the walrus goo goo g'joob );

           # prints "walrus goo goo g'joob"
           print join ' ', grep { !$stopwords->{$_} } @words;

DESCRIPTION

       In keyword search, it is common practice to suppress a collection of
       "stopwords": words such as "the", "and", "maybe", etc., which occur in
       a large number of documents and tell you nothing important about any
       document that contains them.  This module provides such "stoplists" in
       several languages.

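       The filtering described above can be sketched in plain Perl.  This is
       a minimal sketch under stated assumptions: the small hash is a
       hand-written stand-in for the hashref returned by getStopWords('en'),
       remove_stopwords() is a hypothetical helper rather than part of this
       module, and tokens are lowercased on the assumption that the stoplist
       keys are lowercase.

```perl
use strict;
use warnings;

# Hand-written stand-in for getStopWords('en'): keys are stopwords,
# values are all 1 (illustrative only, not the module's real list).
my $stopwords = { i => 1, am => 1, the => 1, and => 1 };

# Hypothetical helper: split on whitespace, lowercase each token so that
# "The" matches the key "the", then keep only the non-stopwords.
sub remove_stopwords {
    my ( $stoplist, $text ) = @_;
    return grep { !$stoplist->{$_} } map { lc($_) } split ' ', $text;
}

my @keywords = remove_stopwords( $stopwords, 'I am The Walrus' );
print "@keywords\n";    # prints "walrus"
```

       Splitting on ' ' uses Perl's awk-style whitespace splitting, so
       leading whitespace and runs of spaces do not produce empty tokens.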
   Supported Languages
           |-----------------------------------------------------------|
           | Language   | ISO code | default encoding | also available |
           |-----------------------------------------------------------|
           | Danish     | da       | ISO-8859-1       | UTF-8          |
           | Dutch      | nl       | ISO-8859-1       | UTF-8          |
           | English    | en       | ISO-8859-1       | UTF-8          |
           | Finnish    | fi       | ISO-8859-1       | UTF-8          |
           | French     | fr       | ISO-8859-1       | UTF-8          |
           | German     | de       | ISO-8859-1       | UTF-8          |
           | Hungarian  | hu       | ISO-8859-2       | UTF-8          |
           | Indonesian | id       | ISO-8859-1       | UTF-8          |
           | Italian    | it       | ISO-8859-1       | UTF-8          |
           | Norwegian  | no       | ISO-8859-1       | UTF-8          |
           | Portuguese | pt       | ISO-8859-1       | UTF-8          |
           | Romanian   | ro       | ISO-8859-2       | UTF-8          |
           | Spanish    | es       | ISO-8859-1       | UTF-8          |
           | Swedish    | sv       | ISO-8859-1       | UTF-8          |
           | Russian    | ru       | KOI8-R           | UTF-8          |
           |-----------------------------------------------------------|

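       The table can also be kept as data, for example to validate a
       user-supplied language code before calling getStopWords() and to know
       which encoding the returned words will be in.  The sketch below simply
       transcribes the table; %default_encoding and stoplist_encoding() are
       hypothetical names, not part of Lingua::StopWords.

```perl
use strict;
use warnings;

# ISO code => default encoding, transcribed from the table above.
# Every listed language is also available as UTF-8.
my %default_encoding = (
    da => 'ISO-8859-1', nl => 'ISO-8859-1', en => 'ISO-8859-1',
    fi => 'ISO-8859-1', fr => 'ISO-8859-1', de => 'ISO-8859-1',
    hu => 'ISO-8859-2', id => 'ISO-8859-1', it => 'ISO-8859-1',
    no => 'ISO-8859-1', pt => 'ISO-8859-1', ro => 'ISO-8859-2',
    es => 'ISO-8859-1', sv => 'ISO-8859-1', ru => 'KOI8-R',
);

# Hypothetical helper: the encoding the stoplist for $code would be in,
# or an empty return (undef in scalar context) for an unlisted code.
sub stoplist_encoding {
    my ( $code, $want_utf8 ) = @_;
    return unless exists $default_encoding{$code};
    return $want_utf8 ? 'UTF-8' : $default_encoding{$code};
}

printf "%s\n", stoplist_encoding('hu');         # prints "ISO-8859-2"
printf "%s\n", stoplist_encoding( 'hu', 1 );    # prints "UTF-8"
```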

FUNCTIONS

   getStopWords
           my $stoplist      = getStopWords('en');
           my $utf8_stoplist = getStopWords('en', 'UTF-8');

       Retrieve a stoplist in the form of a hashref where the keys are all
       stopwords and the values are all 1.

           $stoplist = {
               and => 1,
               if  => 1,
               # ...
           };

       getStopWords() expects one or two arguments.  The first, which is
       required, is the ISO code of a supported language.  If the ISO code
       cannot be found, getStopWords() returns undef.

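       Because an unsupported ISO code is reported by returning undef rather
       than by dying, calling code may want to check the result before using
       it.  A minimal sketch: fake_getStopWords() is a hypothetical stand-in
       that follows the documented contract (a hashref for a known code,
       undef otherwise) so the example runs without the module installed,
       and stoplist_for() is a hypothetical wrapper.  In real code the
       wrapper would call getStopWords() instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for getStopWords(): hashref for a known ISO code,
# undef otherwise. The word list is illustrative, not the real one.
my %lists = ( en => { the => 1, and => 1, if => 1 } );
sub fake_getStopWords { my ($code) = @_; return $lists{$code}; }

# Hypothetical wrapper: die on an unsupported code instead of passing an
# undef stoplist on to the filtering code.
sub stoplist_for {
    my ($code) = @_;
    my $stoplist = fake_getStopWords($code);
    die "No stoplist for language code '$code'\n" unless defined $stoplist;
    return $stoplist;
}

my $en = stoplist_for('en');
printf "%d stopwords, 'the' is %s\n", scalar( keys %$en ),
    ( exists $en->{the} ? 'listed' : 'not listed' );
# prints "3 stopwords, 'the' is listed"

my $ok = eval { stoplist_for('xx'); 1 };
print "caught: $@" unless $ok;
# prints "caught: No stoplist for language code 'xx'"
```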
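       The second argument should be 'UTF-8' if you want the stopwords encoded
       in UTF-8.  Perl's internal UTF-8 flag will then be turned on for the
       returned words, so make sure you understand the implications: in
       particular, text read as raw bytes must be decoded before non-ASCII
       words in it can match these keys.
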
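       The sketch below illustrates one of those implications: a word read
       as raw UTF-8 bytes does not match a character-string key until it is
       decoded.  It assumes the source file is saved as UTF-8, uses the core
       Encode module, and replaces getStopWords('de', 'UTF-8') with a small
       hand-written hash for illustration.

```perl
use strict;
use warnings;
use utf8;                          # string literals below are characters
use Encode qw( encode decode );

# Hand-written stand-in for getStopWords('de', 'UTF-8'): with the UTF-8
# option the keys are character strings, not bytes.
my $stoplist = { "für" => 1, "über" => 1, "und" => 1 };

# Bytes as they would arrive from a file opened without an encoding layer.
my $raw = encode( 'UTF-8', "für" );

# The undecoded bytes form a different string (4 bytes vs 3 characters),
# so the lookup fails; decoding first makes it match.
my $decoded = decode( 'UTF-8', $raw );
printf "raw: %s, decoded: %s\n",
    ( exists $stoplist->{$raw}     ? 'match' : 'no match' ),
    ( exists $stoplist->{$decoded} ? 'match' : 'no match' );
# prints "raw: no match, decoded: match"
```

       Reading input through an :encoding(UTF-8) layer decodes it in the
       same way, so no explicit decode() call is needed in that case.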

INSTALLATION

       To install this module, type the following:

          perl Build.PL
          ./Build
          ./Build test
          ./Build install

SEE ALSO

       The stoplists supplied by this module were created as part of the
       Snowball project (see <http://snowball.tartarus.org>,
       Lingua::Stem::Snowball).

       Lingua::EN::StopWords provides a different stoplist for English.

SOURCE REPOSITORY

       <https://github.com/wollmers/Lingua-StopWords>

AUTHOR

       Maintained by Helmut Wollmersdorfer <helmut@wollmersdorfer.at> and
       Marvin Humphrey <marvin at rectangular dot com>.  Original author:
       Fabien Potencier <fabpot at cpan dot org>.

       Copyright 2021 Helmut Wollmersdorfer
       Copyright 2004-2008 Fabien Potencier, Marvin Humphrey

LICENSE

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself, either Perl version 5.8.3 or, at
       your option, any later version of Perl 5 you may have available.

perl v5.36.0                      2022-07-22            Lingua::StopWords(3pm)