Text::Affixes(3pm)

Text::Affixes(3)      User Contributed Perl Documentation     Text::Affixes(3)

NAME

       Text::Affixes - Prefixes and suffixes analysis of text

SYNOPSIS

         use Text::Affixes;
         my $text = "Hello, world. Hello, big world.";
         my $prefixes = get_prefixes($text);

         # $prefixes now holds
         # {
         #     3 => {
         #             'Hel' => 2,
         #             'wor' => 2,
         #     }
         # }

         # or

         $prefixes = get_prefixes({min => 1, max => 2}, $text);

         # $prefixes now holds
         # {
         #     1 => {
         #             'H' => 2,
         #             'w' => 2,
         #             'b' => 1,
         #     },
         #     2 => {
         #             'He' => 2,
         #             'wo' => 2,
         #             'bi' => 1,
         #     }
         # }

         # get_suffixes is used in the same way

DESCRIPTION

       Provides functions for prefix and suffix analysis of text.

METHODS

   get_prefixes
       Extracts prefixes from text. You can specify the minimum and maximum
       length, in characters, of the prefixes you want.

       Returns a reference to a hash that maps each length within the
       specified limits to another hash; each of those hashes maps every
       prefix of that length found in the text to the number of times it
       occurs.

       By default, both the minimum and maximum limits are 3. If the minimum
       limit is greater than the maximum one, an empty hash is returned.

       A prefix is a sequence of word characters (\w) at the beginning of a
       word (that is, after a word boundary) that does not reach the end of
       the word (in regular expression terms, a prefix is the $1 of
       /\b(\w+)\w/).

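       The definition above can be sketched in plain Perl. This is a minimal
       illustration, not the module's implementation: sketch_prefixes is a
       hypothetical helper, it takes the limits as plain arguments instead
       of an options hash, and it always treats digits as word characters
       (it ignores the exclude_numbers option).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Affixes: counts prefixes of each
# length from $min to $max using the definition /\b(\w+)\w/.
sub sketch_prefixes {
    my ($text, $min, $max) = @_;
    my %count;
    for my $len ($min .. $max) {
        # \b anchors the match at the start of a word; the trailing \w
        # guarantees the prefix stops short of the end of the word.
        while ($text =~ /\b(\w{$len})\w/g) {
            $count{$len}{$1}++;
        }
    }
    return \%count;
}

my $counts = sketch_prefixes("Hello, world. Hello, big world.", 1, 2);
```

       With limits 1 and 2 the sketch produces the same counts as the second
       SYNOPSIS example. When the minimum exceeds the maximum, the range is
       empty, the loop body never runs, and the hash stays empty.
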
         # extracting prefixes of size 3
         $prefixes = get_prefixes( $text );

         # extracting prefixes of sizes 2 and 3
         $prefixes = get_prefixes( {min => 2}, $text );

         # extracting prefixes of sizes 3 and 4
         $prefixes = get_prefixes( {max => 4}, $text );

         # extracting prefixes of sizes 2, 3 and 4
         $prefixes = get_prefixes( {min => 2, max => 4}, $text );

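       The returned structure is a hash of hashes, so reading it takes two
       nested loops. The sketch below walks a literal copy of the structure
       shown in the SYNOPSIS; report_lines is a hypothetical helper, and the
       ordering it applies is only one possible choice.

```perl
use strict;
use warnings;

# Literal copy of the structure shown in the SYNOPSIS for {min => 1, max => 2}.
my $prefixes = {
    1 => { H  => 2, w  => 2, b  => 1 },
    2 => { He => 2, wo => 2, bi => 1 },
};

# Hypothetical helper: builds one "length affix count" line per entry,
# ordered by length, then by descending count, then alphabetically.
sub report_lines {
    my ($affixes) = @_;
    my @lines;
    for my $len (sort { $a <=> $b } keys %$affixes) {
        my $by_affix = $affixes->{$len};
        my @ordered = sort {
            $by_affix->{$b} <=> $by_affix->{$a} or $a cmp $b
        } keys %$by_affix;
        push @lines, "$len $_ $by_affix->{$_}" for @ordered;
    }
    return @lines;
}

print "$_\n" for report_lines($prefixes);
```
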
   get_suffixes
       The get_suffixes function works like get_prefixes. Read the
       documentation for that function first, then come back to this point.

       A suffix is a sequence of word characters (\w) at the end of a word
       (that is, before a word boundary) that does not start at the
       beginning of the word (in regular expression terms, a suffix is the
       $1 of /\w(\w+)\b/).

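       The suffix definition can be sketched the same way, with the regular
       expression mirrored. Again, this is an illustration under stated
       assumptions rather than the module's code: sketch_suffixes is a
       hypothetical helper that always treats digits as word characters,
       which corresponds to calling get_suffixes with exclude_numbers => 0.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Affixes: counts suffixes of each
# length from $min to $max using the definition /\w(\w+)\b/.
sub sketch_suffixes {
    my ($text, $min, $max) = @_;
    my %count;
    for my $len ($min .. $max) {
        # The leading \w guarantees the suffix does not start at the
        # beginning of the word; \b anchors it at the end of the word.
        while ($text =~ /\w(\w{$len})\b/g) {
            $count{$len}{$1}++;
        }
    }
    return \%count;
}

my $found = sketch_suffixes("Hello, but w8", 1, 1);
# $found->{1} holds 'o', 't' and '8', each with a count of 1
```
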
         # extracting suffixes of size 3
         $suffixes = get_suffixes( $text );

         # extracting suffixes of sizes 2 and 3
         $suffixes = get_suffixes( {min => 2}, $text );

         # extracting suffixes of sizes 3 and 4
         $suffixes = get_suffixes( {max => 4}, $text );

         # extracting suffixes of sizes 2, 3 and 4
         $suffixes = get_suffixes( {min => 2, max => 4}, $text );

OPTIONS

       Besides the minimum and maximum sizes for prefixes or suffixes, you
       can also set the following configuration options.

   exclude_numbers
       Set to 0 to treat numbers as part of words. The default value is 1.

         # this...
         get_suffixes( {min => 1, max => 1, exclude_numbers => 0}, "Hello, but w8" );

         # returns this:
           {
             1 => {
                    'o' => 1,
                    't' => 1,
                    '8' => 1
                  }
           }

   lowercase
       Set to 1 to lowercase all prefixes as they are extracted. The default
       value is 0.

       ATTENTION: This does not mean that prefixes containing uppercase
       characters are skipped. They are still extracted, but only after
       being lowercased.

         # this...
         get_prefixes( {min => 2, max => 2, lowercase => 1}, "Hello, hello" );

         # returns this:
           {
             2 => {
                    'he' => 2
                  }
           }
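
       The effect of lowercasing on the counts can be illustrated with a
       small plain-Perl sketch. It assumes that lowercasing the whole text
       before matching is equivalent to what the option does; count_prefixes
       is a hypothetical helper and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Affixes: counts prefixes of a
# single length, optionally lowercasing the text first.
sub count_prefixes {
    my ($text, $len, $lowercase) = @_;
    $text = lc $text if $lowercase;
    my %count;
    $count{$1}++ while $text =~ /\b(\w{$len})\w/g;
    return \%count;
}

# Without lowercasing, 'He' and 'he' are counted separately.
my $as_is = count_prefixes("Hello, hello", 2, 0);

# With lowercasing, both occurrences are counted under 'he'.
my $lowered = count_prefixes("Hello, hello", 2, 1);
```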

TO DO

       •     Make it more efficient (use C for that)

AUTHOR

       Jose Castro, "<cog@cpan.org>"

COPYRIGHT & LICENSE

       Copyright 2004 Jose Castro, All Rights Reserved.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.36.0                      2022-07-22                  Text::Affixes(3)