Lingua::EN::Fathom(3pm)

Lingua::EN::Fathom(3)  User Contributed Perl Documentation  Lingua::EN::Fathom(3)

NAME

       Lingua::EN::Fathom - Measure readability of English text

SYNOPSIS

           use Lingua::EN::Fathom;

           my $text = Lingua::EN::Fathom->new();

           $text->analyse_file("sample.txt"); # Analyse contents of a text file

           my $accumulate = 1;
           $text->analyse_block($text_string,$accumulate); # Analyse contents of a text string

           # Methods to return statistics on the analysed text
           $text->num_chars;
           $text->num_words;
           $text->percent_complex_words;
           $text->num_sentences;
           $text->num_text_lines;
           $text->num_blank_lines;
           $text->num_paragraphs;
           $text->syllables_per_word;
           $text->words_per_sentence;
           $text->unique_words;
           $text->fog;
           $text->flesch;
           $text->kincaid;

           # Call all of the above methods and present as a formatted report
           print($text->report);

           # Get a hash of unique words, keyed by word, with the number of
           # occurrences as the value
           my %words = $text->unique_words;

           # Print a list of unique words
           foreach my $word ( sort keys %words )
           {
             print("$words{$word} :$word\n");
           }

REQUIRES

       Lingua::EN::Syllable, Lingua::EN::Sentence

DESCRIPTION

       This module analyses English text in either a string or a file. Totals
       are then calculated for the number of characters, words, sentences,
       blank and non-blank (text) lines, and paragraphs.

       Three common readability statistics are also derived: the Fog, Flesch
       and Kincaid indices.

       All of these properties can be accessed through individual methods, or
       by generating a text report.

       A hash of all unique words and the number of times they occur is
       also generated.

METHODS

   new
       The "new" method creates an instance of a text object. This must be
       called before any of the following methods are invoked. Note that the
       object only needs to be created once, and can be reused with new input
       data.

          my $text = Lingua::EN::Fathom->new();

   analyse_file
       The "analyse_file" method takes as input the name of a text file.
       Various text-based statistics are calculated for the file. Either this
       method or "analyse_block" must be called before any of the following
       methods. An optional second argument may be supplied to control
       accumulation of statistics. If set to a non-zero value, all statistics
       are accumulated with each successive call.

           $text->analyse_file("sample.txt");

   analyse_block
       The "analyse_block" method takes as input a text string. Various
       text-based statistics are calculated for the string. Either this
       method or "analyse_file" must be called before any of the following
       methods. An optional second argument may be supplied to control
       accumulation of statistics. If set to a non-zero value, all statistics
       are accumulated with each successive call.

           $text->analyse_block($text_str);

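       As an illustration of the accumulation argument, the following is a
       minimal sketch that feeds two strings to the same object. The sample
       strings and the @chunks array are invented for this example; only
       "new", "analyse_block" and "num_words" come from this documentation,
       and the module is assumed to be installed.

```perl
use strict;
use warnings;
use Lingua::EN::Fathom;

# Hypothetical sample input, split into two chunks.
my @chunks = (
    "The cat sat on the mat. It was warm.",
    "The dog slept by the door.",
);

my $text       = Lingua::EN::Fathom->new();
my $accumulate = 0;    # first call starts from fresh totals

for my $chunk (@chunks) {
    $text->analyse_block( $chunk, $accumulate );
    $accumulate = 1;   # later calls add to the running totals
}

printf( "Words across all chunks : %d\n", $text->num_words );
```

       Passing 0 on the first call resets any totals left over from earlier
       input, which matters when the same object is reused.
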
   num_chars
       Returns the number of characters in the analysed text file or block.
       This includes characters such as spaces and punctuation marks.

   num_words
       Returns the number of words in the analysed text file or block. A word
       must consist of letters a-z with at least one vowel sound, and
       optionally an apostrophe or hyphen. Items such as "&", "K108" and "NW"
       are not counted as words.

   percent_complex_words
       Returns the percentage of complex words in the analysed text file or
       block. A complex word must consist of three or more syllables. This
       statistic is used to calculate the Fog index.

   num_sentences
       Returns the number of sentences in the analysed text file or block. A
       sentence is any group of words and non-words terminated with a single
       full stop. Spaces may occur before and after the full stop.

   num_text_lines
       Returns the number of lines containing some text in the analysed text
       file or block.

   num_blank_lines
       Returns the number of lines NOT containing any text in the analysed
       text file or block.

   num_paragraphs
       Returns the number of paragraphs in the analysed text file or block.

   syllables_per_word
       Returns the average number of syllables per word in the analysed text
       file or block.

   words_per_sentence
       Returns the average number of words per sentence in the analysed text
       file or block.

   READABILITY
       Three indices of text readability are calculated. They all measure
       complexity as a function of syllables per word and words per sentence.
       They assume the text is well formed and logical. A passage of
       nonsensical English can still score as quite readable, provided the
       words are not too complex and the sentences not too long.

       For more information see:
       <http://www.plainlanguage.com/Resources/readability.html>

   fog
       Returns the Fog index for the analysed text file or block.

         ( words_per_sentence + percent_complex_words ) * 0.4

       The Fog index, developed by Robert Gunning, is a well known and simple
       formula for measuring readability. The index indicates the number of
       years of formal education a reader of average intelligence would need
       to read the text once and understand that piece of writing with its
       word-sentence workload.

          18 unreadable
          14 difficult
          12 ideal
          10 acceptable
           8 childish

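       As a worked example, the formula can be checked by hand against the
       figures in the sample report shown under "report" (135 words, 12
       sentences, 20.00 percent complex words). This sketch does not call
       the module; the variable names are chosen for illustration only.

```perl
use strict;
use warnings;

# Figures taken from the sample report in this document.
my $num_words       = 135;
my $num_sentences   = 12;
my $percent_complex = 20.00;

my $words_per_sentence = $num_words / $num_sentences;    # 11.25
my $fog = ( $words_per_sentence + $percent_complex ) * 0.4;

printf( "Fog : %.4f\n", $fog );    # prints "Fog : 12.5000"
```
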
   flesch
       Returns the Flesch reading ease score for the analysed text file or
       block.

          206.835 - (1.015 * words_per_sentence) - (84.6 * syllables_per_word)

       This score rates text on a 100 point scale. The higher the score, the
       easier it is to understand the text. A score of 60 to 70 is considered
       to be optimal.

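       The formula can be checked against the sample report shown under
       "report". The sketch below does not call the module. It assumes a
       total of 239 syllables, a figure not stated in the report but inferred
       from 135 words at an average of 1.7704 syllables per word.

```perl
use strict;
use warnings;

# Word and sentence counts are taken from the sample report.
my $num_words     = 135;
my $num_sentences = 12;
# Assumption: total syllables inferred from the reported average of 1.7704.
my $num_syllables = 239;

my $words_per_sentence = $num_words / $num_sentences;
my $syllables_per_word = $num_syllables / $num_words;

my $flesch = 206.835
           - ( 1.015 * $words_per_sentence )
           - ( 84.6 * $syllables_per_word );

printf( "Flesch : %.4f\n", $flesch );    # prints "Flesch : 45.6429"
```
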
   kincaid
       Returns the Flesch-Kincaid grade level score for the analysed text file
       or block.

          (11.8 * syllables_per_word) + (0.39 * words_per_sentence) - 15.59

       This score rates text on a U.S. grade school level, so a score of 8.0
       means that the document can be understood by an eighth grader. A score
       of 7.0 to 8.0 is considered to be optimal.

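       The grade level formula can be checked in the same way against the
       sample report shown under "report". This sketch does not call the
       module, and the total of 239 syllables is an assumption inferred from
       the reported average of 1.7704 syllables per word over 135 words.

```perl
use strict;
use warnings;

# Word and sentence counts are taken from the sample report.
my $num_words     = 135;
my $num_sentences = 12;
# Assumption: total syllables inferred from the reported average of 1.7704.
my $num_syllables = 239;

my $words_per_sentence = $num_words / $num_sentences;
my $syllables_per_word = $num_syllables / $num_words;

my $kincaid = ( 11.8 * $syllables_per_word )
            + ( 0.39 * $words_per_sentence )
            - 15.59;

printf( "Flesch-Kincaid : %.4f\n", $kincaid );    # prints "Flesch-Kincaid : 9.6879"
```
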
   unique_words
       Returns a hash of unique words. The words (in lower case) are held in
       the hash keys, while the number of occurrences of each word is held in
       the hash values.

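       A common use of this hash is a frequency listing. The sketch below
       sorts by descending count and breaks ties alphabetically. The %words
       contents are a hypothetical sample standing in for the hash returned
       by "unique_words", so the example runs without the module.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for: my %words = $text->unique_words;
my %words = ( the => 4, cat => 2, sat => 1, mat => 1 );

# Sort by descending count, then alphabetically to break ties.
my @by_frequency =
    sort { $words{$b} <=> $words{$a} || $a cmp $b } keys %words;

for my $word (@by_frequency) {
    printf( "%3d : %s\n", $words{$word}, $word );
}
```
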
   report
           print($text->report);

       Produces a text based report containing all Fathom statistics for the
       currently analysed text block or file. For example:

           Number of characters       : 813
           Number of words            : 135
           Percent of complex words   : 20.00
           Average syllables per word : 1.7704
           Number of sentences        : 12
           Average words per sentence : 11.2500
           Number of text lines       : 13
           Number of blank lines      : 8
           Number of paragraphs       : 4

           READABILITY INDICES

           Fog                        : 12.5000
           Flesch                     : 45.6429
           Flesch-Kincaid             : 9.6879

       The return value is a string containing the report contents.

POSSIBLE EXTENSIONS

       Count white space and punctuation characters.

       Allow user control over what strictly defines a word.

LIMITATIONS

       The syllable count provided in Lingua::EN::Syllable is about 90%
       accurate.

       Acronyms that contain vowels, like GPO, will be counted as words.

       The Fog index should exclude proper names.

BUGS

       None known.

AUTHOR

       Lingua::EN::Fathom was written by Kim Ryan <kimryan at cpan dot org>.

COPYRIGHT AND LICENSE

       Copyright (c) 2023 Kim Ryan. All rights reserved.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.36.1                      2023-06-20             Lingua::EN::Fathom(3)