Tagger(3)             User Contributed Perl Documentation            Tagger(3)

NAME

       Lingua::EN::Tagger - Part-of-speech tagger for English natural language
       processing.

SYNOPSIS

           use Lingua::EN::Tagger;

           # Create a tagger object
           my $p = Lingua::EN::Tagger->new;

           # Add part-of-speech tags to a text
           my $tagged_text = $p->add_tags($text);

           ...

           # Get a list of all nouns and noun phrases with occurrence counts
           my %word_list = $p->get_words($text);

           ...

           # Get a readable version of the tagged text
           my $readable_text = $p->get_readable($text);

DESCRIPTION

       The module is a probability-based, corpus-trained tagger that assigns
       POS tags to English text based on a lookup dictionary and a set of
       probability values.  The tagger assigns appropriate tags based on
       conditional probabilities: it examines the preceding tag to determine
       the appropriate tag for the current word.  Unknown words are
       classified according to their morphology, or can be assigned a fixed
       tag (for example, a noun tag) with the unknown_word_tag parameter.

       The tagger also extracts as many nouns and noun phrases as it can,
       using a set of regular expressions.

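       The conditional-probability step described above can be illustrated
       with a toy sketch in core Perl. It is a greedy simplification, not the
       module's code: the probability tables %emit (P(word | tag)) and %trans
       (P(tag | previous tag)), the tag names, and the tag_words helper are
       all hypothetical, and the module's own decoding may search over whole
       tag sequences rather than choosing one tag at a time.

```perl
use strict;
use warnings;

# Toy, hand-made probabilities (hypothetical; not the module's corpus data).
# %emit holds P(word | tag) for each known word.
my %emit = (
    the  => { det => 1.0 },
    can  => { md  => 0.6, nn  => 0.3, vb => 0.1 },
    dog  => { nn  => 0.9, vb  => 0.1 },
    runs => { vbz => 0.8, nns => 0.2 },
);

# %trans holds P(tag | previous tag); START marks the beginning of a sentence.
my %trans = (
    START => { det => 0.5, nn  => 0.2, md => 0.1,  vb => 0.1, nns => 0.1 },
    det   => { nn  => 0.7, nns => 0.2, md => 0.05, vb => 0.05 },
    nn    => { vbz => 0.5, md  => 0.3, nn => 0.1,  nns => 0.1 },
    md    => { vb  => 0.9, nn  => 0.1 },
);

# Greedy left-to-right tagging: for each word pick the tag that maximizes
# P(tag | previous tag) * P(word | tag); unknown words get $unknown_tag.
sub tag_words {
    my ($unknown_tag, @words) = @_;
    my $prev = 'START';
    my @tags;
    for my $word (@words) {
        my $cands = $emit{ lc $word };
        my $best  = $unknown_tag;
        if ($cands) {
            my $best_score = -1;
            for my $tag (sort keys %$cands) {
                my $score = ($trans{$prev}{$tag} // 0) * $cands->{$tag};
                ($best, $best_score) = ($tag, $score) if $score > $best_score;
            }
        }
        push @tags, $best;
        $prev = $best;    # the chosen tag conditions the next word
    }
    return @tags;
}

print join(' ', tag_words('nn', qw(the can runs))), "\n";   # det nn vbz
```

       Here "can" is tagged nn rather than md: P(nn | det) * P(can | nn) =
       0.21 exceeds P(md | det) * P(can | md) = 0.03, so the preceding
       determiner decides the tag.
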

CONSTRUCTOR

       new %PARAMS
           Class constructor.  Takes a hash with the following parameters
           (shown with default values):

           unknown_word_tag => ''
               Tag to assign to unknown words.

           stem => 0
               Stem single words using Lingua::Stem::EN.

           weight_noun_phrases => 0
               When returning occurrence counts for a noun phrase, multiply
               the value by the number of words in the NP.

           longest_noun_phrase => 5
               Ignore noun phrases longer than this many words. This
               affects only the get_words() and get_nouns() methods.

           relax => 0
               Relax the Hidden Markov Model: this may improve accuracy for
               uncommon words, particularly words used polysemously.

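           The documented effect of weight_noun_phrases and
           longest_noun_phrase on a set of phrase counts can be sketched as
           follows. adjust_counts is a hypothetical helper that works on a
           plain hash of phrase => count; it is not part of the module, and
           the order in which the module applies the two options is an
           assumption.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking the documented effect of two constructor
# options on a hash of phrase => occurrence count.
sub adjust_counts {
    my ($counts, %opt) = @_;
    my $longest = $opt{longest_noun_phrase} // 5;    # documented default
    my %out;
    for my $phrase (keys %$counts) {
        my @words = split ' ', $phrase;
        next if @words > $longest;                   # ignore over-long phrases
        $out{$phrase} = $opt{weight_noun_phrases}
            ? $counts->{$phrase} * @words            # weight by word count
            : $counts->{$phrase};
    }
    return %out;
}

my %adjusted = adjust_counts(
    { 'data' => 3, 'linguistic data' => 2, 'a b c d e f' => 1 },
    weight_noun_phrases => 1,
);
print "$_: $adjusted{$_}\n" for sort keys %adjusted;
# data: 3
# linguistic data: 4
```

           The six-word phrase is dropped by the default threshold of 5, and
           "linguistic data" (2 occurrences, 2 words) is weighted to 4.
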

METHODS

       add_tags TEXT
           Examine the string provided and return it fully tagged (XML style).

       add_tags_incrementally TEXT
           Examine the string provided and return it fully tagged (XML style)
           but do not reset the internal part-of-speech state between
           invocations.

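           The difference between add_tags and add_tags_incrementally comes
           down to whether the previous-tag state survives between calls. The
           sketch below models this with a hypothetical ToyTagger class whose
           one-rule "model" stands in for the real probability lookup;
           tag_fresh and tag_incremental only mimic the documented behavior of
           the two methods and are not the module's API.

```perl
use strict;
use warnings;

package ToyTagger;    # hypothetical class, not Lingua::EN::Tagger

sub new { my ($class) = @_; return bless { prev => 'START' }, $class }

# Toy rule standing in for a probability lookup: "the" is a determiner,
# a word right after a determiner is a noun, anything else is a verb.
sub _tag_word {
    my ($self, $word) = @_;
    my $tag = lc($word) eq 'the'     ? 'det'
            : $self->{prev} eq 'det' ? 'nn'
            :                          'vb';
    $self->{prev} = $tag;              # state carried to the next word
    return "<$tag>$word</$tag>";
}

sub _tag_text {
    my ($self, $text) = @_;
    return join ' ', map { $self->_tag_word($_) } split ' ', $text;
}

# Analogous to add_tags: reset the state before every call.
sub tag_fresh {
    my ($self, $text) = @_;
    $self->{prev} = 'START';
    return $self->_tag_text($text);
}

# Analogous to add_tags_incrementally: keep the state from the previous call.
sub tag_incremental {
    my ($self, $text) = @_;
    return $self->_tag_text($text);
}

package main;

my $tagger = ToyTagger->new;
$tagger->tag_fresh('the');
my $fresh = $tagger->tag_fresh('dog');
print "$fresh\n";          # <vb>dog</vb>
$tagger->tag_fresh('the');
my $incremental = $tagger->tag_incremental('dog');
print "$incremental\n";    # <nn>dog</nn>
```

           Tagged from a reset state, "dog" comes out as vb; tagged
           incrementally after "the", it follows det and comes out as nn.
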
       get_words TEXT
           Given a text string, return as many nouns and noun phrases as
           possible, together with their occurrence counts.  Applies add_tags
           and involves three stages:

               * Tag the text
               * Extract all the maximal noun phrases (MNPs)
               * Recursively extract all noun phrases from the MNPs

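           Stages 2 and 3 above can be sketched in core Perl as follows,
           starting from text that is assumed to be tagged already (stage 1).
           The input format (<tag>word</tag> tokens, adjective tags beginning
           with "jj" and noun tags with "nn"), the rule that a noun phrase is
           a run of adjectives and nouns ending in a noun, and the helpers
           parse_tagged, maximal_noun_phrases and count_noun_phrases are
           assumptions for illustration; the module's own regular expressions
           may differ. The recursive step is written here as nested loops
           over sub-spans.

```perl
use strict;
use warnings;

# Stage 1 is assumed done: the input is XML-style tagged text in which
# adjective tags start with "jj" and noun tags start with "nn" (hypothetical).
sub parse_tagged {
    my ($tagged) = @_;
    my @tokens;
    while ($tagged =~ m{<(\w+)>([^<]*)</\1>}g) {
        push @tokens, { word => $2, tag => $1 };
    }
    return @tokens;
}

# Stage 2: maximal runs of adjective/noun tokens, trimmed to end in a noun.
sub maximal_noun_phrases {
    my @tokens = @_;
    my (@mnps, @run);
    for my $tok (@tokens, +{ word => '', tag => 'END' }) {  # sentinel flushes the last run
        if ($tok->{tag} =~ /^(?:jj|nn)/) {
            push @run, $tok;
            next;
        }
        pop @run while @run && $run[-1]{tag} !~ /^nn/;     # drop trailing adjectives
        push @mnps, [@run] if @run;
        @run = ();
    }
    return @mnps;
}

# Stage 3: count every contiguous sub-phrase of each MNP that ends in a noun.
sub count_noun_phrases {
    my @mnps = @_;
    my %count;
    for my $mnp (@mnps) {
        for my $i (0 .. $#$mnp) {
            for my $j ($i .. $#$mnp) {
                next unless $mnp->[$j]{tag} =~ /^nn/;
                my $phrase = join ' ', map { $_->{word} } @{$mnp}[$i .. $j];
                $count{$phrase}++;
            }
        }
    }
    return %count;
}

my $tagged = '<det>The</det> <jj>big</jj> <nn>dog</nn> <vbd>saw</vbd> '
           . '<det>the</det> <jj>big</jj> <nn>dog</nn>';
my %counts = count_noun_phrases(maximal_noun_phrases(parse_tagged($tagged)));
print "$_: $counts{$_}\n" for sort keys %counts;
# big dog: 2
# dog: 2
```
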
       get_readable TEXT
           Return an easy-on-the-eyes tagged version of a text string.
           Applies add_tags and reformats to be easier to read.

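           The reformatting step can be sketched as a single substitution,
           assuming XML-style tokens such as <nn>cat</nn> and a word/TAG
           readable form; both the exact formats and the readable helper are
           assumptions, not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite XML-style tokens such as <nn>cat</nn>
# into the word/TAG form; text outside tags is left untouched.
sub readable {
    my ($tagged) = @_;
    (my $out = $tagged) =~ s{<(\w+)>([^<]*)</\1>}{$2 . '/' . uc($1)}ge;
    return $out;
}

my $line = readable('<det>The</det> <nn>cat</nn> <vbd>sat</vbd>');
print "$line\n";   # The/DET cat/NN sat/VBD
```
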
       get_sentences TEXT
           Returns a reference to an anonymous array of the sentences (without
           POS tags) in a text.

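           A naive version of sentence splitting that, like get_sentences,
           returns an array reference is sketched below. split_sentences is
           hypothetical and far simpler than a real splitter: it breaks after
           a period, exclamation mark or question mark when whitespace and an
           uppercase letter follow, so it would wrongly split "Dr. Smith".

```perl
use strict;
use warnings;

# Hypothetical, naive splitter: break after . ! or ? when whitespace and
# an uppercase letter follow. Returns a reference to an array of sentences.
sub split_sentences {
    my ($text) = @_;
    my @sentences = split /(?<=[.!?])\s+(?=[A-Z])/, $text;
    return \@sentences;
}

my $s = split_sentences('The cat sat. Did it purr? Yes!');
print scalar(@$s), "\n";   # 3
print "$_\n" for @$s;
```
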
       get_proper_nouns TAGGED_TEXT
           Given a POS-tagged text, this method returns a hash of all proper
           nouns and their occurrence frequencies. The method is greedy and
           will return multi-word phrases where possible, so it would find
           "Linguistic Data Consortium" as a single unit rather than as
           three individual proper nouns. This method does not stem the found
           words.

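           The greedy grouping described above can be sketched as follows:
           consecutive proper-noun tokens are joined into one phrase before
           counting. The sketch assumes XML-style tokens with proper-noun tags
           beginning with "nnp"; both that format and the proper_noun_counts
           helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: greedily join runs of consecutive proper-noun tokens
# (assumed tags beginning with "nnp") into one phrase and count each phrase.
sub proper_noun_counts {
    my ($tagged) = @_;
    my (%count, @run);
    while ($tagged =~ m{<(\w+)>([^<]*)</\1>}g) {
        my ($tag, $word) = ($1, $2);
        if ($tag =~ /^nnp/) {
            push @run, $word;     # extend the current multi-word unit
            next;
        }
        $count{ join ' ', @run }++ if @run;
        @run = ();
    }
    $count{ join ' ', @run }++ if @run;   # flush a run at the end of the text
    return %count;
}

my %pn = proper_noun_counts(
    '<det>The</det> <nnp>Linguistic</nnp> <nnp>Data</nnp> <nnp>Consortium</nnp> '
  . '<vbd>met</vbd> <in>in</in> <nnp>Philadelphia</nnp>'
);
print "$_: $pn{$_}\n" for sort keys %pn;
# Linguistic Data Consortium: 1
# Philadelphia: 1
```
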
       get_nouns TAGGED_TEXT
           Given a POS-tagged text, this method returns all nouns and their
           occurrence frequencies.

       get_max_noun_phrases TAGGED_TEXT
           Given a POS-tagged text, this method returns only the maximal noun
           phrases.  May be called directly, but is also used by
           get_noun_phrases.

       get_noun_phrases TAGGED_TEXT
           Similar to get_words, but requires a POS-tagged text as an
           argument.

       install
           Reads some included corpus data and saves it in a stored hash on
           the local file system. This is called automatically if the tagger
           can't find the stored lexicon.

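           The build-once, load-afterwards pattern described for install can
           be sketched with the core Storable module. The choice of Storable,
           the file name, and the helpers build_lexicon and load_lexicon are
           assumptions for illustration; the module's actual lexicon format
           and location may differ.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Storable qw(store retrieve);

# Hypothetical stand-in for reading the bundled corpus data.
sub build_lexicon {
    return { the => { det => 1 }, dog => { nn => 9, vb => 1 } };
}

# Hypothetical loader: build and store the hash if the cache file is
# missing, then load it from disk.
sub load_lexicon {
    my ($path) = @_;
    unless (-e $path) {
        store(build_lexicon(), $path) or die "cannot store $path: $!";
    }
    return retrieve($path);
}

my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'lexicon.store');
my $lex  = load_lexicon($file);   # first call builds and stores the cache
print "loaded ", scalar(keys %$lex), " entries\n";   # loaded 2 entries
```
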

AUTHORS

           Aaron Coburn <acoburn@apache.org>

CONTRIBUTORS

           Maciej Ceglowski <developer@ceglowski.com>
           Eric Nichols, Nara Institute of Science and Technology

COPYRIGHT AND LICENSE

           Copyright 2003-2010 Aaron Coburn <acoburn@apache.org>

           This program is free software; you can redistribute it and/or modify
           it under the terms of version 3 of the GNU General Public License as
           published by the Free Software Foundation.

perl v5.38.0                      2023-07-20                         Tagger(3)