Tagger(3) User Contributed Perl Documentation Tagger(3)

NAME
Lingua::EN::Tagger - Part-of-speech tagger for English natural language
processing.

SYNOPSIS
use Lingua::EN::Tagger;

# Create a tagger object
my $p = Lingua::EN::Tagger->new;

# Add part-of-speech tags to a text
my $tagged_text = $p->add_tags( $text );

...

# Get a list of all nouns and noun phrases with occurrence counts
my %word_list = $p->get_words( $text );

...

# Get a readable version of the tagged text
my $readable_text = $p->get_readable( $text );

DESCRIPTION
The module is a probability-based, corpus-trained tagger that assigns
POS tags to English text based on a lookup dictionary and a set of
probability values. Tags are assigned by conditional probability: the
tagger examines the preceding tag to determine the most likely tag for
the current word. Unknown words are classified according to their
morphology, or the tagger can be configured to treat them as nouns or
as another part of speech.
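
The "look at the preceding tag" step can be illustrated with a toy, self-contained sketch. It is not the module's implementation: the probability tables, the lower-case tag names, the assumed sentence-start tag pp and the helper best_tag are all hypothetical, and this sketch decides greedily one word at a time from the preceding tag only.

```perl
use strict;
use warnings;

# Toy tables (hypothetical values, not the module's corpus data):
# %trans holds P(tag | previous tag), %emit holds P(word | tag).
my %trans = (
    pp  => { det => 0.6, nnp => 0.3, vb => 0.1 },
    det => { nn  => 0.7, jj  => 0.3 },
    jj  => { nn  => 0.9, jj  => 0.1 },
);
my %emit = (
    det => { the => 0.7, a     => 0.3 },
    jj  => { old => 0.5, quick => 0.5 },
    nn  => { cat => 0.4, old   => 0.1 },
);

# Greedy choice: the tag maximising P(tag | prev) * P(word | tag).
# Returns $unknown when no candidate tag gives a non-zero score.
sub best_tag {
    my ( $prev, $word, $unknown ) = @_;
    my ( $best, $best_p ) = ( $unknown, 0 );
    my $row = $trans{$prev} || {};
    for my $tag ( sort keys %$row ) {
        my $e = $emit{$tag} ? ( $emit{$tag}{ lc $word } // 0 ) : 0;
        my $p = $row->{$tag} * $e;
        ( $best, $best_p ) = ( $tag, $p ) if $p > $best_p;
    }
    return $best;
}

my $prev = 'pp';    # assumed "start of sentence" tag
my @tags;
for my $word (qw(The old cat)) {
    $prev = best_tag( $prev, $word, 'nn' );
    push @tags, $prev;
}
print "@tags\n";    # det jj nn
```

The ambiguous word "old" can be a noun or an adjective in these tables; because it follows a determiner, the transition probabilities favour jj. The third argument plays the role of a fallback tag for unknown words, loosely mirroring the unknown_word_tag option.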

The tagger also extracts as many nouns and noun phrases as it can,
using a set of regular expressions.
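
A single regular expression is enough to show the idea of noun-phrase extraction over tagged text. This is a hypothetical sketch: it assumes XML-style markup of the form <tag>word</tag> with lower-case Penn-style tag names, and uses one illustrative pattern (adjectives followed by nouns) rather than the module's own set of expressions.

```perl
use strict;
use warnings;

# Hypothetical XML-style tagged input; lower-case tag names are assumed.
my $tagged = '<det>The</det> <jj>quick</jj> <jj>brown</jj> <nn>fox</nn> '
           . '<vbd>saw</vbd> <det>a</det> <jj>lazy</jj> <nn>dog</nn> <pp>.</pp>';

# One tagged token, with any trailing whitespace.
my $adj  = qr{<jj>[^<]+</jj>\s*};
my $noun = qr{<(?:nnps?|nns?)>[^<]+</(?:nnps?|nns?)>\s*};

# Illustrative pattern: zero or more adjectives, then one or more nouns.
# Quantifiers are greedy, so each match is as long as possible.
my @phrases;
while ( $tagged =~ /((?:$adj)*(?:$noun)+)/g ) {
    my $match = $1;
    my @words = $match =~ m{<[a-z]+>([^<]+)</[a-z]+>}g;    # drop the tags
    push @phrases, join ' ', @words;
}
print join( ' | ', @phrases ), "\n";    # quick brown fox | lazy dog
```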

METHODS
new %PARAMS
Class constructor. Takes a hash with the following parameters
(shown with their default values):

unknown_word_tag => ''
Tag to assign to unknown words.

stem => 0
Stem single words using Lingua::Stem::EN.

weight_noun_phrases => 0
When returning occurrence counts for a noun phrase, multiply
the value by the number of words in the NP.

longest_noun_phrase => 5
Noun phrases longer than this threshold are ignored. This
affects only the get_words() and get_nouns() methods.

relax => 0
Relax the Hidden Markov Model: this may improve accuracy for
uncommon words, particularly words used polysemously.
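
The effect of weight_noun_phrases and longest_noun_phrase on a table of phrase counts can be sketched as below, as if the tagger had been built with something like Lingua::EN::Tagger->new( weight_noun_phrases => 1, longest_noun_phrase => 3 ). This is not the module's code: apply_options is a hypothetical helper, and measuring the threshold in words is an assumption.

```perl
use strict;
use warnings;

# Hypothetical illustration of two constructor options applied to a
# phrase => count hash. Assumes the length threshold counts words.
sub apply_options {
    my ( $counts, %opt ) = @_;
    my $longest = $opt{longest_noun_phrase} // 5;
    my $weight  = $opt{weight_noun_phrases} // 0;
    my %out;
    for my $phrase ( keys %$counts ) {
        my @words = split ' ', $phrase;
        next if @words > $longest;           # longest_noun_phrase cut-off
        $out{$phrase} = $weight
            ? $counts->{$phrase} * @words    # weight by number of words
            : $counts->{$phrase};
    }
    return %out;
}

my %raw = ( 'fox' => 3, 'brown fox' => 2, 'big old brown fox' => 1 );
my %w = apply_options( \%raw, weight_noun_phrases => 1, longest_noun_phrase => 3 );
printf "%s => %d\n", $_, $w{$_} for sort keys %w;
```

With these settings, "brown fox" is weighted from 2 to 4, "fox" stays at 3, and the four-word phrase is dropped.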

add_tags TEXT
Examine the string provided and return it fully tagged (XML style).

get_words TEXT
Given a text string, return as many nouns and noun phrases as
possible, with their occurrence counts. Applies add_tags and
involves three stages:

* Tag the text
* Extract all the maximal noun phrases (MNPs)
* Recursively extract all noun phrases from the MNPs
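
The third stage can be pictured with a small recursive sketch. It is one plausible reading, not the module's algorithm: the hypothetical helper sub_phrases counts a maximal noun phrase and then recurses on the phrase with its leading word removed, so every counted phrase shares the same head noun.

```perl
use strict;
use warnings;

# Count a maximal noun phrase, then recurse on the phrase minus its
# first word, until no words remain (hypothetical illustration).
sub sub_phrases {
    my ( $counts, @words ) = @_;
    return unless @words;
    $counts->{ join ' ', @words }++;
    sub_phrases( $counts, @words[ 1 .. $#words ] );
}

my %counts;
sub_phrases( \%counts, split ' ', $_ ) for ( 'quick brown fox', 'brown fox' );
print "$_: $counts{$_}\n" for sort keys %counts;
```

Because "brown fox" also occurs inside the longer phrase, it and "fox" each end up with a count of 2.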
73
74 get_readable TEXT
75 Return an easy-on-the-eyes tagged version of a text string.
76 Applies add_tags and reformats to be easier to read.
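
The difference between XML-style tags and a readable form can be shown with one substitution. Both formats here are assumptions for illustration (input tokens like <nn>fox</nn>, output tokens like fox/NN), and readable is a hypothetical helper, not the module's get_readable.

```perl
use strict;
use warnings;

# Rewrite each "<tag>word</tag>" token as "word/TAG" (both formats assumed).
sub readable {
    my ($tagged) = @_;
    ( my $out = $tagged ) =~ s{<([a-z]+)>([^<]*)</\1>}{$2/\U$1}g;
    return $out;
}

print readable('<det>The</det> <jj>brown</jj> <nn>fox</nn> <pp>.</pp>'), "\n";
# The/DET brown/JJ fox/NN ./PP
```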
77
78 get_sentences TEXT
79 Returns an anonymous array of sentences (without POS tags) from a
80 text.
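
Returning sentences as an array reference can be sketched with a deliberately naive splitter. The splitting rule below is an assumption for illustration only; naive_sentences is hypothetical, and the module's own rules for abbreviations and other edge cases are not described here.

```perl
use strict;
use warnings;

# Naive splitter: break after ., ! or ? when followed by whitespace and
# an upper-case letter. Abbreviations such as "Dr. Smith" are split
# incorrectly; a real splitter needs an abbreviation list.
sub naive_sentences {
    my ($text) = @_;
    my @sentences = split /(?<=[.!?])\s+(?=[A-Z])/, $text;
    return \@sentences;    # an array reference, as get_sentences returns
}

my $s = naive_sentences('It rained. Did it stop? Yes!');
print scalar(@$s), " sentences\n";
print "$_\n" for @$s;
```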
81
82 get_proper_nouns TAGGED_TEXT
83 Given a POS-tagged text, this method returns a hash of all proper
84 nouns and their occurrence frequencies. The method is greedy and
85 will return multi-word phrases, if possible, so it would find
86 ``Linguistic Data Consortium'' as a single unit, rather than as
87 three individual proper nouns. This method does not stem the found
88 words.
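
Greedy grouping of proper nouns amounts to treating each maximal run of proper-noun tokens as one unit. A minimal sketch, assuming XML-style input with lower-case nnp/nnps tags; proper_noun_counts is a hypothetical helper, not the module's method.

```perl
use strict;
use warnings;

# Each maximal run of consecutive <nnp>/<nnps> tokens counts as one unit.
sub proper_noun_counts {
    my ($tagged) = @_;
    my %count;
    while ( $tagged =~ m{((?:<nnps?>[^<]+</nnps?>\s*)+)}g ) {
        my $run   = $1;                                   # copy before re-matching
        my @words = $run =~ m{<nnps?>([^<]+)</nnps?>}g;   # words in the run
        $count{ join ' ', @words }++;
    }
    return %count;
}

my %pn = proper_noun_counts(
    '<nnp>Linguistic</nnp> <nnp>Data</nnp> <nnp>Consortium</nnp> '
  . '<cc>and</cc> <nnp>Aaron</nnp> <nnp>Coburn</nnp>'
);
print "$_ => $pn{$_}\n" for sort keys %pn;
```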

get_nouns TAGGED_TEXT
Given a POS-tagged text, this method returns all nouns and their
occurrence frequencies.
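
Counting nouns in tagged text reduces to counting tokens whose tag marks a noun. This sketch assumes XML-style input where every noun tag starts with nn (nn, nns, nnp, nnps); noun_counts is hypothetical and does no stemming.

```perl
use strict;
use warnings;

# Count every token whose (assumed lower-case) tag begins with "nn".
sub noun_counts {
    my ($tagged) = @_;
    my %count;
    $count{$1}++ while $tagged =~ m{<nn[a-z]*>([^<]+)</nn[a-z]*>}g;
    return %count;
}

my %n = noun_counts( '<det>The</det> <nn>dog</nn> <vbd>chased</vbd> '
                   . '<det>the</det> <nn>dog</nn> <nns>cats</nns>' );
print "$_ => $n{$_}\n" for sort keys %n;
```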

get_max_noun_phrases TAGGED_TEXT
Given a POS-tagged text, this method returns only the maximal noun
phrases. It may be called directly, but is also used by
get_noun_phrases.

get_noun_phrases TAGGED_TEXT
Similar to get_words, but requires a POS-tagged text as an
argument.

install
Reads the corpus data included with the module and saves it as a
stored hash on the local file system. This method is called
automatically if the tagger cannot find the stored lexicon.
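
The "build once, then reuse the stored copy" pattern behind install can be sketched with the core Storable module. The module's actual on-disk format and file location are not documented here, so Storable, the lexicon layout and the helper load_lexicon are all assumptions for illustration.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical lexicon layout: word => { tag => count }.
my %lexicon = ( fox => { nn => 12 }, run => { vb => 7, nn => 2 } );

my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'lexicon.store' );

# Return the stored hash if it exists; otherwise build and store it.
sub load_lexicon {
    my ( $path, $build ) = @_;
    return retrieve($path) if -e $path;    # cached copy found
    my $data = $build->();
    nstore( $data, $path );                # portable (network byte order)
    return $data;
}

my $lex = load_lexicon( $file, sub { \%lexicon } );
print "run as noun: $lex->{run}{nn}\n";
```

A second call with the same path reads the stored file instead of invoking the builder.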

AUTHOR
Aaron Coburn <aaron@coburncuadrado.com>

CONTRIBUTORS
Maciej Ceglowski <developer@ceglowski.com>
Eric Nichols, Nara Institute of Science and Technology

COPYRIGHT
Copyright 2003-2010 Aaron Coburn <aaron@coburncuadrado.com>

This program is free software; you can redistribute it and/or modify
it under the terms of version 3 of the GNU General Public License as
published by the Free Software Foundation.

perl v5.12.2 2010-05-11 Tagger(3)