AI::Categorizer::Collection(3)  User Contributed Perl Documentation  AI::Categorizer::Collection(3)

NAME

       AI::Categorizer::Collection - Access stored documents

SYNOPSIS

         use AI::Categorizer::Collection::Files;

         my $c = AI::Categorizer::Collection::Files->new
           (path => '/tmp/docs/training',
            category_file => '/tmp/docs/cats.txt');
         print "Total number of docs: ", $c->count_documents, "\n";
         while (my $document = $c->next) {
           ...
         }
         $c->rewind; # For further operations

DESCRIPTION

       This abstract class implements an iterator for accessing documents in
       their natively stored format.  You cannot directly create an instance
       of the Collection class because it is abstract; see the documentation
       for the "Files", "SingleFile", or "InMemory" subclasses for a concrete
       interface.

METHODS

       new()
           Creates a new Collection object and returns it.  Accepts the
           following parameters:

           category_hash
               A reference to a hash that maps document names to category
               names.  The keys of the hash are the document names; each
               value should be a reference to an array containing the names
               of the categories to which that document belongs.

           category_file
               A file that is read in order to create the "category_hash".
               Each line of the file should list a document's name, followed
               by a list of category names, all separated by whitespace.

           stopword_file
               A file containing a list of "stopwords", which are words that
               should automatically be disregarded when scanning/reading
               documents.  The file should contain one word per line.  The
               file will be parsed and then fed as the "stopwords" parameter
               to the Document "new()" method.

           verbose
               If true, some status/debugging information will be printed to
               "STDOUT" during operation.

           document_class
               The class indicating what type of Document object should be
               created.  This generally specifies the format that the
               documents are stored in.  The default is
               "AI::Categorizer::Document::Text".

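
           As an illustration of how the "category_file" layout relates to
           the "category_hash" structure, here is a minimal sketch in plain
           Perl.  The helper "parse_category_lines" and the sample data are
           hypothetical and are not part of AI::Categorizer; the module's own
           parser may differ in its details.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AI::Categorizer): turn lines in the
# category_file layout into a hash shaped like category_hash.
sub parse_category_lines {
    my (@lines) = @_;
    my %category_hash;
    for my $line (@lines) {
        # split ' ' skips leading whitespace and splits on whitespace runs
        my ($doc_name, @categories) = split ' ', $line;
        next unless defined $doc_name;    # skip blank lines
        $category_hash{$doc_name} = [@categories];
    }
    return \%category_hash;
}

# Made-up sample data: the document name comes first, then its categories.
my @sample = (
    "doc1.txt sports politics\n",
    "doc2.txt finance\n",
    "\n",
);

my $hash = parse_category_lines(@sample);
for my $doc (sort keys %$hash) {
    print "$doc: @{ $hash->{$doc} }\n";
}
```

           A hash built this way has document names as keys and array
           references of category names as values, which is the shape
           described for "category_hash" above.
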
       next()
           Returns the next Document object in the Collection.

       rewind()
           Resets the iterator for further calls to "next()".

       count_documents()
           Returns the total number of documents in the Collection.  Note that
           calling this method usually resets the iterator, because it may not
           be possible to resume iterating from the previous position.
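
       To make the interaction between "next()", "rewind()", and
       "count_documents()" concrete, the following sketch uses a toy in-memory
       class.  "ToyCollection" is a hypothetical stand-in written for this
       example, not an AI::Categorizer class, and counting by walking the
       whole collection is only one plausible reason the iterator ends up
       reset.

```perl
use strict;
use warnings;

# ToyCollection is a hypothetical stand-in, not an AI::Categorizer class.
package ToyCollection;

sub new {
    my ($class, @docs) = @_;
    return bless { docs => [@docs], pos => 0 }, $class;
}

# Return the next item, or undef once the collection is exhausted.
sub next {
    my ($self) = @_;
    return undef if $self->{pos} >= @{ $self->{docs} };
    return $self->{docs}[ $self->{pos}++ ];
}

# Move the iterator back to the first item.
sub rewind {
    my ($self) = @_;
    $self->{pos} = 0;
    return;
}

# Count by walking the whole collection, so the previous position is lost.
sub count_documents {
    my ($self) = @_;
    $self->rewind;
    my $count = 0;
    $count++ while defined $self->next;
    $self->rewind;
    return $count;
}

package main;

my $c     = ToyCollection->new(qw(doc1 doc2 doc3));
my $first = $c->next;               # iterator now points at doc2
my $total = $c->count_documents;    # walks everything, then rewinds
my $after = $c->next;               # back at doc1, not doc2
print "first=$first total=$total after=$after\n";
```

       In this toy class, a loop that calls "count_documents()" partway
       through would start over from the first document, so it is safer to
       take the count before or after iterating.
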

AUTHOR

       Ken Williams, ken@mathforum.org

COPYRIGHT

       Copyright 2002-2003 Ken Williams.  All rights reserved.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

SEE ALSO

       AI::Categorizer(3), Storable(3)

perl v5.30.0                      2019-07-26    AI::Categorizer::Collection(3)