AI::Categorizer::Collection(3)  User Contributed Perl Documentation  AI::Categorizer::Collection(3)

NAME
       AI::Categorizer::Collection - Access stored documents

SYNOPSIS
         use AI::Categorizer::Collection::Files;

         # Build a concrete collection from a directory of documents
         my $c = AI::Categorizer::Collection::Files->new(
             path          => '/tmp/docs/training',
             category_file => '/tmp/docs/cats.txt',
         );
         print "Total number of docs: ", $c->count_documents, "\n";
         while (my $document = $c->next) {
             ...
         }
         $c->rewind; # For further operations

DESCRIPTION
       This abstract class implements an iterator for accessing documents in
       their natively stored format. Because the Collection class is
       abstract, you cannot create an instance of it directly; see the
       documentation for the "Files", "SingleFile", or "InMemory" subclasses
       for a concrete interface.

METHODS
       new()
           Creates a new Collection object and returns it. Accepts the
           following parameters:

           category_hash
               A reference to a hash that maps document names to category
               names. The keys of the hash are the document names, and
               each value should be a reference to an array containing the
               names of the categories to which that document belongs.

           category_file
               A file that is read in order to create the "category_hash".
               Each line of the file should list a document's name,
               followed by a list of category names, all separated by
               whitespace.

           stopword_file
               A file containing a list of "stopwords", which are words
               that should automatically be disregarded when
               scanning/reading documents. The file should contain one
               word per line. The file is parsed and its contents are
               passed as the "stopwords" parameter to the Document "new()"
               method.

           verbose
               If true, some status/debugging information will be printed
               to "STDOUT" during operation.

           document_class
               The class of Document object to create. This generally
               determines the format in which the documents are stored.
               The default is "AI::Categorizer::Document::Text".
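
The relationship between the "category_file" format and the "category_hash" structure can be illustrated with a short sketch. This is not the module's own parser: the helper name parse_category_lines and the sample document and category names are hypothetical, chosen only to show how whitespace-separated lines could map onto a hash of array references.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AI::Categorizer): turn lines in the
# category_file format into a category_hash-style structure.
sub parse_category_lines {
    my @lines = @_;
    my %category_hash;
    for my $line (@lines) {
        # split ' ' splits on runs of whitespace and ignores leading whitespace.
        my ($doc, @cats) = split ' ', $line;
        next unless defined $doc;    # skip blank lines
        $category_hash{$doc} = [@cats];
    }
    return \%category_hash;
}

# Made-up sample data: a document name followed by its category names.
my $category_hash = parse_category_lines(
    "doc1.txt sports",
    "doc2.txt politics economics",
    "",
);

for my $doc (sort keys %$category_hash) {
    print "$doc: @{ $category_hash->{$doc} }\n";
}
```

A structure built this way has the shape that the "category_hash" parameter describes: document names as keys, array references of category names as values.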

       next()
           Returns the next Document object in the Collection. When no
           documents remain it returns a false value, which is what ends a
           "while" loop over the Collection.

       rewind()
           Resets the iterator so that the following call to "next()"
           starts again from the first document.

       count_documents()
           Returns the total number of documents in the Collection. Note
           that this usually resets the iterator, because after counting
           it may not be possible to resume iterating where you left off.
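
The interaction between "next()", "rewind()", and "count_documents()" can be sketched with a toy class. The package My::ToyCollection below is hypothetical and is not one of the real subclasses; it assumes that counting is done by a full pass over the documents, which is one plausible reason the iterator ends up reset.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a concrete Collection subclass; it only mimics
# the next()/rewind()/count_documents() behaviour described above.
package My::ToyCollection;

sub new {
    my ($class, @docs) = @_;
    return bless { docs => [@docs], pos => 0 }, $class;
}

# Return the next document, or undef when the collection is exhausted.
sub next {
    my $self = shift;
    return if $self->{pos} >= @{ $self->{docs} };
    return $self->{docs}[ $self->{pos}++ ];
}

# Move the iterator back to the first document.
sub rewind { $_[0]{pos} = 0 }

# Count by making a full pass; the iterator ends up back at the start.
sub count_documents {
    my $self = shift;
    $self->rewind;
    my $n = 0;
    $n++ while defined $self->next;
    $self->rewind;
    return $n;
}

package main;

my $c     = My::ToyCollection->new(qw(a b c));
my $first = $c->next;               # consume one document
my $count = $c->count_documents;    # counting resets the iterator
my $again = $c->next;               # same document as $first
print "count=$count first=$first again=$again\n";
```

In this sketch a caller that counts in the middle of a loop sees the first document again, so it is safer to call "count_documents()" before iterating, as the synopsis does.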

AUTHOR
       Ken Williams, ken@mathforum.org

COPYRIGHT
       Copyright 2002-2003 Ken Williams. All rights reserved.

       This library is free software; you can redistribute it and/or modify
       it under the same terms as Perl itself.

SEE ALSO
       AI::Categorizer(3), Storable(3)

perl v5.30.0                      2019-07-26      AI::Categorizer::Collection(3)