Mail::SpamAssassin::ArchiveIterator(3)  User Contributed Perl Documentation  Mail::SpamAssassin::ArchiveIterator(3)

NAME
    Mail::SpamAssassin::ArchiveIterator - find and process messages one at
    a time

SYNOPSIS
    my $iter = new Mail::SpamAssassin::ArchiveIterator(
      {
        'opt_all'   => 1,
        'opt_cache' => 1,
      }
    );

    $iter->set_functions( \&wanted, sub { } );

    eval { $iter->run(@ARGV); };

    sub wanted {
      my($class, $filename, $recv_date, $msg_array) = @_;

      ...
    }

DESCRIPTION
    The Mail::SpamAssassin::ArchiveIterator module goes through a set of
    mbox files, mbx files, and directories (with a single message per file)
    and generates a list of messages. It then calls the "wanted_sub" and
    "result_sub" functions as appropriate for each message.

METHODS
    $item = new Mail::SpamAssassin::ArchiveIterator( [ { opt => val, ... } ] )
        Constructs a new "Mail::SpamAssassin::ArchiveIterator" object. You
        may pass the following attribute-value pairs to the constructor.
        The pairs are optional unless otherwise noted.

        opt_all
            Typically messages over 250k are skipped by ArchiveIterator.
            Use this option to keep from skipping messages based on size.

        opt_scanprob
            Randomly select messages to scan, with a probability of N,
            where N ranges from 0.0 (no messages scanned) to 1.0 (all
            messages scanned). Default is 1.0.

            This setting can be specified separately for each target.

        opt_before
            Only use messages which were received before the given time_t
            value. Negative values are an offset from the current time,
            e.g. -86400 means 24 hours ago; the value may also be given
            in a form parsed by Time::ParseDate (e.g. '-6 months').

            This setting can be specified separately for each target.

        opt_after
            Same as opt_before, except the messages are only used if
            received after the given time_t value.

            This setting can be specified separately for each target.

        opt_want_date
            Set to 1 (default) if you want the received date to be filled
            in for the "wanted_sub" callback below. Set this to 0 if you
            can, as filling in the date imposes a performance hit.

        opt_skip_empty_messages
            Set to 1 if you want to skip corrupt, 0-byte messages. The
            default is 0.

        opt_cache
            Set to 0 (default) if you don't want to use cached information
            to help speed up ArchiveIterator. Set to 1 to enable. This
            setting requires "opt_cachedir" also be set.

        opt_cachedir
            Set to the path of a directory where you wish to store cached
            information for "opt_cache", if you don't want to mix them with
            the input files (as is the default). The directory must be
            both readable and writable.

        wanted_sub
            Reference to a subroutine which will process message data.
            Usually set via set_functions(). The routine will be passed 5
            values: class (scalar), filename (scalar), received date
            (scalar), message content (array reference, one message line
            per element), and the message format key ('f' for file, 'm' for
            mbox, 'b' for mbx).

            Note that if "opt_want_date" is set to 0, the received date
            scalar will be undefined.

        result_sub
            Reference to a subroutine which will process the results of the
            wanted_sub for each message processed. Usually set via
            set_functions(). The routine will be passed 3 values: class
            (scalar), result (scalar, returned from wanted_sub), and
            received date (scalar).

            Note that if "opt_want_date" is set to 0, the received date
            scalar will be undefined.

        scan_progress_sub
            Reference to a subroutine which will be called intermittently
            during the 'scan' phase of the mass-check. No guarantees are
            made as to how frequently this happens.

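As a rough sketch of how the constructor options above might be combined, the following builds an options hash and passes it to the constructor. The specific values and the $scan_ticks counter are illustrative assumptions, not recommendations from the module. The module is loaded inside an eval so that the sketch still runs on a system where SpamAssassin is not installed.

```perl
use strict;
use warnings;

my $scan_ticks = 0;    # illustrative counter, not part of the module

my %opts = (
    opt_all                 => 1,       # do not skip messages over 250k
    opt_scanprob            => 0.25,    # scan about a quarter of the messages
    opt_want_date           => 0,       # skip the received-date lookup
    opt_skip_empty_messages => 1,       # ignore corrupt, 0-byte messages
    scan_progress_sub       => sub { $scan_ticks++ },
);

# Load the module lazily so the sketch degrades gracefully without it.
my $iter;
if ( eval { require Mail::SpamAssassin::ArchiveIterator; 1 } ) {
    $iter = eval { Mail::SpamAssassin::ArchiveIterator->new( \%opts ) };
}
print defined $iter ? "iterator constructed\n" : "module not available\n";
```

With opt_want_date set to 0 as here, the received date passed to the callbacks will be undefined.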
    set_functions( \&wanted_sub, \&result_sub )
        Sets the subroutines used for message processing (wanted_sub) and
        result reporting (result_sub). For more information, see new()
        above.

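A minimal sketch of the two callbacks, following the argument lists described under wanted_sub and result_sub. The %count tally, the return string, and the sample message are made up for illustration; the callbacks are called by hand here so the example runs without the module.

```perl
use strict;
use warnings;

my %count = ( h => 0, s => 0 );    # illustrative tally, keyed by class

# wanted_sub: receives class, filename, received date, an array
# reference of message lines, and the format key ('f', 'm' or 'b').
sub wanted {
    my ( $class, $filename, $recv_date, $msg_array, $format ) = @_;
    my $lines = scalar @{$msg_array};
    return "$filename: $lines lines";    # handed on to result_sub
}

# result_sub: receives class, the value returned by wanted_sub, and the
# received date (undefined when opt_want_date is 0).
sub result {
    my ( $class, $result, $recv_date ) = @_;
    $count{$class}++;
    return;
}

# Exercise the callbacks by hand with a made-up two-line message.
my $res = wanted( 'h', 'example.eml', undef, [ "Subject: hi\n", "\n" ], 'f' );
result( 'h', $res, undef );
print "$res\n";    # prints "example.eml: 2 lines"
```

With a real iterator object, these would be registered with $iter->set_functions( \&wanted, \&result ) before calling run().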
    run ( @target_paths )
        Generates the list of messages to process, then runs each message
        through the configured wanted subroutine. Files which have a name
        ending in ".gz" or ".bz2" will be uncompressed via a call to
        "gzip -dc" or "bzip2 -dc" respectively.

        The target_paths array is expected to contain one element per
        path, each either a string in the format
        "class:format:raw_location" or a hash reference containing
        key-value option pairs and a 'target' key with a value in that
        format.

        The key-value option pairs that can be used are: opt_scanprob,
        opt_after, opt_before. See the constructor method's documentation
        for more information on their effects.

        run() returns 0 if there was an error (can't open a file, etc.) and
        1 if there were no errors.

        class
            Either 'h' for ham or 's' for spam. If the class is longer
            than 1 character, it will be truncated. If blank, 'h' is
            default.

        format
            Specifies the format of the raw_location. "dir" is a directory
            whose files are individual messages, "file" a file with a
            single message, "mbox" an mbox formatted file, or "mbx" for an
            mbx formatted directory.

            "detect" can also be used. This assumes "mbox" for any file
            whose path contains the pattern "/\.mbox/i", "file" for
            anything else that is not a directory, or "dir" otherwise.

        raw_location
            Path to file or directory. File globbing is allowed using the
            standard csh-style globbing (see "perldoc -f glob"). "~" at
            the front of the value will be replaced by the "HOME"
            environment variable. Escaped whitespace is protected as well.

            NOTE: "~user" is not allowed.

            NOTE 2: "-" is not allowed as a raw location. To have
            ArchiveIterator deal with STDIN, generate a temp file.

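
One possible way to follow NOTE 2 is to copy the input stream into a temporary file and pass that file as the raw location. The sketch below uses the core File::Temp module; the helper name spool_to_tempfile and the file name template are hypothetical, and an in-memory filehandle stands in for STDIN so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Copy everything from a filehandle into a temp file and return its path.
# The helper name is illustrative, not part of ArchiveIterator.
sub spool_to_tempfile {
    my ($in) = @_;
    my ( $out, $path ) =
      tempfile( 'ai-stdin-XXXXXX', TMPDIR => 1, UNLINK => 1 );
    print {$out} $_ while <$in>;
    close $out or die "cannot close $path: $!";
    return $path;
}

# An in-memory filehandle stands in for STDIN in this demonstration.
my $message = "From: a\@example.com\nSubject: test\n\nbody\n";
open my $fake_stdin, '<', \$message
  or die "cannot open in-memory handle: $!";

my $path   = spool_to_tempfile($fake_stdin);
my $target = "h:file:$path";    # a single message, classed as ham
print "$target\n";
```

In real use, \*STDIN would be passed to the helper and $target handed to run(). UNLINK => 1 asks File::Temp to remove the file when the program exits.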
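The two target forms accepted by run() can be sketched as follows. The helper target_spec and all paths are hypothetical; only the "class:format:raw_location" layout, the 'target' key, and the per-target option names come from the text above.

```perl
use strict;
use warnings;

# Build a "class:format:raw_location" string.
# The helper name is illustrative, not part of ArchiveIterator.
sub target_spec {
    my ( $class, $format, $location ) = @_;
    return join ':', $class, $format, $location;
}

my @targets = (
    # Plain string form.
    target_spec( 'h', 'dir',  '~/mail/ham' ),
    target_spec( 's', 'mbox', '~/mail/spam.mbox' ),

    # Hash reference form, carrying per-target options.
    {
        target       => target_spec( 's', 'detect', '~/mail/junk/*' ),
        opt_scanprob => 0.1,       # scan about 10% of this target
        opt_after    => -86400,    # negative value: offset from now
    },
);

# Show the target string of each entry, whichever form it takes.
print ref($_) ? $_->{target} : $_, "\n" for @targets;
```

A caller would then pass the list as $iter->run(@targets) and check the return value, since run() returns 0 on error and 1 otherwise.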
SEE ALSO
    "Mail::SpamAssassin" "spamassassin" "mass-check"

perl v5.10.1                      2010-03-16  Mail::SpamAssassin::ArchiveIterator(3)