Mail::SpamAssassin::ArchiveIterator(3)  User Contributed Perl Documentation  Mail::SpamAssassin::ArchiveIterator(3)

NAME

       Mail::SpamAssassin::ArchiveIterator - find and process messages one at
       a time

SYNOPSIS

         use Mail::SpamAssassin::ArchiveIterator;

         my $iter = Mail::SpamAssassin::ArchiveIterator->new(
           {
             'opt_all'   => 1,
             'opt_cache' => 1,
           }
         );

         # wanted_sub processes each message; result_sub here is a no-op
         $iter->set_functions( \&wanted, sub { } );

         eval { $iter->run(@ARGV); };

         sub wanted {
           my($class, $filename, $recv_date, $msg_array, $format) = @_;

           ...
         }

DESCRIPTION

       The Mail::SpamAssassin::ArchiveIterator module will go through a set of
       mbox files, mbx files, and directories (with a single message per file)
       and generate a list of messages.  It then calls the "wanted_sub" and
       "result_sub" functions for each message.

METHODS

       $item = new Mail::SpamAssassin::ArchiveIterator( [ { opt => val, ... }
       ] )
           Constructs a new "Mail::SpamAssassin::ArchiveIterator" object.  You
           may pass the following attribute-value pairs to the constructor.
           The pairs are optional unless otherwise noted.

           opt_all
               Typically messages over 250k are skipped by ArchiveIterator.
               Use this option to keep from skipping messages based on size.

           opt_scanprob
               Randomly select messages to scan, with a probability of N,
               where N ranges from 0.0 (no messages scanned) to 1.0 (all
               messages scanned).  Default is 1.0.

               This setting can be specified separately for each target.

           opt_before
               Only use messages which are received before the given time_t
               value.  Negative values are an offset from the current time,
               e.g. -86400 = 24 hours ago; or as parsed by Time::ParseDate
               (e.g. '-6 months').

               This setting can be specified separately for each target.

           opt_after
               Same as opt_before, except the messages are only used if
               received after the given time_t value.

               This setting can be specified separately for each target.

           opt_want_date
               Set to 1 (default) if you want the received date to be filled
               in in the "wanted_sub" callback below.  Set this to 0 to avoid
               this; it's a good idea to set this to 0 if you can, as it
               imposes a performance hit.

           opt_skip_empty_messages
               Set to 1 if you want to skip corrupt, 0-byte messages.  The
               default is 0.

           opt_cache
               Set to 0 (default) if you don't want to use cached information
               to help speed up ArchiveIterator.  Set to 1 to enable.  This
               setting requires "opt_cachedir" also be set.

           opt_cachedir
               Set to the path of a directory where you wish to store cached
               information for "opt_cache", if you don't want to mix it with
               the input files (as is the default).  The directory must be
               both readable and writable.

           wanted_sub
               Reference to a subroutine which will process message data.
               Usually set via set_functions().  The routine will be passed 5
               values: class (scalar), filename (scalar), received date
               (scalar), message content (array reference, one message line
               per element), and the message format key ('f' for file, 'm' for
               mbox, 'b' for mbx).

               Note that if "opt_want_date" is set to 0, the received date
               scalar will be undefined.

           result_sub
               Reference to a subroutine which will process the results of the
               wanted_sub for each message processed.  Usually set via
               set_functions().  The routine will be passed 3 values: class
               (scalar), result (scalar, returned from wanted_sub), and
               received date (scalar).

               Note that if "opt_want_date" is set to 0, the received date
               scalar will be undefined.

           scan_progress_sub
               Reference to a subroutine which will be called intermittently
               during the 'scan' phase of the mass-check.  No guarantees are
               made as to how frequently this may happen, mind you.

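           The callback argument lists described above can be sketched as
           follows.  This is only an illustration: the direct calls at the
           end stand in for what the iterator would do, and the sample
           message, the file name "/tmp/msg1" and the %seen counter are
           made up for the example.

```perl
use strict;
use warnings;

my %seen;   # per-class message counts, filled in by result()

# wanted_sub: receives class, filename, received date, message lines
# (array reference) and the format key ('f', 'm' or 'b').
sub wanted {
    my ($class, $filename, $recv_date, $msg_array, $format) = @_;
    # $recv_date is undefined when opt_want_date is 0.
    my $subject = '';
    for my $line (@$msg_array) {
        last if $line =~ /^\r?$/;                       # blank line ends the header
        if ($line =~ /^Subject:\s*(.*?)\r?$/i) {
            $subject = $1;
            last;
        }
    }
    return $subject;    # handed to result_sub as its second argument
}

# result_sub: receives class, the value returned by wanted_sub, and the
# received date.
sub result {
    my ($class, $result, $recv_date) = @_;
    $seen{$class}++;
    print "[$class] $result\n";
}

# Stand-in for the iterator: call both callbacks by hand on one
# made-up message (hypothetical data, not produced by the module).
my @msg = ("From: a\@example.com\n", "Subject: hello\n", "\n", "body\n");
my $now = time();
result('h', wanted('h', '/tmp/msg1', $now, \@msg, 'f'), $now);
```

           In real use the two subroutines would be registered with
           set_functions( \&wanted, \&result ) rather than called directly.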
       set_functions( \&wanted_sub, \&result_sub )
           Sets the subroutines used for message processing (wanted_sub) and
           result reporting (result_sub).  For more information, see new()
           above.

       run ( @target_paths )
           Generates the list of messages to process, then runs each message
           through the configured wanted subroutine.  Files which have a name
           ending in ".gz" or ".bz2" will be properly uncompressed via calls
           to "gzip -dc" and "bzip2 -dc" respectively.

           The target_paths array is expected to be either one element per
           path in the following format: "class:format:raw_location", or a
           hash reference containing key-value option pairs and a 'target' key
           with a value in that format.

           The key-value option pairs that can be used are: opt_scanprob,
           opt_after, opt_before.  See the constructor method's documentation
           for more information on their effects.

           run() returns 0 if there was an error (can't open a file, etc.) and
           1 if there were no errors.

           class
               Either 'h' for ham or 's' for spam.  If the class is longer
               than 1 character, it will be truncated.  If blank, 'h' is
               default.

           format
               Specifies the format of the raw_location.  "dir" is a directory
               whose files are individual messages, "file" a file with a
               single message, "mbox" an mbox formatted file, or "mbx" for an
               mbx formatted directory.

               "detect" can also be used.  This assumes "mbox" for any file
               whose path contains the pattern "/\.mbox/i", "file" for
               anything else that is not a directory, or "dir" otherwise.

           raw_location
               Path to file or directory.  File globbing is allowed using the
               standard csh-style globbing (see "perldoc -f glob").  "~" at
               the front of the value will be replaced by the "HOME"
               environment variable.  Escaped whitespace is protected as well.

               NOTE: "~user" is not allowed.

               NOTE 2: "-" is not allowed as a raw location.  To have
               ArchiveIterator deal with STDIN, generate a temp file.

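           As a rough sketch of the target format described above, the
           following shows the two element forms accepted by run() and how
           a "class:format:raw_location" string breaks down.  split_target()
           is a hypothetical helper written for this example, not part of
           Mail::SpamAssassin, and the paths are made up.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mail::SpamAssassin: splits a target
# string of the form "class:format:raw_location" as the text describes.
sub split_target {
    my ($target) = @_;
    # A limit of 3 keeps any further colons inside raw_location.
    my ($class, $format, $location) = split /:/, $target, 3;
    $class = 'h' if !defined $class || $class eq '';   # blank class defaults to ham
    $class = substr($class, 0, 1);                      # longer classes are truncated
    return ($class, $format, $location);
}

# The two documented element forms: a plain string, or a hash reference
# holding per-target options plus a 'target' key.
my @targets = (
    'spam:dir:/var/mail/spam-corpus',
    { target => 'h:mbox:/var/mail/ham.mbox', opt_scanprob => 0.5 },
);

for my $t (@targets) {
    my $spec = ref($t) eq 'HASH' ? $t->{target} : $t;
    my ($class, $format, $location) = split_target($spec);
    print "$class $format $location\n";
}
```

           A list built like @targets is what would be passed as
           $iter->run(@targets); the return value of run() can then be
           checked for 0 (error) or 1 (no errors).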

SEE ALSO

       "Mail::SpamAssassin" "spamassassin" "mass-check"

perl v5.10.1                      2010-03-16  Mail::SpamAssassin::ArchiveIterator(3)