Mail::SpamAssassin::ArchiveIterator(3)  User Contributed Perl Documentation  Mail::SpamAssassin::ArchiveIterator(3)

NAME

       Mail::SpamAssassin::ArchiveIterator - find and process messages one at
       a time

SYNOPSIS

         my $iter = new Mail::SpamAssassin::ArchiveIterator(
           {
             'opt_max_size' => 256 * 1024,  # 0 implies no limit
             'opt_cache' => 1,
           }
         );

         # wanted() handles each message; the result callback is a no-op
         $iter->set_functions( \&wanted, sub { } );

         eval { $iter->run(@ARGV); };

         sub wanted {
           my($class, $filename, $recv_date, $msg_array) = @_;

           ...
         }

DESCRIPTION

       The Mail::SpamAssassin::ArchiveIterator module goes through a set of
       mbox files, mbx files, and directories (with a single message per file)
       and generates a list of messages.  It then calls the "wanted_sub" and
       "result_sub" functions for each message.

METHODS

       $item = new Mail::SpamAssassin::ArchiveIterator( [ { opt => val, ... }
       ] )
           Constructs a new "Mail::SpamAssassin::ArchiveIterator" object.  You
           may pass the following attribute-value pairs to the constructor.
           The pairs are optional unless otherwise noted.

           opt_max_size
               Sets a limit (in bytes) beyond which a message is considered
               large and is skipped by ArchiveIterator.

               A value of 0 implies no size limit: all messages are examined.
               An undefined value implies a default limit of 256 KiB.

           opt_all
               Setting this option to true implicitly sets opt_max_size to 0,
               i.e. no limit on message size: all messages are processed by
               ArchiveIterator.  Provided for compatibility with SpamAssassin
               versions older than 3.4.0, which lacked the opt_max_size
               option.

           opt_scanprob
               Randomly select messages to scan, with a probability of N,
               where N ranges from 0.0 (no messages scanned) to 1.0 (all
               messages scanned).  Default is 1.0.

               This setting can be specified separately for each target.

           opt_before
               Only use messages which were received before the given time_t
               value.  Negative values are an offset from the current time,
               e.g. -86400 means 24 hours ago.  The value may also be a
               string as parsed by Time::ParseDate (e.g. '-6 months').

               This setting can be specified separately for each target.

           opt_after
               Same as opt_before, except that messages are only used if
               received after the given time_t value.

               This setting can be specified separately for each target.

           opt_want_date
               Set to 1 (default) if you want the received date to be passed
               to the "wanted_sub" callback below.  Set this to 0 if you can
               do without the date, as extracting it imposes a performance
               hit.

           opt_skip_empty_messages
               Set to 1 if you want to skip corrupt, 0-byte messages.  The
               default is 0.

           opt_cache
               Set to 1 to use cached information to help speed up
               ArchiveIterator.  The default is 0 (no caching).  This
               setting requires "opt_cachedir" also be set.

           opt_cachedir
               Set to the path of a directory where you wish to store cached
               information for "opt_cache", if you don't want to mix the
               cache files with the input files (as is the default).  The
               directory must be both readable and writable.

           wanted_sub
               Reference to a subroutine which will process message data.
               Usually set via set_functions().  The routine will be passed 5
               values: class (scalar), filename (scalar), received date
               (scalar), message content (array reference, one message line
               per element), and the message format key ('f' for file, 'm' for
               mbox, 'b' for mbx).

               Note that if "opt_want_date" is set to 0, the received date
               scalar will be undefined.

           result_sub
               Reference to a subroutine which will process the results of the
               wanted_sub for each message processed.  Usually set via
               set_functions().  The routine will be passed 3 values: class
               (scalar), result (scalar, returned from wanted_sub), and
               received date (scalar).

               Note that if "opt_want_date" is set to 0, the received date
               scalar will be undefined.

           scan_progress_sub
               Reference to a subroutine which will be called intermittently
               during the 'scan' phase of the mass-check.  No guarantees are
               made as to how frequently this happens.

           opt_from_regex
               This setting allows for flexibility in specifying the mbox
               format From separator.

               It defaults to the regular expression:

               /^From \S+  ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d \d{4}|.?\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/

               Some SpamAssassin programs such as sa-learn will use the
               configuration option 'mbox_format_from_regex' to override the
               default regular expression.
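
           To get a feel for what the default From separator pattern
           accepts, the following minimal sketch applies it to a few sample
           lines.  The helper name is_from_separator and the sample lines
           are made up for illustration; only the pattern itself comes from
           the text above.

```perl
use strict;
use warnings;

# Default From-separator pattern as quoted in the documentation above.
# Whitespace is literal here (no /x), including the optional second
# space after the sender address.
my $from_re = qr/^From \S+  ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d \d{4}|.?\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/;

# Hypothetical helper: returns 1 if the line looks like an mbox separator.
sub is_from_separator {
    my ($line) = @_;
    return $line =~ $from_re ? 1 : 0;
}

# Made-up sample lines.
my @samples = (
    'From alice@example.com Thu Mar 25 09:15:02 2021',   # separator line
    'From bob@example.com  Mon Jan  4 10:00:00 2021',    # two spaces, padded day
    'From: alice@example.com',                           # ordinary header
);

for my $line (@samples) {
    printf "%-3s %s\n", (is_from_separator($line) ? 'yes' : 'no'), $line;
}
```

           A custom pattern passed as opt_from_regex could be checked
           against a sample mailbox in the same way before it is used.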

       set_functions( \&wanted_sub, \&result_sub )
           Sets the subroutines used for message processing (wanted_sub) and
           result reporting (result_sub).  For more information, see new()
           above.
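
           As an illustration of the callback signatures described under
           new(), the sketch below defines a pair of callbacks and invokes
           them by hand on a made-up message.  The sub names, the %count
           hash and the sample message are assumptions for this example; in
           real use the subs would be registered with
           $iter->set_functions(\&wanted, \&result) and called by the
           iterator.

```perl
use strict;
use warnings;

# Per-class tally kept by the result callback (hypothetical bookkeeping).
my %count = (h => 0, s => 0);

# wanted_sub: receives class, filename, received date, message lines
# (array reference) and the format key ('f', 'm' or 'b').
sub wanted {
    my ($class, $filename, $recv_date, $msg_array, $format) = @_;
    my $lines = scalar @{$msg_array};
    return "$filename: $lines lines ($format)";
}

# result_sub: receives class, the value returned by wanted_sub, and the
# received date (undefined when opt_want_date is 0).
sub result {
    my ($class, $result, $recv_date) = @_;
    $count{$class}++;
    print "[$class] $result\n";
}

# Simulated invocation for illustration only; this is not how the module
# itself drives the callbacks.
my @msg = ("Subject: test\n", "\n", "body\n");
result('h', wanted('h', 'msg1.eml', undef, \@msg, 'f'), undef);
```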

       run ( @target_paths )
           Generates the list of messages to process, then runs each message
           through the configured wanted subroutine.  Files whose names end
           in ".gz" or ".bz2" are uncompressed via calls to "gzip -dc" and
           "bzip2 -dc" respectively.

           Each element of the target_paths array is expected to be either a
           path in the format "class:format:raw_location", or a hash
           reference containing key-value option pairs and a 'target' key
           with a value in that format.

           The key-value option pairs that can be used are: opt_scanprob,
           opt_after, opt_before.  See the constructor method's documentation
           for more information on their effects.

           run() returns 0 if there was an error (can't open a file, etc.) and
           1 if there were no errors.
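
           The two target forms can be sketched roughly as follows: one
           plain string target and one hash-reference target carrying a
           per-target option.  The corpus paths are hypothetical, and the
           call to run() is only attempted if the module can be loaded, so
           treat this as an outline under those assumptions rather than a
           tested recipe.

```perl
use strict;
use warnings;

# Hypothetical corpus locations, for illustration only.
my @targets = (
    # Plain string form: class:format:raw_location
    'h:dir:/tmp/corpus/ham',

    # Hash-reference form: option pairs plus a 'target' key
    {
        target       => 's:mbox:/tmp/corpus/spam.mbox',
        opt_scanprob => 0.5,    # scan roughly half of the messages
    },
);

# Attempt a real run only when the module is installed.
if (eval { require Mail::SpamAssassin::ArchiveIterator; 1 }) {
    my $iter = Mail::SpamAssassin::ArchiveIterator->new({ opt_max_size => 0 });

    # wanted_sub returns the line count; result_sub ignores it.
    $iter->set_functions(sub { return scalar @{ $_[3] } }, sub { });

    # run() is documented to return 0 on error and 1 otherwise.
    my $ok = eval { $iter->run(@targets) };
    warn "some targets could not be processed\n" unless $ok;
}
```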

           class
               Either 'h' for ham or 's' for spam.  If the class is longer
               than 1 character, it will be truncated.  If blank, 'h' is
               the default.

           format
               Specifies the format of the raw_location.  "dir" is a directory
               whose files are individual messages, "file" a file with a
               single message, "mbox" an mbox formatted file, or "mbx" for an
               mbx formatted directory.

               "detect" can also be used.  This assumes "mbox" for any file
               whose path contains the pattern "/\.mbox/i", "file" for
               anything else that is not a directory, or "dir" otherwise.

           raw_location
               Path to file or directory.  File globbing is allowed using the
               standard csh-style globbing (see "perldoc -f glob").  "~" at
               the front of the value will be replaced by the "HOME"
               environment variable.  Escaped whitespace is protected as well.

               NOTE: "~user" is not allowed.

               NOTE 2: "-" is not allowed as a raw location.  To have
               ArchiveIterator deal with STDIN, generate a temp file.
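
           The rules above for splitting a target and for the "detect"
           format can be approximated with a few lines of Perl.  The
           helpers parse_target and detect_format are hypothetical names
           written for this sketch; they mirror the documented behaviour
           and are not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: split "class:format:raw_location" into its parts.
sub parse_target {
    my ($target) = @_;

    # A limit of 3 keeps any further ':' inside the location intact.
    my ($class, $format, $location) = split /:/, $target, 3;

    # Blank class defaults to ham; longer classes are truncated to 1 char.
    $class = 'h' if !defined $class || $class eq '';
    $class = substr($class, 0, 1);

    return ($class, $format, $location);
}

# Hypothetical helper: mimic the documented "detect" rules.
sub detect_format {
    my ($path) = @_;
    return 'mbox' if !(-d $path) && $path =~ /\.mbox/i;
    return 'file' if !(-d $path);
    return 'dir';    # anything that is a directory
}

my ($class, $format, $location) = parse_target('spam:detect:/tmp/corpus/old.mbox');
$format = detect_format($location) if $format eq 'detect';
print "class=$class format=$format location=$location\n";
```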

SEE ALSO

       Mail::SpamAssassin(3) spamassassin(1) mass-check(1)

perl v5.32.1                      2021-03-25    Mail::SpamAssassin::ArchiveIterator(3)