Mail::SpamAssassin::ArchiveIterator(3)  User Contributed Perl Documentation  Mail::SpamAssassin::ArchiveIterator(3)

NAME

       Mail::SpamAssassin::ArchiveIterator - find and process messages one at
       a time

SYNOPSIS

         use Mail::SpamAssassin::ArchiveIterator;

         my $iter = Mail::SpamAssassin::ArchiveIterator->new(
           {
             'opt_max_size' => 500 * 1024,  # 0 implies no limit
             'opt_cache' => 1,
           }
         );

         $iter->set_functions( \&wanted, sub { } );

         eval { $iter->run(@ARGV); };
         warn $@ if $@;

         sub wanted {
           my($class, $filename, $recv_date, $msg_array, $format) = @_;

           # ... process the message lines in @$msg_array ...
         }

DESCRIPTION

       The Mail::SpamAssassin::ArchiveIterator module goes through a set of
       mbox files, mbx files, and directories (with a single message per
       file) and generates a list of messages.  It then calls the
       "wanted_sub" function for each message and passes its return value,
       along with the message's class and received date, to the
       "result_sub" function.

METHODS

       $item = Mail::SpamAssassin::ArchiveIterator->new( [ { opt => val, ... } ] )
           Constructs a new "Mail::SpamAssassin::ArchiveIterator" object.  You
           may pass the following attribute-value pairs to the constructor.
           The pairs are optional unless otherwise noted.

           opt_max_size
               The maximum size of a message, in bytes.  Messages larger than
               this limit are considered large and are skipped by
               ArchiveIterator.

               A value of 0 means no size limit: all messages are examined.
               An undefined value means the default limit of 500 KiB.

           opt_all
               Setting this option to true implicitly sets opt_max_size to 0,
               i.e. no limit on message size: all messages are processed by
               ArchiveIterator.  This option exists for compatibility with
               SpamAssassin versions older than 3.4.0, which lacked
               opt_max_size.

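               The following sketch illustrates how opt_max_size and opt_all
               combine.  It resolves the effective limit and then applies it.
               The helpers effective_max_size() and is_too_large() are
               hypothetical and are not part of ArchiveIterator.  The text
               above does not say whether a message exactly at the limit is
               skipped, so the strict comparison used here is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: resolve the effective size limit from the
# constructor options, following the documented rules.
sub effective_max_size {
    my (%opt) = @_;
    return 0 if $opt{opt_all};                              # opt_all: no limit
    return 500 * 1024 unless defined $opt{opt_max_size};    # default: 500 KiB
    return $opt{opt_max_size};                              # 0 also means no limit
}

# Hypothetical helper: would a message of $size bytes be skipped?
# (Assumption: only messages strictly larger than the limit are skipped.)
sub is_too_large {
    my ($size, $max) = @_;
    return 0 if $max == 0;    # no limit configured
    return $size > $max ? 1 : 0;
}
```
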
           opt_scanprob
               Randomly select messages to scan, with a probability of N,
               where N ranges from 0.0 (no messages scanned) to 1.0 (all
               messages scanned).  The default is 1.0.

               This setting can be specified separately for each target.

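               Selecting messages with probability N amounts to comparing a
               uniform random number with N for each message.  The helper
               should_scan() below illustrates that idea.  It is
               hypothetical and is not ArchiveIterator's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether to scan one message, given a
# probability $prob between 0.0 and 1.0 (default 1.0).
sub should_scan {
    my ($prob) = @_;
    $prob = 1.0 unless defined $prob;
    return rand() < $prob ? 1 : 0;    # rand() returns a value in [0, 1)
}

# For 1000 messages at a probability of 0.1, about 100 are picked.
my $picked = grep { should_scan(0.1) } 1 .. 1000;
```
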
           opt_before
               Only use messages which were received before the given time_t
               value.  Negative values are an offset from the current time
               (e.g. -86400 means 24 hours ago).  The value may also be a
               string parsed by Time::ParseDate (e.g. '-6 months').

               This setting can be specified separately for each target.

           opt_after
               Same as opt_before, except that messages are only used if they
               were received after the given time_t value.  For example,
               -86400 selects the last 24 hours.

               This setting can be specified separately for each target.

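               The following sketch shows the date selection described by
               opt_before and opt_after.  resolve_time() and
               date_is_wanted() are hypothetical helpers that only mirror the
               rules above.  Time::ParseDate strings are not handled, and the
               treatment of a date exactly equal to a bound is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a configured value into an absolute time_t.
# Negative values are offsets from $now; Time::ParseDate strings such as
# '-6 months' are not handled in this sketch.
sub resolve_time {
    my ($value, $now) = @_;
    return undef unless defined $value;    # explicit undef keeps arg lists aligned
    return $value < 0 ? $now + $value : $value;
}

# Hypothetical helper: keep a message only if it was received before
# $before and after $after (either bound may be undef, i.e. unset).
sub date_is_wanted {
    my ($date, $before, $after) = @_;
    return 0 if defined $before && $date >= $before;
    return 0 if defined $after  && $date <= $after;
    return 1;
}
```
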
           opt_want_date
               Set to 1 (the default) if you want the received date to be
               filled in for the "wanted_sub" callback below.  Set it to 0 to
               avoid this.  Doing so is a good idea whenever you don't need
               the date, because determining it imposes a performance hit.

           opt_skip_empty_messages
               Set to 1 if you want to skip corrupt, 0-byte messages.  The
               default is 0.

           opt_cache
               Set to 1 to use cached information to help speed up
               ArchiveIterator.  The default, 0, disables the cache.  This
               setting requires that "opt_cachedir" also be set.

           opt_cachedir
               Set to the path of a directory where you wish to store the
               cached information for "opt_cache", if you don't want it mixed
               with the input files (which is the default).  The directory
               must be both readable and writable.

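               Because the cache directory must be readable and writable, a
               caller may want to check it before enabling opt_cache.  The
               sketch below does that.  cachedir_usable() and the path
               /var/tmp/sa-ai-cache are hypothetical, and the resulting hash
               is only meant to be passed to new().

```perl
use strict;
use warnings;

# Hypothetical pre-flight check: the cache directory must exist and be
# both readable and writable before opt_cache is switched on.
sub cachedir_usable {
    my ($dir) = @_;
    # "_" reuses the stat buffer filled by the preceding -d test.
    return (defined $dir && -d $dir && -r _ && -w _) ? 1 : 0;
}

my $cachedir = '/var/tmp/sa-ai-cache';    # hypothetical location
my %ai_opts  = (opt_max_size => 500 * 1024);
if (cachedir_usable($cachedir)) {
    @ai_opts{qw(opt_cache opt_cachedir)} = (1, $cachedir);
}
# %ai_opts would then be passed as
# Mail::SpamAssassin::ArchiveIterator->new(\%ai_opts).
```
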
           wanted_sub
               Reference to a subroutine which will process message data.
               Usually set via set_functions().  The routine is passed five
               values: class (scalar), filename (scalar), received date
               (scalar), message content (array reference, one message line
               per element), and the message format key ('f' for file, 'm'
               for mbox, 'b' for mbx).

               Note that if "opt_want_date" is set to 0, the received date
               scalar will be undefined.

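               A wanted_sub that receives these five values might look like
               the following.  The summary string it returns is an arbitrary
               choice for illustration.  Whatever the routine returns is what
               result_sub later receives as its result.  The sketch assumes
               each array element is one line, possibly with its trailing
               newline.

```perl
use strict;
use warnings;

# Example wanted_sub: takes the five documented arguments and returns a
# short summary string that will be handed on to result_sub.
sub wanted {
    my ($class, $filename, $recv_date, $msg_array, $format) = @_;

    # $recv_date is undef when opt_want_date is 0.
    my $when = defined $recv_date ? scalar gmtime($recv_date) : 'unknown date';

    # Look for a Subject header in the header block only, i.e. stop at
    # the first blank line.
    my $subject = '(none)';
    for my $line (@$msg_array) {
        last if $line =~ /^\r?\n?$/;
        if ($line =~ /^Subject:\s*(.*?)\s*$/i) {
            $subject = $1;
            last;
        }
    }

    return "$class $format $filename [$when] $subject";
}
```

Registering it works as in the SYNOPSIS: $iter->set_functions(\&wanted,
sub { }).
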
           result_sub
               Reference to a subroutine which will process the results of
               the wanted_sub for each message processed.  Usually set via
               set_functions().  The routine is passed three values: class
               (scalar), result (scalar, returned from wanted_sub), and
               received date (scalar).

               Note that if "opt_want_date" is set to 0, the received date
               scalar will be undefined.

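               A minimal result_sub could tally messages per class and
               remember the newest one, as in the sketch below.  The
               variable names are illustrative only.

```perl
use strict;
use warnings;

# Example result_sub: receives class, the value returned by wanted_sub,
# and the received date, and keeps simple per-class tallies.
my %count_by_class;
my @newest;    # (date, result) of the most recent message seen so far

sub result {
    my ($class, $result, $recv_date) = @_;
    $count_by_class{$class}++;
    # $recv_date is undef when opt_want_date is 0; skip the comparison then.
    if (defined $recv_date && (!@newest || $recv_date > $newest[0])) {
        @newest = ($recv_date, $result);
    }
    return;
}
```

Together with a wanted_sub, it would be registered as
$iter->set_functions(\&wanted, \&result).
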
           scan_progress_sub
               Reference to a subroutine which will be called intermittently
               during the 'scan' phase of the mass-check.  No guarantees are
               made as to how often this happens.

           opt_from_regex
               This setting allows for flexibility in specifying the mbox
               format From separator.

               It defaults to the regular expression:

               /^From \S+  ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d \d{4}|.?\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/

               Some SpamAssassin programs such as sa-learn will use the
               configuration option 'mbox_format_from_regex' to override the
               default regular expression.

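               To see which lines the default expression treats as message
               separators, you can apply it to a few sample lines.  The
               sample mbox content below is made up.  Note that a body line
               starting with "From " is not mistaken for a separator.

```perl
use strict;
use warnings;

# The default From-separator expression documented above, compiled once.
my $from_re = qr/^From \S+  ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d \d{4}|.?\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/;

# A small, made-up mbox held in memory, one line per element.
my @mbox_lines = (
    "From alice\@example.com Mon Jan  2 15:04:05 2006\n",
    "Subject: first\n",
    "\n",
    "From here on, this is body text, not a separator.\n",
    "From bob\@example.org Tue Feb 14 09:30:00 2006\n",
    "Subject: second\n",
);

# Lines matching the expression start a new message.
my $separators = grep { $_ =~ $from_re } @mbox_lines;
print "separators found: $separators\n";    # prints "separators found: 2"
```
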
       set_functions( \&wanted_sub, \&result_sub )
           Sets the subroutines used for message processing (wanted_sub) and
           result reporting (result_sub).  For more information, see new()
           above.

       run ( @target_paths )
           Generates the list of messages to process, then runs each message
           through the configured wanted subroutine.

           Compressed files are detected and uncompressed automatically,
           regardless of file extension.  Supported formats are "gzip",
           "bzip2", "xz", "lz4", "lzip", and "lzo".  Gzip files are
           uncompressed via the IO::Zlib module.  The others use their
           respective command-line tools (bzip2/xz/lz4/lzip/lzop).
           Compressed mailbox/mbox files are not supported.

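           Detecting compression "regardless of file extension" implies
           looking at the file contents.  A common way to do this is to
           compare the leading bytes with each format's magic number, as in
           the sketch below.  detect_compression() is hypothetical and is
           not ArchiveIterator's own implementation.  The signatures are the
           standard ones for these formats.

```perl
use strict;
use warnings;

# Hypothetical sketch: identify a compressed file by its leading "magic"
# bytes instead of its extension.  Not ArchiveIterator's own code.
my @signatures = (
    [ gzip  => "\x1f\x8b" ],
    [ bzip2 => "BZh" ],
    [ xz    => "\xfd7zXZ\x00" ],
    [ lz4   => "\x04\x22\x4d\x18" ],
    [ lzip  => "LZIP" ],
    [ lzo   => "\x89LZO\x00\x0d\x0a\x1a\x0a" ],    # lzop file header
);

# Returns the format name, or undef for an uncompressed/unknown file.
sub detect_compression {
    my ($path) = @_;
    open my $fh, '<:raw', $path or return undef;
    my $got = read($fh, my $head, 16);
    close $fh;
    return undef unless $got;
    for my $sig (@signatures) {
        my ($name, $magic) = @$sig;
        return $name if substr($head, 0, length $magic) eq $magic;
    }
    return undef;
}
```
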
           Each element of the target_paths array is either a string of the
           form "class:format:raw_location", or a hash reference containing
           key-value option pairs and a 'target' key whose value has that
           form.

           The key-value option pairs that can be used are: opt_scanprob,
           opt_after, opt_before.  See the constructor method's documentation
           for more information on their effects.

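           For example, a target list that mixes both forms could be built as
           follows.  The paths are hypothetical.  The commented-out call
           assumes an iterator created as in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical targets: two plain strings and one hash reference that
# carries per-target options (opt_scanprob, opt_after, opt_before).
my @targets = (
    'spam:mbox:/var/mail/spam.mbox',    # class "spam" is truncated to 's'
    'ham:dir:/home/me/Maildir/cur',
    {
        target       => 'h:detect:~/archive/*.mbox',
        opt_scanprob => 0.25,      # scan roughly a quarter of these messages
        opt_after    => -86400,    # only messages received in the last 24 hours
    },
);

# With an iterator created as in the SYNOPSIS, the list would be used as:
#   my $ok = eval { $iter->run(@targets) };
#   warn "run() did not complete cleanly\n" unless $ok;
```
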
           run() returns 0 if there was an error (can't open a file, etc.)
           and 1 if there were no errors.

           class
               Either 'h' for ham or 's' for spam.  If the class is longer
               than 1 character, it will be truncated.  If blank, 'h' is the
               default.

           format
               Specifies the format of the raw_location.  "dir" is a
               directory whose files are individual messages, "file" a file
               with a single message, "mbox" an mbox formatted file, or "mbx"
               an mbx formatted directory.

               "detect" can also be used.  This assumes "mbox" for any file
               whose path contains the pattern "/\.mbox/i", "file" for
               anything else that is not a directory, and "dir" otherwise.

           raw_location
               Path to file or directory.  File globbing is allowed using the
               standard csh-style globbing (see "perldoc -f glob").  A "~" at
               the front of the value will be replaced by the "HOME"
               environment variable.  Escaped whitespace is protected as
               well.

               NOTE: "~user" is not allowed.

               NOTE 2: "-" is not allowed as a raw location.  To have
               ArchiveIterator deal with STDIN, generate a temp file.

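           The rules for the three fields can be combined in a short sketch
           that turns one target into (class, format, location).
           parse_target() is hypothetical and only mirrors the rules stated
           above.  It is not ArchiveIterator's own parser, and it leaves out
           globbing and escaped-whitespace handling.

```perl
use strict;
use warnings;

# Hypothetical sketch: split and normalize one run() target as described
# above.  Not ArchiveIterator's own parser; globbing is not performed.
sub parse_target {
    my ($target) = @_;
    my %opts;
    if (ref $target eq 'HASH') {    # hash reference form
        %opts   = %$target;
        $target = delete $opts{target};
    }

    # A limit of 3 keeps any ':' inside raw_location intact.
    my ($class, $format, $location) = split /:/, $target, 3;
    die "malformed target: $target\n" unless defined $location;

    # class: only the first character counts; blank means ham.
    $class = substr($class // '', 0, 1);
    $class = 'h' if $class eq '';

    # raw_location: reject "-" and "~user", expand a leading "~".
    die "'-' is not allowed as raw_location\n" if $location eq '-';
    die "'~user' is not allowed: $location\n"  if $location =~ m{^~[^/]};
    $location =~ s{^~}{$ENV{HOME}};

    # format: "detect" picks dir, mbox or file from the path.
    if ($format eq 'detect') {
        $format = -d $location           ? 'dir'
                : $location =~ /\.mbox/i ? 'mbox'
                :                          'file';
    }

    return ($class, $format, $location, \%opts);
}
```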

SEE ALSO

       Mail::SpamAssassin(3) spamassassin(1) mass-check(1)
perl v5.36.0                   2023-01-21       Mail::SpamAssassin::ArchiveIterator(3)