Mail::SpamAssassin::ArchiveIterator(3)  User Contributed Perl Documentation  Mail::SpamAssassin::ArchiveIterator(3)

NAME
    Mail::SpamAssassin::ArchiveIterator - find and process messages one at
    a time

SYNOPSIS
    my $iter = new Mail::SpamAssassin::ArchiveIterator(
      {
        'opt_max_size' => 256 * 1024,  # 0 implies no limit
        'opt_cache' => 1,
      }
    );

    $iter->set_functions( \&wanted, sub { } );

    eval { $iter->run(@ARGV); };

    sub wanted {
      my($class, $filename, $recv_date, $msg_array) = @_;

      ...
    }

DESCRIPTION
    The Mail::SpamAssassin::ArchiveIterator module goes through a set of
    mbox files, mbx files, and directories (with a single message per file)
    and generates a list of messages. It then calls the "wanted_sub" and
    "result_sub" functions for each message.

METHODS
    $item = new Mail::SpamAssassin::ArchiveIterator( [ { opt => val, ... } ] )
        Constructs a new "Mail::SpamAssassin::ArchiveIterator" object. You
        may pass the following attribute-value pairs to the constructor.
        The pairs are optional unless otherwise noted.

        opt_max_size
            The value of opt_max_size sets a limit (in bytes) beyond which
            a message is considered large and is skipped by
            ArchiveIterator.

            A value of 0 implies no size limit: all messages are examined.
            An undefined value implies a default limit of 256 KiB.

        opt_all
            Setting this option to true implicitly sets opt_max_size to 0,
            i.e. no limit on message size, so all messages are processed
            by ArchiveIterator. Provided for compatibility with
            SpamAssassin versions older than 3.4.0, which lacked the
            opt_max_size option.

        opt_scanprob
            Randomly select messages to scan, with a probability of N,
            where N ranges from 0.0 (no messages scanned) to 1.0 (all
            messages scanned). Default is 1.0.

            This setting can be specified separately for each target.

        opt_before
            Only use messages which were received before the given time_t
            value. Negative values are an offset from the current time,
            e.g. -86400 = last 24 hours; or as parsed by Time::ParseDate
            (e.g. '-6 months').

            This setting can be specified separately for each target.

        opt_after
            Same as opt_before, except that messages are only used if
            they were received after the given time_t value.

            This setting can be specified separately for each target.

        opt_want_date
            Set to 1 (default) if you want the received date to be passed
            to the "wanted_sub" callback below. Set this to 0 if you can,
            since filling in the date imposes a performance hit.

        opt_skip_empty_messages
            Set to 1 if you want to skip corrupt, 0-byte messages. The
            default is 0.

        opt_cache
            Set to 0 (default) if you don't want to use cached information
            to help speed up ArchiveIterator. Set to 1 to enable. This
            setting requires "opt_cachedir" also be set.

        opt_cachedir
            Set to the path of a directory where you wish to store cached
            information for "opt_cache", if you don't want to mix them with
            the input files (as is the default). The directory must be
            both readable and writable.

        wanted_sub
            Reference to a subroutine which will process message data.
            Usually set via set_functions(). The routine will be passed 5
            values: class (scalar), filename (scalar), received date
            (scalar), message content (array reference, one message line
            per element), and the message format key ('f' for file, 'm' for
            mbox, 'b' for mbx).

            Note that if "opt_want_date" is set to 0, the received date
            scalar will be undefined.

        result_sub
            Reference to a subroutine which will process the results of the
            wanted_sub for each message processed. Usually set via
            set_functions(). The routine will be passed 3 values: class
            (scalar), result (scalar, returned from wanted_sub), and
            received date (scalar).

            Note that if "opt_want_date" is set to 0, the received date
            scalar will be undefined.

        scan_progress_sub
            Reference to a subroutine which will be called intermittently
            during the 'scan' phase of the mass-check. No guarantees are
            made as to how frequently this may happen, mind you.

        opt_from_regex
            This setting allows for flexibility in specifying the mbox
            format From separator.

            It defaults to the regular expression:

            /^From \S+ ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d
            \d{4}|.?\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/

            Some SpamAssassin programs such as sa-learn will use the
            configuration option 'mbox_format_from_regex' to override the
            default regular expression.

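To see what the default separator pattern accepts, the following sketch applies it to a few made-up lines. It assumes that the line break in the pattern as printed above stands for a single space; the sample lines and the variable names are illustrative only and are not part of the module.

```perl
use strict;
use warnings;

# The default separator pattern as printed above, joined onto one line.
# Assumption: the line break in the manual page stands for a single space.
my $from_regex = qr/^From \S+ ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d \d{4}|.?\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/;

# Hypothetical sample lines, not taken from any real mailbox.
my @lines = (
    'From alice@example.com Thu Jan 15 09:15:02 2004',  # classic mbox separator
    'From: alice@example.com',                          # ordinary header line
    'From the desk of Alice',                           # body text starting with "From "
);

for my $line (@lines) {
    my $verdict = $line =~ $from_regex ? 'separator' : 'not a separator';
    print "$verdict: $line\n";
}
```

Only the first sample line is treated as a message separator. A mailbox whose separators use a different date layout would need its own pattern passed as opt_from_regex.
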
    set_functions( \&wanted_sub, \&result_sub )
        Sets the subroutines used for message processing (wanted_sub) and
        result reporting (result_sub). For more information, see new()
        above.

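As an illustration of the callback contracts described under new() above, here is a minimal sketch of a wanted_sub that counts message lines and a result_sub that totals them per class. The callbacks are invoked by hand with made-up data so that the example runs without SpamAssassin installed; the names wanted, result, and %lines_per_class, as well as the sample file name, are assumptions of this example.

```perl
use strict;
use warnings;

my %lines_per_class;    # running totals, keyed by class ('h' or 's')

# wanted_sub: receives class, filename, received date, message lines
# (array reference) and the format key. Its return value is the "result".
sub wanted {
    my ($class, $filename, $recv_date, $msg_array, $format) = @_;
    return scalar @{$msg_array};    # here: number of lines in the message
}

# result_sub: receives class, the wanted_sub result, and the received date.
sub result {
    my ($class, $result, $recv_date) = @_;
    $lines_per_class{$class} += $result;
    return;
}

# Call the callbacks by hand with made-up data to show the contracts.
# With the real module one would instead register them:
#   $iter->set_functions( \&wanted, \&result );
my @message = ("Subject: test\n", "\n", "Hello\n");
my $count = wanted('h', '/tmp/example-message', undef, \@message, 'f');
result('h', $count, undef);

print "ham lines so far: $lines_per_class{h}\n";
```

The received date is passed as undef here, which is also what the callbacks see when opt_want_date is set to 0.
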
    run ( @target_paths )
        Generates the list of messages to process, then runs each message
        through the configured wanted subroutine. Files which have a name
        ending in ".gz" or ".bz2" are uncompressed via calls to "gzip -dc"
        and "bzip2 -dc" respectively.

        Each element of the target_paths array is expected to be either a
        path in the following format: "class:format:raw_location", or a
        hash reference containing key-value option pairs and a 'target'
        key with a value in that format.

        The key-value option pairs that can be used are: opt_scanprob,
        opt_after, opt_before. See the constructor method's documentation
        for more information on their effects.

        run() returns 0 if there was an error (can't open a file, etc.)
        and 1 if there were no errors.

        class
            Either 'h' for ham or 's' for spam. If the class is longer
            than 1 character, it will be truncated. If blank, 'h' is
            default.

        format
            Specifies the format of the raw_location. "dir" is a directory
            whose files are individual messages, "file" a file with a
            single message, "mbox" an mbox formatted file, or "mbx" for an
            mbx formatted directory.

            "detect" can also be used. This assumes "mbox" for any file
            whose path contains the pattern "/\.mbox/i", "file" for
            anything else that is not a directory, or "dir" otherwise.

        raw_location
            Path to file or directory. File globbing is allowed using the
            standard csh-style globbing (see "perldoc -f glob"). "~" at
            the front of the value will be replaced by the "HOME"
            environment variable. Escaped whitespace is protected as well.

            NOTE: "~user" is not allowed.

            NOTE 2: "-" is not allowed as a raw location. To have
            ArchiveIterator deal with STDIN, generate a temp file.

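The target forms accepted by run() can be sketched as follows. The corpus paths are hypothetical, and the loop that splits each specification into class, format, and raw_location only illustrates the "class:format:raw_location" layout; it is not how the module itself parses targets.

```perl
use strict;
use warnings;

# Hypothetical corpus locations; substitute your own paths.
my @targets = (
    'h:dir:/var/corpus/ham',           # directory, one message per file
    's:mbox:/var/corpus/spam.mbox',    # single mbox file
    {                                  # per-target options plus a 'target' key
        target       => 's:detect:/var/corpus/incoming',
        opt_scanprob => 0.25,          # scan roughly a quarter of the messages
        opt_after    => -86400,        # only messages from the last 24 hours
    },
);

# Illustration only: show the three fields of each specification.
for my $t (@targets) {
    my $spec = ref($t) eq 'HASH' ? $t->{target} : $t;
    my ($class, $format, $location) = split /:/, $spec, 3;
    print "class=$class format=$format location=$location\n";
}

# With the real module, the list would be handed to run(), which
# returns 1 if there were no errors and 0 otherwise:
#   my $ok = $iter->run(@targets);
```

Limiting the split to three fields keeps any further colons inside raw_location intact.
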
SEE ALSO
    Mail::SpamAssassin(3) spamassassin(1) mass-check(1)

perl v5.32.1                      2021-03-25  Mail::SpamAssassin::ArchiveIterator(3)