Mail::SpamAssassin::ArchiveIterator(3)  User Contributed Perl Documentation  Mail::SpamAssassin::ArchiveIterator(3)

NAME
Mail::SpamAssassin::ArchiveIterator - find and process messages one at
a time

SYNOPSIS
my $iter = Mail::SpamAssassin::ArchiveIterator->new(
    {
        'opt_max_size' => 500 * 1024,  # 0 implies no limit
        'opt_cache'    => 1,
    }
);

$iter->set_functions( \&wanted, sub { } );

eval { $iter->run(@ARGV); };

sub wanted {
    my($class, $filename, $recv_date, $msg_array) = @_;

    ...
}

DESCRIPTION
The Mail::SpamAssassin::ArchiveIterator module will go through a set of
mbox files, mbx files, and directories (with a single message per file)
and generate a list of messages. It will then call the "wanted_sub"
and "result_sub" functions appropriately per message.

METHODS
$item = Mail::SpamAssassin::ArchiveIterator->new( [ { opt => val, ... }
] )
Constructs a new "Mail::SpamAssassin::ArchiveIterator" object. You
may pass the following attribute-value pairs to the constructor.
The pairs are optional unless otherwise noted.

opt_max_size
The value of opt_max_size sets a limit (in bytes) beyond which a
message is considered large and is skipped by ArchiveIterator.

A value of 0 implies no size limit, so all messages are examined.
An undefined value implies a default limit of 500 KiB.

opt_all
Setting this option to true implicitly sets opt_max_size to 0,
i.e. no limit on message size, so all messages are processed by
ArchiveIterator. Provided for compatibility with SpamAssassin
versions older than 3.4.0, which lacked the opt_max_size option.

opt_scanprob
Randomly select messages to scan, with a probability of N,
where N ranges from 0.0 (no messages scanned) to 1.0 (all
messages scanned). Default is 1.0.

This setting can be specified separately for each target.

opt_before
Only use messages which were received before the given time_t
value. Negative values are an offset from the current time,
e.g. -86400 means 24 hours ago; the value may also be given in a
form parsed by Time::ParseDate (e.g. '-6 months').

This setting can be specified separately for each target.

opt_after
Same as opt_before, except that messages are only used if they
were received after the given time_t value.

This setting can be specified separately for each target.

opt_want_date
Set to 1 (default) if you want the received date to be passed to
the "wanted_sub" callback below. Set this to 0 if you can, as
determining the date imposes a performance hit.

opt_skip_empty_messages
Set to 1 if you want to skip corrupt, 0-byte messages. The
default is 0.

opt_cache
Set to 0 (default) if you don't want to use cached information
to help speed up ArchiveIterator. Set to 1 to enable. This
setting requires "opt_cachedir" also be set.

opt_cachedir
Set to the path of a directory where you wish to store cached
information for "opt_cache", if you don't want to mix them with
the input files (as is the default). The directory must be
both readable and writable.

wanted_sub
Reference to a subroutine which will process message data.
Usually set via set_functions(). The routine will be passed 5
values: class (scalar), filename (scalar), received date
(scalar), message content (array reference, one message line
per element), and the message format key ('f' for file, 'm' for
mbox, 'b' for mbx).

Note that if "opt_want_date" is set to 0, the received date
scalar will be undefined.

result_sub
Reference to a subroutine which will process the results of the
wanted_sub for each message processed. Usually set via
set_functions(). The routine will be passed 3 values: class
(scalar), result (scalar, returned from wanted_sub), and
received date (scalar).

Note that if "opt_want_date" is set to 0, the received date
scalar will be undefined.

scan_progress_sub
Reference to a subroutine which will be called intermittently
during the 'scan' phase of the mass-check. No guarantees are
made as to how frequently this may happen.

opt_from_regex
This setting allows for flexibility in specifying the mbox
format From separator.

It defaults to the regular expression:

/^From \S+ ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d
\d{4}|.?\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/

Some SpamAssassin programs such as sa-learn will use the
configuration option 'mbox_format_from_regex' to override the
default regular expression.
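
As a rough illustration of what the default separator accepts, the sketch below tests a few candidate lines against the regular expression quoted above. The sample lines and the is_from_separator helper are made up for this example, and the pattern is transcribed with single spaces as printed here; the copy in your installed module is the authority.

```perl
use strict;
use warnings;

# Default separator as printed above (assumed transcription, single spaces).
my $from_re = qr/^From \S+ ?(\S\S\S \S\S\S .?\d .?\d:\d\d:\d\d \d{4}|.?\d-\d\d-\d{4}_\d\d:\d\d:\d\d_)/;

# Hypothetical helper: 1 if $line looks like an mbox "From " separator, else 0.
sub is_from_separator {
    my ($line) = @_;
    return $line =~ $from_re ? 1 : 0;
}

for my $line (
    'From alice@example.com Mon Jan  2 03:04:05 2023',
    'From: alice@example.com',
    'From the desk of Alice',
) {
    printf "%-50s %s\n", $line,
        is_from_separator($line) ? 'separator' : 'not a separator';
}
```

Only the first sample line is reported as a separator; the header line and the body line starting with "From " are not.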

set_functions( \&wanted_sub, \&result_sub )
Sets the subroutines used for message processing (wanted_sub) and
result reporting (result_sub). For more information, see new() above.
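
A minimal sketch of a callback pair that follows the argument lists described under new(). The subject-extraction logic, the %seen tally and the sample message are invented for illustration; only the argument order comes from the text above. The callbacks are invoked by hand here so the sketch runs without any mail archive.

```perl
use strict;
use warnings;

my %seen;   # hypothetical tally: class => number of results reported

# wanted_sub: receives class, filename, received date, message lines, format key.
sub wanted {
    my ($class, $filename, $recv_date, $msg_array, $format) = @_;
    # Illustrative work only: take the first line that starts with "Subject:".
    my ($subject) = grep { /^Subject:/i } @$msg_array;
    $subject = '' unless defined $subject;
    $subject =~ s/^Subject:\s*//i;
    $subject =~ s/\s+\z//;
    return $subject;    # handed to result_sub as its "result" argument
}

# result_sub: receives class, the wanted_sub return value, received date.
sub result {
    my ($class, $result, $recv_date) = @_;
    $seen{$class}++;
    print "[$class] $result\n";
}

# Direct call with made-up data, standing in for what run() would do.
my @lines = ("From: a\@example.com\n", "Subject: hello\n", "\n", "body\n");
my $r = wanted('h', '/tmp/example.msg', undef, \@lines, 'f');
result('h', $r, undef);
```

With the module loaded, the same pair would be registered with $iter->set_functions(\&wanted, \&result) before calling run().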

run ( @target_paths )
Generates the list of messages to process, then runs each message
through the configured wanted subroutine.

Compressed files are detected and uncompressed automatically
regardless of file extension. Supported formats are "gzip",
"bzip2", "xz", "lz4", "lzip", "lzo". Gzip is uncompressed via
IO::Zlib module, others use their specific command line tool
(bzip2/xz/lz4/lzip/lzop). Compressed mailbox/mbox files are not
supported.
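
Detection that ignores the file extension is commonly done by looking at the leading "magic" bytes of a file. The sketch below shows that general technique for the six formats listed; it is an assumption that ArchiveIterator works along these lines, and sniff_compression is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Well-known signatures found at the start of each compressed format.
my @magic = (
    [ gzip  => "\x1f\x8b" ],
    [ bzip2 => "BZh" ],
    [ xz    => "\xfd\x37\x7a\x58\x5a\x00" ],
    [ lz4   => "\x04\x22\x4d\x18" ],
    [ lzip  => "LZIP" ],
    [ lzo   => "\x89\x4c\x5a\x4f\x00\x0d\x0a\x1a\x0a" ],
);

# Hypothetical helper: returns the format name for the given leading bytes,
# or undef when no known signature matches.
sub sniff_compression {
    my ($head) = @_;
    for my $m (@magic) {
        my ($name, $sig) = @$m;
        return $name if substr($head, 0, length($sig)) eq $sig;
    }
    return undef;
}

for my $head ("\x1f\x8b\x08\x00", "BZh91AY", "From alice\@example.com") {
    my $kind = sniff_compression($head);
    print defined $kind ? "$kind\n" : "uncompressed\n";
}
```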

The target_paths array is expected to be either one element per
path in the following format: "class:format:raw_location", or a
hash reference containing key-value option pairs and a 'target' key
with a value in that format.

The key-value option pairs that can be used are: opt_scanprob,
opt_after, opt_before. See the constructor method's documentation
for more information on their effects.

run() returns 0 if there was an error (can't open a file, etc.) and
1 if there were no errors.

class
Either 'h' for ham or 's' for spam. If the class is longer
than 1 character, it will be truncated. If blank, 'h' is
default.

format
Specifies the format of the raw_location. "dir" is a directory
whose files are individual messages, "file" a file with a
single message, "mbox" an mbox formatted file, or "mbx" for an
mbx formatted directory.

"detect" can also be used. This assumes "mbox" for any file
whose path matches the pattern "/\.mbox/i", "file" for anything
else that is not a directory, and "dir" otherwise.

raw_location
Path to file or directory. File globbing is allowed using the
standard csh-style globbing (see "perldoc -f glob"). "~" at
the front of the value will be replaced by the "HOME"
environment variable. Escaped whitespace is protected as well.

NOTE: "~user" is not allowed.

NOTE 2: "-" is not allowed as a raw location. To have
ArchiveIterator deal with STDIN, generate a temp file.
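
To make the target syntax concrete, here is a small sketch that splits a "class:format:raw_location" string into its parts and applies the class rules described above (truncate to one character, default to 'h'). split_target is a hypothetical helper written for this example, not the module's own parser, and the paths are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented rules for the class field.
sub split_target {
    my ($target) = @_;
    # A limit of 3 keeps any ':' inside the location intact.
    my ($class, $format, $location) = split /:/, $target, 3;
    $class = 'h' unless defined $class && length $class;   # blank means ham
    $class = substr($class, 0, 1);                          # truncate long classes
    return ($class, $format, $location);
}

# Plain string targets and a hash-reference target with per-target options.
my @targets = (
    'ham:dir:/var/mail/ham',
    's:mbox:/var/mail/spam.mbox',
    { target => ':detect:/var/mail/misc', opt_scanprob => 0.5, opt_after => -86400 },
);

for my $t (@targets) {
    my $spec = ref($t) eq 'HASH' ? $t->{target} : $t;
    my ($class, $format, $location) = split_target($spec);
    print "class=$class format=$format location=$location\n";
}
```

A list shaped like @targets is what run() expects as its arguments; its return value (0 on error, 1 otherwise) is worth checking.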

SEE ALSO
Mail::SpamAssassin(3) spamassassin(1) mass-check(1)

perl v5.36.0  2023-01-21  Mail::SpamAssassin::ArchiveIterator(3)