UUlib(3)              User Contributed Perl Documentation             UUlib(3)

NAME
       Convert::UUlib - Perl interface to the uulib library (a.k.a.
       uudeview/uuenview).

SYNOPSIS
        use Convert::UUlib ':all';

        # read all the files named on the commandline and decode them
        # into the CURRENT directory. See below for a longer example.
        LoadFile $_ for @ARGV;

        for my $uu (GetFileList) {
           if ($uu->state & FILE_OK) {
              $uu->decode;
              print $uu->filename, "\n";
           }
        }

DESCRIPTION
       For in-depth information about the C library used in this interface,
       read the file doc/library.pdf from the distribution. Then read the
       rest of this document, especially the non-trivial decoder program at
       the end.

       Action code constants
             ACT_IDLE       we don't do anything
             ACT_SCANNING   scanning an input file
             ACT_DECODING   decoding into a temp file
             ACT_COPYING    copying temp to target
             ACT_ENCODING   encoding a file

       Message severity levels
             MSG_MESSAGE    just a message, nothing important
             MSG_NOTE       something that should be noticed
             MSG_WARNING    important msg, processing continues
             MSG_ERROR      processing has been terminated
             MSG_FATAL      decoder cannot process further requests
             MSG_PANIC      recovery impossible, app must terminate

       Options
             OPT_VERSION    version number MAJOR.MINORplPATCH (ro)
             OPT_FAST       assumes only one part per file
             OPT_DUMBNESS   switch off the program's intelligence
             OPT_BRACKPOL   give numbers in [] higher precedence
             OPT_VERBOSE    generate informative messages
             OPT_DESPERATE  try to decode incomplete files
             OPT_IGNREPLY   ignore RE:plies (off by default)
             OPT_OVERWRITE  whether it's OK to overwrite existing files
             OPT_SAVEPATH   prefix to save-files on disk
             OPT_IGNMODE    ignore the original file mode
             OPT_DEBUG      print messages with FILE/LINE info
             OPT_ERRNO      get last error code for RET_IOERR (ro)
             OPT_PROGRESS   retrieve progress information
             OPT_USETEXT    handle text messages
             OPT_PREAMB     handle Mime preambles/epilogues
             OPT_TINYB64    detect short B64 outside of Mime
             OPT_ENCEXT     extension for single-part encoded files
             OPT_REMOVE     remove input files after decoding (dangerous)
             OPT_MOREMIME   strict MIME adherence
             OPT_DOTDOT     ".."-unescaping has not yet been done on input files
             OPT_RBUF       set default read I/O buffer size in bytes
             OPT_WBUF       set default write I/O buffer size in bytes
             OPT_AUTOCHECK  automatically check file list after every LoadFile

       Result/Error codes
             RET_OK         everything went fine
             RET_IOERR      I/O Error - examine errno
             RET_NOMEM      not enough memory
             RET_ILLVAL     illegal value for operation
             RET_NODATA     decoder didn't find any data
             RET_NOEND      encoded data wasn't ended properly
             RET_UNSUP      unsupported function (encoding)
             RET_EXISTS     file exists (decoding)
             RET_CONT       continue -- special from ScanPart
             RET_CANCEL     operation canceled

       File States
            This code is zero, i.e. "false":

             UUFILE_READ    Read in, but not further processed

            The following state codes are or'ed together:

             FILE_MISPART   Missing Part(s) detected
             FILE_NOBEGIN   No 'begin' found
             FILE_NOEND     No 'end' found
             FILE_NODATA    File does not contain valid uudata
             FILE_OK        All Parts found, ready to decode
             FILE_ERROR     Error while decoding
             FILE_DECODED   Successfully decoded
             FILE_TMPFILE   Temporary decoded file exists

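            Because the non-zero state codes are bit flags, a state value
            is tested with a bitwise AND rather than compared for equality.
            The following is a minimal sketch of that idea, not part of the
            module: the helper name "describe_state" and the label "READ"
            for a zero state are assumptions made for illustration.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Hypothetical helper (not part of Convert::UUlib): return the names of
# all state flags that are set in $state, joined with "|".
sub describe_state {
   my ($state) = @_;

   my @flags = (
      [MISPART => FILE_MISPART], [NOBEGIN => FILE_NOBEGIN],
      [NOEND   => FILE_NOEND],   [NODATA  => FILE_NODATA],
      [OK      => FILE_OK],      [ERROR   => FILE_ERROR],
      [DECODED => FILE_DECODED], [TMPFILE => FILE_TMPFILE],
   );

   my @set = map { $_->[0] } grep { $state & $_->[1] } @flags;

   # a state of zero means "read in, but not further processed"
   return @set ? join("|", @set) : "READ";
}

LoadFile $_ for @ARGV;

for my $item (GetFileList) {
   printf "%s: %s\n", $item->filename // "(no filename)",
      describe_state($item->state);
}
```

            Several flags can be set at once, which is why the synopsis
            checks "$uu->state & FILE_OK" instead of comparing the state
            with "FILE_OK" directly.
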
       Encoding types
             UU_ENCODED     UUencoded data
             B64_ENCODED    Mime-Base64 data
             XX_ENCODED     XXencoded data
             BH_ENCODED     Binhex encoded
             PT_ENCODED     Plain-Text encoded (MIME)
             QP_ENCODED     Quoted-Printable (MIME)
             YENC_ENCODED   yEnc encoded (non-MIME)

   Initializing and cleanup
       Initialize is called automatically when the module is loaded and
       allocates an amount of memory that is quite small for today's
       machines ;) CleanUp releases it again.

       On my machine, a fairly complete decode with DBI backend needs about
       10MB RSS to decode 20000 files.

       CleanUp
           Release memory and file items, and clean up files. Should be
           called after a decoding run if you want to start a new one.

   Setting and querying options
       $option = GetOption OPT_xxx
       SetOption OPT_xxx, opt-value

       See the "OPT_xxx" constants above to see which options exist.

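       As a minimal sketch of the two calls, the following reads the
       read-only version option and then changes and re-reads
       "OPT_VERBOSE". It assumes only the functions and constants listed
       above; the variable names are illustrative.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# read-only option: the uulib version string
my $version = GetOption(OPT_VERSION);
print "uulib version: $version\n";

# query an option, change it, and query it again
my $old = GetOption(OPT_VERBOSE);
SetOption(OPT_VERBOSE, 0);
my $new = GetOption(OPT_VERBOSE);

print "OPT_VERBOSE changed from $old to $new\n";
```

       Options are global to the library, so a setting stays in effect for
       all later "LoadFile" and decode calls until it is changed again.
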
   Setting various callbacks
       SetMsgCallback [callback-function]
       SetBusyCallback [callback-function]
       SetFileCallback [callback-function]
       SetFNameFilter [callback-function]

   Call the currently selected FNameFilter
       $file = FNameFilter $file

   Loading source files, optionally fuzzy merge and start decoding
       ($retval, $count) = LoadFile $fname, [$id, [$delflag, [$partno]]]
           Load the given file and scan it for encoded contents. Optionally
           tag it with the given id, and if $delflag is true, delete the
           file after it is no longer necessary. If you are certain of the
           part number, you can specify it as the last argument.

           A better (usually faster) way of doing this is using the
           "SetFNameFilter" functionality.

       $retval = Smerge $pass
           If you are desperate, try calling "Smerge" with increasing $pass
           values, beginning at 0, to merge parts that would usually not
           have been merged.

           This will most probably result in garbled files, so never do it
           by default, with one exception:

           If the "OPT_AUTOCHECK" option has been disabled (it is enabled
           by default) to speed up file loading, then you have to call
           "Smerge -1" after loading all files as an additional pre-pass
           (one that "LoadFile" would normally do for you).

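           The "OPT_AUTOCHECK" exception can be sketched as follows. This
           is an illustration under the assumption that the input files are
           named on the command line; it is not taken from the
           distribution.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# skip the automatic file list check after every LoadFile call
SetOption(OPT_AUTOCHECK, 0);

for my $path (@ARGV) {
   my ($retval, $count) = LoadFile($path);
   warn "$path: " . strerror($retval) . "\n" if $retval != RET_OK;
}

# run the check once, as a pre-pass, after all files have been loaded
Smerge(-1);

print scalar(() = GetFileList), " result file(s) found\n";
```

           Passes 0 and up are deliberately not called here, because they
           are the "desperate" merges described above.
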
       $item = GetFileListItem $item_number
           Return the $item structure for the $item_number'th found file,
           or "undef" if no file with that number exists.

           The first file has number 0, and the series has no holes, so you
           can iterate over all files by starting with zero and
           incrementing until you hit "undef".

           This function has to walk the linear list of files on each
           access, so if you want to iterate over all items, it is usually
           faster to use "GetFileList".

       @items = GetFileList
           Similar to "GetFileListItem", but returns all files in one go.

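           The two ways of walking the file list can be compared with a
           short sketch. It assumes the input files are named on the
           command line; the variable names are illustrative.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

LoadFile $_ for @ARGV;

# index-based access: start at 0 and stop at the first undef;
# each call walks the internal list again
my $by_index = 0;
$by_index++ while defined GetFileListItem($by_index);

# list-based access: all items in a single call
my @items = GetFileList;

printf "%d item(s) by index, %d item(s) by list\n",
   $by_index, scalar @items;
```

           Both counts should agree, since the numbering has no holes.
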
   Decoding files
       $retval = $item->rename ($newname)
           Change the on-disk filename where the decoded file will be
           saved.

       $retval = $item->decode_temp
           Decode the file into a temporary location; use "$item->infile"
           to retrieve the temporary filename.

       $retval = $item->remove_temp
           Remove the temporarily decoded file again.

       $retval = $item->decode ([$target_path])
           Decode the file to its destination, or to the given target path.

       $retval = $item->info (callback-function)

   Querying (and setting) item attributes
       $state    = $item->state
       $mode     = $item->mode ([newmode])
       $uudet    = $item->uudet
       $size     = $item->size
       $filename = $item->filename ([newfilename])
       $subfname = $item->subfname
       $mimeid   = $item->mimeid
       $mimetype = $item->mimetype
       $binfile  = $item->binfile

   Information about source parts
       $parts = $item->parts
           Return information about all parts (source files) used to decode
           the file as a list of hashrefs with the following structure:

            {
              partno   => <integer describing the part number, starting with 1>,
              # the following members only exist when they contain useful information
              sfname   => <local pathname of the file where this part is from>,
              filename => <the ondisk filename of the decoded file>,
              subfname => <used to cluster postings, possibly the posting filename>,
              subject  => <the subject of the posting/mail>,
              origin   => <the possible source (From) address>,
              mimetype => <the possible mimetype of the decoded file>,
              mimeid   => <the id part of the Content-Type>,
            }

           Usually you are mostly interested in the "sfname" and possibly
           the "partno" and "filename" members.

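           A minimal sketch of reading this structure, assuming the input
           files are named on the command line. Since every member except
           "partno" is optional, the sketch falls back to a placeholder
           string; the hash %sources is an illustrative name.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

LoadFile $_ for @ARGV;

# map each result file to the source files its parts came from
my %sources;

for my $item (GetFileList) {
   my $name = $item->filename // "(no filename)";

   # order the parts by part number before listing them
   for my $part (sort { $a->{partno} <=> $b->{partno} } $item->parts) {
      my $sfname = $part->{sfname} // "(unknown source)";
      push @{ $sources{$name} }, $sfname;
      printf "%s: part %d from %s\n", $name, $part->{partno}, $sfname;
   }
}
```
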
   Functions below are not documented and not very well tested - feedback
   welcome
       QuickDecode
       EncodeMulti
       EncodePartial
       EncodeToStream
       EncodeToFile
       E_PrepSingle
       E_PrepPartial

   EXTENSION FUNCTIONS
       Functions found in this module but not documented in the uulib
       documentation:

       $msg = straction ACT_xxx
           Return a human readable string representing the given action
           code.

       $msg = strerror RET_xxx
           Return a human readable string representing the given error
           code.

       $str = strencoding xxx_ENCODED
           Return the name of the encoding type as a string.

       $str = strmsglevel MSG_xxx
           Returns the message level as a string.

       SetFileNameCallback $cb
           Sets (or queries) the FileNameCallback, which is called whenever
           the decoding library can't find a filename and wants to extract
           a filename from the subject line of a posting. The callback is
           called with two arguments, the subject line and the current
           candidate for the filename. The latter argument can be "undef",
           which means that no filename could be found (and likely none
           exists, so it is safe to also return "undef" in this case). If
           the callback doesn't return anything (not even "undef"!), then
           nothing happens, so this is a no-op callback:

              sub cb {
                 return ();
              }

           If it returns "undef", then this indicates that no filename
           could be found. In all other cases, the return value is taken to
           be the filename.

           This is a slightly more useful callback:

             sub cb {
                return unless $_[1]; # skip "Re:"-plies et al.
                my ($subject, $filename) = @_;
                # if we find some *.rar, take it
                return $1 if $subject =~ /(\w+\.rar)/;
                # otherwise just pass what we have
                return ();
             }

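           The three possible outcomes of such a callback can be tried out
           without loading the library at all. The following sketch repeats
           the callback above under the hypothetical name "rar_name_cb" and
           feeds it invented subject lines; it only demonstrates the return
           convention and does not call "SetFileNameCallback".

```perl
use strict;
use warnings;

# same logic as the callback above, under a hypothetical name
sub rar_name_cb {
   my ($subject, $filename) = @_;

   return unless $filename;                 # empty list: nothing happens
   return $1 if $subject =~ /(\w+\.rar)/;   # a filename: use this one
   return ();                               # empty list: nothing happens
}

# invented subject lines with the candidate uulib might pass along
my @cases = (
   ["backup set - part01.rar (1/5)", "part01"],
   ["holiday pictures (2/3)",        "pics.jpg"],
   ["Re: holiday pictures",          undef],
);

for my $case (@cases) {
   my @result = rar_name_cb(@$case);

   printf "%-32s => %s\n", $case->[0],
      @result ? "use '$result[0]'" : "keep what we have";
}
```

           The distinction between an empty list and "undef" matters:
           "return ()" leaves the current candidate alone, while "return
           undef" tells the library that no filename could be found.
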
       The general workflow for decoding is like this:

       1. Configure options with "SetOption" or "SetXXXCallback".
       2. Load all source files with "LoadFile".
       3. Optionally "Smerge".
       4. Iterate over all "GetFileList" items (i.e. result files).
       5. "CleanUp" to delete files and free items.

       What follows is the file "example-decoder" from the distribution,
       which illustrates the above workflow in a non-trivial example.

        #!/usr/bin/perl

        # decode all the files in the directory uusrc/ and copy
        # the resulting files to uudst/

        use Convert::UUlib ':all';

        sub namefilter {
           my ($path) = @_;

           # strip any leading directory components
           $path=~s/^.*[\/\\]//;

           $path
        }

        sub busycb {
           my ($action, $curfile, $partno, $numparts, $percent, $fsize) = @_;
           $_[0]=straction($action);
           print "busy_callback(", (join ",",@_), ")\n";
           0
        }

        SetOption OPT_RBUF, 128*1024;
        SetOption OPT_WBUF, 1024*1024;
        SetOption OPT_IGNMODE, 1;
        SetOption OPT_VERBOSE, 1;

        # show the three ways you can set callback functions. I normally
        # prefer the one with the sub inplace.
        SetFNameFilter \&namefilter;

        SetBusyCallback "busycb", 333;

        SetMsgCallback sub {
           my ($msg, $level) = @_;
           print uc strmsglevel $_[1], ": $msg\n";
        };

        # the following non-trivial FileNameCallback takes care
        # of some subject lines not detected properly by uulib:
        SetFileNameCallback sub {
           return unless $_[1]; # skip "Re:"-plies et al.
           local $_ = $_[0];

           # the following rules are rather effective on some newsgroups,
           # like alt.binaries.games.anime, where non-mime, uuencoded data
           # is very common

           # if we find some *.rar, take it as the filename
           return $1 if /(\S{3,}\.(?:[rstuvwxyz]\d\d|rar))\s/i;

           # one common subject format
           return $1 if /- "(.{2,}?\..+?)" (?:yenc )?\(\d+\/\d+\)/i;

           # - filename.par (04/55)
           return $1 if /- "?(\S{3,}\.\S+?)"? (?:yenc )?\(\d+\/\d+\)/i;

           # - (xxx) No. 1 sayuri81.jpg 756565 bytes
           # - (20 files) No.17 Roseanne.jpg [2/2]
           return $1 if /No\.[ 0-9]+ (\S+\....) (?:\d+ bytes )?\[/;

           # try to detect some common forms of filenames
           return $1 if /([a-z0-9_\-+.]{3,}\.[a-z]{3,4}(?:.\d+))/i;

           # otherwise just pass what we have
           ()
        };

        # now read all files in the directory uusrc/*
        for (<uusrc/*>) {
           my ($retval, $count) = LoadFile ($_, $_, 1);
           print "file($_), status(", strerror $retval, ") parts($count)\n";
        }

        SetOption OPT_SAVEPATH, "uudst/";

        # now wade through all files and their source parts
        for my $uu (GetFileList) {
           print "file ", $uu->filename, "\n";
           print " state ", $uu->state, "\n";
           print " mode ", $uu->mode, "\n";
           print " uudet ", strencoding $uu->uudet, "\n";
           print " size ", $uu->size, "\n";
           print " subfname ", $uu->subfname, "\n";
           print " mimeid ", $uu->mimeid, "\n";
           print " mimetype ", $uu->mimetype, "\n";

           # print additional info about all parts
           print " parts";
           for ($uu->parts) {
              for my $k (sort keys %$_) {
                 print " $k=$_->{$k}";
              }
              print "\n";
           }

           $uu->remove_temp;

           if (my $err = $uu->decode) {
              print " ERROR ", strerror $err, "\n";
           } else {
              print " successfully saved as uudst/", $uu->filename, "\n";
           }
        }

        print "cleanup...\n";

        CleanUp;

       This module supports the perlmulticore standard (see
       <http://perlmulticore.schmorp.de/> for more info) for the following
       functions - generally these are functions accessing the disk and/or
       using considerable CPU time:

          LoadFile
          $item->decode
          $item->decode_temp
          $item->remove_temp
          $item->info

       The perl interpreter will be reacquired/released on every callback
       invocation, so for performance reasons, callbacks should be avoided
       if that is costly.

       Future versions might enable multicore support for more functions.

       The original uulib library this module uses was written at a time
       when main memory was measured in megabytes and buffer overflows as a
       security thing didn't exist. While a lot of security fixes have been
       applied over the years (including some defense in depth mechanism
       that can shield against a lot of as-of-yet undetected bugs), using
       this library for security purposes requires care.

       Likewise, file sizes when the uulib library was written were tiny
       compared to today, so do not expect this library to handle files
       larger than 2GB.

       Lastly, this module uses a very "C-like" interface, which means it
       doesn't protect you from invalid pointers as you might expect from
       "more perlish" modules - for example, accessing a file item object
       after calling "CleanUp" will likely result in crashes, memory
       corruption, or worse.

AUTHOR
       Marc Lehmann <schmorp@schmorp.de>, the original uulib library was
       written by Frank Pilhofer <fp@informatik.uni-frankfurt.de>, and later
       heavily bugfixed by Marc Lehmann.

SEE ALSO
       perl(1), uudeview homepage at
       <http://www.fpx.de/fp/Software/UUDeview/>.

perl v5.30.2                      2020-04-26                          UUlib(3)