UUlib(3)              User Contributed Perl Documentation             UUlib(3)

NAME
    Convert::UUlib - decode uu/xx/b64/mime/yenc/etc-encoded data from a
    massive number of files

SYNOPSIS
    use Convert::UUlib ':all';

    # read all the files named on the commandline and decode them
    # into the CURRENT directory. See below for a longer example.
    LoadFile $_ for @ARGV;

    for my $uu (GetFileList) {
        if ($uu->state & FILE_OK) {
            $uu->decode;
            print $uu->filename, "\n";
        }
    }

DESCRIPTION
    This module started as an interface to the uulib/uudeview library by
    Frank Pilhofer that can be used to decode all kinds of usenet (and
    other) binary messages.

    After upstream abandoned the project, the library was continuously
    bugfixed and improved in this module, with major focuses on security
    fixes, correctness and speed (that does not mean that this library is
    considered safe with untrusted data, but it surely is safer than the
    original uudeview).

    For in-depth information about the C library used in this interface,
    read the file doc/library.pdf from the distribution, then the rest of
    this document, especially the non-trivial decoder program at the end.

  Action code constants
    ACT_IDLE      we don't do anything
    ACT_SCANNING  scanning an input file
    ACT_DECODING  decoding into a temp file
    ACT_COPYING   copying temp to target
    ACT_ENCODING  encoding a file

  Message severity levels
    MSG_MESSAGE   just a message, nothing important
    MSG_NOTE      something that should be noticed
    MSG_WARNING   important msg, processing continues
    MSG_ERROR     processing has been terminated
    MSG_FATAL     decoder cannot process further requests
    MSG_PANIC     recovery impossible, app must terminate

  Options
    OPT_VERSION    version number MAJOR.MINORplPATCH (ro)
    OPT_FAST       assumes only one part per file
    OPT_DUMBNESS   switch off the program's intelligence
    OPT_BRACKPOL   give numbers in [] higher precedence
    OPT_VERBOSE    generate informative messages
    OPT_DESPERATE  try to decode incomplete files
    OPT_IGNREPLY   ignore RE:plies (off by default)
    OPT_OVERWRITE  whether it's OK to overwrite ex. files
    OPT_SAVEPATH   prefix to save-files on disk
    OPT_IGNMODE    ignore the original file mode
    OPT_DEBUG      print messages with FILE/LINE info
    OPT_ERRNO      get last error code for RET_IOERR (ro)
    OPT_PROGRESS   retrieve progress information
    OPT_USETEXT    handle text messages
    OPT_PREAMB     handle Mime preambles/epilogues
    OPT_TINYB64    detect short B64 outside of Mime
    OPT_ENCEXT     extension for single-part encoded files
    OPT_REMOVE     remove input files after decoding (dangerous)
    OPT_MOREMIME   strict MIME adherence
    OPT_DOTDOT     ".."-unescaping has not yet been done on input files
    OPT_RBUF       set default read I/O buffer size in bytes
    OPT_WBUF       set default write I/O buffer size in bytes
    OPT_AUTOCHECK  automatically check file list after every loadfile

  Result/Error codes
    RET_OK      everything went fine
    RET_IOERR   I/O Error - examine errno
    RET_NOMEM   not enough memory
    RET_ILLVAL  illegal value for operation
    RET_NODATA  decoder didn't find any data
    RET_NOEND   encoded data wasn't ended properly
    RET_UNSUP   unsupported function (encoding)
    RET_EXISTS  file exists (decoding)
    RET_CONT    continue -- special from ScanPart
    RET_CANCEL  operation canceled

  File States
    This code is zero, i.e. "false":

        UUFILE_READ   Read in, but not further processed

    The following state codes are or'ed together:

        FILE_MISPART  Missing Part(s) detected
        FILE_NOBEGIN  No 'begin' found
        FILE_NOEND    No 'end' found
        FILE_NODATA   File does not contain valid uudata
        FILE_OK       All Parts found, ready to decode
        FILE_ERROR    Error while decoding
        FILE_DECODED  Successfully decoded
        FILE_TMPFILE  Temporary decoded file exists

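    Because the non-zero state codes are or'ed together, a single state
    value can carry several of them at once. The following is a minimal
    sketch of turning such a value into readable names. The helper
    "state_names" and its short labels are hypothetical and not part of
    the module; only the FILE_xxx constants are taken from the list above.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# label => flag pairs, built from the FILE_xxx constants listed above
my @flags = (
    [ MISPART => FILE_MISPART ],
    [ NOBEGIN => FILE_NOBEGIN ],
    [ NOEND   => FILE_NOEND   ],
    [ NODATA  => FILE_NODATA  ],
    [ OK      => FILE_OK      ],
    [ ERROR   => FILE_ERROR   ],
    [ DECODED => FILE_DECODED ],
    [ TMPFILE => FILE_TMPFILE ],
);

# hypothetical helper: return the labels of all flags set in $state
sub state_names {
    my ($state) = @_;

    my @names = map { $_->[0] } grep { $state & $_->[1] } @flags;

    # a state of zero means "read in, but not further processed"
    return @names ? @names : ('READ');
}

# combine two flags the same way the library would
my $state = FILE_OK;
$state |= FILE_TMPFILE;

print join ("|", state_names ($state)), "\n";
```

    In real code the value would come from "$item->state" rather than
    being assembled by hand.
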
  Encoding types
    UU_ENCODED    UUencoded data
    B64_ENCODED   Mime-Base64 data
    XX_ENCODED    XXencoded data
    BH_ENCODED    Binhex encoded
    PT_ENCODED    Plain-Text encoded (MIME)
    QP_ENCODED    Quoted-Printable (MIME)
    YENC_ENCODED  yEnc encoded (non-MIME)

  Initializing and cleanup
    Initialize is automatically called when the module is loaded and
    allocates quite a small amount of memory for today's machines ;) CleanUp
    releases that again.

    On my machine, a fairly complete decode with DBI backend needs about
    10MB RSS to decode 20000 files.

    CleanUp
        Release memory, file items and clean up files. Should be called
        after a decoding run, if you want to start a new one.

  Setting and querying options
    $option = GetOption OPT_xxx
    SetOption OPT_xxx, opt-value

    See the "OPT_xxx" constants above to see which options exist.

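    As a short illustration of the two calls, the sketch below saves an
    option, changes it and restores it afterwards. The choice of options
    is purely illustrative, and it is an assumption that "GetOption" hands
    back a value that can be passed to "SetOption" again unchanged.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# read-only option: the version of the underlying library
print "uulib version: ", GetOption (OPT_VERSION), "\n";

# remember the current setting so that it can be restored later
my $verbose = GetOption (OPT_VERBOSE);

SetOption (OPT_VERBOSE, 0);          # silence informative messages
SetOption (OPT_SAVEPATH, "uudst/");  # prefix for decoded files

# ... load and decode files here ...

SetOption (OPT_VERBOSE, $verbose);   # restore the earlier setting
```
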
  Setting various callbacks
    SetMsgCallback [callback-function]
    SetBusyCallback [callback-function]
    SetFileCallback [callback-function]
    SetFNameFilter [callback-function]

  Call the currently selected FNameFilter
    $file = FNameFilter $file

  Loading sourcefiles, optionally fuzzy merge and start decoding
    ($retval, $count) = LoadFile $fname, [$id, [$delflag, [$partno]]]
        Load the given file and scan it for encoded contents. Optionally
        tag it with the given id, and if $delflag is true, delete the file
        after it is no longer necessary. If you are certain of the part
        number, you can specify it as the last argument.

        A better (usually faster) way of doing this is using the
        "SetFNameFilter" functionality.

    $retval = Smerge $pass
        If you are desperate, try to call "Smerge" with increasing $pass
        values, beginning at 0, to try to merge parts that usually would
        not have been merged.

        Most probably this will result in garbled files, so never do this
        by default, except:

        If the "OPT_AUTOCHECK" option has been disabled (by default it is
        enabled) to speed up file loading, then you have to call "Smerge
        -1" after loading all files as an additional pre-pass (which is
        normally done by "LoadFile").

    $item = GetFileListItem $item_number
        Return the $item structure for the $item_number'th found file, or
        "undef" if no file with that number exists.

        The first file has number 0, and the series has no holes, so you
        can iterate over all files by starting with zero and incrementing
        until you hit "undef".

        This function has to walk the linear list of files on each access,
        so if you want to iterate over all items, it is usually faster to
        use "GetFileList".

    @items = GetFileList
        Similar to "GetFileListItem", but returns all files in one go,
        which is very much faster for a large number of items, and has no
        drawbacks when used for a small number of items.

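    Putting the loading functions together, the following sketch loads
    the files named on the command line, reports files that did not load
    cleanly and then walks the resulting file list in both ways described
    above. It assumes that "LoadFile" signals success with RET_OK, as the
    result codes above suggest; the output format is made up for
    illustration.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

for my $fname (@ARGV) {
    my ($retval, $count) = LoadFile ($fname);

    # assumption: RET_OK means the file was scanned without problems
    print "$fname: ", strerror ($retval), "\n"
        unless $retval == RET_OK;
}

# index-based access: simple, but walks the list on every call
my $i = 0;
while (defined (my $item = GetFileListItem ($i))) {
    printf "%d: %s\n", $i, $item->filename // "(no filename)";
    $i++;
}

# fetching everything in one go is usually the better choice
my @items = GetFileList;
printf "%d file(s) found\n", scalar @items;
```
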
  Decoding files
    $retval = $item->rename ($newname)
        Change the ondisk filename where the decoded file will be saved.

    $retval = $item->decode_temp
        Decode the file into a temporary location, use "$item->binfile" to
        retrieve the temporary filename.

    $retval = $item->remove_temp
        Remove the temporarily decoded file again.

    $retval = $item->decode ([$target_path])
        Decode the file to its destination, or the given target path.

    $retval = $item->info (callback-function)

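    The temporary-file methods can be combined to inspect a file before
    committing it to its final location. The sketch below rests on several
    assumptions: it treats a false return value as success (as the larger
    example at the end of this document does for "decode"), it reads the
    temporary filename from the "binfile" attribute, and it uses an
    arbitrary size limit of one megabyte.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

my $limit = 1024 * 1024;   # arbitrary limit, for illustration only

LoadFile ($_) for @ARGV;

for my $item (GetFileList) {
    next unless $item->state & FILE_OK;

    # decode into a temporary file first; false is taken to mean success
    if (my $err = $item->decode_temp) {
        warn "temporary decode failed: ", strerror ($err), "\n";
        next;
    }

    # assumption: binfile holds the name of the temporary file
    my $tmp  = $item->binfile;
    my $size = defined $tmp ? -s $tmp : undef;

    if (defined $size && $size <= $limit) {
        $item->decode;        # save the data to its destination
    } else {
        $item->remove_temp;   # discard the temporary file again
    }
}
```
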
  Querying (and setting) item attributes
    $state    = $item->state
    $mode     = $item->mode ([newmode])
    $uudet    = $item->uudet
    $size     = $item->size
    $filename = $item->filename ([newfilename])
    $subfname = $item->subfname
    $mimeid   = $item->mimeid
    $mimetype = $item->mimetype
    $binfile  = $item->binfile

  Information about source parts
    $parts = $item->parts
        Return information about all parts (source files) used to decode
        the file as a list of hashrefs with the following structure:

            {
              partno   => <integer describing the part number, starting with 1>,
              # the following members only exist when they contain useful information
              sfname   => <local pathname of the file where this part is from>,
              filename => <the ondisk filename of the decoded file>,
              subfname => <used to cluster postings, possibly the posting filename>,
              subject  => <the subject of the posting/mail>,
              origin   => <the possible source (From) address>,
              mimetype => <the possible mimetype of the decoded file>,
              mimeid   => <the id part of the Content-Type>,
            }

        Usually you are mostly interested in the "sfname" and possibly the
        "partno" and "filename" members.

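    As an illustration of the structure above, this sketch lists, for
    every file found, which source file contributed which part. It relies
    only on the members documented above and guards against the optional
    ones being absent; the output layout is arbitrary.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

LoadFile ($_) for @ARGV;

for my $item (GetFileList) {
    printf "%s\n", $item->filename // "(no filename)";

    # every part is a hashref; only partno is documented as always present
    for my $part (sort { $a->{partno} <=> $b->{partno} } $item->parts) {
        printf "  part %d from %s\n",
            $part->{partno},
            $part->{sfname} // "(unknown source)";
    }
}
```
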
  Functions below are not documented and not very well tested - feedback welcome
    QuickDecode
    EncodeMulti
    EncodePartial
    EncodeToStream
    EncodeToFile
    E_PrepSingle
    E_PrepPartial

EXTENSION FUNCTIONS
    Functions found in this module but not documented in the uulib
    documentation:

    $msg = straction ACT_xxx
        Return a human readable string representing the given action code.

    $msg = strerror RET_xxx
        Return a human readable string representing the given error code.

    $str = strencoding xxx_ENCODED
        Return the name of the encoding type as a string.

    $str = strmsglevel MSG_xxx
        Returns the message level as a string.

    SetFileNameCallback $cb
        Sets (or queries) the FileNameCallback, which is called whenever
        the decoding library can't find a filename and wants to extract a
        filename from the subject line of a posting. The callback will be
        called with two arguments, the subject line and the current
        candidate for the filename. The latter argument can be "undef",
        which means that no filename could be found (and likely none
        exists, so it is safe to also return "undef" in this case). If it
        doesn't return anything (not even "undef"!), then nothing happens,
        so this is a no-op callback:

            sub cb {
                return ();
            }

        If it returns "undef", then this indicates that no filename could
        be found. In all other cases, the return value is taken to be the
        filename.

        This is a slightly more useful callback:

            sub cb {
                return unless $_[1]; # skip "Re:"-plies et al.
                my ($subject, $filename) = @_;
                # if we find some *.rar, take it
                return $1 if $subject =~ /(\w+\.rar)/;
                # otherwise just pass what we have
                return ();
            }

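    The three possible outcomes of such a callback (empty list, "undef",
    or a filename) can be told apart by calling it in list context. The
    following self-contained sketch exercises a callback without loading
    the module; the sub name "filename_cb" and the sample subject lines
    are made up for illustration.

```perl
use strict;
use warnings;

# hypothetical callback following the contract described above
sub filename_cb {
    my ($subject, $candidate) = @_;

    # no candidate at all (e.g. replies): change nothing
    return unless defined $candidate && length $candidate;

    # prefer a *.rar name found in the subject line
    return $1 if $subject =~ /(\w+\.rar)/;

    # otherwise keep the candidate the library came up with
    return ();
}

for my $case (
    [ "backup set - archive01.rar (1/5)", "archive01.r" ],
    [ "holiday pictures (2/3)",           "beach.jpg"   ],
    [ "Re: holiday pictures",             undef         ],
) {
    my ($subject, $candidate) = @$case;

    # list context distinguishes "returned nothing" from "returned undef"
    my @ret = filename_cb ($subject, $candidate);

    my $outcome = !@ret            ? "candidate left unchanged"
                : !defined $ret[0] ? "no filename"
                :                    "filename set to $ret[0]";

    print "$subject => $outcome\n";
}
```

    With the module loaded, a sub like this would be registered with
    "SetFileNameCallback \&filename_cb".
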
    The general workflow for decoding is like this:

    1. Configure options with "SetOption" or "SetXXXCallback".
    2. Load all source files with "LoadFile".
    3. Optionally "Smerge".
    4. Iterate over all "GetFileList" items (i.e. result files).
    5. "CleanUp" to delete files and free items.

    What follows is the file "example-decoder" from the distribution that
    illustrates the above workflow in a non-trivial example.

    #!/usr/bin/perl

    # decode all the files in the directory uusrc/ and copy
    # the resulting files to uudst/

    use Convert::UUlib ':all';

    sub namefilter {
        my ($path) = @_;

        # strip any leading directory components
        $path=~s/^.*[\/\\]//;

        $path
    }

    sub busycb {
        my ($action, $curfile, $partno, $numparts, $percent, $fsize) = @_;
        # replace the numeric action code by its readable name
        $_[0]=straction($action);
        print "busy_callback(", (join ",",@_), ")\n";
        0
    }

    SetOption OPT_RBUF, 128*1024;
    SetOption OPT_WBUF, 1024*1024;
    SetOption OPT_IGNMODE, 1;
    SetOption OPT_VERBOSE, 1;
    SetOption OPT_AUTOCHECK, 0;

    # show the three ways you can set callback functions. I normally
    # prefer the one with the sub inplace.
    SetFNameFilter \&namefilter;

    SetBusyCallback "busycb", 333;

    SetMsgCallback sub {
        my ($msg, $level) = @_;
        print uc strmsglevel $_[1], ": $msg\n";
    };

    # the following non-trivial FileNameCallback takes care
    # of some subject lines not detected properly by uulib:
    SetFileNameCallback sub {
        return unless $_[1]; # skip "Re:"-plies et al.
        local $_ = $_[0];

        # the following rules are rather effective on some newsgroups,
        # like alt.binaries.games.anime, where non-mime, uuencoded data
        # is very common

        # if we find some *.rar, take it as the filename
        return $1 if /(\S{3,}\.(?:[rstuvwxyz]\d\d|rar))\s/i;

        # one common subject format
        return $1 if /- "(.{2,}?\..+?)" (?:yenc )?\(\d+\/\d+\)/i;

        # - filename.par (04/55)
        return $1 if /- "?(\S{3,}\.\S+?)"? (?:yenc )?\(\d+\/\d+\)/i;

        # - (xxx) No. 1 sayuri81.jpg 756565 bytes
        # - (20 files) No.17 Roseanne.jpg [2/2]
        return $1 if /No\.[ 0-9]+ (\S+\....) (?:\d+ bytes )?\[/;

        # try to detect some common forms of filenames
        return $1 if /([a-z0-9_\-+.]{3,}\.[a-z]{3,4}(?:.\d+))/i;

        # otherwise just pass what we have
        ()
    };

    # now read all files in the directory uusrc/*
    for (<uusrc/*>) {
        my ($retval, $count) = LoadFile ($_, $_, 1);
        print "file($_), status(", strerror $retval, ") parts($count)\n";
    }

    Smerge -1;

    SetOption OPT_SAVEPATH, "uudst/";

    # now wade through all files and their source parts
    for my $uu (GetFileList) {
        print "file ", $uu->filename, "\n";
        print " state ", $uu->state, "\n";
        print " mode ", $uu->mode, "\n";
        print " uudet ", strencoding $uu->uudet, "\n";
        print " size ", $uu->size, "\n";
        print " subfname ", $uu->subfname, "\n";
        print " mimeid ", $uu->mimeid, "\n";
        print " mimetype ", $uu->mimetype, "\n";

        # print additional info about all parts
        print " parts";
        for ($uu->parts) {
            for my $k (sort keys %$_) {
                print " $k=$_->{$k}";
            }
            print "\n";
        }

        $uu->remove_temp;

        if (my $err = $uu->decode) {
            print " ERROR ", strerror $err, "\n";
        } else {
            print " successfully saved as uudst/", $uu->filename, "\n";
        }
    }

    print "cleanup...\n";

    CleanUp;

    This module supports the perlmulticore standard (see
    <http://perlmulticore.schmorp.de/> for more info) for the following
    functions - generally these are functions accessing the disk and/or
    using considerable CPU time:

        LoadFile
        $item->decode
        $item->decode_temp
        $item->remove_temp
        $item->info

    The perl interpreter will be reacquired/released on every callback
    invocation, so callbacks should be avoided where that overhead is too
    costly.

    Future versions might enable multicore support for more functions.

    The original uulib library this module uses was written at a time when
    main memory was measured in megabytes and buffer overflows as a
    security thing didn't exist. While a lot of security fixes have been
    applied over the years (including some defense in depth mechanisms
    that can shield against a lot of as-of-yet undetected bugs), using
    this library for security purposes requires care.

    Likewise, file sizes when the uulib library was written were tiny
    compared to today, so do not expect this library to handle files larger
    than 2GB.

    Lastly, this module uses a very "C-like" interface, which means it
    doesn't protect you from invalid pointers as you might expect from
    "more perlish" modules - for example, accessing a file item object
    after calling "CleanUp" will likely result in crashes, memory
    corruption, or worse.

AUTHOR
    Marc Lehmann <schmorp@schmorp.de>. The original uulib library was
    written by Frank Pilhofer <fp@informatik.uni-frankfurt.de> and later
    heavily bugfixed by Marc Lehmann.

SEE ALSO
    perl(1), uudeview homepage at
    <http://www.fpx.de/fp/Software/UUDeview/>.

perl v5.36.0                      2023-01-20                          UUlib(3)