UUlib(3)              User Contributed Perl Documentation             UUlib(3)

NAME

       Convert::UUlib - decode uu/xx/b64/mime/yenc/etc-encoded data from a
       massive number of files

SYNOPSIS

        use Convert::UUlib ':all';

        # read all the files named on the commandline and decode them
        # into the CURRENT directory. See below for a longer example.
        LoadFile $_ for @ARGV;

        for my $uu (GetFileList) {
           if ($uu->state & FILE_OK) {
              $uu->decode;
              print $uu->filename, "\n";
           }
        }

DESCRIPTION

       This module started as an interface to the uulib/uudeview library by
       Frank Pilhofer that can be used to decode all kinds of usenet (and
       other) binary messages.

       After upstream abandoned the project, the library was continuously
       bugfixed and improved in this module, with major focuses on security
       fixes, correctness and speed (that does not mean that this library is
       considered safe with untrusted data, but it surely is safer than the
       original uudeview).

       For in-depth information about the C library used in this interface,
       read the file doc/library.pdf from the distribution, the rest of this
       document, and especially the non-trivial decoder program at the end.

EXPORTED CONSTANTS

   Action code constants
         ACT_IDLE      we don't do anything
         ACT_SCANNING  scanning an input file
         ACT_DECODING  decoding into a temp file
         ACT_COPYING   copying temp to target
         ACT_ENCODING  encoding a file

   Message severity levels
         MSG_MESSAGE   just a message, nothing important
         MSG_NOTE      something that should be noticed
         MSG_WARNING   important msg, processing continues
         MSG_ERROR     processing has been terminated
         MSG_FATAL     decoder cannot process further requests
         MSG_PANIC     recovery impossible, app must terminate

   Options
         OPT_VERSION   version number MAJOR.MINORplPATCH (ro)
         OPT_FAST      assumes only one part per file
         OPT_DUMBNESS  switch off the program's intelligence
         OPT_BRACKPOL  give numbers in [] higher precedence
         OPT_VERBOSE   generate informative messages
         OPT_DESPERATE try to decode incomplete files
         OPT_IGNREPLY  ignore RE:plies (off by default)
         OPT_OVERWRITE whether it's OK to overwrite existing files
         OPT_SAVEPATH  prefix to save-files on disk
         OPT_IGNMODE   ignore the original file mode
         OPT_DEBUG     print messages with FILE/LINE info
         OPT_ERRNO     get last error code for RET_IOERR (ro)
         OPT_PROGRESS  retrieve progress information
         OPT_USETEXT   handle text messages
         OPT_PREAMB    handle Mime preambles/epilogues
         OPT_TINYB64   detect short B64 outside of Mime
         OPT_ENCEXT    extension for single-part encoded files
         OPT_REMOVE    remove input files after decoding (dangerous)
         OPT_MOREMIME  strict MIME adherence
         OPT_DOTDOT    ".."-unescaping has not yet been done on input files
         OPT_RBUF      set default read I/O buffer size in bytes
         OPT_WBUF      set default write I/O buffer size in bytes
         OPT_AUTOCHECK automatically check file list after every loadfile

   Result/Error codes
         RET_OK        everything went fine
         RET_IOERR     I/O Error - examine errno
         RET_NOMEM     not enough memory
         RET_ILLVAL    illegal value for operation
         RET_NODATA    decoder didn't find any data
         RET_NOEND     encoded data wasn't ended properly
         RET_UNSUP     unsupported function (encoding)
         RET_EXISTS    file exists (decoding)
         RET_CONT      continue -- special from ScanPart
         RET_CANCEL    operation canceled

   File States
        This code is zero, i.e. "false":

         UUFILE_READ   Read in, but not further processed

        The following state codes are or'ed together:

         FILE_MISPART  Missing Part(s) detected
         FILE_NOBEGIN  No 'begin' found
         FILE_NOEND    No 'end' found
         FILE_NODATA   File does not contain valid uudata
         FILE_OK       All Parts found, ready to decode
         FILE_ERROR    Error while decoding
         FILE_DECODED  Successfully decoded
         FILE_TMPFILE  Temporary decoded file exists
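
        Because the state codes are or'ed together, a state value is
        tested one bit at a time with "&". The following is a minimal
        sketch of turning a state value into a list of flag names. The
        helper name state_flag_names and the bit values in %flags are
        assumptions made for illustration; they are not taken from the
        module.

```perl
use strict;
use warnings;

# Return the names of all flags whose bits are set in $state.
# %$flags maps a flag name to its (numeric) bit value.
sub state_flag_names {
   my ($state, $flags) = @_;
   return grep { $state & $flags->{$_} } sort keys %$flags;
}

# Placeholder bit values, for illustration only (assumption). In real code,
# build the table from the exported constants after importing them, e.g.
# (FILE_OK => FILE_OK, FILE_NOEND => FILE_NOEND, ...).
my %flags = (FILE_MISPART => 1, FILE_NOEND => 4, FILE_OK => 16);

my @set = state_flag_names(1 | 4, \%flags);
print "@set\n";
```

        With a table built from the real constants, the same helper
        could be called as state_flag_names($item->state, \%flags) to
        report why a file is not yet ready to decode.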

   Encoding types
         UU_ENCODED    UUencoded data
         B64_ENCODED   Mime-Base64 data
         XX_ENCODED    XXencoded data
         BH_ENCODED    Binhex encoded
         PT_ENCODED    Plain-Text encoded (MIME)
         QP_ENCODED    Quoted-Printable (MIME)
         YENC_ENCODED  yEnc encoded (non-MIME)

EXPORTED FUNCTIONS

   Initializing and cleanup
       Initialize is automatically called when the module is loaded and
       allocates quite a small amount of memory for today's machines ;) CleanUp
       releases that again.

       On my machine, a fairly complete decode with DBI backend needs about
       10MB RSS to decode 20000 files.

       CleanUp
           Release memory, file items and clean up files. Should be called
           after a decoding run, if you want to start a new one.

   Setting and querying options
       $option = GetOption OPT_xxx
       SetOption OPT_xxx, opt-value

       See the "OPT_xxx" constants above to see which options exist.

   Setting various callbacks
       SetMsgCallback [callback-function]
       SetBusyCallback [callback-function]
       SetFileCallback [callback-function]
       SetFNameFilter [callback-function]

   Call the currently selected FNameFilter
       $file = FNameFilter $file

   Loading sourcefiles, optionally fuzzy merge and start decoding
       ($retval, $count) = LoadFile $fname, [$id, [$delflag, [$partno]]]
           Load the given file and scan it for encoded contents. Optionally
           tag it with the given id, and if $delflag is true, delete the file
           after it is no longer necessary. If you are certain of the part
           number, you can specify it as the last argument.

           A better (usually faster) way of doing this is using the
           "SetFNameFilter" functionality.

       $retval = Smerge $pass
           If you are desperate, try to call "Smerge" with increasing $pass
           values, beginning at 0, to try to merge parts that usually would
           not have been merged.

           Most probably this will result in garbled files, so never do this
           by default, except:

           If the "OPT_AUTOCHECK" option has been disabled (by default it is
           enabled) to speed up file loading, then you have to call "Smerge
           -1" after loading all files as an additional pre-pass (which is
           normally done by "LoadFile").

       $item = GetFileListItem $item_number
           Return the $item structure for the $item_number'th found file, or
           "undef" if no file with that number exists.

           The first file has number 0, and the series has no holes, so you
           can iterate over all files by starting with zero and incrementing
           until you hit "undef".

           This function has to walk the linear list of files on each access,
           so if you want to iterate over all items, it is usually faster to
           use "GetFileList".

       @items = GetFileList
           Similar to "GetFileListItem", but returns all files in one go,
           which is very much faster for large numbers of items, and has no
           drawbacks when used for a small number of items.

   Decoding files
       $retval = $item->rename ($newname)
           Change the on-disk filename where the decoded file will be saved.

       $retval = $item->decode_temp
           Decode the file into a temporary location, use "$item->infile" to
           retrieve the temporary filename.

       $retval = $item->remove_temp
           Remove the temporarily decoded file again.

       $retval = $item->decode ([$target_path])
           Decode the file to its destination, or the given target path.

       $retval = $item->info (callback-function)

   Querying (and setting) item attributes
       $state    = $item->state
       $mode     = $item->mode ([newmode])
       $uudet    = $item->uudet
       $size     = $item->size
       $filename = $item->filename ([newfilename])
       $subfname = $item->subfname
       $mimeid   = $item->mimeid
       $mimetype = $item->mimetype
       $binfile  = $item->binfile

   Information about source parts
       $parts = $item->parts
           Return information about all parts (source files) used to decode
           the file as a list of hashrefs with the following structure:

            {
              partno   => <integer describing the part number, starting with 1>,
              # the following members only exist when they contain useful information
              sfname   => <local pathname of the file where this part is from>,
              filename => <the on-disk filename of the decoded file>,
              subfname => <used to cluster postings, possibly the posting filename>,
              subject  => <the subject of the posting/mail>,
              origin   => <the possible source (From) address>,
              mimetype => <the possible mimetype of the decoded file>,
              mimeid   => <the id part of the Content-Type>,
            }

           Usually you are interested mostly in the "sfname" and possibly the
           "partno" and "filename" members.
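
           Since only "partno" is always present, code that consumes these
           hashrefs has to tolerate missing members. The following is a
           minimal sketch that lists the distinct source files in part
           order. The helper name source_files and the sample data are
           made up for illustration; real input would come from
           "$item->parts".

```perl
use strict;
use warnings;

# Given part hashrefs (same keys as documented above), return the distinct
# source files in part-number order, skipping parts without an sfname.
sub source_files {
   my @parts = sort { $a->{partno} <=> $b->{partno} } @_;
   my %seen;
   return grep { defined($_) && !$seen{$_}++ } map { $_->{sfname} } @parts;
}

# Made-up sample data standing in for the result of $item->parts.
my @parts = (
   { partno => 2, sfname => "uusrc/msg-b.txt" },
   { partno => 1, sfname => "uusrc/msg-a.txt", subject => "demo (1/2)" },
   { partno => 3 },   # optional members may be absent
);

print "$_\n" for source_files(@parts);
```

           Assuming "parts" returns the list in list context, as the
           example decoder at the end of this document uses it, the call
           would be source_files($item->parts).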

   Functions below are not documented and not very well tested - feedback
       welcome
         QuickDecode
         EncodeMulti
         EncodePartial
         EncodeToStream
         EncodeToFile
         E_PrepSingle
         E_PrepPartial

   EXTENSION FUNCTIONS
       Functions found in this module but not documented in the uulib
       documentation:

       $msg = straction ACT_xxx
           Return a human readable string representing the given action code.

       $msg = strerror RET_xxx
           Return a human readable string representing the given error code.

       $str = strencoding xxx_ENCODED
           Return the name of the encoding type as a string.

       $str = strmsglevel MSG_xxx
           Returns the message level as a string.

       SetFileNameCallback $cb
           Sets (or queries) the FileNameCallback, which is called whenever
           the decoding library can't find a filename and wants to extract a
           filename from the subject line of a posting. The callback will be
           called with two arguments, the subject line and the current
           candidate for the filename. The latter argument can be "undef",
           which means that no filename could be found (and likely none
           exists, so it is safe to also return "undef" in this case). If it
           doesn't return anything (not even "undef"!), then nothing happens,
           so this is a no-op callback:

              sub cb {
                 return ();
              }

           If it returns "undef", then this indicates that no filename could
           be found. In all other cases, the return value is taken to be the
           filename.

           This is a slightly more useful callback:

              sub cb {
                 return unless $_[1]; # skip "Re:"-plies et al.
                 my ($subject, $filename) = @_;
                 # if we find some *.rar, take it
                 return $1 if $subject =~ /(\w+\.rar)/;
                 # otherwise just pass what we have
                 return ();
              }
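
           Because such a callback is ordinary Perl, it can be tried
           against sample subject lines before it is installed. The
           following is a minimal sketch, not part of the distribution:
           rar_name_cb and describe are hypothetical names and the subject
           lines are made up. It calls the callback in list context so
           that "returned nothing" and "returned undef" stay
           distinguishable.

```perl
use strict;
use warnings;

# Hypothetical callback with the same shape as the one above:
# it receives the subject line and the current filename candidate.
sub rar_name_cb {
   my ($subject, $filename) = @_;
   return unless $filename;                  # no candidate: return nothing
   return $1 if $subject =~ /(\w+\.rar)/;    # prefer a *.rar name from the subject
   return ();                                # otherwise keep the candidate
}

# Classify what the callback returned for one (subject, candidate) pair.
sub describe {
   my @r = rar_name_cb(@_);
   return !@r            ? "no-op"
        : defined $r[0]  ? "use $r[0]"
        :                  "no filename";
}

print describe("backup.rar (1/3)", "backup.r00"), "\n";
print describe("Re: holiday pictures", undef), "\n";
print describe("holiday.jpg (1/1)", "holiday.jpg"), "\n";
```

           Once the results look right, the same sub can be passed to
           "SetFileNameCallback".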

LARGE EXAMPLE DECODER

       The general workflow for decoding is like this:

       1. Configure options with "SetOption" or "SetXXXCallback".
       2. Load all source files with "LoadFile".
       3. Optionally "Smerge".
       4. Iterate over all "GetFileList" items (i.e. result files).
       5. "CleanUp" to delete files and free items.

       What follows is the file "example-decoder" from the distribution that
       illustrates the above workflow in a non-trivial example.

          #!/usr/bin/perl

          # decode all the files in the directory uusrc/ and copy
          # the resulting files to uudst/

          use Convert::UUlib ':all';

          sub namefilter {
             my ($path) = @_;

             $path=~s/^.*[\/\\]//;

             $path
          }

          sub busycb {
             my ($action, $curfile, $partno, $numparts, $percent, $fsize) = @_;
             $_[0]=straction($action);
             print "busy_callback(", (join ",",@_), ")\n";
             0
          }

          SetOption OPT_RBUF, 128*1024;
          SetOption OPT_WBUF, 1024*1024;
          SetOption OPT_IGNMODE, 1;
          SetOption OPT_VERBOSE, 1;
          SetOption OPT_AUTOCHECK, 0;

          # show the three ways you can set callback functions. I normally
          # prefer the one with the sub inplace.
          SetFNameFilter \&namefilter;

          SetBusyCallback "busycb", 333;

          SetMsgCallback sub {
             my ($msg, $level) = @_;
             print uc strmsglevel $_[1], ": $msg\n";
          };

          # the following non-trivial FileNameCallback takes care
          # of some subject lines not detected properly by uulib:
          SetFileNameCallback sub {
             return unless $_[1]; # skip "Re:"-plies et al.
             local $_ = $_[0];

             # the following rules are rather effective on some newsgroups,
             # like alt.binaries.games.anime, where non-mime, uuencoded data
             # is very common

             # if we find some *.rar, take it as the filename
             return $1 if /(\S{3,}\.(?:[rstuvwxyz]\d\d|rar))\s/i;

             # one common subject format
             return $1 if /- "(.{2,}?\..+?)" (?:yenc )?\(\d+\/\d+\)/i;

             # - filename.par (04/55)
             return $1 if /- "?(\S{3,}\.\S+?)"? (?:yenc )?\(\d+\/\d+\)/i;

             # - (xxx) No. 1 sayuri81.jpg 756565 bytes
             # - (20 files) No.17 Roseanne.jpg [2/2]
             return $1 if /No\.[ 0-9]+ (\S+\....) (?:\d+ bytes )?\[/;

             # try to detect some common forms of filenames
             return $1 if /([a-z0-9_\-+.]{3,}\.[a-z]{3,4}(?:.\d+))/i;

             # otherwise just pass what we have
             ()
          };

          # now read all files in the directory uusrc/*
          for (<uusrc/*>) {
             my ($retval, $count) = LoadFile ($_, $_, 1);
             print "file($_), status(", strerror $retval, ") parts($count)\n";
          }

          # OPT_AUTOCHECK was disabled above, so run the pre-pass that
          # LoadFile would normally have done
          Smerge -1;

          SetOption OPT_SAVEPATH, "uudst/";

          # now wade through all files and their source parts
          for my $uu (GetFileList) {
             print "file ", $uu->filename, "\n";
             print " state ", $uu->state, "\n";
             print " mode ", $uu->mode, "\n";
             print " uudet ", strencoding $uu->uudet, "\n";
             print " size ", $uu->size, "\n";
             print " subfname ", $uu->subfname, "\n";
             print " mimeid ", $uu->mimeid, "\n";
             print " mimetype ", $uu->mimetype, "\n";

             # print additional info about all parts
             print " parts";
             for ($uu->parts) {
                for my $k (sort keys %$_) {
                   print " $k=$_->{$k}";
                }
                print "\n";
             }

             $uu->remove_temp;

             if (my $err = $uu->decode) {
                print " ERROR ", strerror $err, "\n";
             } else {
                print " successfully saved as uudst/", $uu->filename, "\n";
             }
          }

          print "cleanup...\n";

          CleanUp;

PERLMULTICORE SUPPORT

       This module supports the perlmulticore standard (see
       <http://perlmulticore.schmorp.de/> for more info) for the following
       functions - generally these are functions accessing the disk and/or
       using considerable CPU time:

          LoadFile
          $item->decode
          $item->decode_temp
          $item->remove_temp
          $item->info

       The perl interpreter will be reacquired/released on every callback
       invocation, so for performance reasons, callbacks should be avoided if
       that is costly.

       Future versions might enable multicore support for more functions.

BUGS AND LIMITATIONS

       The original uulib library this module uses was written at a time when
       main memory was measured in megabytes and buffer overflows as a security
       thing didn't exist. While a lot of security fixes have been applied
       over the years (including some defense-in-depth mechanisms that can
       shield against a lot of as-of-yet undetected bugs), using this library
       for security purposes requires care.

       Likewise, file sizes when the uulib library was written were tiny
       compared to today, so do not expect this library to handle files larger
       than 2GB.

       Lastly, this module uses a very "C-like" interface, which means it
       doesn't protect you from invalid pointers as you might expect from "more
       perlish" modules - for example, accessing a file item object after
       calling "CleanUp" will likely result in crashes, memory corruption, or
       worse.
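
       One way to stay clear of this is to copy whatever is still needed
       into plain Perl data before calling "CleanUp". The following is a
       minimal sketch of that pattern, not something the module provides:
       snapshot_items is a hypothetical helper, and FakeItem is a stand-in
       class used only so the sketch runs without any encoded input.

```perl
use strict;
use warnings;

# Copy the attributes we still need into plain hashes, so that nothing has
# to touch the item objects after they have been freed.
sub snapshot_items {
   return map {
      +{
         filename => scalar $_->filename,
         size     => scalar $_->size,
         state    => scalar $_->state,
      }
   } @_;
}

# Hypothetical stand-in for a file item (assumption); real items would
# come from GetFileList.
package FakeItem {
   sub new      { my ($class, %self) = @_; bless { %self }, $class }
   sub filename { $_[0]{filename} }
   sub size     { $_[0]{size} }
   sub state    { $_[0]{state} }
}

my @summary = snapshot_items(
   FakeItem->new(filename => "a.jpg", size => 1234, state => 16),
);
print "$_->{filename} ($_->{size} bytes)\n" for @summary;
```

       In a real program the snapshot would be taken with
       snapshot_items(GetFileList) right before "CleanUp", and only
       @summary would be used afterwards.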

AUTHOR

       Marc Lehmann <schmorp@schmorp.de>, the original uulib library was
       written by Frank Pilhofer <fp@informatik.uni-frankfurt.de>, and later
       heavily bugfixed by Marc Lehmann.

SEE ALSO

       perl(1), uudeview homepage at
       <http://www.fpx.de/fp/Software/UUDeview/>.

perl v5.32.0                      2020-12-31                          UUlib(3)