MIME::EncWords(3pm)

EncWords(3)           User Contributed Perl Documentation          EncWords(3)

NAME

       MIME::EncWords - deal with RFC 2047 encoded words (improved)

SYNOPSIS

       MIME::EncWords aims to be another implementation of MIME::Words that
       achieves more exact conformance with the RFC 2047 (formerly RFC 1522)
       specifications.  Additionally, it contains some improvements.  The
       following synopsis and descriptions are inherited from its inspirer,
       with added descriptions of improvements (**) or changes and
       clarifications (*).

       Before reading further, you should see MIME::Tools to make sure that
       you understand where this module fits into the grand scheme of things.
       Go on, do it now.  I'll wait.

       Ready?  Ok...

           use MIME::EncWords qw(:all);

           ### Decode the string into another string, forgetting the charsets:
           $decoded = decode_mimewords(
                 'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>',
                 );

           ### Split string into array of decoded [DATA,CHARSET] pairs:
           @decoded = decode_mimewords(
                 'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>',
                 );

           ### Encode a single unsafe word:
           $encoded = encode_mimeword("\xABFran\xE7ois\xBB");

           ### Encode a string, trying to find the unsafe words inside it:
           $encoded = encode_mimewords("Me and \xABFran\xE7ois\xBB in town");

DESCRIPTION

       Fellow Americans, you probably won't know what the hell this module is
       for.  Europeans, Russians, et al, you probably do.  ":-)".

       For example, here's a valid MIME header you might get:

             From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
             To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
             CC: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>
             Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
              =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
              =?US-ASCII?Q?.._cool!?=

       The fields basically decode to (sorry, I can only approximate the Latin
       characters with 7-bit sequences /o and 'e):

             From: Keith Moore <moore@cs.utk.edu>
             To: Keld J/orn Simonsen <keld@dkuug.dk>
             CC: Andr'e  Pirard <PIRARD@vm1.ulg.ac.be>
             Subject: If you can read this you understand the example... cool!
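
       To make the encoded form less opaque, here is a minimal sketch that
       takes one encoded-word apart by hand.  It uses only the core
       MIME::Base64 module, not MIME::EncWords itself.  The helper name
       decode_one_word is hypothetical and is not part of this module; it
       skips the syntax checks and charset handling that decode_mimewords
       performs.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: split "=?charset?encoding?text?=" into (bytes, charset).
# Returns an empty list when the argument is not a single encoded-word.
sub decode_one_word {
    my ($word) = @_;
    my ($charset, $enc, $text) =
        $word =~ /\A=\?([^?]+)\?([QqBb])\?([^?]*)\?=\z/
        or return;

    my $bytes;
    if (uc $enc eq 'B') {
        $bytes = decode_base64($text);
    } else {
        ($bytes = $text) =~ tr/_/ /;                     # "_" stands for a space
        $bytes =~ s/=([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # "=XX" is one octet in hex
    }
    return ($bytes, $charset);
}

my ($data, $charset) = decode_one_word('=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=');
printf "%s: %d octets\n", $charset, length $data;
```

       The octets returned are still in the named charset; turning them into
       characters is a separate step.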

       Supplement: Fellow Americans, Europeans, you probably won't know what
       the hell this module is for.  East Asians, et al, you probably do.
       "(^_^)".

       For example, here's a valid MIME header you might get:

             Subject: =?EUC-KR?B?sNTAuLinKGxhemluZXNzKSwgwvzB9ri7seIoaW1w?=
              =?EUC-KR?B?YXRpZW5jZSksILGzuLgoaHVicmlzKQ==?=

       The fields basically decode to (sorry, I cannot approximate the non-
       Latin multibyte characters with any 7-bit sequences):

             Subject: ???(laziness), ????(impatience), ??(hubris)

PUBLIC INTERFACE

       decode_mimewords ENCODED, [OPTS...]
           Function.  Go through the string looking for RFC-1522-style "Q"
           (quoted-printable, sort of) or "B" (base64) encoded words, and
           decode them.

           In array context, splits the ENCODED string into a list of decoded
           "[DATA, CHARSET]" pairs, and returns that list.  Unencoded data are
           returned in a 1-element array "[DATA]", giving an effective CHARSET
           of "undef".

               $enc = '=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>';
               foreach (decode_mimewords($enc)) {
                   ### Each element is an array reference: [DATA, CHARSET]
                   print "", ($_->[1] || 'US-ASCII'), ": ", $_->[0], "\n";
               }

           ** However, adjacent encoded-words with the same charset will be
           concatenated to handle multibyte sequences safely.

           * Whitespace surrounding unencoded data will not be stripped, so
           that compatibility with MIME::Words is preserved.

           In scalar context, joins the "data" elements of the above list
           together, and returns that.  Warning: this is information-lossy,
           and probably not what you want, but if you know that all charsets
           in the ENCODED string are identical, it might be useful to you.
           (Before you use this, please see "unmime" in MIME::WordDecoder,
           which is probably what you want.)  ** See also the "Charset" option
           below.

           In the event of a syntax error, $@ will be set to a description of
           the error, but parsing will continue as best as possible (so as to
           get something back when decoding headers).  $@ will be false if no
           error was detected.

           * Malformed base64 encoded-words will be kept encoded.  In this
           case $@ will be set.

           Any arguments past the ENCODED string are taken to define a hash of
           options.  ** When Unicode/multibyte support is disabled (see
           "USE_ENCODE" in MIME::Charset), these options have no effect.

           Charset **
               Name of the character set to which data elements are converted
               in scalar context.  The default is no conversion.  If the
               special value "_UNICODE_" is specified, the returned value will
               be a Unicode string.

               Note: This feature is still information-lossy, except when
               "_UNICODE_" is specified.

           Detect7bit **
               Try to detect a 7-bit charset in unencoded portions.  The
               default is "YES".

           Mapping **
               In scalar context, specify the mappings actually used for
               charset names.  "EXTENDED" uses extended mappings.  "STANDARD"
               uses standardized strict mappings.  The default is "EXTENDED".
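
           As an illustration of the "Charset" option, the sketch below
           decodes the folded EUC-KR Subject header from the DESCRIPTION into
           a Unicode string.  It assumes that Unicode/multibyte support is
           enabled and that the installed Encode knows EUC-KR; the variable
           names are only for this example.

```perl
use strict;
use warnings;
use MIME::EncWords qw(decode_mimewords);

# A folded Subject header made of two adjacent EUC-KR encoded-words.
my $header = "Subject: =?EUC-KR?B?sNTAuLinKGxhemluZXNzKSwgwvzB9ri7seIoaW1w?=\n"
           . " =?EUC-KR?B?YXRpZW5jZSksILGzuLgoaHVicmlzKQ==?=";

# Scalar context with Charset => "_UNICODE_": the result is a Unicode string.
my $subject = decode_mimewords($header, Charset => '_UNICODE_');

# $@ carries a description of any syntax error noticed while decoding.
warn "decode problem: $@" if $@;

binmode STDOUT, ':encoding(UTF-8)';
print "$subject\n";
```

           The first encoded-word ends in the middle of the text
           "(impatience)"; the two words are joined before conversion, so
           the split is not visible in the result.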

       encode_mimeword RAW, [ENCODING], [CHARSET]
           Function.  Encode a single RAW "word" that has unsafe characters.
           The "word" will be encoded in its entirety.

               ### Encode "<<Franc,ois>>":
               $encoded = encode_mimeword("\xABFran\xE7ois\xBB");

           You may specify the ENCODING ("Q" or "B"), which defaults to "Q".
           ** You may also specify the special value "S" to choose the
           shorter of "Q" and "B".

           You may specify the CHARSET, which defaults to "iso-8859-1".

           * Spaces will be escaped with "_" by "Q" encoding.
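
           The optional arguments can be sketched as follows.  The exact
           spelling of the charset name in the output is not asserted here,
           so the example only prints the three results and checks a round
           trip through decode_mimewords.

```perl
use strict;
use warnings;
use MIME::EncWords qw(encode_mimeword decode_mimewords);

my $raw = "\xABFran\xE7ois\xBB";    # ISO-8859-1 octets

my $q_word = encode_mimeword($raw);                       # "Q", default charset
my $b_word = encode_mimeword($raw, 'B', 'ISO-8859-1');    # base64
my $s_word = encode_mimeword($raw, 'S', 'ISO-8859-1');    # the shorter of the two

print "$_\n" for $q_word, $b_word, $s_word;

# Decoding in scalar context without a Charset option returns the octets.
print "round trip ok\n" if decode_mimewords($b_word) eq $raw;
```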

       encode_mimewords RAW, [OPTS]
           Function.  Given a RAW string, try to find and encode all "unsafe"
           sequences of characters:

               ### Encode a string with some unsafe "words":
               $encoded = encode_mimewords("Me and \xABFran\xE7ois\xBB");

           Returns the encoded string.

           ** RAW may be a Unicode string when Unicode/multibyte support is
           enabled (see "USE_ENCODE" in MIME::Charset).  Furthermore, RAW may
           be a reference to the list returned by "decode_mimewords" in array
           context.  In the latter case the "Charset" option (see below) will
           be overridden (see also the note below).

           Note: * When RAW is an arrayref, adjacent encoded-words (i.e.
           elements having a non-ASCII charset element) are concatenated.
           Then they are split, taking care of the character boundaries of
           multibyte sequences when Unicode/multibyte support is enabled.
           Portions for unencoded data should include surrounding whitespace,
           or they will be merged into the adjoining encoded-word(s).
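
           A minimal sketch of the arrayref form: decode a header into pairs,
           change only the unencoded part, and hand the pairs back to
           encode_mimewords.  It assumes the default options; the check on
           "US-ASCII" allows for unencoded elements that carry a detected
           7-bit charset instead of an undefined one.

```perl
use strict;
use warnings;
use MIME::EncWords qw(decode_mimewords encode_mimewords);

my $header = 'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>';

# Array context: a list of [DATA, CHARSET] pairs.
my @pairs = decode_mimewords($header);

# Touch only the unencoded parts, leaving their surrounding whitespace alone.
for my $pair (@pairs) {
    next if defined $pair->[1] && uc $pair->[1] ne 'US-ASCII';
    $pair->[0] =~ s/\ATo:/Cc:/;
}

# Pass the pairs back by reference; each element keeps its own charset.
my $reencoded = encode_mimewords(\@pairs);
print "$reencoded\n";
```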

           Any arguments past the RAW string are taken to define a hash of
           options:

           Charset
               Encode all unsafe stuff with this charset.  The default is
               'ISO-8859-1', a.k.a. "Latin-1".

           Detect7bit **
               When the "Encoding" option (see below) is specified as "a" and
               the "Charset" option is unknown, try to detect a 7-bit charset
               in the given RAW string.  The default is "YES".  When
               Unicode/multibyte support is disabled, this option has no
               effect (see "USE_ENCODE" in MIME::Charset).

           Encoding
               The encoding to use, "q" or "b".  ** You may also specify
               special values: "a" will automatically choose the recommended
               encoding (with charset conversion if an alternative charset is
               recommended: see MIME::Charset); "s" will choose the shorter
               of "q" and "b".  Note: * As of release 1.005, the default was
               changed from "q" (the default in MIME::Words) to "a".

           Field
               Name of the mail field this string will be used in.  ** The
               length of the mail field name will be taken into account on
               the first line of the encoded header.

           Folding **
               A sequence used to fold encoded lines.  The default is "\n".
               If the empty string "" is specified, encoded-words exceeding
               the line length (see "MaxLineLen" below) will be split by
               SPACE.

               Note: * Though RFC 2822 states that lines are delimited by
               CRLF ("\r\n"), this module chose LF ("\n") as the default to
               keep backward compatibility.  When you use the default, you
               may need to convert newlines before the encoded headers are
               sent over a session.

           Mapping **
               Specify the mappings actually used for charset names.
               "EXTENDED" uses extended mappings.  "STANDARD" uses
               standardized strict mappings.  The default is "EXTENDED".
               When Unicode/multibyte support is disabled, this option has no
               effect (see "USE_ENCODE" in MIME::Charset).

           MaxLineLen **
               Maximum line length excluding newline.  The default is 76.

           Minimal **
               Takes care of natural word separators (i.e. whitespace) in the
               text to be encoded.  If "NO" is specified, this module will
               encode the whole text (if encoding is needed) regardless of
               whitespace; encoded-words exceeding the line length will be
               split based only on their lengths.  The default is "YES".

               Note: As of release 0.040, the default has been changed to
               "YES" to ensure compatibility with MIME::Words.  In earlier
               releases, this option was fixed to "NO".

           Replacement **
               See "Error Handling" in MIME::Charset.
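
           Several of these options are typically combined.  The sketch below
           encodes a long Unicode subject as UTF-8 base64 and folds it with
           CRLF; it assumes Unicode/multibyte support is enabled, and the
           sample text is illustrative only.

```perl
use strict;
use warnings;
use MIME::EncWords qw(encode_mimewords);

# A Unicode string long enough to need folding (three CJK characters, repeated).
my $subject = join ' ', ("\x{65E5}\x{672C}\x{8A9E}") x 20;

my $encoded = encode_mimewords(
    $subject,
    Charset    => 'UTF-8',      # encode unsafe text as UTF-8
    Encoding   => 'b',          # force base64 instead of the default "a"
    Field      => 'Subject',    # leave room for the field name on the first line
    Folding    => "\r\n",       # CRLF, as RFC 2822 expects on the wire
    MaxLineLen => 76,
);

print "Subject: $encoded\r\n";
```

           With the default "Folding" of "\n", the same call would produce
           LF-terminated lines that need converting before they are sent.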

   Configuration Files **
       Built-in defaults of option parameters for "decode_mimewords" (except
       the 'Charset' option) and "encode_mimewords" can be overridden by
       configuration files: MIME/Charset/Defaults.pm and
       MIME/EncWords/Defaults.pm.  For more details read
       MIME/EncWords/Defaults.pm.sample.

VERSION

       Consult the $VERSION variable.

       Development versions of this module may be found at
       <http://hatuka.nezumi.nu/repos/MIME-EncWords/>.

AUTHORS

       The original version of the function decode_mimewords() is derived
       from the MIME::Words module, which was written by:
           Eryq (eryq@zeegee.com), ZeeGee Software Inc (http://www.zeegee.com).
           David F. Skoll (dfs@roaringpenguin.com) http://www.roaringpenguin.com

       Other parts were rewritten or added by:
           Hatuka*nezumi - IKEDA Soji <hatuka(at)nezumi.nu>.

       All rights reserved.  This program is free software; you can
       redistribute it and/or modify it under the same terms as Perl itself.

perl v5.12.0                      2008-04-19                       EncWords(3)
