EncWords(3)         User Contributed Perl Documentation         EncWords(3)

NAME
MIME::EncWords - deal with RFC 2047 encoded words (improved)

SYNOPSIS
MIME::EncWords aims to be another implementation of MIME::Words that
conforms more exactly to the RFC 2047 (formerly RFC 1522)
specification. It also contains some improvements. The following
synopsis and descriptions are inherited from MIME::Words, with added
notes on improvements (**) and on changes and clarifications (*).

Before reading further, you should see MIME::Tools to make sure that
you understand where this module fits into the grand scheme of things.
Go on, do it now. I'll wait.

Ready? Ok...

    use MIME::EncWords qw(:all);

    ### Decode the string into another string, forgetting the charsets:
    $decoded = decode_mimewords(
        'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>',
    );

    ### Split string into array of decoded [DATA,CHARSET] pairs:
    @decoded = decode_mimewords(
        'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>',
    );

    ### Encode a single unsafe word:
    $encoded = encode_mimeword("\xABFran\xE7ois\xBB");

    ### Encode a string, trying to find the unsafe words inside it:
    $encoded = encode_mimewords("Me and \xABFran\xE7ois\xBB in town");

DESCRIPTION
Fellow Americans, you probably won't know what the hell this module is
for. Europeans, Russians, et al, you probably do. ":-)".

For example, here's a valid MIME header you might get:

    From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
    To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
    CC: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>
    Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
     =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
     =?US-ASCII?Q?.._cool!?=

The fields basically decode to (sorry, I can only approximate the Latin
characters with 7 bit sequences /o and 'e):

    From: Keith Moore <moore@cs.utk.edu>
    To: Keld J/orn Simonsen <keld@dkuug.dk>
    CC: Andr'e Pirard <PIRARD@vm1.ulg.ac.be>
    Subject: If you can read this you understand the example... cool!
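
To make the decoding above concrete, here is a minimal sketch that
decodes the three encoded-words of the Subject field by hand, using only
the core MIME::Base64 module. The helper name decode_payload is made up
for this illustration and is not part of MIME::EncWords; the sketch also
ignores charsets entirely, which the module itself does not.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: decode the payload of one encoded-word,
# given its encoding letter ("B" or "Q").
sub decode_payload {
    my ($encoding, $payload) = @_;
    return decode_base64($payload) if uc $encoding eq 'B';
    # "Q" encoding: "_" stands for SPACE, "=XX" is a hex-escaped byte.
    $payload =~ tr/_/ /;
    $payload =~ s/=([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $payload;
}

my @words = (
    '=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=',
    '=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=',
    '=?US-ASCII?Q?.._cool!?=',
);

# Adjacent encoded-words are joined without the whitespace between them.
my $subject = join '', map {
    my ($charset, $encoding, $payload) = /^=\?([^?]+)\?([BbQq])\?([^?]*)\?=$/
        or die "not an encoded-word: $_";
    decode_payload($encoding, $payload);
} @words;

print "Subject: $subject\n";
```

In real code, decode_mimewords() does this work and also keeps track of
the charset of each piece.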

Supplement: Fellow Americans, Europeans, you probably won't know what
the hell this module is for. East Asians, et al, you probably do.
"(^_^)".

For example, here's a valid MIME header you might get:

    Subject: =?EUC-KR?B?sNTAuLinKGxhemluZXNzKSwgwvzB9ri7seIoaW1w?=
     =?EUC-KR?B?YXRpZW5jZSksILGzuLgoaHVicmlzKQ==?=

The fields basically decode to (sorry, I cannot approximate the non-
Latin multibyte characters with any 7 bit sequences):

    Subject: ???(laziness), ????(impatience), ??(hubris)
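
As a sketch of how such a header might be handled, the following decodes
the EUC-KR Subject above into a Perl Unicode string with the "Charset"
option described under PUBLIC INTERFACE. It assumes that
Unicode/multibyte support is enabled and that the local Encode
installation knows EUC-KR; writing the result to STDOUT as UTF-8 is a
choice made for this example only.

```perl
use strict;
use warnings;
use MIME::EncWords qw(decode_mimewords);

# The folded Subject field body from the example above.
my $subject = "=?EUC-KR?B?sNTAuLinKGxhemluZXNzKSwgwvzB9ri7seIoaW1w?=\n"
            . " =?EUC-KR?B?YXRpZW5jZSksILGzuLgoaHVicmlzKQ==?=";

# Scalar context with Charset => '_UNICODE_' asks for a Unicode string.
my $text = decode_mimewords($subject, Charset => '_UNICODE_');

# Assumption for this example: the terminal expects UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print "Subject: $text\n";
```

Note that the word "impatience" is split across the two encoded-words;
it comes out whole because adjacent encoded-words are joined.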

PUBLIC INTERFACE
decode_mimewords ENCODED, [OPTS...]
    Function. Go through the string looking for RFC-1522-style "Q"
    (quoted-printable, sort of) or "B" (base64) encodings, and decode
    them.

    In an array context, splits the ENCODED string into a list of
    decoded "[DATA, CHARSET]" pairs, and returns that list. Unencoded
    data are returned in a 1-element array "[DATA]", giving an
    effective CHARSET of "undef".

        $enc = '=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>';
        foreach (decode_mimewords($enc)) {
            # Each element is an array reference: [DATA, CHARSET]
            print "", ($_->[1] || 'US-ASCII'), ": ", $_->[0], "\n";
        }

    ** However, adjacent encoded-words with the same charset will be
    concatenated to handle multibyte sequences safely.

    * Whitespace surrounding unencoded data is not stripped, to keep
    compatibility with MIME::Words.

    In a scalar context, joins the "data" elements of the above list
    together, and returns that. Warning: this is information-lossy,
    and probably not what you want, but if you know that all charsets
    in the ENCODED string are identical, it might be useful to you.
    (Before you use this, please see "unmime" in MIME::WordDecoder,
    which is probably what you want.) ** See also the "Charset" option
    below.

    In the event of a syntax error, $@ will be set to a description of
    the error, but parsing will continue as best as possible (so as to
    get something back when decoding headers). $@ will be false if no
    error was detected.

    * Malformed base64 encoded-words will be kept encoded. In this
    case $@ will be set.
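
The error-reporting convention above can be checked right after the
call. The sketch below is only an illustration: the input string is
made up, and whether this particular payload is treated as malformed is
an assumption, so the code reports $@ if it is set and prints whatever
came back either way.

```perl
use strict;
use warnings;
use MIME::EncWords qw(decode_mimewords);

# Hypothetical input whose "B" payload is not valid base64.
my $suspect = '=?ISO-8859-1?B?not*valid*base64?= <someone@example.org>';

my $decoded = decode_mimewords($suspect);

# Inspect $@ immediately, before anything else can overwrite it.
if ($@) {
    warn "decode_mimewords reported a problem: $@\n";
}

# Parsing continues as best as possible, so there is still a result.
print "$decoded\n";
```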

    Any arguments past the ENCODED string are taken to define a hash of
    options. ** When Unicode/multibyte support is disabled (see
    "USE_ENCODE" in MIME::Charset), these options have no effect.

    Charset **
        Name of the character set to which data elements are converted
        in scalar context. The default is no conversion. If the
        special value "_UNICODE_" is specified, the returned value
        will be a Unicode string.

        Note: This feature is still information-lossy, except when
        "_UNICODE_" is specified.

    Detect7bit **
        Try to detect 7-bit charset on unencoded portions. Default is
        "YES".

    Mapping **
        In scalar context, specify the mappings actually used for
        charset names. "EXTENDED" uses extended mappings. "STANDARD"
        uses standardized strict mappings. Default is "EXTENDED".
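
The options are passed as key/value pairs after the ENCODED string. A
minimal sketch, assuming Unicode/multibyte support is enabled; the
particular combination of values is chosen only to show the syntax:

```perl
use strict;
use warnings;
use MIME::EncWords qw(decode_mimewords);

my $enc = '=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>';

# Scalar context: join all pieces, converting them to UTF-8.
my $utf8 = decode_mimewords(
    $enc,
    Charset    => 'UTF-8',      # target charset for the joined result
    Detect7bit => 'NO',         # do not guess charsets of unencoded parts
    Mapping    => 'STANDARD',   # strict charset name mappings
);

print "$utf8\n";
```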

encode_mimeword RAW, [ENCODING], [CHARSET]
    Function. Encode a single RAW "word" that has unsafe characters.
    The "word" will be encoded in its entirety.

        ### Encode "<<Franc,ois>>":
        $encoded = encode_mimeword("\xABFran\xE7ois\xBB");

    You may specify the ENCODING ("Q" or "B"), which defaults to "Q".
    ** You may also specify the special value "S" to choose the
    shorter of "Q" and "B".

    You may specify the CHARSET, which defaults to "iso-8859-1".

    * Spaces will be escaped as "_" by "Q" encoding.
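
As an illustration of the ENCODING and CHARSET arguments, this sketch
encodes the same word with each of the three encoding values and then
decodes one result back to the original bytes. Variable names are
arbitrary.

```perl
use strict;
use warnings;
use MIME::EncWords qw(encode_mimeword decode_mimewords);

my $raw = "\xABFran\xE7ois\xBB";

# Compare the three ENCODING values on the same input.
for my $encoding (qw(Q B S)) {
    my $candidate = encode_mimeword($raw, $encoding, 'ISO-8859-1');
    print "$encoding: $candidate\n";
}

# Round trip: decoding the encoded-word should give the raw bytes back.
my $word = encode_mimeword($raw, 'B', 'ISO-8859-1');
my $back = decode_mimewords($word);
print "round trip ok\n" if $back eq $raw;
```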

encode_mimewords RAW, [OPTS]
    Function. Given a RAW string, try to find and encode all "unsafe"
    sequences of characters:

        ### Encode a string with some unsafe "words":
        $encoded = encode_mimewords("Me and \xABFran\xE7ois\xBB");

    Returns the encoded string.

    ** RAW may be a Unicode string when Unicode/multibyte support is
    enabled (see "USE_ENCODE" in MIME::Charset). Furthermore, RAW may
    be a reference to the list returned by "decode_mimewords" in array
    context. In the latter case the "Charset" option (see below) will
    be overridden (see also the note below).

    Note: * When RAW is an arrayref, adjacent encoded-words (i.e.
    elements having a non-ASCII charset element) are concatenated. Then
    they are split, taking care of character boundaries of multibyte
    sequences when Unicode/multibyte support is enabled. Portions for
    unencoded data should include surrounding whitespace(s), or they
    will be merged into adjoining encoded-word(s).
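
A short sketch of the arrayref form: decode a header body into
"[DATA, CHARSET]" pairs in array context, then pass a reference to that
list back to encode_mimewords(). The variable names are arbitrary, and
no particular output layout is assumed.

```perl
use strict;
use warnings;
use MIME::EncWords qw(decode_mimewords encode_mimewords);

my $header = '=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>';

# Array context: a list of [DATA, CHARSET] pairs ([DATA] if unencoded).
my @parts = decode_mimewords($header);

# Each pair keeps its own charset, so no Charset option is given here.
my $reencoded = encode_mimewords(\@parts);

print "$reencoded\n";
```

The unencoded piece keeps its leading space in @parts, which is what
stops it from being merged into the neighbouring encoded-word.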

    Any arguments past the RAW string are taken to define a hash of
    options:

    Charset
        Encode all unsafe stuff with this charset. Default is
        'ISO-8859-1', a.k.a. "Latin-1".

    Detect7bit **
        When the "Encoding" option (see below) is specified as "a" and
        the "Charset" option is unknown, try to detect a 7-bit charset
        in the given RAW string. Default is "YES". When
        Unicode/multibyte support is disabled, this option has no
        effect (see "USE_ENCODE" in MIME::Charset).

    Encoding
        The encoding to use, "q" or "b". ** You may also specify
        special values: "a" will automatically choose the recommended
        encoding to use (with charset conversion if an alternative
        charset is recommended: see MIME::Charset); "s" will choose the
        shorter of "q" and "b". Note: * As of release 1.005, the
        default was changed from "q" (the default in MIME::Words) to
        "a".

    Field
        Name of the mail field this string will be used in. ** The
        length of the mail field name will be taken into account in
        the first line of the encoded header.

    Folding **
        A sequence used to fold encoded lines. The default is "\n". If
        the empty string "" is specified, encoded-words exceeding the
        line length (see "MaxLineLen" below) will be split by SPACE.

        Note: * Though RFC 2822 states that lines are delimited by
        CRLF ("\r\n"), this module chose LF ("\n") as the default to
        keep backward compatibility. When you use the default, you may
        need to convert newlines before the encoded headers are sent
        into a session.

    Mapping **
        Specify the mappings actually used for charset names.
        "EXTENDED" uses extended mappings. "STANDARD" uses
        standardized strict mappings. The default is "EXTENDED". When
        Unicode/multibyte support is disabled, this option has no
        effect (see "USE_ENCODE" in MIME::Charset).

    MaxLineLen **
        Maximum line length excluding newline. The default is 76.

    Minimal **
        Takes care of natural word separators (i.e. whitespace) in the
        text to be encoded. If "NO" is specified, this module will
        encode the whole text (if encoding is needed) regardless of
        whitespace; encoded-words exceeding the line length will be
        split based only on their lengths. Default is "YES".

        Note: As of release 0.040, the default has been changed to
        "YES" to ensure compatibility with MIME::Words. In earlier
        releases, this option was fixed to "NO".

    Replacement **
        See "Error Handling" in MIME::Charset.
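
The following sketch combines several of the options above to build a
Subject field body for a protocol that expects CRLF line endings. The
option values are examples only, and the second call shows the
alternative mentioned in the "Folding" note: keep the default LF and
convert newlines afterwards.

```perl
use strict;
use warnings;
use MIME::EncWords qw(encode_mimewords);

my $raw = "Me and \xABFran\xE7ois\xBB in town";

# Ask for CRLF folding directly, so no newline conversion is needed later.
my $encoded = encode_mimewords(
    $raw,
    Charset    => 'ISO-8859-1',
    Encoding   => 'q',
    Field      => 'Subject',    # its length counts against the first line
    Folding    => "\r\n",
    MaxLineLen => 76,
);

# Alternatively, keep the default LF folding and convert afterwards.
(my $converted = encode_mimewords($raw, Field => 'Subject'))
    =~ s/\r?\n/\r\n/g;

print "Subject: $encoded\r\n";
```

The field name itself is not part of the returned string; "Field" only
tells the module how much room the name takes on the first line.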

Configuration Files **
Built-in defaults of option parameters for "decode_mimewords" (except
the 'Charset' option) and "encode_mimewords" can be overridden by
configuration files: MIME/Charset/Defaults.pm and
MIME/EncWords/Defaults.pm. For more details read
MIME/EncWords/Defaults.pm.sample.

VERSION
Consult the $VERSION variable.

Development versions of this module may be found at
<http://hatuka.nezumi.nu/repos/MIME-EncWords/>.

SEE ALSO
MIME::Charset, MIME::Tools

AUTHORS
The original version of the function decode_mimewords() is derived from
the MIME::Words module, which was written by:
    Eryq (eryq@zeegee.com), ZeeGee Software Inc
    (http://www.zeegee.com).
    David F. Skoll (dfs@roaringpenguin.com)
    http://www.roaringpenguin.com

Other parts were rewritten or added by:
    Hatuka*nezumi - IKEDA Soji <hatuka(at)nezumi.nu>.

All rights reserved. This program is free software; you can
redistribute it and/or modify it under the same terms as Perl itself.

perl v5.12.0                      2008-04-19                      EncWords(3)