Encode(3) User Contributed Perl Documentation Encode(3)

NAME
Encode - character encodings in Perl

SYNOPSIS
    use Encode qw(decode encode);
    $characters = decode('UTF-8', $octets, Encode::FB_CROAK);
    $octets = encode('UTF-8', $characters, Encode::FB_CROAK);

Table of Contents
Encode consists of a collection of modules whose details are too
extensive to fit in one document. This document explains the top-level
APIs and gives an overview of general topics. For other topics and more
details, see the documentation for these modules:

    Encode::Alias - Alias definitions to encodings
    Encode::Encoding - Encode Implementation Base Class
    Encode::Supported - List of Supported Encodings
    Encode::CN - Simplified Chinese Encodings
    Encode::JP - Japanese Encodings
    Encode::KR - Korean Encodings
    Encode::TW - Traditional Chinese Encodings

DESCRIPTION
The "Encode" module provides the interface between Perl strings and the
rest of the system. Perl strings are sequences of characters.

The repertoire of characters that Perl can represent is a superset of
those defined by the Unicode Consortium. On most platforms the ordinal
value of a character, as returned by "ord(S)", is the Unicode code point
for that character. The exceptions are platforms where the legacy
encoding is some variant of EBCDIC rather than a superset of ASCII; see
perlebcdic.

In recent history, data has been moved around computers in 8-bit
chunks, often called "bytes" but also known as "octets" in standards
documents. Perl is widely used to manipulate data of many types: not
only strings of characters representing human or computer languages,
but also "binary" data, being the machine's representation of numbers,
pixels in an image, or just about anything.

When Perl is processing "binary data", the programmer wants Perl to
process "sequences of bytes". This is not a problem for Perl: because a
byte has 256 possible values, it easily fits in Perl's much larger
"logical character".

This document mostly explains the how. perlunitut and perlunifaq
explain the why.

TERMINOLOGY
character

A character in the range 0 .. 2**32-1 (or more); what Perl's strings
are made of.

byte

A character in the range 0..255; a special case of a Perl character.

octet

8 bits of data, with ordinal values 0..255; term for bytes passed to or
from a non-Perl context, such as a disk file, standard I/O stream,
database, command-line argument, environment variable, socket etc.

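To make the character/octet distinction concrete, here is a minimal sketch. It assumes only that Encode is available (it ships with core Perl); the sample character, U+263A WHITE SMILING FACE, is arbitrary. Perl counts one character, while its UTF-8 form occupies three octets.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $character = "\x{263A}";                   # one Perl character (U+263A)
my $octets    = encode('UTF-8', $character);  # the same text as UTF-8 octets

printf "characters: %d\n", length $character; # characters: 1
printf "octets:     %d\n", length $octets;    # octets:     3
printf "%vX\n", $octets;                      # E2.98.BA
```
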
THE PERL ENCODING API
Basic methods
encode

    $octets = encode(ENCODING, STRING[, CHECK])

Encodes the scalar value STRING from Perl's internal form into ENCODING
and returns a sequence of octets. ENCODING can be either a canonical
name or an alias. For encoding names and aliases, see "Defining
Aliases". For CHECK, see "Handling Malformed Data".

CAVEAT: the input scalar STRING might be modified in place depending on
what is set in CHECK. See "LEAVE_SRC" if you want your inputs to be
left unchanged.

For example, to convert a string from Perl's internal format into
ISO-8859-1, also known as Latin1:

    $octets = encode("iso-8859-1", $string);

CAVEAT: When you run "$octets = encode("UTF-8", $string)", then $octets
might not be equal to $string. Though both contain the same data, the
UTF8 flag for $octets is always off. When you encode anything, the
UTF8 flag on the result is always off, even when it contains a
completely valid UTF-8 string. See "The UTF8 flag" below.

If $string is "undef", then "undef" is returned.

"str2bytes" may be used as an alias for "encode".

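The following minimal sketch uses hypothetical sample data ("café", written as "caf\x{e9}") to show that the same characters yield different octet sequences in different encodings, and that the result of encode() carries no UTF8 flag. is_utf8() appears only to make the flag visible; normal code does not need it.

```perl
use strict;
use warnings;
use Encode qw(encode is_utf8);

my $string = "caf\x{e9}";                    # "café": 4 characters

my $latin1 = encode('iso-8859-1', $string);  # 4 octets: "caf\xE9"
my $utf8   = encode('UTF-8', $string);       # 5 octets: "caf\xC3\xA9"

printf "%d %d\n", length $latin1, length $utf8;    # 4 5
print is_utf8($utf8) ? "flag on\n" : "flag off\n"; # flag off
```
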
decode

    $string = decode(ENCODING, OCTETS[, CHECK])

This function returns the string that results from decoding the scalar
value OCTETS, assumed to be a sequence of octets in ENCODING, into
Perl's internal form. As with encode(), ENCODING can be either a
canonical name or an alias. For encoding names and aliases, see
"Defining Aliases"; for CHECK, see "Handling Malformed Data".

CAVEAT: the input scalar OCTETS might be modified in place depending on
what is set in CHECK. See "LEAVE_SRC" if you want your inputs to be
left unchanged.

For example, to convert ISO-8859-1 data into a string in Perl's
internal format:

    $string = decode("iso-8859-1", $octets);

CAVEAT: When you run "$string = decode("UTF-8", $octets)", then $string
might not be equal to $octets. Though both contain the same data, the
UTF8 flag for $string is on. See "The UTF8 flag" below.

If $octets is "undef", then "undef" is returned.

"bytes2str" may be used as an alias for "decode".

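A minimal sketch of the reverse direction, again with hypothetical sample data: decoding five UTF-8 octets yields four characters, the result compares equal to the same text written with a "\x{e9}" escape, and its UTF8 flag is on.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

my $octets = "caf\xC3\xA9";             # UTF-8 octets for "café"
my $string = decode('UTF-8', $octets);  # 4 characters

printf "%d octets -> %d characters\n", length $octets, length $string; # 5 octets -> 4 characters
print "same text\n" if $string eq "caf\x{e9}";                         # eq compares characters
print is_utf8($string) ? "flag on\n" : "flag off\n";                   # flag on
```
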
find_encoding

    [$obj =] find_encoding(ENCODING)

Returns the encoding object corresponding to ENCODING. Returns "undef"
if no matching ENCODING is found. The returned object is what does the
actual encoding or decoding.

    $string = decode($name, $bytes);

is in fact

    $string = do {
        $obj = find_encoding($name);
        croak qq(encoding "$name" not found) unless ref $obj;
        $obj->decode($bytes);
    };

with more error checking.

You can therefore save time by reusing this object as follows:

    my $enc = find_encoding("iso-8859-1");
    while(<>) {
        my $string = $enc->decode($_);
        ... # now do something with $string;
    }

Besides "decode" and "encode", other methods are available as well.
For instance, "name()" returns the canonical name of the encoding
object.

    find_encoding("latin1")->name; # iso-8859-1

See Encode::Encoding for details.

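A minimal, self-contained version of the reuse pattern above, assuming the input is a small hypothetical list of Latin-1 lines instead of "<>". It also shows that find_encoding() returns undef, rather than dying, for a name it does not know; "no-such-encoding" is made up for the example.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

my $enc = find_encoding('latin1')
    or die "encoding 'latin1' not found";
print $enc->name, "\n";                           # iso-8859-1

# Look the object up once, then reuse it for every line.
my @lines   = ("caf\xE9\n", "na\xEFve\n");        # hypothetical Latin-1 input
my @decoded = map { $enc->decode($_) } @lines;

my $missing = find_encoding('no-such-encoding');  # made-up name
print defined $missing ? "found\n" : "not found\n";  # not found
```
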
find_mime_encoding

    [$obj =] find_mime_encoding(MIME_ENCODING)

Returns the encoding object corresponding to MIME_ENCODING. It acts
like "find_encoding()", except that the "mime_name()" of the returned
object must match MIME_ENCODING. Unlike "find_encoding()", canonical
names and aliases are not used when searching for the object.

    find_mime_encoding("utf8");         # returns undef because "utf8" is not a valid MIME_ENCODING
    find_mime_encoding("utf-8");        # returns the encoding object "utf-8-strict"
    find_mime_encoding("UTF-8");        # same as "utf-8" because MIME_ENCODING is case insensitive
    find_mime_encoding("utf-8-strict"); # returns undef because "utf-8-strict" is not a valid MIME_ENCODING

from_to

    [$length =] from_to($octets, FROM_ENC, TO_ENC [, CHECK])

Converts data in place between two encodings. The data in $octets must
be encoded as octets and not as characters in Perl's internal format.
For example, to convert ISO-8859-1 data into Microsoft's CP1250
encoding:

    from_to($octets, "iso-8859-1", "cp1250");

and to convert it back:

    from_to($octets, "cp1250", "iso-8859-1");

Because the conversion happens in place, the data to be converted
cannot be a string constant: it must be a scalar variable.

"from_to()" returns the length of the converted string in octets on
success, and "undef" on error.

CAVEAT: The following operations may look the same, but are not:

    from_to($data, "iso-8859-1", "UTF-8"); #1
    $data = decode("iso-8859-1", $data);  #2

Both #1 and #2 make $data consist of a completely valid UTF-8 string,
but only #2 turns the UTF8 flag on. #1 is equivalent to:

    $data = encode("UTF-8", decode("iso-8859-1", $data));

See "The UTF8 flag" below.

Also note that:

    from_to($octets, $from, $to, $check);

is equivalent to:

    $octets = encode($to, decode($from, $octets), $check);

Yes, $check is not applied during decoding; this is deliberate. If you
need fine-grained control, use "decode" followed by "encode" as
follows:

    $octets = encode($to, decode($from, $octets, $check_from), $check_to);

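A minimal sketch of the from_to() caveat, using hypothetical Latin-1 input: from_to() leaves UTF-8 octets in $data (flag off) and returns their length in octets, whereas decode() returns characters.

```perl
use strict;
use warnings;
use Encode qw(decode from_to is_utf8);

my $data = "caf\xE9";                              # hypothetical ISO-8859-1 octets
my $len  = from_to($data, 'iso-8859-1', 'UTF-8');  # converts $data in place
# $data is now "caf\xC3\xA9": valid UTF-8 octets, UTF8 flag still off.
printf "%d octets, flag %s\n", $len, is_utf8($data) ? 'on' : 'off'; # 5 octets, flag off

my $chars = decode('iso-8859-1', "caf\xE9");       # characters, not octets
printf "%d characters\n", length $chars;           # 4 characters
```
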
encode_utf8

    $octets = encode_utf8($string);

Equivalent to "$octets = encode("utf8", $string)". The characters in
$string are encoded in Perl's internal format, and the result is
returned as a sequence of octets. Because all possible characters in
Perl have a (loose, not strict) utf8 representation, this function
cannot fail.

WARNING: do not use this function for data exchange, as it can produce
$octets that are not strictly valid UTF-8! For strictly valid UTF-8
output use "$octets = encode("UTF-8", $string)".

decode_utf8

    $string = decode_utf8($octets [, CHECK]);

Equivalent to "$string = decode("utf8", $octets [, CHECK])". The
sequence of octets represented by $octets is decoded from (loose, not
strict) utf8 into a sequence of logical characters. Because not all
sequences of octets are valid even as loose utf8, it is quite possible
for this function to fail. For CHECK, see "Handling Malformed Data".

WARNING: do not use this function for data exchange, as it can produce
a $string containing characters that strict UTF-8 does not allow! To
accept only strictly valid UTF-8 input, use "$string = decode("UTF-8",
$octets [, CHECK])".

CAVEAT: the input $octets might be modified in place depending on what
is set in CHECK. See "LEAVE_SRC" if you want your inputs to be left
unchanged.

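A minimal sketch contrasting the loose helpers with the strict "UTF-8" encoding. The data is hypothetical: a lone continuation byte ("\x80") stands in for invalid input, and FB_CROAK makes the strict decode die, so it is wrapped in eval. A variable is passed rather than a literal because CHECK may modify the input in place.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8 decode FB_CROAK);

my $octets = encode_utf8("\x{263A}");   # loose utf8; cannot fail
printf "%vX\n", $octets;                # E2.98.BA
my $string = decode_utf8($octets);      # back to the single character U+263A

# For data exchange, decode strictly and trap errors.
my $bad = "\x80";                       # lone continuation byte: never valid
my $ok  = eval { decode('UTF-8', $bad, FB_CROAK); 1 };
print $ok ? "decoded\n" : "rejected: $@";
```
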
Listing available encodings
    use Encode;
    @list = Encode->encodings();

Returns a list of canonical names of available encodings that have
already been loaded. To get a list of all available encodings,
including those that have not yet been loaded, say:

    @all_encodings = Encode->encodings(":all");

Or you can give the name of a specific module:

    @with_jp = Encode->encodings("Encode::JP");

When "::" is not in the name, "Encode::" is assumed.

    @ebcdic = Encode->encodings("EBCDIC");

To find out in detail which encodings are supported by this package,
see Encode::Supported.

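A minimal sketch of the three listing calls. It assumes an ASCII-based platform where Encode::JP is installed (it ships with core Perl). The exact counts depend on the installation, so only list membership is meaningful.

```perl
use strict;
use warnings;
use Encode;

my @loaded = Encode->encodings();        # only encodings already loaded
my @all    = Encode->encodings(':all');  # also those not yet loaded
my @jp     = Encode->encodings('JP');    # same as "Encode::JP"

printf "%d loaded, %d available\n", scalar @loaded, scalar @all;
print "euc-jp comes from Encode::JP\n" if grep { $_ eq 'euc-jp' } @jp;
```
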
Defining Aliases
To add a new alias to a given encoding, use:

    use Encode;
    use Encode::Alias;
    define_alias(NEWNAME => ENCODING);

After that, NEWNAME can be used as an alias for ENCODING. ENCODING may
be either the name of an encoding or an encoding object.

Before you do that, first make sure the alias is nonexistent using
"resolve_alias()", which returns the canonical name thereof. For
example:

    Encode::resolve_alias("latin1") eq "iso-8859-1" # true
    Encode::resolve_alias("iso-8859-12")   # false; nonexistent
    Encode::resolve_alias($name) eq $name  # true if $name is canonical

"resolve_alias()" does not need "use Encode::Alias"; it can be imported
via "use Encode qw(resolve_alias)".

See Encode::Alias for details.

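A minimal sketch of the check-then-define pattern described above. The alias name "my-office-latin" is hypothetical; the sketch defines it only if resolve_alias() does not already know the name, then resolves it.

```perl
use strict;
use warnings;
use Encode qw(find_encoding resolve_alias);
use Encode::Alias;                        # exports define_alias()

my $alias = 'my-office-latin';            # hypothetical alias name

# Define the alias only if the name is not already taken.
define_alias($alias => 'iso-8859-1') unless resolve_alias($alias);

print resolve_alias($alias), "\n";        # iso-8859-1
print find_encoding($alias)->name, "\n";  # iso-8859-1
```
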
Finding IANA Character Set Registry names
The canonical name of a given encoding does not necessarily agree with
the IANA Character Set Registry, commonly seen as "Content-Type:
text/plain; charset=WHATEVER". For most cases, the canonical name
works, but sometimes it does not, most notably with "utf-8-strict".

For this reason, "Encode" version 2.21 added the method "mime_name()".

    use Encode;
    my $enc = find_encoding("UTF-8");
    warn $enc->name;      # utf-8-strict
    warn $enc->mime_name; # UTF-8

See also: Encode::Encoding

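A minimal sketch comparing canonical and MIME names, and looking an encoding up by MIME name with find_mime_encoding(). It assumes a reasonably recent Encode (such as the one bundled with Perl 5.32) that provides find_mime_encoding().

```perl
use strict;
use warnings;
use Encode qw(find_encoding find_mime_encoding);

for my $name ('UTF-8', 'latin1') {
    my $enc = find_encoding($name);
    print "$name: ", $enc->name, ' / ', $enc->mime_name, "\n";
}
# UTF-8: utf-8-strict / UTF-8
# latin1: iso-8859-1 / ISO-8859-1

my $by_mime = find_mime_encoding('utf-8');  # MIME names are case insensitive
my $no_mime = find_mime_encoding('utf8');   # not a MIME name: undef
print $by_mime->name, "\n";                 # utf-8-strict
```
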
Encoding via PerlIO
If your perl supports "PerlIO" (which is the default), you can use a
"PerlIO" layer to decode and encode directly via a filehandle. The
following two examples are functionally identical:

    ### Version 1 via PerlIO
    open(INPUT,  "< :encoding(shiftjis)", $infile)
        || die "Can't open < $infile for reading: $!";
    open(OUTPUT, "> :encoding(euc-jp)",  $outfile)
        || die "Can't open > $outfile for writing: $!";
    while (<INPUT>) {   # auto decodes $_
        print OUTPUT;   # auto encodes $_
    }
    close(INPUT)  || die "can't close $infile: $!";
    close(OUTPUT) || die "can't close $outfile: $!";

    ### Version 2 via from_to()
    open(INPUT,  "< :raw", $infile)
        || die "Can't open < $infile for reading: $!";
    open(OUTPUT, "> :raw",  $outfile)
        || die "Can't open > $outfile for writing: $!";

    while (<INPUT>) {
        from_to($_, "shiftjis", "euc-jp", 1);  # switch encoding
        print OUTPUT;   # emit raw (but properly encoded) data
    }
    close(INPUT)  || die "can't close $infile: $!";
    close(OUTPUT) || die "can't close $outfile: $!";

In the first version above, you let the appropriate encoding layer
handle the conversion. In the second, you explicitly translate from
one encoding to the other.

Unfortunately, some encodings may not be "PerlIO"-savvy. You can check
whether your encoding is supported by "PerlIO" by invoking the
"perlio_ok" method on it:

    Encode::perlio_ok("hz");             # false
    find_encoding("euc-cn")->perlio_ok;  # true wherever PerlIO is available

    use Encode qw(perlio_ok);            # imported upon request
    perlio_ok("euc-jp");

Fortunately, all encodings that come with "Encode" core are
"PerlIO"-savvy except for "hz" and "ISO-2022-kr". For the gory
details, see Encode::Encoding and Encode::PerlIO.

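A minimal sketch of the PerlIO approach that avoids real files. It assumes in-memory filehandles (open on a scalar reference), which PerlIO provides. Reading Latin-1 octets through ":encoding(iso-8859-1)" yields characters, and printing them through ":encoding(UTF-8)" yields UTF-8 octets. The variable names and data are hypothetical.

```perl
use strict;
use warnings;

my $latin1_octets = "caf\xE9\n";          # hypothetical Latin-1 "file" contents

# Reading through a decoding layer yields characters.
open(my $in, '<:encoding(iso-8859-1)', \$latin1_octets)
    or die "Can't open in-memory input: $!";
my $line = <$in>;                         # "caf\x{e9}\n": 5 characters
close($in);

# Writing through an encoding layer produces octets.
my $utf8_octets = '';
open(my $out, '>:encoding(UTF-8)', \$utf8_octets)
    or die "Can't open in-memory output: $!";
print {$out} $line;
close($out) or die "Can't close in-memory output: $!";

printf "%d characters in, %d octets out\n", length $line, length $utf8_octets; # 5 characters in, 6 octets out
```
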
Handling Malformed Data
The optional CHECK argument tells "Encode" what to do when encountering
malformed data. Without CHECK, "Encode::FB_DEFAULT" (== 0) is assumed.

As of version 2.12, "Encode" supports coderef values for "CHECK"; see
below.

NOTE: Not all encodings support this feature. Some encodings ignore
the CHECK argument. For example, Encode::Unicode ignores CHECK and
always croaks on error.

List of CHECK values
FB_DEFAULT

    CHECK = Encode::FB_DEFAULT ( == 0)

If CHECK is 0, encoding and decoding replace any malformed character
with a substitution character. When you encode, SUBCHAR is used. When
you decode, the Unicode REPLACEMENT CHARACTER, code point U+FFFD, is
used. If the data is supposed to be UTF-8, an optional lexical warning
of warning category "utf8" is given.

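A minimal sketch of the default substitution behaviour with hypothetical data: encoding a non-ASCII character to ASCII substitutes "?" (the ASCII SUBCHAR), and decoding an isolated Latin-1 byte as UTF-8 yields U+FFFD. Under "use warnings" the decode may also emit a "utf8" warning.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text  = "caf\x{e9}";              # hypothetical text; U+00E9 has no ASCII mapping
my $ascii = encode('ascii', $text);   # replaced by SUBCHAR "?"

my $bytes = "caf\xE9";                # Latin-1 byte, not valid UTF-8
my $chars = decode('UTF-8', $bytes);  # bad byte becomes U+FFFD

print "$ascii\n";                             # caf?
printf "U+%04X\n", ord substr($chars, -1);    # U+FFFD
```
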
FB_CROAK

    CHECK = Encode::FB_CROAK ( == 1)

If CHECK is 1, methods immediately die with an error message.
Therefore, when CHECK is 1, you should trap exceptions with "eval{}",
unless you really want to let it "die".

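A minimal sketch of trapping FB_CROAK with eval, assuming hypothetical input that is Latin-1 rather than UTF-8. LEAVE_SRC is added so that $octets is guaranteed to stay untouched.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

my $octets = "caf\xE9";   # hypothetical input: Latin-1, not valid UTF-8

# FB_CROAK dies on the first malformed byte; LEAVE_SRC keeps $octets intact.
my $string = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC) };
if (defined $string) {
    print "decoded OK\n";
}
else {
    print "not valid UTF-8: $@";
}
```
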
FB_QUIET

    CHECK = Encode::FB_QUIET

If CHECK is set to "Encode::FB_QUIET", encoding and decoding
immediately return the portion of the data that has been processed so
far when an error occurs. The data argument is overwritten with
everything after that point; that is, the unprocessed portion of the
data. This is handy when you have to call "decode" repeatedly in the
case where your source data may contain partial multi-byte character
sequences (that is, you are reading with a fixed-width buffer). Here's
some sample code to do exactly that:

    my($buffer, $string) = ("", "");
    while (read($fh, $buffer, 256, length($buffer))) {
        $string .= decode($encoding, $buffer, Encode::FB_QUIET);
        # $buffer now contains the unprocessed partial character
    }

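The loop above can be exercised without a real file. This sketch assumes an in-memory filehandle over hypothetical UTF-8 data and a deliberately tiny 3-byte read size, so multi-byte characters are split across reads; FB_QUIET carries each partial character over to the next iteration. A genuinely invalid byte would also stay in $buffer, which is why the buffer is checked after the loop.

```perl
use strict;
use warnings;
use Encode qw(decode FB_QUIET);

# Hypothetical input: "na\x{ef}ve caf\x{e9} \x{263a}" encoded as UTF-8 (16 octets).
my $input = "na\xC3\xAFve caf\xC3\xA9 \xE2\x98\xBA";
open(my $fh, '<:raw', \$input) or die "Can't open in-memory input: $!";

my ($buffer, $string) = ('', '');
while (read($fh, $buffer, 3, length $buffer)) {   # 3-byte reads split characters
    # Decode what is complete; FB_QUIET leaves a trailing partial character in $buffer.
    $string .= decode('UTF-8', $buffer, FB_QUIET);
}
close($fh);

die "incomplete or invalid data at end of input\n" if length $buffer;
```
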
FB_WARN

    CHECK = Encode::FB_WARN

This is the same as "FB_QUIET" above, except that instead of being
silent on errors, it issues a warning. This is handy when you are
debugging.

CAVEAT: All warnings from the Encode module are reported, independently
of the "warnings" pragma settings. If you want to honor the lexical
warnings configured by the "warnings" pragma, also bitwise-OR the CHECK
value with "Encode::ONLY_PRAGMA_WARNINGS". This value is available
since Encode version 2.99.

FB_PERLQQ FB_HTMLCREF FB_XMLCREF

    perlqq mode (CHECK = Encode::FB_PERLQQ)
    HTML charref mode (CHECK = Encode::FB_HTMLCREF)
    XML charref mode (CHECK = Encode::FB_XMLCREF)

For encodings that are implemented by the "Encode::XS" module, "CHECK
== Encode::FB_PERLQQ" puts "encode" and "decode" into "perlqq"
fallback mode.

When you decode, "\xHH" is inserted for a malformed character, where HH
is the hex representation of the octet that could not be decoded to
utf8. When you encode, "\x{HHHH}" will be inserted, where HHHH is the
Unicode code point (in any number of hex digits) of the character that
cannot be found in the character repertoire of the encoding.

The HTML/XML character reference modes are about the same. In place of
"\x{HHHH}", HTML uses "&#NNN;" where NNN is a decimal number, and XML
uses "&#xHHHH;" where HHHH is the hexadecimal number.

In "Encode" 2.10 or later, "LEAVE_SRC" is also implied.

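A minimal sketch showing the three substitution formats when encoding hypothetical text to ASCII. Only the HTML form is given as an expected output below; the exact zero-padding and letter case of the hex digits in the perlqq and XML forms are treated as implementation details.

```perl
use strict;
use warnings;
use Encode qw(encode FB_PERLQQ FB_HTMLCREF FB_XMLCREF);

my $text = "caf\x{e9}";   # hypothetical text; U+00E9 has no ASCII mapping

my %out;
for my $mode ([perlqq => FB_PERLQQ], [html => FB_HTMLCREF], [xml => FB_XMLCREF]) {
    my ($label, $check) = @$mode;
    $out{$label} = encode('ascii', $text, $check);   # LEAVE_SRC is implied
    print "$label: $out{$label}\n";
}
# html: caf&#233;
```
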
The bitmask

These modes are all actually set via a bitmask. Here is how the
"FB_XXX" constants are laid out. You can import the "FB_XXX" constants
via "use Encode qw(:fallbacks)", and you can import the generic bitmask
constants via "use Encode qw(:fallback_all)".

                          FB_DEFAULT FB_CROAK FB_QUIET FB_WARN FB_PERLQQ
    DIE_ON_ERR    0x0001                X
    WARN_ON_ERR   0x0002                                  X
    RETURN_ON_ERR 0x0004                         X        X
    LEAVE_SRC     0x0008                                           X
    PERLQQ        0x0100                                           X
    HTMLCREF      0x0200
    XMLCREF       0x0400

LEAVE_SRC

    Encode::LEAVE_SRC

If the "Encode::LEAVE_SRC" bit is not set but CHECK is set, then the
source string to encode() or decode() will be overwritten in place. If
you don't want that, bitwise-OR "Encode::LEAVE_SRC" into the CHECK
value.

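A minimal sketch of what LEAVE_SRC changes, using hypothetical input that ends in a truncated three-byte UTF-8 sequence. With FB_QUIET alone the source is overwritten with the unprocessed remainder; with FB_QUIET | LEAVE_SRC it is left as it was.

```perl
use strict;
use warnings;
use Encode qw(decode FB_QUIET LEAVE_SRC);

my $octets = "abc\xE2\x82";   # hypothetical: "abc" plus a truncated 3-byte sequence
my $copy   = $octets;

my $s1 = decode('UTF-8', $octets, FB_QUIET);             # $octets is overwritten
my $s2 = decode('UTF-8', $copy,   FB_QUIET | LEAVE_SRC); # $copy is left alone

printf "%s, %d octet(s) left in source\n", $s1, length $octets; # abc, 2 octet(s) left in source
printf "%s, source still %d octets\n",     $s2, length $copy;   # abc, source still 5 octets
```
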
coderef for CHECK
As of "Encode" 2.12, "CHECK" can also be a code reference which takes
the ordinal value of the unmapped character as an argument and returns
octets that represent the fallback character. For instance:

    $ascii = encode("ascii", $utf8, sub{ sprintf "<U+%04X>", shift });

This acts like "FB_PERLQQ", but U+XXXX is used instead of "\x{XXXX}".

The fallback for "decode" must return a decoded string (a sequence of
characters) and takes a list of ordinal values as its arguments. So,
for example, if you wish to decode octets as UTF-8 and use ISO-8859-15
as a fallback for bytes that are not valid UTF-8, you could write:

    $str = decode 'UTF-8', $octets, sub {
        my $tmp = join '', map chr, @_;
        return decode 'ISO-8859-15', $tmp;
    };

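Both coderef forms above, as one runnable sketch with hypothetical data. The encode callback receives a code point; the decode callback receives the ordinals of the offending bytes and here reinterprets them as ISO-8859-15, as in the example above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# encode: the callback gets the code point of each unmappable character.
my $text  = "caf\x{e9}";     # hypothetical text
my $ascii = encode('ascii', $text, sub { sprintf '<U+%04X>', shift });

# decode: the callback gets the ordinals of the bytes that are not valid UTF-8.
my $octets = "caf\xE9";      # hypothetical: a stray ISO-8859-15 byte
my $string = decode('UTF-8', $octets, sub {
    my $tmp = join '', map { chr } @_;
    return decode('ISO-8859-15', $tmp);
});

print "$ascii\n";            # caf<U+00E9>
```
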
Defining Encodings
To define a new encoding, use:

    use Encode qw(define_encoding);
    define_encoding($object, CANONICAL_NAME [, alias...]);

CANONICAL_NAME will be associated with $object. The object should
provide the interface described in Encode::Encoding. If more than two
arguments are provided, additional arguments are considered aliases for
$object.

See Encode::Encoding for details.

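A minimal sketch of define_encoding() with a toy, hypothetical encoding: ROT13 over ASCII letters, registered as "rot13" with the made-up alias "my-rot". The package My::ROT13 is also hypothetical; it inherits from Encode::Encoding, implements only encode and decode, and ignores CHECK, which a real encoding should honour (see Encode::Encoding). It assumes Perl 5.14 or later for the "/r" modifier of tr///.

```perl
use strict;
use warnings;

# Hypothetical toy encoding, for illustration only.
package My::ROT13;
use parent 'Encode::Encoding';

sub encode {
    my ($self, $string, $check) = @_;          # CHECK ignored in this sketch
    return $string =~ tr/A-Za-z/N-ZA-Mn-za-m/r;
}

sub decode {
    my ($self, $octets, $check) = @_;
    return $octets =~ tr/A-Za-z/N-ZA-Mn-za-m/r; # ROT13 is its own inverse
}

package main;
use Encode qw(define_encoding encode decode find_encoding);

# Register the object under a canonical name plus one alias.
define_encoding(bless({ Name => 'rot13' }, 'My::ROT13'), 'rot13', 'my-rot');

my $octets = encode('rot13', 'Hello');        # "Uryyb"
my $string = decode('my-rot', $octets);       # "Hello"
print find_encoding('my-rot')->name, "\n";    # rot13
```
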
The UTF8 flag
Before the introduction of Unicode support in Perl, the "eq" operator
just compared the strings represented by two scalars. Beginning with
Perl 5.8, "eq" compares two strings with simultaneous consideration of
the UTF8 flag. To explain why we made it so, I quote from page 402 of
Programming Perl, 3rd ed.

Goal #1:
    Old byte-oriented programs should not spontaneously break on the old
    byte-oriented data they used to work on.

Goal #2:
    Old byte-oriented programs should magically start working on the new
    character-oriented data when appropriate.

Goal #3:
    Programs should run just as fast in the new character-oriented mode
    as in the old byte-oriented mode.

Goal #4:
    Perl should remain one language, rather than forking into a
    byte-oriented Perl and a character-oriented Perl.

When Programming Perl, 3rd ed. was written, not even Perl 5.6.0 had
been born yet, and many features documented in the book remained
unimplemented for a long time. Perl 5.8 corrected much of this, and
the introduction of the UTF8 flag is one of those corrections. You can
think of there being two fundamentally different kinds of strings and
string operations in Perl: one a byte-oriented mode for when the
internal UTF8 flag is off, and the other a character-oriented mode for
when the internal UTF8 flag is on.

This UTF8 flag is not visible in Perl scripts, for exactly the same
reason you cannot (or rather, you don't have to) see whether a scalar
contains a string, an integer, or a floating-point number. But you
can still peek and poke these if you wish. See the next section.

Messing with Perl's Internals
The following API uses parts of Perl's internals in the current
implementation. As such, these functions are efficient but may change
in a future release.

is_utf8

    is_utf8(STRING [, CHECK])

[INTERNAL] Tests whether the UTF8 flag is turned on in the STRING. If
CHECK is true, also checks whether STRING contains well-formed UTF-8.
Returns true if successful, false otherwise.

Typically only necessary for debugging and testing. Don't use this
flag as a marker to distinguish character and binary data; that should
be decided for each variable when you write your code.

CAVEAT: If STRING has the UTF8 flag set, it does NOT mean that STRING
is UTF-8 encoded, and vice versa.

As of Perl 5.8.1, the utf8 pragma also provides the "utf8::is_utf8"
function.

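A minimal debugging sketch, with hypothetical data, showing that two strings can hold the same characters with different UTF8 flags, and that "eq" compares the characters rather than the flag.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

my $decoded = decode('UTF-8', "caf\xC3\xA9");  # characters, UTF8 flag on
my $latin1  = "caf\xE9";                       # same characters, UTF8 flag off

printf "decoded: %s, latin1: %s\n",
    (is_utf8($decoded) ? 'on' : 'off'),
    (is_utf8($latin1)  ? 'on' : 'off');        # decoded: on, latin1: off

# "eq" compares characters, not the flag or the internal bytes.
print "equal\n" if $decoded eq $latin1;        # equal
```
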
_utf8_on

    _utf8_on(STRING)

[INTERNAL] Turns the STRING's internal UTF8 flag on. The STRING is not
checked for containing only well-formed UTF-8. Do not use this unless
you know with absolute certainty that the STRING holds only well-formed
UTF-8. Returns the previous state of the UTF8 flag (so please don't
treat the return value as indicating success or failure), or "undef" if
STRING is not a string.

NOTE: For security reasons, this function does not work on tainted
values.

_utf8_off

    _utf8_off(STRING)

[INTERNAL] Turns the STRING's internal UTF8 flag off. Do not use
frivolously. Returns the previous state of the UTF8 flag, or "undef"
if STRING is not a string. Do not treat the return value as indicative
of success or failure, because that isn't what it means: it is only the
previous setting.

NOTE: For security reasons, this function does not work on tainted
values.

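A minimal sketch, for illustration only, of what the flag functions do. It assumes an ASCII-based platform, where a flagged "caf\x{e9}" is stored internally as the five bytes "caf\xC3\xA9". It works on a copy and turns the flag back on only because the bytes are known to be well-formed.

```perl
use strict;
use warnings;
use Encode qw(decode _utf8_off _utf8_on);

my $string = decode('UTF-8', "caf\xC3\xA9");  # 4 characters, flag on

my $raw = $string;         # work on a copy, never on shared data
_utf8_off($raw);           # now seen as the 5 bytes of the internal buffer
my $byte_count = length $raw;
printf "%d characters -> %d bytes\n", length $string, $byte_count; # 4 characters -> 5 bytes

_utf8_on($raw);            # safe here only because $raw is known to be well-formed
print "round trip ok\n" if $raw eq $string;   # round trip ok
```
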
UTF-8 vs. utf8 vs. UTF8
    ....We now view strings not as sequences of bytes, but as sequences
    of numbers in the range 0 .. 2**32-1 (or in the case of 64-bit
    computers, 0 .. 2**64-1) -- Programming Perl, 3rd ed.

That has historically been Perl's notion of UTF-8, as that is how UTF-8
was first conceived by Ken Thompson when he invented it. However,
thanks to later revisions to the applicable standards, official UTF-8
is now rather stricter than that. For example, its range is much
narrower (0 .. 0x10_FFFF to cover only 21 bits instead of 32 or 64
bits) and some sequences are not allowed, like those used in surrogate
pairs, the 32 non-character code points 0xFDD0 .. 0xFDEF, the last two
code points in any plane (0xXX_FFFE and 0xXX_FFFF), all non-shortest
encodings, etc.

The former default in which Perl would always use a loose
interpretation of UTF-8 has now been overruled:

    From: Larry Wall <larry@wall.org>
    Date: December 04, 2004 11:51:58 JST
    To: perl-unicode@perl.org
    Subject: Re: Make Encode.pm support the real UTF-8
    Message-Id: <20041204025158.GA28754@wall.org>

    On Fri, Dec 03, 2004 at 10:12:12PM +0000, Tim Bunce wrote:
    : I've no problem with 'utf8' being perl's unrestricted uft8 encoding,
    : but "UTF-8" is the name of the standard and should give the
    : corresponding behaviour.

    For what it's worth, that's how I've always kept them straight in my
    head.

    Also for what it's worth, Perl 6 will mostly default to strict but
    make it easy to switch back to lax.

    Larry

Got that? As of Perl 5.8.7, "UTF-8" means UTF-8 in its current sense,
which is conservative and strict and security-conscious, whereas "utf8"
means UTF-8 in its former sense, which was liberal and loose and lax.
"Encode" version 2.10 or later thus groks this subtle but critically
important distinction between "UTF-8" and "utf8".

    encode("utf8",  "\x{FFFF_FFFF}", 1); # okay
    encode("UTF-8", "\x{FFFF_FFFF}", 1); # croaks

In the "Encode" module, "UTF-8" is actually a canonical name for
"utf-8-strict". That hyphen between the "UTF" and the "8" is critical;
without it, "Encode" goes "liberal" and (perhaps overly-)permissive:

    find_encoding("UTF-8")->name # is 'utf-8-strict'
    find_encoding("utf-8")->name # ditto. names are case insensitive
    find_encoding("utf_8")->name # ditto. "_" are treated as "-"
    find_encoding("UTF8")->name  # is 'utf8'.

Perl's internal UTF8 flag is called "UTF8", without a hyphen. It
indicates whether a string is internally encoded as "utf8", also
without a hyphen.

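A minimal sketch of the distinction, assuming an ASCII-based platform: the two spellings resolve to different encoding objects, and a lone surrogate (U+D800, hypothetical input) is encoded by lax "utf8" but rejected by strict "UTF-8".

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding FB_CROAK LEAVE_SRC);

print find_encoding('UTF-8')->name, "\n";  # utf-8-strict
print find_encoding('UTF8')->name,  "\n";  # utf8

my $surrogate = "\x{D800}";                # hypothetical input: a lone surrogate

my $lax    = encode('utf8', $surrogate);   # loose: encoded anyway
my $strict = eval { encode('UTF-8', $surrogate, FB_CROAK | LEAVE_SRC); 1 };

printf "utf8: %d octets; UTF-8: %s\n", length $lax, $strict ? 'accepted' : 'croaked';
# utf8: 3 octets; UTF-8: croaked
```
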
SEE ALSO
Encode::Encoding, Encode::Supported, Encode::PerlIO, encoding,
perlebcdic, "open" in perlfunc, perlunicode, perluniintro, perlunifaq,
perlunitut, utf8, the Perl Unicode Mailing List
<http://lists.perl.org/list/perl-unicode.html>

MAINTAINER
This project was originated by the late Nick Ing-Simmons and later
maintained by Dan Kogai <dankogai@cpan.org>. See AUTHORS for a full
list of people involved. For any questions, send mail to
<perl-unicode@perl.org> so that we can all share.

While Dan Kogai retains the copyright as a maintainer, credit should go
to all those involved. See AUTHORS for a list of those who submitted
code to the project.

COPYRIGHT
Copyright 2002-2014 Dan Kogai <dankogai@cpan.org>.

This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.


perl v5.32.1 2021-01-27 Encode(3)