utf8::all(3) User Contributed Perl Documentation utf8::all(3)

NAME
utf8::all - turn on Unicode - all of it

VERSION
version 0.024

SYNOPSIS
    use utf8::all;                      # Turn on UTF-8, all of it.

    open my $in, '<', 'contains-utf8';  # UTF-8 already turned on here
    print length 'føø bār';             # 7 characters, not 10 octets
    my $utf8_arg = shift @ARGV;         # @ARGV is UTF-8 too (only for main)

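The character count in the synopsis can be checked with core Perl alone. This sketch does not load utf8::all. It writes 'føø bār' with \x{...} escapes, so it does not depend on how the file itself is encoded. It then compares the length of the character string with the length of its UTF-8 encoding, produced by Encode's encode().

```perl
use strict;
use warnings;
use Encode qw(encode);

my $chars  = "f\x{F8}\x{F8} b\x{101}r";   # 'føø bār' as a character string
my $octets = encode('UTF-8', $chars);      # the same text as UTF-8 octets

my $char_len = length $chars;     # 7: one per character
my $byte_len = length $octets;    # 10: ø and ā take two octets each
print "$char_len characters, $byte_len octets\n";
```
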
DESCRIPTION
The "use utf8" pragma tells the Perl parser to allow UTF-8 in the
program text in the current lexical scope. This means that you can use
literal Unicode characters in strings, variable names, and regular
expressions.

"utf8::all" goes further:

• "charnames" is imported, so "\N{...}" sequences can be used to
  insert Unicode characters by name.

• On Perl "v5.11.0" or higher, the "unicode_strings" feature is
  enabled (as with "use feature 'unicode_strings'").

• On Perl "v5.16.0" or higher, the "fc" and "unicode_eval" features
  are enabled as well.

• Filehandles are opened with UTF-8 encoding turned on by default
  (including "STDIN", "STDOUT", and "STDERR" when "utf8::all" is used
  from the "main" package). This means that they automatically decode
  UTF-8 octets to characters on input and encode characters to UTF-8
  octets on output. If you don't want UTF-8 for a particular
  filehandle, you have to set "binmode $filehandle".

• @ARGV gets converted from UTF-8 octets to Unicode characters (when
  "utf8::all" is used from the "main" package). This is similar to
  the behaviour of the "-CA" perl command-line switch (see perlrun).

• "readdir", "readlink", "readpipe" (including the "qx//" and
  backtick operators), and "glob" (including the fileglob form of the
  "<>" operator, e.g. "<*.txt>") all work with and return Unicode
  characters instead of UTF-8 octets (again, only when "utf8::all" is
  used from the "main" package).

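For comparison, the lexical part of the list above corresponds roughly to the following core-Perl preamble. This is an approximation assembled from the documented behaviour, not utf8::all's actual implementation. Unlike utf8::all, it applies the STDIN/STDOUT/STDERR layers and the @ARGV decoding unconditionally, not only from "main", and it does not replace "readdir", "readlink", "readpipe", or "glob".

```perl
use v5.16;                           # enables strict and the 5.16 feature bundle
use warnings;
use utf8;                            # allow UTF-8 in this source text
use charnames qw(:full :short);      # \N{...} by character name
use feature qw(unicode_strings fc unicode_eval);   # spelled out explicitly
use open qw(:std :encoding(UTF-8));  # UTF-8 layers on new handles and STDIN/STDOUT/STDERR
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Decode command-line arguments from UTF-8 octets to characters (cf. perl -CA).
@ARGV = map { decode('UTF-8', $_, FB_CROAK | LEAVE_SRC) } @ARGV;

my $alpha  = "\N{GREEK SMALL LETTER ALPHA}";     # a single character
my $folded = fc('STRASSE') eq fc("stra\N{LATIN SMALL LETTER SHARP S}e");
print 'alpha length: ', length($alpha),
      '; case-folded match: ', ($folded ? 'yes' : 'no'), "\n";
```

The fc() comparison holds because full case folding maps "ß" to "ss".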
Lexical Scope
The pragma is lexically scoped, so you can do the following if you have
a reason to:

    {
        use utf8::all;
        open my $out, '>', 'outfile';
        my $utf8_str = 'føø bār';
        print length $utf8_str, "\n";       # 7
        print $out $utf8_str;               # written as UTF-8
    }
    open my $in, '<', 'outfile';            # read as raw octets
    my $text = do { local $/; <$in> };
    print length $text, "\n";               # 10, not 7!

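The same 7-versus-10 effect can be reproduced without utf8::all by spelling out the layers explicitly. This sketch uses only core Perl. A File::Temp temporary file stands in for 'outfile', and an explicit ":encoding(UTF-8)" layer stands in for what utf8::all would set up inside the block. On the read side, "binmode" with no layer argument leaves the handle in raw mode, which is also how the text above suggests opting a single handle out of UTF-8.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# 'føø bār' written with escapes: 7 characters, 10 octets in UTF-8.
my $str = "f\x{F8}\x{F8} b\x{101}r";
my (undef, $path) = tempfile(UNLINK => 1);    # stands in for 'outfile'

{
    # Write through a UTF-8 encoding layer, as a handle opened under
    # utf8::all would be.
    open my $out, '>:encoding(UTF-8)', $path or die "open $path: $!";
    print {$out} $str;
    close $out or die "close $path: $!";
}

# Read it back with no decoding layer.
open my $in, '<', $path or die "open $path: $!";
binmode $in;                                  # raw: octets in, octets out
my $text = do { local $/; <$in> };
close $in;

my $written = length $str;     # 7 characters
my $read    = length $text;    # 10 octets
print "$written characters written, $read octets read\n";
```
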
Instead of relying on lexical scoping, you can also use "no utf8::all"
to turn off the effects.

Note that the effect on @ARGV and the "STDIN", "STDOUT", and "STDERR"
filehandles is always global and cannot be undone!

Enabling/Disabling Global Features
As described above, the default behaviour of "utf8::all" is to convert
@ARGV, to open the "STDIN", "STDOUT", and "STDERR" filehandles with
UTF-8 encoding, and to override the "readlink" and "readdir" functions
and the "glob" operator when "utf8::all" is used from the "main"
package.

If you want to disable these features even when "utf8::all" is used
from the "main" package, add the option "NO-GLOBAL" (or its equivalent
"LEXICAL-ONLY") to the use line, for example:

    use utf8::all 'NO-GLOBAL';

If, on the other hand, you want to enable these global effects even
when "utf8::all" is used from a package other than "main", use the
option "GLOBAL" on the use line:

    use utf8::all 'GLOBAL';

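The option handling described above amounts to a small decision rule. The helper below is hypothetical. It is not part of utf8::all's API and not its source code. It also assumes that "NO-GLOBAL"/"LEXICAL-ONLY" takes precedence when combined with "GLOBAL", which the text above does not specify.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of utf8::all: do the global effects
# (@ARGV, STDIN/STDOUT/STDERR, readdir/readlink/glob) apply?
sub wants_global {
    my ($caller, @options) = @_;
    my %opt = map { $_ => 1 } @options;
    return 0 if $opt{'NO-GLOBAL'} || $opt{'LEXICAL-ONLY'};   # assumed to win
    return 1 if $opt{'GLOBAL'};
    return $caller eq 'main' ? 1 : 0;                         # default: only from main
}

for my $case (['main'], ['main', 'NO-GLOBAL'], ['My::Module'], ['My::Module', 'GLOBAL']) {
    my ($pkg, @opts) = @$case;
    printf "%-12s %-10s => %s\n", $pkg, "@opts",
        wants_global(@$case) ? 'global' : 'lexical only';
}
```

In a real "import" method, the calling package would come from caller().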
UTF-8 Errors
"utf8::all" treats malformed UTF-8 (octet sequences that do not decode
to a valid Unicode character) as a fatal error.

For "glob", "readdir", and "readlink", you can change this behaviour by
setting the package variable "$utf8::all::UTF8_CHECK".

$utf8::all::UTF8_CHECK
By default, "utf8::all" treats decoding errors as fatal (the default
value of this setting is "Encode::FB_CROAK"). If you want, you can
change this by setting $utf8::all::UTF8_CHECK. The value
"Encode::FB_WARN" reports decoding errors as warnings, and
"Encode::FB_DEFAULT" silently replaces malformed data with the
substitution character U+FFFD. See Encode for details. Note:
"Encode::LEAVE_SRC" is always enforced.

Important: this setting only controls the handling of decoding errors
in "glob", "readdir", and "readlink".

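To see what these settings mean in practice, the sketch below calls Encode's decode() directly on a byte string that is not valid UTF-8 (a Latin-1 encoded 'café'). It adds LEAVE_SRC, as the text says utf8::all always does. It only demonstrates Encode's behaviour. It does not set $utf8::all::UTF8_CHECK or call "glob", "readdir", or "readlink".

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK FB_WARN FB_DEFAULT LEAVE_SRC);

my $octets = "caf\xE9";    # Latin-1 'café': the lone \xE9 is not valid UTF-8

# FB_CROAK (the utf8::all default): decoding dies.
my $croaked = !eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC); 1 };

# FB_WARN: warns and returns only the data decoded before the error.
my $warning;
my $warned = do {
    local $SIG{__WARN__} = sub { $warning = shift };
    decode('UTF-8', $octets, FB_WARN | LEAVE_SRC);
};

# FB_DEFAULT: no error is reported; the bad octet becomes U+FFFD.
my $replaced = decode('UTF-8', $octets, FB_DEFAULT | LEAVE_SRC);

printf "FB_CROAK: %s\nFB_WARN: %s, kept %d chars\nFB_DEFAULT: %d chars, ends in U+%04X\n",
    $croaked ? 'died' : 'did not die',
    defined $warning ? 'warned' : 'no warning', length $warned,
    length $replaced, ord substr($replaced, -1);
```
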
INTERACTION WITH AUTODIE
If you use autodie, which is a great idea, you need at least version
2.12, released on June 26, 2012
<https://metacpan.org/source/PJF/autodie-2.12/Changes#L3>. Otherwise,
autodie obliterates the I/O layers set by the open pragma. See RT #54777
<https://rt.cpan.org/Ticket/Display.html?id=54777> and GH #7
<https://github.com/doherty/utf8-all/issues/7>.

BUGS
Please report any bugs or feature requests on the bugtracker website
<https://github.com/doherty/utf8-all/issues>.

When submitting a bug or request, please include a test file or a patch
to an existing test file that illustrates the bug or desired feature.

COMPATIBILITY
The filesystems of DOS, Windows, and OS/2 do not (fully) support UTF-8.
The "readlink" and "readdir" functions and the "glob" operator are
therefore not replaced on these systems.

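Where the replacements are not installed (on the systems above, or under "NO-GLOBAL"), directory entries come back as octets. The sketch below decodes them by hand with core modules. read_dir_chars is a hypothetical helper, not utf8::all's code. It assumes that names on disk are UTF-8, and it mirrors the platform exclusion above by checking $^O against 'dos', 'MSWin32', and 'os2'.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);
use File::Temp qw(tempdir);

# Hypothetical helper: list a directory, decoding entry names from UTF-8
# octets to characters, except on the platforms named above.
sub read_dir_chars {
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir $dir: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return @names if $^O =~ /\A(?:dos|MSWin32|os2)\z/;    # leave names as-is
    return map { decode('UTF-8', $_, FB_CROAK | LEAVE_SRC) } @names;
}

my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/plain.txt" or die "create $dir/plain.txt: $!";
close $fh;

my @listing = read_dir_chars($dir);
print "@listing\n";
```
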
SEE ALSO
• File::Find::utf8 for fully UTF-8-aware File::Find functions.

• Cwd::utf8 for fully UTF-8-aware Cwd functions.

AUTHORS
• Michael Schwern <mschwern@cpan.org>

• Mike Doherty <doherty@cpan.org>

• Hayo Baan <info@hayobaan.com>

COPYRIGHT AND LICENSE
This software is copyright (c) 2009 by Michael Schwern
<mschwern@cpan.org>; he originated it.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

perl v5.36.0 2022-07-22 utf8::all(3)