Catalyst::UTF8(3)     User Contributed Perl Documentation    Catalyst::UTF8(3)

Name

6       Catalyst::UTF8 - All About UTF8 and Catalyst Encoding
7

Description

       Starting with version 5.90080, Catalyst enables UTF8 encoding by
       default for text-like body responses.  In addition we've made many
       fixes around encoding and utf8 throughout the codebase.  This document
       gives an overview of the assumptions and practices that Catalyst uses
       when dealing with UTF8 and encoding issues.  You should also review
       the Changes file, Catalyst::Delta and Catalyst::Upgrading for more.

       We attempt to describe all relevant processes, give some advice, and
       explain where we have made exceptions in order to respect our
       commitment to backwards compatibility.

UTF8 in Controller Actions

22       Using UTF8 characters in your Controller classes and actions.
23
24   Summary
25       In this section we will review changes to how UTF8 characters can be
26       used in controller actions, how it looks in the debugging screens (and
27       your logs) as well as how you construct URL objects to actions with
28       UTF8 paths (or using UTF8 args or captures).
29
30   Unicode in Controllers and URLs
31           package MyApp::Controller::Root;
32
33           use utf8;
34           use base 'Catalyst::Controller';
35
36           sub heart_with_arg :Path('♥') Args(1)  {
37             my ($self, $c, $arg) = @_;
38           }
39
40           sub base :Chained('/') CaptureArgs(0) {
41             my ($self, $c) = @_;
42           }
43
44             sub capture :Chained('base') PathPart('♥') CaptureArgs(1) {
45               my ($self, $c, $capture) = @_;
46             }
47
48               sub arg :Chained('capture') PathPart('♥') Args(1) {
49                 my ($self, $c, $arg) = @_;
50               }
51
52   Discussion
53       In the example controller above we have constructed two matchable URL
54       routes:
55
56           http://localhost/root/♥/{arg}
57           http://localhost/base/♥/{capture}/♥/{arg}
58
59       The first one is a classic Path type action and the second uses
60       Chaining, and spans three actions in total.  As you can see, you can
61       use unicode characters in your Path and PathPart attributes (remember
62       to use the "utf8" pragma to allow these multibyte characters in your
63       source).  The two constructed matchable routes would match the
64       following incoming URLs:
65
66           (heart_with_arg) -> http://localhost/root/%E2%99%A5/{arg}
67           (base/capture/arg) -> http://localhost/base/%E2%99%A5/{capture}/%E2%99%A5/{arg}
68
       The path segment "%E2%99%A5" is URL encoded unicode (assuming you are
       hitting this with a reasonably modern browser).  It's basically what
       goes over HTTP when you type a browser location that has the unicode
       'heart' in it.  However we will use the unicode symbol in your
       debugging messages:
74
           [debug] Loaded Path actions:
           .-------------------------------------+--------------------------------------.
           | Path                                | Private                              |
           +-------------------------------------+--------------------------------------+
           | /root/♥/*                           | /root/heart_with_arg                 |
           '-------------------------------------+--------------------------------------'
81
82           [debug] Loaded Chained actions:
83           .-------------------------------------+--------------------------------------.
84           | Path Spec                           | Private                              |
85           +-------------------------------------+--------------------------------------+
86           | /base/♥/*/♥/*                       | /root/base (0)                       |
87           |                                     | -> /root/capture (1)                 |
88           |                                     | => /root/arg                         |
89           '-------------------------------------+--------------------------------------'
90
       And if the requested URL uses unicode characters in your captures or
       args (such as "http://localhost/base/♥/♥/♥/♥") you should see the
       arguments and captures as their unicode characters as well:
94
95           [debug] Arguments are "♥"
96           [debug] "GET" request for "base/♥/♥/♥/♥" from "127.0.0.1"
97           .------------------------------------------------------------+-----------.
98           | Action                                                     | Time      |
99           +------------------------------------------------------------+-----------+
100           | /root/base                                                 | 0.000080s |
101           | /root/capture                                              | 0.000075s |
102           | /root/arg                                                  | 0.000755s |
103           '------------------------------------------------------------+-----------'
104
       Again, remember that we are displaying the unicode character and using
       it to match actions containing such multibyte characters BUT over HTTP
       you are getting these as URL encoded bytes.  For example if you looked
       at the PSGI $env value for "REQUEST_URI" you would see (for the above
       request):
110
111           REQUEST_URI => "/base/%E2%99%A5/%E2%99%A5/%E2%99%A5/%E2%99%A5"
112
       So on the incoming request we decode the URL encoding and then the
       UTF8 so that we can match and display unicode characters.  This makes
       it straightforward to use these types of multibyte characters in your
       actions and see them incoming in captures and arguments.  Please keep
       this in mind if you are doing, for example, regular expression
       matching, length determination or other string comparisons: you will
       need to treat these incoming variables as decoded character strings,
       not bytes.  For example in the following action:
121
122               sub arg :Chained('capture') PathPart('♥') Args(1) {
123                 my ($self, $c, $arg) = @_;
124               }
125
126       when $arg is "♥" you should expect "length($arg)" to be 1 since it is
127       indeed one character although it will take more than one byte to store.
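
       The difference between the bytes that arrive over HTTP and the
       character string your action receives can be sketched with core
       modules only.  This is an illustration of the idea, not Catalyst's
       internal code; "percent_decode" is a hypothetical helper introduced
       here for the example.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not a Catalyst API): turn %XX escapes back into bytes.
sub percent_decode {
    my ($segment) = @_;
    $segment =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $segment;
}

# What REQUEST_URI carries for the heart: three UTF-8 bytes, percent encoded.
my $bytes = percent_decode('%E2%99%A5');

# Decoding the UTF-8 bytes yields a single character, which is what an
# action would see in $arg.
my $arg = Encode::decode('UTF-8', $bytes);

printf "bytes=%d characters=%d\n", length($bytes), length($arg);
# prints "bytes=3 characters=1"
```

       The same string therefore has a length of 3 before decoding and 1
       afterwards, which is why length checks and regular expressions should
       be written against characters.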
128
129   UTF8 in constructing URLs via $c->uri_for
130       For the reverse (constructing meaningful URLs to actions that contain
131       multibyte characters in their paths or path parts, or when you want to
132       include such characters in your captures or arguments) Catalyst will do
133       the right thing (again just remember to use the "utf8" pragma).
134
135           use utf8;
136           my $url = $c->uri_for( $c->controller('Root')->action_for('arg'), ['♥','♥']);
137
138       When you stringify this object (for use in a template, for example) it
139       will automatically do the right thing regarding utf8 encoding and url
140       encoding.
141
142           http://localhost/base/%E2%99%A5/%E2%99%A5/%E2%99%A5/%E2%99%A5
143
144       Since again what you want is a properly url encoded version of this.
145       In this case your string length will reflect URL encoded bytes, not the
146       character length.  Ultimately what you want to send over the wire via
147       HTTP needs to be bytes.
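
       To make the length remark concrete, here is a minimal sketch of the
       two steps involved (UTF-8 encoding, then percent encoding).  It does
       not call $c->uri_for; "encode_path_segment" is a hypothetical helper
       written for this example, and it assumes that everything outside the
       RFC 3986 unreserved set should be escaped.

```perl
use strict;
use warnings;
use utf8;
use Encode ();

# Hypothetical helper (not a Catalyst API): characters -> UTF-8 bytes ->
# percent-encoded ASCII.
sub encode_path_segment {
    my ($chars) = @_;
    my $bytes = Encode::encode('UTF-8', $chars);
    $bytes =~ s/([^A-Za-z0-9._~-])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

my $segment = encode_path_segment('♥');

print "$segment\n";                                  # prints "%E2%99%A5"
printf "%d vs %d\n", length('♥'), length($segment);  # prints "1 vs 9"
```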
148

UTF8 in GET Query and Form POST

150       What Catalyst does with UTF8 in your GET and classic HTML Form POST
151
152   UTF8 in URL query and keywords
153       The same rules that we find in URL paths also cover URL query parts.
154       That is if one types a URL like this into the browser
155
156           http://localhost/example?♥=♥♥
157
       When this goes 'over the wire' to your application server it's going
       to be as percent encoded bytes:
160
161           http://localhost/example?%E2%99%A5=%E2%99%A5%E2%99%A5
162
163       When Catalyst encounters this we decode the percent encoding and the
164       utf8 so that we can properly display this information (such as in the
165       debugging logs or in a response.)
166
167           [debug] Query Parameters are:
168           .-------------------------------------+--------------------------------------.
169           | Parameter                           | Value                                |
170           +-------------------------------------+--------------------------------------+
171           | ♥                                   | ♥♥                                   |
172           '-------------------------------------+--------------------------------------'
173
174       All the values and keys that are part of $c->req->query_parameters will
175       be utf8 decoded.  So you should not need to do anything special to take
176       those values/keys and send them to the body response (since as we will
177       see later Catalyst will do all the necessary encoding for you).
178
       Again, remember that the values of your parameters are now decoded
       into Unicode strings, so for example you'd expect the result of length
       to reflect the character length, not the byte length.
182
183       Just like with arguments and captures, you can use utf8 literals (or
184       utf8 strings) in $c->uri_for:
185
186           use utf8;
187           my $url = $c->uri_for( $c->controller('Root')->action_for('example'), {'♥' => '♥♥'});
188
189       When you stringify this object (for use in a template, for example) it
190       will automatically do the right thing regarding utf8 encoding and url
191       encoding.
192
193           http://localhost/example?%E2%99%A5=%E2%99%A5%E2%99%A5
194
195       Since again what you want is a properly url encoded version of this.
196       Ultimately what you want to send over the wire via HTTP needs to be
197       bytes (not unicode characters).
198
199       Remember if you use any utf8 literals in your source code, you should
200       use the "use utf8" pragma.
201
       NOTE: Assuming UTF-8 in your query parameters and keywords may be an
       issue if you have legacy code where you created URLs in templates
       manually and used an encoding other than UTF-8.  In these cases you may
       find that Catalyst 5.90080 and later will decode them incorrectly.  For
       backwards compatibility we offer three configuration settings, here
       described in order of precedence:
208
209       "do_not_decode_query"
210
211       If true, then do not try to character decode any wide characters in
212       your request URL query or keywords.  You will need to handle this
213       manually in your action code (although if you choose this setting,
214       chances are you already do this).
215
216       "default_query_encoding"
217
218       This setting allows one to specify a fixed value for how to decode your
219       query, instead of using the default, UTF-8.
220
221       "decode_query_using_global_encoding"
222
223       If this is true we decode using whatever you set "encoding" to.
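
       The order of precedence described above can be modelled roughly as
       follows.  This is a sketch of the documented rules, not the code
       Catalyst actually runs; "query_encoding_for" is a hypothetical
       function, and we assume that "default_query_encoding" and "encoding"
       hold encoding names.

```perl
use strict;
use warnings;

# Hypothetical model of the documented precedence.  Returns the name of the
# encoding used to decode the query, or nothing when decoding is skipped.
sub query_encoding_for {
    my (%config) = @_;

    # 1. do_not_decode_query wins: the query stays as raw bytes.
    return if $config{do_not_decode_query};

    # 2. A fixed query encoding, if one was configured.
    return $config{default_query_encoding}
        if defined $config{default_query_encoding};

    # 3. Fall back to the global "encoding" setting when asked to.
    return $config{encoding}
        if $config{decode_query_using_global_encoding}
        && defined $config{encoding};

    # 4. Otherwise the default applies.
    return 'UTF-8';
}

my $fixed = query_encoding_for(
    default_query_encoding             => 'iso-8859-1',
    decode_query_using_global_encoding => 1,
    encoding                           => 'Shift_JIS',
);
print "$fixed\n";    # prints "iso-8859-1"
```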
224
225   UTF8 in Form POST
226       In general most modern browsers will follow the specification, which
227       says that POSTed form fields should be encoded in the same way that the
228       document was served with.  That means that if you are using modern
229       Catalyst and serving UTF8 encoded responses, a browser is supposed to
230       notice that and encode the form POSTs accordingly.
231
       As a result, since Catalyst now serves UTF8 encoded responses by
       default, you can mostly rely on incoming form POSTs to be so encoded.
       Catalyst will make this assumption and decode accordingly (unless you
       explicitly turn off encoding).  If you are running Catalyst in
       developer debug mode, you will see the correct unicode characters in
       the debug output.  For example if you generate a POST request:
239
240           use Catalyst::Test 'MyApp';
241           use utf8;
242
243           my $res = request POST "/example/posted", ['♥'=>'♥', '♥♥'=>'♥'];
244
245       Running in CATALYST_DEBUG=1 mode you should see output like this:
246
247           [debug] Body Parameters are:
248           .-------------------------------------+--------------------------------------.
249           | Parameter                           | Value                                |
250           +-------------------------------------+--------------------------------------+
251           | ♥                                   | ♥                                    |
252           | ♥♥                                  | ♥                                    |
253           '-------------------------------------+--------------------------------------'
254
255       And if you had a controller like this:
256
           package MyApp::Controller::Example;

           use utf8;
           use base 'Catalyst::Controller';

           sub posted :POST Local {
               my ($self, $c) = @_;
               $c->res->content_type('text/plain');
               $c->res->body("hearts => ${\$c->req->body_parameters->{'♥'}}");
           }
266
267       The following test case would be true:
268
           use Encode 2.21 'decode_utf8';
           is decode_utf8($res->content), 'hearts => ♥';
271
272       In this case we decode so that we can print and compare strings with
273       multibyte characters.
274
       NOTE  In some cases browsers may not follow the specification and set
       the form POST encoding based on the server response.  Catalyst itself
       doesn't attempt any workarounds, but one common approach is to use a
       hidden form field with a UTF8 value (you might be familiar with this
       from how Ruby on Rails has HTML form helpers that do that
       automatically).  In that case some browsers will send UTF8 encoded
       data if they notice the hidden input field contains such a character.
       Also, you can add an HTML attribute to your form tag which many modern
       browsers will respect to set the encoding (accept-charset="utf-8").
       And lastly there are some javascript based tricks and workarounds for
       even more odd cases (searching the web will return a number of
       approaches).  Hopefully as more compliant browsers become popular
       these edge cases will fade.
288
       NOTE  It is possible for a form POST multipart request (normally a
       file upload) to contain inline content with mixed content character
       sets and encoding.  For example one might create a POST like this:
292
           use utf8;
           use Encode;
           use HTTP::Request::Common;
295
296           my $utf8 = 'test ♥';
297           my $shiftjs = 'test テスト';
298           my $req = POST '/root/echo_arg',
299               Content_Type => 'form-data',
300                 Content =>  [
301                   arg0 => 'helloworld',
302                   Encode::encode('UTF-8','♥') => Encode::encode('UTF-8','♥♥'),
303                   arg1 => [
304                     undef, '',
305                     'Content-Type' =>'text/plain; charset=UTF-8',
306                     'Content' => Encode::encode('UTF-8', $utf8)],
307                   arg2 => [
308                     undef, '',
309                     'Content-Type' =>'text/plain; charset=SHIFT_JIS',
310                     'Content' => Encode::encode('SHIFT_JIS', $shiftjs)],
311                   arg2 => [
312                     undef, '',
313                     'Content-Type' =>'text/plain; charset=SHIFT_JIS',
314                     'Content' => Encode::encode('SHIFT_JIS', $shiftjs)],
315                 ];
316
       In this case we've created a POST request in which each part specifies
       its own content character set (and setting a content encoding would
       also be possible).  Generally one would not run into this situation in
       a web browser context, but for completeness' sake Catalyst will notice
       if a multipart POST contains parts with complex or extended header
       information.  In these cases we will try to inspect the meta data and
       do the right thing (in the above case we'd use SHIFT_JIS to decode,
       not UTF-8).  However if after inspecting the headers we cannot figure
       out how to decode the data, we will not attempt to apply decoding to
       the form values.  Instead the part will be represented as an instance
       of Catalyst::Request::PartData, which will contain all the header
       information needed for you to perform custom parsing of the data.
330
331       Ideally we'd fix Catalyst to be smarter about decoding so please submit
332       your cases of this so we can add intelligence to the parser and find a
333       way to extract a valid value out of it.
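
       The idea of choosing the decoder from each part's own headers can be
       sketched as below.  This is only an illustration using the core
       Encode module, not the parser Catalyst uses; "decode_part" is a
       hypothetical helper, and the sample part mirrors the SHIFT_JIS part in
       the request above.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode find_encoding);

# Hypothetical helper (not a Catalyst API): decode one multipart part using
# the charset named in its own Content-Type header.  Returns nothing when no
# usable charset is found, so the caller can keep the raw part.
sub decode_part {
    my ($content_type, $bytes) = @_;
    my ($charset) = $content_type =~ /;\s*charset="?([^\s";]+)"?/i;
    return unless defined $charset;
    my $encoding = find_encoding($charset) or return;
    return $encoding->decode($bytes);
}

my $sjis_bytes = encode('SHIFT_JIS', 'test テスト');

# The part declares SHIFT_JIS, so that is what we decode with.
my $decoded = decode_part('text/plain; charset=SHIFT_JIS', $sjis_bytes);

# No charset in the header: we decline to guess.
my $unknown = decode_part('application/octet-stream', $sjis_bytes);

printf "decoded %d characters from %d bytes\n",
    length($decoded), length($sjis_bytes);
```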
334

UTF8 Encoding in Body Response

       When does Catalyst encode your response body, and what rules does it
       use to determine when that is needed?
338
339   Summary
340           use utf8;
341           use warnings;
342           use strict;
343
344           package MyApp::Controller::Root;
345
346           use base 'Catalyst::Controller';
347           use File::Spec;
348
349           sub scalar_body :Local {
350               my ($self, $c) = @_;
351               $c->response->content_type('text/html');
352               $c->response->body("<p>This is scalar_body action ♥</p>");
353           }
354
355           sub stream_write :Local {
356               my ($self, $c) = @_;
357               $c->response->content_type('text/html');
358               $c->response->write("<p>This is stream_write action ♥</p>");
359           }
360
361           sub stream_write_fh :Local {
362               my ($self, $c) = @_;
363               $c->response->content_type('text/html');
364
365               my $writer = $c->res->write_fh;
366               $writer->write_encoded('<p>This is stream_write_fh action ♥</p>');
367               $writer->close;
368           }
369
370           sub stream_body_fh :Local {
371               my ($self, $c) = @_;
372               my $path = File::Spec->catfile('t', 'utf8.txt');
373               open(my $fh, '<', $path) || die "trouble: $!";
374               $c->response->content_type('text/html');
375               $c->response->body($fh);
376           }
377
378   Discussion
       Beginning with Catalyst version 5.90080 you no longer need to set the
       encoding configuration (although doing so won't hurt anything).
381
382       Currently we only encode if the content type is one of the types which
383       generally expects a UTF8 encoding.  This is determined by the following
384       regular expression:
385
386           our $DEFAULT_ENCODE_CONTENT_TYPE_MATCH = qr{text|xml$|javascript$};
387           $c->response->content_type =~ /$DEFAULT_ENCODE_CONTENT_TYPE_MATCH/
388
389       This is a global variable in Catalyst::Response which is stored in the
390       "encodable_content_type" attribute of $c->response.  You may currently
391       alter this directly on the response or globally.  In the future we may
392       offer a configuration setting for this.
393
394       This would match content-types like the following (examples)
395
396           text/plain
397           text/html
398           text/xml
399           application/javascript
400           application/xml
401           application/vnd.user+xml
402
403       You should set your content type prior to header finalization if you
404       want Catalyst to encode.
405
       NOTE We do not attempt to encode "application/json" since the two most
       commonly used approaches (Catalyst::View::JSON and
       Catalyst::Action::REST) have already configured their JSON encoders to
       produce properly encoded UTF8 responses.  If you are rolling your own
       JSON encoding, you may need to set the encoder to do the right thing
       (or override the global regular expression to include the JSON media
       type).
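
       You can check which content types the regular expression quoted above
       would select with a short standalone script.  The pattern is copied
       from this document; the variable name and the list of types are just
       for the example.

```perl
use strict;
use warnings;

# The pattern quoted in this document.
my $encodable = qr{text|xml$|javascript$};

my %expect;
for my $type (qw(
    text/html
    application/javascript
    application/vnd.user+xml
    application/json
    image/png
)) {
    $expect{$type} = $type =~ $encodable ? 1 : 0;
    printf "%-26s %s\n", $type, $expect{$type} ? 'encoded' : 'left alone';
}
```

       Note that "application/json" and "image/png" do not match, which is
       consistent with the JSON note above.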
413
414   Encoding with Scalar Body
       Catalyst supports several methods of supplying your response with body
       content.  The first and currently most common is to set the
       Catalyst::Response ->body with a scalar string (as in the example):
418
419           use utf8;
420
421           sub scalar_body :Local {
422               my ($self, $c) = @_;
423               $c->response->content_type('text/html');
424               $c->response->body("<p>This is scalar_body action ♥</p>");
425           }
426
427       In general you should need to do nothing else since Catalyst will
428       automatically encode this string during body finalization.  The only
429       matter to watch out for is to make sure the string has not already been
430       encoded, as this will result in double encoding errors.
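
       A quick way to see what double encoding does is to encode the same
       string twice with the core Encode module.  This sketch is independent
       of Catalyst; it only shows why an already encoded body must not be
       encoded again.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

my $chars = '♥';                       # one character
my $once  = encode('UTF-8', $chars);   # three bytes: E2 99 A5
my $twice = encode('UTF-8', $once);    # each byte is treated as a Latin-1
                                       # character and encoded again

printf "characters=%d once=%d twice=%d\n",
    length($chars), length($once), length($twice);
# prints "characters=1 once=3 twice=6"
```

       The six byte result no longer decodes to a heart, which is the kind
       of garbled output a double encoded response produces.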
431
       NOTE Pay attention to the content-type setting in the example.
       Catalyst inspects that content type carefully to determine if the body
       needs encoding.
435
436       NOTE If you set the character set of the response Catalyst will skip
437       encoding IF the character set is set to something that doesn't match
438       $c->encoding->mime_name. We will assume if you are setting an
439       alternative character set, that means you want to handle the encoding
440       yourself.  However it might be easier to set $c->encoding for a given
441       response cycle since you can override this for a given response.  For
442       example here's how to override the default encoding and set the correct
443       character set in the response:
444
           use utf8;
           use Encode;

           sub override_encoding :Local {
               my ($self, $c) = @_;
               $c->res->content_type('text/plain');
               $c->encoding(Encode::find_encoding('Shift_JIS'));
               $c->response->body("テスト");
           }
451
452       This will use the alternative encoding for a single response.
453
454       NOTE If you manually set the content-type character set to whatever
455       $c->encoding->mime_name is set to, we STILL encode, rather than assume
456       your manual setting is a flag to override.  This is done to support
457       backward compatible assumptions (in particular Catalyst::View::TT has
458       set a utf-8 character set in its default content-type for ages, even
459       though it does not itself do any encoding on the body response).  If
460       you are going to handle encoding manually you may set
461       $c->clear_encoding for a single request response cycle, or as in the
462       above example set an alternative encoding.
463
464   Encoding with streaming type responses
465       Catalyst offers two approaches to streaming your body response.  Again,
466       you must remember to set your content type prior to streaming, since
467       invoking a streaming response will automatically finalize and send your
468       HTTP headers (and your content type MUST be one that matches the
469       regular expression given above.)
470
471       Also, if you are going to override $c->encoding (or invoke
472       $c->clear_encoding), you should do that before anything else!
473
474       The first streaming method is to use the "write" method on the response
475       object.  This method allows 'inlined' streaming and is generally used
476       with blocking style servers.
477
478           sub stream_write :Local {
479               my ($self, $c) = @_;
480               $c->response->content_type('text/html');
481               $c->response->write("<p>This is stream_write action ♥</p>");
482           }
483
484       You may call the "write" method as often as you need to finish
485       streaming all your content.  Catalyst will encode each line in turn as
486       long as the content-type meets the 'encodable types' requirement and
487       $c->encoding is set (which it is, as long as you did not change it).
488
489       NOTE If you try to change the encoding after you start the stream, this
490       will invoke an error response.  However since you've already started
491       streaming this will not show up as an HTTP error status code, but
492       rather error information in your body response and an error in your
493       logs.
494
495       NOTE If you use ->body AFTER using ->write (for example you may do this
496       to write your HTML HEAD information as fast as possible) we expect the
497       contents to body to be encoded as it normally would be if you never
498       called ->write.  In general unless you are doing weird custom stuff
499       with encoding this is likely to just already do the correct thing.
500
501       The second way to stream a response is to get the response writer
502       object and invoke methods on that directly:
503
504           sub stream_write_fh :Local {
505               my ($self, $c) = @_;
506               $c->response->content_type('text/html');
507
508               my $writer = $c->res->write_fh;
509               $writer->write_encoded('<p>This is stream_write_fh action ♥</p>');
510               $writer->close;
511           }
512
       This can be used just like the "write" method, but typically you
       request this object when you want to do a nonblocking style response
       since the writer object can be closed over or sent to a model that will
       invoke it in a non blocking manner.  For more on using the writer
       object for non blocking responses you should review the "Catalyst"
       documentation and also you can look at several articles from the 2013
       Catalyst advent calendar, in particular:
520
521       <http://www.catalystframework.org/calendar/2013/10>,
522       <http://www.catalystframework.org/calendar/2013/11>,
523       <http://www.catalystframework.org/calendar/2013/12>,
524       <http://www.catalystframework.org/calendar/2013/13>,
525       <http://www.catalystframework.org/calendar/2013/14>.
526
       The main difference from earlier releases is that calling ->write_fh
       used to return the actual Plack writer object that was supplied by
       your Plack application handler, whereas now we wrap that object in a
       lightweight decorator object that proxies the "write" and "close"
       methods and supplies an additional "write_encoded" method.
532       "write_encoded" does the exact same thing as "write" except that it
533       will first encode the string when necessary.  In general if you are
534       streaming encodable content such as HTML this is the method to use.  If
535       you are streaming binary content, you should just use the "write"
536       method (although if the content type is set correctly we would skip
537       encoding anyway, but you may as well avoid the extra noop overhead).
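
       The decorator arrangement described above can be pictured with a
       small standalone sketch.  Both classes below are hypothetical
       stand-ins written for this example (they are not Catalyst or Plack
       classes), and the sketch always encodes, whereas Catalyst also
       consults the content type.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(find_encoding);

# Stand-in for the writer supplied by the server; it only collects bytes.
package My::FakeWriter {
    sub new   { return bless { chunks => [], closed => 0 }, shift }
    sub write { my ($self, $bytes) = @_; push @{ $self->{chunks} }, $bytes; return }
    sub close { $_[0]{closed} = 1; return }
}

# Hypothetical decorator: proxies write and close, adds write_encoded.
package My::EncodingWriter {
    sub new {
        my ($class, %args) = @_;
        return bless { writer => $args{writer}, encoding => $args{encoding} }, $class;
    }
    sub write { my $self = shift; return $self->{writer}->write(@_) }
    sub close { my $self = shift; return $self->{writer}->close }
    sub write_encoded {
        my ($self, $chars) = @_;
        # Encode only when an encoding is set; otherwise pass through.
        my $bytes = $self->{encoding} ? $self->{encoding}->encode($chars) : $chars;
        return $self->{writer}->write($bytes);
    }
}

package main;

my $inner  = My::FakeWriter->new;
my $writer = My::EncodingWriter->new(
    writer   => $inner,
    encoding => find_encoding('UTF-8'),
);

$writer->write_encoded('<p>♥</p>');   # 8 characters in, 10 bytes out
$writer->close;

printf "wrote %d bytes\n", length $inner->{chunks}[0];   # prints "wrote 10 bytes"
```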
538
539       The last style of content response that Catalyst supports is setting
540       the body to a filehandle like object.  In this case the object is
541       passed down to the Plack application handler directly and currently we
542       do nothing to set encoding.
543
544           sub stream_body_fh :Local {
545               my ($self, $c) = @_;
546               my $path = File::Spec->catfile('t', 'utf8.txt');
547               open(my $fh, '<', $path) || die "trouble: $!";
548               $c->response->content_type('text/html');
549               $c->response->body($fh);
550           }
551
       In this example we create a filehandle to a text file that contains
       UTF8 encoded characters.  We pass this down without modification,
       which I think is correct since we don't want to double encode.
       However this may change in a future development release so please be
       sure to double check the current docs and changelog.  It's possible a
       future release will require you to set an encoding on the IO layer
       level so that we can be sure to properly encode at body finalization.
       So this is still an edge case we are writing test examples for.  But
       for now if you are returning a filehandle like response, you are
       expected to make sure you are following the PSGI specification and
       return raw bytes.
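
       What "raw bytes" means for a filehandle can be seen without Catalyst.
       The sketch below writes a small UTF8 encoded file to a temporary
       location (the file and its contents are made up for the example) and
       reads it back through a raw handle, the way a body filehandle would be
       consumed.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);
use File::Temp qw(tempfile);

# Create a throwaway file that holds UTF-8 encoded bytes.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} encode('UTF-8', "hearts ♥\n");
close $out or die "cannot close $path: $!";

# Open it without any decoding layer: what we read are bytes, and these are
# what should go out over the wire unchanged.
open(my $fh, '<:raw', $path) or die "cannot open $path: $!";
my $bytes = do { local $/; <$fh> };
close $fh;

printf "%d bytes on disk, %d characters once decoded\n",
    length($bytes), length(decode('UTF-8', $bytes));
# prints "11 bytes on disk, 9 characters once decoded"
```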
562
563   Override the Encoding on Context
       As already noted you may change the current encoding (or remove it) by
       setting an alternative encoding on the context:
566
567           $c->encoding(Encode::find_encoding('Shift_JIS'));
568
569       Please note that you can continue to change encoding UNTIL the headers
570       have been finalized.  The last setting always wins.  Trying to change
571       encoding after header finalization is an error.
572
573   Setting the Content Encoding HTTP Header
574       In some cases you may set a content encoding on your response.  For
575       example if you are encoding your response with gzip.  In this case you
576       are again on your own.  If we notice that the content encoding header
577       is set when we hit finalization, we skip automatic encoding:
578
579           use Encode;
580           use Compress::Zlib;
581           use utf8;
582
583           sub gzipped :Local {
584               my ($self, $c) = @_;
585
586               $c->res->content_type('text/plain');
587               $c->res->content_type_charset('UTF-8');
588               $c->res->content_encoding('gzip');
589
590               $c->response->body(
591                 Compress::Zlib::memGzip(
592                   Encode::encode_utf8("manual_1 ♥")));
593           }
594
595       If you are using Catalyst::Plugin::Compress you need to upgrade to the
596       most recent version in order to be compatible with changes introduced
597       in Catalyst 5.90080.  Other plugins may require updates (please open
598       bugs if you find them).
599
       NOTE Content encoding may be set to 'identity' and we will still
       perform automatic encoding if the content type is encodable and an
       encoding is present for the context.
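
       When you take over encoding yourself, as in the gzip action above, the
       order of operations matters: encode characters to bytes first, then
       compress.  A minimal round trip with the core Encode and
       Compress::Zlib modules, outside of any Catalyst request, looks like
       this:

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode_utf8 decode_utf8);
use Compress::Zlib ();

my $text = "manual_1 ♥";

# Characters -> UTF-8 bytes -> gzip, as in the action above.
my $body = Compress::Zlib::memGzip(encode_utf8($text));

# A client reverses the steps: gunzip, then decode the UTF-8 bytes.
my $restored = decode_utf8(Compress::Zlib::memGunzip($body));

print $restored eq $text ? "round trip ok\n" : "round trip failed\n";
# prints "round trip ok"
```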
603
604   Using Common Views
605       The following common views have been updated so that their tests pass
606       with default UTF8 encoding for Catalyst:
607
608       Catalyst::View::TT, Catalyst::View::Mason, Catalyst::View::HTML::Mason,
609       Catalyst::View::Xslate
610
611       See Catalyst::Upgrading for additional information on Catalyst
612       extensions that require upgrades.
613
       In general for the common views you should not need to do anything
       special.  If your actual template files contain UTF8 literals you
       should set configuration on your View to enable that.  For example in
       TT, if your template has actual UTF8 characters in it you should do
       the following:
619
620           MyApp::View::TT->config(ENCODING => 'utf-8');
621
       However Catalyst::View::Xslate wants to do the UTF8 encoding for you.
       (We assume that the authors of that view did this as a workaround to
       the fact that until now encoding was not core to Catalyst.)  So if you
       use that view, you either need to tell it to not encode, or you need
       to turn off encoding for Catalyst.
627
628           MyApp::View::Xslate->config(encode_body => 0);
629
630       or
631
632           MyApp->config(encoding=>undef);
633
634       Preference is to disable it in the View.
635
       Other views may be similar.  You should review View documentation and
       test during upgrading.  We tried to make sure most common views worked
       properly and noted all workarounds, but if we missed something please
       alert the development team (instead of introducing a local hack into
       your application that will mean nobody will ever upgrade it...).
641
642   Setting the response from an external PSGI application.
643       Catalyst::Response allows one to set the response from an external PSGI
644       application.  If you do this, and that external application sets a
645       character set on the content-type, we "clear_encoding" for the rest of
646       the response.  This is done to prevent double encoding.
647
648       NOTE Even if the character set of the content type is the same as the
649       encoding set in $c->encoding, we still skip encoding.  This is a
650       regrettable difference from the general rule outlined above, where if
651       the current character set is the same as the current encoding, we
652       encode anyway.  Nevertheless I think this is the correct behavior since
653       the earlier rule exists only to support backward compatibility with
654       Catalyst::View::TT.
655
       In general if you want Catalyst to handle encoding, you should avoid
       setting the content type character set since Catalyst will do so
       automatically based on the requested response encoding.  It's best to
       request alternative encodings by setting $c->encoding, and if you
       really want manual control of encoding you should always call
       $c->clear_encoding so that programmers that come after you are very
       clear as to your intentions.
663
664   Disabling default UTF8 encoding
       You may encounter issues with your legacy code running under default
       UTF8 body encoding.  If so you can disable this with the following
       configuration setting:
668
669           MyApp->config(encoding=>undef);
670
671       Where "MyApp" is your Catalyst subclass.
672
       If you do not wish to disable all the Catalyst encoding features, you
       may disable specific features via two additional configuration
       options: 'skip_body_param_unicode_decoding' and
       'skip_complex_post_part_handling'.  The first will skip any attempt to
       decode POST parameters when creating body parameters, and the second
       will skip creation of instances of Catalyst::Request::PartData in the
       case that the multipart form upload contains parts with a mix of
       content character sets.
681
682       If you believe you have discovered a bug in UTF8 body encoding, I
683       strongly encourage you to report it (and not try to hack a workaround
684       in your local code).  We also recommend that you regard such a
685       workaround as a temporary solution.  It is ideal if Catalyst extension
686       authors can start to count on Catalyst doing the right thing for
687       encoding.
688

Conclusion

690       This document has attempted to be a complete review of how UTF8 and
691       encoding works in the current version of Catalyst and also to document
692       known issues, gotchas and backward compatible hacks.  Please report
693       issues to the development team.
694

Author

696       John Napiorkowski jjnapiork@cpan.org <mailto:jjnapiork@cpan.org>

perl v5.32.0                      2020-07-28                 Catalyst::UTF8(3)