Test::utf8(3)         User Contributed Perl Documentation        Test::utf8(3)

NAME

       Test::utf8 - handy utf8 tests

SYNOPSIS

         is_valid_string($string);   # check the string is valid
         is_sane_utf8($string);      # check not double encoded
         is_flagged_utf8($string);   # has utf8 flag set
         is_within_latin_1($string); # but only has latin_1 chars in it

DESCRIPTION

       This module is a collection of tests that are useful when dealing with
       utf8 strings in Perl.

   Validity
       These two tests check whether a string is valid, and whether you've
       probably made a mistake with your string.

       is_valid_string($string, $testname)
           This passes and returns true if and only if the scalar isn't an
           invalid string.  In short, it checks that the utf8 flag hasn't been
           set for a string that isn't a valid utf8 encoding.

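           As a rough illustration of the condition being checked, the
           sketch below builds an inconsistent scalar by hand and inspects
           it with the core utf8::valid function.  It uses only core Perl
           and is not necessarily how is_valid_string is implemented; the
           variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode ();

# A lone 0xE9 byte is not a complete utf8 sequence on its own.
my $broken = "\x{e9}";

# Force the utf8 flag on without checking the bytes (not something to do
# in real code); the scalar now claims to be utf8 but is not valid utf8.
Encode::_utf8_on($broken);

# The bytes C3 A9 are the utf8 encoding of e-acute, so flagging them
# gives a consistent one-character string.
my $fine = "\x{c3}\x{a9}";
Encode::_utf8_on($fine);

# utf8::valid is true for unflagged strings and for flagged strings whose
# internal bytes are well-formed utf8.
my $broken_ok = utf8::valid($broken) ? 1 : 0;
my $fine_ok   = utf8::valid($fine)   ? 1 : 0;

print "broken: $broken_ok, fine: $fine_ok\n";
```

           A scalar like $broken is the kind of value that
           is_valid_string is described as rejecting.
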
       is_sane_utf8($string, $name)
           This test fails if the string contains something that looks like it
           might be dodgy utf8, i.e. something that looks like the multi-byte
           sequence for a latin-1 character that perl hasn't been instructed
           to treat as such.  Strings that are not utf8 always automatically
           pass.

           Some examples may help:

             # this will pass as it's a normal latin-1 string
             is_sane_utf8("Hello L\x{e9}on");

             # this will fail because the \x{c3}\x{a9} looks like the
             # utf8 byte sequence for e-acute
             my $string = "Hello L\x{c3}\x{a9}on";
             is_sane_utf8($string);

             # this will pass because the utf8 is correctly interpreted as utf8
             Encode::_utf8_on($string);
             is_sane_utf8($string);

           Obviously this isn't a hundred percent reliable.  The edge case
           where this will fail is where you have "\x{c2}" (which is "LATIN
           CAPITAL LETTER A WITH CIRCUMFLEX") or "\x{c3}" (which is "LATIN
           CAPITAL LETTER A WITH TILDE") followed by one of the latin-1
           punctuation symbols.

             # a capital letter A with circumflex surrounded by smart quotes
             # this will fail because it'll see the "\x{c2}\x{94}" and think
             # it's actually the utf8 sequence for the end smart quote
             is_sane_utf8("\x{93}\x{c2}\x{94}");

           However, since this rarely comes up, the test is reasonably
           reliable in most cases.  Still, take care where dynamic data is
           placed next to latin-1 punctuation, to avoid spurious failures.

           There are two situations that cause this test to fail.  Either the
           string contains utf8 byte sequences but hasn't been flagged as
           utf8 (this normally means that you got it from an external source
           like a C library; when Perl needs to store a string internally as
           utf8 it does its own encoding and flagging transparently), or a
           utf8 flagged string contains byte sequences that, when translated
           to characters, themselves look like a utf8 byte sequence.  The
           test diagnostics tell you which is the case.

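           The two failure situations can be reproduced with the core
           Encode module.  What follows is a sketch only:
           looks_double_encoded is a hypothetical helper with a
           deliberately crude pattern, not the check that is_sane_utf8
           actually performs.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: flags a C2 or C3 character followed by a character
# in the 0x80-0xBF range, i.e. something shaped like the two-byte utf8
# sequence for a latin-1 character.
sub looks_double_encoded {
    my ($string) = @_;
    return $string =~ /[\x{c2}\x{c3}][\x{80}-\x{bf}]/ ? 1 : 0;
}

# The utf8 encoding of "L\x{e9}on"; e-acute becomes the bytes C3 A9.
my $bytes = encode('UTF-8', "L\x{e9}on");

# Situation 1: utf8 bytes that were never decoded, so perl sees C3 and A9
# as two separate characters in an unflagged string.
my $undecoded = $bytes;

# Situation 2: encoded twice but decoded once; the result is flagged as
# utf8, yet its characters are still C3 followed by A9.
my $double = decode('UTF-8', encode('UTF-8', $bytes));

# Decoded exactly once: a single e-acute character, nothing suspicious.
my $decoded = decode('UTF-8', $bytes);

printf "%-10s flagged=%d suspicious=%d\n", @$_
    for [ 'undecoded', utf8::is_utf8($undecoded) ? 1 : 0, looks_double_encoded($undecoded) ],
        [ 'double',    utf8::is_utf8($double)    ? 1 : 0, looks_double_encoded($double) ],
        [ 'decoded',   utf8::is_utf8($decoded)   ? 1 : 0, looks_double_encoded($decoded) ];
```

           Both $undecoded and $double trip the pattern, but only $double
           carries the utf8 flag, which is the distinction the test
           diagnostics are described as reporting.
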
   Checking the Range of Characters in a String
       These routines allow you to check the range of characters in a string.
       Note that these routines are blind to the actual encoding perl
       internally uses to store the characters; they just check if the string
       contains only characters that can be represented in the named encoding.

       is_within_ascii
           Tests that a string only contains characters that are in the ASCII
           character set.

       is_within_latin_1
           Tests that a string only contains characters that are in latin-1.

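       To show what "blind to the actual encoding" means, here is a small
       sketch using core Perl only.  The helpers within_ascii and
       within_latin_1 are hypothetical stand-ins written for this example,
       not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helpers: true when no character falls outside the range.
sub within_ascii   { return $_[0] !~ /[^\x{00}-\x{7f}]/ ? 1 : 0 }
sub within_latin_1 { return $_[0] !~ /[^\x{00}-\x{ff}]/ ? 1 : 0 }

my $cafe  = "caf\x{e9}";      # e-acute: in latin-1 but not in ASCII
my $smile = "hi \x{263a}";    # U+263A lies outside latin-1

# The same characters as $cafe, but stored internally as utf8.
my $upgraded = $cafe;
utf8::upgrade($upgraded);

for my $case ( [ cafe => $cafe ], [ upgraded => $upgraded ], [ smile => $smile ] ) {
    my ($label, $string) = @$case;
    printf "%-8s ascii=%d latin_1=%d\n",
        $label, within_ascii($string), within_latin_1($string);
}
```

       $cafe and $upgraded give the same answers even though only the
       latter is stored as utf8, because the check looks at characters
       rather than at the internal bytes.
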
   Simple utf8 Flag Tests
       Simply check if a scalar is or isn't flagged as utf8 by perl's
       internals.

       is_flagged_utf8($string, $name)
           Passes if the string is flagged by perl's internals as utf8, fails
           if it's not.

       isnt_flagged_utf8($string, $name)
           The opposite of "is_flagged_utf8"; passes if and only if the string
           isn't flagged as utf8 by perl's internals.

           Note: you can refer to this function as "isn't_flagged_utf8" if you
           really want to.

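       The flag these tests look at can also be inspected directly with the
       core utf8::is_utf8 function.  The sketch below is an illustration
       with made-up variable names; it shows that the flag describes
       internal storage rather than the characters themselves.

```perl
use strict;
use warnings;

my $latin = "caf\x{e9}";    # fits in single bytes, so not flagged
my $wide  = "\x{263a}";     # needs utf8 storage, so flagged

# Upgrading changes the internal storage only, not the characters.
my $upgraded = $latin;
utf8::upgrade($upgraded);

printf "latin=%d wide=%d upgraded=%d same=%d\n",
    utf8::is_utf8($latin)    ? 1 : 0,
    utf8::is_utf8($wide)     ? 1 : 0,
    utf8::is_utf8($upgraded) ? 1 : 0,
    $latin eq $upgraded      ? 1 : 0;
```

       Since $latin and $upgraded compare equal yet differ in their flag,
       a flag test says nothing about which characters a string holds.
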

AUTHOR

         Copyright Mark Fowler 2004.  All rights reserved.

         This program is free software; you can redistribute it
         and/or modify it under the same terms as Perl itself.

BUGS

       None known.  Please report any to me via the CPAN RT system.  See
       http://rt.cpan.org/ for more details.

SEE ALSO

       Test::DoubleEncodedEntities for testing for double encoded HTML
       entities.

perl v5.12.1                      2010-06-26                     Test::utf8(3)