Net::IDN::Standards(3)    User Contributed Perl Documentation    Net::IDN::Standards(3)

NAME

       Net::IDN::Standards -- Internationalized Domain Names for Applications
       (IDNA)

INTRODUCTION

       Historically, domain names and host names were restricted to a limited
       repertoire of ASCII characters: letters, digits and the hyphen (i.e.
       "/[A-Z0-9-]/i"). Words and names from languages that require
       additional characters (such as letters with diacritics) or other
       scripts could not be used.

       Internationalized Domain Names (IDNs) extend the character repertoire
       for domain names from ASCII to Unicode while maintaining backwards
       compatibility with software that only expects and handles ASCII
       characters.

       To achieve this, Unicode domain names are converted to ASCII using an
       ASCII-compatible encoding (ACE) called Punycode. On the wire, each
       converted label of a domain name starts with "xn--", followed by the
       Punycode encoding of the Unicode label.  The Unicode version is
       typically only shown in applications presenting the domain to the user
       (hence Internationalized Domain Names for Applications, IDNA).
       Internationalized Resource Identifiers (IRIs), the Unicode version of
       URLs, may also include domain names in their Unicode form.

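       The conversion of a single label can be sketched as follows. This is
       a minimal illustration in Python, not part of Net::IDN; it assumes
       Python's standard "punycode" codec, and the helper names
       label_to_ace and ace_to_label are hypothetical. It deliberately
       skips the preparation (mapping and validation) that the IDNA
       specifications require before conversion.

```python
# Sketch only: raw Punycode conversion of one label, without any of the
# preparation (mapping/validation) steps that IDNA requires.
ACE_PREFIX = "xn--"


def label_to_ace(label: str) -> str:
    """Return the ACE form of one label; pure-ASCII labels pass through."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def ace_to_label(label: str) -> str:
    """Return the Unicode form of one label; non-ACE labels pass through."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


print(label_to_ace("bücher"))         # xn--bcher-kva
print(ace_to_label("xn--bcher-kva"))  # bücher
```

       A full domain name is handled label by label, i.e. split at the
       dots, converted, and joined again.
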
       The IDNA specifications, however, cover not only the actual Punycode
       conversion but also extensive rules for the preparation (mapping
       and/or validation) of input strings.  They typically define two
       functions, "ToASCII" and "ToUnicode", which prepare a domain name and
       convert it to the ACE version or the Unicode version, respectively.

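       As an illustration of the two functions, the following sketch uses
       Python's built-in "idna" codec, which implements the IDNA2003
       flavour of "ToASCII" and "ToUnicode". It is an assumption made for
       the sake of a runnable example and is not the Net::IDN interface;
       the wrapper names to_ascii and to_unicode are hypothetical.

```python
# Assumption: Python's standard "idna" codec (IDNA2003, RFC 3490) stands in
# for the ToASCII/ToUnicode functions described above.


def to_ascii(domain: str) -> str:
    """Prepare a domain name and convert it to its ACE form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert a domain name from its ACE form back to Unicode."""
    return domain.encode("ascii").decode("idna")


print(to_ascii("bücher.example"))           # xn--bcher-kva.example
print(to_unicode("xn--bcher-kva.example"))  # bücher.example
```

       Labels that are already plain ASCII, such as "example", are left
       unchanged by the conversion.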

DIFFERENT STANDARDS

         "The nice thing about standards is that you have so many to
         choose from."
                                              -- Andrew S. Tanenbaum

       While the actual Punycode conversion is stable, there are different
       specifications regarding mapping and/or validation (preparation):

   IDNA2003
       IDNA2003, which is defined in RFC 3490
       (<http://tools.ietf.org/html/rfc3490>) and related documents, was the
       original specification for the internationalization of domain names.

       However, some issues were subsequently identified with IDNA2003. The
       specification was tied to Unicode 3.2 and therefore did not allow
       characters added in newer versions of Unicode unless the
       specifications themselves were updated.

       Furthermore, a few characters were mapped to other characters or
       deleted although they carry meaning in some languages: 'ß' (sharp s)
       and 'ς' (final sigma) were mapped to 'ss' and 'σ'; ZWJ and ZWNJ were
       always mapped to nothing, although some scripts like Arabic require
       them for correct display.

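       These IDNA2003 mappings can be observed with any IDNA2003
       implementation. The sketch below assumes Python's built-in "idna"
       codec (which follows RFC 3490 and Nameprep) as a stand-in; the
       function name to_ascii_2003 is hypothetical and not part of
       Net::IDN. The label containing ZWJ is an artificial example chosen
       only to show the deletion.

```python
# Assumption: Python's standard "idna" codec implements the IDNA2003 rules
# and is used here only to demonstrate the mappings described above.


def to_ascii_2003(domain: str) -> str:
    """Prepare and convert a domain name using the IDNA2003 rules."""
    return domain.encode("idna").decode("ascii")


# Sharp s is mapped to "ss", so the distinction is lost.
print(to_ascii_2003("straße.example"))    # strasse.example

# Final sigma is mapped to ordinary sigma: both spellings give the same ACE.
print(to_ascii_2003("ς.example") == to_ascii_2003("σ.example"))  # True

# ZWJ (U+200D) and ZWNJ (U+200C) are mapped to nothing.
print(to_ascii_2003("a\u200db.example"))  # ab.example
```
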
   IDNA2008
       IDNA2008, which is defined in RFC 5890
       (<http://tools.ietf.org/html/rfc5890>) and related documents, resolves
       the issues found in IDNA2003.

       This was done by allowing some characters that IDNA2003 would either
       map to other characters, map to nothing, or reject during
       preparation. Domain names that rely on these characters are, of
       course, not accessible to IDNA2003 implementations.

       However, IDNA2008 also disallowed a large number of characters that had
       been allowed in IDNA2003 (mostly symbols). An implementation of
       IDNA2008 would therefore no longer be able to access domain names such
       as "♥.com", which had been registered under IDNA2003.

   UTS #46
       Unicode Technical Standard #46 (UTS #46,
       <http://unicode.org/reports/tr46/>) solves this problem by allowing
       domain names that are valid in either IDNA2003 or IDNA2008.

       This makes UTS #46 well suited for domain lookup (be liberal in what
       you accept) but unsuitable for validating domain names prior to
       registration (be conservative in what you send).

AUTHOR

       Claus Faerber <CFAERBER@cpan.org>

perl v5.34.0                      2022-01-21            Net::IDN::Standards(3)