Text::Soundex(3pm)     Perl Programmers Reference Guide     Text::Soundex(3pm)

Text::Soundex - Implementation of the soundex algorithm.

    use Text::Soundex;

    # Original algorithm.
    $code  = soundex($name);        # Get the soundex code for a name.
    @codes = soundex(@names);       # Get the list of codes for a list of names.

    # American Soundex variant (NARA) - Used for US census data.
    $code  = soundex_nara($name);   # Get the soundex code for a name.
    @codes = soundex_nara(@names);  # Get the list of codes for a list of names.

    # Redefine the value that soundex() will return if the input string
    # contains no identifiable sounds within it.
    $Text::Soundex::nocode = 'Z000';

Soundex is a phonetic algorithm for indexing names by sound, as
pronounced in English. The goal is for names with the same
pronunciation to be encoded to the same representation so that they can
be matched despite minor differences in spelling. Soundex is the most
widely known of all phonetic algorithms and is often used (incorrectly)
as a synonym for "phonetic algorithm". Improvements to Soundex are the
basis for many modern phonetic algorithms. (Wikipedia, 2007)

This module implements the original soundex algorithm developed by
Robert Russell and Margaret Odell, patented in 1918 and 1922, as well
as a variation called "American Soundex" used for US census data and
currently maintained by the National Archives and Records
Administration (NARA).

The soundex algorithm may be recognized from Donald Knuth's The Art of
Computer Programming. The algorithm described by Knuth is the NARA
algorithm.

The value returned for strings which have no soundex encoding is
defined using $Text::Soundex::nocode. The default value is "undef";
values such as 'Z000' are commonly used alternatives.

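As a brief illustration of the fallback value, the sketch below encodes a
string that contains no letters, first with the default setting and then
with 'Z000'. It assumes Text::Soundex is installed; the input '1234' and
the use of "local" to confine the change to one block are choices made
for this example, not requirements of the module.

```perl
use strict;
use warnings;
use Text::Soundex;

# '1234' contains no letters, so it has no soundex encoding.
my $default = soundex('1234');
print defined $default ? $default : 'undef', "\n";

{
    # "local" restores the previous value when the block exits,
    # so code elsewhere still sees the default.
    local $Text::Soundex::nocode = 'Z000';
    print soundex('1234'), "\n";
}
```
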
For backward compatibility with older versions of this module,
$Text::Soundex::nocode is also exported into the caller's namespace as
$soundex_nocode.

In scalar context, "soundex()" returns the soundex code of its first
argument. In list context, a list is returned in which each element is
the soundex code for the corresponding argument passed to "soundex()".
For example, the following code assigns @codes the value "('M200',
'S320')":

    @codes = soundex qw(Mike Stok);

To use "Text::Soundex" to generate codes that can be used to search one
of the publicly available US Censuses, the NARA variant of the soundex
algorithm must be used:

    use Text::Soundex;
    $code = soundex_nara($name);

An example of where these algorithms differ follows:

    use Text::Soundex;
    print soundex("Ashcraft"), "\n";       # prints: A226
    print soundex_nara("Ashcraft"), "\n";  # prints: A261

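The two results for "Ashcraft" differ because of how H and W are
handled. In the original algorithm they separate letters just as vowels
do, so S and C are each coded as 2. In the NARA variant, two letters
with the same code that are separated only by H or W are coded once. The
pure-Perl sketch below illustrates that rule under the assumption of
plain ASCII input; the names sketch_soundex and sketch_soundex_nara are
hypothetical, and this is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only.

# Original variant: A E H I O U W Y all act as separators (digit 0).
sub sketch_soundex {
    my ($name) = @_;
    my $s = uc $name;
    $s =~ tr/A-Z//cd;                    # keep only the letters A-Z
    return unless length $s;             # no letters: no code
    my $first = substr($s, 0, 1);        # the first letter is kept as is
    # /s squeezes each run of letters that map to the same digit.
    $s =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/s;
    $s = substr($s, 1);                  # drop the first letter's digit
    $s =~ tr/0//d;                       # separators carry no code
    return substr($first . $s . '000', 0, 4);
}

# NARA variant: H and W get the marker 9 instead of 0, so that equal
# digits on either side of them can be merged.
sub sketch_soundex_nara {
    my ($name) = @_;
    my $s = uc $name;
    $s =~ tr/A-Z//cd;
    return unless length $s;
    my $first = substr($s, 0, 1);
    $s =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00900090111122222222334556/s;
    $s =~ s/(.)9\1/$1/g;                 # same digit across H or W: code once
    $s = substr($s, 1);
    $s =~ tr/09//d;                      # drop separators and H/W markers
    return substr($first . $s . '000', 0, 4);
}

print sketch_soundex('Ashcraft'), "\n";        # prints: A226
print sketch_soundex_nara('Ashcraft'), "\n";   # prints: A261
```
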
Donald Knuth's examples of names and the soundex codes they map to are
listed below:

    Euler, Ellery          -> E460
    Gauss, Ghosh           -> G200
    Hilbert, Heilbronn     -> H416
    Knuth, Kant            -> K530
    Lloyd, Ladd            -> L300
    Lukasiewicz, Lissajous -> L222

so:

    $code = soundex 'Knuth';          # $code contains 'K530'
    @list = soundex qw(Lloyd Gauss);  # @list contains 'L300', 'G200'

Because the soundex algorithm was devised long ago in the US, it
considers only the English alphabet and pronunciation. In particular,
non-ASCII characters are ignored. The recommended method of dealing
with characters that have accents, or other Unicode characters, is to
use the Text::Unidecode module available from CPAN. Either use the
module explicitly:

    use Text::Soundex;
    use Text::Unidecode;

    print soundex(unidecode("Fran\xE7ais")), "\n";  # Prints "F652\n"

Or use the convenient wrapper routine:

    use Text::Soundex 'soundex_unicode';

    print soundex_unicode("Fran\xE7ais"), "\n";     # Prints "F652\n"

Since the soundex algorithm maps a large space (strings of arbitrary
length) onto a small space (single letter plus 3 digits) no inference
can be made about the similarity of two strings which end up with the
same soundex code. For example, both "Hilbert" and "Heilbronn" end up
with a soundex code of "H416".

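One way to see this in practice is to group names by their codes and
treat each group only as a set of candidates for closer comparison. The
sketch below assumes Text::Soundex is installed and uses names from
Knuth's examples; the hash %by_code is a name chosen for this
illustration.

```perl
use strict;
use warnings;
use Text::Soundex;

my @names = qw(Hilbert Heilbronn Knuth Kant Lloyd Ladd);

# Map each soundex code to the list of names that produced it.
my %by_code;
push @{ $by_code{ soundex($_) } }, $_ for @names;

# Names that share a line share a code, nothing more.
for my $code (sort keys %by_code) {
    print "$code: @{ $by_code{$code} }\n";
}
```
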
This module is currently maintained by Mark Mielke ("mark@mielke.cc").

Version 3 is a significant update to provide support for versions of
Perl later than Perl 5.004. Specifically, the XS version of the
soundex() subroutine understands strings that are encoded using UTF-8
(Unicode strings).

Version 2 of this module was a rewrite by Mark Mielke
("mark@mielke.cc") to improve the speed of the subroutines. The XS
version of the soundex() subroutine was introduced in 2.00.

Version 1 of this module was written by Mike Stok ("mike@stok.co.uk")
and was included in the Perl core library set.

Dave Carlsen ("dcarlsen@csranet.com") made the request for the NARA
algorithm to be included. The NARA soundex page can be viewed at:
"http://www.nara.gov/genealogy/soundex/soundex.html"

Ian Phillips ("ian@pipex.net") and Rich Pinder ("rpinder@hsc.usc.edu")
supplied ideas and spotted mistakes for v1.x.

perl v5.12.4                      2011-06-01                Text::Soundex(3pm)