Text::Levenshtein(3)  User Contributed Perl Documentation  Text::Levenshtein(3)


NAME
       Text::Levenshtein - calculate the Levenshtein edit distance between two
       strings

SYNOPSIS
           use Text::Levenshtein qw(distance);

           print distance("foo","four");
           # prints "2"

           my @words = qw/ four foo bar /;
           my @distances = distance("foo",@words);

           print "@distances";
           # prints "2 0 3"

DESCRIPTION
       This module implements the Levenshtein edit distance, which measures
       how different two strings are. The distance is the minimum number of
       single-character substitutions, deletions, or insertions ("edits")
       needed to transform one string into the other (and vice versa). Two
       strings have distance 0 if and only if they are identical.
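
       For illustration, the metric can be computed with the classic
       dynamic-programming approach, which fills in a table of distances
       between prefixes of the two strings. The following is a minimal
       pure-Perl sketch of that approach, not necessarily the code this
       module uses; "levenshtein_sketch" is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical sketch: classic dynamic-programming edit distance,
# keeping only two rows of the table at a time.
sub levenshtein_sketch {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # $prev[$j] is the distance between the first $i-1 characters of $s
    # and the first $j characters of $t (initially $i-1 = 0).
    my @prev = (0 .. scalar @t);
    for my $i (1 .. @s) {
        my @curr = ($i);    # $i characters of $s versus the empty string
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,            # deletion
                $curr[$j - 1] + 1,        # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

print levenshtein_sketch('brown', 'green'), "\n";   # prints 3
```

       Only two rows of the table are kept at a time, so memory use grows
       with the length of the second string rather than with the product of
       both lengths.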

       To learn more about the Levenshtein metric, have a look at the
       Wikipedia page <http://en.wikipedia.org/wiki/Levenshtein_distance>.

   distance()
       The simplest usage takes two strings and returns the edit distance
       between them:

           $distance = distance('brown', 'green');
           # returns 3: 'b', 'o' and 'w' are substituted; 'r' and 'n' don't change

       Instead of a single second string, you can pass a list of strings.
       Each string is compared to the first string passed, and a list of the
       edit distances is returned:

           @words = qw/ green trainee brains /;
           @distances = distance('brown', @words);
           # returns (3, 5, 3)

   fastdistance()
       Previous versions of this module provided an alternative
       implementation in the function "fastdistance()". This function is
       still provided for backwards compatibility, but both functions now
       use the same code to calculate the edit distance.

       Unlike "distance()", "fastdistance()" takes only two strings, and
       returns the edit distance between them.

IGNORING DIACRITICS
       Both "distance()" and "fastdistance()" accept an optional hashref of
       options as their final argument. At the moment the only option is
       "ignore_diacritics". If this is true, any diacritics are ignored when
       calculating the edit distance. For example, "cafe" and "café"
       normally have an edit distance of 1, but when diacritics are ignored,
       the distance is 0:
61
62 use Text::Levenshtein 0.11 qw/ distance /;
63 $distance = distance($word1, $word2, {ignore_diacritics => 1});
64
65 If you turn on this option, then Unicode::Collate will be loaded, and
66 used when comparing characters in the words.
67
68 Early version of "Text::Levenshtein" didn't support this version, so
69 you should require version 0.11 or later, as above.
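
       The description above says only that Unicode::Collate is used to
       compare characters. The sketch below shows one way this can work. It
       is an assumption, not code taken from the module. It fills in the
       usual edit-distance table, but treats two characters as matching when
       a primary-strength (level 1) collator considers them equal. The
       function name is hypothetical, and level 1 also ignores case, which
       the real option may or may not do.

```perl
use strict;
use warnings;
use List::Util qw(min);
use Unicode::Collate;

# Assumption: a level 1 (primary strength) collator compares base letters
# only, so accents are ignored. Level 1 also ignores case differences.
my $collator = Unicode::Collate->new(level => 1);

# Hypothetical sketch: the usual two-row edit-distance table, except that
# two characters count as a match when the collator says they are equal.
sub distance_ignoring_diacritics {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0 .. scalar @t);
    for my $i (1 .. @s) {
        my @curr = ($i);
        for my $j (1 .. @t) {
            my $cost = $collator->eq($s[$i - 1], $t[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,            # deletion
                $curr[$j - 1] + 1,        # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# "caf\x{e9}" is "café" written with a precomposed e-acute (U+00E9).
print distance_ignoring_diacritics('cafe', "caf\x{e9}"), "\n";   # prints 0
```

       Because the sketch compares one code point at a time, it assumes
       precomposed input. A decomposed "é" ("e" followed by a combining
       accent) would make the accent count as an extra character.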

SEE ALSO
       There are many different modules on CPAN for calculating the edit
       distance between two strings. Here's just a selection.

       Text::LevenshteinXS and Text::Levenshtein::XS are both implementations
       of the Levenshtein algorithm that need a C compiler to install, but
       they are a lot faster than this module.

       Text::Levenshtein::Flexible is another C implementation, but it offers
       some twists. You can specify a maximum distance that you're
       interested in, which makes it faster, and you can give different
       costs to insertion, deletion, and substitution. It hasn't been updated
       since 2014.
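
       To show why a maximum distance can save work, here is a sketch of an
       early exit added to the standard dynamic-programming table. Once every
       entry in the current row exceeds the limit, the final distance must
       exceed it too, so the computation can stop. This only illustrates the
       idea. It is not taken from Text::Levenshtein::Flexible, and
       "bounded_distance" is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: returns the Levenshtein distance between $s and $t
# if it is at most $max, otherwise undef.
sub bounded_distance {
    my ($s, $t, $max) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # The distance is never less than the difference in lengths.
    return undef if abs(@s - @t) > $max;

    my @prev = (0 .. scalar @t);
    for my $i (1 .. @s) {
        my @curr = ($i);
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,            # deletion
                $curr[$j - 1] + 1,        # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        # The smallest value in a row never decreases from one row to the
        # next, so once it exceeds $max the final distance will as well.
        return undef if min(@curr) > $max;
        @prev = @curr;
    }
    return $prev[-1] <= $max ? $prev[-1] : undef;
}

my $d = bounded_distance('foo', 'bar', 2);
print defined $d ? $d : 'more than 2', "\n";   # prints "more than 2"
```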

       Text::Levenshtein::Edlib is a Perl wrapper around a C++ library that
       provides the Levenshtein edit distance and optimal alignment path for
       a pair of strings. It doesn't support UTF-8 strings, though.

       Text::Levenshtein::BV implements the Levenshtein algorithm using bit
       vectors, and claims to be faster than this implementation. I haven't
       benchmarked them.

       The Damerau-Levenshtein edit distance is like the Levenshtein
       distance. In addition to insertion, deletion and substitution, it
       also counts the transposition of two adjacent characters as a single
       edit. The module Text::Levenshtein::Damerau defaults to using a
       pure-Perl implementation, but if you've installed
       Text::Levenshtein::Damerau::XS then it will be a lot quicker.
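
       As a sketch of the transposition rule, the code below computes the
       restricted "optimal string alignment" variant of Damerau-Levenshtein.
       This variant adds one extra case to the Levenshtein table. It is not
       taken from Text::Levenshtein::Damerau, which may compute the
       unrestricted distance instead, and "osa_distance" is a hypothetical
       name.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical sketch of the optimal string alignment distance: Levenshtein
# plus a cost-1 swap of two adjacent characters. A full table is kept
# because the transposition case looks two rows back.
sub osa_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @d;
    $d[$_][0] = $_ for 0 .. @s;
    $d[0][$_] = $_ for 0 .. @t;
    for my $i (1 .. @s) {
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $d[$i][$j] = min(
                $d[$i - 1][$j] + 1,            # deletion
                $d[$i][$j - 1] + 1,            # insertion
                $d[$i - 1][$j - 1] + $cost,    # substitution (or match)
            );
            # Transposition: "xy" in $s lines up with "yx" in $t.
            if (   $i > 1 && $j > 1
                && $s[$i - 1] eq $t[$j - 2]
                && $s[$i - 2] eq $t[$j - 1])
            {
                $d[$i][$j] = min($d[$i][$j], $d[$i - 2][$j - 2] + 1);
            }
        }
    }
    return $d[scalar @s][scalar @t];
}

print osa_distance('ab', 'ba'), "\n";   # prints 1 (plain Levenshtein gives 2)
```

       The restriction matters in cases such as "ca" and "abc". This variant
       gives 3, while the unrestricted Damerau-Levenshtein distance is 2.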

       Text::WagnerFischer is an implementation of the Wagner-Fischer edit
       distance, which is similar to the Levenshtein distance but applies
       different weights to each edit type.
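
       To make "different weights" concrete, the sketch below gives each
       edit type its own cost in the same dynamic-programming table. With
       all costs set to 1, it reduces to the plain Levenshtein distance. The
       function and its insert/delete/substitute parameters are hypothetical
       and are not the interface of Text::WagnerFischer.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical weighted edit distance: each kind of edit has its own cost.
sub weighted_distance {
    my ($s, $t, %opt) = @_;
    # Any cost the caller doesn't specify defaults to 1.
    my %c = (insert => 1, delete => 1, substitute => 1, %opt);
    my @s = split //, $s;
    my @t = split //, $t;

    my @prev = map { $_ * $c{insert} } 0 .. @t;   # build prefixes of $t from ''
    for my $i (1 .. @s) {
        my @curr = ($i * $c{delete});             # delete $i characters of $s
        for my $j (1 .. @t) {
            my $same = $s[$i - 1] eq $t[$j - 1];
            $curr[$j] = min(
                $prev[$j] + $c{delete},                          # deletion
                $curr[$j - 1] + $c{insert},                      # insertion
                $prev[$j - 1] + ($same ? 0 : $c{substitute}),    # substitution or match
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# When a substitution costs more than a deletion plus an insertion, the
# cheapest route from 'foo' to 'four' is: delete 'o', insert 'u', insert 'r'.
print weighted_distance('foo', 'four', insert => 1, delete => 1, substitute => 3), "\n";   # prints 3
```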

       Text::Brew is an implementation of the Brew edit distance, which is
       another algorithm based on edit weights.

       Text::Fuzzy provides a number of operations for partial or fuzzy
       matching of text based on edit distance. Text::Fuzzy::PP is a
       pure-Perl implementation of the same interface.

       String::Similarity takes two strings and returns a value between 0
       (meaning entirely different) and 1 (meaning identical). It appears to
       be based on edit distance.

       Text::Dice calculates Dice's coefficient
       <https://en.wikipedia.org/wiki/Sørensen–Dice_coefficient> for two
       strings. This formula was originally developed to measure the
       similarity of two different populations in ecological research.
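
       For comparison with edit distance, the sketch below computes Dice's
       coefficient over the character bigrams of each string. The
       coefficient is twice the number of shared bigrams divided by the
       total number of bigrams in both strings. Two details are assumptions
       made for this illustration and are not a description of Text::Dice:
       the use of bigrams, and the counting of repeated bigrams as a
       multiset. "bigram_dice" is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch: Dice's coefficient on character bigrams,
# 2 * shared / (bigrams in $s + bigrams in $t).
sub bigram_dice {
    my ($s, $t) = @_;
    my @x = map { substr($s, $_, 2) } 0 .. length($s) - 2;
    my @y = map { substr($t, $_, 2) } 0 .. length($t) - 2;
    return 0 unless @x && @y;    # assumption: no bigrams means no similarity

    # Count shared bigrams, matching each bigram of $t at most once.
    my %available;
    $available{$_}++ for @y;
    my $shared = 0;
    for my $bigram (@x) {
        if ($available{$bigram}) {
            $available{$bigram}--;
            $shared++;
        }
    }
    return 2 * $shared / (@x + @y);
}

print bigram_dice('night', 'nacht'), "\n";   # prints 0.25
```

       Strings shorter than two characters have no bigrams. The sketch
       returns 0 for them, which is another assumption.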

       String::KeyboardDistance and String::KeyboardDistanceXS calculate the
       "keyboard distance" between two strings.

REPOSITORY
       <https://github.com/neilbowers/Text-Levenshtein>

AUTHOR
       Dree Mistrut originally wrote this module and released it to CPAN in
       2002.

       Josh Goldberg then took over maintenance and released versions between
       2004 and 2008.

       Neil Bowers (NEILB on CPAN) now maintains this module. Version 0.07
       was a complete rewrite, based on one of the algorithms on the
       Wikipedia page.

COPYRIGHT AND LICENSE
       This software is copyright (C) 2002-2004 Dree Mistrut. Copyright (C)
       2004-2014 Josh Goldberg. Copyright (C) 2014- Neil Bowers.

       This is free software; you can redistribute it and/or modify it under
       the same terms as the Perl 5 programming language system itself.



perl v5.34.0                      2021-07-23               Text::Levenshtein(3)