Text::Levenshtein(3)  User Contributed Perl Documentation  Text::Levenshtein(3)

NAME
Text::Levenshtein - calculate the Levenshtein edit distance between two
strings

SYNOPSIS
    use Text::Levenshtein qw(distance);

    print distance("foo","four");
    # prints "2"

    my @words     = qw/ four foo bar /;
    my @distances = distance("foo",@words);

    print "@distances";
    # prints "2 0 3"

DESCRIPTION
This module implements the Levenshtein edit distance, which measures
the difference between two strings as the minimum number of
single-character substitutions, deletions or insertions ("edits")
needed to transform one string into the other. The distance is
symmetric, so it is the same in both directions. When two strings
have distance 0, they are the same.

To learn more about the Levenshtein metric, have a look at the
Wikipedia page <http://en.wikipedia.org/wiki/Levenshtein_distance>.

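The definition above can be illustrated with a minimal sketch of the
classic dynamic-programming approach, keeping only two rows of the table.
This is not the module's own code: the name "lev_sketch" is hypothetical
and exists only for this example, and it compares characters with plain
"eq".

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper, not part of Text::Levenshtein's API.
# Returns the number of single-character insertions, deletions and
# substitutions needed to turn $s into $t.
sub lev_sketch {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # Row 0: turning "" into the first $j characters of $t takes $j insertions.
    my @prev = (0 .. scalar(@t));

    for my $i (1 .. scalar(@s)) {
        # Turning the first $i characters of $s into "" takes $i deletions.
        my @cur = ($i);
        for my $j (1 .. scalar(@t)) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # delete $s[$i - 1]
                $cur[$j - 1] + 1,         # insert $t[$j - 1]
                $prev[$j - 1] + $cost,    # substitute, or keep a match
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

print lev_sketch('foo', 'four'), "\n";    # prints 2
```

The two-row layout keeps memory proportional to the length of the second
string. The running time is still proportional to the product of the two
lengths.
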
distance()
The simplest usage takes two strings and returns the edit distance:

    $distance = distance('brown', 'green');
    # returns 3, as 'r' and 'n' don't change

Instead of a single second string, you can pass a list of strings.
Each string is compared to the first string passed, and a list of
the edit distances is returned:

    @words     = qw/ green trainee brains /;
    @distances = distance('brown', @words);
    # returns (3, 5, 3)

fastdistance()
Previous versions of this module provided an alternative
implementation, in the function "fastdistance()". This function is
still provided for backwards compatibility, but both functions now use
the same code to calculate the edit distance.

Unlike "distance()", "fastdistance()" only takes two strings, and
returns the edit distance between them.

Both the "distance()" and "fastdistance()" functions can take a hashref
of optional arguments as the final argument. At the moment the only
option is "ignore_diacritics". If this is true, then any diacritics
are ignored when calculating edit distance. For example, "cafe" and
"café" normally have an edit distance of 1, but when diacritics are
ignored, the distance will be 0:

    use Text::Levenshtein 0.11 qw/ distance /;
    $distance = distance($word1, $word2, {ignore_diacritics => 1});

If you turn on this option, then Unicode::Collate will be loaded and
used when comparing characters in the words.

Early versions of "Text::Levenshtein" didn't support this option, so
you should require version 0.11 or later, as above.

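To show how a collator can treat accented and unaccented letters as equal,
here is a small sketch that uses Unicode::Collate directly. It assumes
comparison at primary strength ("level => 1"), which may not be exactly
how the module configures its collator. The helper name "same_letter" is
hypothetical.

```perl
use strict;
use warnings;
use Unicode::Collate;

# Assumption: primary strength (level 1) compares base letters only,
# so accents are ignored. Note that case is ignored at this level too.
my $collator = Unicode::Collate->new(level => 1);

# Hypothetical helper: true if two characters match once diacritics
# are disregarded.
sub same_letter {
    my ($x, $y) = @_;
    return $collator->eq($x, $y) ? 1 : 0;
}

my $plain    = 'e';
my $accented = "\N{U+00E9}";    # LATIN SMALL LETTER E WITH ACUTE

print "plain eq:    ", ($plain eq $accented ? "same" : "different"), "\n";
print "same_letter: ", (same_letter($plain, $accented) ? "same" : "different"), "\n";
# prints "plain eq:    different" then "same_letter: same"
```

With a comparison like this in place of "eq", the final letters of "cafe"
and "café" count as a match, which is why the distance drops from 1 to 0.
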
SEE ALSO
There are many different modules on CPAN for calculating the edit
distance between two strings. Here's just a selection.

Text::LevenshteinXS and Text::Levenshtein::XS are both implementations
of the Levenshtein algorithm that require a C compiler, but will be a
lot faster than this module.

The Damerau-Levenshtein edit distance is like the Levenshtein distance,
but in addition to insertion, deletion and substitution, it also
considers the transposition of two adjacent characters to be a single
edit. The module Text::Levenshtein::Damerau defaults to using a pure
perl implementation, but if you've installed
Text::Levenshtein::Damerau::XS then it will be a lot quicker.

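The effect of counting a transposition as one edit can be sketched as
follows. This is the restricted "optimal string alignment" variant, chosen
here because it is short. It is not taken from Text::Levenshtein::Damerau,
which may implement a different variant, and the name "osa_sketch" is
hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: Levenshtein's three edits plus swapping two
# adjacent characters, each counted as a single edit.
sub osa_sketch {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my ($m, $n) = (scalar(@s), scalar(@t));

    # $d[$i][$j] is the distance between the first $i characters of $s
    # and the first $j characters of $t.
    my @d;
    $d[$_][0] = $_ for 0 .. $m;
    $d[0][$_] = $_ for 0 .. $n;

    for my $i (1 .. $m) {
        for my $j (1 .. $n) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $d[$i][$j] = min(
                $d[$i - 1][$j] + 1,            # deletion
                $d[$i][$j - 1] + 1,            # insertion
                $d[$i - 1][$j - 1] + $cost,    # substitution or match
            );
            # Adjacent characters swapped: one transposition.
            if (   $i > 1 && $j > 1
                && $s[$i - 1] eq $t[$j - 2]
                && $s[$i - 2] eq $t[$j - 1]) {
                $d[$i][$j] = min($d[$i][$j], $d[$i - 2][$j - 2] + 1);
            }
        }
    }
    return $d[$m][$n];
}

print osa_sketch('form', 'from'), "\n";    # prints 1
```

Under the plain Levenshtein distance, 'form' and 'from' are 2 edits apart
(two substitutions). The transposition rule brings that down to 1.
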
Text::WagnerFischer is an implementation of the Wagner-Fischer edit
distance, which is similar to the Levenshtein distance, but applies
different weights to each edit type.

Text::Brew is an implementation of the Brew edit distance, which is
another algorithm based on edit weights.

Text::Fuzzy provides a number of operations for partial or fuzzy
matching of text based on edit distance. Text::Fuzzy::PP is a pure perl
implementation of the same interface.

String::Similarity takes two strings and returns a value between 0
(meaning entirely different) and 1 (meaning identical). It is
apparently based on edit distance.

Text::Dice calculates Dice's coefficient
<https://en.wikipedia.org/wiki/Sørensen–Dice_coefficient> for two
strings. This formula was originally developed to measure the
similarity of two different populations in ecological research.

REPOSITORY
<https://github.com/neilbowers/Text-Levenshtein>

AUTHOR
Dree Mistrut originally wrote this module and released it to CPAN in
2002.

Josh Goldberg then took over maintenance and released versions between
2004 and 2008.

Neil Bowers (NEILB on CPAN) is now maintaining this module. Version
0.07 was a complete rewrite, based on one of the algorithms on the
Wikipedia page.

COPYRIGHT AND LICENSE
This software is copyright (C) 2002-2004 Dree Mistrut. Copyright (C)
2004-2014 Josh Goldberg. Copyright (C) 2014- Neil Bowers.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

perl v5.28.1                      2015-08-11              Text::Levenshtein(3)