Text::Levenshtein(3pm)

Text::Levenshtein(3)  User Contributed Perl Documentation Text::Levenshtein(3)

NAME

       Text::Levenshtein - calculate the Levenshtein edit distance between two
       strings

SYNOPSIS

        use Text::Levenshtein qw(distance);

        print distance("foo","four");
        # prints "2"

        my @words     = qw/ four foo bar /;
        my @distances = distance("foo",@words);

        print "@distances";
        # prints "2 0 3"

DESCRIPTION

       This module implements the Levenshtein edit distance, which measures
       the difference between two strings as the minimum number of
       single-character substitutions, deletions or insertions ("edits")
       needed to transform one string into the other (the distance is the
       same in either direction).  When two strings have distance 0, they
       are identical.

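       As an illustration of the metric itself (not this module's actual
       source), here is a minimal pure-Perl sketch of the standard
       dynamic-programming algorithm.  The function name levenshtein() is
       hypothetical, and the sketch keeps only two rows of the table in
       memory at a time.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper, not Text::Levenshtein's code: classic dynamic
# programming over a (length($s)+1) x (length($t)+1) table, two rows at a time.
sub levenshtein {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # Row 0: turning the empty prefix of $s into the first $j chars of $t
    # takes $j insertions.
    my @prev = (0 .. scalar @t);

    for my $i (1 .. @s) {
        my @curr = ($i);    # $i deletions turn $i chars of $s into ""
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,            # delete $s[$i-1]
                $curr[$j - 1] + 1,        # insert $t[$j-1]
                $prev[$j - 1] + $cost,    # substitute, or keep a match
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

print levenshtein('foo', 'four'), "\n";     # prints 2
print levenshtein('brown', 'green'), "\n";  # prints 3
```

       Each cell takes the cheapest of a deletion, an insertion or a
       substitution (free when the characters already match), so the running
       time grows with the product of the two string lengths.
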
       To learn more about the Levenshtein metric, have a look at the
       Wikipedia page <http://en.wikipedia.org/wiki/Levenshtein_distance>.

   distance()
       The simplest usage takes two strings and returns the edit distance
       between them:

        $distance = distance('brown', 'green');
        # returns 3, as 'r' and 'n' don't change

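       To see which characters change in the example above, the full table
       can be kept and walked backwards from the bottom-right cell to recover
       one optimal sequence of edits.  This is a sketch: edit_script() is a
       hypothetical helper, not part of Text::Levenshtein, and when several
       edit sequences are equally short it returns just one of them.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper (not part of Text::Levenshtein): returns the distance
# followed by one optimal list of edits turning $s into $t.
sub edit_script {
    my ($s, $t) = @_;
    my @src = split //, $s;
    my @dst = split //, $t;

    # Full table: $d[$i][$j] is the distance between the first $i
    # characters of $s and the first $j characters of $t.
    my @d;
    $d[$_][0] = $_ for 0 .. @src;
    $d[0][$_] = $_ for 0 .. @dst;
    for my $i (1 .. @src) {
        for my $j (1 .. @dst) {
            my $cost = $src[$i - 1] eq $dst[$j - 1] ? 0 : 1;
            $d[$i][$j] = min($d[$i - 1][$j] + 1,
                             $d[$i][$j - 1] + 1,
                             $d[$i - 1][$j - 1] + $cost);
        }
    }

    # Walk back from the bottom-right cell, preferring a diagonal step
    # (keep or substitute) whenever it accounts for the cell's value.
    my @ops;
    my ($i, $j) = (scalar @src, scalar @dst);
    while ($i > 0 || $j > 0) {
        if ($i > 0 && $j > 0) {
            my ($x, $y) = ($src[$i - 1], $dst[$j - 1]);
            my $cost = $x eq $y ? 0 : 1;
            if ($d[$i][$j] == $d[$i - 1][$j - 1] + $cost) {
                unshift @ops, $cost ? "substitute $x -> $y" : "keep $x";
                ($i, $j) = ($i - 1, $j - 1);
                next;
            }
        }
        if ($i > 0 && $d[$i][$j] == $d[$i - 1][$j] + 1) {
            unshift @ops, 'delete ' . $src[$i - 1];
            $i--;
        }
        else {
            unshift @ops, 'insert ' . $dst[$j - 1];
            $j--;
        }
    }
    return ($d[-1][-1], @ops);
}

my ($dist, @ops) = edit_script('brown', 'green');
print "$dist\n";
print "$_\n" for @ops;
# prints:
# 3
# substitute b -> g
# keep r
# substitute o -> e
# substitute w -> e
# keep n
```
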
       Instead of a single second string, you can pass a list of strings.
       Each string is compared with the first string passed, and a list of
       the edit distances is returned:

        @words     = qw/ green trainee brains /;
        @distances = distance('brown', @words);
        # returns (3, 5, 3)

   fastdistance()
       Previous versions of this module provided an alternative
       implementation in the function fastdistance().  This function is still
       provided for backwards compatibility, but distance() and
       fastdistance() now run the same code to calculate the edit distance.

       Unlike distance(), fastdistance() takes only two strings, and returns
       the edit distance between them.

ignore_diacritics

       Both the distance() and fastdistance() functions can take a hashref
       of options as the final argument.  At the moment the only option is
       "ignore_diacritics".  If this is true, any diacritics are ignored
       when calculating the edit distance.  For example, "cafe" and "café"
       normally have an edit distance of 1, but when diacritics are
       ignored, the distance is 0:

        use utf8;
        use Text::Levenshtein 0.11 qw/ distance /;
        $distance = distance('cafe', 'café', {ignore_diacritics => 1});
        # returns 0

       If you turn on this option, Unicode::Collate is loaded and used to
       compare the characters in the words.

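       The sketch below shows one way a diacritic-insensitive comparison can
       plug into the same dynamic-programming loop, assuming a
       Unicode::Collate object at primary strength (level => 1), which treats
       "e" and "é" as equal.  It is not necessarily how this module
       configures its collator, and a level 1 comparison also ignores
       differences in case.  distance_ignoring_diacritics() is a hypothetical
       name.

```perl
use strict;
use warnings;
use List::Util qw(min);
use Unicode::Collate;

# Assumption: a primary-strength (level 1) collator ignores accents.
# Level 1 also ignores case, so "E" and "e" would compare equal too.
my $collator = Unicode::Collate->new(level => 1);

# Hypothetical helper: Levenshtein distance where characters that collate
# equal at level 1 count as a match.
sub distance_ignoring_diacritics {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0 .. scalar @t);
    for my $i (1 .. @s) {
        my @curr = ($i);
        for my $j (1 .. @t) {
            my $cost = $collator->eq($s[$i - 1], $t[$j - 1]) ? 0 : 1;
            $curr[$j] = min($prev[$j] + 1,
                            $curr[$j - 1] + 1,
                            $prev[$j - 1] + $cost);
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# "caf\x{e9}" is "café" written with a precomposed e-acute.
print distance_ignoring_diacritics('cafe', "caf\x{e9}"), "\n";   # prints 0
```
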
       Early versions of "Text::Levenshtein" didn't support this option, so
       you should require version 0.11 or later, as above.

SEE ALSO

       There are many different modules on CPAN for calculating the edit
       distance between two strings.  Here are just a few of them.

       Text::LevenshteinXS and Text::Levenshtein::XS are both implementations
       of the Levenshtein algorithm that require a C compiler, but are a lot
       faster than this module.

       Text::Levenshtein::Flexible is another C implementation, but offers
       some twists: you can specify a maximum distance that you're interested
       in, which makes it faster; you can also give different costs to
       insertion, deletion, and substitution.  It hasn't been updated since
       2014.

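       The idea behind a maximum distance can be sketched in pure Perl:
       because the smallest value in a row of the table never decreases from
       one row to the next, the calculation can stop as soon as a whole row
       exceeds the limit.  This illustrates the technique only; it is not
       Text::Levenshtein::Flexible's interface, and bounded_levenshtein() is
       a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: returns the Levenshtein distance if it is <= $max,
# otherwise undef, giving up early where possible.
sub bounded_levenshtein {
    my ($s, $t, $max) = @_;

    # The distance is at least the difference in lengths.
    return undef if abs(length($s) - length($t)) > $max;

    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0 .. scalar @t);
    for my $i (1 .. @s) {
        my @curr = ($i);
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $curr[$j] = min($prev[$j] + 1,
                            $curr[$j - 1] + 1,
                            $prev[$j - 1] + $cost);
        }
        # Row minima never decrease, so if every cell in this row is
        # already over the limit, the final distance must be too.
        return undef if min(@curr) > $max;
        @prev = @curr;
    }
    return $prev[-1] <= $max ? $prev[-1] : undef;
}

print bounded_levenshtein('foo', 'four', 2), "\n";   # prints 2
my $result = bounded_levenshtein('foo', 'four', 1);
print defined $result ? $result : 'over the limit', "\n";
```
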
       Text::Levenshtein::Edlib is a Perl wrapper around a C++ library that
       provides the Levenshtein edit distance and optimal alignment path for a
       pair of strings.  It doesn't support UTF-8 strings, though.

       Text::Levenshtein::BV implements the Levenshtein algorithm using bit
       vectors, and claims to be faster than this implementation.  I haven't
       benchmarked them.

       The Damerau-Levenshtein edit distance is like the Levenshtein distance,
       but in addition to insertion, deletion and substitution, it also
       considers the transposition of two adjacent characters to be a single
       edit.  The module Text::Levenshtein::Damerau defaults to using a
       pure-Perl implementation, but if you've installed
       Text::Levenshtein::Damerau::XS, it will be a lot quicker.

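       A minimal sketch of the transposition rule is the restricted variant
       usually called "optimal string alignment", which adds one extra case
       to the Levenshtein recurrence.  Note that it is not the full
       Damerau-Levenshtein distance (the two differ on inputs such as "ca"
       and "abc"), and it is not taken from Text::Levenshtein::Damerau;
       osa_distance() is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: optimal string alignment (restricted
# Damerau-Levenshtein) distance. Swapping two adjacent characters costs 1.
sub osa_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @d;
    $d[$_][0] = $_ for 0 .. @s;
    $d[0][$_] = $_ for 0 .. @t;
    for my $i (1 .. @s) {
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $d[$i][$j] = min($d[$i - 1][$j] + 1,
                             $d[$i][$j - 1] + 1,
                             $d[$i - 1][$j - 1] + $cost);
            # Extra case: the last two characters are swapped ("ab" vs "ba").
            if ($i > 1 && $j > 1
                && $s[$i - 1] eq $t[$j - 2]
                && $s[$i - 2] eq $t[$j - 1]) {
                $d[$i][$j] = min($d[$i][$j], $d[$i - 2][$j - 2] + 1);
            }
        }
    }
    return $d[-1][-1];
}

print osa_distance('ab', 'ba'), "\n";   # prints 1 (plain Levenshtein gives 2)
```
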
       Text::WagnerFischer is an implementation of the Wagner-Fischer edit
       distance, which is similar to the Levenshtein distance, but applies
       different weights to each edit type.

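       Weighted edit costs only change the constants in the recurrence.  The
       sketch below takes hypothetical insert, delete and substitute costs as
       named arguments, defaulting to 1 each (plain Levenshtein); these
       defaults and argument names are not Text::WagnerFischer's interface.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: edit distance with per-operation costs.
# The argument names and default costs are illustrative assumptions.
sub weighted_distance {
    my ($s, $t, %opt) = @_;
    my %cost = (insert => 1, delete => 1, substitute => 1, %opt);
    my ($ins, $del, $subst) = @cost{qw(insert delete substitute)};

    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = map { $_ * $ins } 0 .. @t;    # build $t from "" by inserting
    for my $i (1 .. @s) {
        my @curr = ($i * $del);               # reduce $s to "" by deleting
        for my $j (1 .. @t) {
            my $c = $s[$i - 1] eq $t[$j - 1] ? 0 : $subst;
            $curr[$j] = min($prev[$j] + $del,
                            $curr[$j - 1] + $ins,
                            $prev[$j - 1] + $c);
        }
        @prev = @curr;
    }
    return $prev[-1];
}

print weighted_distance('brown', 'green'), "\n";                   # prints 3
print weighted_distance('brown', 'green', substitute => 2), "\n";  # prints 6
```

       With a substitution cost of 2, a substitution is never cheaper than a
       deletion plus an insertion, so the result counts the characters
       outside the longest common subsequence ("rn"): 6 for 'brown' and
       'green', compared with 3 when every edit costs 1.
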
       Text::Brew is an implementation of the Brew edit distance, which is
       another algorithm based on edit weights.

       Text::Fuzzy provides a number of operations for partial or fuzzy
       matching of text based on edit distance.  Text::Fuzzy::PP is a
       pure-Perl implementation of the same interface.

       String::Similarity takes two strings and returns a value between 0
       (meaning entirely different) and 1 (meaning identical).  It is
       apparently based on edit distance.

       Text::Dice calculates Dice's coefficient
       <https://en.wikipedia.org/wiki/Sørensen–Dice_coefficient> for two
       strings.  This formula was originally developed to measure the
       similarity of two different populations in ecological research.

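       For comparison with the edit-distance functions, here is a sketch of
       Dice's coefficient computed over character bigrams (pairs of adjacent
       characters): twice the number of shared bigrams divided by the total
       number of bigrams in both strings.  Text::Dice may split strings
       differently; dice_coefficient() and bigrams() are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: count the character bigrams of a string.
sub bigrams {
    my ($str) = @_;
    my %count;
    $count{ substr($str, $_, 2) }++ for 0 .. length($str) - 2;
    return \%count;
}

# Hypothetical helper: 2 * |shared bigrams| / (|bigrams of x| + |bigrams of y|).
# Returns 0 when neither string has any bigrams.
sub dice_coefficient {
    my ($x, $y) = @_;
    my ($bx, $by) = (bigrams($x), bigrams($y));
    my ($total, $shared) = (0, 0);
    $total += $_ for values(%$bx), values(%$by);
    return 0 unless $total;
    for my $bigram (keys %$bx) {
        next unless exists $by->{$bigram};
        # A repeated bigram is shared as many times as it occurs in both.
        $shared += $bx->{$bigram} < $by->{$bigram} ? $bx->{$bigram} : $by->{$bigram};
    }
    return 2 * $shared / $total;
}

print dice_coefficient('night', 'nacht'), "\n";   # prints 0.25
```

       Unlike an edit distance, this is a similarity score: 1 for identical
       bigram sets and 0 when no bigrams are shared ("night" and "nacht"
       share only "ht", out of 8 bigrams in total).
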
       String::KeyboardDistance and String::KeyboardDistanceXS calculate the
       "keyboard distance" between two strings.

REPOSITORY

       <https://github.com/neilbowers/Text-Levenshtein>

AUTHOR

       Dree Mistrut originally wrote this module and released it to CPAN in
       2002.

       Josh Goldberg then took over maintenance and released versions between
       2004 and 2008.

       Neil Bowers (NEILB on CPAN) now maintains this module.  Version 0.07
       was a complete rewrite, based on one of the algorithms on the
       Wikipedia page.

COPYRIGHT AND LICENSE

       This software is copyright (C) 2002-2004 Dree Mistrut.  Copyright (C)
       2004-2014 Josh Goldberg.  Copyright (C) 2014- Neil Bowers.

       This is free software; you can redistribute it and/or modify it under
       the same terms as the Perl 5 programming language system itself.

perl v5.36.0                      2023-01-20              Text::Levenshtein(3)