Text::Levenshtein(3pm)

Text::Levenshtein(3)  User Contributed Perl Documentation Text::Levenshtein(3)

NAME

       Text::Levenshtein - calculate the Levenshtein edit distance between two
       strings

SYNOPSIS

        use Text::Levenshtein qw(distance);

        print distance("foo", "four"), "\n";
        # prints "2"

        my @words     = qw/ four foo bar /;
        my @distances = distance("foo", @words);

        print "@distances\n";
        # prints "2 0 3"

DESCRIPTION

       This module implements the Levenshtein edit distance, which measures
       the difference between two strings as the minimum number of
       single-character substitutions, deletions or insertions ("edits")
       needed to transform one string into the other.  The distance is the
       same in both directions, and two strings have distance 0 exactly
       when they are identical.

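       The definition above can be sketched as a small dynamic-programming
       routine.  This is only an illustration, not necessarily the code this
       module runs; the name "edit_distance_sketch" is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Levenshtein: a two-row
# dynamic-programming computation of the Levenshtein distance.
sub edit_distance_sketch {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # $prev[$j] is the distance between the empty prefix of $s
    # and the first $j characters of $t (that is, $j insertions).
    my @prev = (0 .. scalar @t);

    for my $i (1 .. scalar @s) {
        # Turning $i characters into the empty string takes $i deletions.
        my @curr = ($i);
        for my $j (1 .. scalar @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            my $best = $prev[$j - 1] + $cost;                        # match or substitution
            $best = $prev[$j] + 1     if $prev[$j] + 1 < $best;      # deletion
            $best = $curr[$j - 1] + 1 if $curr[$j - 1] + 1 < $best;  # insertion
            $curr[$j] = $best;
        }
        @prev = @curr;
    }
    return $prev[-1];
}

print edit_distance_sketch('brown', 'green'), "\n";   # prints "3"
```

       Only two rows of the table are kept at a time, so memory use grows
       with the length of the second string rather than with the product of
       both lengths.
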
       To learn more about the Levenshtein metric, have a look at the
       Wikipedia page <http://en.wikipedia.org/wiki/Levenshtein_distance>.

   distance()
       The simplest usage will take two strings and return the edit distance:

        $distance = distance('brown', 'green');
        # returns 3, as 'r' and 'n' don't change

       Instead of a single second string, you can pass a list of strings.
       Each string will be compared to the first string passed, and a list of
       the edit distances returned:

        @words     = qw/ green trainee brains /;
        @distances = distance('brown', @words);
        # returns (3, 5, 3)

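       One use of the list form is picking the closest match from a set of
       candidates.  The following is a minimal sketch; the variable names
       are illustrative, and ties are resolved in favour of the earliest
       candidate.

```perl
use strict;
use warnings;
use Text::Levenshtein qw(distance);

my $typed      = 'brown';
my @candidates = qw/ green trainee brains /;

# One call returns one distance per candidate, in the same order.
my @distances = distance($typed, @candidates);

# Remember the index of the smallest distance (first one wins on ties).
my $best = 0;
for my $i (1 .. $#distances) {
    $best = $i if $distances[$i] < $distances[$best];
}

print "closest to '$typed': $candidates[$best] ($distances[$best])\n";
```
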
   fastdistance()
       Previous versions of this module provided an alternative
       implementation, in the function "fastdistance()".  This function is
       still provided, for backwards compatibility, but both functions now
       run the same code to calculate the edit distance.

       Unlike "distance()", "fastdistance()" only takes two strings, and
       returns the edit distance between them.
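
       A short sketch comparing the two functions follows.  It assumes that
       "fastdistance" can be imported by name in the same way as "distance";
       the SYNOPSIS only shows "distance" being imported.

```perl
use strict;
use warnings;
# Assumption: fastdistance is exportable on request, like distance.
use Text::Levenshtein qw(distance fastdistance);

my ($from, $to) = ('foo', 'four');

# Both calls are expected to agree, since they share one implementation.
my $d1 = distance($from, $to);
my $d2 = fastdistance($from, $to);

print "distance: $d1, fastdistance: $d2\n";
```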

   ignore_diacritics

       Both the "distance()" and "fastdistance()" functions can take a hashref
       with optional arguments, as the final argument.  At the moment the only
       option is "ignore_diacritics".  If this is true, then any diacritics
       are ignored when calculating edit distance.  For example, "cafe" and
       "café" normally have an edit distance of 1, but when diacritics are
       ignored, the distance will be 0:

        use Text::Levenshtein 0.11 qw/ distance /;
        $distance = distance($word1, $word2, {ignore_diacritics => 1});

       If you turn on this option, then Unicode::Collate will be loaded, and
       used when comparing characters in the words.

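       A self-contained version of the "cafe" example might look like the
       sketch below.  The "use utf8" pragma is an addition made here so that
       the literal "é" in the source is read as a single character rather
       than as separate bytes.

```perl
use strict;
use warnings;
use utf8;    # the source below contains a literal "é"
use Text::Levenshtein 0.11 qw(distance);

# Default behaviour: "e" and "é" are different characters.
my $plain = distance('cafe', 'café');

# With the option set, characters differing only by a diacritic match.
my $folded = distance('cafe', 'café', { ignore_diacritics => 1 });

print "default: $plain, ignoring diacritics: $folded\n";
```

       Per the description above, this should report 1 for the default
       comparison and 0 when diacritics are ignored.
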
       Early versions of "Text::Levenshtein" didn't support this option, so
       you should require version 0.11 or later, as above.

SEE ALSO

       There are many different modules on CPAN for calculating the edit
       distance between two strings.  Here's just a selection.

       Text::LevenshteinXS and Text::Levenshtein::XS are both versions of the
       Levenshtein algorithm that require a C compiler, but will be a lot
       faster than this module.

       The Damerau-Levenshtein edit distance is like the Levenshtein distance,
       but in addition to insertion, deletion and substitution, it also
       considers the transposition of two adjacent characters to be a single
       edit.  The module Text::Levenshtein::Damerau defaults to using a pure
       Perl implementation, but if you've installed
       Text::Levenshtein::Damerau::XS then it will be a lot quicker.

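       To illustrate how counting a transposition as one edit changes the
       result, here is a sketch of the "optimal string alignment" distance,
       a restricted form of Damerau-Levenshtein in which no substring is
       edited more than once.  It is not the code used by
       Text::Levenshtein::Damerau, and the name "osa_distance_sketch" is
       hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: optimal string alignment distance, a restricted
# variant of the Damerau-Levenshtein distance.
sub osa_distance_sketch {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # $d[$i][$j] is the distance between the first $i characters of $s
    # and the first $j characters of $t.
    my @d;
    $d[$_][0] = $_ for 0 .. scalar @s;
    $d[0][$_] = $_ for 0 .. scalar @t;

    for my $i (1 .. scalar @s) {
        for my $j (1 .. scalar @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            my $best = $d[$i - 1][$j - 1] + $cost;                        # match or substitution
            $best = $d[$i - 1][$j] + 1 if $d[$i - 1][$j] + 1 < $best;     # deletion
            $best = $d[$i][$j - 1] + 1 if $d[$i][$j - 1] + 1 < $best;     # insertion

            # Swapping two adjacent characters counts as a single edit.
            if (   $i > 1 && $j > 1
                && $s[$i - 1] eq $t[$j - 2]
                && $s[$i - 2] eq $t[$j - 1]
                && $d[$i - 2][$j - 2] + 1 < $best)
            {
                $best = $d[$i - 2][$j - 2] + 1;
            }
            $d[$i][$j] = $best;
        }
    }
    return $d[scalar @s][scalar @t];
}

print osa_distance_sketch('form', 'from'), "\n";   # prints "1"
```

       Under plain Levenshtein distance the same pair scores 2, because the
       swap has to be expressed as two substitutions.
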
       Text::WagnerFischer is an implementation of the Wagner-Fischer edit
       distance, which is similar to the Levenshtein, but applies different
       weights to each edit type.

       Text::Brew is an implementation of the Brew edit distance, which is
       another algorithm based on edit weights.

       Text::Fuzzy provides a number of operations for partial or fuzzy
       matching of text based on edit distance.  Text::Fuzzy::PP is a pure
       Perl implementation of the same interface.

       String::Similarity takes two strings and returns a value between 0
       (meaning entirely different) and 1 (meaning identical).  It is
       apparently based on edit distance.

       Text::Dice calculates Dice's coefficient
       <https://en.wikipedia.org/wiki/Sørensen–Dice_coefficient> for two
       strings.  This formula was originally developed to measure the
       similarity of two different populations in ecological research.
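
       For two collections X and Y, Dice's coefficient is
       2 * |X ∩ Y| / (|X| + |Y|).  A common way to apply it to strings is to
       compare their adjacent character pairs (bigrams), as in the sketch
       below.  This is an assumption made for illustration and not
       necessarily how Text::Dice computes its result; the name
       "dice_sketch" is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: Dice's coefficient over character bigrams.
sub dice_sketch {
    my ($s, $t) = @_;

    # Count each adjacent pair of characters in a string.
    my $bigrams = sub {
        my ($str) = @_;
        my %count;
        $count{ substr($str, $_, 2) }++ for 0 .. length($str) - 2;
        return \%count;
    };

    my ($x, $y) = ($bigrams->($s), $bigrams->($t));

    # Total number of bigrams in each string, repeats included.
    my ($nx, $ny) = (0, 0);
    $nx += $_ for values %$x;
    $ny += $_ for values %$y;
    return 0 if $nx + $ny == 0;    # neither string has a bigram

    # A repeated bigram is shared only as often as it occurs in both.
    my $shared = 0;
    for my $pair (keys %$x) {
        next unless exists $y->{$pair};
        $shared += $x->{$pair} < $y->{$pair} ? $x->{$pair} : $y->{$pair};
    }
    return 2 * $shared / ($nx + $ny);
}

print dice_sketch('night', 'nacht'), "\n";   # prints "0.25"
```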

REPOSITORY

       <https://github.com/neilbowers/Text-Levenshtein>

AUTHOR

       Dree Mistrut originally wrote this module and released it to CPAN in
       2002.

       Josh Goldberg then took over maintenance and released versions between
       2004 and 2008.

       Neil Bowers (NEILB on CPAN) is now maintaining this module.  Version
       0.07 was a complete rewrite, based on one of the algorithms on the
       Wikipedia page.

COPYRIGHT AND LICENSE

       This software is copyright (C) 2002-2004 Dree Mistrut.  Copyright (C)
       2004-2014 Josh Goldberg.  Copyright (C) 2014- Neil Bowers.

       This is free software; you can redistribute it and/or modify it under
       the same terms as the Perl 5 programming language system itself.

perl v5.30.1                      2020-01-30              Text::Levenshtein(3)