Finding the Levenshtein distance in Python

I discovered today that Moneygement won't accept unicode characters when someone adds transactions by email because of the editdist module I used to check it. Since I don't really need a fast function to do it (it's all eight-letter words on average), I decided to write my own Python version of the function and am sharing it here if anyone needs it (because I haven't found a Python implementation anywhere). It's released under a BSD license with attribution, meaning that I'd like it if my name was mentioned where it is used :)

def levenshtein_distance(first, second):
    """Find the Levenshtein distance between two strings."""
    if len(first) > len(second):
        first, second = second, first
    if len(second) == 0:
        return len(first)
    first_length = len(first) + 1
    second_length = len(second) + 1
    distance_matrix = [range(second_length) for x in range(first_length)]
    for i in range(1, first_length):
        for j in range(1, second_length):
            deletion = distance_matrix[i-1][j] + 1
            insertion = distance_matrix[i][j-1] + 1
            substitution = distance_matrix[i-1][j-1]
            if first[i-1] != second[j-1]:
                substitution += 1
            distance_matrix[i][j] = min(insertion, deletion, substitution)

    return distance_matrix[first_length-1][second_length-1]
Related material

Peter Norvig (of the AI book and Google fame) has an interesting article on spelling correction, which is a fairly typical use of levenshtein. In his case though he generates all of the words within edit distance 1 of a source word, which although very computationally intensive might be of interest as well.

His code, from the link, is below..


alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
n = len(word)
return set([word[0:i]+word[i+1:] for i in range(n)] + # deletion
[word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)] + # transposition
[word[0:i]+c+word[i+1:] for i in range(n) for c in alphabet] + # alteration
[word[0:i]+c+word[i:] for i in range(n+1) for c in alphabet]) # insertion

Submitted by Ross (not verified) on Wed, 23/04/2008 - 11:20.
Re: Norvig

Oh, I remember reading that, it was indeed very interesting, thanks for the link.
---
Vidi, Vici, Veni.

Submitted by Poromenos on Wed, 23/04/2008 - 12:43.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Google ads

(You can disable these if you log in)