Find which cells have the smallest levenshtein distance
So, I have this Function which will quickly return the Levenshtein Distance between two Strings:
Function Levenshtein(ByVal string1 As String, ByVal string2 As String) As Long Dim i As Long, j As Long Dim string1_length As Long Dim string2_length As Long Dim distance() As Long string1_length = Len(string1) string2_length = Len(string2) ReDim distance(string1_length, string2_length) For i = 0 To string1_length distance(i, 0) = i Next For j = 0 To string2_length distance(0, j) = j Next For i = 1 To string1_length For j = 1 To string2_length If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then distance(i, j) = distance(i - 1, j - 1) Else distance(i, j) = Application.WorksheetFunction.Min _ (distance(i - 1, j) + 1, _ distance(i, j - 1) + 1, _ distance(i - 1, j - 1) + 1) End If Next Next Levenshtein = distance(string1_length, string2_length) End Function
I want to perform a fast comparison between all cells in the "A" column and return which ones have a "small" Levenshtein distance. How would I make all these comparisons?
Do you want to find which combinations of strings have small levenshtein distances or just overall how similar/disimilar each string is with all the other strings?
If it is the former this should work fine:
You just copy and paste transposed values to create all those headers(as Dale commented). You can use the conditional formatting to highlight the lowest results.
Or if you want the actual strings to return you should be able to use this:
Just copy and paste unique values if you want the returned combinations in a single column.