One of my tasks at work is to match one set of strings to another set of strings. Some of the strings in one set are similar to strings in the other set.
I used approximate string matching to do this: calculate the similarity score between one string and each string in the other set using the Levenshtein algorithm, sort the similarity scores, and assign the string with the highest score as the match to the original string.
Here is the wiki page for the Levenshtein algorithm:
https://en.wikipedia.org/wiki/Levenshtein_distance
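To make the idea concrete, here is a minimal pure-Python sketch of the approach described above. It is not our production code. The function names and the way the score is normalized (1 minus the distance divided by the longer length) are assumptions made for illustration, and the score is not guaranteed to equal what a library function such as `Levenshtein.ratio` returns.

```python
def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Hypothetical normalized score in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / float(longest)


def best_match(query, candidates):
    """Return the (candidate, score) pair most similar to query."""
    scored = [(c, similarity(query, c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])
```

For example, `best_match("appel", ["apple", "maple", "grape"])` picks `"apple"`, because only two substitutions separate the two strings.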
The whole code is very long, and we used some other search logic to refine the results.
We deployed it as a web service.
Search engines such as Google also rely on search algorithms, and this method could possibly serve as one.
Here is part of the code:
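    # Fragment from inside a larger matching function.
    # Needs: import time, datetime, Levenshtein (the python-Levenshtein package).
    # feature, avails_dic, selection, FMT, step1 and logging_level are defined elsewhere.
    if level2 == 'UNK' and level3 == 'unk' and level1 != 'unk':
        ee = {}
        level = level1
        for e in range(len(feature)):
            # Score the query against the last field of each feature row
            d2 = float(Levenshtein.ratio(str(level), feature[e][-1]))
            ee[str(feature[e][2].strip())] = d2
        # Keep the five candidates with the highest similarity scores
        my_list = sorted(ee.items(), key=lambda x: x[1], reverse=True)[:5]
        match = selection(my_list, avails_dic)
        if logging_level > 5:
            print("returned value : %s" % match)
        # Elapsed time since step1, at the one-second resolution of "%x %X"
        step2 = time.strftime("%x %X")
        tdelta = datetime.datetime.strptime(step2, FMT) - datetime.datetime.strptime(step1, FMT)
        if logging_level > 4:
            print("first step ending time is:" + step2)
            print("first step used time is %s:" % tdelta)
        return match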
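The ranking and timing steps in the fragment above can also be tried in isolation. The sketch below is only an illustration, not our code. So that it runs without third-party packages, it swaps in `difflib.SequenceMatcher` from the standard library as a stand-in for `Levenshtein.ratio`, so the scores will differ. The function name `top_matches` and the sample strings are made up.

```python
import datetime
import difflib
import heapq


def top_matches(query, candidates, n=5):
    """Return up to n (candidate, score) pairs, best score first.

    Assumption: difflib's ratio is used here as a stand-in similarity
    measure; it is not the Levenshtein ratio.
    """
    scored = ((difflib.SequenceMatcher(None, query, c).ratio(), c)
              for c in candidates)
    return [(c, s) for s, c in heapq.nlargest(n, scored)]


# Made-up sample data for illustration
names = ["hamming", "levenshtein", "jaro winkler",
         "damerau levenshtein", "cosine", "jaccard"]

started = datetime.datetime.now()
ranked = top_matches("levenstein", names, n=3)
# Subtracting datetime objects directly keeps sub-second precision
elapsed = datetime.datetime.now() - started

for name, score in ranked:
    print("%-22s %.3f" % (name, score))
print("ranking took %s" % elapsed)
```

`heapq.nlargest` avoids sorting the whole list when only the top few candidates are needed, which can matter when the second set of strings is large.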