One of my tasks at work is to match one set of strings against another set of strings. Some of the strings in one set are similar to strings in the other set.
I used approximate string matching to do this: calculate the similarity score between one string and each string in the other set using the Levenshtein algorithm, sort the similarity scores, and assign the string with the highest score as the match for the original string.
Here is the wiki page for the Levenshtein algorithm:
https://en.wikipedia.org/wiki/Levenshtein_distance
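To make the idea concrete, here is a minimal sketch in plain Python 3 of the "score every candidate, keep the best" approach. It is not the code we run at work: the function names `levenshtein_distance`, `similarity`, and `best_match` are made up for this illustration, and the score is a simple normalization (1 minus distance divided by the longer length), which is an assumption on my part and is not guaranteed to give the same numbers as `Levenshtein.ratio`.

```python
def levenshtein_distance(a, b):
    """Count the single-character insertions, deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity(a, b):
    """Illustrative normalized score in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def best_match(query, candidates):
    """Return the (candidate, score) pair with the highest similarity to query."""
    scored = [(c, similarity(query, c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])
```

For example, `best_match("apple", ["aple", "banana", "grape"])` picks `"aple"`, because it is only one edit away from the query.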
The whole code is very long, and it uses some additional search logic to refine the results. We deployed it as a web service.
Search engines such as Google also rely on search algorithms, and this method could possibly be used as one.
Here is part of the code:
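# Excerpt from a longer function. It needs `import time`, `import datetime`, and
# `import Levenshtein` (e.g. from the python-Levenshtein package) at the top of the module.
if level2=='UNK' and level3=='unk' and level1!='unk':
    ee={}  # candidate name -> similarity score
    level=level1
    for e in range(len(feature)):
        # compare the query with the last field of each feature row
        d2=float(Levenshtein.ratio(str(level),feature[e][-1]))
        ee[str(feature[e][2].strip())]=d2
    # keep the five highest-scoring candidates
    my_list=sorted(ee.items(),key=lambda x:x[1],reverse=True)[:5]
    match=selection(my_list,avails_dic)
    if logging_level>5:
        print("returned value : %s" % match)
    step2=time.strftime("%x %X")
    tdelta = datetime.datetime.strptime(step2, FMT) - datetime.datetime.strptime(step1, FMT)
    if logging_level>4:
        print("first step ending time is: "+step2)
        print("first step used time is: %s" % tdelta)
    return match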
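The excerpt above keeps the five best-scoring candidates and then hands them to a `selection` function that is not shown in this post. The sketch below shows that two-step shape in a self-contained way, under clearly stated assumptions: it uses `difflib.SequenceMatcher` from the standard library as a stand-in for `Levenshtein.ratio` (a different similarity measure, chosen only so the example runs without extra packages), and `select_available` is a purely hypothetical refinement step, not the real `selection` logic.

```python
import difflib
import heapq


def top_candidates(query, candidates, k=5):
    """Return the k (candidate, score) pairs most similar to query, best first."""
    # Stand-in scorer: difflib's ratio, not the Levenshtein ratio used in the post.
    scored = ((c, difflib.SequenceMatcher(None, query, c).ratio()) for c in candidates)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def select_available(shortlist, available):
    """Hypothetical refinement: first shortlisted candidate found in `available`, else None."""
    for candidate, _score in shortlist:
        if candidate in available:
            return candidate
    return None
```

A call such as `select_available(top_candidates("color red", names), allowed_names)` would then return the best-scoring name that also passes the extra check, where `names` and `allowed_names` are whatever collections you are matching against.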
I write about solutions to problems I have run into in programming and data analytics. They may help you in your work. Thank you.