5ol.cc posted on 2017-5-7 08:38:51

Peter Norvig's spelling corrector, written in Python

The article is here:
http://www.norvig.com/spell-correct.html
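import re, string, collections

def words(text):
    # Lowercase the text and split it into runs of letters.
    return re.findall('[a-z]+', text.lower())

def train(features):
    # Count each word; a word never seen gets a default count of 1.
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

NWORDS = train(words(open('Documents/holmes.txt').read()))

def edits1(word):
    # Every string that is exactly one edit away from word.
    n = len(word)
    return set([word[0:i]+word[i+1:] for i in range(n)] + ## deletion
               [word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)] + ## transposition
               [word[0:i]+c+word[i+1:] for i in range(n) for c in string.ascii_lowercase] + ## alteration
               [word[0:i]+c+word[i:] for i in range(n+1) for c in string.ascii_lowercase]) ## insertion

def known_edits2(word):
    # Known words that are two edits away from word.
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
    # Try the word itself, then known words 1 edit away, then 2 edits away;
    # among the first non-empty group, return the most frequent word.
    return max(known([word]) or known(edits1(word)) or known_edits2(word) or [word],
               key=lambda w: NWORDS[w])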
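
To try the idea without holmes.txt, here is a minimal self-contained sketch. It is my own variation, not the code from the article: SAMPLE_TEXT is a made-up stand-in corpus, COUNTS replaces NWORDS, and I use collections.Counter instead of the defaultdict so that looking up an unknown word does not add it to the model.

```python
import re
import string
from collections import Counter

# Hypothetical stand-in corpus; the post trains on 'Documents/holmes.txt'.
SAMPLE_TEXT = "the quick brown fox jumps over the lazy dog and the spelling of the word"

def words(text):
    # Lowercase the text and split it into runs of letters.
    return re.findall('[a-z]+', text.lower())

# Word frequencies; a Counter returns 0 for missing words without storing them.
COUNTS = Counter(words(SAMPLE_TEXT))

def edits1(word):
    # Every string one deletion, transposition, alteration or insertion away.
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def known(candidates):
    # Keep only the candidates that occur in the corpus.
    return set(w for w in candidates if w in COUNTS)

def correct(word):
    # Prefer the word itself, then known words at distance 1, then distance 2,
    # and fall back to the input unchanged if nothing matches.
    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or [word])
    return max(candidates, key=lambda w: COUNTS[w])

if __name__ == "__main__":
    for w in ["speling", "teh", "quick", "zzzzzz"]:
        print(w, "->", correct(w))
```

With this tiny corpus, "speling" comes back as "spelling" and "teh" as "the", while a word with no known neighbour within two edits is returned unchanged. A real corpus gives much better results, because the frequency counts are what break ties between candidates.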


An expert really is an expert: these few lines of code were written on a plane.