(安西) 发表于 2017-4-29 09:34:28

viterbi算法 python版

  牛mm细心给我讲了一个小时,终于明白它的含义,然后花了一两节分布式数据库的课实现了。当时牛mm还说不可能这么快实现,结果不可能事还是发生了。发现python果真非常好用。不明白此算法可以看这篇blog http://blog.csdn.net/NirvanaFeng/archive/2009/05/12/4171799.aspx
  初始化方法:
  def InitDicForViterbi(nodes,posw,posdi,n):newWordList = []# 解决未登录词for i in nodes:if not posw.has_key(i):newWordList.append(i)maxPosList = NPos(posdi,n)print maxPosListGenerateDicPos(newWordList,maxPosList,posw,posdi)def InitViterbi(node,posw,posdi):viStatePath = []for i in posw:if i <> "@@@":print "i:",iviStatePath.append(*posdi["@@@"],])return viStatePath
  viterbi算法函数
  """ nodes 就是分好的词, posw 是词转换为词性的概率,posdi是词性之间的转换概率,n 是n个最大的词性将此用于未登录词中,weightNone是未出现的词性转移的概率nodes format {word1,word2...} , posw format {word1:{pos1:fre,pos2:fre,..."@@@":totalnum},..."@@@":total}posdi format {pos1:{pos2:fre,pos3:fre...."@@@":total},...."@@@":total}"""def Viterbi(nodes,posw,posdi,n,weightNone):InitDicForViterbi(nodes,posw,posdi,n)viStatePath = InitViterbi(nodes,posw,posdi)length = len(nodes)currentNode = 1while currentNode < length:currentPosList = posw]paths = []# print "vstate:",viStatePathfor k in currentPosList:if k <> "@@@":ajk = weightNoneheap = []for j in xrange(len(viStatePath)):# compute every state j to every state k in ti#          temppath = viStatePath#         print "lastpos:",temppathlastpos = viStatePath[-1]lastweight = viStatePathlastposList = posdiif lastposList.has_key(k):ajk =lastposListcurrentweight = lastweight * ajk * currentPosList#            print "viStatePath:",viStatePathpathNew = ]pathNew.append(k)#             print "pathNew:",pathNewheappush(heap,)# get the max possibility of state k in ti#          print "path:",pathpaths.append(nlargest(1,heap))del viStatePathviStatePath = paths#    print "paths:",pathscurrentNode = currentNode + 1heap = []# get the max possibility pathfor i in viStatePath:heappush(heap,i)return nlargest(1,heap)
  结果打印输出函数:
  """nodes format {word1,word2,...} path is ]"""def Result(nodes,path,edcode="utf-8"):realPath = pathResultPrint(nodes,realPath,edcode)"""nodes format {word1,word2,...} path is """def ResultPrint(nodes,path,edcode="utf-8"):for i in xrange(len(nodes)):print nodes.decode(edcode),"/",path.decode(edcode)
  下面是测试代码:
  aa = ConvertGBKtoUTF("球球")    bb = ConvertGBKtoUTF("娃娃")cc = ConvertGBKtoUTF("吃饭")dd = ConvertGBKtoUTF("好")ee = ConvertGBKtoUTF("dddwieoewkem")dictions = {aa:{bb:1,"@@@":4},bb:{cc:2,aa:3,"@@@":40},"@@@":400}posdi = {"n":{"s":3,"v":3,"@@@":40},"s":{"v":2,"e":3,"@@@":33},"v":{"@@@":1},"@@@":100}posw = {aa:{"n":1,"v":3,"@@@":29},bb:{"n":1,"s":1,"@@@":19},cc:{"n":1,"@@@":1},"@@@":10002}nodes = path = Viterbi(nodes,posw,posdi,3,0.01)print pathResult(nodes,path)
  这里不要问为啥要encode,然后再decode 因为只有这样才能在屏幕上打印出中文。
页: [1]
查看完整版本: viterbi算法 python版