python 程序的性能分析优化（huffman编码程序性能分析的一个小结论）

andyyuduo · 发表于 2015-4-26 11:46:18

在前面的随笔，关于我写的用python 实现huffman,范式huffman的程序进行性能分析。http://www.iyunv.com/rocketfan/default.aspx?ref=setskin
发现问题出在file.read(1)的大量调用上，我现在不太清楚file.read(1)每次被调用是否都是去硬盘读还是从内存中缓存的文件内容中读，如果是有缓冲机制，那么
事实上调用file.read(1)和file.read(1000)减少调用self.read的次数读取大文件的效率其实应该差不多。
不过有一点是可以肯定的，如下
在Python中函数调用代价还是很大的。
在计算密集的地方，很大次数的循环体中要尽量减少函数调用的层次（能inline最好inline：））。
可爱的 Python: 基于生成器的状态机，一文中提到：http://www.ibm.com/developerworks/cn/linux/sdk/python/charm-26/
在 Python 中，函数调用代价不菲；除其它因素外，还要花一段时间解决函数参数列表（除了其它的事情外，还要分析位置参数和缺省参数）。初始化框架对象还要采取一些建立步骤（据 Tim Peters 在 comp.lang.python 上所说，有 100 多行 C 语言程序；我自己还没检查 Python 源代码呢）

对于huffman编码的程序而言，处理一个24M的文本，需要逐一处理其中所有的2429218个bytes，以计算所有字符出现的频率。
对于一个需要循环2429218的循环而言，调用2429218次file.read(1),和调用2429218/1000次的file.read(1000)在函数调用付出的代价会有很大不同的。
这是为什么用file.read(1)速度慢的重要原因。

我的程序里面在2429218次循环里分别尝试了使用
if cur == size #size 是一个常数

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
24292129 45.189 0.000 45.383 0.000 compressor.py:57(next)
      1 44.364 44.364 89.747 89.747 huffman.py:94(caculateFrequence)
24294 0.170 0.000 0.170 0.000 {method 'read' of 'file' objects}
24310 0.024 0.000 0.024 0.000 {len}
对比
if cur ==len(buf) # buf 是一个string

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
24292129 65.668 0.000 86.977 0.000 compressor.py:57(next)
      1 45.236 45.236  132.212  132.212 huffman.py:94(caculateFrequence)
24316439 21.027 0.000 21.027 0.000 {len}

对比一下就会发现，len(buf)带来的时间代价是不可忽略的，20s呢。

下面写了一个关于函数调用多一层调用带来的时间代价的验证小程序。

1 times = 24292128
2 def foo():
3    sum = 0
4    for i in range(10):
5       sum += 1
6    sum = 0
7
8 def useFoo():
9    foo()
10
11
12 def app1():
13    global times
14    for i in range(times):
15       foo()
16
17 def app2():
18    global times
19    for i in range(times):
20       useFoo()
21
22
23 app1()
24 app2()
25

运行结果：app2由于多了一层的函数调用，多消耗了将近1分钟的函数调用时间。

time python -m cProfile -s time testfunc.py
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
48584256  165.070 0.000  261.559 0.000 testfunc.py:2(foo)
48584258 98.150 0.000 98.150 0.000 {range}
24292128 53.236 0.000  184.595 0.000 testfunc.py:8(useFoo)
      1 32.634 32.634  163.862  163.862 testfunc.py:12(app1)
      1 32.256 32.256  217.485  217.485 testfunc.py:17(app2)
      1 0.001 0.001  381.348  381.348 {execfile}
      1 0.000 0.000  381.347  381.347 testfunc.py:1()
      1 0.000 0.000  381.348  381.348 :1()
      1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}

1.回到huffman解码的例子，就需要一次读多个bytes，但是处理时候还是1个一个处理。直接把代码写在循环内部，不用函数调用，
这样速度最快读一遍24M文本，计算出每个字符频率。需要8-11s

1  self.infile.seek(0)
2       while True:
3          buf = self.infile.read(1000)
4          if buf == '':
5                break;
6          for c in buf:
7                if c in self.dict:
8                   self.dict[c] += 1
9                else:
10                   self.dict[c] = 0
这样类似与c++把函数inlie，但是python的动态特性使它不支持inline。可是这样处理如果以后在读文本，还要copy同样的code。不太好，
理想的情况我们的应用代码应该只是考虑每次读一个bytes，缓冲区的事情由其他地方写好的模块自动处理。
这样可读性维护性都比较好，直观，但是效率不能保证了，因为加入了函数调用。如果特别需要效率只能权衡了。
2.下面给出一个用iterator的实现，一个CharBufReader类，封装了buf，对外提供一次读取一个byte的接口(内部实现从buf读取，buf读完再fill buf）。
这样代码好复用。
因为提供next函数，所以可以用iterator访问。但是效率上很慢，和以前不优化，用file.read(1)差不多90s左右的时间。可以看出就是主要是因为
函数调用造成了原来程序速度慢。而不是因为不用自己写的缓冲读文件时间长。

1 class CharBufReader(object):
2    def __init__(self, mfile, bufSize = 1000):
3       self.mfile = mfile
4       #self.bufSize = 64 * 1024 #64k buf size
5       self.capacity = bufSize
6       self.buf = ''  #buf of char
7       self.cur = len(self.buf)
8       self.size = len(self.buf)
9
10    def __iter__(self):
11       return self
12
13    def next(self):
14       if self.cur == self.size:
15       #if self.cur == len(self.buf):
16       #if self.cur == self.buf.__len__():
17          self.buf = self.mfile.read(self.capacity)
18          self.size = len(self.buf)
19          if self.size == 0:
20                raise StopIteration
21          self.cur = 0
22       self.cur += 1
23       return self.buf[self.cur - 1]
24
25
26 class Compressor():
27    def caculateFrequence(self):
28       """The first time of reading the input file and caculate each
29       character frequence store in self.dict
30       """
31       self.infile.seek(0)
32       reader = compressor.CharBufReader(self.infile)
33       for c in reader:
34          if c in self.dict:
35                self.dict[c] += 1
36          else:
37                self.dict[c] = 0
3.网上查了一下用generator可以避免函数调用的代价，于是试了下，generator易于实现，好用，可读性强。
但是速度嘛，还是不如第一种情况，但是比第2种情况和优化前的程序要快。大概55S。

1    def readChar(self):
2       while True:
3          buf = self.infile.read(1000)
4          if buf == '':
5                break
6          for c in buf:
7                yield c
8
9    def caculateFrequence(self):
10       """The first time of reading the input file and caculate each
11       character frequence store in self.dict
12       """
13       self.infile.seek(0)
14       reader = self.readChar()
15       for c in reader:
16          if c in self.dict:
17                self.dict[c] += 1
18          else:
19                self.dict[c] = 0
g

账号		自动登录	找回密码
密码			立即注册

大疆运维招人啦，

C++ :try 语句块和异常处理

C++的多态

Red Hat RHCE 8 (EX294) Cert Guide

Java/C++ 区别：看完这一篇，就够用！

别再用过时库了！这 13 个顶级 C++ 库才是

c++ size_t 和 int 的区别

[经验分享] python 程序的性能分析优化（huffman编码程序性能分析的一个小结论）

浏览过的版块

扫码加入运维网微信交流群