Original post; please credit the source when reposting.
While studying Python processes and threads today, I stumbled upon the thread pool module threadpool — see the official documentation.
The module is very easy to use, provided you are familiar with how a thread pool works.
When a system handles tasks, it creates and destroys objects for each request. Under a large number of concurrent tasks, traditional multithreading therefore causes massive amounts of resource creation and destruction, dragging server efficiency down. This is where a thread pool comes in: thread pool technology offers a good solution to both the overhead of thread creation/destruction and the problem of insufficient system resources.
Advantages:
(1) Control over the number of threads. By pre-creating a fixed number of worker threads and capping that number, the memory consumed by thread objects stays bounded.
(2) Lower system overhead and resource consumption. Because threads are reused across many requests, the cost of creating and destroying them is amortized over all of those requests; capping the thread count also reduces the VM's garbage-collection overhead.
(3) Faster system response. Threads already exist when a request arrives, so it can be handled immediately with no thread-creation latency, and multiple threads can process requests concurrently.
The basic building blocks of a thread pool:
(1) Pool manager. Creates and maintains the pool, resizes it as needed, and watches for thread leaks.
(2) Worker thread. A thread that executes tasks in a loop; it sits in the Wait state when there is no work and is woken when a new task arrives.
(3) Task queue. A buffer that temporarily holds pending tasks; it also serves as the monitor object for the concurrent threads.
(4) Task interface. The interface every task must implement; worker threads run tasks through it.
When the pool manager is constructed, it first initializes the task queue (a Queue); at run time, tasks are appended to the queue through an add-task method. The manager then creates and starts a number of worker threads and keeps them in a thread list; it can grow or shrink that number while running. A worker thread first locks the task queue, guaranteeing correct concurrent access; if the queue holds pending tasks, the worker takes one and releases the lock so that other threads can reach the queue. The worker then runs the task through the task interface. When the queue is empty, the worker joins the queue's waiting list and sits in the Wait state, consuming almost no CPU. As soon as a new task arrives, calling the queue object's notify method wakes one waiting worker to handle it. This cooperation saves the cost of creating and destroying threads while still processing tasks concurrently, improving system responsiveness.
In short: instead of starting a new thread for every concurrent task, you hand the tasks to a thread pool. As long as the pool has an idle thread, each task is assigned to one and executed.
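The cooperation described above can be sketched with nothing but the standard library's queue and threading modules. This is a minimal illustration of the worker-loop pattern, not the threadpool module itself; every name in it (tasks, results, worker, POISON) is made up for the example:

```python
import queue
import threading

tasks = queue.Queue()      # task queue: holds (func, args) pairs
results = queue.Queue()    # result queue: workers deposit outputs here
POISON = object()          # sentinel telling a worker to exit

def worker():
    while True:
        item = tasks.get()          # blocks (Wait state) until a task arrives
        if item is POISON:
            break                   # dismissed: leave the loop
        func, args = item
        results.put(func(*args))    # run the task, hand back the result

# Pre-create a fixed number of worker threads (the "pool").
workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

for n in range(10):                 # submit 10 tasks
    tasks.put((pow, (n, 2)))
for w in workers:                   # one sentinel per worker
    tasks.put(POISON)
for w in workers:
    w.join()

squares = sorted(results.get() for _ in range(10))
print(squares)  # → [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Because the queue is FIFO and the sentinels are enqueued after all real tasks, every task is processed before any worker exits.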
pool = ThreadPool(poolsize)
requests = makeRequests(some_callable, list_of_args, callback)
[pool.putRequest(req) for req in requests]
pool.wait()
The first line creates a thread pool holding up to poolsize threads.
The second line calls makeRequests to create the requests. some_callable is the function to run across multiple threads, list_of_args holds its arguments, and callback is an optional result callback, None by default.
The third line puts the requests into the pool's task queue.
The last line waits for all threads to finish their work before returning.
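For readers on modern Python, the same four steps map onto the standard library's concurrent.futures.ThreadPoolExecutor (available since Python 3.2). This is an illustrative equivalent, not the threadpool API; some_callable here is a stand-in function:

```python
from concurrent.futures import ThreadPoolExecutor

def some_callable(x):       # stand-in for the function you want to parallelize
    return x * x

with ThreadPoolExecutor(max_workers=3) as pool:     # create the pool
    futures = [pool.submit(some_callable, x)        # create and submit the requests
               for x in [1, 2, 3, 4]]
    results = [f.result() for f in futures]         # wait for every result

print(results)  # → [1, 4, 9, 16]
```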
Reading the source code shows that the module is actually quite simple.
import sys
import threading
import Queue
import traceback


# exceptions
class NoResultsPending(Exception):
    """All work requests have been processed."""
    pass

class NoWorkersAvailable(Exception):
    """No worker threads available to process remaining requests."""
    pass


# internal module helper functions
def _handle_thread_exception(request, exc_info):
    """Default exception handler callback function.

    This just prints the exception info via ``traceback.print_exception``.

    """
    traceback.print_exception(*exc_info)

# utility functions
def makeRequests(callable_, args_list, callback=None,  # builds several work requests; callback handles results, exc_callback handles exceptions
        exc_callback=_handle_thread_exception):
    """Create several work requests for same callable with different arguments.

    Convenience function for creating several work requests for the same
    callable where each invocation of the callable receives different values
    for its arguments.

    ``args_list`` contains the parameters for each invocation of callable.
    Each item in ``args_list`` should be either a 2-item tuple of the list of
    positional arguments and a dictionary of keyword arguments or a single,
    non-tuple argument.

    See docstring for ``WorkRequest`` for info on ``callback`` and
    ``exc_callback``.

    """
    requests = []
    for item in args_list:
        if isinstance(item, tuple):
            requests.append(
                WorkRequest(callable_, item[0], item[1], callback=callback,
                    exc_callback=exc_callback)
            )
        else:
            requests.append(
                WorkRequest(callable_, [item], None, callback=callback,
                    exc_callback=exc_callback)
            )
    return requests

# classes
class WorkerThread(threading.Thread):  # worker thread
    """Background thread connected to the requests/results queues.

    A worker thread sits in the background and picks up work requests from
    one queue and puts the results in another until it is dismissed.

    """

    def __init__(self, requests_queue, results_queue, poll_timeout=5, **kwds):
        """Set up thread in daemonic mode and start it immediately.

        ``requests_queue`` and ``results_queue`` are instances of
        ``Queue.Queue`` passed by the ``ThreadPool`` class when it creates a new
        worker thread.

        """
        threading.Thread.__init__(self, **kwds)
        self.setDaemon(1)
        self._requests_queue = requests_queue  # task queue
        self._results_queue = results_queue  # results queue
        self._poll_timeout = poll_timeout
        self._dismissed = threading.Event()
        self.start()

    def run(self):
        """Repeatedly process the job queue until told to exit."""
        while True:
            if self._dismissed.isSet():  # flag set means this thread has been dismissed
                # we are dismissed, break out of loop
                break
            # get next work request. If we don't get a new request from the
            # queue after self._poll_timeout seconds, we jump to the start of
            # the while loop again, to give the thread a chance to exit.
            try:
                request = self._requests_queue.get(True, self._poll_timeout)  # fetch a pending task; block=True, with a timeout
            except Queue.Empty:
                continue
            else:
                if self._dismissed.isSet():  # check again: we may have been dismissed while waiting for a task
                    # we are dismissed, put back request in queue and exit loop
                    self._requests_queue.put(request)  # return the task to the task queue
                    break
                try:
                    result = request.callable(*request.args, **request.kwds)
                    self._results_queue.put((request, result))
                except:
                    request.exception = True
                    self._results_queue.put((request, sys.exc_info()))

    def dismiss(self):
        """Sets a flag to tell the thread to exit when done with current job."""
        self._dismissed.set()

class WorkRequest:  # a single work request
    """A request to execute a callable for putting in the request queue later.

    See the module function ``makeRequests`` for the common case
    where you want to build several ``WorkRequest`` objects for the same
    callable but with different arguments for each call.

    """

    def __init__(self, callable_, args=None, kwds=None, requestID=None,
            callback=None, exc_callback=_handle_thread_exception):
        """Create a work request for a callable and attach callbacks.

        A work request consists of a callable to be executed by a
        worker thread, a list of positional arguments, a dictionary
        of keyword arguments.

        A ``callback`` function can be specified, that is called when the
        results of the request are picked up from the result queue. It must
        accept two anonymous arguments, the ``WorkRequest`` object and the
        results of the callable, in that order. If you want to pass additional
        information to the callback, just stick it on the request object.

        You can also give custom callback for when an exception occurs with
        the ``exc_callback`` keyword parameter. It should also accept two
        anonymous arguments, the ``WorkRequest`` and a tuple with the exception
        details as returned by ``sys.exc_info()``. The default implementation
        of this callback just prints the exception info via
        ``traceback.print_exception``. If you want no exception handler
        callback, just pass in ``None``.

        ``requestID``, if given, must be hashable since it is used by
        ``ThreadPool`` object to store the results of that work request in a
        dictionary. It defaults to the return value of ``id(self)``.

        """
        if requestID is None:
            self.requestID = id(self)  # id() returns the object's identity (its memory address in CPython)
        else:
            try:
                self.requestID = hash(requestID)  # must be hashable
            except TypeError:
                raise TypeError("requestID must be hashable.")
        self.exception = False
        self.callback = callback
        self.exc_callback = exc_callback
        self.callable = callable_
        self.args = args or []
        self.kwds = kwds or {}

    def __str__(self):
        return "<WorkRequest id=%s args=%r kwargs=%r exception=%s>" % \
            (self.requestID, self.args, self.kwds, self.exception)

class ThreadPool:  # the pool manager
    """A thread pool, distributing work requests and collecting results.

    See the module docstring for more information.

    """

    def __init__(self, num_workers, q_size=0, resq_size=0, poll_timeout=5):
        """Set up the thread pool and start num_workers worker threads.

        ``num_workers`` is the number of worker threads to start initially.

        If ``q_size > 0`` the size of the work *request queue* is limited and
        the thread pool blocks when the queue is full and it tries to put
        more work requests in it (see ``putRequest`` method), unless you also
        use a positive ``timeout`` value for ``putRequest``.

        If ``resq_size > 0`` the size of the *results queue* is limited and the
        worker threads will block when the queue is full and they try to put
        new results in it.

        .. warning:
            If you set both ``q_size`` and ``resq_size`` to ``!= 0`` there is
            the possibility of a deadlock, when the results queue is not pulled
            regularly and too many jobs are put in the work requests queue.
            To prevent this, always set ``timeout > 0`` when calling
            ``ThreadPool.putRequest()`` and catch ``Queue.Full`` exceptions.

        """
        self._requests_queue = Queue.Queue(q_size)  # task queue
        self._results_queue = Queue.Queue(resq_size)  # results queue
        self.workers = []  # active worker threads
        self.dismissedWorkers = []  # dismissed worker threads
        self.workRequests = {}  # dict mapping requestID -> request
        self.createWorkers(num_workers, poll_timeout)

    def createWorkers(self, num_workers, poll_timeout=5):
        """Add num_workers worker threads to the pool.

        ``poll_timeout`` sets the interval in seconds (int or float) for how
        often threads should check whether they are dismissed, while waiting
        for requests.

        """
        for i in range(num_workers):
            self.workers.append(WorkerThread(self._requests_queue,
                self._results_queue, poll_timeout=poll_timeout))

    def dismissWorkers(self, num_workers, do_join=False):
        """Tell num_workers worker threads to quit after their current task."""
        dismiss_list = []
        for i in range(min(num_workers, len(self.workers))):
            worker = self.workers.pop()
            worker.dismiss()
            dismiss_list.append(worker)

        if do_join:
            for worker in dismiss_list:
                worker.join()
        else:
            self.dismissedWorkers.extend(dismiss_list)

    def joinAllDismissedWorkers(self):
        """Perform Thread.join() on all worker threads that have been dismissed.
        """
        for worker in self.dismissedWorkers:
            worker.join()
        self.dismissedWorkers = []

    def putRequest(self, request, block=True, timeout=None):
        """Put work request into work queue and save its id for later."""
        assert isinstance(request, WorkRequest)
        # don't reuse old work requests
        assert not getattr(request, 'exception', None)
        self._requests_queue.put(request, block, timeout)
        self.workRequests[request.requestID] = request  # one-to-one mapping: each id maps to its request

    def poll(self, block=False):  # process finished results
        """Process any new results in the queue."""
        while True:
            # still results pending?
            if not self.workRequests:  # no requests pending
                raise NoResultsPending
            # are there still workers to process remaining requests?
            elif block and not self.workers:  # no worker threads left
                raise NoWorkersAvailable
            try:
                # get back next results
                request, result = self._results_queue.get(block=block)
                # has an exception occurred?
                if request.exception and request.exc_callback:
                    request.exc_callback(request, result)
                # hand results to callback, if any
                if request.callback and not \
                        (request.exception and request.exc_callback):
                    request.callback(request, result)
                del self.workRequests[request.requestID]
            except Queue.Empty:
                break

    def wait(self):
        """Wait for results, blocking until all have arrived."""
        while 1:
            try:
                self.poll(True)
            except NoResultsPending:
                break
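The branch in makeRequests above is the part worth noticing: a 2-item tuple in args_list is unpacked into positional and keyword arguments, while any other item becomes a single positional argument. A standalone sketch of just that normalization step (simplified — the real function wraps each pair in a WorkRequest, and normalize is a name made up here):

```python
def normalize(args_list):
    """Mimic how makeRequests interprets each item of args_list."""
    normalized = []
    for item in args_list:
        if isinstance(item, tuple):
            args, kwds = item                # ([positional, ...], {keyword: value})
            normalized.append((args, kwds))
        else:
            normalized.append(([item], {}))  # single value -> one positional arg
    return normalized

print(normalize([5, ([1, 2], {"key": "value"})]))
# → [([5], {}), ([1, 2], {'key': 'value'})]
```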
There are three classes: ThreadPool, WorkRequest, and WorkerThread.
First we build a ThreadPool instance, the pool manager, which spawns a number of worker threads according to its arguments. We then call makeRequests to create a WorkRequest for each set of arguments, and hand the requests to the pool's task queue with putRequest. A WorkerThread picks up a request, runs its callable, and puts the outcome into the results queue; if a callback exists, it is invoked on the result.
Note: each element of the results queue is a (request, result) tuple, which keeps requests and results paired one to one.
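That pairing can be sketched independently of the module: each worker tags its result with the request's id, and the collector uses the id to look the request back up, just as ThreadPool.poll does with its workRequests dictionary. All names below are illustrative:

```python
import queue
import threading

requests_q = queue.Queue()
results_q = queue.Queue()
work_requests = {}                  # requestID -> request, like ThreadPool.workRequests
collected = {}

def worker():
    while True:
        req = requests_q.get()
        if req is None:             # sentinel: exit
            break
        # tag the result with the request's id so it can be matched later
        results_q.put((req["id"], req["callable"](*req["args"])))

def submit(req_id, func, args):
    req = {"id": req_id, "callable": func, "args": args}
    work_requests[req_id] = req     # like putRequest: remember the id
    requests_q.put(req)

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    submit(i, pow, (i, 3))
requests_q.put(None)
t.join()

while work_requests:                # like poll: match results back to requests
    req_id, result = results_q.get()
    collected[req_id] = result
    del work_requests[req_id]

print(collected)  # → {0: 0, 1: 1, 2: 8}
```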
In my next post, which will be about web crawling, I will try using a thread pool to improve the crawler's fetching efficiency.