Scrapy Source Code Analysis Series - 4: The scrapy.commands Subpackage
The subpackage scrapy.commands defines the subcommands available through the scrapy command-line tool: bench, check, crawl, deploy, edit, fetch, genspider, list, parse, runspider, settings, shell, startproject, version, and view. Every subcommand module defines a class named Command that inherits from ScrapyCommand.
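To make that pattern concrete, here is a minimal sketch of such a subcommand module. The command itself (hello) and its body are hypothetical; the requires_project flag and the syntax()/short_desc()/run() hooks are the pieces a ScrapyCommand subclass typically overrides, and the import path scrapy.command matches the 0.x sources discussed here:

    from scrapy.command import ScrapyCommand  # moved to scrapy.commands in later versions
    from scrapy.exceptions import UsageError

    class Command(ScrapyCommand):

        requires_project = False  # if True, the command is hidden outside a project

        def syntax(self):
            return "<name>"

        def short_desc(self):
            return "Print a greeting (hypothetical example command)"

        def run(self, args, opts):
            if len(args) != 1:
                raise UsageError()
            print("Hello, %s" % args[0])

Dropped into a module such as hello.py, this class would be picked up by the loading machinery described in section 3 below.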
Let's look first at the crawl subcommand, which is used to start a spider.
1. crawl.py
The method of interest is run(self, args, opts):
def run(self, args, opts):
    if len(args) < 1:
        raise UsageError()
    elif len(args) > 1:
        raise UsageError("running 'scrapy crawl' with more than one spider is no longer supported")
    spname = args[0]

    crawler = self.crawler_process.create_crawler()          # A
    spider = crawler.spiders.create(spname, **opts.spargs)   # B
    crawler.crawl(spider)                                    # C
    self.crawler_process.start()                             # D
So where does this run() method get called from? Refer back to Python.Scrapy.11-scrapy-source-code-analysis-part-1, section "1.2 cmdline.py command.py", and its discussion of _run_print_help().
A: Create a Crawler object named crawler. While the Crawler object is being constructed, its instance attribute spiders (a SpiderManager) is created as well, as shown below:
class Crawler(object):

    def __init__(self, settings):
        self.configured = False
        self.settings = settings
        self.signals = SignalManager(self)
        self.stats = load_object(settings['STATS_CLASS'])(self)
        self._start_requests = lambda: ()
        self._spider = None
        # TODO: move SpiderManager to CrawlerProcess
        spman_cls = load_object(self.settings['SPIDER_MANAGER_CLASS'])
        self.spiders = spman_cls.from_crawler(self)  # self.spiders is a SpiderManager
A Crawler object holds a single SpiderManager object, and that SpiderManager manages multiple spiders.
B: Obtain the Spider object.
C: Install the Crawler on the Spider object, i.e. hook the spider up to its crawler (B and C are sketched below).
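Steps B and C are thin wrappers, and it is worth seeing what they amount to. The sketches below are reconstructions pieced together from the attributes initialized in __init__ above, not verbatim source:

    # B (sketch): SpiderManager.create looks up the spider class
    # registered under the given name and instantiates it with the
    # -a key=value arguments collected in opts.spargs.
    def create(self, spider_name, **spider_kwargs):
        try:
            spcls = self._spiders[spider_name]
        except KeyError:
            raise KeyError('Spider not found: %s' % spider_name)
        return spcls(**spider_kwargs)

    # C (sketch): Crawler.crawl points the crawler at the spider and
    # its start requests (self._spider and self._start_requests were
    # initialized in __init__); Crawler.start() consumes both later.
    def crawl(self, spider, requests=None):
        spider.set_crawler(self)
        if requests is None:
            self._start_requests = spider.start_requests
        else:
            self._start_requests = lambda: requests
        self._spider = spider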
D: The start() method of class CrawlerProcess is as follows:
def start(self):
    if self.start_crawling():
        self.start_reactor()

def start_crawling(self):
    log.scrapy_info(self.settings)
    return self._start_crawler() is not None

def start_reactor(self):
    if self.settings.getbool('DNSCACHE_ENABLED'):
        reactor.installResolver(CachingThreadedResolver(reactor))
    reactor.addSystemEventTrigger('before', 'shutdown', self.stop)
    reactor.run(installSignalHandlers=False)  # blocking call

def _start_crawler(self):
    if not self.crawlers or self.stopping:
        return

    name, crawler = self.crawlers.popitem()
    self._active_crawler = crawler
    sflo = log.start_from_crawler(crawler)
    crawler.configure()
    crawler.install()
    crawler.signals.connect(crawler.uninstall, signals.engine_stopped)
    if sflo:
        crawler.signals.connect(sflo.stop, signals.engine_stopped)
    crawler.signals.connect(self._check_done, signals.engine_stopped)
    crawler.start()  # calls Crawler.start()
    return name, crawler
The start() method of class Crawler is as follows:
@defer.inlineCallbacks
def start(self):
    yield defer.maybeDeferred(self.configure)
    if self._spider:
        yield self.engine.open_spider(self._spider, self._start_requests())  # hands off to the ExecutionEngine
    yield defer.maybeDeferred(self.engine.start)
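If the yield chain looks unfamiliar: start() is a Twisted inlineCallbacks generator, and defer.maybeDeferred() wraps a call that may be synchronous in a Deferred so it can be yielded. Here is a minimal, self-contained illustration of the same two primitives (the configure() function is invented for the demo):

    from twisted.internet import defer, reactor

    def configure():
        # A plain synchronous function; maybeDeferred wraps its return
        # value in an already-fired Deferred.
        return 'configured'

    @defer.inlineCallbacks
    def start():
        # Each yield suspends the generator until the Deferred fires.
        result = yield defer.maybeDeferred(configure)
        print('got: %s' % result)
        reactor.stop()

    reactor.callWhenRunning(start)
    reactor.run()  # blocking call, just like start_reactor() above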
The class ExecutionEngine will be covered in the analysis of the scrapy.core subpackage.
2. startproject.py
3. How subcommands are loaded
The execute() method in cmdline.py contains the following lines:
inproject = inside_project()
cmds = _get_commands_dict(settings, inproject)
cmdname = _pop_command_name(argv)
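_pop_command_name() pulls the first positional (non-option) token out of argv so that the remaining arguments can go to the command's own option parser. A hedged re-implementation (the real function's bookkeeping may differ):

    def _pop_command_name(argv):
        # Return the first argument that is not an option flag,
        # removing it from argv in place.
        for i, arg in enumerate(argv[1:], start=1):
            if not arg.startswith('-'):
                return argv.pop(i)

For argv = ['scrapy', 'crawl', 'myspider'] this returns 'crawl', leaving the spider name behind for the crawl command's parser.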
_get_commands_dict():
def _get_commands_dict(settings, inproject):
    cmds = _get_commands_from_module('scrapy.commands', inproject)
    cmds.update(_get_commands_from_entry_points(inproject))
    cmds_module = settings['COMMANDS_MODULE']
    if cmds_module:
        cmds.update(_get_commands_from_module(cmds_module, inproject))
    return cmds
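Three sources get merged here: the built-in scrapy.commands package, setuptools entry points, and a project module named by the COMMANDS_MODULE setting. The last one is the supported way to ship custom commands with a project; for example (myproject.commands is a placeholder path):

    # settings.py of a Scrapy project
    COMMANDS_MODULE = 'myproject.commands'

    # A module like myproject/commands/hello.py defining a Command
    # class (see the ScrapyCommand sketch at the top of this article)
    # then makes 'scrapy hello' available inside the project.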
_get_commands_from_module():
def _get_commands_from_module(module, inproject):
    d = {}
    for cmd in _iter_command_classes(module):
        if inproject or not cmd.requires_project:
            cmdname = cmd.__module__.split('.')[-1]
            d[cmdname] = cmd()
    return d
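The command name is derived from the module name, so crawl.py registers the crawl command, and commands with requires_project = True are skipped outside a project. _iter_command_classes() walks the package and yields every ScrapyCommand subclass defined in it; a hedged sketch of that helper:

    import inspect

    from scrapy.command import ScrapyCommand   # scrapy.commands in later versions
    from scrapy.utils.misc import walk_modules

    def _iter_command_classes(module_name):
        # Walk every module under the package and yield the ScrapyCommand
        # subclasses that are defined (not merely imported) there.
        for module in walk_modules(module_name):
            for obj in vars(module).values():
                if (inspect.isclass(obj)
                        and issubclass(obj, ScrapyCommand)
                        and obj.__module__ == module.__name__):
                    yield obj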
To Be Continued
Next we will look at the settings-related logic: Python.Scrapy.15-scrapy-source-code-analysis-part-5