tomcat 性能之谜

wendu 发表于 2013-3-29 09:01:32

从接触java web开发，并用tomcat部署了第一个jsp应用已经有好些年了，一直以来java web应用的部署都是依赖类似tomcat这种符合servlet规范的容器。应用部署在容器中运行，过往的经验感觉应用的表现在各种容器中其实差不太多，倒也没太在意容器本身的性能。

在最近的一个项目中，严重依赖了tomcat的comet机制，并针对这个项目做了比较全面的性能测试，感觉整体性能表现并没有达到预期。开始猜想tomcat comet机制的实现不是存在一些性能问题，于是开始对tomcat容器本身的性能进行一个摸底测试，做一些定量的分析。

开始之前，我们先进行一个猜测，形成一个预期值，并根据实测结果来检验与预期值的差异。过去很多tomcat应用项目都做过针对应用本身的性能测试，测试结果tps根据应用类型的不同在几百到一千左右，所以一直以来形成一个感觉就是tomcat的tps大概也就几千。由于这次是测试tomcat容器本身，所以我们排除掉应用的执行延迟，只使用一个最简单的应用类型：echo 响应。tomcat 的版本选择了线上常用的 6.0.33（bio模型）和 7.0.35（comet nio 模型）。
测试环境如下：
客户端-服务端： model name: Intel(R) Core(TM) i5-2320 CPU @ 3.00GHz    cache size: 6144 KB    cpu cores:4    jdk:    1.6.0_30-b12 network: 1000Mb memory: -Xms256m -Xmx256m Linux:    centos 5.7, kernel 2.6.18-274.el5
测试工具： apache benchmark
版本： tomcat       6.0.33 tomcat       7.0.35

首先，实现了一个EchoServlet，它返回固定1k左右的字符串, 并且关闭了http的keep-alive机制，tomcat connector 采用默认的bio模型。测试结果还是挺让人吃惊的，tps均值达到了22000左右（如下所示）
tomcat 6.0.33 bio servlet
This is ApacheBench, Version 2.3 <$Revision: 655654 $>Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.28.170.56 (be patient)Completed 100000 requestsCompleted 200000 requestsCompleted 300000 requestsCompleted 400000 requestsCompleted 500000 requestsCompleted 600000 requestsCompleted 700000 requestsCompleted 800000 requestsCompleted 900000 requestsCompleted 1000000 requestsFinished 1000000 requests

Server Software:    Apache-Coyote/1.1Server Hostname:    10.28.170.56Server Port:          8118
Document Path:       /tomcat-test/echo?t=1111111111111111111111******（填充至1k左右）Document Length:    1024 bytes
Concurrency Level:    32Time taken for tests: 45.096 secondsComplete requests:    1000000Failed requests:    0Write errors:       0Total transferred:    1206002412 bytesHTML transferred:    1024002048 bytesRequests per second: 22175.13 [#/sec] (mean)Time per request:    1.443 (mean)Time per request:    0.045 (mean, across all concurrent requests)Transfer rate:       26116.47 received
Connection Times (ms)          minmean[+/-sd] median maxConnect:    0 0 0.2    0    3Processing: 0 1 0.8    1    30Waiting:    0 1 0.8    1    29Total:       0 1 0.8    1    30
Percentage of the requests served within a certain time (ms)50%    166%    275%    280%    290%    295%    298%    299%    2 100% 30 (longest request)
Cpu usage: 180%

注：测试过程中产生过2次吞吐抖动，rps降至16000左右，随后多次重复测试又恢复正常

接着，我们换用 tomcat comet 实现EchoServlet，connector采用了nio模型，使用了tomcat 7.0.35版本（6.x版本的NIO模型很不稳定，bug较多）测试结果tps略微低于bio模型，但nio模型在测试过程中，多次出现异常导致ab测试直接中断。
tomcat 7.0.35 nio servlet
This is ApacheBench, Version 2.3 <$Revision: 655654 $>Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.28.170.56 (be patient)Completed 100000 requestsCompleted 200000 requestsCompleted 300000 requestsCompleted 400000 requestsCompleted 500000 requestsCompleted 600000 requestsCompleted 700000 requestsCompleted 800000 requestsCompleted 900000 requestsCompleted 1000000 requestsFinished 1000000 requests

Server Software:    Apache-Coyote/1.1Server Hostname:    10.28.170.56Server Port:          8118
Document Path:       /craft-cell-test/echo?t=1111111111111111111111******（填充至1k左右）Document Length:    1024 bytes
Concurrency Level:    32Time taken for tests: 49.072 secondsComplete requests:    1000000Failed requests:    0Write errors:       0Total transferred:    1206000000 bytesHTML transferred:    1024000000 bytesRequests per second: 20378.37 [#/sec] (mean)Time per request:    1.570 (mean)Time per request:    0.049 (mean, across all concurrent requests)Transfer rate:       24000.31 received
Connection Times (ms)          minmean[+/-sd] median maxConnect:    0 0 0.2    0    2Processing: 0 1 3.0    1 359Waiting:    0 1 1.3    1    36Total:       0 2 3.0    1 359
Percentage of the requests served within a certain time (ms)50%    166%    275%    280%    290%    295%    298%    399%    3 100% 359 (longest request)
Cpu usage: 200%
注：测试过程中经常出现连接被意外重置导致 ab 中断(apr_socket_recv: Connection reset by peer (104))，另有一次出现5个请求失败，可以推测 tomcat nio connector 的实现还存在潜在 bug

从上面的摸底测试中，我们发现tomcat bio模型表现相对稳定，nio模型还存在潜在bug经常使测试中断或出错。并且bio模型其实针对echo型的应用性能表现更佳，nio模型更适合需要保持大量连接的场景。而且tomcat的性能表现也远远超过我们之前的预期，那么为什么我们的应用部署到tomcat中后，在性能测试中表现不佳，tps经常只有几百不足一千？是我们应用写的太差导致的？首先让想到的是，我们现在开发java web应用通常采用一些框架，例如ssh或ssi，会不会跟框架有关？在性能测试echo场景下，用不到spring、hibernate或ibatis之类，于是仅把struts引入，实现一个EchoAction来测测看呢。
tomcat 7.0.35 nio struts2 dynamic method invocation
This is ApacheBench, Version 2.3 <$Revision: 655654 $>Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.28.170.56 (be patient)Completed 10000 requestsCompleted 20000 requestsCompleted 30000 requestsCompleted 40000 requestsCompleted 50000 requestsCompleted 60000 requestsCompleted 70000 requestsCompleted 80000 requestsCompleted 90000 requestsCompleted 100000 requestsFinished 100000 requests

Server Software:    Apache-Coyote/1.1Server Hostname:    10.28.170.56Server Port:          8118
Document Path:       /craft-cell-test/echo?t=1111111111111111111111******（填充至1k左右）Document Length:    1024 bytes
Concurrency Level:    32Time taken for tests: 9.264 secondsComplete requests:    100000Failed requests:    0Write errors:       0Total transferred:    116501165 bytesHTML transferred:    102401024 bytesRequests per second: 10793.99 [#/sec] (mean)Time per request:    2.965 (mean)Time per request:    0.093 (mean, across all concurrent requests)Transfer rate:       12280.39 received
Connection Times (ms)          minmean[+/-sd] median maxConnect:    0 0 0.0    0    2Processing: 0 3 1.7    3    41Waiting:    0 3 1.3    2    27Total:       0 3 1.7    3    41
Percentage of the requests served within a certain time (ms)50%    366%    375%    380%    490%    495%    598%    799% 11 100% 41 (longest request)
Cpu usage: 280%

测试结果确实让人挺吃惊的，struts引入后tps下降为原来的一半，并且cpu消耗也上升了不少，而这还只是一个最简单的EchoAction。而现实中的很多web项目，基本都是采用ssh/ssi来开发的，实现业务的action基本也比这个EchoAction复杂不少。这样想来也就可以理解应用性能测试中为什么tps比这种基准性能测试低上一个数量级了。

后记：最近，负责的一个线上系统调用其他组的webservice接口，总是出现时不时超时的情况，调用接口的响应时间波动范围巨大，从数毫秒到数十秒之间，超时都是超过了30秒的情况。服务提供方的webservice接口基于cxf实现，和他们的应用一同部署在tomcat中，接口的逻辑很简单，数据传输量也很小（几百字节）那为什么调用响应时间波动如此巨大？其实这种现象的发生和我们目前的应用开发部署模式有很大关系，一个有一定复杂度的业务应用系统，一般我们现在采用B/S架构，基于ssh框架开发，部署到一个tomcat实例中运行。这样做的好处是开发简单，部署也简单，但带来的问题是同一个业务应用中，不同的业务场景处理消耗的资源差异巨大，服务没法隔离，服务质量不能得到保障。比如，一个系统提供了一个查询页面，查询条件由用户指定，查询过程的处理需要访问数据库，那么这个业务的处理耗时就相当不确定，可能很快也可能很慢，由具体查询涉及的数据集大小决定。同时该系统也提供一个快速k-v式查询服务接口供其他外系统调用，如果此时有很多人在通过查询页面查询一个大数据集，可能很快就消耗完tomcat的线程池，这时外系统再调用k-v快速查询服务接口也不得不等待慢查询的结束才能获取到结果。

目前，我们大多数基于tomcat容器的业务系统都没去做到基于业务来隔离线程池，这也是为什么应用系统在综合业务场景下的性能测试表现不佳。线上一些应用服务接口响应时间的抖动如此巨大，也是与此密切相关。看来，一开始我们对tomcat容器性能不佳的猜测经实测数据验证是错误的，tomcat本身性能并无问题，有问题的是我们采用的应用开发模式，甚至包括我们选择的一些应用开发框架。

小时？ 发表于 2013-3-29 09:34:53

我喜欢孩子，更喜欢造孩子的过程!

cxwpf200 发表于 2013-5-16 18:53:16

所有的男人生来平等，结婚的除外。

帝王发表于 2013-5-18 00:41:19

长大了娶唐僧做老公，能玩就玩一玩，不能玩就把他吃掉。

qq78707 发表于 2013-5-19 04:11:31

睡眠是一门艺术——谁也无法阻挡我追求艺术的脚步！

hncys 发表于 2013-5-20 13:59:23

真是收益匪浅

284354749 发表于 2013-5-21 23:52:12

你的丑和你的脸没有关系。。。。。。

页: [1]

运维网's Archiver

tomcat 性能之谜