%util
至少有一个活跃请求所占时间的百分比。更好的说法应该是,服务时间所占的百分比。以上面的输出为例。 一秒中内,读了2.5次,写了1.8次,每次请求的实际请求时间(不包括排队时间)为6.0ms,那么总的时间花费为(2.5+1.8)*6.0ms,即25.8ms,0.0258秒,转换成百分比再四舍五入就得到了util的值2.6%。
%util: When this figure is consistently approaching above 80% you will need to take any of the following actions -
increasing RAM so dependence on disk reduces
increasing RAID controller cache so disk dependence decreases
increasing number of disks so disk throughput increases (more spindles working parallely)
(await-svctim)/await*100: The percentage of time that IO operations spent waiting in queue in comparison to actually being serviced. If this figure goes above 50% then each IO request is spending more time waiting in queue than being processed. If this ratio skews heavily upwards (in the >75% range) you know that your disk subsystem is not being able to keep up with the IO requests and most IO requests are spending a lot of time waiting in queue. In this scenario you will again need to take any of the actions above
%iowait: This number shows the % of time the CPU is wasting in waiting for IO. A part of this number can result from network IO, which can be avoided by using an Async IO library. The rest of it is simply an indication of how IO-bound your application is. You can reduce this number by ensuring that disk IO operations take less time, more data is available in RAM, increasing disk throughput by increasing number of disks in a RAID array, using SSD (Check my post on Solid State drives vs Hard Drives) for portions of the data or all of the data etc
uptime
uptime是最最最最简单的工具了,但是也有值得讨论的地方,那就是它最后输出的三个数字是怎么得来的,有什么含义?
这三个数字的含义分别是1分钟、5分钟、15分钟内系统的平均负荷,关于这些数字的含义和由来,推荐看阮一峰的文章《理解linux系统负荷这里》。
我想说的是,这三个数字我天天看,时时看,但是我从来不相信它。
此话怎讲呢?我之所以天天看,时时看,是因为我将这三个数字显示到了tmux 的状态栏上,所以,我任何时候都可以看到,而不用专门输入uptime这个命令。
为什么我不相信它呢,因为这几个数字只能说明有多少线程在等待cpu,如果我们的一个任务有很多线程,系统负载不是特别高,但是这几个数字会出奇的高,也就是不能完全相信uptime的原因。如果不信,可以执行下面这条语句,然后再看看uptime的输出结果。
sysbench --test=mutex --num-threads=1600--mutex-num=2048 \ --mutex-locks=1000000--mutex-loops=5000 run运行了5分钟以后,我的电脑上输出如下所示。需要强调的是,这个时候电脑一点不卡,看uptime来判断系统负载,跟听cpu风扇声音判断系统负载一样。只能作为线索,不能作为系统负载很高的依据。
20:32:39 up 10:21,4 users, load average:386.53,965.37,418.57《Linux Performance Analysis and Tools》里面也说了,This is only useful as a clue. Use other tools to investigate! lsof
lsof(list open files)是一个列出当前系统打开文件的工具。在linux环境下,任何事物都以 文件的形式存在,通过文件不仅仅可以访问常规数据,还可以访问网络连接和硬件。所以如传输 控制协议(TCP)和用户数据报协议(UDP)套接字等,系统在后台都为该应用程序分配了一个文件描 述符,无论这个文件的本质如何,该文件描述符为应用程序与基础操作系统之间的交互提供了 通用接口。因为应用程序打开文件的描述符列表提供了大量关于这个应用程序本身的信息,因此 通过lsof工具能够查看这个列表对系统监测以及排错将是很有帮助的。
lsof 的使用方法可以参考这里。这里仅列出几种常见的用法。