[Experience sharing] Postfix queue health analysis (Part 1)

Posted on 2015-11-24 09:47:41
Postfix Bottleneck Analysis

Introducing the qshape tool

  When mail is draining slowly or the queue is unexpectedly large, run qshape(1) as the super-user (root) to help zero in on the problem. The qshape(1) program displays a tabular view of the Postfix queue contents.

  •   On the horizontal axis, it displays the queue age with fine granularity for recent messages and (geometrically) less fine granularity for older messages.
  •   The vertical axis displays the destination (or, with the "-s" switch, the sender) domain. Domains with the most messages are listed first.
  For example, in the output below we see the top 10 lines of the (mostly forged) sender domain distribution for captured spam in the "hold" queue:
$ qshape -s hold | head
                         T  5 10 20 40 80 160 320 640 1280 1280+
                 TOTAL 486  0  0  1  0  0   2   4  20   40   419
             yahoo.com  14  0  0  1  0  0   0   0   1    0    12
  extremepricecuts.net  13  0  0  0  0  0   0   0   2    0    11
        ms35.hinet.net  12  0  0  0  0  0   0   0   0    1    11
      winnersdaily.net  12  0  0  0  0  0   0   0   2    0    10
           hotmail.com  11  0  0  0  0  0   0   0   0    1    10
           worldnet.fr   6  0  0  0  0  0   0   0   0    0     6
        ms41.hinet.net   6  0  0  0  0  0   0   0   0    0     6
                osn.de   5  0  0  0  0  0   1   0   0    0     4

  •   The "T" column shows the total (in this case sender) count for each domain. The columns with numbers above them show counts for messages aged fewer than that many minutes, but not younger than the age limit for the previous column. The row labeled "TOTAL" shows the total count for all domains.
  •   In this example, there are 14 messages allegedly from yahoo.com: 1 between 10 and 20 minutes old, 1 between 320 and 640 minutes old, and 12 older than 1280 minutes (a day has 1440 minutes).
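  To make the column semantics concrete, here is a small Python sketch that reproduces the default bucket layout shown above (smallest limit 5 minutes, each following limit doubled, last bucket open-ended) and assigns a message age to its column. The `first` and `count` parameters are illustrative stand-ins for the qshape options that set the smallest bucket's age limit and the number of buckets (see `qshape -h`); this is not qshape's own code.

```python
def bucket_limits(first: int = 5, count: int = 10) -> list[int]:
    """Upper age limits (minutes) of the numbered columns.

    With `count` buckets in total, count - 1 are numbered (each limit
    doubling the previous one) and the last is the open-ended "+" bucket.
    """
    if count < 2:
        raise ValueError("need at least one numbered bucket plus the '+' bucket")
    limits = []
    limit = first
    for _ in range(count - 1):
        limits.append(limit)
        limit *= 2
    return limits


def bucket_label(age_minutes: float, first: int = 5, count: int = 10) -> str:
    """Label of the column that counts a message of the given age.

    A numbered column counts ages below its limit but not below the
    previous column's limit; anything older goes to the "+" column.
    """
    limits = bucket_limits(first, count)
    for limit in limits:
        if age_minutes < limit:
            return str(limit)
    return f"{limits[-1]}+"


if __name__ == "__main__":
    print(bucket_limits())  # [5, 10, 20, 40, 80, 160, 320, 640, 1280]
    # Ages consistent with the yahoo.com row described above.
    for age in (15, 400, 2000):
        print(age, "->", bucket_label(age))
```

  Ages of 15, 400 and 2000 minutes land in the 20, 640 and 1280+ columns respectively, matching the yahoo.com reading.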
  When the output is a terminal, intermediate results showing the top 20 domains (-n option) are displayed after every 1000 messages (-N option), and the final output also shows only the top 20 domains. This makes qshape useful even when the deferred queue is very large and it would otherwise take prohibitively long to read the entire deferred queue.
  By default, qshape shows statistics for the union of both the incoming and active queues, which are the most relevant queues to look at when analyzing performance.
  One can request an alternate list of queues:
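  The -n/-N behaviour amounts to printing a running top-N tally every N messages instead of waiting for a full scan. A minimal Python sketch of that idea, assuming a plain stream of domain names as input (qshape itself reads the queue files, which this sketch does not attempt):

```python
from collections import Counter
from typing import Iterable, Iterator


def top_n_snapshots(domains: Iterable[str], n: int = 20,
                    every: int = 1000) -> Iterator[list[tuple[str, int]]]:
    """Yield the top-n (domain, count) pairs after every `every` messages,
    and once more when the input is exhausted (the final result)."""
    counts = Counter()
    seen = 0
    for domain in domains:
        counts[domain] += 1
        seen += 1
        if seen % every == 0:
            yield counts.most_common(n)  # intermediate result
    yield counts.most_common(n)  # final result (may repeat the last snapshot)


if __name__ == "__main__":
    # Hypothetical stream of 2500 destination domains.
    stream = ["example.com"] * 1800 + ["example.org"] * 700
    for snapshot in top_n_snapshots(stream, n=2, every=1000):
        print(snapshot)
```

  Because only the counter is kept, memory grows with the number of distinct domains rather than the number of messages, which is what makes early partial output cheap.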
$ qshape deferred
$ qshape incoming active deferred

  These show, respectively, the age distribution of the deferred queue, or of the union of the incoming, active and deferred queues.
  Command-line options control the number of display "buckets", the age limit for the smallest bucket, display of parent domain counts, and so on. The "-h" option outputs a summary of the available switches.
Troubleshooting with qshape

  Large numbers in the qshape output represent a large number of messages that are destined to (or allegedly come from) a particular domain. It should be possible to tell at a glance which domains dominate the queue sender or recipient counts, approximately when a burst of mail started, and when it stopped.
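  Reading "which domains dominate" off the table can also be automated. The following sketch assumes the whitespace-separated layout shown in this post (a header row starting with "T", then one row per domain) and lists domains whose total reaches a chosen share of the TOTAL row. The 10% threshold is an arbitrary, hypothetical cut-off; the sample is the table from Example 3 further down.

```python
def parse_qshape(text: str) -> tuple[list[str], dict[str, list[int]]]:
    """Parse qshape(1) table text into (column labels, rows).

    Assumes whitespace-separated columns: a header line whose first field
    is "T", then one line per domain (including "TOTAL") with one integer
    per header column. Command lines ("$ ...") and "..." are skipped.
    """
    labels: list[str] = []
    rows: dict[str, list[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "$":
            continue
        if fields[0] == "T":
            labels = fields
        elif labels and len(fields) == len(labels) + 1:
            rows[fields[0]] = [int(field) for field in fields[1:]]
    return labels, rows


def dominant_domains(rows: dict[str, list[int]],
                     share: float = 0.1) -> list[tuple[str, int]]:
    """Domains whose "T" count is at least `share` of the TOTAL row,
    largest first. The default share is an arbitrary illustration."""
    total = rows["TOTAL"][0]
    hits = [(domain, counts[0]) for domain, counts in rows.items()
            if domain != "TOTAL" and counts[0] >= share * total]
    return sorted(hits, key=lambda item: item[1], reverse=True)


SAMPLE = """\
$ qshape        (show incoming and active queue status)
                           T    A  5  10  20  40  80 160 320 320+
                 TOTAL 11775 9996  0   0   1   1  42  94 221 1420
  user.sourceforge.net  7678 7678  0   0   0   0   0   0   0    0
 lists.sourceforge.net  2313 2313  0   0   0   0   0   0   0    0
        gzd.gotdns.com   102    0  0   0   0   0   0   0   2  100
"""

if __name__ == "__main__":
    labels, rows = parse_qshape(SAMPLE)
    print(labels)
    print(dominant_domains(rows))
```

  On this sample the two sourceforge.net destinations are reported and gzd.gotdns.com is not, which is the "just a few destinations" conclusion drawn in Example 3.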
  The problem destinations or sender domains appear near the top left corner of the output table. Remember that the active queue can accommodate up to 20000 ($qmgr_message_active_limit) messages. To check whether this limit has been reached, use:
$ qshape -s active       (show sender statistics)

  If the total sender count is below 20000, the active queue is not yet saturated; any high-volume sender domains show near the top of the output.
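  If you monitor this from a script, the check is simply the TOTAL row's "T" value compared against $qmgr_message_active_limit. A minimal sketch, assuming the 20000 figure quoted above (verify your own value with `postconf qmgr_message_active_limit`); the `run_qshape` helper is hypothetical and needs root and qshape(1) in PATH.

```python
import subprocess


def run_qshape(*args: str) -> str:
    """Hypothetical helper: run qshape(1) and return its output.
    Needs root privileges and qshape in PATH; not called in the demo."""
    result = subprocess.run(["qshape", *args], capture_output=True,
                            text=True, check=True)
    return result.stdout


def active_queue_total(qshape_text: str) -> int:
    """Return the "T" value of the TOTAL row."""
    for line in qshape_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "TOTAL":
            return int(fields[1])
    raise ValueError("no TOTAL row in qshape output")


def active_queue_saturated(qshape_text: str, limit: int = 20000) -> bool:
    """True once the message count reaches `limit`.

    `limit` should equal $qmgr_message_active_limit on your host; 20000
    is the figure quoted in the text.
    """
    return active_queue_total(qshape_text) >= limit


if __name__ == "__main__":
    # Small active-queue table copied from Example 1.
    sample = """\
                 T  5 10 20 40 80 160 320 640 1280 1280+
          TOTAL  5  0  0  0  1  0   0   0   1    1     2
  meri.uwasa.fi  5  0  0  0  1  0   0   0   1    1     2
"""
    print(active_queue_saturated(sample))
    # On a live system, as root:
    #   active_queue_saturated(run_qshape("-s", "active"))
```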
  With oqmgr(8), the active queue is also limited to at most 20000 recipient addresses ($qmgr_message_recipient_limit). To check for exhaustion of this limit, use:
$ qshape active          (show recipient statistics)

  Having found the high-volume domains, it is often useful to search the logs for recent messages pertaining to the domains in question.
# Find deliveries to example.com
#
$ tail -10000 /var/log/maillog |
        egrep -i ': to=<.*@example\.com>,' |
        less

# Find messages from example.com
#
$ tail -10000 /var/log/maillog |
        egrep -i ': from=<.*@example\.com>,' |
        less

  You may want to drill in on some specific queue ids:
# Find all messages for a specific queue id.
#
$ tail -10000 /var/log/maillog | egrep ': 2B2173FF68: '

  Also look for queue manager warning messages in the log. These warnings can suggest strategies to reduce congestion.
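  For ad-hoc analysis, the same three searches can be done in Python. This sketch mirrors the egrep patterns above (case-insensitive for to=/from=, a plain substring for the queue id) over the last 10000 lines. The `read_tail`, `matching` and `for_queue_id` names are illustrative, and the log path is the one used in the commands above; adjust it for your system.

```python
import os
import re
from collections import deque
from typing import Iterable


def read_tail(path: str, n: int = 10000) -> list[str]:
    """Last n lines of a file, like `tail -n`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(deque(handle, maxlen=n))


def matching(lines: Iterable[str], field: str, domain: str) -> list[str]:
    """Lines containing ': to=<...@domain>,' or ': from=<...@domain>,'
    (case-insensitive, like egrep -i)."""
    pattern = re.compile(
        rf": {re.escape(field)}=<.*@{re.escape(domain)}>,", re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]


def for_queue_id(lines: Iterable[str], queue_id: str) -> list[str]:
    """Lines mentioning the queue id, like egrep ': 2B2173FF68: '."""
    needle = f": {queue_id}: "
    return [line for line in lines if needle in line]


if __name__ == "__main__":
    log = "/var/log/maillog"
    if os.path.exists(log):
        recent = read_tail(log)
        for line in matching(recent, "to", "example.com"):
            print(line, end="")
```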
$ egrep 'qmgr.*(panic|fatal|error|warning):' /var/log/maillog

  When all else fails, try the Postfix mailing list for help, but please don't forget to include the top 10 or 20 lines of qshape(1) output.
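  The queue manager search can likewise be turned into a per-severity summary, which is handy when the log is large. This sketch uses the same regular expression as the egrep command above, made non-greedy so that the captured word is the first severity keyword on the line; the set of matching lines is unchanged. The function names are illustrative.

```python
import os
import re
from collections import Counter
from typing import Iterable

# Same pattern as the egrep command above; ".*?" only changes which
# keyword is captured, not which lines match.
QMGR_PROBLEM = re.compile(r"qmgr.*?(panic|fatal|error|warning):")


def qmgr_problems(lines: Iterable[str]) -> list[str]:
    """Lines that egrep 'qmgr.*(panic|fatal|error|warning):' would print."""
    return [line for line in lines if QMGR_PROBLEM.search(line)]


def severity_counts(lines: Iterable[str]) -> Counter:
    """Count queue manager problem lines by severity keyword."""
    counts = Counter()
    for line in lines:
        match = QMGR_PROBLEM.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    log = "/var/log/maillog"
    if os.path.exists(log):
        with open(log, encoding="utf-8", errors="replace") as handle:
            print(severity_counts(handle).most_common())
```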
Example 1: Healthy queue

  When looking at just the incoming and active queues, under normal conditions (no congestion) the incoming and active queues are nearly empty. Mail leaves the system almost as quickly as it comes in, or is deferred without congestion in the active queue.
$ qshape        (show incoming and active queue status)
                 T  5 10 20 40 80 160 320 640 1280 1280+
          TOTAL  5  0  0  0  1  0   0   0   1    1     2
  meri.uwasa.fi  5  0  0  0  1  0   0   0   1    1     2

  If one looks at the two queues separately, the incoming queue is empty or perhaps briefly has one or two messages, while the active queue holds more messages and for a somewhat longer time:
$ qshape incoming
                 T  5 10 20 40 80 160 320 640 1280 1280+
          TOTAL  0  0  0  0  0  0   0   0   0    0     0

$ qshape active
                 T  5 10 20 40 80 160 320 640 1280 1280+
          TOTAL  5  0  0  0  1  0   0   0   1    1     2
  meri.uwasa.fi  5  0  0  0  1  0   0   0   1    1     2

Example 2: Deferred queue full of dictionary attack bounces

  This is from a server where recipient validation is not yet available for some of the hosted domains. Dictionary attacks on the unvalidated domains result in bounce backscatter. The bounces dominate the queue, but with proper tuning they do not saturate the incoming or active queues. The high volume of deferred mail is not a direct cause for alarm.
$ qshape deferred | head
                         T  5 10 20 40 80 160 320 640 1280 1280+
                TOTAL 2234  4  2  5  9 31  57 108 201  464  1353
  heyhihellothere.com  207  0  0  1  1  6   6   8  25   68    92
  pleazerzoneprod.com  105  0  0  0  0  0   0   0   5   44    56
       groups.msn.com   63  2  1  2  4  4  14  14  14    8     0
    orion.toppoint.de   49  0  0  0  1  0   2   4   3   16    23
          kali.com.cn   46  0  0  0  0  1   0   2   6   12    25
        meri.uwasa.fi   44  0  0  0  0  1   0   2   8   11    22
    gjr.paknet.com.pk   43  1  0  0  1  1   3   3   6   12    16
 aristotle.algonet.se   41  0  0  0  0  0   1   2  11   12    15

  The domains shown are mostly bulk-mailers, and all the volume is in the tail end of the time distribution, showing that short-term arrival rates are moderate. Larger numbers and lower message ages are more indicative of current trouble. Old mail still going nowhere is largely harmless so long as the active and incoming queues are short. We can also see that the groups.msn.com undeliverables are a low-rate steady stream rather than a concentrated dictionary attack that is now over.
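  The rule of thumb "larger numbers and lower message ages" can be quantified by counting each domain's messages in the young columns. Here is a sketch using three rows copied from the table above; the 80-minute cut-off is a hypothetical choice, not a qshape setting.

```python
LABELS = ["5", "10", "20", "40", "80", "160", "320", "640", "1280", "1280+"]

# Rows copied from the `qshape deferred | head` output above ("T" dropped).
ROWS = {
    "heyhihellothere.com": [0, 0, 1, 1, 6, 6, 8, 25, 68, 92],
    "pleazerzoneprod.com": [0, 0, 0, 0, 0, 0, 0, 5, 44, 56],
    "groups.msn.com":      [2, 1, 2, 4, 4, 14, 14, 14, 8, 0],
}


def recent_count(counts: list[int], labels: list[str], max_age: int) -> int:
    """Messages in numbered columns whose limit is <= max_age minutes,
    i.e. messages younger than max_age when max_age is a column limit."""
    total = 0
    for label, count in zip(labels, counts):
        if label.endswith("+"):
            break  # the open-ended bucket is never "recent"
        if int(label) <= max_age:
            total += count
    return total


if __name__ == "__main__":
    CUTOFF = 80  # hypothetical threshold in minutes
    for domain, counts in ROWS.items():
        print(f"{domain}: {recent_count(counts, LABELS, CUTOFF)} of "
              f"{sum(counts)} younger than {CUTOFF} minutes")
```

  With these rows, heyhihellothere.com has 8 of its 207 messages younger than 80 minutes, pleazerzoneprod.com has none, and groups.msn.com has 13 of 63, consistent with reading most of this queue as old backscatter and groups.msn.com as a slow, continuing trickle.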
$ qshape -s deferred | head
                     T  5 10 20 40 80 160 320 640 1280 1280+
            TOTAL 2193  4  4  5  8 33  56 104 205  465  1309
    MAILER-DAEMON 1709  4  4  5  8 33  55 101 198  452   849
      example.com  263  0  0  0  0  0   0   0   0    2   261
      example.org  209  0  0  0  0  0   1   3   6   11   188
      example.net    6  0  0  0  0  0   0   0   0    0     6
      example.edu    3  0  0  0  0  0   0   0   0    0     3
      example.gov    2  0  0  0  0  0   0   0   1    0     1
      example.mil    1  0  0  0  0  0   0   0   0    0     1

  Looking at the sender distribution, we see that, as expected, most of the messages are bounces.
Example 3: Congestion in the active queue

  This example is taken from a Feb 2004 discussion on the Postfix Users list. Congestion was reported with the active and incoming queues large and not shrinking despite very large delivery agent process limits. The thread is archived at: http://groups.google.com/groups?threadm=c0b7js$2r65$1@FreeBSD.csie.NCTU.edu.tw and http://archives.neohapsis.com/archives/postfix/2004-02/thread.html#1371
  Using an older version of qshape(1), it was quickly determined that all the messages were for just a few destinations:
$ qshape        (show incoming and active queue status)
                           T    A  5  10  20  40  80 160 320 320+
                 TOTAL 11775 9996  0   0   1   1  42  94 221 1420
  user.sourceforge.net  7678 7678  0   0   0   0   0   0   0    0
 lists.sourceforge.net  2313 2313  0   0   0   0   0   0   0    0
        gzd.gotdns.com   102    0  0   0   0   0   0   0   2  100

  The "A" column showed the count of messages in the active queue, and the numbered columns showed totals for the deferred queue. At 10000 messages (the Postfix 1.x active queue size limit), the active queue is full. The incoming queue was growing rapidly.
  With the trouble destinations clearly identified, the administrator quickly found and fixed the problem. It is substantially harder to glean the same information from the logs. While a careful reading of mailq(1) output should yield similar results, it is much harder to gauge the magnitude of the problem by looking at the queue one message at a time.
Example 4: High volume destination backlog

  When a site you send a lot of email to is down or slow, mail messages will rapidly build up in the deferred queue, or worse, in the active queue. The qshape output will show large numbers for the destination domain in all age buckets that overlap the starting time of the problem:
$ qshape deferred | head
                    T   5  10  20  40   80  160 320 640 1280 1280+
           TOTAL 5000 200 200 400 800 1600 1000 200 200  200   200
  highvolume.com 4000 160 160 320 640 1280 1440   0   0    0     0
             ...

  Here the "highvolume.com" destination is continuing to accumulate deferred mail. The incoming and active queues are fine, but the deferred queue started growing some time between 80 and 160 minutes ago (its oldest non-empty column is 160) and continues to grow.
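  Because each numbered column covers ages from the previous limit up to (but not including) its own, the oldest non-empty column brackets when a backlog began. A sketch applying that reasoning to the highvolume.com row above; it assumes the backlog is the oldest mail queued for that destination, and `age_window` is an illustrative name.

```python
from typing import Optional


def age_window(counts: list[int],
               labels: list[str]) -> Optional[tuple[int, Optional[int]]]:
    """(lower, upper) age in minutes of the oldest non-empty column.

    A numbered column with limit L, following a column with limit P,
    holds ages in [P, L); the "+" column is open-ended (upper is None).
    Returns None when every count is zero.
    """
    window = None
    previous = 0
    for label, count in zip(labels, counts):
        if label.endswith("+"):
            lower, upper = int(label[:-1]), None
        else:
            lower, upper = previous, int(label)
            previous = upper
        if count:
            window = (lower, upper)  # keep overwriting: last hit is oldest
    return window


if __name__ == "__main__":
    labels = ["5", "10", "20", "40", "80", "160", "320", "640", "1280", "1280+"]
    highvolume = [160, 160, 320, 640, 1280, 1440, 0, 0, 0, 0]
    print(age_window(highvolume, labels))  # (80, 160)
```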
  If the high-volume destination is not down, but is instead slow, one might see similar congestion in the active queue. Active queue congestion is a greater cause for alarm; one might need to take measures to ensure that the mail is deferred instead, or even add an access(5) rule asking the sender to try again later.
  If a high-volume destination exhibits frequent bursts of consecutive connections refused by all MX hosts, or "421 Server busy" errors, it is possible for the queue manager to mark the destination as "dead" despite the transient nature of the errors. The destination will be retried again after the expiration of a $minimal_backoff_time timer. If the error bursts are frequent enough, it may be that only a small quantity of email is delivered before the destination is again marked "dead". In some cases, enabling static (not on-demand) connection caching by listing the appropriate nexthop domain in a table included in "smtp_connection_cache_destinations" may help to reduce the error rate, because most messages will re-use existing connections.
  The MTA that has been observed most frequently to exhibit such bursts of errors is Microsoft Exchange, which refuses connections under load. Some proxy virus scanners in front of the Exchange server propagate the refused connection to the client as a "421" error.
  Note that it is now possible to configure Postfix to exhibit similarly erratic behavior by misconfiguring the anvil(8) service. Do not use anvil(8) for steady-state rate limiting; its purpose is (unintentional) DoS prevention, and the rate limits set should be very generous!
  If one finds oneself needing to deliver a high volume of mail to a destination that exhibits frequent brief bursts of errors, and connection caching does not solve the problem, there is a subtle workaround.

  •   Postfix version 2.5 and later:

    •   In master.cf, set up a dedicated clone of the "smtp" transport for the destination in question. In the example below we will call it "fragile".
    •   In master.cf, configure a reasonable process limit for the cloned smtp transport (a number in the 10-20 range is typical).
    •   IMPORTANT!!! In main.cf, configure a large per-destination pseudo-cohort failure limit for the cloned smtp transport.
      /etc/postfix/main.cf:
          transport_maps = hash:/etc/postfix/transport
          fragile_destination_concurrency_failed_cohort_limit = 100
          fragile_destination_concurrency_limit = 20

      /etc/postfix/transport:
          example.com  fragile:

      /etc/postfix/master.cf:
          # service type  private unpriv  chroot  wakeup  maxproc command
          fragile   unix     -       -       n       -      20    smtp

      See also the documentation for default_destination_concurrency_failed_cohort_limit and default_destination_concurrency_limit.

  •   Earlier Postfix versions:

    •   In master.cf, set up a dedicated clone of the "smtp" transport for the destination in question. In the example below we will call it "fragile".
    •   In master.cf, configure a reasonable process limit for the transport (a number in the 10-20 range is typical).
    •   IMPORTANT!!! In main.cf, configure a very large initial and destination concurrency limit for this transport (say 2000).
      /etc/postfix/main.cf:
          transport_maps = hash:/etc/postfix/transport
          initial_destination_concurrency = 2000
          fragile_destination_concurrency_limit = 2000

      /etc/postfix/transport:
          example.com  fragile:

      /etc/postfix/master.cf:
          # service type  private unpriv  chroot  wakeup  maxproc command
          fragile   unix     -       -       n       -      20    smtp

      See also the documentation for default_destination_concurrency_limit.

  The effect of this configuration is that up to 2000 consecutive errors are tolerated without marking the destination dead, while the total concurrency remains reasonable (10-20 processes). This trick is only for a very specialized situation: high-volume delivery into a channel with multi-error bursts that is capable of high throughput, but is repeatedly throttled by the bursts of errors.
  When a destination is unable to handle the load even after the Postfix process limit is reduced to 1, a desperate measure is to insert brief delays between delivery attempts.

  •   Postfix version 2.5 and later:

    •   In master.cf, set up a dedicated clone of the "smtp" transport for the problem destination. In the example below we call it "slow".
    •   In main.cf, configure a short delay between deliveries to the same destination.
      /etc/postfix/main.cf:
          transport_maps = hash:/etc/postfix/transport
          slow_destination_rate_delay = 1
          slow_destination_concurrency_failed_cohort_limit = 100

      /etc/postfix/transport:
          example.com  slow:

      /etc/postfix/master.cf:
          # service type  private unpriv  chroot  wakeup  maxproc command
          slow      unix     -       -       n       -       -    smtp
      See also the documentation for default_destination_rate_delay.
      This solution forces the Postfix smtp(8) client to wait for $slow_destination_rate_delay seconds between deliveries to the same destination.
      IMPORTANT!! The large slow_destination_concurrency_failed_cohort_limit value is needed. It prevents Postfix from deferring all mail for the same destination after only one connection or handshake error (the reason for this is that a non-zero slow_destination_rate_delay forces a per-destination concurrency of 1).

  •   Earlier Postfix versions:

    •   In the transport map entry for the problem destination, specify a dead host as the primary nexthop.
    •   In the master.cf entry for the transport, specify the problem destination as the fallback_relay and specify a small smtp_connect_timeout value.
      /etc/postfix/main.cf:
          transport_maps = hash:/etc/postfix/transport

      /etc/postfix/transport:
          example.com  slow:[dead.host]

      /etc/postfix/master.cf:
          # service type  private unpriv  chroot  wakeup  maxproc command
          slow      unix     -       -       n       -       1    smtp
              -o fallback_relay=problem.example.com
              -o smtp_connect_timeout=1
              -o smtp_connection_cache_on_demand=no
      This solution forces the Postfix smtp(8) client to wait for $smtp_connect_timeout seconds between deliveries. The connection caching feature is disabled to prevent the client from skipping over the dead host.
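  The throttling these two recipes buy can be estimated: deliveries to the destination are serialized and each is preceded by a fixed wait ($slow_destination_rate_delay in the 2.5 recipe, $smtp_connect_timeout in the older one), so the destination receives at most 3600 divided by that wait per hour. A sketch of the arithmetic; treating each delivery's own duration as adding to the wait is an assumption, and with `delivery_time_s=0` the result is the hard ceiling.

```python
def max_deliveries_per_hour(wait_s: float, delivery_time_s: float = 0.0) -> float:
    """Rough ceiling on deliveries per hour to one serialized destination.

    wait_s: forced wait between deliveries (slow_destination_rate_delay,
        or smtp_connect_timeout in the pre-2.5 recipe).
    delivery_time_s: assumed duration of each delivery itself; with the
        default 0 the result is the hard ceiling 3600 / wait_s.
    """
    if wait_s <= 0:
        raise ValueError("wait must be positive")
    return 3600.0 / (wait_s + delivery_time_s)


if __name__ == "__main__":
    print(max_deliveries_per_hour(1))       # 3600.0
    print(max_deliveries_per_hour(1, 0.5))  # 2400.0
```

  With the 1-second values used in both examples, example.com receives at most 3600 messages per hour; if each delivery itself takes half a second, closer to 2400.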

Postfix queue directories

  The following sections describe Postfix queues: their purpose, what normal behavior looks like, and how to diagnose abnormal behavior.
The "maildrop" queue

  Messages that have been submitted via the Postfix sendmail(1) command, but not yet brought into the main Postfix queue by the pickup(8) service, await processing in the "maildrop" queue. Messages can be added to the "maildrop" queue even when the Postfix system is not running. They will begin to be processed once Postfix is started.
  The "maildrop" queue is drained by the single-threaded pickup(8) service, which scans the queue directory periodically or when notified of new message arrival by the postdrop(1) program. The postdrop(1) program is a setgid helper that allows the unprivileged Postfix sendmail(1) program to inject mail into the "maildrop" queue and to notify the pickup(8) service of its arrival.
  All mail that enters the main Postfix queue does so via the cleanup(8) service. The cleanup service is responsible for envelope and header rewriting, header and body regular expression checks, automatic bcc recipient processing, milter content processing, and reliable insertion of the message into the Postfix "incoming" queue.
  In the absence of excessive CPU consumption in cleanup(8) header or body regular expression checks, or other software consuming all available CPU resources, Postfix performance is disk I/O bound. The rate at which the pickup(8) service can inject messages into the queue is largely determined by disk access times, since the cleanup(8) service must commit the message to stable storage before returning success. The same is true of the postdrop(1) program writing the message to the "maildrop" directory.
  As the pickup service is single-threaded, it can only deliver one message at a time, at a rate that does not exceed the reciprocal of the disk I/O latency (plus CPU time, if not negligible) of the cleanup service.
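  As a back-of-the-envelope check of that bound: if committing a message to stable storage takes 10 ms (an assumed figure; measure your own disk), pickup cannot exceed 100 messages per second, or 360000 per hour. A small sketch of the arithmetic, with an illustrative function name:

```python
def pickup_max_rate(io_latency_s: float, cpu_s: float = 0.0) -> float:
    """Ceiling on messages/second through the single-threaded pickup(8).

    One message at a time, each costing at least the cleanup(8) commit
    latency plus any non-negligible CPU time, so rate <= 1 / (latency + cpu).
    """
    per_message = io_latency_s + cpu_s
    if per_message <= 0:
        raise ValueError("per-message cost must be positive")
    return 1.0 / per_message


if __name__ == "__main__":
    rate = pickup_max_rate(0.010)  # assumed 10 ms commit latency
    print(f"{rate:.0f} msg/s, {rate * 3600:.0f} msg/hour")
```

  Adding CPU time only lowers the ceiling: 4 ms of I/O plus 1 ms of CPU gives at most 200 messages per second.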
  Congestion in this queue is indicative of an excessive local message submission rate, or perhaps excessive CPU consumption in the cleanup(8) service due to excessive body_checks, or (Postfix ≥ 2.3) high-latency milters.
  Note that once the active queue is full, the cleanup service will attempt to slow down message injection by pausing $in_flow_delay for each message. In this case, "maildrop" queue congestion may be a consequence of congestion downstream, rather than a problem in its own right.
  Note that you should not attempt to deliver large volumes of mail via the pickup(8) service. High-volume sites should avoid using "simple" content filters that re-inject scanned mail via Postfix sendmail(1) and postdrop(1).
  A high arrival rate of locally submitted mail may be an indication of an uncaught forwarding loop, or a run-away notification program. Try to keep the volume of local mail injection to a moderate level.
  The "postsuper -r" command can place selected messages into the "maildrop" queue for reprocessing. This is most useful for resetting any stale content_filter settings. Requeuing a large number of messages using "postsuper -r" can clearly cause a spike in the size of the "maildrop" queue.
