Posted on 2015-11-4 08:31:03
Determining the source of Bug Check 0x133 (DPC_WATCHDOG_VIOLATION) errors on Windows Server 2012
ntdebug 7 Dec 2012 1:21 PM

What is a bug check 0x133?
Starting in Windows Server 2012, a DPC watchdog timer is enabled that will bug check a system if too much time is spent in DPC routines. This bug check was added to help identify drivers that are deadlocked or misbehaving.  The bug check is of type "DPC_WATCHDOG_VIOLATION" and has a code of 0x133.  (Windows 7 also included a DPC watchdog, but by default it only took action when a kernel debugger was attached to the system.)  A description of DPC routines can be found at http://msdn.microsoft.com/en-us/library/windows/hardware/ff544084(v=vs.85).aspx.

The DPC_WATCHDOG_VIOLATION bug check can be triggered in two ways. First, if a single DPC exceeds a specified number of ticks, the system will stop with 0x133 with parameter 1 of the bug check set to 0.  In this case, the system's time limit for a single DPC will be in parameter 3, with the number of ticks taken by this DPC in parameter 2.  Alternatively, if the cumulative time spent in all DPCs since the IRQL was raised to DPC level exceeds a larger timeout, the system will stop with 0x133 with parameter 1 set to 1.  Microsoft recommends that DPCs should not run longer than 100 microseconds and ISRs should not run longer than 25 microseconds; however, the actual timeout values on the system are set much higher.
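
The parameter layout described above can be captured in a small helper. This is only an illustrative sketch: the function name describe_dpc_watchdog is hypothetical, the sample values are the ones from the example dump in the next section, and the meaning of parameters 2 and 3 in the cumulative case is not covered by the text above, so the sketch reports them raw instead of interpreting them.

```python
def describe_dpc_watchdog(p1: int, p2: int, p3: int) -> str:
    """Summarize the first three arguments of bug check 0x133 (hypothetical helper)."""
    if p1 == 0:
        # Single-DPC case: parameter 2 is the tick count reached by the DPC,
        # parameter 3 is the per-DPC limit in ticks.
        return (f"single DPC ran for {p2:#x} ({p2}) ticks; "
                f"limit was {p3:#x} ({p3}) ticks")
    if p1 == 1:
        # Cumulative case: parameters 2 and 3 are not defined in this article,
        # so they are shown as raw values only.
        return f"cumulative DPC time exceeded the limit; raw args {p2:#x}, {p3:#x}"
    return f"unrecognized parameter 1 value {p1:#x}"


if __name__ == "__main__":
    # Arguments taken from the .bugcheck output in the example dump.
    print(describe_dpc_watchdog(0x0, 0x283, 0x282))
```

Keeping the two cases in separate branches mirrors the two debugging approaches that follow: a call stack for parameter 1 = 0, and ETW traces for parameter 1 = 1.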

How to debug a 0x133 (0, …
In the case of a stop 0x133 with the first parameter set to 0, the call stack should contain the offending driver.  For example, here is a debug of a 0x133 (0,…) kernel dump:

0: kd> .bugcheck
Bugcheck code 00000133
Arguments 00000000`00000000 00000000`00000283 00000000`00000282 00000000`00000000  

Per MSDN, we know that this DPC has run for 0x283 ticks, when the limit was 0x282.
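
To get a feel for what these tick counts mean in wall-clock time, the conversion can be sketched as below. The tick length is an assumption: the dump does not state it, and 15.625 ms (1/64 second) is used here only as a commonly seen clock interval. With that assumption the limit of 0x282 ticks works out to roughly 10 seconds.

```python
# ASSUMPTION: one clock tick is 15.625 ms (1/64 s). The dump does not state
# the tick length, and the actual value depends on the system.
ASSUMED_TICK_MS = 15.625


def ticks_to_seconds(ticks: int, tick_ms: float = ASSUMED_TICK_MS) -> float:
    """Convert a watchdog tick count to seconds for a given tick length."""
    return ticks * tick_ms / 1000.0


if __name__ == "__main__":
    elapsed_ticks = 0x283  # bug check parameter 2 from the example dump
    limit_ticks = 0x282    # bug check parameter 3 from the example dump
    print(f"elapsed: {elapsed_ticks} ticks = {ticks_to_seconds(elapsed_ticks):.3f} s")
    print(f"limit:   {limit_ticks} ticks = {ticks_to_seconds(limit_ticks):.3f} s")
```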

0: kd> k
Child-SP          RetAddr           Call Site
fffff803`08c18428 fffff803`098525df nt!KeBugCheckEx
fffff803`08c18430 fffff803`09723f11 nt! ??::FNODOBFM::`string'+0x13ba4
fffff803`08c184b0 fffff803`09724d98 nt!KeUpdateRunTime+0x51
fffff803`08c184e0 fffff803`09634eba nt!KeUpdateTime+0x3f9
fffff803`08c186d0 fffff803`096f24ae hal!HalpTimerClockInterrupt+0x86
fffff803`08c18700 fffff803`0963dba2 nt!KiInterruptDispatchLBControl+0x1ce
fffff803`08c18898 fffff803`096300d0 hal!HalpTscQueryCounter+0x2
fffff803`08c188a0 fffff880`04be3409 hal!HalpTimerStallExecutionProcessor+0x131
fffff803`08c18930 fffff880`011202ee ECHO!EchoEvtTimerFunc+0x7d                //Here is our driver, and we can see it calls into StallExecutionProcessor
fffff803`08c18960 fffff803`097258b4 Wdf01000!FxTimer::TimerHandler+0x92
fffff803`08c189a0 fffff803`09725ed5 nt!KiProcessExpiredTimerList+0x214
fffff803`08c18ae0 fffff803`09725d88 nt!KiExpireTimerTable+0xa9
fffff803`08c18b80 fffff803`0971fe76 nt!KiTimerExpiration+0xc8
fffff803`08c18c30 fffff803`0972457a nt!KiRetireDpcList+0x1f6
fffff803`08c18da0 00000000`00000000 nt!KiIdleLoop+0x5a

Let's unassemble the driver's DPC routine and see what it is doing:

0: kd> ub fffff880`04be3409
ECHO!EchoEvtTimerFunc+0x54:
fffff880`04be33e0 448b4320        mov     r8d,dword ptr [rbx+20h]
fffff880`04be33e4 488b0d6d2a0000  mov     rcx,qword ptr [ECHO!WdfDriverGlobals (fffff880`04be5e58)]
fffff880`04be33eb 4883631800      and     qword ptr [rbx+18h],0
fffff880`04be33f0 488bd7          mov     rdx,rdi
fffff880`04be33f3 ff150f260000    call    qword ptr [ECHO!WdfFunctions+0x838 (fffff880`04be5a08)]
fffff880`04be33f9 bbc0d40100      mov     ebx,1D4C0h
fffff880`04be33fe b964000000      mov     ecx,64h
fffff880`04be3403 ff15f70b0000    call    qword ptr [ECHO!_imp_KeStallExecutionProcessor (fffff880`04be4000)]   //It's calling KeStallExecutionProcessor with 0x64 (decimal 100) as a parameter
0: kd> u fffff880`04be3409
ECHO!EchoEvtTimerFunc+0x7d:
fffff880`04be3409 4883eb01        sub     rbx,1
fffff880`04be340d 75ef            jne     ECHO!EchoEvtTimerFunc+0x72 (fffff880`04be33fe)     //Here we can see it is jumping back to call KeStallExecutionProcessor in a loop
fffff880`04be340f 488b5c2430      mov     rbx,qword ptr [rsp+30h]
fffff880`04be3414 4883c420        add     rsp,20h
fffff880`04be3418 5f              pop     rdi
fffff880`04be3419 c3              ret
fffff880`04be341a cc              int     3
fffff880`04be341b cc              int     3
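
The disassembly gives enough information to estimate how long this DPC stalls. The loop counter is loaded with 1D4C0h, and each iteration calls KeStallExecutionProcessor with 64h, which that routine takes as a count of microseconds. The following sketch just does the arithmetic; the constant and function names are illustrative, and the result ignores per-iteration overhead, so it is a lower bound on the time spent in the loop.

```python
LOOP_COUNT = 0x1D4C0       # from: mov ebx,1D4C0h
STALL_MICROSECONDS = 0x64  # from: mov ecx,64h (argument to KeStallExecutionProcessor)


def total_stall_seconds(iterations: int, stall_us: int) -> float:
    """Minimum time spent stalling, ignoring the loop's own overhead."""
    return iterations * stall_us / 1_000_000


if __name__ == "__main__":
    seconds = total_stall_seconds(LOOP_COUNT, STALL_MICROSECONDS)
    print(f"{LOOP_COUNT} iterations x {STALL_MICROSECONDS} us = {seconds} s")
```

That is 120,000 iterations of 100 microseconds each, or at least 12 seconds inside a single DPC, far beyond the recommended 100 microseconds.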

0: kd> !pcr
KPCR for Processor 0 at fffff80309974000:
    Major 1 Minor 1
      NtTib.ExceptionList: fffff80308c11000
          NtTib.StackBase: fffff80308c12080
         NtTib.StackLimit: 000000d70c7bf988
       NtTib.SubSystemTib: fffff80309974000
            NtTib.Version: 0000000009974180
        NtTib.UserPointer: fffff803099747f0
            NtTib.SelfTib: 000007f7ab80c000

                  SelfPcr: 0000000000000000
                     Prcb: fffff80309974180
                     Irql: 0000000000000000
                      IRR: 0000000000000000
                      IDR: 0000000000000000
            InterruptMode: 0000000000000000
                      IDT: 0000000000000000
                      GDT: 0000000000000000
                      TSS: 0000000000000000

            CurrentThread: fffff803099ce880
               NextThread: fffffa800261cb00
               IdleThread: fffff803099ce880

                DpcQueue:  0xfffffa80020ce790 0xfffff880012e4e9c [Normal] NDIS!NdisReturnNetBufferLists
                           0xfffffa800185f118 0xfffff88000c0ca00 [Normal] ataport!AtaPortInitialize
                           0xfffff8030994fda0 0xfffff8030972bc30 [Normal] nt!KiBalanceSetManagerDeferredRoutine
                           0xfffffa8001dbc118 0xfffff88000c0ca00 [Normal] ataport!AtaPortInitialize
                           0xfffffa8002082300 0xfffff88001701df0 [Normal] USBPORT

The !pcr output shows us queued DPCs for this processor. If you want to see more information about DPCs and the DPC Watchdog, you could dump the PRCB listed in the !pcr output like this:

dt nt!_KPRCB fffff80309974180 Dpc*
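
If you save the !pcr output to a text file, the DpcQueue entries can be tallied per module with a short script. This is a rough sketch that assumes the line layout shown in the output above (two hexadecimal values, a bracketed type, then a symbol); the function name parse_dpc_queue and the dictionary keys are hypothetical, and other debugger versions may format these lines differently.

```python
import re
from collections import Counter

# Matches: <DPC address> <routine address> [<type>] <module>[!<function>]
# ASSUMPTION: the layout is the one shown in the !pcr output above.
DPC_ENTRY = re.compile(
    r"(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+\[(\w+)\]\s+(\S+)"
)

SAMPLE = """\
                DpcQueue:  0xfffffa80020ce790 0xfffff880012e4e9c [Normal] NDIS!NdisReturnNetBufferLists
                           0xfffffa800185f118 0xfffff88000c0ca00 [Normal] ataport!AtaPortInitialize
                           0xfffff8030994fda0 0xfffff8030972bc30 [Normal] nt!KiBalanceSetManagerDeferredRoutine
                           0xfffffa8001dbc118 0xfffff88000c0ca00 [Normal] ataport!AtaPortInitialize
                           0xfffffa8002082300 0xfffff88001701df0 [Normal] USBPORT
"""


def parse_dpc_queue(pcr_text: str) -> list:
    """Return one dict per queued DPC found in saved !pcr output."""
    entries = []
    for dpc, routine, kind, symbol in DPC_ENTRY.findall(pcr_text):
        # Entries without symbols show only the module name (for example USBPORT).
        module, _, function = symbol.partition("!")
        entries.append({
            "dpc": int(dpc, 16),
            "routine": int(routine, 16),
            "type": kind,
            "module": module,
            "function": function or None,
        })
    return entries


if __name__ == "__main__":
    per_module = Counter(entry["module"] for entry in parse_dpc_queue(SAMPLE))
    for module, count in per_module.most_common():
        print(f"{module}: {count}")
```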

Often the driver will be calling into a function like KeStallExecutionProcessor in a loop, as in our example debug.  To resolve this problem, contact the driver vendor to request an updated driver version that spends less time in its DPC routine.

How to troubleshoot a 0x133 (1, …
Determining the cause of a stop 0x133 with a first parameter of 1 is a bit more difficult because the problem is a result of DPCs running from multiple drivers, so the call stack is insufficient to determine the culprit.  To troubleshoot this stop, first make sure that the NT Kernel Logger or Circular Kernel Context Logger ETW traces are enabled on the system.  (For directions on setting this up, see http://blogs.msdn.com/b/ntdebugging/archive/2009/12/11/test.aspx.)

Once the logging is enabled and the system bug checks, dump out the list of ETW loggers using !wmitrace.strdump. Find the ID of the NT Kernel Logger or the Circular Kernel Context Logger.  You can then use !wmitrace.logsave (ID) (path to ETL) to write out the ETL log to a file.  Load it up with Windows Performance Analyzer and add the DPC or DPC/ISR Duration by Module, Function view (located in the Computation group) to your current analysis window:

[Screenshot: DSC0002.jpg]

Next, make sure the table is also shown by clicking the box in the upper right of the view:
[Screenshot: DSC0003.jpg]


Ensure that the Address column is added on the left of the gold bar, then expand each address entry to see individual DPC enters/exits for each function.  Using this data, you can determine which DPC routines took the longest by looking at the inclusive duration column, which should be added to the right of the gold bar:
[Screenshot: DSC0004.jpg]

In this case, these DPCs took 1 second, which is well over the recommended maximum of 100 microseconds.  The module column (and possibly the function column, if you have symbols) will show which driver is responsible for that DPC routine.  Since our ECHO driver was based on WDF, that is the module named here.
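
If the DPC table is exported to CSV, the same check can be scripted. The sketch below is an assumption-heavy illustration: the column names Module, Function and Duration_us are hypothetical and will need to be adjusted to match the actual export, and the sample rows are made up apart from the 1 second Wdf01000 entry described above.

```python
import csv
import io

RECOMMENDED_DPC_LIMIT_US = 100  # recommended maximum DPC duration quoted earlier

# ASSUMPTION: hypothetical column names and illustrative rows, not real WPA output.
SAMPLE_CSV = """\
Module,Function,Duration_us
Wdf01000.sys,FxTimer::TimerHandler,1000000
ndis.sys,ExampleShortDpc,40
"""


def slow_dpcs(csv_text: str, limit_us: float = RECOMMENDED_DPC_LIMIT_US) -> list:
    """Return (module, function, duration_us) for DPCs over the limit, longest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    slow = [
        (row["Module"], row["Function"], float(row["Duration_us"]))
        for row in rows
        if float(row["Duration_us"]) > limit_us
    ]
    return sorted(slow, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    for module, function, duration_us in slow_dpcs(SAMPLE_CSV):
        print(f"{module}!{function}: {duration_us / 1_000_000:.3f} s")
```

Sorting by duration puts the worst offender first, which matches how the inclusive duration column is used in the table view.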

For an example of doing this type of analysis in xperf, see http://blogs.msdn.com/b/ntdebugging/archive/2008/04/03/windows-performance-toolkit-xperf.aspx.

More Information
For additional information about Stop 0x133 errors, see this page on MSDN: http://msdn.microsoft.com/en-us/library/windows/hardware/jj154556(v=vs.85).aspx.

For DPC timing recommendations and for advice on capturing DPC timing information using tracelog, see http://msdn.microsoft.com/en-us/library/windows/hardware/ff545764(v=vs.85).aspx.

Guidelines for writing DPC routines can be found at http://msdn.microsoft.com/en-us/library/windows/hardware/ff546551(v=vs.85).aspx.


-Matt Burrough


