Core OS Events in Windows 7, Part 1

191145685 · 发表于 2015-5-12 05:22:15

Dr. Insung Park and Alex Bendetovers　　Today's  computer software constantly breaks new grounds. Consumer software  applications offer a sophisticated set of features that enable rich new  experiences. Powerful server applications are setting new records in  throughput, speed and scale. These improvements have been made possible  by rapid progress in hardware technologies and continuous adoption of  software advancements in optimization, virtualization, and distributed  and parallel computing. However, as a result, software applications have  become larger and more complicated. At the same time, users'  expectations about software quality are higher than ever. Fundamental  characteristics such as performance, reliability and manageability have  proved essential in the long-term success of software products, and they  are often celebrated as primary features.
　　Increasing software  complexity and higher user expectations on quality thus present a  difficult challenge in software development. When an unexpected problem  occurs, predicting internal states of all relevant components is nearly  impossible. Retracing the history of execution flows is cumbersome and  tricky, but often necessary in finding out the root cause of software  problems. When users report problems after deployment, they expect the  root cause of the problem to be quickly identified and addressed. The  overwhelming number of hardware and software combinations, different  workload characteristics, and usage patterns of end users make such  tasks even tougher. The ability to use a mechanism that enables you to  understand system execution in a transparent manner, with minimal  overhead, is invaluable.
Event Instrumentation
　　Instrumentation  is one such effective solution in measuring and improving software  quality. Software performance counters have provided a convenient way to  monitor application execution status and resource usage at an aggregate  level. Event instrumentation has also been popular over the years.  Events raised by a software component at different stages of execution  can significantly reduce the time it takes to diagnose various problems.  In addition to scanning for certain events or patterns of events, one  can apply data mining and correlation techniques to further analyze the  events to produce meaningful statistics and reports on program execution  and problematic behavior. The ability to collect events on production  systems in real time helps avoid the need to have an unwieldy debugger  setup on customer machines.
　　Introduced in the Windows 2000  operating system, Event Tracing for Windows (ETW) is a general-purpose  event-tracing platform on Windows operating systems. Using an efficient  buffering and logging mechanism implemented in the kernel, ETW provides a  mechanism to persist events raised by both user-mode applications and  kernel-mode device drivers. Additionally, ETW gives users the ability to  enable and disable logging dynamically, making it easy to perform  detailed tracing in production environments without requiring reboots or  application restarts.
　　The operating system itself has been  heavily instrumented with ETW events. The ability to analyze and  simulate core OS activities based on ETW events in development, as well  as on production-mode systems, has been valuable to developers in  solving many quality problems. With each subsequent Windows release, the  number of ETW events raised by the operating system has increased;  Windows 7 is the most instrumented operating system to date. In  addition, Windows 7 contains tools that can utilize these operating  system ETW events to analyze system performance and reliability, as well  as uncover quality problems in software applications.
　　Many  application problems surface as anomalies in OS resource usage, such as  unexpected patterns or spikes in the consumption of CPU, memory, network  bandwidth, IOs and so on. Because OS events for most system activities  can be traced to the originating process and thread, one can make  considerable progress in narrowing down possible root causes of many  application problems, even without ETW instrumentation in applications.  Of course, ETW instrumentation in the application would allow further  diagnosis to be significantly more efficient.
　　In the first article  of our two-part series, we present a high-level overview of the ETW  technology and core OS instrumentation. Then, we discuss tool support to  obtain and consume OS events. Next, we provide more details on the  events from various subcomponents in the core OS. We also explain how  the different system events can be combined to produce a comprehensive  picture of system behavior, which we demonstrate by using a set of  Windows PowerShell scripts.
Event Tracing for Windows
　　As  mentioned earlier, ETW is a logging platform that efficiently records  the events sent by software applications or kernel-mode components.  Using ETW provider APIs, any application, DLL or driver can become an  event provider (a component that raises events) for ETW. A provider  first registers with ETW and sends events from various points in the  code by inserting ETW logging API calls. Any recordable activity of  importance can be an event, and it is represented by a piece of data  written by ETW at the time of logging. These logging API calls are  ignored when the provider is not enabled. An ETW controller application  starts an ETW session and enables certain providers to it. When an  enabled event provider makes a logging API call, the event is then  directed to the session designated by the controller. Events sent to a  session may be stored in a log file, consumed programmatically in real  time, or kept in memory until the controller requests a flush of that  data to a file. A previous article, "Improve Debugging And Performance  Tuning with ETW"  (msdn.microsoft.com/en-us/magazine/dvdarchive/cc163437.aspx), has more  details about the ETW technology and how to add ETW instrumentation into  an application. Over the years, ETW has come to support many different  logging modes and features, which are documented on MSDN.
　　An ETW  event consists of a fixed header followed by context-specific data. The  header identifies the event and the component raising the event, while  the context-specific data ("event payload" hereafter) refers to any  additional data that the component raising the event wants to record.  When an event raised by a provider is written to an ETW session, ETW  adds additional metadata to the header, including thread and process  IDs, the current CPU on which the logging thread is running, CPU time  usage of the thread, and timestamp. Figure 1 shows an XML representation  of an event (a Process event of type Start) as decoded by the tracerpt  tool (to be discussed later) in the XML dump file. The section is common to all events and represents the common header that  ETW records for each event. This contains timestamp, process and thread  ID, provider GUID, CPU time usage, CPU ID, and so on. The section displays the logged payload of this event. As  shown in Figure 1, a Process Start event from the Windows kernel  contains process key (a unique key assigned to each process for  identification), process ID, parent process ID, session ID, exit status  (valid only for a Process End event), user SID, executable file name of  the process, and the command that started the process.
　　ETW  controllers are the applications that use the ETW control API set to  start ETW sessions and enable one or more providers to those sessions.  They need to give each session a unique name, and on Windows 7 there can  be up to 64 sessions running concurrently.
　　Figure 1 Process Start Event in XML Dump






0
3
0
0
1
0x0

0xFFFFFA8005BBC950
0x1430
0xA18
1
259
0x4E1D6000
guest
notepad.exe
notepad

Start
MSNT_SystemTrace

Process

{3d6fa8d0-fe05-11d0-9dda-00c04fd7ba7c}

      　　The events from the kernel and core system components,  however, are logged in a manner different from user-mode applications or  kernel-mode device drivers. The core system logs events to a special  session with a reserved name, "NT Kernel Logger." Only the "NT Kernel  Logger" session ("kernel session" hereafter) receives core system  events, and it does not accept events from any other regular event  provider. Also, core OS events are enabled by specifying appropriate  flags when a session is started. Each flag represents event  instrumentation in a different core component that can be enabled  selectively. This helps reduce the instrumentation overhead in the  kernel and reinforces the authenticity of system events. In addition, a  new feature in Windows 7 allows users to capture call stacks at the time  of logging. If symbols are available, one can trace a chain of function  calls that trigger the kernel events to be logged. The benefit from  call-stack analysis will be discussed in subsequent sections, when we  discuss individual events in more detail.
Collecting Events Using Tools on Windows
　　There  are a few ETW control tools available on Windows that allow users to  collect events. For instance, the Performance Monitor exposes ETW  control in the form of a data collection set. The EventLog service is  also capable of starting and stopping ETW sessions and viewing events.  For command-line and script interfaces, logman.exe offers options to  perform ETW control operations. For event consumption, the command-line  tool tracerpt.exe can consume ETW log files and produce dumps in several  formats, including CSV and XML. An example of an XML representation of a  Process Start event is shown in Figure 1. In this article, we use logman.exe and tracerpt.exe in the samples we present. The "logman query providers" command in Figure 2 lists the different flags that can be used by a controller when enabling the kernel session.
　　The  following command starts the kernel session and enables process,  thread, disk, network, image, and registry events. The collected events  will be stored in a file called systemevents.etl in the current  directory. Controlling the kernel session and collecting core OS events  require administrator privileges:




   > logman start "NT Kernel Logger" –p "Windows Kernel Trace"
(process,thread,img,disk,net,registry) –o systemevents.etl –ets       　　To stop the collection, users need to issue the "logman stop
-ets" command:




   > logman stop "NT Kernel Logger" –ets       　　The tracerpt tool can process the events in the log file into a  readable format. By default, tracerpt accepts one or more log files and  generates an event output file and a summary file. The default format  of the output file is XML:




   > tracerpt systemevents.etl       　　The logman and tracerpt help text, Windows Help and Support,  and online documentation have more details on the switches and features  of these tools.
　　There are advanced performance analysis tools that  consume core OS events for various analysis and tuning scenarios.  Windows Performance Toolkit (WPT) is one such notable tool, and is  available from the Windows SDK (msdn.microsoft.com/en-us/performance/cc825801.aspx).  WPT is useful to a broad audience, including system builders, hardware  manufacturers, driver developers and general application developers. Its  trace analysis tools, Xperf and XperfView, apply sophisticated  techniques (including those introduced here) to aggregate and analyze  the core OS events to offer meaningful perspectives into the OS and  application behavior. Its flexible GUI provides many rich and  customizable presentation options that can help users focus on different  aspects of system activities. Figure 3 shows a screenshot of the  XperfView tool in action.
Core OS Events
　　The Windows core OS has many components instrumented with ETW. As shown in Figure 2,  events are available for activities spanning various subsystems  including processes, threads, disk and file IOs, memory, network,  registry and so on. In this section, we provide details for each group  of events. We also discuss a number of core operating system concepts.  More information on these concepts can be found in OS-centric reference  materials such as "Windows Internals, 5th Edition," by Russinovitch,  Solomon and Ionescu (Microsoft Press, 2009). Core OS events are subject  to change in future versions of Windows, as the platform and its  instrumentation evolve to meet new requirements.
　　There are several  ways to make use of event data. One can scan for a certain event, such  as an error event, or a pattern of events that represent execution flow.  Other popular methodologies include statistical analysis (counting and  summarizing events), delta analysis (analyzing deltas between pairs of  events, such as Start and End), activity analysis (tracking an  activity/request through event correlation), and aggregation and pattern  analysis based on call-stacks. Over the years, we have found it  effective to construct a state machine and simulate OS activities when  we look at core system events. When we explain the system events in this  section, we will also describe how we use them in building the state  machine.
Process, Thread, Image, Process Counter Events
　　A  thread is a basic unit of execution on the Windows OS, and a process  acts as a container for threads. Each process (and thread) is assigned  an ID that is unique while it is running. IDs for processes and threads  share the same number space. That is, while a thread with ID A is  active, A is never given to other processes or threads. The only  exception is the per-CPU idle threads and the idle process, whose IDs  are all identical and have been historically 0. Process and Thread  events are the basic building blocks in establishing the state machine,  which is helpful in understanding more advanced system activities. There  are two types of events for Process and Thread events: Start and End.  Process Start and End are logged when a process starts and terminates,  respectively. The same applies to Thread Start and End events. The  payload for Process events has more details about the process, such as  process name and parent process ID, as shown in Figure 1.  Likewise, the Thread event payload contains thread-specific  information, such as stack base and limit, and start address. It should  be noted that the IDs of the starting processes or threads are part of  the Process and Thread event payload, although the event header already  has those items in it. Process Start events are logged in the context of  a parent process that creates the current process. In that case, the  process ID in the  section (event header) is the parent  process. The process ID in the event payload is the one being created.  The same applies to thread ID in Thread events.
　　For processes and  threads that started before the event collection, ETW logs state rundown  events. For the purpose of analysis, they are used to denote the  running processes and threads at the time the event collection begins.  ETW also logs rundown events for the remaining processes and threads  still running when the collection ends. Process and Thread rundown  events enumerate and log events in the same format as Process and Thread  Start events for all processes and threads, including system and idle  processes. Process and Thread rundowns events use different types,  DCStart and DCEnd, to distinguish themselves from real process and  thread creation and termination. A separate Defunct event is written for  a process that has terminated, but with outstanding references to it.
　　Every  process and thread active during event collection can be tracked using  Process and Thread events. When you build a state machine, the  processing routine should keep a list of active processes and threads,  perhaps as some type of objects in the program. When a thread or process  terminates, the corresponding object may be placed in a different  ("complete") list. The process objects may also contain more details  (such as process name) on the processes or threads that they refer to,  picked up from the Process and Thread event payload. It is simpler and a  lot more informative if one can refer to a process by name during  analysis, rather than by ID, as IDs can be recycled once a process or  thread terminates (and all references to it are released). When events  from other core components indicate OS activities during the state  machine construction and simulation, the threads and processes that  initiated them are located in the state machine, and their objects are  updated to attribute those activities, using primarily the process and  thread IDs in the event headers. At the end, process and thread objects  are aggregated and summarized into a report with various metrics.
　　Image  events correspond to image (also known as module) files getting loaded  and unloaded into process address space. There are four types of Image  events: Load, Unload, DCStart and DCEnd. These events do not directly  correlate to LoadLibrary calls, however. If a DLL is already loaded in  the process, subsequent LoadLibrary calls for the same DLL simply  increment the count of module references but will not map the module  again. Like the DCStart and DCEnd types of Process and Thread events,  Image DCStart and DCEnd are used to enumerate loaded modules of already  running processes. Image events allow for the tracking of loaded modules  and the mapping of addresses within a process. They are also important  in mapping DPC, ISR, and System Call events, as will be discussed in a  later section and in Part 2 of this series. The Image event payload  contains information such as the module address base and size, and the  name of the binary file loaded (or unloaded). Image events are also  required for decoding call-stacks. In a typical state machine  construction, Image events trigger the updating of a list of loaded  modules into an aforementioned process object.
　　Process Counter  events, when enabled, are logged at process termination and record in  its payload a few properties regarding the process execution statistics  over the lifetime of the process. They consist of peak memory size, peak  working set size, peak paged and nonpaged pool usage, and peak page  file usage. These events indicate how a process behaved with respect to  memory usage. Like Process events, separate rundown Process Counter  events are logged for all active processes at the end of event  collection.
　　Figure 2 NT Kernel Logger Enable Flags




   >
logman query providers "Windows Kernel Trace"
Provider GUID
------------------------------------------------------------------------
Windows Kernel Trace {9E814AAD-3204-11D2-9A82-006008A86939}
Value Keyword Description
------------------------------------------------------------------------
0x0000000000000001 process Process creations/deletions
0x0000000000000002 thread Thread creations/deletions
0x0000000000000004 img Image load
0x0000000000000008 proccntr Process counters
0x0000000000000010 cswitch Context switches
0x0000000000000020 dpc Deferred procedure calls
0x0000000000000040 isr Interrupts
0x0000000000000080 syscall System calls
0x0000000000000100 disk Disk IO
0x0000000000000200 file File details
0x0000000000000400 diskinit Disk IO entry
0x0000000000000800 dispatcher Dispatcher operations
0x0000000000001000 pf Page faults
0x0000000000002000 hf Hard page faults
0x0000000000004000 virtalloc Virtual memory allocations
0x0000000000010000 net Network TCP/IP
0x0000000000020000 registry Registry details
0x0000000000100000 alpc ALPC
0x0000000000200000 splitio Split IO
0x0000000000800000 driver Driver delays
0x0000000001000000 profile Sample based profiling
0x0000000002000000 fileiocompletion File IO completion
0x0000000004000000 fileio File IO
The command completed successfully.       Context Switch, DPC and ISR Events
　　Context Switch  events are logged every time thread switches occur on a CPU and can be  used to construct a very accurate history as to which threads have been  running and for how long. They occur very frequently and produce a large  amount of data. In each switch, two threads are involved. The old  thread will give up its share of execution time and hand the execution  to the new thread. Thus, Context Switch events contains old and new  thread IDs, old and new thread priorities, wait reason and wait time.  Context switches can happen for various reasons, including blocking on  kernel synchronization objects (events, timers, semaphores and so on.),  preemption by a higher priority thread, quantum expiration, and changes  in thread affinity. A certain amount of context switches are always  expected. However, excessive context switching can be an indication of  inefficient use of synchronization primitives and can lead to poor  scaling in performance. Enabling call-stacks on Context Switch events  allows in-depth analysis on reasons for threads getting switched out.
　　Deferred  Procedure Call (DPC) events are logged when DPCs are executed. DPC is a  kernel-mode function executed at elevated interrupt-level execution  mode, and it preempts regular thread execution. The DPC event payload  includes DPC entry time and routine address. Interrupt Service Routine  (ISR) is a similar mechanism, and it runs at a higher execution level  than DPC. ISR events have ISR entry time, routine address, ISR vector  and ISR return value. DPC and ISR mechanisms are important elements in a  Windows driver, as they are typically used for handling hardware  interrupts. Drivers and kernel-mode components have a right to use DPC  and ISR, but it is strongly recommended that they spend as little time  as possible in these elevated modes. DPC and ISR events are used to  monitor and verify the behavior of various drivers and kernel-mode  components. By comparing the routine addresses against the range  information in Image events, one can locate the kernel component  responsible for those DPC and ISR events.
　　In state machine  construction, combining Context Switch, DPC and ISR events enables a  very accurate accounting of CPU utilization. By setting aside storage  for each processor that records its current active thread based on  Context Switch, DPC and ISR events, one can monitor -- given any  timestamp -- what each CPU was doing at that time and whose code it was  executing. In the state machine simulation method, when a context switch  takes place, a CPU object is updated with the new thread ID, and so is  the object for the old thread with CPU usage up to the switch. Likewise,  DPC and ISR events are attributed to the corresponding kernel-mode  components, if needed.
　　In certain cases, such as with IO, Memory  or System Call events, ETW does not record the process or thread IDs in  the event header, primarily to reduce the overhead of very frequent  events. For such events, the values of thread and process IDs in the  header show up as 0xFFFFFFFF (= 4294967295). If Context Switch, DPC and  ISR events are tracked as described above, those events can be traced to  the thread or kernel-mode component by examining the CPU object for the  currently running thread or DPC/ISR.

Memory Events
　　Memory  events denote memory manager (MM) operations. Windows offers Page Fault  events, Hard Page Fault events and Virtual Memory events. Memory events  tend to be very frequent, especially on a busy system.
　　A page  fault occurs when a sought-out page table entry is invalid. If the  requested page needs to be brought in from disk, it is called a hard  page fault (a very expensive operation), and all other types are  considered soft page faults (a less expensive operation). A Page Fault  event payload contains the virtual memory address for which a page fault  happened and the instruction pointer that caused it. A hard page fault  requires disk access to occur, which could be the first access to  contents in a file or accesses to memory blocks that were paged out.  Enabling Page Fault events causes a hard page fault to be logged as a  page fault with a type Hard Page Fault. However, a hard fault typically  has a considerably larger impact on performance, so a separate event is  available just for a hard fault that can be enabled independently. A  Hard Fault event payload has more data, such as file key, offset and  thread ID, compared with a Page Fault event.
　　Virtual Memory events  consist of Virtual Memory Allocation and Virtual Memory Free types and  correspond to MM calls to allocate and free virtual memory ranges. Their  payload contains the process ID, base address, requested size and flags  used in the API call. Virtual Memory events were newly added for  Windows 7 and are useful for tracking down leaked calls to the  VirtualAlloc function and excessive virtual memory usage by  applications.
　　Memory event headers do not contain the IDs of the  threads and processes that caused the particular activities. The Page  Fault event payload, however, has the thread ID of the thread causing  the fault. This allows the correlation of Page Fault events to threads  and processes through the state machine. The Virtual Memory event  payload contains the ID of the process for which virtual memory  operations were performed. To track it to the thread making the API  call, you need the context switch accounting, described in the previous  section.
Next Time
　　Windows 7 features hundreds of event  providers from various components. In the first part of this two-part  article series, we have presented some of the core OS ETW events  available on Windows 7 and the analysis techniques that we have used for  many years. Individual events indicate certain activities in the core  OS, but if combined through context-sensitive analysis methods, they can  be used to produce meaningful reports that provide insights into  patterns and anomalies in resource usage. In Part 2, we plan to cover  other core OS ETW events as well as present simple scripts to  demonstrate a few basic accounting techniques on some of the OS events  introduced in these two parts. We hope that many people will take  advantage of the content presented here and that it will lead to the  promotion of sound engineering practice, greater software quality and  better user experiences.
　　Dr. Insung Parkis  a development manager for the Windows Instrumentation and Diagnosis  Platform Team. He has published a dozen papers on performance analysis,  request tracking, instrumentation technology, and programming  methodology and support. His e-mail address is insungp@microsoft.com.
Alex Bendetov  is a development lead for the Windows Instrumentation and Diagnosis  Platform Team. He works on both Event Tracing for Windows and the  Performance Counter technologies. He can be reached at alexbe@microsoft.com.

账号		自动登录	找回密码
密码			立即注册

大疆运维招人啦，

C++ :try 语句块和异常处理

C++的多态

Red Hat RHCE 8 (EX294) Cert Guide

Java/C++ 区别：看完这一篇，就够用！

别再用过时库了！这 13 个顶级 C++ 库才是

c++ size_t 和 int 的区别

Core OS Events in Windows 7, Part 1

浏览过的版块

扫码加入运维网微信交流群