xperf WinDBG C# .NET 4.5.2 Application - Understanding Process Dump

Question



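Under heavy load, our application drives a powerful server to 100% CPU usage. Reading the process dump and looking at the threads, some of them have been running for 10 minutes. None of them gives me any insight when I use !CLRStack.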

!runaway gives me:

0:030> !runaway
 User Mode Time
  Thread       Time
  53:2e804      0 days 0:10:04.703
  30:31894      0 days 0:07:51.593
  33:47100      0 days 0:07:24.890
  42:11e54      0 days 0:06:45.875
  35:35e18      0 days 0:06:07.578
  41:54464      0 days 0:05:49.796
  47:57700      0 days 0:05:45.000
  44:3c2d4      0 days 0:05:44.265
  32:3898c      0 days 0:05:43.593
  50:54894      0 days 0:05:41.968
  51:5bc58      0 days 0:05:40.921
  43:14af4      0 days 0:05:40.734
  48:35074      0 days 0:05:40.406
  ...
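Note that `!runaway` reports the user-mode CPU time each thread has accumulated since it was created, not how long it has been running without interruption. A minimal Python sketch for turning that output into seconds and ranking the threads, assuming the output has been saved as text; the regular expression is an assumption fitted to the lines above, and `parse_runaway` is a hypothetical helper:

```python
import re
from datetime import timedelta

# Matches lines like "  53:2e804      0 days 0:10:04.703" (format assumed from the output above)
RUNAWAY_LINE = re.compile(
    r"^\s*(\d+):([0-9a-fA-F]+)\s+(\d+) days (\d+):(\d{2}):(\d{2})\.(\d{3})\s*$"
)

def parse_runaway(text):
    """Return (debugger_thread_index, os_thread_id_hex, cpu_time) tuples,
    sorted by accumulated user-mode CPU time, highest first."""
    rows = []
    for line in text.splitlines():
        m = RUNAWAY_LINE.match(line)
        if not m:
            continue  # skip the header lines and the "..." elision
        idx, tid, days, h, mnt, s, ms = m.groups()
        cpu = timedelta(days=int(days), hours=int(h), minutes=int(mnt),
                        seconds=int(s), milliseconds=int(ms))
        rows.append((int(idx), tid, cpu))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows

sample = """\
 User Mode Time
  Thread       Time
  53:2e804      0 days 0:10:04.703
  30:31894      0 days 0:07:51.593
  33:47100      0 days 0:07:24.890
"""

for idx, tid, cpu in parse_runaway(sample):
    print(f"thread {idx} (OS id 0x{tid}): {cpu.total_seconds():.3f} s of user-mode CPU")
```

Converting to seconds makes it easier to compare these totals against the length of the load test and against the CPU percentages a profiler reports.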

Calling !DumpStack on one of those threads, I got:

0000001ab442f900 00007ff9ef4c1148 KERNELBASE!WaitForSingleObjectEx+0x94, calling ntdll!NtWaitForSingleObject
0000001ab442f980 00007ff9e920beb2 clr!SVR::gc_heap::compute_new_dynamic_data+0x17b, calling clr!SVR::gc_heap::desired_new_allocation
0000001ab442f9a0 00007ff9e90591eb clr!CLREventWaitHelper2+0x38, calling kernel32!WaitForSingleObjectEx
0000001ab442f9b0 00007ff9e90e0d2c clr!WriteBarrierManager::UpdateEphemeralBounds+0x1c, calling clr!WriteBarrierManager::NeedDifferentWriteBarrier
0000001ab442f9e0 00007ff9e9059197 clr!CLREventWaitHelper+0x1f, calling clr!CLREventWaitHelper2
0000001ab442fa40 00007ff9e9059120 clr!CLREventBase::WaitEx+0x70, calling clr!CLREventWaitHelper
0000001ab442fa70 00007ff9ef4c149c KERNELBASE!SetEvent+0xc, calling ntdll!NtSetEvent
0000001ab442faa0 00007ff9e90ef1e1 clr!SVR::gc_heap::set_gc_done+0x22, calling clr!CLREventBase::Set
0000001ab442fad0 00007ff9e90e9331 clr!SVR::gc_heap::gc_thread_function+0x8a, calling clr!CLREventBase::WaitEx
0000001ab442fb00 00007ff9e92048e7 clr!SVR::gc_heap::gc_thread_stub+0x7a, calling clr!SVR::gc_heap::gc_thread_function
0000001ab442fb60 00007ff9e91a0318 clr!Thread::CLRSetThreadStackGuarantee+0x48, calling kernel32!SetThreadStackGuaranteeStub
0000001ab442fb90 00007ff9e91a01ef clr!Thread::CommitThreadStack+0x10, calling clr!Thread::CLRSetThreadStackGuarantee
0000001ab442fbd0 00007ff9e910df0b clr!ClrFlsSetValue+0x57, calling kernel32!SetLastErrorStub
0000001ab442fc00 00007ff9e92048dc clr!SVR::gc_heap::gc_thread_stub+0x6f, calling clr!_chkstk
0000001ab442fc40 00007ff9f0d316ad kernel32!BaseThreadInitThunk+0xd
0000001ab442fc70 00007ff9f1e54409 ntdll!RtlUserThreadStart+0x1d

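What does this tell me? I see a lot of calls into the CLR, but I can't figure out where the problem is. After Thomas suggested running .reload, I can now see the GC calls.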
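The recognizable frames in this stack are `clr!SVR::gc_heap::gc_thread_stub` and `gc_thread_function`. The `SVR::` prefix belongs to the server garbage collector (the workstation GC uses `WKS::`), which by default runs one dedicated GC thread per logical processor; at the moment of the dump this one is parked in `WaitForSingleObjectEx`. A minimal Python sketch for tagging such threads in saved `!DumpStack` output, assuming the column layout shown above; `frame_symbol` and `is_server_gc_thread` are hypothetical helpers:

```python
def frame_symbol(line):
    """Return (module, function) for one !DumpStack line, or None if it has no symbol."""
    parts = line.split()
    if len(parts) < 3 or "!" not in parts[2]:
        return None
    symbol = parts[2].rstrip(",")          # e.g. "clr!SVR::gc_heap::set_gc_done+0x22"
    module, _, func = symbol.partition("!")
    return module, func.split("+0x")[0]    # drop the "+0x..." offset

def is_server_gc_thread(stack_text):
    """True if any frame is clr!SVR::gc_heap::gc_thread_* (a server GC worker thread)."""
    for line in stack_text.splitlines():
        sym = frame_symbol(line)
        if sym and sym[0] == "clr" and sym[1].startswith("SVR::gc_heap::gc_thread_"):
            return True
    return False

stack = """\
0000001ab442f900 00007ff9ef4c1148 KERNELBASE!WaitForSingleObjectEx+0x94, calling ntdll!NtWaitForSingleObject
0000001ab442fad0 00007ff9e90e9331 clr!SVR::gc_heap::gc_thread_function+0x8a, calling clr!CLREventBase::WaitEx
0000001ab442fb00 00007ff9e92048e7 clr!SVR::gc_heap::gc_thread_stub+0x7a, calling clr!SVR::gc_heap::gc_thread_function
"""
print(frame_symbol(stack.splitlines()[0]))  # ('KERNELBASE', 'WaitForSingleObjectEx')
print(is_server_gc_thread(stack))           # True
```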

Update 1

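After running xperf, each w3wp.exe is using around 45% of the CPU. Filtering by one of them and grouping by function, there is a function labeled "?" that is responsible for 13.62%; all the others are at 2.67% or less. How can I find out what this "?" is?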

Update 2

Running xperf again, the function JIT_MonEnterWorker_InlineGetThread_GetThread_PatchLabel accounts for 12.31% of the CPU usage. The "?" function is still there.

Grouping by stack:

Line #, Stack, Count, Weight (in view), TimeStamp, % Weight
2,   |- ?!?, 501191, 501222.365294, , 35.51
3,   |    |- clr.dll!JITutil_MonContention, 215749, 215752.552227, , 15.28
4,   |    |- clr.dll!JIT_MonEnterWorker_InlineGetThread_GetThread_PatchLabel, 170804, 170777.100191, , 12.10

As you can see, these two functions together account for more than 27% of the CPU usage (in each process), so they are significant.
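The two `clr.dll` frames appear to be the runtime's `Monitor.Enter` path (which the C# `lock` statement uses), with `JITutil_MonContention` suggesting the contended case, so their combined weight is a rough indicator of lock contention. Because both are direct children of `?!?`, their shares can be added, but adding the parent row as well would double-count. A minimal Python sketch over the exported rows, assuming the comma-separated layout shown above; `weights_by_frame` is a hypothetical helper:

```python
import csv
import io

WPA_EXPORT = """\
Line #, Stack, Count, Weight (in view), TimeStamp, % Weight
2,   |- ?!?, 501191, 501222.365294, , 35.51
3,   |    |- clr.dll!JITutil_MonContention, 215749, 215752.552227, , 15.28
4,   |    |- clr.dll!JIT_MonEnterWorker_InlineGetThread_GetThread_PatchLabel, 170804, 170777.100191, , 12.10
"""

def weights_by_frame(text):
    """Map each stack frame name to its '% Weight' column."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header row
    result = {}
    for row in reader:
        frame = row[1].lstrip("|- ").strip()  # drop the tree-drawing characters
        result[frame] = float(row[5])
    return result

w = weights_by_frame(WPA_EXPORT)
# Sum only the two sibling rows; the parent "?!?" already includes them.
monitor = (w["clr.dll!JITutil_MonContention"]
           + w["clr.dll!JIT_MonEnterWorker_InlineGetThread_GetThread_PatchLabel"])
print(f"Monitor enter/contention frames: {monitor:.2f}% of sampled CPU")  # 27.38%
```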

Update 3

After using wpr.exe (suggested by @magicandre1981):

wpr.exe -start cpu
wpr.exe -stop result.etl

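I found that FormsAuthentication and some unnecessary Ninject calls on hot paths were contributing about 16% of the CPU usage. I still don't understand the threads that have been running for more than 10 minutes.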

Update 4

I tried DebugDiag (suggested by @leppie), and it just confirmed that the hanging threads all look like this:

Thread ID: 53     Total CPU Time: 00:09:11.406     Entry Point for Thread: clr!Thread::intermediateThreadProc 
Thread ID: 35     Total CPU Time: 00:07:26.046     Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub 
Thread ID: 50     Total CPU Time: 00:07:01.515     Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub 
Thread ID: 29     Total CPU Time: 00:06:02.264     Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub 
Thread ID: 31     Total CPU Time: 00:06:41.281     Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub
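Grouping this summary by entry point makes the pattern explicit: four of the five threads listed start in `clr!SVR::gc_heap::gc_thread_stub`, i.e. they are server GC threads, which suggests that much of their accumulated CPU time went into garbage collection rather than application code. A minimal Python sketch, assuming the summary lines keep the layout shown above; `cpu_by_entry_point` is a hypothetical helper:

```python
import re
from collections import defaultdict
from datetime import timedelta

# Layout assumed from the DebugDiag lines above
LINE = re.compile(
    r"Thread ID:\s*(\d+)\s+Total CPU Time:\s*(\d+):(\d{2}):(\d{2})\.(\d+)\s+"
    r"Entry Point for Thread:\s*(\S+)"
)

def cpu_by_entry_point(text):
    """Return {entry_point: [thread_count, total_cpu_timedelta]}."""
    totals = defaultdict(lambda: [0, timedelta()])
    for m in LINE.finditer(text):
        _, h, mnt, s, frac, entry = m.groups()
        cpu = timedelta(hours=int(h), minutes=int(mnt), seconds=int(s),
                        microseconds=int(frac.ljust(6, "0")[:6]))
        totals[entry][0] += 1
        totals[entry][1] += cpu
    return dict(totals)

summary = """\
Thread ID: 53     Total CPU Time: 00:09:11.406     Entry Point for Thread: clr!Thread::intermediateThreadProc
Thread ID: 35     Total CPU Time: 00:07:26.046     Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub
Thread ID: 50     Total CPU Time: 00:07:01.515     Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub
Thread ID: 29     Total CPU Time: 00:06:02.264     Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub
Thread ID: 31     Total CPU Time: 00:06:41.281     Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub
"""

for entry, (count, cpu) in cpu_by_entry_point(summary).items():
    print(f"{entry}: {count} thread(s), {cpu.total_seconds():.3f} s CPU")
```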

Or they are caused by StackExchange.Redis:

DomainBoundILStubClass.IL_STUB_PInvoke(Int32, IntPtr[], IntPtr[], IntPtr[], TimeValue ByRef)+e1 
[[InlinedCallFrame] (StackExchange.Redis.SocketManager.select)] StackExchange.Redis.SocketManager.select(Int32, IntPtr[], IntPtr[], IntPtr[], TimeValueByRef) 
StackExchange.Redis.SocketManager.ReadImpl()+889 
StackExchange.Redis.SocketManager.Read()+66

Or:

[[GCFrame]] 
[[HelperMethodFrame_1OBJ] (System.Threading.Monitor.ObjWait)] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object) 
mscorlib_ni!System.Threading.Monitor.Wait(System.Object, Int32)+19 
StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[[System.__Canon, mscorlib]](StackExchange.Redis.Message, StackExchange.Redis.ResultProcessor`1, StackExchange.Redis.ServerEndPoint)+24f 
StackExchange.Redis.RedisBase.ExecuteSync[[System.__Canon, mscorlib]](StackExchange.Redis.Message, StackExchange.Redis.ResultProcessor`1, StackExchange.Redis.ServerEndPoint)+77 
[[StubHelperFrame]] 
StackExchange.Redis.RedisDatabase.SetMembers(StackExchange.Redis.RedisKey, StackExchange.Redis.CommandFlags)+ee

- Walter Macambira


A thread in WaitForSingleObjectEx() is waiting, not executing, so it does not consume CPU. Deadlocked threads don't consume CPU either, because they are all waiting for resources. - Thomas Weller
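The same distinction applies when reading `!runaway`: a blocked thread adds almost nothing to its CPU counter, so the minutes reported for the GC threads were presumably spent executing during earlier collections, not waiting in the frame captured by the dump. A small Python illustration of the principle (not .NET, and not a measurement of this application), using `time.thread_time()`, which returns the CPU time consumed by the calling thread only:

```python
import threading
import time

def cpu_while_waiting(event, out):
    start = time.thread_time()   # CPU time of *this* thread only
    event.wait()                 # blocked: the OS does not schedule this thread
    out["cpu"] = time.thread_time() - start

def cpu_while_spinning(duration, out):
    start = time.thread_time()
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        pass                     # busy loop: burns CPU for the whole duration
    out["cpu"] = time.thread_time() - start

waiting, spinning = {}, {}
ev = threading.Event()
t_wait = threading.Thread(target=cpu_while_waiting, args=(ev, waiting))
t_spin = threading.Thread(target=cpu_while_spinning, args=(0.5, spinning))
t_wait.start()
t_spin.start()
t_spin.join()                    # the waiting thread stays blocked this whole time
ev.set()
t_wait.join()
print(f"waiting: {waiting['cpu']:.3f} s CPU, spinning: {spinning['cpu']:.3f} s CPU")
```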


Use ETW/xperf/WPA to trace CPU usage: https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-42-WPT-CPU-Analysis - magicandre1981


Analyze your GC data: https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-33-CLR-GC-Part-1, https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-34-CLR-GC-Part-2, https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-35-CLR-GC-Part-3, https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-36-CLR-GC-Part-4 - magicandre1981


In this case you should be profiling rather than analyzing memory dumps. Some good .NET profilers are recommended here: https://dev59.com/FnVD5IYBdhLWcg3wXaYd - Jonathan Dickinson


My suggestion: use DebugDiag to check for locks. - leppie


2 Answers


The application is slow, probably either because the code itself runs slowly or because of a problem in the .NET runtime.

First, check whether clr.dll is faulty. If it is, download a replacement and swap it in.

Otherwise, try the following:

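I think you should review the application code, look at every place that does heavy processing, and try to balance the workload between CPU and RAM. Loops, object initialization, recursive functions and the like all put load on the CPU; try to keep shared objects in static fields or constants.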

- user3538022

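I couldn't understand the second paragraph. Reading the entire code of an application this large isn't easy, but sometimes it may be the only option (at least for the frequently used parts). As for static shared objects, if the data becomes stale it has to be updated on every instance, which can cause problems (solvable, but that's not the issue here). - Walter Macambira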

I'm trying to help :) Maybe this situation is just hard for me to understand. - user3538022
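The advice to keep shared objects in a static field and the objection about stale data can be reconciled by building the value once, sharing it, and rebuilding it only after an explicit invalidation. A minimal Python sketch of that idea; `SharedValue` is a hypothetical helper, not a .NET or library API. Readers take no lock once the value is built, which is relevant given the Monitor contention seen in Update 2; only the first build and invalidation are serialized:

```python
import threading

class SharedValue:
    """Compute a value once, share it across callers, rebuild after invalidate()."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._value = None
        self._valid = False
        self.builds = 0  # how many times the factory actually ran

    def get(self):
        if self._valid:                  # fast path: no lock once built
            return self._value
        with self._lock:
            if not self._valid:          # re-check: another thread may have built it
                self._value = self._factory()
                self.builds += 1
                self._valid = True
            return self._value

    def invalidate(self):
        with self._lock:
            self._valid = False          # next get() rebuilds the value

settings = SharedValue(lambda: {"loaded": True})
print(settings.get() is settings.get())  # True: the same object is shared
```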


- efaruk · Accepted Answer

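Doing this by hand takes courage ;) Take a look at Microsoft's official DebugDiag 2.2: https://www.microsoft.com/en-us/download/details.aspx?id=49924 It comes with analyzers, so you don't have to dig through the dump yourself. With DebugDiag, I think you'll find the problem faster than ever...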