Closed Bug 1346415 Opened 8 years ago Closed 8 years ago

We need more native stacks from BHR

Categories

(Core :: XPCOM, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Performance Impact high
Tracking Status
firefox55 --- fixed

People

(Reporter: ehsan.akhgari, Assigned: nika)

References

Details

Attachments

(2 files, 4 obsolete files)

Right now BHR only attempts to collect native stacks for hangs over 8 seconds. It's sad that there is data that we can collect, but 8 seconds is way too long to be super useful for Quantum Flow. I'd like to try to get this number down as much as we possibly can to capture as much native stacks as we can. Things to consider: * Data storage: This data presumably costs us something in terms of storage space, so we should be careful when reducing the 8 second time. If that means we want to have different thresholds per channel where we have different numbers of users, that's OK. * Stack walking efficiency: now that we have frame pointers on, I'm not too much worried about that, but we should probably measure on the three platforms. I care about Windows the most for now. * Crashiness: in theory with frame pointers stack walking shouldn't crash, but... we need to be careful post-landing The code: http://searchfox.org/mozilla-central/rev/b035220d0f939559f4f8cf7b9079bc12f6adfc35/xpcom/build/XPCOMInit.cpp#507 Michael, any chance you can please take this on? This is P0 for Quantum Flow. :-) Thanks!
Flags: needinfo?(michael)
Currently waiting for some email replies before we can figure out what level produces an acceptable amount of data for native stacks.
Assignee: nobody → michael
Flags: needinfo?(michael)
Whiteboard: [qf:p1]
This is my first pass at collecting all native stacks for hangs over 128ms on nightly builds. I set a limit of at most 300 native stacks collected for a given process to try to keep the size down. MozReview-Commit-ID: 4LVBFHkeTdD
Attachment #8849731 - Flags: review?(gfritzsche)
(In reply to Michael Layzell [:mystor] from comment #2) > Created attachment 8849731 [details] [diff] [review] > Collect native stacks for all hangs over 128ms on Nightly > > This is my first pass at collecting all native stacks for hangs over 128ms > on nightly builds. I set a limit of at most 300 native stacks collected for > a given process to try to keep the size down. Do you have an estimate of what that will add to the raw payload size in the worst case? Will measuring the performance impact happen as part of landing or QA?
Flags: needinfo?(michael)
(In reply to Georg Fritzsche [:gfritzsche] [away March 24 - April 2] from comment #3) > Do you have an estimate of what that will add to the raw payload size in the > worst case? IIRC when I looked at the sizes of these payloads for the day I mention in the patch, the largest size which the payload could have was ~900 bytes, which would be about 270kB. When mconley estimated a conservative maximum size for a stack, he estimated 1875 bytes - which would be ~560kB. > Will measuring the performance impact happen as part of landing or QA? I'm hoping that this has a low enough perf impact. If we notice a perf impact on nightly (which should be the only channel which is changed), it should be easy enough to back out I would think.
Flags: needinfo?(michael)
(In reply to Michael Layzell [:mystor] from comment #4) > (In reply to Georg Fritzsche [:gfritzsche] [away March 24 - April 2] from > comment #3) > > Do you have an estimate of what that will add to the raw payload size in the > > worst case? > > IIRC when I looked at the sizes of these payloads for the day I mention in > the patch, the largest size which the payload could have was ~900 bytes, > which would be about 270kB. When mconley estimated a conservative maximum > size for a stack, he estimated 1875 bytes - which would be ~560kB. @mreid, any concerns here? > > > Will measuring the performance impact happen as part of landing or QA? > > I'm hoping that this has a low enough perf impact. If we notice a perf > impact on nightly (which should be the only channel which is changed), it > should be easy enough to back out I would think. How will you notice the performance impact?
Flags: needinfo?(mreid)
Comment on attachment 8849731 [details] [diff] [review] Collect native stacks for all hangs over 128ms on Nightly Review of attachment 8849731 [details] [diff] [review]: ----------------------------------------------------------------- Mike, can you review this given your recent work on this? Otherwise maybe froydnj or mstange? ::: toolkit/components/telemetry/ThreadHangStats.h @@ +27,5 @@ > +// > +// NOTE: 300 native hang stacks would, by a rough estimation based on stacks > +// collected from nightly on March 21, 2017, take up approximately 168kb of > +// space. > +static const uint32_t kMaximumNativeHangStacks = 300; This is only used internally in BackgroundHangMonitor.cpp? Please move it there.
Attachment #8849731 - Flags: review?(gfritzsche) → review?(mconley)
Comment on attachment 8849731 [details] [diff] [review] Collect native stacks for all hangs over 128ms on Nightly Review of attachment 8849731 [details] [diff] [review]: ----------------------------------------------------------------- This seems good to me. I have a note though about Compositor - see below. Maybe good fodder for follow-up work. ::: toolkit/components/telemetry/ThreadHangStats.h @@ +27,5 @@ > +// > +// NOTE: 300 native hang stacks would, by a rough estimation based on stacks > +// collected from nightly on March 21, 2017, take up approximately 168kb of > +// space. > +static const uint32_t kMaximumNativeHangStacks = 300; I agree that BackgroundHangMonitor.cpp is likely where it belongs. ::: xpcom/build/XPCOMInit.cpp @@ +504,5 @@ > } > > + // Set experimental values for main thread hangs: > + // 128ms for transient hangs and 8192ms for permanent hangs > + static const uint32_t kTransientHangTimeout = 128; Might want to put these in BackgroundHangMonitor.cpp, since there might be other threads that we want to follow the same rules (example, Compositor: http://searchfox.org/mozilla-central/rev/9af9b1c7946ebef6056d2ee4c368d496395cf1c8/gfx/layers/ipc/CompositorThread.cpp#94). Might be good fodder for a follow-up though.
Attachment #8849731 - Flags: review?(mconley) → review+
(In reply to Mike Conley (:mconley) (Offsite until March 27) from comment #7) > Might want to put these in BackgroundHangMonitor.cpp, since there might be > other threads that we want to follow the same rules (example, Compositor: > http://searchfox.org/mozilla-central/rev/ > 9af9b1c7946ebef6056d2ee4c368d496395cf1c8/gfx/layers/ipc/CompositorThread. > cpp#94). > > Might be good fodder for a follow-up though. Currently those threads use a completely different mechanism for setting up their BHRs. I am inclined to think that we should handle that case separately. In addition, there is bug 1350412 which ehsan filed, because he hasn't seen any hangs from the compositor thread actually show up in telemetry. (In reply to Georg Fritzsche [:gfritzsche] [away March 24 - April 2] from comment #5) > How will you notice the performance impact? TBH, I'm not sure. I doubt that this will have a significant perf impact (especially because we'll do it _at most_ 300 times in the lifetime of a process), but if it starts showing up in people's profiles and stuff we can of course back it out.
(In reply to Georg Fritzsche [:gfritzsche] [away March 24 - April 2] from comment #5) > (In reply to Michael Layzell [:mystor] from comment #4) > > (In reply to Georg Fritzsche [:gfritzsche] [away March 24 - April 2] from > > comment #3) > > > Do you have an estimate of what that will add to the raw payload size in the > > > worst case? > > > > IIRC when I looked at the sizes of these payloads for the day I mention in > > the patch, the largest size which the payload could have was ~900 bytes, > > which would be about 270kB. When mconley estimated a conservative maximum > > size for a stack, he estimated 1875 bytes - which would be ~560kB. > > @mreid, any concerns here? This should be fine on Nightly/Aurora, but we should revisit this before we increase the payload size this much on Beta and beyond. In terms of client impact, is there a histogram that measures how long it takes to submit a ping? If we do, we should keep an eye on it to see if we're in any danger of increasing the rate of submission timeouts or other upload failures.
Flags: needinfo?(mreid)
We have a few introspective probes (TELEMETRY_* histograms). There are two that measure how long it takes to submit a ping: TELEMETRY_SEND_SUCCESS and TELEMETRY_SEND_FAILURE
Backed out for frequently failing toolkit/components/telemetry/tests/unit/test_ThreadHangStats.js and likely also breaking Windows 8 x64 opt builds: https://hg.mozilla.org/integration/mozilla-inbound/rev/b0bac8c5930689a8edfe702db5f358c14ed1538e Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=460c2bcc15d3259df10dd84d3851b01145df052e&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable Failure log xpcshell: https://treeherder.mozilla.org/logviewer.html#?job_id=87277713&repo=mozilla-inbound [task 2017-03-29T15:26:06.125959Z] 15:26:06 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_ThreadHangStats.js | check_results - [check_results : 80] true == true [task 2017-03-29T15:26:06.131329Z] 15:26:06 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_ThreadHangStats.js | check_results - [check_results : 81] 1 != 0 [task 2017-03-29T15:26:06.134083Z] 15:26:06 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_ThreadHangStats.js | check_results - [check_results : 83] true == true [task 2017-03-29T15:26:06.136635Z] 15:26:06 WARNING - TEST-UNEXPECTED-FAIL | toolkit/components/telemetry/tests/unit/test_ThreadHangStats.js | check_results - [check_results : 84] 0 != 0 [task 2017-03-29T15:26:06.139365Z] 15:26:06 INFO - /home/worker/workspace/build/tests/xpcshell/tests/toolkit/components/telemetry/tests/unit/test_ThreadHangStats.js:check_results:84 [task 2017-03-29T15:26:06.142999Z] 15:26:06 INFO - /home/worker/workspace/build/tests/xpcshell/tests/toolkit/components/telemetry/tests/unit/test_ThreadHangStats.js:run_test/<:103 [task 2017-03-29T15:26:06.145551Z] 15:26:06 INFO - /home/worker/workspace/build/tests/xpcshell/head.js:run:703 [task 2017-03-29T15:26:06.151003Z] 15:26:06 INFO - /home/worker/workspace/build/tests/xpcshell/head.js:_do_main:211 [task 2017-03-29T15:26:06.153275Z] 15:26:06 INFO - /home/worker/workspace/build/tests/xpcshell/head.js:_execute_test:546 [task 2017-03-29T15:26:06.155611Z] 15:26:06 INFO - -e:null:1
Flags: needinfo?(michael)
Pushed by michael@thelayzells.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/aa882c81840b Collect native stacks for all hangs over 128ms on Nightly, r=gfritzsche
There seems to be a cubeb assertion failure on try which I'm assuming is caused by this patch - I'm hoping to look into it soon. https://treeherder.mozilla.org/#/jobs?repo=try&revision=fd83830458559883119cd493974ff36fd608142d MozReview-Commit-ID: CAHarvGSuTY
Attachment #8857506 - Flags: review?(n.nethercote)
Flags: needinfo?(michael)
Comment on attachment 8857506 [details] [diff] [review] Use FramePointerStackWalk for less deadlocking when stackwalking on x86 Review of attachment 8857506 [details] [diff] [review]: ----------------------------------------------------------------- r+ with changes for the tools/profiler/ changes. For the xpcom/ changes: I don't understand the differences between the various stack walking functions, so I will defer to glandium. ::: tools/profiler/core/platform.cpp @@ +3100,5 @@ > + for (uint32_t i = 0; i < threads.size(); ++i) { > + if (threads[i]->ThreadId() == aThreadId) { > + return threads[i]->StackTop(); > + } > + } Please use FindLiveThreadInfo() to find the ThreadInfo. ::: xpcom/threads/ThreadStackHelper.cpp @@ +243,5 @@ > + MOZ_ASSERT(stackTop, "The thread should be registered by the profiler"); > + > + // Double check that the values we pulled for the thread make sense before > + // walking the stack. > + if (stackTop && framePointer >= stackPointer && framePointer < stackTop) { I generally write range conditions like this: |k1 <= x && x < k2| because I find them much easier to read that way. @@ +248,5 @@ > + FramePointerStackWalk(callback, /* skipFrames */ 0, > + /* maxFrames */ Telemetry::HangStack::sMaxNativeFrames, > + reinterpret_cast<void*>(&aStack), framePointer, > + stackTop); > + } https://dxr.mozilla.org/mozilla-central/rev/f40e24f40b4c4556944c762d4764eace261297f5/tools/profiler/core/platform.cpp#806-820 is similar looking code, but it uses FramePointerStackWalk for Mac and Win32 and MozStackWalk for Win64. glandium might have suggestions here.
Attachment #8857506 - Flags: review?(n.nethercote)
Attachment #8857506 - Flags: review?(mh+mozilla)
Attachment #8857506 - Flags: review+
(In reply to Nicholas Nethercote [:njn] from comment #16) > Comment on attachment 8857506 [details] [diff] [review] > Use FramePointerStackWalk for less deadlocking when stackwalking on x86 > > Review of attachment 8857506 [details] [diff] [review]: > ----------------------------------------------------------------- > > r+ with changes for the tools/profiler/ changes. > > For the xpcom/ changes: I don't understand the differences between the > various stack walking functions, so I will defer to glandium. > > ::: tools/profiler/core/platform.cpp > @@ +3100,5 @@ > > + for (uint32_t i = 0; i < threads.size(); ++i) { > > + if (threads[i]->ThreadId() == aThreadId) { > > + return threads[i]->StackTop(); > > + } > > + } > > Please use FindLiveThreadInfo() to find the ThreadInfo. The ThreadID which I am using to get the stack top here is not necessarily the current thread's threadId - so I don't think I can use that function unless I add a new parameter to it. > https://dxr.mozilla.org/mozilla-central/rev/ > f40e24f40b4c4556944c762d4764eace261297f5/tools/profiler/core/platform. > cpp#806-820 is similar looking code, but it uses FramePointerStackWalk for > Mac and Win32 and MozStackWalk for Win64. glandium might have suggestions > here. For this patch I am changing from MozStackWalk to FramePointerStackWalk to avoid a bug which would come up decently frequently (on infra at least) where the program would deadlock as the suspended thread is holding the windows internal lock on the function table while we're trying to walk the stack with MozStackWalk (which also tries to lock that table). FramePointerStackWalk is done entirely in userspace and doesn't use that lock meaning that we can avoid the unfortunate deadlock. Unfortunately we don't have frame pointers reliably on 64-bit so I am disabling stack collection on that platform.
> For this patch I am changing from MozStackWalk to FramePointerStackWalk to > avoid a bug which would come up decently frequently (on infra at least) > where the program would deadlock as the suspended thread is holding the > windows internal lock on the function table while we're trying to walk the > stack with MozStackWalk (which also tries to lock that table). > FramePointerStackWalk is done entirely in userspace and doesn't use that > lock meaning that we can avoid the unfortunate deadlock. > > Unfortunately we don't have frame pointers reliably on 64-bit so I am > disabling stack collection on that platform. All this info would be extremely useful to put into a comment.
Updated to add the comment requested MozReview-Commit-ID: CAHarvGSuTY
Attachment #8859364 - Flags: review?(mh+mozilla)
Attachment #8849731 - Attachment is obsolete: true
Attachment #8857506 - Attachment is obsolete: true
Attachment #8857506 - Flags: review?(mh+mozilla)
As reviews on this have taken a while, I've had some time to think about how best to implement this patch. I think that I've changed my mind about how we should implement this increased native stack collection on nightly to instead collect these native stacks at the same time that we collect pseudostacks when on nightly. This has the advantage of not moving our permahang timer down, meaning that we can get more accurate hang measurements for long hangs. This also has the advantage that it ensures that there is a meaningful relationship between the pseudostack and native stack which we measure. Unfortunately, with this patch we don't currently read in the pseudostack and native stack within the same pausing of the target thread, meaning that the target thread could make progress between these two stack readings. I hope to change this in a future patch, but I wanted to post this minimal version in case the review on the FramePointerStackWalk came in.
Attachment #8859365 - Flags: review?(mconley)
Comment on attachment 8859364 [details] [diff] [review] Use FramePointerStackWalk for less deadlocking when stackwalking on x86 Review of attachment 8859364 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/core/platform.cpp @@ +3063,5 @@ > + > + const PS::ThreadVector& threads = gPS->LiveThreads(lock); > + for (uint32_t i = 0; i < threads.size(); ++i) { > + if (threads[i]->ThreadId() == aThreadId) { > + return threads[i]->StackTop(); Does it matter that this is doing a linear search in a possibly ever growing list of threads? ::: xpcom/threads/ThreadStackHelper.cpp @@ +204,5 @@ > > + // NOTE: We can only perform frame pointer stack walking on non win64 > + // platforms, because Win64 always omits frame pointers. We don't want to use > + // MozStackWalk here, so we just skip collecting stacks entirely. > +#ifdef MOZ_THREADSTACKHELPER_X86 #ifndef MOZ_THREADSTACKHELPER_X64 ? @@ +254,5 @@ > + // > + // FramePointerStackWalk is implemented entirely in userspace and thus > + // doesn't have the same issues with deadlocking. Unfortunately as 64-bit > + // windows is not guaranteed to have frame pointers, the stack walking > + // code is only enabled on 32-bit windows builds. Condidering we're pushing for more people to use the 64-bits builds, it seems like this should be fixed somehow (which unfortunately sounds like having to implement a stack walker entirely (or find a third-party one)). File a followup and add the bug number in the comment?
Attachment #8859364 - Flags: review?(mh+mozilla) → review+
Status: NEW → ASSIGNED
Comment on attachment 8859365 [details] [diff] [review] Collect native stacks at the same time as pseudostacks on nightly Review of attachment 8859365 [details] [diff] [review]: ----------------------------------------------------------------- Looks good! Thanks!
Attachment #8859365 - Flags: review?(mconley) → review+
I changed the logic for BHR to walk the ThreadInfo array once during thread startup instead of every time a hang is detected, I think this should still work (doing a test run on try). I'd like you to take a second look over it before I land it. MozReview-Commit-ID: CAHarvGSuTY
Attachment #8859669 - Flags: review?(mh+mozilla)
Attachment #8859364 - Attachment is obsolete: true
I forgot to add the header to the commit - oops. MozReview-Commit-ID: CAHarvGSuTY
Attachment #8859696 - Flags: review?(mh+mozilla)
Attachment #8859669 - Attachment is obsolete: true
Attachment #8859669 - Flags: review?(mh+mozilla)
Comment on attachment 8859696 [details] [diff] [review] Use FramePointerStackWalk for less deadlocking when stackwalking on x86 Hey, glandium hasn't looked at this in a while, and I'd like to get this landed soon. Most of the changes are in xpcom so I figure you could perhaps look at it? Thanks.
Attachment #8859696 - Flags: review?(mh+mozilla) → review?(nfroyd)
Comment on attachment 8859696 [details] [diff] [review] Use FramePointerStackWalk for less deadlocking when stackwalking on x86 Review of attachment 8859696 [details] [diff] [review]: ----------------------------------------------------------------- Not willing to r+ this yet. (glandium is on PTO, btw.) ::: xpcom/threads/ThreadStackHelper.cpp @@ +205,5 @@ > > + // NOTE: We can only perform frame pointer stack walking on non win64 > + // platforms, because Win64 always omits frame pointers. We don't want to use > + // MozStackWalk here, so we just skip collecting stacks entirely. > +#ifdef MOZ_THREADSTACKHELPER_X86 glandium asked whether this should be #ifndef MOZ_THREADSTACKHELPER_X64 in comment 21, and I didn't see a response... It seems peculiar that the comment is focused on win64, and then the #ifdef is about x86. @@ +246,5 @@ > + // walking the stack. > + if (mStackTop && framePointer >= stackPointer && framePointer < mStackTop) { > + // NOTE: In bug 1346415 this was changed to use FramePointerStackWalk. > + // This was done because lowering the background hang timer threshold > + // would cause it to fire on infra early during the boot process, causing Nit: "startup" instead of "boot"? We have things that try to workaround the win64 issues: http://dxr.mozilla.org/mozilla-central/source/mozglue/misc/StackWalk_windows.h#7 http://dxr.mozilla.org/mozilla-central/source/mozglue/misc/StackWalk.cpp#480 http://dxr.mozilla.org/mozilla-central/source/mozglue/misc/StackWalk.cpp#346 Can we not use these functions somehow? It would depend on the exact way in which things fail on infra.
ni? for comment 27.
Flags: needinfo?(michael)
(In reply to Nathan Froyd [:froydnj] from comment #27) > Nit: "startup" instead of "boot"? > > We have things that try to workaround the win64 issues: > > http://dxr.mozilla.org/mozilla-central/source/mozglue/misc/StackWalk_windows. > h#7 > http://dxr.mozilla.org/mozilla-central/source/mozglue/misc/StackWalk.cpp#480 > http://dxr.mozilla.org/mozilla-central/source/mozglue/misc/StackWalk.cpp#346 > > Can we not use these functions somehow? It would depend on the exact way in > which things fail on infra. I'll look into using those functions. I was not aware of their existence and they would probably fix the problem I was running into. I think I would like to do that in a follow-up however because collecting this data ASAP seems like a good thing to do. Would you be OK with R+-ing this if I change the ifdef to be `#ifndef MOZ_THREADSTACKHELPER_X64` and file a follow up which I will work on immediately to get it working on x64 with those functions?
Flags: needinfo?(michael) → needinfo?(nfroyd)
(In reply to Michael Layzell [:mystor] from comment #29) > I'll look into using those functions. I was not aware of their existence and > they would probably fix the problem I was running into. I think I would like > to do that in a follow-up however because collecting this data ASAP seems > like a good thing to do. Would you be OK with R+-ing this if I change the > ifdef to be `#ifndef MOZ_THREADSTACKHELPER_X64` and file a follow up which I > will work on immediately to get it working on x64 with those functions? "Do you solemnly swear to devote the entirety of your workday, neither checking Hacker News nor wading into flamewars on reddit, to working on, with all speed and haste, the support of this functionality on Win64?" Data is better than no data, I suppose. Let's land this.
Flags: needinfo?(nfroyd)
Comment on attachment 8859696 [details] [diff] [review] Use FramePointerStackWalk for less deadlocking when stackwalking on x86 Review of attachment 8859696 [details] [diff] [review]: ----------------------------------------------------------------- r+ with changes previously discussed.
Attachment #8859696 - Flags: review?(nfroyd) → review+
Pushed by michael@thelayzells.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/0d6724dbb127 Use FramePointerStackWalk for less deadlocking when stackwalking on x86, r=njn https://hg.mozilla.org/integration/mozilla-inbound/rev/fb51b9217c6c Collect native stacks at the same time as pseudostacks on nightly, r=mconley
Pushed by michael@thelayzells.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/370e948c7b07 Initialize mNativeStackCnt in ThreadHangStats constructor, a=valgrind-bustage
Pushed by michael@thelayzells.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/adabba6c5f74 Disable the native stack gathering test outside of windows x86, a=xpc-bustage
Depends on: 1364243
Performance Impact: --- → P1
Whiteboard: [qf:p1]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: