We need more native stacks from BHR

ASSIGNED
Assigned to

Status

()

Core
XPCOM
ASSIGNED
2 months ago
3 days ago

People

(Reporter: Ehsan, Assigned: mystor)

Tracking

(Blocks: 5 bugs)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [qf:p1])

Attachments

(2 attachments, 4 obsolete attachments)

Right now BHR only attempts to collect native stacks for hangs over 8 seconds.  It's sad that there is data that we can collect, but 8 seconds is way too long to be super useful for Quantum Flow.  I'd like to try to get this number down as much as we possibly can to capture as much native stacks as we can.

Things to consider:

 * Data storage: This data presumably costs us something in terms of storage space, so we should be careful when reducing the 8 second time.  If that means we want to have different thresholds per channel where we have different numbers of users, that's OK.

 * Stack walking efficiency: now that we have frame pointers on, I'm not too much worried about that, but we should probably measure on the three platforms.  I care about Windows the most for now.

 * Crashiness: in theory with frame pointers stack walking shouldn't crash, but... we need to be careful post-landing

The code: http://searchfox.org/mozilla-central/rev/b035220d0f939559f4f8cf7b9079bc12f6adfc35/xpcom/build/XPCOMInit.cpp#507

Michael, any chance you can please take this on?  This is P0 for Quantum Flow.  :-)  Thanks!
Flags: needinfo?(michael)
(Assignee)

Comment 1

a month ago
Currently waiting for some email replies before we can figure out what level produces an acceptable amount of data for native stacks.
Assignee: nobody → michael
Flags: needinfo?(michael)
Whiteboard: [qf:p1]
(Assignee)

Comment 2

a month ago
Created attachment 8849731 [details] [diff] [review]
Collect native stacks for all hangs over 128ms on Nightly

This is my first pass at collecting all native stacks for hangs over 128ms on nightly builds. I set a limit of at most 300 native stacks collected for a given process to try to keep the size down.

MozReview-Commit-ID: 4LVBFHkeTdD
Attachment #8849731 - Flags: review?(gfritzsche)
(In reply to Michael Layzell [:mystor] from comment #2)
> Created attachment 8849731 [details] [diff] [review]
> Collect native stacks for all hangs over 128ms on Nightly
> 
> This is my first pass at collecting all native stacks for hangs over 128ms
> on nightly builds. I set a limit of at most 300 native stacks collected for
> a given process to try to keep the size down.

Do you have an estimate of what that will add to the raw payload size in the worst case?

Will measuring the performance impact happen as part of landing or QA?
Flags: needinfo?(michael)
(Assignee)

Comment 4

a month ago
(In reply to Georg Fritzsche [:gfritzsche] [away March 24 - April 2] from comment #3)
> Do you have an estimate of what that will add to the raw payload size in the
> worst case?

IIRC when I looked at the sizes of these payloads for the day I mention in the patch, the largest size which the payload could have was ~900 bytes, which would be about 270kB. When mconley estimated a conservative maximum size for a stack, he estimated 1875 bytes - which would be ~560kB.

> Will measuring the performance impact happen as part of landing or QA?

I'm hoping that this has a low enough perf impact. If we notice a perf impact on nightly (which should be the only channel which is changed), it should be easy enough to back out I would think.
Flags: needinfo?(michael)
(In reply to Michael Layzell [:mystor] from comment #4)
> (In reply to Georg Fritzsche [:gfritzsche] [away March 24 - April 2] from
> comment #3)
> > Do you have an estimate of what that will add to the raw payload size in the
> > worst case?
> 
> IIRC when I looked at the sizes of these payloads for the day I mention in
> the patch, the largest size which the payload could have was ~900 bytes,
> which would be about 270kB. When mconley estimated a conservative maximum
> size for a stack, he estimated 1875 bytes - which would be ~560kB.

@mreid, any concerns here?

>
> > Will measuring the performance impact happen as part of landing or QA?
> 
> I'm hoping that this has a low enough perf impact. If we notice a perf
> impact on nightly (which should be the only channel which is changed), it
> should be easy enough to back out I would think.

How will you notice the performance impact?
Flags: needinfo?(mreid)
Comment on attachment 8849731 [details] [diff] [review]
Collect native stacks for all hangs over 128ms on Nightly

Review of attachment 8849731 [details] [diff] [review]:
-----------------------------------------------------------------

Mike, can you review this given your recent work on this?
Otherwise maybe froydnj or mstange?

::: toolkit/components/telemetry/ThreadHangStats.h
@@ +27,5 @@
> +//
> +// NOTE: 300 native hang stacks would, by a rough estimation based on stacks
> +// collected from nightly on March 21, 2017, take up approximately 168kb of
> +// space.
> +static const uint32_t kMaximumNativeHangStacks = 300;

This is only used internally in BackgroundHangMonitor.cpp?
Please move it there.
Attachment #8849731 - Flags: review?(gfritzsche) → review?(mconley)
Comment on attachment 8849731 [details] [diff] [review]
Collect native stacks for all hangs over 128ms on Nightly

Review of attachment 8849731 [details] [diff] [review]:
-----------------------------------------------------------------

This seems good to me. I have a note though about Compositor - see below. Maybe good fodder for follow-up work.

::: toolkit/components/telemetry/ThreadHangStats.h
@@ +27,5 @@
> +//
> +// NOTE: 300 native hang stacks would, by a rough estimation based on stacks
> +// collected from nightly on March 21, 2017, take up approximately 168kb of
> +// space.
> +static const uint32_t kMaximumNativeHangStacks = 300;

I agree that BackgroundHangMonitor.cpp is likely where it belongs.

::: xpcom/build/XPCOMInit.cpp
@@ +504,5 @@
>    }
>  
> +  // Set experimental values for main thread hangs:
> +  // 128ms for transient hangs and 8192ms for permanent hangs
> +  static const uint32_t kTransientHangTimeout = 128;

Might want to put these in BackgroundHangMonitor.cpp, since there might be other threads that we want to follow the same rules (example, Compositor: http://searchfox.org/mozilla-central/rev/9af9b1c7946ebef6056d2ee4c368d496395cf1c8/gfx/layers/ipc/CompositorThread.cpp#94).

Might be good fodder for a follow-up though.
Attachment #8849731 - Flags: review?(mconley) → review+
(Assignee)

Comment 8

a month ago
(In reply to Mike Conley (:mconley) (Offsite until March 27) from comment #7)
> Might want to put these in BackgroundHangMonitor.cpp, since there might be
> other threads that we want to follow the same rules (example, Compositor:
> http://searchfox.org/mozilla-central/rev/
> 9af9b1c7946ebef6056d2ee4c368d496395cf1c8/gfx/layers/ipc/CompositorThread.
> cpp#94).
> 
> Might be good fodder for a follow-up though.

Currently those threads use a completely different mechanism for setting up their BHRs. I am inclined to think that we should handle that case separately. In addition, there is bug 1350412 which ehsan filed, because he hasn't seen any hangs from the compositor thread actually show up in telemetry.

(In reply to Georg Fritzsche [:gfritzsche] [away March 24 - April 2] from comment #5)
> How will you notice the performance impact?

TBH, I'm not sure. I doubt that this will have a significant perf impact (especially because we'll do it _at most_ 300 times in the lifetime of a process), but if it starts showing up in people's profiles and stuff we can of course back it out.

Comment 9

28 days ago
(In reply to Georg Fritzsche [:gfritzsche] [away March 24 - April 2] from comment #5)
> (In reply to Michael Layzell [:mystor] from comment #4)
> > (In reply to Georg Fritzsche [:gfritzsche] [away March 24 - April 2] from
> > comment #3)
> > > Do you have an estimate of what that will add to the raw payload size in the
> > > worst case?
> > 
> > IIRC when I looked at the sizes of these payloads for the day I mention in
> > the patch, the largest size which the payload could have was ~900 bytes,
> > which would be about 270kB. When mconley estimated a conservative maximum
> > size for a stack, he estimated 1875 bytes - which would be ~560kB.
> 
> @mreid, any concerns here?

This should be fine on Nightly/Aurora, but we should revisit this before we increase the payload size this much on Beta and beyond.

In terms of client impact, is there a histogram that measures how long it takes to submit a ping? If we do, we should keep an eye on it to see if we're in any danger of increasing the rate of submission timeouts or other upload failures.
Flags: needinfo?(mreid)

Comment 10

28 days ago
We have a few introspective probes (TELEMETRY_* histograms). There are two that measure how long it takes to submit a ping: TELEMETRY_SEND_SUCCESS and TELEMETRY_SEND_FAILURE
Backed out for frequently failing toolkit/components/telemetry/tests/unit/test_ThreadHangStats.js and likely also breaking Windows 8 x64 opt builds:

https://hg.mozilla.org/integration/mozilla-inbound/rev/b0bac8c5930689a8edfe702db5f358c14ed1538e

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=460c2bcc15d3259df10dd84d3851b01145df052e&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log xpcshell: https://treeherder.mozilla.org/logviewer.html#?job_id=87277713&repo=mozilla-inbound

[task 2017-03-29T15:26:06.125959Z] 15:26:06     INFO -  TEST-PASS | toolkit/components/telemetry/tests/unit/test_ThreadHangStats.js | check_results - [check_results : 80] true == true
[task 2017-03-29T15:26:06.131329Z] 15:26:06     INFO -  TEST-PASS | toolkit/components/telemetry/tests/unit/test_ThreadHangStats.js | check_results - [check_results : 81] 1 != 0
[task 2017-03-29T15:26:06.134083Z] 15:26:06     INFO -  TEST-PASS | toolkit/components/telemetry/tests/unit/test_ThreadHangStats.js | check_results - [check_results : 83] true == true
[task 2017-03-29T15:26:06.136635Z] 15:26:06  WARNING -  TEST-UNEXPECTED-FAIL | toolkit/components/telemetry/tests/unit/test_ThreadHangStats.js | check_results - [check_results : 84] 0 != 0
[task 2017-03-29T15:26:06.139365Z] 15:26:06     INFO -  /home/worker/workspace/build/tests/xpcshell/tests/toolkit/components/telemetry/tests/unit/test_ThreadHangStats.js:check_results:84
[task 2017-03-29T15:26:06.142999Z] 15:26:06     INFO -  /home/worker/workspace/build/tests/xpcshell/tests/toolkit/components/telemetry/tests/unit/test_ThreadHangStats.js:run_test/<:103
[task 2017-03-29T15:26:06.145551Z] 15:26:06     INFO -  /home/worker/workspace/build/tests/xpcshell/head.js:run:703
[task 2017-03-29T15:26:06.151003Z] 15:26:06     INFO -  /home/worker/workspace/build/tests/xpcshell/head.js:_do_main:211
[task 2017-03-29T15:26:06.153275Z] 15:26:06     INFO -  /home/worker/workspace/build/tests/xpcshell/head.js:_execute_test:546
[task 2017-03-29T15:26:06.155611Z] 15:26:06     INFO -  -e:null:1
Flags: needinfo?(michael)

Comment 12

26 days ago
Pushed by michael@thelayzells.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/aa882c81840b
Collect native stacks for all hangs over 128ms on Nightly, r=gfritzsche
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c29fedc22a05
https://treeherder.mozilla.org/#/jobs?repo=try&revision=02315227eb85

Updated

13 days ago
Blocks: 1345904
(Assignee)

Comment 15

12 days ago
Created attachment 8857506 [details] [diff] [review]
Use FramePointerStackWalk for less deadlocking when stackwalking on x86

There seems to be a cubeb assertion failure on try which I'm assuming is caused by this patch - I'm hoping to look into it soon.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=fd83830458559883119cd493974ff36fd608142d

MozReview-Commit-ID: CAHarvGSuTY
Attachment #8857506 - Flags: review?(n.nethercote)
(Assignee)

Updated

12 days ago
Flags: needinfo?(michael)
Comment on attachment 8857506 [details] [diff] [review]
Use FramePointerStackWalk for less deadlocking when stackwalking on x86

Review of attachment 8857506 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with changes for the tools/profiler/ changes.

For the xpcom/ changes: I don't understand the differences between the various stack walking functions, so I will defer to glandium.

::: tools/profiler/core/platform.cpp
@@ +3100,5 @@
> +  for (uint32_t i = 0; i < threads.size(); ++i) {
> +    if (threads[i]->ThreadId() == aThreadId) {
> +      return threads[i]->StackTop();
> +    }
> +  }

Please use FindLiveThreadInfo() to find the ThreadInfo.

::: xpcom/threads/ThreadStackHelper.cpp
@@ +243,5 @@
> +    MOZ_ASSERT(stackTop, "The thread should be registered by the profiler");
> +
> +    // Double check that the values we pulled for the thread make sense before
> +    // walking the stack.
> +    if (stackTop && framePointer >= stackPointer && framePointer < stackTop) {

I generally write range conditions like this: |k1 <= x && x < k2| because I find them much easier to read that way.

@@ +248,5 @@
> +      FramePointerStackWalk(callback, /* skipFrames */ 0,
> +                            /* maxFrames */ Telemetry::HangStack::sMaxNativeFrames,
> +                            reinterpret_cast<void*>(&aStack), framePointer,
> +                            stackTop);
> +    }

https://dxr.mozilla.org/mozilla-central/rev/f40e24f40b4c4556944c762d4764eace261297f5/tools/profiler/core/platform.cpp#806-820 is similar looking code, but it uses FramePointerStackWalk for Mac and Win32 and MozStackWalk for Win64. glandium might have suggestions here.
Attachment #8857506 - Flags: review?(n.nethercote)
Attachment #8857506 - Flags: review?(mh+mozilla)
Attachment #8857506 - Flags: review+
(Assignee)

Comment 17

11 days ago
(In reply to Nicholas Nethercote [:njn] from comment #16)
> Comment on attachment 8857506 [details] [diff] [review]
> Use FramePointerStackWalk for less deadlocking when stackwalking on x86
> 
> Review of attachment 8857506 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r+ with changes for the tools/profiler/ changes.
> 
> For the xpcom/ changes: I don't understand the differences between the
> various stack walking functions, so I will defer to glandium.
> 
> ::: tools/profiler/core/platform.cpp
> @@ +3100,5 @@
> > +  for (uint32_t i = 0; i < threads.size(); ++i) {
> > +    if (threads[i]->ThreadId() == aThreadId) {
> > +      return threads[i]->StackTop();
> > +    }
> > +  }
> 
> Please use FindLiveThreadInfo() to find the ThreadInfo.

The ThreadID which I am using to get the stack top here is not necessarily the current thread's threadId - so I don't think I can use that function unless I add a new parameter to it.

> https://dxr.mozilla.org/mozilla-central/rev/
> f40e24f40b4c4556944c762d4764eace261297f5/tools/profiler/core/platform.
> cpp#806-820 is similar looking code, but it uses FramePointerStackWalk for
> Mac and Win32 and MozStackWalk for Win64. glandium might have suggestions
> here.

For this patch I am changing from MozStackWalk to FramePointerStackWalk to avoid a bug which would come up decently frequently (on infra at least) where the program would deadlock as the suspended thread is holding the windows internal lock on the function table while we're trying to walk the stack with MozStackWalk (which also tries to lock that table). FramePointerStackWalk is done entirely in userspace and doesn't use that lock meaning that we can avoid the unfortunate deadlock.

Unfortunately we don't have frame pointers reliably on 64-bit so I am disabling stack collection on that platform.
> For this patch I am changing from MozStackWalk to FramePointerStackWalk to
> avoid a bug which would come up decently frequently (on infra at least)
> where the program would deadlock as the suspended thread is holding the
> windows internal lock on the function table while we're trying to walk the
> stack with MozStackWalk (which also tries to lock that table).
> FramePointerStackWalk is done entirely in userspace and doesn't use that
> lock meaning that we can avoid the unfortunate deadlock.
> 
> Unfortunately we don't have frame pointers reliably on 64-bit so I am
> disabling stack collection on that platform.

All this info would be extremely useful to put into a comment.
(Assignee)

Comment 19

6 days ago
Created attachment 8859364 [details] [diff] [review]
Use FramePointerStackWalk for less deadlocking when stackwalking on x86

Updated to add the comment requested

MozReview-Commit-ID: CAHarvGSuTY
Attachment #8859364 - Flags: review?(mh+mozilla)
(Assignee)

Updated

6 days ago
Attachment #8849731 - Attachment is obsolete: true
Attachment #8857506 - Attachment is obsolete: true
Attachment #8857506 - Flags: review?(mh+mozilla)
(Assignee)

Comment 20

6 days ago
Created attachment 8859365 [details] [diff] [review]
Collect native stacks at the same time as pseudostacks on nightly

As reviews on this have taken a while, I've had some time to think about how
best to implement this patch. I think that I've changed my mind about how we
should implement this increased native stack collection on nightly to instead
collect these native stacks at the same time that we collect pseudostacks when
on nightly. This has the advantage of not moving our permahang timer down,
meaning that we can get more accurate hang measurements for long hangs.

This also has the advantage that it ensures that there is a meaningful
relationship between the pseudostack and native stack which we measure.

Unfortunately, with this patch we don't currently read in the pseudostack and
native stack within the same pausing of the target thread, meaning that the
target thread could make progress between these two stack readings. I hope to
change this in a future patch, but I wanted to post this minimal version in case
the review on the FramePointerStackWalk came in.
Attachment #8859365 - Flags: review?(mconley)
Comment on attachment 8859364 [details] [diff] [review]
Use FramePointerStackWalk for less deadlocking when stackwalking on x86

Review of attachment 8859364 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/platform.cpp
@@ +3063,5 @@
> +
> +  const PS::ThreadVector& threads = gPS->LiveThreads(lock);
> +  for (uint32_t i = 0; i < threads.size(); ++i) {
> +    if (threads[i]->ThreadId() == aThreadId) {
> +      return threads[i]->StackTop();

Does it matter that this is doing a linear search in a possibly ever growing list of threads?

::: xpcom/threads/ThreadStackHelper.cpp
@@ +204,5 @@
>  
> +  // NOTE: We can only perform frame pointer stack walking on non win64
> +  // platforms, because Win64 always omits frame pointers. We don't want to use
> +  // MozStackWalk here, so we just skip collecting stacks entirely.
> +#ifdef MOZ_THREADSTACKHELPER_X86

#ifndef MOZ_THREADSTACKHELPER_X64 ?

@@ +254,5 @@
> +      //
> +      // FramePointerStackWalk is implemented entirely in userspace and thus
> +      // doesn't have the same issues with deadlocking. Unfortunately as 64-bit
> +      // windows is not guaranteed to have frame pointers, the stack walking
> +      // code is only enabled on 32-bit windows builds.

Condidering we're pushing for more people to use the 64-bits builds, it seems like this should be fixed somehow (which unfortunately sounds like having to implement a stack walker entirely (or find a third-party one)). File a followup and add the bug number in the comment?
Attachment #8859364 - Flags: review?(mh+mozilla) → review+
Status: NEW → ASSIGNED
Comment on attachment 8859365 [details] [diff] [review]
Collect native stacks at the same time as pseudostacks on nightly

Review of attachment 8859365 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good! Thanks!
Attachment #8859365 - Flags: review?(mconley) → review+
(Assignee)

Updated

5 days ago
Blocks: 1357829
(Assignee)

Comment 23

5 days ago
Created attachment 8859669 [details] [diff] [review]
Use FramePointerStackWalk for less deadlocking when stackwalking on x86

I changed the logic for BHR to walk the ThreadInfo array once during thread
startup instead of every time a hang is detected, I think this should still work
(doing a test run on try).

I'd like you to take a second look over it before I land it.

MozReview-Commit-ID: CAHarvGSuTY
Attachment #8859669 - Flags: review?(mh+mozilla)
(Assignee)

Updated

5 days ago
Attachment #8859364 - Attachment is obsolete: true
(Assignee)

Comment 24

5 days ago
Created attachment 8859696 [details] [diff] [review]
Use FramePointerStackWalk for less deadlocking when stackwalking on x86

I forgot to add the header to the commit - oops.

MozReview-Commit-ID: CAHarvGSuTY
Attachment #8859696 - Flags: review?(mh+mozilla)
(Assignee)

Updated

5 days ago
Attachment #8859669 - Attachment is obsolete: true
Attachment #8859669 - Flags: review?(mh+mozilla)
(Assignee)

Comment 25

5 days ago
This is the try run with the extra headed included: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c43601142391ba0d190ce45525203b985db2607e
(Assignee)

Updated

3 days ago
Blocks: 1358619
You need to log in before you can comment on or make changes to this bug.