Closed Bug 1443329 Opened 3 years ago Closed 2 years ago

BHR triggers for ForcePaint don't account for all scenarios

Categories

(Core :: DOM: Content Processes, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: dthayer, Assigned: dthayer)

References

Details

(Whiteboard: [bhr][fxperf:p1])

Attachments

(2 files)

When sending RenderLayers to the content process, we:

- Race the message with a ForcePaint message to the hang monitor thread
- Then, in that thread, we call NotifyActivity for the BHR monitor
- Then, we trigger a JS interrupt, which will call RecvRenderLayers if JS was running
- When exiting RecvRenderLayers, we call NotifyWait on the BHR monitor, ending the "hang" and reporting a sample stack

This flow works for most cases, but it doesn't seem to account for the case where the ForcePaint message loses its race by a large enough margin that RecvRenderLayers has already exited. In this scenario, the chain of events would be:

- RecvRenderLayers -> BHR NotifyWait()
- ForcePaint -> BHR NotifyActivity()
- Interrupt JS (no JS is running) -> BHR registers a permahang, because we don't NotifyWait until the next RecvRenderLayers
Confirmed by adding a sleep in between sending RenderLayers and ForcePaint to the content process. Doing a sleep of any duration >100ms yields a 1024ms "hang" (1024 ms is the cut-off point for the ForcePaint BHR monitor.)
Comment on attachment 8956274 [details]
Bug 1443329 - Don't notify BHR if ForcePaint loses its race

https://reviewboard.mozilla.org/r/225166/#review231326

Thanks for finding this, dthayer. Hopefully this will help clean up our BHR data a bit!

::: dom/ipc/ProcessHangMonitor.cpp:427
(Diff revision 1)
> +    if (mForcePaintEpoch >= aLayerObserverEpoch) {
> +      return IPC_OK();
> +    }
> +    mForcePaintMonitor->NotifyActivity();

I guess this means (or rather, I guess this has always meant) that we don't ever gather BHR stacks whenever ProcessHangMonitor loses the race.

I _assume_ that's a rare enough case that we should still be able to get some useful data about hangs here. From your testing, is my assumption grounded in reality?

In either case, I wonder if it'd add too much complication for us to trigger a NotifyActivity from the RecvRenderLayers on the main thread as well. Perhaps fodder for a follow-up.

::: dom/ipc/ProcessHangMonitor.cpp:427
(Diff revision 1)
>  
> -  mForcePaintMonitor->NotifyActivity();
> -
>    {
>      MonitorAutoLock lock(mMonitor);
> +    if (mForcePaintEpoch >= aLayerObserverEpoch) {

Can you please add an inline comment describing the race-losing case that we're dealing with here? Same with the ClearForcePaint function where we check the epoch.
Attachment #8956274 - Flags: review?(mconley) → review+
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #3)
> I _assume_ that's a rare enough case that we should still be able to get
> some useful data about hangs here. From your testing, is my assumption
> grounded in reality?

Hmm, now that you mention it I think we should probably just get it out of the way in this bug. Should be pretty simple.
Component: Tabbed Browser → DOM: Content Processes
Product: Firefox → Core
(In reply to Doug Thayer [:dthayer] from comment #4)
> Hmm, now that you mention it I think we should probably just get it out of
> the way in this bug. Should be pretty simple.

In the process of testing the change for this I'm noticing that BHR is grossly misreporting the duration of ForcePaint hangs. Not sure why though. Looking into it.
Depends on: 1443688
So essentially the ForcePaint BHR data is just garbage, since it should hit the case from Bug 1443329 100% of the time. Going to fix that and then apply the changes from this on top of it.
Whiteboard: [bhr][fxperf:p1]
Assignee: nobody → dothayer
Status: NEW → ASSIGNED
Priority: -- → P1
Comment on attachment 8960292 [details]
Bug 1443329 - Ensure we always call NotifyActivity on ForcePaint BHR

https://reviewboard.mozilla.org/r/229062/#review234844

Clearing review request until I hear more about the Atomics thing. Maybe I'm just misunderstanding - but if so, I'd like to know where my understanding is falling over.

::: dom/ipc/ProcessHangMonitor.cpp:453
(Diff revision 1)
>  
> +
> +void
> +HangMonitorChild::MaybeStartForcePaint()
> +{
> +  if (mBHRMonitorActive.exchange(true)) {

Wait a second... maybe I'm confused by how Atomic.exchange works, but aren't we getting back what _was_ inside mBHRMonitorActive as the return value?

So after the most recent ClearForcePaint, doesn't that mean that the first call to MaybeStartForcePaint will evaluate as false, but subsequent calls to MaybeStartForcePaint (until the next ClearForcePaint is called) will call NotifyActivity?

Or am I misunderstanding?

::: dom/ipc/ProcessHangMonitor.cpp:453
(Diff revision 1)
>  
> +
> +void
> +HangMonitorChild::MaybeStartForcePaint()
> +{
> +  if (mBHRMonitorActive.exchange(true)) {

Probably a good idea to `MOZ_RELEASE_ASSERT(IsOnThread())` here too.
Attachment #8960292 - Flags: review?(mconley)
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #9)
> Comment on attachment 8960292 [details]
> Bug 1443329 - Ensure we always call NotifyActivity on ForcePaint BHR
> 
> https://reviewboard.mozilla.org/r/229062/#review234844
> 
> Clearing review request until I hear more about the Atomics thing. Maybe I'm
> just misunderstanding - but if so, I'd like to know where my understanding
> is falling over.
> 
> ::: dom/ipc/ProcessHangMonitor.cpp:453
> (Diff revision 1)
> >  
> > +
> > +void
> > +HangMonitorChild::MaybeStartForcePaint()
> > +{
> > +  if (mBHRMonitorActive.exchange(true)) {
> 
> Wait a second... maybe I'm confused by how Atomic.exchange works, but aren't
> we getting back what _was_ inside mBHRMonitorActive as the return value?
> 
> So after the most recent ClearForcePaint, doesn't that mean that the first
> call to MaybeStartForcePaint will evaluate as false, but subsequent calls to
> MaybeStartForcePaint (until the next ClearForcePaint is called) will call
> NotifyActivity?
> 
> Or am I misunderstanding?

Woops, no you're totally right.
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #9)
> ::: dom/ipc/ProcessHangMonitor.cpp:453
> (Diff revision 1)
> >  
> > +
> > +void
> > +HangMonitorChild::MaybeStartForcePaint()
> > +{
> > +  if (mBHRMonitorActive.exchange(true)) {
> 
> Probably a good idea to `MOZ_RELEASE_ASSERT(IsOnThread())` here too.

I'm not sure I understand. The goal of this function is to be safe to call from either the main thread or the hang monitor thread.
(In reply to Doug Thayer [:dthayer] from comment #11)
> I'm not sure I understand. The goal of this function is to be safe to call
> from either the main thread or the hang monitor thread.

Ah - right, sorry. In that case, maybe best to document that the method is (and should remain) threadsafe.
Comment on attachment 8960292 [details]
Bug 1443329 - Ensure we always call NotifyActivity on ForcePaint BHR

https://reviewboard.mozilla.org/r/229062/#review234868

Looks good! Thanks, dthayer.
Attachment #8960292 - Flags: review?(mconley) → review+
Pushed by dothayer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d47df4e1481b
Don't notify BHR if ForcePaint loses its race r=mconley
https://hg.mozilla.org/integration/autoland/rev/ff98ae41c558
Ensure we always call NotifyActivity on ForcePaint BHR r=mconley
https://hg.mozilla.org/mozilla-central/rev/d47df4e1481b
https://hg.mozilla.org/mozilla-central/rev/ff98ae41c558
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
You need to log in before you can comment on or make changes to this bug.