Open Bug 1063776 Opened 10 years ago Updated 2 years ago

Alerts and other modal dialogs freeze the browser quite often

Categories

(Core :: Graphics: Layers, defect, P3)

35 Branch
x86
macOS
defect

Tracking

()

Tracking Status
firefox35 --- wontfix
firefox36 --- wontfix
firefox37 --- wontfix

People

(Reporter: bzbarsky, Unassigned)

References

Details

(Keywords: regression, Whiteboard: [gfx-noted])

Attachments

(3 files, 2 obsolete files)

Attached patch alerthangSplinter Review
[Tracking Requested - why for this release]:

STEPS TO REPRODUCE:

1)  Apply the attached patch.
2)  mach mochitest-plain content/html/content/test/test_img_attributes_reflection.html

EXPECTED RESULTS: Can click away the two alerts easily

ACTUAL RESULTS: The browser beachballs and you only regain control minutes later, if ever.

Sampling shows a bunch of painting and region operations, and a local bisect says:

The first bad revision is:
changeset:   203119:4effa50d8934
user:        Matt Woodrow <mwoodrow@mozilla.com>
date:        Fri Jul 25 12:49:45 2014 +1200
summary:     Bug 982338 - Enable tiling for OSX. r=Bas

Sadly, just starting the browser from that revision on a file with an alert() in it does not show the problem.  So I'm not sure what the important ingredients are here, but the above steps reproduce quite reliably for me.

Also, loading the "file with an alert" test in the browser "mach mochitest-plain" starts _does_ show the problem for me, so it's worth looking into what the test harness does that might be affected by the patch in question.
Blocks: osx-tiling
I can't reproduce this on OSX 10.9 :(

I see some painting time from within nsGlobalWindow::AlertOrConfirm, but it doesn't take long at all.

Do you have a gecko profiler profile that you could upload?

I guess for some reason we're running paints within the alert handler and not processing the click event? Not obvious why switching to tiling would affect that though.
> I can't reproduce this on OSX 10.9 :(

Huh.  I'm on 10.9.4, fwiw.

> Do you have a gecko profiler profile that you could upload?

How do I run the Gecko profiler inside the mochitest test run?

I could do an instruments profile if you want...

One other thing that might be relevant: this was a debug build.  Not sure whether you were trying to reproduce in debug or opt.
(In reply to Boris Zbarsky [:bz] from comment #2)
> > I can't reproduce this on OSX 10.9 :(
> 
> Huh.  I'm on 10.9.4, fwiw.

Same here.


> How do I run the Gecko profiler inside the mochitest test run?
> 
> I could do an instruments profile if you want...

I have no idea! Doing an instruments one would be fine too.

> 
> One other thing that might be relevant: this was a debug build.  Not sure
> whether you were trying to reproduce in debug or opt.

Debug here too.
Attached file Instruments trace
OK, some more data:

1)  I do _not_ see the issue in an opt build.
2)  In the debug build, while the alert is up the CPU is pegged (130%, actually).
3)  A profile (attached) shows a bunch of CPU time on two different threads, as follows:

Main thread:

A deep stack (through the Alert call, which spins the event loop), eventually landing in one of two places:

* About half under PresShell::Paint via the refresh driver being told a
  compositing transaction is complete.  This is looking like typical painting:
  about half display list construction and half actual painting.
* About half under PresShell::Paint via the refresh driver timer firing.  So
  it's all painting code.

A non-main thread:

A stack all under RunnableMethod<mozilla::layers::CompositorParent, void (mozilla::layers::CompositorParent::*)(), Tuple0>::Run() which is doing a bunch of RenderLayer on thebes layers and container layers.  The vast majority of this time is under TiledContentHost bits.
Interesting.

I'm seeing a lot of CPU usage while the alert is visible, both with and without tiling. It looks like we're still animating the throbber, and with a debug build doing that at 60fps is hard work.

Tiling has higher CPU usage, with a lot more time spent in the compositor. A lot of this appears to be within NS_LogAddRef. There's also just more overhead because we have more things to process with tiling.

The extra overhead is causing us to miss our 16.7ms refresh driver interval (we wait on the compositor if it's too slow), so a lot of our paints are being driven by the DidComposite message rather than the refresh driver timer.

It's not obvious why this would cause you to hang though. As far as I can tell the chromium IPC events (MessageLoop) stuff isn't treated any differently to normal events, so these shouldn't be interfering with click handling. The throttling code should also stop us from getting more than two DidComposite messages sent in a row without them being processed (which is what I see), so we shouldn't be backing up a huge queue of these.
Attached patch WIP patch (obsolete) — Splinter Review
WIP patch that addresses the extra compositor time, which (afaict) is the only change that tiling had on this bug.

The fact that this extra cpu time manages to stop firefox from responding to input events for you isn't fixed, but this might hide it again.
That patch definitely helps; I can click the alert away again.

I guess the reason we keep painting so much is because the throbber is running?  I can confirm that if I wait until after onload before doing the alert, then even without the "WIP patch" I don't get a hang.

Fwiw, with just the printfs from that patch I'm seeing paint intervals on the order of 40-50ms over here...
To be clear, I mean just the printfs from "WIP patch" plus the alert before onload testcase.
https://tbpl.mozilla.org/?tree=Try&rev=de68310f25f7

Looks like this is a win for TART/CART as well as improving debug build performance considerably.
Assignee: nobody → matt.woodrow
Attachment #8486820 - Flags: review?(bgirard)
Attachment #8486230 - Attachment is obsolete: true
Looks good! I assume most of the challenges I encountered when trying this in bug 940977 comment 4 don't apply any more? You'll probably still need to make some changes to include the titlebar in the invalid region if something changed there (ShadowLayerForwarder::mWindowOverlayChanged). And if the FPS display is enabled, it probably wouldn't be a bad idea to include that as well.
I'm not sure how directly calling fScissor without ScopedScissorRect interacts with the existing uses of ScopedScissorRect. You may want to check that and leave a comment.
Comment on attachment 8486820 [details] [diff] [review]
Restrict OGL compositing to the invalid region

Review of attachment 8486820 [details] [diff] [review]:
-----------------------------------------------------------------

If we do this we have to be really sure about the invalid region. Markus correct points out that this will block debugging code.

Are you sure the invalid region is being calculated properly with async-video, OMTA, APZ?
(In reply to Markus Stange [:mstange] from comment #10)
> Looks good! I assume most of the challenges I encountered when trying this
> in bug 940977 comment 4 don't apply any more? You'll probably still need to
> make some changes to include the titlebar in the invalid region if something
> changed there (ShadowLayerForwarder::mWindowOverlayChanged). And if the FPS
> display is enabled, it probably wouldn't be a bad idea to include that as
> well.
> I'm not sure how directly calling fScissor without ScopedScissorRect
> interacts with the existing uses of ScopedScissorRect. You may want to check
> that and leave a comment.

Some of them! Others still apply, I'll look into them.

(In reply to Benoit Girard (:BenWa) from comment #11)
> Comment on attachment 8486820 [details] [diff] [review]
> Restrict OGL compositing to the invalid region
> 
> Review of attachment 8486820 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> If we do this we have to be really sure about the invalid region. Markus
> correct points out that this will block debugging code.
> 
> Are you sure the invalid region is being calculated properly with
> async-video, OMTA, APZ?

I'm not sure, no. But we do this for BasicCompositor, and HWC already uses some of the output from this, so if it's not correct then we need to fix it. It's certainly setup to handle compositor side changes, we'll find out if there are bugs.
> Looks good! I assume most of the challenges I encountered when trying this
> in bug 940977 comment 4 don't apply any more? You'll probably still need to
> make some changes to include the titlebar in the invalid region if something
> changed there (ShadowLayerForwarder::mWindowOverlayChanged). And if the FPS
> display is enabled, it probably wouldn't be a bad idea to include that as
> well.
> I'm not sure how directly calling fScissor without ScopedScissorRect
> interacts with the existing uses of ScopedScissorRect. You may want to check
> that and leave a comment.

OSX titlebar drawing disables the scissor rect. I guess we could not do that, and make sure to include the titlebar in the dirty rect when it changes but I doubt its worth it.

GPU switching seems to work fine, I guess OSX preserves the contents of the framebuffer when switching.

ScopedScissorRect just does automatically restores the scissor rect when it goes out of scope. It shouldn't have any effect on us just setting the scissor in BeginFrame. I can add a comment if you like.

FPS display is broken! I'll fix that.
I take the bit about GPU switching back, had the wrong prefs set.
Hrm, looks like we don't even get a drawRect() call to nsChildView when we switch GPU.

I think we need to add a call to CGDisplayRegisterReconfigurationCallback (one per nsCocoaWindow? nsChildView?) and have the callback notify the compositor that we need to composite the entire window.

We have ClientLayerManager::SendInvalidRegion already for the latter.

Markus, does that sound reasonable?
Flags: needinfo?(mstange)
(In reply to Matt Woodrow (:mattwoodrow) from comment #13)
> OSX titlebar drawing disables the scissor rect.

Hmm. So the titlebar gets composited in whole on every composite, but the background behind the titlebar might not? Say there's an animation somewhere outside the titlebar. Will that slowly turn the anti-aliased pixels around the titlebar buttons opaque?

I have the same question for the other callers of glScissor, or at least for those that apply the scissor rect to the screen framebuffer. CompositorOGL::DrawQuad seems to be the only one, actually. Is the scissor rect that's set there always contained in the "root" scissor rect you're adding?

> ScopedScissorRect just does automatically restores the scissor rect when it
> goes out of scope. It shouldn't have any effect on us just setting the
> scissor in BeginFrame. I can add a comment if you like.

Somehow I was under the impression that it was encouraged to only change scissor rects through the scoped helpers, and that things might break if we don't do that. But ScopedScissorRect is obviously not an option in BeginFrame. And what you say makes sense, I don't see how setting the scissor rect directly can break anything. So this looks fine, and the comment is probably not needed.

(In reply to Matt Woodrow (:mattwoodrow) from comment #15)
> Hrm, looks like we don't even get a drawRect() call to nsChildView when we
> switch GPU.
> 
> I think we need to add a call to CGDisplayRegisterReconfigurationCallback
> (one per nsCocoaWindow? nsChildView?) and have the callback notify the
> compositor that we need to composite the entire window.
> 
> We have ClientLayerManager::SendInvalidRegion already for the latter.
> 
> Markus, does that sound reasonable?

Sounds very reasonable to me. One per nsChildView, I'd say.
Flags: needinfo?(mstange)
Ok, this version handles GPU switches fine, should handle titlebar changes (we request a drawRect call which will call SendInvalidRegion), and only changes behaviour on mac (to limit the potential regressions since I haven't tested it elsewhere).
Attachment #8486820 - Attachment is obsolete: true
Attachment #8486820 - Flags: review?(bgirard)
Attachment #8488414 - Flags: review?(mstange)
Attachment #8488414 - Flags: review?(bgirard)
Comment on attachment 8488414 [details] [diff] [review]
Restrict OGL compositing to the invalid region v2

Review of attachment 8488414 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/cocoa/nsChildView.mm
@@ +3719,5 @@
> +  nsIntRegion region = [self nativeDirtyRegionWithBoundingRect:aRect];
> +  if (mGeckoChild->GetLayerManager()->GetBackendType() == LayersBackend::LAYERS_CLIENT) {
> +    ClientLayerManager *client = static_cast<ClientLayerManager*>(mGeckoChild->GetLayerManager());
> +    client->SendInvalidRegion(region);
> +  }

Can you call this from nsChildView::NotifyDirtyRegion instead? I think there was a reason why calling NotifyDirtyRegion from viewWillDraw was better than calling it from drawRect, and the same reason probably applies here.
Also, does this make the WindowOverlayChanged unnecessary? You can remove that then.
Attachment #8488414 - Flags: review?(mstange) → review+
Attachment #8488414 - Flags: review?(bgirard) → review+
Is this ready to checkin?  We're getting on in the 35 beta cycle so need to know if this will be ready and low-risk enough to uplift by Mon Dec 29th.
Flags: needinfo?(mstange)
Flags: needinfo?(milan)
We have the two r+, but there is the comment 18 sounds potentially serious enough.  It would definitely have us miss the Beta though, by the time people come back and this gets sorted out, it'll be the end of next week.
Jeff, thoughts?
Flags: needinfo?(milan) → needinfo?(jmuizelaar)
I'm not sure this is still a problem.

Boris, can you still reproduce this?
Flags: needinfo?(jmuizelaar) → needinfo?(bzbarsky)
I was not able to reproduce on 10.9 with a debug build
Between comment 20 and 22 it sounds like waiting, if not considering this WFM, is best.  Wontfixing for 35.
Attachment #8485214 - Attachment is patch: true
Attachment #8485214 - Attachment mime type: message/rfc822 → text/plain
I can't reproduce anymore either.  It's possible one of the many other followups to bug 982338 helped here...
Flags: needinfo?(bzbarsky)
At some point we decided to not go ahead with the approach in this bug because it's very brittle and because it doesn't fix the real bug. I'll mark the patches as obsolete.
Why is it brittle? Because we're making assumptions about the contents of the window framebuffer at the time of compositing. Our logic will break as soon as the GL context doesn't use exactly two buffers for double-buffering and alternatingly makes us paint into each one. Consulting GLX_EXT_buffer_age instead of blindly assuming double-bufferedness would give us more certainty here, but it's unclear whether it's worth implementing.
What's the real bug? Something to do with our Mac event loop. Bug 1102302 has much better steps to reproduce for a similar bug that involves a hang during a modal dialog.
Flags: needinfo?(mstange)
Comment on attachment 8488414 [details] [diff] [review]
Restrict OGL compositing to the invalid region v2

see comment 25
Attachment #8488414 - Flags: review+ → review-
Seems like it is also a wontfix for 36 & 37!
Matt, any update here? Seems like this stalled 10 months ago.
Flags: needinfo?(matt.woodrow)
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #28)
> Matt, any update here? Seems like this stalled 10 months ago.

Not really, Comment 25 still reflects the current state afaik.
Flags: needinfo?(matt.woodrow)
Whiteboard: [gfx-noted]
Version: unspecified → 35 Branch

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: matt.woodrow → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: