Open
Bug 1063776
Opened 10 years ago
Updated 2 years ago
Alerts and other modal dialogs freeze the browser quite often
Categories
(Core :: Graphics: Layers, defect, P3)
Tracking
()
NEW
People
(Reporter: bzbarsky, Unassigned)
References
Details
(Keywords: regression, Whiteboard: [gfx-noted])
Attachments
(3 files, 2 obsolete files)
546 bytes,
patch
|
Details | Diff | Splinter Review | |
3.59 MB,
application/x-bzip2
|
Details | |
20.65 KB,
patch
|
mstange
:
review-
BenWa
:
review+
|
Details | Diff | Splinter Review |
[Tracking Requested - why for this release]: STEPS TO REPRODUCE: 1) Apply the attached patch. 2) mach mochitest-plain content/html/content/test/test_img_attributes_reflection.html EXPECTED RESULTS: Can click away the two alerts easily ACTUAL RESULTS: The browser beachballs and you only regain control minutes later, if ever. Sampling shows a bunch of painting and region operations, and a local bisect says: The first bad revision is: changeset: 203119:4effa50d8934 user: Matt Woodrow <mwoodrow@mozilla.com> date: Fri Jul 25 12:49:45 2014 +1200 summary: Bug 982338 - Enable tiling for OSX. r=Bas Sadly, just starting the browser from that revision on a file with an alert() in it does not show the problem. So I'm not sure what the important ingredients are here, but the above steps reproduce quite reliably for me. Also, loading the "file with an alert" test in the browser "mach mochitest-plain" starts _does_ show the problem for me, so it's worth looking into what the test harness does that might be affected by the patch in question.
Reporter | ||
Updated•10 years ago
|
Blocks: osx-tiling
Comment 1•10 years ago
|
||
I can't reproduce this on OSX 10.9 :( I see some painting time from within nsGlobalWindow::AlertOrConfirm, but it doesn't take long at all. Do you have a gecko profiler profile that you could upload? I guess for some reason we're running paints within the alert handler and not processing the click event? Not obvious why switching to tiling would affect that though.
Reporter | ||
Comment 2•10 years ago
|
||
> I can't reproduce this on OSX 10.9 :( Huh. I'm on 10.9.4, fwiw. > Do you have a gecko profiler profile that you could upload? How do I run the Gecko profiler inside the mochitest test run? I could do an instruments profile if you want... One other thing that might be relevant: this was a debug build. Not sure whether you were trying to reproduce in debug or opt.
Comment 3•10 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #2) > > I can't reproduce this on OSX 10.9 :( > > Huh. I'm on 10.9.4, fwiw. Same here. > How do I run the Gecko profiler inside the mochitest test run? > > I could do an instruments profile if you want... I have no idea! Doing an instruments one would be fine too. > > One other thing that might be relevant: this was a debug build. Not sure > whether you were trying to reproduce in debug or opt. Debug here too.
Reporter | ||
Comment 4•10 years ago
|
||
OK, some more data: 1) I do _not_ see the issue in an opt build. 2) In the debug build, while the alert is up the CPU is pegged (130%, actually). 3) A profile (attached) shows a bunch of CPU time on two different threads, as follows: Main thread: A deep stack (through the Alert call, which spins the event loop), eventually landing in one of two places: * About half under PresShell::Paint via the refresh driver being told a compositing transaction is complete. This is looking like typical painting: about half display list construction and half actual painting. * About half under PresShell::Paint via the refresh driver timer firing. So it's all painting code. A non-main thread: A stack all under RunnableMethod<mozilla::layers::CompositorParent, void (mozilla::layers::CompositorParent::*)(), Tuple0>::Run() which is doing a bunch of RenderLayer on thebes layers and container layers. The vast majority of this time is under TiledContentHost bits.
Comment 5•10 years ago
|
||
Interesting. I'm seeing a lot of CPU usage while the alert is visible, both with and without tiling. It looks like we're still animating the throbber, and with a debug build doing that at 60fps is hard work. Tiling has higher CPU usage, with a lot more time spent in the compositor. A lot of this appears to be within NS_LogAddRef. There's also just more overhead because we have more things to process with tiling. The extra overhead is causing us to miss our 16.7ms refresh driver interval (we wait on the compositor if it's too slow), so a lot of our paints are being driven by the DidComposite message rather than the refresh driver timer. It's not obvious why this would cause you to hang though. As far as I can tell the chromium IPC events (MessageLoop) stuff isn't treated any differently to normal events, so these shouldn't be interfering with click handling. The throttling code should also stop us from getting more than two DidComposite messages sent in a row without them being processed (which is what I see), so we shouldn't be backing up a huge queue of these.
Comment 6•10 years ago
|
||
WIP patch that addresses the extra compositor time, which (afaict) is the only change that tiling had on this bug. The fact that this extra cpu time manages to stop firefox from responding to input events for you isn't fixed, but this might hide it again.
Reporter | ||
Comment 7•10 years ago
|
||
That patch definitely helps; I can click the alert away again. I guess the reason we keep painting so much is because the throbber is running? I can confirm that if I wait until after onload before doing the alert, then even without the "WIP patch" I don't get a hang. Fwiw, with just the printfs from that patch I'm seeing paint intervals on the order of 40-50ms over here...
Reporter | ||
Comment 8•10 years ago
|
||
To be clear, I mean just the printfs from "WIP patch" plus the alert before onload testcase.
Comment 9•10 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=de68310f25f7 Looks like this is a win for TART/CART as well as improving debug build performance considerably.
Assignee: nobody → matt.woodrow
Attachment #8486820 -
Flags: review?(bgirard)
Updated•10 years ago
|
Attachment #8486230 -
Attachment is obsolete: true
Comment 10•10 years ago
|
||
Looks good! I assume most of the challenges I encountered when trying this in bug 940977 comment 4 don't apply any more? You'll probably still need to make some changes to include the titlebar in the invalid region if something changed there (ShadowLayerForwarder::mWindowOverlayChanged). And if the FPS display is enabled, it probably wouldn't be a bad idea to include that as well. I'm not sure how directly calling fScissor without ScopedScissorRect interacts with the existing uses of ScopedScissorRect. You may want to check that and leave a comment.
Comment 11•10 years ago
|
||
Comment on attachment 8486820 [details] [diff] [review] Restrict OGL compositing to the invalid region Review of attachment 8486820 [details] [diff] [review]: ----------------------------------------------------------------- If we do this we have to be really sure about the invalid region. Markus correct points out that this will block debugging code. Are you sure the invalid region is being calculated properly with async-video, OMTA, APZ?
Comment 12•10 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #10) > Looks good! I assume most of the challenges I encountered when trying this > in bug 940977 comment 4 don't apply any more? You'll probably still need to > make some changes to include the titlebar in the invalid region if something > changed there (ShadowLayerForwarder::mWindowOverlayChanged). And if the FPS > display is enabled, it probably wouldn't be a bad idea to include that as > well. > I'm not sure how directly calling fScissor without ScopedScissorRect > interacts with the existing uses of ScopedScissorRect. You may want to check > that and leave a comment. Some of them! Others still apply, I'll look into them. (In reply to Benoit Girard (:BenWa) from comment #11) > Comment on attachment 8486820 [details] [diff] [review] > Restrict OGL compositing to the invalid region > > Review of attachment 8486820 [details] [diff] [review]: > ----------------------------------------------------------------- > > If we do this we have to be really sure about the invalid region. Markus > correct points out that this will block debugging code. > > Are you sure the invalid region is being calculated properly with > async-video, OMTA, APZ? I'm not sure, no. But we do this for BasicCompositor, and HWC already uses some of the output from this, so if it's not correct then we need to fix it. It's certainly setup to handle compositor side changes, we'll find out if there are bugs.
Comment 13•10 years ago
|
||
> Looks good! I assume most of the challenges I encountered when trying this
> in bug 940977 comment 4 don't apply any more? You'll probably still need to
> make some changes to include the titlebar in the invalid region if something
> changed there (ShadowLayerForwarder::mWindowOverlayChanged). And if the FPS
> display is enabled, it probably wouldn't be a bad idea to include that as
> well.
> I'm not sure how directly calling fScissor without ScopedScissorRect
> interacts with the existing uses of ScopedScissorRect. You may want to check
> that and leave a comment.
OSX titlebar drawing disables the scissor rect. I guess we could not do that, and make sure to include the titlebar in the dirty rect when it changes but I doubt its worth it.
GPU switching seems to work fine, I guess OSX preserves the contents of the framebuffer when switching.
ScopedScissorRect just does automatically restores the scissor rect when it goes out of scope. It shouldn't have any effect on us just setting the scissor in BeginFrame. I can add a comment if you like.
FPS display is broken! I'll fix that.
Comment 14•10 years ago
|
||
I take the bit about GPU switching back, had the wrong prefs set.
Comment 15•10 years ago
|
||
Hrm, looks like we don't even get a drawRect() call to nsChildView when we switch GPU. I think we need to add a call to CGDisplayRegisterReconfigurationCallback (one per nsCocoaWindow? nsChildView?) and have the callback notify the compositor that we need to composite the entire window. We have ClientLayerManager::SendInvalidRegion already for the latter. Markus, does that sound reasonable?
Flags: needinfo?(mstange)
Comment 16•10 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #13) > OSX titlebar drawing disables the scissor rect. Hmm. So the titlebar gets composited in whole on every composite, but the background behind the titlebar might not? Say there's an animation somewhere outside the titlebar. Will that slowly turn the anti-aliased pixels around the titlebar buttons opaque? I have the same question for the other callers of glScissor, or at least for those that apply the scissor rect to the screen framebuffer. CompositorOGL::DrawQuad seems to be the only one, actually. Is the scissor rect that's set there always contained in the "root" scissor rect you're adding? > ScopedScissorRect just does automatically restores the scissor rect when it > goes out of scope. It shouldn't have any effect on us just setting the > scissor in BeginFrame. I can add a comment if you like. Somehow I was under the impression that it was encouraged to only change scissor rects through the scoped helpers, and that things might break if we don't do that. But ScopedScissorRect is obviously not an option in BeginFrame. And what you say makes sense, I don't see how setting the scissor rect directly can break anything. So this looks fine, and the comment is probably not needed. (In reply to Matt Woodrow (:mattwoodrow) from comment #15) > Hrm, looks like we don't even get a drawRect() call to nsChildView when we > switch GPU. > > I think we need to add a call to CGDisplayRegisterReconfigurationCallback > (one per nsCocoaWindow? nsChildView?) and have the callback notify the > compositor that we need to composite the entire window. > > We have ClientLayerManager::SendInvalidRegion already for the latter. > > Markus, does that sound reasonable? Sounds very reasonable to me. One per nsChildView, I'd say.
Flags: needinfo?(mstange)
Comment 17•10 years ago
|
||
Ok, this version handles GPU switches fine, should handle titlebar changes (we request a drawRect call which will call SendInvalidRegion), and only changes behaviour on mac (to limit the potential regressions since I haven't tested it elsewhere).
Attachment #8486820 -
Attachment is obsolete: true
Attachment #8486820 -
Flags: review?(bgirard)
Attachment #8488414 -
Flags: review?(mstange)
Updated•10 years ago
|
Attachment #8488414 -
Flags: review?(bgirard)
Comment 18•10 years ago
|
||
Comment on attachment 8488414 [details] [diff] [review] Restrict OGL compositing to the invalid region v2 Review of attachment 8488414 [details] [diff] [review]: ----------------------------------------------------------------- ::: widget/cocoa/nsChildView.mm @@ +3719,5 @@ > + nsIntRegion region = [self nativeDirtyRegionWithBoundingRect:aRect]; > + if (mGeckoChild->GetLayerManager()->GetBackendType() == LayersBackend::LAYERS_CLIENT) { > + ClientLayerManager *client = static_cast<ClientLayerManager*>(mGeckoChild->GetLayerManager()); > + client->SendInvalidRegion(region); > + } Can you call this from nsChildView::NotifyDirtyRegion instead? I think there was a reason why calling NotifyDirtyRegion from viewWillDraw was better than calling it from drawRect, and the same reason probably applies here. Also, does this make the WindowOverlayChanged unnecessary? You can remove that then.
Attachment #8488414 -
Flags: review?(mstange) → review+
Updated•10 years ago
|
Attachment #8488414 -
Flags: review?(bgirard) → review+
Updated•10 years ago
|
Comment 19•10 years ago
|
||
Is this ready to checkin? We're getting on in the 35 beta cycle so need to know if this will be ready and low-risk enough to uplift by Mon Dec 29th.
status-firefox35:
--- → affected
status-firefox36:
--- → affected
status-firefox37:
--- → affected
tracking-firefox36:
--- → +
tracking-firefox37:
--- → +
Flags: needinfo?(mstange)
Flags: needinfo?(milan)
We have the two r+, but there is the comment 18 sounds potentially serious enough. It would definitely have us miss the Beta though, by the time people come back and this gets sorted out, it'll be the end of next week. Jeff, thoughts?
Flags: needinfo?(milan) → needinfo?(jmuizelaar)
Comment 21•9 years ago
|
||
I'm not sure this is still a problem. Boris, can you still reproduce this?
Flags: needinfo?(jmuizelaar) → needinfo?(bzbarsky)
Comment 22•9 years ago
|
||
I was not able to reproduce on 10.9 with a debug build
Comment 23•9 years ago
|
||
Between comment 20 and 22 it sounds like waiting, if not considering this WFM, is best. Wontfixing for 35.
Reporter | ||
Updated•9 years ago
|
Attachment #8485214 -
Attachment is patch: true
Attachment #8485214 -
Attachment mime type: message/rfc822 → text/plain
Reporter | ||
Comment 24•9 years ago
|
||
I can't reproduce anymore either. It's possible one of the many other followups to bug 982338 helped here...
Flags: needinfo?(bzbarsky)
Comment 25•9 years ago
|
||
At some point we decided to not go ahead with the approach in this bug because it's very brittle and because it doesn't fix the real bug. I'll mark the patches as obsolete. Why is it brittle? Because we're making assumptions about the contents of the window framebuffer at the time of compositing. Our logic will break as soon as the GL context doesn't use exactly two buffers for double-buffering and alternatingly makes us paint into each one. Consulting GLX_EXT_buffer_age instead of blindly assuming double-bufferedness would give us more certainty here, but it's unclear whether it's worth implementing. What's the real bug? Something to do with our Mac event loop. Bug 1102302 has much better steps to reproduce for a similar bug that involves a hang during a modal dialog.
Flags: needinfo?(mstange)
Comment 26•9 years ago
|
||
Comment on attachment 8488414 [details] [diff] [review] Restrict OGL compositing to the invalid region v2 see comment 25
Attachment #8488414 -
Flags: review+ → review-
Comment 27•9 years ago
|
||
Seems like it is also a wontfix for 36 & 37!
Comment 28•9 years ago
|
||
Matt, any update here? Seems like this stalled 10 months ago.
Flags: needinfo?(matt.woodrow)
Comment 29•9 years ago
|
||
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #28) > Matt, any update here? Seems like this stalled 10 months ago. Not really, Comment 25 still reflects the current state afaik.
Flags: needinfo?(matt.woodrow)
Updated•8 years ago
|
Version: unspecified → 35 Branch
Updated•6 years ago
|
Priority: -- → P3
Comment 30•2 years ago
|
||
The bug assignee is inactive on Bugzilla, so the assignee is being reset.
Assignee: matt.woodrow → nobody
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•