Closed
Bug 947227
Opened 12 years ago
Closed 11 years ago
Hang in mozilla::gfx::SurfaceStream_TripleBuffer_Async::WaitForCompositor()
Categories
(Core :: Graphics: CanvasWebGL, defect)
Core
Graphics: CanvasWebGL
Tracking
()
People
(Reporter: BenWa, Assigned: mattwoodrow)
References
Details
Attachments
(4 files)
116.32 KB,
text/plain
|
Details | |
894 bytes,
patch
|
snorp
:
review+
jgilbert
:
review-
snorp
:
feedback+
|
Details | Diff | Splinter Review |
992 bytes,
patch
|
jgilbert
:
review+
Sylvestre
:
approval-mozilla-aurora+
bajaj
:
approval-mozilla-b2g28+
|
Details | Diff | Splinter Review |
3.08 KB,
patch
|
jgilbert
:
review+
snorp
:
review+
|
Details | Diff | Splinter Review |
See attachment for where the threads are during the hang. The main thread is waiting for the compositor but the compositor is waiting for events. So we're waiting for mStaging. Either the compositor doesn't signal mMonitor or the compositor never clears it.
Reporter | ||
Comment 1•12 years ago
|
||
Flagging as tracking. Hangs are bad and I triggered this a few times while doing a lot of webgl.
tracking-firefox29:
--- → ?
Reporter | ||
Comment 2•12 years ago
|
||
I just got this again with the webgl app in the background.
Comment 4•12 years ago
|
||
Bug 944823 has stacks of all threads. Happens on linux with GL layers, too.
OS: Mac OS X → All
Hardware: x86 → All
Updated•12 years ago
|
Keywords: regressionwindow-wanted,
steps-wanted
Comment 5•12 years ago
|
||
The SurfaceStream has a frame that needs compositing in mStaging, but the compositor is not working on that. It looks like maybe the layer transaction didn't happen, for whatever reason.
Comment 6•12 years ago
|
||
Is this happening with all WebGL apps or just some? Trying to better understand how big of an impact this will have on our users.
Flags: needinfo?(snorp)
Comment 7•12 years ago
|
||
(In reply to Benjamin Kerensa [:bkerensa] from comment #6)
> Is this happening with all WebGL apps or just some? Trying to better
> understand how big of an impact this will have on our users.
It doesn't look specific at all. I just got it during webgl mochitest in a try run.
Flags: needinfo?(snorp)
Comment 8•12 years ago
|
||
We will track this since the impact on WebGL apps is high.
Updated•12 years ago
|
status-firefox29:
--- → affected
Assignee | ||
Comment 9•11 years ago
|
||
I just hit this twice in the last two days.
I think it occurs only when WebGL is running in a background tab.
I guess the canvas is still producing frames, but the compositing isn't reading them since it's not displaying that tab. Or something along those lines.
I guess ideally we'd make sure this never happens, but maybe we should add a time limit for waiting on the compositor to prevent hanging.
Reporter | ||
Comment 10•11 years ago
|
||
For me it mostly happens when I had weblg in a background tab and tried to switch back to the tab. I received a hang before the first frame of the tab was shown on the screen.
This happened a lot while developing a small webgl app, say every 10-15 mins as I would switch tab to look up documentation.
Assignee | ||
Comment 11•11 years ago
|
||
What do you think?
Attachment #8377276 -
Flags: review?(jgilbert)
Comment 12•11 years ago
|
||
Comment on attachment 8377276 [details] [diff] [review]
Don't hang
Review of attachment 8377276 [details] [diff] [review]:
-----------------------------------------------------------------
It looks like this should work. Its caller seems to handle it fine. But, doesn't this code get executed on the main thread? Won't we be blocking that?
Comment 13•11 years ago
|
||
Comment on attachment 8377276 [details] [diff] [review]
Don't hang
Maybe this is ok for now, but we should really get on bug 854421 for a proper fix.
Attachment #8377276 -
Flags: feedback+
Assignee | ||
Comment 14•11 years ago
|
||
(In reply to Jeff Gilbert [:jgilbert] from comment #12)
> Comment on attachment 8377276 [details] [diff] [review]
> Don't hang
>
> Review of attachment 8377276 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> It looks like this should work. Its caller seems to handle it fine. But,
> doesn't this code get executed on the main thread? Won't we be blocking that?
Yes, this happens on the main thread. But this bug is about where we get in a bad state and block the main thread *forever*. This patch just caps the hang to 100ms and makes us throw a frame away instead (which seems like a huge improvement in experience).
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #13)
> Comment on attachment 8377276 [details] [diff] [review]
> Don't hang
>
> Maybe this is ok for now, but we should really get on bug 854421 for a
> proper fix.
We should, but It doesn't really affect this bug. This is about us hanging forever, not just waiting for the compositor to catch up.
Comment 15•11 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #14)
> (In reply to Jeff Gilbert [:jgilbert] from comment #12)
> > Comment on attachment 8377276 [details] [diff] [review]
> > Don't hang
> >
> > Review of attachment 8377276 [details] [diff] [review]:
> > -----------------------------------------------------------------
> >
> > It looks like this should work. Its caller seems to handle it fine. But,
> > doesn't this code get executed on the main thread? Won't we be blocking that?
>
> Yes, this happens on the main thread. But this bug is about where we get in
> a bad state and block the main thread *forever*. This patch just caps the
> hang to 100ms and makes us throw a frame away instead (which seems like a
> huge improvement in experience).
>
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #13)
> > Comment on attachment 8377276 [details] [diff] [review]
> > Don't hang
> >
> > Maybe this is ok for now, but we should really get on bug 854421 for a
> > proper fix.
>
> We should, but It doesn't really affect this bug. This is about us hanging
> forever, not just waiting for the compositor to catch up.
We should take the extra day or so to figure out how to actually solve this, rather than band-aiding it with something this kludgey. The simplest fix is actually just to use the old TripleBuffer SurfStream, which doesn't bother with this. Is there a reason we don't want to do that?
Assignee | ||
Comment 16•11 years ago
|
||
Well then we won't have anything stopping us from overproducing. But yes, maybe that is the right thing to do, and we can just prevent overcompositing with bug 854421.
Assignee | ||
Comment 17•11 years ago
|
||
Attachment #8377276 -
Attachment is obsolete: true
Attachment #8377276 -
Flags: review?(jgilbert)
Attachment #8378724 -
Flags: review?(jgilbert)
Assignee | ||
Comment 18•11 years ago
|
||
Nothing uses this now, any reason to keep it around?
Attachment #8378725 -
Flags: review?(jgilbert)
Updated•11 years ago
|
Attachment #8378724 -
Flags: review?(jgilbert) → review+
Updated•11 years ago
|
Attachment #8378725 -
Flags: review?(snorp)
Attachment #8378725 -
Flags: review?(jgilbert)
Attachment #8378725 -
Flags: review+
Comment 19•11 years ago
|
||
Let's check with snorp.
Comment 20•11 years ago
|
||
Comment on attachment 8378725 [details] [diff] [review]
Remove Async surface stream
Review of attachment 8378725 [details] [diff] [review]:
-----------------------------------------------------------------
You can remove SurfaceStream_TripleBuffer::WaitForCompositor() too
Attachment #8378725 -
Flags: review?(snorp) → review+
Assignee | ||
Comment 21•11 years ago
|
||
Comment 22•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/6db7636b8b19
https://hg.mozilla.org/mozilla-central/rev/42996f3b6b75
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Comment 23•11 years ago
|
||
Matt, could you fill the uplift request to make sure it is in 29? Thanks
Flags: needinfo?(matt.woodrow)
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → matt.woodrow
Flags: needinfo?(matt.woodrow)
Assignee | ||
Comment 24•11 years ago
|
||
Comment on attachment 8378724 [details] [diff] [review]
Stop using the Async surface stream
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Unsure
User impact if declined: Hangs when using WebGL in a background tab
Testing completed (on m-c, etc.): Been on m-c for a week.
Risk to taking this patch (and alternatives if risky): Low risk, just stops us waiting on the compositor.
String or IDL/UUID changes made by this patch: None
Attachment #8378724 -
Flags: approval-mozilla-aurora?
Updated•11 years ago
|
Attachment #8378724 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 25•11 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/8191c8499760
https://hg.mozilla.org/releases/mozilla-aurora/rev/396c0660cfda
status-firefox30:
--- → fixed
Comment 26•11 years ago
|
||
Flagging for verification in case the QA team could reproduce this issue.
Keywords: verifyme
Comment 27•11 years ago
|
||
This seems to have caused a significant regression for pages such as http://azarask.in/screensaver/shaders/ (note: unless Aza has updated that page to not be webkit only, you will need to paste this into the console for it to run https://pastebin.mozilla.org/4409399 - it replaces the WebAudio calls to the standard Firefox supports)
Nightly from 2/21 runs it "fine" at something like 25-30 fps on my computer, while the one from 2/22 can only run it at around 1-3 fps
Comment 29•11 years ago
|
||
According to https://bugzilla.mozilla.org/show_bug.cgi?id=974807#c95 this patch is required in order to resolve bug 974807. That bug is 1.3+, so I'm nominating this bug as well.
blocking-b2g: --- → 1.3?
Comment 30•11 years ago
|
||
Matt,
Could you comment on the riskiness of this patch and whether it would be safe to uplift to 1.3 to fix bug 974807?
Flags: needinfo?(matt.woodrow)
Updated•11 years ago
|
Keywords: regressionwindow-wanted,
steps-wanted
Updated•11 years ago
|
blocking-b2g: 1.3? → 1.3+
Assignee | ||
Comment 32•11 years ago
|
||
(In reply to David Flanagan [:djf] from comment #30)
> Matt,
>
> Could you comment on the riskiness of this patch and whether it would be
> safe to uplift to 1.3 to fix bug 974807?
Yes, it should be safe to uplift, it's very low risk.
Flags: needinfo?(matt.woodrow)
Comment 33•11 years ago
|
||
Please request approval-mozilla-b2g28 on this patch if it's ready for uplift to v1.3.
Assignee | ||
Comment 34•11 years ago
|
||
Comment on attachment 8378724 [details] [diff] [review]
Stop using the Async surface stream
NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Unsure
User impact if declined: Hangs when using WebGL in a background tab
Testing completed (on m-c, etc.): Been on m-c for a week.
Risk to taking this patch (and alternatives if risky): Low risk, just stops us waiting on the compositor.
String or IDL/UUID changes made by this patch: None
Attachment #8378724 -
Flags: approval-mozilla-b2g28?
Flags: needinfo?(matt.woodrow)
Updated•11 years ago
|
Attachment #8378724 -
Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+
Comment 35•11 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/293d64f74d46
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/0900d08ed8c7
status-firefox28:
--- → wontfix
Updated•11 years ago
|
status-b2g-v1.3T:
--- → fixed
Comment 36•11 years ago
|
||
I tried reproducing the bug using the Firefox 28 release without success.
Without proper steps to reproduce the bug I'm unable to verify the bug.
Keywords: verifyme
Assignee | ||
Comment 37•11 years ago
|
||
Comment on attachment 8377276 [details] [diff] [review]
Don't hang
Do you want to review this, or should we wait for Jeff?
Attachment #8377276 -
Attachment is obsolete: false
Attachment #8377276 -
Flags: review?(snorp)
Comment 38•11 years ago
|
||
Comment on attachment 8377276 [details] [diff] [review]
Don't hang
Review of attachment 8377276 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good to me, but can't hurt to have jgilbert weigh in.
Attachment #8377276 -
Flags: review?(snorp)
Attachment #8377276 -
Flags: review?(jgilbert)
Attachment #8377276 -
Flags: review+
Comment 39•11 years ago
|
||
Comment on attachment 8377276 [details] [diff] [review]
Don't hang
Review of attachment 8377276 [details] [diff] [review]:
-----------------------------------------------------------------
Important nit.
::: gfx/gl/SurfaceStream.cpp
@@ +459,5 @@
> PROFILER_LABEL("SurfaceStream_TripleBuffer_Async", "WaitForCompositor");
>
> // We are assumed to be locked
> + while (mStaging) {
> + if (!NS_SUCCEEDED(mMonitor.Wait(PR_MillisecondsToInterval(100)))) {
Please comment that Monitor::Wait returns FAILURE on timeout or interrupt. A number of related functions return SUCCESS on timeout.
Attachment #8377276 -
Flags: review?(jgilbert) → review+
Comment 40•11 years ago
|
||
Comment on attachment 8377276 [details] [diff] [review]
Don't hang
Review of attachment 8377276 [details] [diff] [review]:
-----------------------------------------------------------------
Actually, this is confusing enough I'm going to r- it in this bug. Make a new bug for this, please!
Attachment #8377276 -
Flags: review+ → review-
Comment 41•11 years ago
|
||
In particular, I can no longer find where we use or would use WaitForCompositor(), so I can't check its usage.
Assignee | ||
Comment 42•11 years ago
|
||
(In reply to Jeff Gilbert [:jgilbert] from comment #41)
> In particular, I can no longer find where we use or would use
> WaitForCompositor(), so I can't check its usage.
This is because we deleted all that code...
We'd have to back out the changes made by this bug thus far, and then apply the new patch.
Assignee | ||
Comment 43•11 years ago
|
||
Also, we have about a week to get this landed and uplifted to beta before we end up shipping a severe WebGL performance regression.
Flags: needinfo?(jgilbert)
Comment 44•11 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #43)
> Also, we have about a week to get this landed and uplifted to beta before we
> end up shipping a severe WebGL performance regression.
Where is the code though? I don't see it in any of the patches in this bug. It's hard to r+ this if I can't have the whole cset here in front of me. In particular, the callers of WaitForCompositor() will need to handle a non-true retval properly, and we've had issues with callers ignoring 'always true' retvals in the past in code very similar to this.
It doesn't help anyone if the fix here doesn't end up fixing the problem. This patch *by itself* will not fix the problem if WaitForCompositor is never called, or not called properly. This is what the patch r?me'd does, so I don't feel safe r+ing it.
In general, it's confusing to do fixes to regressions inside the same bug that caused them. The normal procedure is to make a new bug for resolving the regression. This would keep things *much* clearer here, as well. This would also make it clear that I need an actual cset to review, not just half a cset and promise of a correct backout. :)
We can both fix this correctly, and fix it quickly.
Flags: needinfo?(jgilbert)
Assignee | ||
Comment 45•11 years ago
|
||
I'm not convinced that backing the entirety of this bug out and starting out is the same as fixing a regression, but filed bug 995065.
Comment 46•11 years ago
|
||
This was backed out from Aurora30 and Beta29 in bug 995065.
status-b2g-v2.0:
--- → fixed
status-firefox31:
--- → fixed
Comment 47•11 years ago
|
||
And b2g28.
You need to log in
before you can comment on or make changes to this bug.
Description
•