Closed
Bug 970916
Opened 10 years ago
Closed 8 years ago
Compositor sometimes hangs in CompositorOGL::DrawQuad
Categories
(Core :: Graphics, defect, P5)
Tracking
()
People
(Reporter: snorp, Unassigned)
References
Details
According to hang reports (http://darchons.github.io/hang-telemetry-dashboard/bhr.html), we are frequently hanging the compositor in CompositorOGL::DrawQuad. I don't know how long the hangs last (the tool is having trouble at the moment), but this shouldn't really happen.
Reporter
Updated•10 years ago
status-firefox30:
--- → affected
tracking-firefox30:
--- → ?
Comment 1•10 years ago
CJ, what do you think about somebody from your team investigating this as a good Android introduction?
Flags: needinfo?(cku)
Reporter
Comment 2•10 years ago
This might affect b2g too, we just don't have hang detection there yet.
Comment 3•10 years ago
Looks like we didn't enable OMTC on Android. Prepare environment to check.
Flags: needinfo?(pchang)
Comment 4•10 years ago
(In reply to peter chang[:pchang][:peter] from comment #3)
> Looks like we didn't enable OMTC on Android. Prepare environment to check.
Why do you say that?
Comment 5•10 years ago
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #4)
> (In reply to peter chang[:pchang][:peter] from comment #3)
> > Looks like we didn't enable OMTC on Android. Prepare environment to check.
> Why do you say that?
I said that because I saw the call stacks of several cases in the hang report system and thought they were dumped from the main thread, but I was wrong.
[one hang case]
Timer.Fire: Startup.XRE_Main: Gecko:
[compositor hang case]
CompositorOGL.DrawQuad: ThebesLayerComposite.RenderLayer: LayerManagerComposite.Render: CompositorParent.Composite: Compositor:
Bug 725095 already enabled OMTC on fennec.
BTW, are we able to know which line caused this hang problem?
Reporter
Comment 6•10 years ago
(In reply to peter chang[:pchang][:peter] from comment #5)
> BTW, are we able to know which line caused this hang problem?
Not right now, and I doubt we'll ever have that information unless someone is able to reproduce it under gdb. Is that right, Jim?
Flags: needinfo?(nchen)
Comment 7•10 years ago
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #6)
> (In reply to peter chang[:pchang][:peter] from comment #5)
> > BTW, are we able to know which line caused this hang problem?
> Not right now, and I doubt we'll ever have that information unless someone
> is able to reproduce it under gdb. Is that right, Jim?
That's right for now, but once bug 938157 (new unwinding library) lands, we will be able to get frame-by-frame hang stacks.
BTW, the hang times plot in the dashboard is now fixed.
Flags: needinfo?(nchen)
Comment 8•10 years ago
(In reply to Jim Chen [:jchen :nchen] from comment #7)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #6)
> > (In reply to peter chang[:pchang][:peter] from comment #5)
> > > BTW, are we able to know which line caused this hang problem?
> > Not right now, and I doubt we'll ever have that information unless someone
> > is able to reproduce it under gdb. Is that right, Jim?
> That's right for now, but once bug 938157 (new unwinding library) lands, we
> will be able to get frame-by-frame hang stacks.
> BTW, the hang times plot in the dashboard is now fixed.
Having the line info would be great.
Jim, I have a question: how do we define when the app is in a "hang", and where does the thread stack in comment 5 come from? For fennec, is it from the ANR log? If yes, how could I get the original ANR log? If so, where can I get the
Comment 9•10 years ago
(In reply to peter chang[:pchang][:peter] from comment #8)
> Jim, I have a question: how do we define when the app is in a "hang", and
> where does the thread stack in comment 5 come from? For fennec, is it from
> the ANR log? If yes, how could I get the original ANR log?
For the Compositor thread, a hang is defined as any event that takes more than 128ms to execute. The stack is from the pseudo-stack used by the Gecko profiler [1]. It is an internal Gecko structure and is not related to the ANR traces.
[1] https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler#Native_stack_vs._Pseudo_stack
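As a rough illustration of the definition above (not Gecko's actual hang reporter or profiler code), the Python sketch below treats any labelled scope that runs longer than 128ms as a hang and records the label stack innermost-first, in the same order as the stacks quoted in comment 5. The names HangMonitor, FakeClock and label are hypothetical, and the clock is injectable only so the example can run without sleeping.

```python
import time
from contextlib import contextmanager

HANG_THRESHOLD_MS = 128  # figure quoted above for the Compositor thread


class HangMonitor:
    """Hypothetical sketch; this is not the Gecko hang reporter or profiler API."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock     # injectable so the sketch can be exercised without sleeping
        self._labels = []       # pseudo-stack: labels pushed by instrumented code
        self._reported = False  # at most one report per top-level event
        self.hangs = []         # list of (innermost-first labels, duration in ms)

    @contextmanager
    def label(self, name):
        self._labels.append(name)
        start = self._clock()
        try:
            yield
        finally:
            elapsed_ms = (self._clock() - start) * 1000.0
            if elapsed_ms > HANG_THRESHOLD_MS and not self._reported:
                # Inner scopes exit first, so this is the deepest slow label.
                self.hangs.append((list(reversed(self._labels)), elapsed_ms))
                self._reported = True
            self._labels.pop()
            if not self._labels:
                self._reported = False  # the top-level event has finished


class FakeClock:
    """Manually advanced clock, in seconds, standing in for time.monotonic."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance_ms(self, ms):
        self.now += ms / 1000.0


clock = FakeClock()
monitor = HangMonitor(clock=clock)

# One slow composite (200ms spent inside DrawQuad) ...
with monitor.label("Compositor"):
    with monitor.label("CompositorParent.Composite"):
        with monitor.label("CompositorOGL.DrawQuad"):
            clock.advance_ms(200)

# ... followed by one fast event that stays under the threshold.
with monitor.label("Compositor"):
    clock.advance_ms(100)

for labels, duration_ms in monitor.hangs:
    print(": ".join(labels), f"({duration_ms:.0f}ms)")
```

Because the labels are pushed by instrumented code rather than unwound from the native stack, a report like this can name DrawQuad but cannot say which line inside it was slow.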
Comment 10•10 years ago
Could bug 974054 be related?
Comment 11•10 years ago
I just checked the data from the past week and didn't see the DrawQuad hang. I will keep monitoring for one more week to see whether this is still an issue.
Comment 12•10 years ago
Based on comment 11, we are not going to track this now. Peter, if next week's data shows the hang and you believe it needs tracking, please re-nominate.
Reporter
Comment 13•10 years ago
:jchen indicates that the data might not be terribly accurate this week, so I think we should continue looking into this bug.
Updated•10 years ago
Comment 14•10 years ago
Do you think we might have some more data to look at by EOW so we can make a decision on tracking?
Flags: needinfo?(snorp)
Reporter
Comment 15•10 years ago
(In reply to Benjamin Kerensa [:bkerensa] from comment #14)
> Do you think we might have some more data to look at by EOW so we can make a
> decision on tracking?
I don't really know. Jim, is that fixed now?
Flags: needinfo?(snorp) → needinfo?(nchen)
Reporter
Comment 16•10 years ago
(In reply to Andreas Gal :gal from comment #10)
> Could bug 974054 be related?
Hah, that is funny. I wonder if that did indeed fix this.
Comment 17•10 years ago
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #15)
> (In reply to Benjamin Kerensa [:bkerensa] from comment #14)
> > Do you think we might have some more data to look at by EOW so we can make a
> > decision on tracking?
> I don't really know. Jim, is that fixed now?
Yeah it's mostly fixed now and data from last week are in. Unfortunately DrawQuad still seems to be the top hang for Fennec. Looking at the build ids, it's hard to see, but there doesn't seem to be a decrease in hangs after bug 974054 landed. :(
Flags: needinfo?(nchen)
Comment 18•10 years ago
James, I guess it's unclear from comment 17 whether this has resolved the issue enough that it no longer warrants tracking. Do you believe this should be a blocker?
Flags: needinfo?(snorp)
Reporter
Comment 19•10 years ago
(In reply to Benjamin Kerensa [:bkerensa] from comment #18)
> James,
> I guess it's unclear from comment 17 whether this has resolved the issue
> enough that it no longer warrants tracking. Do you believe this should be a
> blocker?
We need to track this. Comment #17 is saying that the hang tool is fixed now, not that this bug is fixed. The latest hang data from last week indicates we still frequently have a hang here, with about half of them taking longer than 255ms.
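To make the "about half longer than 255ms" figure concrete, here is a small Python sketch of how such a share could be computed from a list of hang durations. The sample values are made up for illustration and are not taken from the dashboard; grouping into power-of-two buckets (128, 256, 512, ...) is likewise an assumption about how the data might be binned, not something stated in this bug.

```python
from collections import Counter


def bucket_lower_bound(duration_ms):
    """Lower bound of the power-of-two bucket holding duration_ms (e.g. 128, 256, 512)."""
    if duration_ms < 1:
        raise ValueError("duration must be at least 1ms")
    return 1 << (int(duration_ms).bit_length() - 1)


def summarize(durations_ms, cutoff_ms=255):
    """Return (histogram keyed by bucket lower bound, share of hangs above cutoff_ms)."""
    histogram = Counter(bucket_lower_bound(d) for d in durations_ms)
    above = sum(1 for d in durations_ms if d > cutoff_ms)
    share = above / len(durations_ms) if durations_ms else 0.0
    return dict(histogram), share


# Made-up durations in milliseconds, all above the 128ms hang threshold.
sample = [130, 200, 250, 300, 600, 1100]
histogram, share_above_255 = summarize(sample)

for lower in sorted(histogram):
    print(f"{lower}-{2 * lower - 1}ms: {histogram[lower]}")
print(f"share above 255ms: {share_above_255:.0%}")
```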
Flags: needinfo?(snorp)
Updated•10 years ago
tracking-fennec: --- → 30+
Reporter
Comment 21•10 years ago
Peter, are you working on this?
Assignee: nobody → pchang
tracking-fennec: 30+ → ---
Updated•10 years ago
tracking-fennec: --- → +
Updated•10 years ago
tracking-fennec: + → 30+
Comment 22•10 years ago
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #21)
> Peter, are you working on this?
James, I just created bug 983540 for high compositor CPU usage on b2g. The CPU bottlenecks there match the report at http://darchons.github.io/hang-telemetry-dashboard/bhr.html.
Do you think they are related?
Reporter
Comment 23•10 years ago
(In reply to peter chang[:pchang][:peter] from comment #22)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #21)
> > Peter, are you working on this?
> James, I just created bug 983540 for high compositor CPU usage on b2g. The
> CPU bottlenecks there match the report at
> http://darchons.github.io/hang-telemetry-dashboard/bhr.html.
> Do you think they are related?
Possible, I suppose.
Comment 24•10 years ago
Assuming that Peter is indeed working on this. Please flip flags accordingly if that's not the case.
Status: NEW → ASSIGNED
Comment 25•10 years ago
(In reply to Jim Chen [:jchen :nchen] from comment #7)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #6)
> > (In reply to peter chang[:pchang][:peter] from comment #5)
> > > BTW, are we able to know which line caused this hang problem?
> > Not right now, and I doubt we'll ever have that information unless someone
> > is able to reproduce it under gdb. Is that right, Jim?
> That's right for now, but once bug 938157 (new unwinding library) lands, we
> will be able to get frame-by-frame hang stacks.
> BTW, the hang times plot in the dashboard is now fixed.
I would like to wait for bug 938157 to land to see which line causes the hang, because in bug 983540 comment 0 the stack for DrawQuad looks related to the driver implementation, as does the hang in CompositorOGL::EndFrame.
Comment 26•10 years ago
(In reply to peter chang[:pchang][:peter] from comment #25)
> (In reply to Jim Chen [:jchen :nchen] from comment #7)
> > (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #6)
> > > (In reply to peter chang[:pchang][:peter] from comment #5)
> > > > BTW, are we able to know which line caused this hang problem?
> > > Not right now, and I doubt we'll ever have that information unless someone
> > > is able to reproduce it under gdb. Is that right, Jim?
> > That's right for now, but once bug 938157 (new unwinding library) lands, we
> > will be able to get frame-by-frame hang stacks.
> > BTW, the hang times plot in the dashboard is now fixed.
> I would like to wait for bug 938157 to land to see which line causes the
> hang, because in bug 983540 comment 0 the stack for DrawQuad looks related
> to the driver implementation, as does the hang in CompositorOGL::EndFrame.
Peter - bug 938157 landed in FF31 and we're a couple of weeks from shipping FF30 - is this on your radar for this week?
Flags: needinfo?(pchang)
Comment 27•10 years ago
(In reply to Jim Chen [:jchen :nchen] from comment #7)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #6)
> > (In reply to peter chang[:pchang][:peter] from comment #5)
> > > BTW, are we able to know which line caused this hang problem?
> > Not right now, and I doubt we'll ever have that information unless someone
> > is able to reproduce it under gdb. Is that right, Jim?
> That's right for now, but once bug 938157 (new unwinding library) lands, we
> will be able to get frame-by-frame hang stacks.
> BTW, the hang times plot in the dashboard is now fixed.
Jim, I just checked the latest hang report but I didn't see the line info. Do you know where I can get the line info from the hang report?
Flags: needinfo?(pchang) → needinfo?(nchen)
Comment 28•10 years ago
Sorry, we don't have native stack support yet. I just filed bug 1016629 and I'll be working on it soon.
Flags: needinfo?(nchen)
Comment 29•10 years ago
We're now too late to get this into Firefox 30. Marking affected for 31/32 and carrying forward tracking to try and target a landing in FF31.
status-firefox31:
--- → affected
status-firefox32:
--- → affected
tracking-firefox31:
--- → +
tracking-firefox32:
--- → +
Updated•10 years ago
tracking-fennec: 30+ → 31+
Comment 31•10 years ago
Not much progress so far, but I'm going back to working on it today.
Flags: needinfo?(nchen)
Comment 33•10 years ago
Untracking. We won't block the 31 release because of this bug. However, we could take a patch for 32.
Updated•10 years ago
tracking-fennec: 31+ → 32+
Comment 34•10 years ago
To really get this going, we at least need bug 1016629, which just landed on the trunk. So unless we can glean why this is happening some other way, I think that means 34 is the earliest release where we could act on this?
Comment 35•10 years ago
I'm marking tracking- for Firefox 32/33 based on comment 34. With the current state, I think it's unrealistic to expect that we're going to ship a fix in 32. I have left 33 as affected in case there is a way to uplift.
status-firefox34:
--- → affected
tracking-firefox34:
--- → +
Comment 36•10 years ago
Looking at the latest BHR data, it seems the hangs are happening in the graphics driver, so I'm not sure there's much we can do at the moment.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(nchen)
Resolution: --- → WONTFIX
Comment 37•10 years ago
I'm not convinced we should just close this if there is a hang in the driver. Maybe we're getting the driver into a bad state or giving it bad input. Even if not, we should at least open a driver bug and point this to it. Unless the volume of hangs has gone down, and now we don't care?
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Comment 38•10 years ago
I couldn't load last week's results from http://darchons.github.io/hang-telemetry-dashboard/bhr.html . Is anyone able to load that page?
Comment 39•10 years ago
The dashboard is now at http://telemetry.mozilla.org/hang/bhr/. So for CompositorOGL::DrawQuad, the native stack shows it's inside the graphics driver.
Reporter
Updated•10 years ago
tracking-fennec: 32+ → +
Comment 41•10 years ago
This bug is now 9+ months old. We have shipped many releases with this bug. The Android team has assigned this priority P5. I don't see the value in continuing to track this bug.
tracking-firefox34:
+ → ---
Comment 42•10 years ago
Unassigning myself since I don't have time to work on this bug now.
Assignee: pchang → nobody
Comment 43•9 years ago
It's now been a year since we stopped tracking this. I suggest this bug report should get closed if we're not going to fix it.
Comment 44•8 years ago
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #43)
> It's now been a year since we stopped tracking this. I suggest this bug
> report should get closed if we're not going to fix it.
I am finally closing this bug report as it's been several months with no objections to the above. Please reopen if we're going to fix it.
Status: NEW → RESOLVED
Closed: 10 years ago → 8 years ago
Resolution: --- → WONTFIX