Closed
Bug 921323
Opened 11 years ago
Closed 11 years ago
crash in OpenGL@0x32af (MakeCurrent from OMTC)
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
VERIFIED
FIXED
mozilla27
People
(Reporter: BenWa, Assigned: mstange)
References
Details
(Keywords: crash, topcrash-mac)
Crash Data
Attachments
(1 file)
798 bytes,
patch
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-0b8faad2-fab1-48a2-bf30-aa22e2130926.
=============================================================
Comment 1•11 years ago
|
||
I think this a dup of bug 899044.
Lots of different stack traces, but always crashing in MakeCurrent.
Reporter | ||
Comment 2•11 years ago
|
||
Probably but it points back to the fullscreen crash on 10.6 for which the test is disabled on 10.7+. We're not handling some of the widget changes to the windows and it seems to be messing with our context. It's preventing us from shipping OMTC on 10.6, giving us random test failures on orange and is a mac top crasher.
Comment 3•11 years ago
|
||
Can you explain how it leads to the fullscreen crash for 10.6? That's probably an important link, but I don't see it :)
Reporter | ||
Comment 4•11 years ago
|
||
I'm going into 890997 but here it goes anyways:
Looks for call to DestroyNativeWindow:
http://mxr.mozilla.org/mozilla-central/search?string=DestroyNativeWindow
For example we call that from here:
http://mxr.mozilla.org/mozilla-central/source/widget/cocoa/nsCocoaWindow.mm#539
I'm guessing destroying the window without syncing with the compositor that has a context attached to that window isn't good. I think we need to crack down on these problems cause they aren't only affecting 10.6. We also probably want to audit for more bad things else where. Perhaps we can find the internal symbol for when a context gets dettached from the widget and trace the points where it happens.
Comment 5•11 years ago
|
||
Right. I was more wondering if there was any evidence to suggest that is what is causing this crash bug.
Comment 6•11 years ago
|
||
I can't find any correlation between the crashes and windows closing, but if that is the case, then this should help.
Attachment #810944 -
Flags: review?(bgirard)
Reporter | ||
Comment 7•11 years ago
|
||
Note that entering fullscreen or changing some the window properties recreate the window.
I'm not sure I understand how this patch is trying to help. Can you explain how you're hoping it will work. In the case where we temporarily destroy the widget we should aim to just unwind and rebind our GLContext from the old to the new widget instead of destroying and recreating all our resources.
Comment 8•11 years ago
|
||
(In reply to Benoit Girard (:BenWa) from comment #7)
> Note that entering fullscreen or changing some the window properties
> recreate the window.
>
> I'm not sure I understand how this patch is trying to help. Can you explain
> how you're hoping it will work. In the case where we temporarily destroy the
> widget we should aim to just unwind and rebind our GLContext from the old to
> the new widget instead of destroying and recreating all our resources.
That function gets called on nsChildView children of the nsCocoaWindow when we call DestroyNativeWindow. My change makes us synchronously tear down the compositor/GLContext.
I'm much more interested in not crashing than I am in optimal behaviour at this point.
Reporter | ||
Comment 9•11 years ago
|
||
At the risk of being unreasonable (in which case I'd be ok if you found someone else to review this change), I feel like this change is just a shot in the dark without studying the problem for which we have can reliably reproduce. Not only is it a shot in the dark but it's a fairly invasive change with poor justification.
In the future it's going to be hard to maintain because it's a hack. Why is this compositor specific? Why isn't it going through the shutdown part like everything else? With this poor understanding we wont be able to remove/maintain this line because we don't even understand why it's there in the first place.
(In reply to Matt Woodrow (:mattwoodrow) from comment #8)
> I'm much more interested in not crashing than I am in optimal behaviour at
> this point.
Then how much is this hurting us? It's a non trivial change. OMTC itself is just an optimization so if we prefer stability over performance (which we do) then we should disable OMTC over a hack that make performance worse and for which we're not sure fixes the initial problem.
Comment 10•11 years ago
|
||
Well I'm just entirely confused at this point.
(In reply to Benoit Girard (:BenWa) from comment #9)
> At the risk of being unreasonable (in which case I'd be ok if you found
> someone else to review this change), I feel like this change is just a shot
> in the dark without studying the problem for which we have can reliably
> reproduce.
We can? I only see an intermittent orange bug, and some low volume crash reports, without useful comments. Being able to reproduce this would make life much easier.
> Not only is it a shot in the dark but it's a fairly invasive
> change with poor justification.
>
> In the future it's going to be hard to maintain because it's a hack. Why is
> this compositor specific? Why isn't it going through the shutdown part like
> everything else? With this poor understanding we wont be able to
> remove/maintain this line because we don't even understand why it's there in
> the first place.
Isn't this what you asked me to do? You said that the crash reports pointed at it being the same issue as why we have it disabled on 10.6. I asked why you thought that, and you pointed to a particular set of calls that might cause these problems.
This patch should effectively fix that path, but as I said, I can't see any correlation between these MakeCurrent crashes and the 10.6 ones (they appear to be happening during a transition to/from fullscreen and crash in glDrawArrays). I can't see anything that would suggest windows being created/destroyed or fullscreen happening from the TBPL logs, or the crash reports.
So, if you *do* think this is the problem, then we should take the patch and find out if it changes anything. If you don't (which is fine, because I don't either), then we're back to the drawing board.
>
> (In reply to Matt Woodrow (:mattwoodrow) from comment #8)
> > I'm much more interested in not crashing than I am in optimal behaviour at
> > this point.
>
> Then how much is this hurting us? It's a non trivial change. OMTC itself is
> just an optimization so if we prefer stability over performance (which we
> do) then we should disable OMTC over a hack that make performance worse and
> for which we're not sure fixes the initial problem.
I don't think performance when switching to fullscreen or closing a window is in any way comparable to doubling scrolling throughput (which is what has been recorded for OMTC). One is an edge case, and the other is basically our biggest perf target.
Comment 11•11 years ago
|
||
Milan, do we have anyone that actually knows the cocoa widget code that could help out here?
I think OMTC is a big enough performance win that we should do everything we can to prevent it being turned off (to the point of living with a low-volume crasher, potentially).
As BenWa suggested, I'm just taking shots in the dark here with no STR and no knowledge of the coca widget code or API.
I suspect he's right that we're doing something wrong with our asynchronous interaction between GL and cocoa, but I haven't been able to find out exactly what thus far.
I can/will continue looking at this, any assistance would be appreciated though.
Flags: needinfo?(milan)
Markus, can you help?
Flags: needinfo?(mstange)
Updated•11 years ago
|
Flags: needinfo?(milan)
Comment 15•11 years ago
|
||
overall, not a topcrasher.
but for Mac its:
#2 on Fx25
#4 on Fx26
#12 on Fx27
Flags: needinfo?(twalker)
Reporter | ||
Comment 16•11 years ago
|
||
Ahh ok, I though that classified as a top crasher for the platform.
Comment 17•11 years ago
|
||
(In reply to Benoit Girard (:BenWa) from comment #16)
> Ahh ok, I though that classified as a top crasher for the platform.
Yes, it does. Sorry, I forgot to add the keyword when posting the stats.
Keywords: topcrash
Comment 18•11 years ago
|
||
Thanks!
Comment 19•11 years ago
|
||
Another bug that confirms we should consider having topcrash-$platform keywords :)
Reporter | ||
Comment 20•11 years ago
|
||
Comment on attachment 810944 [details] [diff] [review]
Destroy compositor when we close the window
Review of attachment 810944 [details] [diff] [review]:
-----------------------------------------------------------------
Let's wait and see if Markus can help out here before taking this on.
Attachment #810944 -
Flags: review?(bgirard)
Comment 21•11 years ago
|
||
Looks like Matt's on this, assigning to him.
Assignee: nobody → matt.woodrow
Assignee | ||
Comment 22•11 years ago
|
||
I'm pretty confident that the work in bug 886999, bug 914437, bug 923114 and bug 923133 will fix this, so assigning this to myself.
Assignee: matt.woodrow → mstange
Status: NEW → ASSIGNED
Comment 23•11 years ago
|
||
Current Ranking (Mac only):
* Release: N/A
* Beta: #2 (0.63%)
* Aurora: #8 (1.27%)
* Nightly: #22 (0.25%)
Reporter | ||
Comment 24•11 years ago
|
||
Me and mstange discussed this yesterday. We're going to let is slide for Beta since it's too late for the patches. This should be fixed on Nightly as of today or yesterday. We're going to uplift to Aurora after a few more days of baking.
Comment 25•11 years ago
|
||
Thanks Benoit. I'm going to call this wontfix for Fx25 given that info.
Comment 26•11 years ago
|
||
topcrash is being replaced by more precise keywords per https://bugzilla.mozilla.org/show_bug.cgi?id=927557#c3
Keywords: topcrash → topcrash-mac
Comment 27•11 years ago
|
||
Adding the signature that is showing up on 10.9 as well, since it appears to be the same crash.
Crash Signature: [@ OpenGL@0x32af] → [@ OpenGL@0x32af]
[@ OpenGL@0x3ac8]
Comment 28•11 years ago
|
||
(In reply to Benoit Girard (:BenWa) from comment #24)
> Me and mstange discussed this yesterday. We're going to let is slide for
> Beta since it's too late for the patches. This should be fixed on Nightly as
> of today or yesterday. We're going to uplift to Aurora after a few more days
> of baking.
Benoit - did anything land on nightly that can be nominated for uplift on this bug?
Flags: needinfo?(bgirard)
Updated•11 years ago
|
Crash Signature: [@ OpenGL@0x32af]
[@ OpenGL@0x3ac8] → [@ OpenGL@0x32af ]
[@ OpenGL@0x3ac8 ]
[@ OpenGL@0x4133 ]
Reporter | ||
Comment 29•11 years ago
|
||
Should we uplift your patches for this to aurora now?
Flags: needinfo?(bgirard) → needinfo?(mstange)
Comment 30•11 years ago
|
||
Too late now for speculative uplifts to Beta, marking wontfix for 26.
status-firefox28:
--- → affected
Assignee | ||
Comment 31•11 years ago
|
||
This was fixed for Firefox 27 by one or more patches in this push (probably those from bug 914437): https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=30c25c68bb3e
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Depends on: 914437
Flags: needinfo?(mstange)
Resolution: --- → FIXED
Updated•11 years ago
|
Target Milestone: --- → mozilla27
Comment 33•11 years ago
|
||
No crashes were recorded on 27 beta based on Socorro in the last 3 weeks for:
- [@ OpenGL@0x32af ] signature: http://goo.gl/1tZToV
- [@ OpenGL@0x3ac8 ] signature: http://goo.gl/yqQsYm
- [@ OpenGL@0x4133 ] signature: http://goo.gl/NiYaV2
No crashes were recorded on Aurora 28.0a2 based on Socorro in the last 3 weeks for:
- first signature: http://goo.gl/kJRbAX
- second signature: http://goo.gl/Uetq2r
- third signature: http://goo.gl/Lpyiha
Status: RESOLVED → VERIFIED
Keywords: verifyme
You need to log in
before you can comment on or make changes to this bug.
Description
•