Closed
Bug 884047
Opened 12 years ago
Closed 11 years ago
crash in mozilla::gl::GLLibraryEGL::fMakeCurrent @ libEGL
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
RESOLVED
FIXED
mozilla28
People
(Reporter: scoobidiver, Unassigned)
References
Details
(Keywords: crash, regression, topcrash-android-armv7, Whiteboard: [native-crash][fixed in 28+ by bug 925608])
Crash Data
Attachments
(1 file)
6.87 KB,
patch
|
blassey
:
review+
akeybl
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
It first showed up in 24.0a1/20130610. The regression range might be (low volume):
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=efbe547a7972&tochange=252b1ac4d537
Signature mozilla::gl::GLLibraryEGL::fMakeCurrent(void*, void*, void*, void*) More Reports Search
UUID 23f50911-f491-4d7c-939f-638952130617
Date Processed 2013-06-17 07:07:29
Uptime 70
Install Age 14.2 hours since version was first installed.
Install Time 2013-06-16 16:53:17
Product FennecAndroid
Version 24.0a1
Build ID 20130616031139
Release Channel nightly
OS Android
OS Version 0.0.0 Linux 3.0.31-889555 #3 SMP PREEMPT Thu Jan 31 14:48:54 KST 2013 armv7l samsung/GT-I9100/GT-I9100:4.1.2/JZO54K/I9100XWLSD:user/release-keys
Build Architecture arm
Build Architecture Info ARMv0
Crash Reason SIGSEGV
Crash Address 0x640064
App Notes
AdapterDescription: 'ARM -- Mali-400 MP -- OpenGL ES 2.0 -- Model: GT-I9100, Product: GT-I9100, Manufacturer: samsung, Hardware: smdk4210'
GL Layers! EGL? EGL+ GL Context? GL Context+ GL Layers+
samsung GT-I9100
samsung/GT-I9100/GT-I9100:4.1.2/JZO54K/I9100XWLSD:user/release-keys
Processor Notes sp-processor01_phx1_mozilla_com_15474:2012; non-integer value of "SecondsSinceLastCrash"; exploitability tool: ERROR: unable to analyze dump
EMCheckCompatibility True
Adapter Vendor ID ARM
Adapter Device ID Mali-400 MP
Device samsung GT-I9100
Android API Version 16 (REL)
Android CPU ABI armeabi-v7a
Frame Module Signature Source
0 libEGL.so libEGL.so@0xabf4
1 libEGL.so libEGL.so@0xa9e3
2 libEGL.so libEGL.so@0x2212a
3 libEGL.so libEGL.so@0x22106
4 libEGL.so libEGL.so@0xc995
5 libEGL.so libEGL.so@0x22106
6 libEGL.so libEGL.so@0xc8a7
7 libxul.so mozilla::gl::GLLibraryEGL::fMakeCurrent gfx/gl/GLLibraryEGL.h:164
8 libxul.so mozilla::gl::GLContextEGL::RenewSurface gfx/gl/GLContextProviderEGL.cpp:547
9 libxul.so mozilla::layers::CompositorOGL::Resume gfx/layers/opengl/CompositorOGL.cpp:1396
10 libxul.so mozilla::layers::CompositorOGL::Pause gfx/layers/opengl/CompositorOGL.cpp:1388
11 libxul.so mozilla::layers::CompositorParent::ResumeComposition gfx/layers/ipc/CompositorParent.cpp:299
12 libxul.so mozilla::layers::CompositorParent::SetEGLSurfaceSize gfx/layers/ipc/CompositorParent.cpp:333
13 libxul.so RunnableMethod<mozilla::ipc::AsyncChannel, void ipc/chromium/src/base/tuple.h:400
14 libxul.so mozilla::layers::CompositorParent::SetEGLSurfaceSize gfx/layers/ipc/CompositorParent.cpp:333
15 libxul.so MessageLoop::RunTask ipc/chromium/src/base/message_loop.cc:337
16 libxul.so MessageLoop::DeferOrRunPendingTask ipc/chromium/src/base/message_loop.cc:345
17 libxul.so MessageLoop::DoWork ipc/chromium/src/base/message_loop.cc:445
18 libxul.so base::MessagePumpDefault::Run ipc/chromium/src/base/message_pump_default.cc:23
19 libxul.so MessageLoop::RunInternal ipc/chromium/src/base/message_loop.cc:219
20 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:212
21 libxul.so base::Thread::ThreadMain ipc/chromium/src/base/thread.cc:160
More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Agl%3A%3AGLLibraryEGL%3A%3AfMakeCurrent%28void*%2C+void*%2C+void*%2C+void*%29
Reporter | ||
Comment 1•12 years ago
|
||
More reports also at:
https://crash-stats.mozilla.com/report/list?product=FennecAndroid&signature=_ZN7android13egl_display_t11loseCurrentEPNS_13egl_context_tE
Crash Signature: [@ mozilla::gl::GLLibraryEGL::fMakeCurrent(void*, void*, void*, void*)] → [@ mozilla::gl::GLLibraryEGL::fMakeCurrent(void*, void*, void*, void*)]
[@ _ZN7android13egl_display_t11loseCurrentEPNS_13egl_context_tE ]
Summary: crash in mozilla::gl::GLLibraryEGL::fMakeCurrent @ libEGL → crash in mozilla::gl::GLLibraryEGL::fMakeCurrent @ _ZN7android13egl_display_t11loseCurrentEPNS_13egl_context_tE
Reporter | ||
Comment 2•12 years ago
|
||
I am not sure both signatures are related so I am removing the added one.
Crash Signature: [@ mozilla::gl::GLLibraryEGL::fMakeCurrent(void*, void*, void*, void*)]
[@ _ZN7android13egl_display_t11loseCurrentEPNS_13egl_context_tE ] → [@ mozilla::gl::GLLibraryEGL::fMakeCurrent(void*, void*, void*, void*)]
status-firefox25:
--- → affected
Summary: crash in mozilla::gl::GLLibraryEGL::fMakeCurrent @ _ZN7android13egl_display_t11loseCurrentEPNS_13egl_context_tE → crash in mozilla::gl::GLLibraryEGL::fMakeCurrent @ libEGL
Reporter | ||
Comment 3•12 years ago
|
||
It's #5 crasher in 24.0a2 and #15 in 25.0a1.
Comment 4•12 years ago
|
||
Kairo looks like "Device samsung GT-I9100" would be one of the hardware here are there any other device/OS correlations for QA to try and reproduce here ?
Also adding needsinfo on :snorp to see if anything stands out from changeset as a culprit or if there were any major graphic related landings in FX24 which may have caused this.
Flags: needinfo?(snorp)
Flags: needinfo?(kairo)
Comment 5•12 years ago
|
||
Nope I don't know of anything specifically, though I think maybe there were some compositor pause/resume changes? Kats?
Flags: needinfo?(snorp) → needinfo?(bugmail.mozilla)
![]() |
||
Comment 6•12 years ago
|
||
Crashes in a week of Aurora:
mozilla::gl::GLLibraryEGL::fMakeCurrent(void*, void*, void*, void*) 6
Samsung GT-I9300 2
Samsung GT-I9100 2
Unknown Android 1
HTC One 1
Flags: needinfo?(kairo)
Updated•12 years ago
|
Assignee: nobody → bugmail.mozilla
tracking-fennec: ? → 24+
Comment 7•12 years ago
|
||
There's nothing related in the regression range in comment 0, as far as I can tell. I don't think anything related to compositor pause/resume landed in the aurora timeframe, actually.
I looked at the crash dump and the code and the code still makes sense to me (i.e. it has changed in unexpected ways). The UI thread should be blocked until the compositor renews the surface (or fails to do so) so the surface shouldn't disappear while in the middle of the compositor renew call. I also looked at a couple of submitted logcats to see if there was anything weird there and I didn't see anything that stood out.
I can stick some asserts and such in to try to get a better feel for what's going on but it's going to be slow and painful debugging via crash-stats. If we can get STR on this it would help a lot.
Updated•12 years ago
|
Flags: needinfo?(bugmail.mozilla)
Keywords: qawanted → steps-wanted
Comment 9•12 years ago
|
||
Hey Kats,
I've looked at the crash-reports and there are no URLS or bug comments that would help QA to repro or get STR here.Do we have any specific ideas as to under what conditions this code path may be hit so QA can invest sometime in those scenario's else they have nothing else to get the steps here ?
Updated•12 years ago
|
Comment 10•12 years ago
|
||
The code that is shown in comment 0 runs pretty much only when the GL surface that web content is shown on is resized or created/destroyed. So in terms of testing I would try various combinations of:
1) Rotating the device (this resizes the surface)
2) Showing/hiding the virtual keyboard (this also resizes the surface)
3) Putting web content in the background, either by putting Fennec in the background, or by bringing up other content on top of it, like the awesomescreen or the settings pane. The surface is destroyed when the activity is in the background and is re-created when it comes back to the foreground.
Combinations of the above, probably with very little time in between, would be the best to try and reproduce this.
Reporter | ||
Updated•12 years ago
|
Crash Signature: [@ mozilla::gl::GLLibraryEGL::fMakeCurrent(void*, void*, void*, void*)] → [@ mozilla::gl::GLLibraryEGL::fMakeCurrent(void*, void*, void*, void*)]
[@ @0x0 | mozilla::gl::GLLibraryEGL::fMakeCurrent(void*, void*, void*, void*) ]
Reporter | ||
Comment 11•12 years ago
|
||
Here is a comment: "Attempt to load phone number as page. Get error page. Attempt to Google for number using suggest picker. Crash."
Comment 12•12 years ago
|
||
Adding qawanted to give this bug a shot for steps-wanted given comment #10/11.
Keywords: qawanted
Comment 14•12 years ago
|
||
No such luck having spent more than two hours using Beta on my GT-I9100 surfing, rotating and trying to trigger this crash.
Flags: needinfo?(aaron.train)
Comment 15•12 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #7)
> There's nothing related in the regression range in comment 0, as far as I
> can tell. I don't think anything related to compositor pause/resume landed
> in the aurora timeframe, actually.
>
> I looked at the crash dump and the code and the code still makes sense to me
> (i.e. it has changed in unexpected ways). The UI thread should be blocked
> until the compositor renews the surface (or fails to do so) so the surface
> shouldn't disappear while in the middle of the compositor renew call. I also
> looked at a couple of submitted logcats to see if there was anything weird
> there and I didn't see anything that stood out.
>
> I can stick some asserts and such in to try to get a better feel for what's
> going on but it's going to be slow and painful debugging via crash-stats. If
> we can get STR on this it would help a lot.
Kats, given QA is tried hard but is unable to reproduce this :(, can we please try some investigation by sticking some asserts as you suggest.
Comment 16•12 years ago
|
||
Is it possible for a nested event loop (on the main thread) to cause the state issue you're seeing?
Updated•12 years ago
|
Flags: needinfo?(bugmail.mozilla)
Comment 17•12 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #16)
> Is it possible for a nested event loop (on the main thread) to cause the
> state issue you're seeing?
No, I don't think so. The Java UI thread should be blocked for the surface resume to finish on the compositor thread so even if the main thread goes and does other things it shouldn't unblock the UI thread, and the surface shouldn't disappear.
(In reply to bhavana bajaj [:bajaj] from comment #15)
> Kats, given QA is tried hard but is unable to reproduce this :(, can we
> please try some investigation by sticking some asserts as you suggest.
Ok, I'll spend some time putting together a patch for this.
Flags: needinfo?(bugmail.mozilla)
Comment 18•12 years ago
|
||
Attachment #799117 -
Flags: review?(snorp)
Updated•12 years ago
|
Attachment #799117 -
Flags: review?(snorp) → review+
Comment 19•12 years ago
|
||
Whiteboard: [native-crash] → [native-crash][leave open]
Updated•12 years ago
|
Keywords: steps-wanted
Updated•12 years ago
|
Comment 21•12 years ago
|
||
Note to self: none of the crashes with this signature up to Sep 11, 2013 18:26 on builds with this patch have logcat info.
Comment 22•12 years ago
|
||
Comment on attachment 799117 [details] [diff] [review]
Diagnostic logging
We want to uplift this to 25 (which goes to beta next week) so that hopefully the higher user population there will give us more data faster.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): unknown
User impact if declined: we'll have to wait longer to get useful info from this patch
Testing completed (on m-c, etc.): on m-c
Risk to taking this patch (and alternatives if risky): low risk, mobile only
String or IDL/UUID changes made by this patch: none
Attachment #799117 -
Flags: approval-mozilla-aurora?
Updated•12 years ago
|
Attachment #799117 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 23•12 years ago
|
||
Comment 24•12 years ago
|
||
Thanks!
Comment 25•12 years ago
|
||
kats, did you get the data you needed? What are the next steps here?
Flags: needinfo?(bugmail.mozilla)
Comment 26•12 years ago
|
||
I looked through all the crash reports in the last week and found a few with logcat:
https://crash-stats.mozilla.com/report/index/10bad6e2-9181-42eb-ab3c-09ffc2131003
https://crash-stats.mozilla.com/report/index/7494eef9-3221-444f-b97f-decd92131006
https://crash-stats.mozilla.com/report/index/ead10324-24de-4664-b004-08b692131003
https://crash-stats.mozilla.com/report/index/f97fe05b-34d3-4f23-a3f8-76e692131002
https://crash-stats.mozilla.com/report/index/ebcd6f31-f4a2-40fa-ac32-bbee92131002
A couple of these have a bunch of GL error 0x506 messages but many do not. The first crash report listed above is the simplest, in that things are working fine until we get one call to resumeCompositor and then it crashes shortly after. The resumeCompositor codepath is run because the surface changes size. This is something we normally handle just fine, and so based on the code and output I don't think this is a problem in the Java code managing the compositor. Somebody more familiar with the C++ GL code needs to take over here, I think. bjacob, do you have any spare cycles to look at this?
Flags: needinfo?(bugmail.mozilla) → needinfo?(bjacob)
Comment 27•12 years ago
|
||
I have to say that I'm not at all an expert of the surface-renewal code around here and I just happened to be the sucker who worked on bug 90020 ;-) !
With that said, I made a debug build and tried reproducing but ran into an assertion in the JS engine, which I filed as bug 925024.
At the moment I'm trying to debug past that by disabling that assertion, but, no promises.
Comment 29•12 years ago
|
||
I tried really hard to reproduce in a non-optimized debug build following the suggestions in comment 10. I really can't reproduce.
Comment 30•12 years ago
|
||
I don't think we'll be able to reproduce (removing qawanted), which is why we landed the logcat patch. Any hardening or speculative fixes we can try? Or does the baton need to be passed?
Keywords: qawanted
Comment 31•12 years ago
|
||
Can I see the logcat for some crash? Comment 26 alludes to that, but I don't understand how to view the logcat.
Comment 32•12 years ago
|
||
(As discussed on IRC you can view the logcat from the raw JSON dump on the "raw dump" tab of the crash report. You need to be logged in to crash-stats and have some magic permission bit set in your LDAP.)
Comment 33•12 years ago
|
||
(In reply to Alex Keybl [:akeybl] from comment #30)
> Any hardening or speculative fixes we can try?
I got nothing :(
Comment 34•12 years ago
|
||
So, one thing that is suspiscious here is that in GLContextEGL::RenewSurface we are exiting without doing anything if we already had a surface, and that is happening in particular on orientation change.
But I don't currently understand whether that is normal or not, and I still can't reproduce crashes. So the way forward is probably to first understand how these things work and what we should be doing in theory, and what we are doing in practice. Filed bug 925608 for that. Once we understand that clearly, maybe we'll have ideas of patches to try here.
Comment 35•12 years ago
|
||
Just filed bug 925645 which may or may not be related.
Comment 36•11 years ago
|
||
topcrash is being replaced by more precise keywords per https://bugzilla.mozilla.org/show_bug.cgi?id=927557#c3
Keywords: topcrash → topcrash-android-armv7
Updated•11 years ago
|
Updated•11 years ago
|
tracking-fennec: 24+ → ?
Updated•11 years ago
|
tracking-fennec: ? → +
![]() |
||
Comment 37•11 years ago
|
||
This is one of the major regressions that has made our crash rate worsen between the 23 and 24 releases, and it's now the #5 topcrash on 25 release. Any chance for progress here?
Comment 38•11 years ago
|
||
Benoit Jacob has been looking into a family of related problems with our surface renewal. I expect that his changes in bug 925608 will fix this.
Assignee: bugmail.mozilla → nobody
Comment 39•11 years ago
|
||
Now that bug 925608 is fixed, I would appreciate testing on the latest Nightly build to see if the issues reported here still persist.
![]() |
||
Comment 40•11 years ago
|
||
We'll need to watch crash-stats to verify, of course, as this was filed "just" based on data.
Comment 41•11 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #40)
> We'll need to watch crash-stats to verify, of course, as this was filed
> "just" based on data.
Any news here? Is this still a topcrash worth tracking?
Flags: needinfo?(kairo)
Comment 42•11 years ago
|
||
I don't see this crash on 28 or 27 but that is not helpful as only Firefox 28 has bug 925608 fixed. So this would need to be in at least beta to get the numbers of users to rank this crash.
Flags: needinfo?(kairo)
Comment 43•11 years ago
|
||
We're now heading to final Beta so we'll have to wontfix this over taking last-minute speculative landings to the branch. Since it did ship in the past two releases as well, we know we can live with this until we ship the fix in bug 925608 (which should probably be uplifted to Aurora).
![]() |
||
Comment 44•11 years ago
|
||
Actually, according to https://bugzilla.mozilla.org/show_bug.cgi?id=925608#c43 we will not uplift to Aurora and let it ride on 28.
Comment 45•11 years ago
|
||
According to crash-stats there are no crashes on 28.0b*, 29* or 30* in the last 4 weeks with fMakeCurrent in the signature. I think we can call this fixed by 925608.
Status: NEW → RESOLVED
Closed: 11 years ago
status-firefox27:
--- → wontfix
status-firefox28:
--- → fixed
No longer depends on: 925024
Resolution: --- → FIXED
Whiteboard: [native-crash][leave open] → [native-crash][fixed in 28+ by bug 925608]
Updated•11 years ago
|
Target Milestone: --- → mozilla28
You need to log in
before you can comment on or make changes to this bug.
Description
•