Closed Bug 763175 Opened 8 years ago Closed 8 years ago
Crash in Android
GLController::Provide EGLSurface during Compositor Parent::Resume Composition
This bug tracks crashes not fixed by bug 759162 with a patch that landed in 14.0b6. With combined signatures in 14.0b6, there are 47 crashes with that stack amongst 1323 crashes, i.e. 3.6% of all crashes. It makes it #2 top crasher in 14.0b6. Signature AndroidGLController::ProvideEGLSurface More Reports Search UUID be1f3e1b-0c74-42d7-8dec-c34922120609 Date Processed 2012-06-09 09:43:48 Uptime 111 Last Crash 3.4 days before submission Install Age 2.9 hours since version was first installed. Install Time 2012-06-09 06:47:20 Product FennecAndroid Version 14.0 Build ID 20120605235323 Release Channel beta OS Linux OS Version 0.0.0 Linux 22.214.171.124-gc9932b1 #1 SMP PREEMPT Thu May 10 00:22:08 CST 2012 armv7l Build Architecture arm Build Architecture Info Crash Reason SIGSEGV Crash Address 0x8 App Notes AdapterVendorID: endeavoru, AdapterDeviceID: HTC One X. AdapterDescription: 'Model: 'HTC One X', Product: 'endeavoru', Manufacturer: 'HTC', Hardware: 'endeavoru''. HTC HTC One X htc_europe/endeavoru/endeavoru:4.0.3/IML74K/62864.11:user/release-keys EMCheckCompatibility True Frame Module Signature Source 0 libdvm.so libdvm.so@0x576f6 1 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0x2bb77e 2 libxul.so AndroidGLController::ProvideEGLSurface jni.h:706 3 libxul.so mozilla::AndroidBridge::ProvideEGLSurface widget/android/AndroidBridge.cpp:1217 4 libxul.so mozilla::gl::GLContextEGL::RenewSurface gfx/gl/GLContextProviderEGL.cpp:479 5 libxul.so mozilla::layers::CompositorParent::ResumeComposition gfx/layers/ipc/CompositorParent.cpp:157 6 libxul.so mozilla::layers::CompositorParent::ResumeCompositionAndResize gfx/layers/ipc/CompositorParent.cpp:168 7 libxul.so RunnableMethod<mozilla::layers::CompositorParent, void , Tuple2<int, int> >::Run ipc/chromium/src/base/tuple.h:400 8 libxul.so MessageLoop::RunTask ipc/chromium/src/base/message_loop.cc:318 9 libxul.so MessageLoop::DeferOrRunPendingTask ipc/chromium/src/base/message_loop.cc:326 10 libxul.so MessageLoop::DoWork ipc/chromium/src/base/message_loop.cc:426 11 libxul.so base::MessagePumpDefault::Run ipc/chromium/src/base/message_pump_default.cc:23 12 libxul.so MessageLoop::RunInternal ipc/chromium/src/base/message_loop.cc:208 13 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:201 14 libxul.so base::Thread::ThreadMain ipc/chromium/src/base/thread.cc:156 15 libxul.so ThreadFunc ipc/chromium/src/base/platform_thread_posix.cc:26 16 libc.so libc.so@0x1319e 17 libc.so libc.so@0x12cd6 More reports at: https://crash-stats.mozilla.com/report/list?signature=AndroidGLController%3A%3AProvideEGLSurface https://crash-stats.mozilla.com/report/list?signature=JNI_GetCreatedJavaVMs+|+AndroidGLController%3A%3AProvideEGLSurface
The two biggest obstacles to making further progress here are that we don't have steps to reproduce this, and this hasn't crashed on Nightly on recent builds (the most recent crash I could find on Nightly is from the May 16th build, though some crashes could be hiding behind other signatures); this means that we can't tell how much a potential fix is going to help before we uplift it. What seemed to be the most likely causes of this crash have been addressed by previous patches, but there's still at least one more thing we can try. If this crash is happening when the window is resized (as a result of device rotation or the keyboard being shown/hidden) rather then when the app is resumed from the background, we can try to not call ProvideEGLSurface at all when resizing (and instead retain the previous surface). This would make our behaviour consistent with the current version of Android's own GLSurfaceView class (older versions of this class created a new surface for every resize, as we currently do). The fact that we'd be making our behaviour match GLSurfaceView mitigates the risk of making such a change; nevertheless, this would represent a significant change in our surface handling, and such a change has the potential to introduce rendering glitches (primarily black screens) and driver crashes.
(In reply to Ali Juma [:ajuma] from comment #1) > What seemed to be the most likely causes of this crash have been addressed > by previous patches, but there's still at least one more thing we can try. ajuma, which branch did the previous patches land on? This is a topcrash in 14.0b6, but no longer seems to appear in 15 or 16.
(In reply to Chris Peterson (:cpeterson) from comment #2) > (In reply to Ali Juma [:ajuma] from comment #1) > > What seemed to be the most likely causes of this crash have been addressed > > by previous patches, but there's still at least one more thing we can try. > > ajuma, which branch did the previous patches land on? They landed on trunk, and were uplifted to 14 (either when 14 was aurora or when 14 was beta). > This is a topcrash in 14.0b6, but no longer seems to appear in 15 or 16. Yes, this is very curious. Either there's some other patch that's having an effect here, or we're simply seeing the result of differences in user behaviour across branches (e.g. maybe there just aren't that many people using 15 or 16 as their main browser right now).
(In reply to Chris Peterson (:cpeterson) from comment #2) > no longer seems to appear in 15 or 16. 14.0a2 has 2000 ADU while 15.0a2 has only 900 ADU, in Nightly with 500 ADU it crashes from time to time, so it's too soon to conclude they are unaffected.
At this point, we can only block on explosive crashes, or ones that completely break a feature; as unfortunate as it is, we can't block ship on this. However, we will block .N+ on this, at least tentatively, so we can monitor it and see whether we can come up with a fix for a dot release.
blocking-fennec1.0: ? → .N+
This patch changes our behaviour on surfaceChanged events so that instead of always creating a new EGL surface, we instead only do so if we don't already have one (that is, with this patch, we create a new EGL surface on startup and when resuming from the background, but not when simply resizing the screen). This matches the current behaviour of Android's GLSurfaceView class (http://androidxref.com/source/xref/frameworks/base/opengl/java/android/opengl/GLSurfaceView.java). When resizing, GLSurfaceView releases and rebinds the surface and context by making calls to eglMakeCurrent; this patch makes us do this too. Also, GLSurfaceView ensures that 2 frames are rendered before returning from surfaceChanged. This was a workaround added in 2009, and it's not clear whether it's really still needed. This patch makes us render 1 frame before returning from surfaceChanged; we should continue to keep an eye out for visual glitches that suggest we really need to render 2 frames here.
Attachment #631977 - Flags: review?(bgirard)
https://hg.mozilla.org/integration/mozilla-inbound/rev/1bb12b67efbd This bug should be left open since we won't know if the crashes are fixed until the patch gets uplifted (if we decide to uplift it).
Whiteboard: [native-crash][gfx] → [native-crash][gfx][leave open]
Target Milestone: --- → Firefox 16
URLs for both signatures: Total Count URL 3 about:home 1 http://mobile.smashingmagazine.com/2012/06/01/getting-to-know-android/ 1 about:empty 1 http://washingtondc.craigslist.org/ 1 http://activate.viber.com/viber/activate.php 1 https://play.google.com/store/apps/ 1 http://blogs.scientificamerican.com/observations/2012/06/07/this-psychedelic-shr 1 http://m.getjar.com/instructions?ref=0&lvt=1339439650&sid=59h500004z42czcw&lang= 1 http://www.google.com/m?safe=on&gl=ae&client=ms-android-samsung&source=android-u 1 http://www.noticias3d.com/ 1 http://www.wechatapp.com/cgi-bin/readtemplate?uin=&stype=&fr=www.wechatapp.com&l 1 https://readermobileandroid.echosign.com/public/unregisteredSend?projectid=AP9XF 1 https://www.google.com/m?q=friestube&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-U
(In reply to Ali Juma [:ajuma] from comment #7) > https://hg.mozilla.org/integration/mozilla-inbound/rev/1bb12b67efbd > > This bug should be left open since we won't know if the crashes are fixed > until the patch gets uplifted (if we decide to uplift it). Please nominate if you need it uplifted. Is the crash volume on trunk not high enough to justify closing it out?
(In reply to Brad Lassey [:blassey] from comment #10) > (In reply to Ali Juma [:ajuma] from comment #7) > Is the crash volume on trunk not high enough to justify closing it out? It seems not to be. This wasn't crashing on trunk even before the patches landed, so we can't tell what effect the patches had on the crash.
Comment on attachment 631977 [details] [diff] [review] Only create a new EGL surface when a new Android surface is created. [Approval Request Comment] User impact if declined: Possible crashes. Testing completed (on m-c, etc.): On m-c since June 12th. I'd like for it to get a few more days of testing on m-c before uplifting, but going ahead and nom-ing preemptively. Risk to taking this patch (and alternatives if risky): Medium risk. This make a significant change to our handling of EGL surfaces, and thus has the *potential* to cause crashes and visual glitches (like black screens). However, this change does make our EGL surface handling consistent with that of Android's GLSurfaceView class. String or UUID changes made by this patch: None.
Attachment #631977 - Flags: approval-mozilla-aurora?
Comment on attachment 631977 [details] [diff] [review] Only create a new EGL surface when a new Android surface is created. [Triage Comment] Because we're considering this for 14.0.1 we'll take this for Aurora to help with evaluation.
(In reply to Chris Peterson (:cpeterson) from comment #2) > This is a topcrash in 14.0b6, but no longer seems to appear in 15 or 16. It occurred at least one build out of two in 14.0a2 and hasn't happened at all in 15.0a2 for the last 8 builds. The working range is: http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=6b1e65dae1d5&tochange=77cf49ff4b57 The latest non-startup crash in Nightly happened in 15.0a1/20120427030500, bp-4e427d0f-91ff-4c48-ba07-0dc192120502, before the patch of bug 759162 landed. (In reply to Lukas Blakk [:lsblakk] from comment #13) > [Triage Comment] > Because we're considering this for 14.0.1 we'll take this for Aurora to help > with evaluation. Uplifting this patch to Aurora makes no sense because crashes are fixed in Aurora and likely the trunk. If the goal of the patch is only to prevent crashes, it should also be backed out from m-c.
(In reply to Scoobidiver from comment #14) > (In reply to Lukas Blakk [:lsblakk] from comment #13) > > [Triage Comment] > > Because we're considering this for 14.0.1 we'll take this for Aurora to help > > with evaluation. > Uplifting this patch to Aurora makes no sense because crashes are fixed in > Aurora and likely the trunk. > If the goal of the patch is only to prevent crashes, it should also be > backed out from m-c. The changes made by this patch are good to have on m-c even if the crashes are already fixed, since it changes our behaviour to match GLSurfaceView, and, as a result, means we call ProvideEGLSurface much less frequently (or, in other words, we don't call it when we don't actually need to call it). So we shouldn't back this out of m-c. I'm fine with not uplifting this to Aurora if we can identify the patch that fixed the crashes on Aurora (so we can uplift that to Beta). Otherwise, uplifting this to Aurora as a first step towards uplifting this to Beta makes sense (since, by calling ProvideEGLSurface far less often, it has the potential to fix the crashes on Beta).
Comment on attachment 631977 [details] [diff] [review] Only create a new EGL surface when a new Android surface is created. Re-nomming so that Comments 14 and 15 can be taken into account when making a decision.
Attachment #631977 - Flags: approval-mozilla-aurora+ → approval-mozilla-aurora?
It looks very much like Bug 750272 is what fixed this on Aurora and Nightly. Bug 750272 is a much safer patch to uplift to Beta than this one. It does have the potential to cause visual glitches (since it causes ProvideEGLSurface to return a null surface when it would have otherwise crashed). However, if such glitches occurs, they will be in situations where we would otherwise have crashed, so the tradeoff seems reasonable for 14.0.1. I think we should still uplift the patch for this bug to Aurora, since it reduces the risk of getting the visual glitches potentially introduced by the patch for Bug 750272.
Comment on attachment 631977 [details] [diff] [review] Only create a new EGL surface when a new Android surface is created. Compelling. Land'er on Aurora.
Since we now have good evidence that the crashes are actually fixed on trunk (by Bug 750272), let's close this now.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
There are currently no crashes in 14.0b10 (bug 750272 landed in this build).
Whiteboard: [native-crash][gfx][leave open] → [native-crash][gfx]
You need to log in before you can comment on or make changes to this bug.