Last Comment Bug 763175 - Crash in AndroidGLController::ProvideEGLSurface during CompositorParent::ResumeComposition
: Crash in AndroidGLController::ProvideEGLSurface during CompositorParent::Resu...
Status: RESOLVED FIXED
[native-crash][gfx]
: crash
Product: Firefox for Android
Classification: Client Software
Component: General (show other bugs)
: 14 Branch
: ARM Android
: -- critical (vote)
: Firefox 16
Assigned To: Ali Juma [:ajuma]
:
: Sebastian Kaspari (:sebastian)
Mentors:
Depends on: 750272
Blocks:
  Show dependency treegraph
 
Reported: 2012-06-09 03:47 PDT by Scoobidiver (away)
Modified: 2012-07-03 09:55 PDT (History)
8 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
wontfix
fixed
fixed
.N+


Attachments
Only create a new EGL surface when a new Android surface is created. (2.13 KB, patch)
2012-06-11 12:15 PDT, Ali Juma [:ajuma]
b56girard: review+
joe: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Scoobidiver (away) 2012-06-09 03:47:51 PDT
This bug tracks crashes not fixed by bug 759162 with a patch that landed in 14.0b6.
With combined signatures in 14.0b6, there are 47 crashes with that stack amongst 1323 crashes, i.e. 3.6% of all crashes. It makes it #2 top crasher in 14.0b6.

Signature 	AndroidGLController::ProvideEGLSurface More Reports Search
UUID	be1f3e1b-0c74-42d7-8dec-c34922120609
Date Processed	2012-06-09 09:43:48
Uptime	111
Last Crash	3.4 days before submission
Install Age	2.9 hours since version was first installed.
Install Time	2012-06-09 06:47:20
Product	FennecAndroid
Version	14.0
Build ID	20120605235323
Release Channel	beta
OS	Linux
OS Version	0.0.0 Linux 2.6.39.4-gc9932b1 #1 SMP PREEMPT Thu May 10 00:22:08 CST 2012 armv7l
Build Architecture	arm
Build Architecture Info	
Crash Reason	SIGSEGV
Crash Address	0x8
App Notes 	
AdapterVendorID: endeavoru, AdapterDeviceID: HTC One X.
AdapterDescription: 'Model: 'HTC One X', Product: 'endeavoru', Manufacturer: 'HTC', Hardware: 'endeavoru''.
HTC HTC One X
htc_europe/endeavoru/endeavoru:4.0.3/IML74K/62864.11:user/release-keys
EMCheckCompatibility	True

Frame 	Module 	Signature 	Source
0 	libdvm.so 	libdvm.so@0x576f6 	
1 	dalvik-LinearAlloc (deleted) 	dalvik-LinearAlloc @0x2bb77e 	
2 	libxul.so 	AndroidGLController::ProvideEGLSurface 	jni.h:706
3 	libxul.so 	mozilla::AndroidBridge::ProvideEGLSurface 	widget/android/AndroidBridge.cpp:1217
4 	libxul.so 	mozilla::gl::GLContextEGL::RenewSurface 	gfx/gl/GLContextProviderEGL.cpp:479
5 	libxul.so 	mozilla::layers::CompositorParent::ResumeComposition 	gfx/layers/ipc/CompositorParent.cpp:157
6 	libxul.so 	mozilla::layers::CompositorParent::ResumeCompositionAndResize 	gfx/layers/ipc/CompositorParent.cpp:168
7 	libxul.so 	RunnableMethod<mozilla::layers::CompositorParent, void , Tuple2<int, int> >::Run 	ipc/chromium/src/base/tuple.h:400
8 	libxul.so 	MessageLoop::RunTask 	ipc/chromium/src/base/message_loop.cc:318
9 	libxul.so 	MessageLoop::DeferOrRunPendingTask 	ipc/chromium/src/base/message_loop.cc:326
10 	libxul.so 	MessageLoop::DoWork 	ipc/chromium/src/base/message_loop.cc:426
11 	libxul.so 	base::MessagePumpDefault::Run 	ipc/chromium/src/base/message_pump_default.cc:23
12 	libxul.so 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:208
13 	libxul.so 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:201
14 	libxul.so 	base::Thread::ThreadMain 	ipc/chromium/src/base/thread.cc:156
15 	libxul.so 	ThreadFunc 	ipc/chromium/src/base/platform_thread_posix.cc:26
16 	libc.so 	libc.so@0x1319e 	
17 	libc.so 	libc.so@0x12cd6

More reports at:
https://crash-stats.mozilla.com/report/list?signature=AndroidGLController%3A%3AProvideEGLSurface
https://crash-stats.mozilla.com/report/list?signature=JNI_GetCreatedJavaVMs+|+AndroidGLController%3A%3AProvideEGLSurface
Comment 1 Ali Juma [:ajuma] 2012-06-09 11:22:20 PDT
The two biggest obstacles to making further progress here are that we don't have steps to reproduce this, and this hasn't crashed on Nightly on recent builds (the most recent crash I could find on Nightly is from the May 16th build, though some crashes could be hiding behind other signatures); this means that we can't tell how much a potential fix is going to help before we uplift it.

What seemed to be the most likely causes of this crash have been addressed by previous patches, but there's still at least one more thing we can try. If this crash is happening when the window is resized (as a result of device rotation or the keyboard being shown/hidden) rather then when the app is resumed from the background, we can try to not call ProvideEGLSurface at all when resizing (and instead retain the previous surface). This would make our behaviour consistent with the current version of Android's own GLSurfaceView class (older versions of this class created a new surface for every resize, as we currently do). The fact that we'd be making our behaviour match GLSurfaceView mitigates the risk of making such a change; nevertheless, this would represent a significant change in our surface handling, and such a change has the potential to introduce rendering glitches (primarily black screens) and driver crashes.
Comment 2 Chris Peterson [:cpeterson] 2012-06-11 11:04:25 PDT
(In reply to Ali Juma [:ajuma] from comment #1)
> What seemed to be the most likely causes of this crash have been addressed
> by previous patches, but there's still at least one more thing we can try.

ajuma, which branch did the previous patches land on?

This is a topcrash in 14.0b6, but no longer seems to appear in 15 or 16.
Comment 3 Ali Juma [:ajuma] 2012-06-11 11:16:32 PDT
(In reply to Chris Peterson (:cpeterson) from comment #2)
> (In reply to Ali Juma [:ajuma] from comment #1)
> > What seemed to be the most likely causes of this crash have been addressed
> > by previous patches, but there's still at least one more thing we can try.
> 
> ajuma, which branch did the previous patches land on?

They landed on trunk, and were uplifted to 14 (either when 14 was aurora or when 14 was beta).

> This is a topcrash in 14.0b6, but no longer seems to appear in 15 or 16.

Yes, this is very curious. Either there's some other patch that's having an effect here, or we're simply seeing the result of differences in user behaviour across branches (e.g. maybe there just aren't that many people using 15 or 16 as their main browser right now).
Comment 4 Scoobidiver (away) 2012-06-11 11:33:55 PDT
(In reply to Chris Peterson (:cpeterson) from comment #2)
> no longer seems to appear in 15 or 16.
14.0a2 has 2000 ADU while 15.0a2 has only 900 ADU, in Nightly with 500 ADU it crashes from time to time, so it's too soon to conclude they are unaffected.
Comment 5 Joe Drew (not getting mail) 2012-06-11 11:51:39 PDT
At this point, we can only block on explosive crashes, or ones that completely break a feature; as unfortunate as it is, we can't block ship on this.

However, we will block .N+ on this, at least tentatively, so we can monitor it and see whether we can come up with a fix for a dot release.
Comment 6 Ali Juma [:ajuma] 2012-06-11 12:15:53 PDT
Created attachment 631977 [details] [diff] [review]
Only create a new EGL surface when a new Android surface is created.

This patch changes our behaviour on surfaceChanged events so that instead of always creating a new EGL surface, we instead only do so if we don't already have one (that is, with this patch, we create a new EGL surface on startup and when resuming from the background, but not when simply resizing the screen). This matches the current behaviour of Android's GLSurfaceView class (http://androidxref.com/source/xref/frameworks/base/opengl/java/android/opengl/GLSurfaceView.java).

When resizing, GLSurfaceView releases and rebinds the surface and context by making calls to eglMakeCurrent; this patch makes us do this too.

Also, GLSurfaceView ensures that 2 frames are rendered before returning from surfaceChanged. This was a workaround added in 2009, and it's not clear whether it's really still needed. This patch makes us render 1 frame before returning from surfaceChanged; we should continue to keep an eye out for visual glitches that suggest we really need to render 2 frames here.
Comment 7 Ali Juma [:ajuma] 2012-06-12 07:27:35 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/1bb12b67efbd

This bug should be left open since we won't know if the crashes are fixed until the patch gets uplifted (if we decide to uplift it).
Comment 9 Matt Brubeck (:mbrubeck) 2012-06-12 18:33:27 PDT
https://hg.mozilla.org/mozilla-central/rev/1bb12b67efbd
Comment 10 Brad Lassey [:blassey] (use needinfo?) 2012-06-14 14:45:05 PDT
(In reply to Ali Juma [:ajuma] from comment #7)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/1bb12b67efbd
> 
> This bug should be left open since we won't know if the crashes are fixed
> until the patch gets uplifted (if we decide to uplift it).

Please nominate if you need it uplifted.

Is the crash volume on trunk not high enough to justify closing it out?
Comment 11 Ali Juma [:ajuma] 2012-06-14 14:56:42 PDT
(In reply to Brad Lassey [:blassey] from comment #10)
> (In reply to Ali Juma [:ajuma] from comment #7)
> Is the crash volume on trunk not high enough to justify closing it out?

It seems not to be. This wasn't crashing on trunk even before the patches landed, so we can't tell what effect the patches had on the crash.
Comment 12 Ali Juma [:ajuma] 2012-06-14 15:08:24 PDT
Comment on attachment 631977 [details] [diff] [review]
Only create a new EGL surface when a new Android surface is created.

[Approval Request Comment]
User impact if declined: Possible crashes.
Testing completed (on m-c, etc.): On m-c since June 12th. I'd like for it to get a few more days of testing on m-c before uplifting, but going ahead and nom-ing preemptively.
Risk to taking this patch (and alternatives if risky): Medium risk. This make a significant change to our handling of EGL surfaces, and thus has the *potential* to cause crashes and visual glitches (like black screens). However, this change does make our EGL surface handling consistent with that of Android's GLSurfaceView class.
String or UUID changes made by this patch: None.
Comment 13 Lukas Blakk [:lsblakk] use ?needinfo 2012-06-15 16:25:49 PDT
Comment on attachment 631977 [details] [diff] [review]
Only create a new EGL surface when a new Android surface is created.

[Triage Comment]
Because we're considering this for 14.0.1 we'll take this for Aurora to help with evaluation.
Comment 14 Scoobidiver (away) 2012-06-16 14:38:31 PDT
(In reply to Chris Peterson (:cpeterson) from comment #2)
> This is a topcrash in 14.0b6, but no longer seems to appear in 15 or 16.
It occurred at least one build out of two in 14.0a2 and hasn't happened at all in 15.0a2 for the last 8 builds.
The working range is:
http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=6b1e65dae1d5&tochange=77cf49ff4b57
The latest non-startup crash in Nightly happened in 15.0a1/20120427030500, bp-4e427d0f-91ff-4c48-ba07-0dc192120502, before the patch of bug 759162 landed.

(In reply to Lukas Blakk [:lsblakk] from comment #13)
> [Triage Comment]
> Because we're considering this for 14.0.1 we'll take this for Aurora to help
> with evaluation.
Uplifting this patch to Aurora makes no sense because crashes are fixed in Aurora and likely the trunk.
If the goal of the patch is only to prevent crashes, it should also be backed out from m-c.
Comment 15 Ali Juma [:ajuma] 2012-06-16 14:47:01 PDT
(In reply to Scoobidiver from comment #14)
> (In reply to Lukas Blakk [:lsblakk] from comment #13)
> > [Triage Comment]
> > Because we're considering this for 14.0.1 we'll take this for Aurora to help
> > with evaluation.
> Uplifting this patch to Aurora makes no sense because crashes are fixed in
> Aurora and likely the trunk.
> If the goal of the patch is only to prevent crashes, it should also be
> backed out from m-c.

The changes made by this patch are good to have on m-c even if the crashes are already fixed, since it changes our behaviour to match GLSurfaceView, and, as a result, means we call ProvideEGLSurface much less frequently (or, in other words, we don't call it when we don't actually need to call it). So we shouldn't back this out of m-c.

I'm fine with not uplifting this to Aurora if we can identify the patch that fixed the crashes on Aurora (so we can uplift that to Beta). Otherwise, uplifting this to Aurora as a first step towards uplifting this to Beta makes sense (since, by calling ProvideEGLSurface far less often, it has the potential to fix the crashes on Beta).
Comment 16 Ali Juma [:ajuma] 2012-06-16 14:48:16 PDT
Comment on attachment 631977 [details] [diff] [review]
Only create a new EGL surface when a new Android surface is created.

Re-nomming so that Comments 14 and 15 can be taken into account when making a decision.
Comment 17 Ali Juma [:ajuma] 2012-06-19 08:25:05 PDT
It looks very much like Bug 750272 is what fixed this on Aurora and Nightly. Bug 750272 is a much safer patch to uplift to Beta than this one. It does have the potential to cause visual glitches (since it causes ProvideEGLSurface to return a null surface when it would have otherwise crashed). However, if such glitches occurs, they will be in situations where we would otherwise have crashed, so the tradeoff seems reasonable for 14.0.1.

I think we should still uplift the patch for this bug to Aurora, since it reduces the risk of getting the visual glitches potentially introduced by the patch for Bug 750272.
Comment 18 Joe Drew (not getting mail) 2012-06-19 11:42:06 PDT
Comment on attachment 631977 [details] [diff] [review]
Only create a new EGL surface when a new Android surface is created.

Compelling. Land'er on Aurora.
Comment 20 Ali Juma [:ajuma] 2012-06-19 11:57:10 PDT
Since we now have good evidence that the crashes are actually fixed on trunk (by Bug 750272), let's close this now.
Comment 21 Scoobidiver (away) 2012-06-30 11:58:55 PDT
There are currently no crashes in 14.0b10 (bug 750272 landed in this build).

Note You need to log in before you can comment on or make changes to this bug.