Last Comment Bug 780699 - Firefox hangs with "Don't keep activities" developer option enabled
: Firefox hangs with "Don't keep activities" developer option enabled
Status: VERIFIED FIXED
: regression
Product: Firefox for Android
Classification: Client Software
Component: General (show other bugs)
: 17 Branch
: ARM Android
: -- normal (vote)
: Firefox 17
Assigned To: Kartikaya Gupta (email:kats@mozilla.com)
:
:
Mentors:
Depends on:
Blocks: 769269 black-jellybean
  Show dependency treegraph
 
Reported: 2012-08-06 13:18 PDT by Michael Comella (:mcomella) [not actively working on fennec: expect slow responses]
Modified: 2016-07-29 14:29 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
verified
verified
verified


Attachments
Patch (1.25 KB, patch)
2012-08-08 07:59 PDT, Kartikaya Gupta (email:kats@mozilla.com)
ajuma.bugzilla: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
lukasblakk+bugs: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Michael Comella (:mcomella) [not actively working on fennec: expect slow responses] 2012-08-06 13:18:47 PDT
0) Turn on "Don't keep activities" option (Settings -> Developer Options")
1) Open Nightly

Expected: Nightly is usable and loads pages.
Actual: Nightly hangs and nothing loads. Occasionally, after some wiggling, a page may load (most notably on a clean startup) but clicking the awesomebar and selecting something will cause Firefox to hang again.

Tested on Galaxy Nexus, Android 4.0.4, and Asus Transformer Prime TF201, Android 4.0.3. Nightly 8/6.

Can anyone else confirm?
Comment 1 Aaron Train [:aaronmt] 2012-08-06 16:24:16 PDT
Confirmed (TF201/4.0.4, Nightly 08/06).
Comment 2 :Margaret Leibovic 2012-08-07 07:59:42 PDT
Is this actually a regression? Does it work in 14/15/16?
Comment 3 Kartikaya Gupta (email:kats@mozilla.com) 2012-08-07 13:59:15 PDT
This might be a regression from bug 770047, in which case we have a problem since they're both beta-blocker bugs.
Comment 4 Kartikaya Gupta (email:kats@mozilla.com) 2012-08-07 14:00:01 PDT
err I mean tracking-15+ bugs. or whatever those bugs are called that need to go into 15...
Comment 5 Kartikaya Gupta (email:kats@mozilla.com) 2012-08-08 07:50:19 PDT
This is happening because of a deadlock condition in the code. The UI thread is here (part of the stack I filled in by hand):

at MonitorAutoLock(mResumeCompositorMonitor)::Wait
at CompositorParent::ScheduleResumeOnCompositorThread
at nsWindow::ScheduleResumeComposition
at org.mozilla.gecko.GeckoAppShell.scheduleResumeComposition
at org.mozilla.gecko.gfx.GeckoLayerClient.compositionResumeRequested(GeckoLayerClient.java:472)
at org.mozilla.gecko.gfx.GLController.resumeCompositorIfValid(GLController.java:78)
at org.mozilla.gecko.GeckoAppShell.setLayerClient(Native Method)
at org.mozilla.gecko.GeckoAppShell.setLayerClient(Native Method)
at org.mozilla.gecko.GeckoAppShell.setLayerClient(Native Method)
at org.mozilla.gecko.GeckoApp.initialize(GeckoApp.java:1788)
at org.mozilla.gecko.GeckoApp.onWindowFocusChanged(GeckoApp.java:2011)
at com.android.internal.policy.impl.PhoneWindow$DecorView.onWindowFocusChanged(PhoneWindow.java:2423)
at android.view.View.dispatchWindowFocusChanged(View.java:7321)
at android.view.ViewGroup.dispatchWindowFocusChanged(ViewGroup.java:933)
at android.view.ViewRootImpl$ViewRootHandler.handleMessage(ViewRootImpl.java:2881)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:137) 
at android.app.ActivityThread.main(ActivityThread.java:4745)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:511)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:786)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:553)
at dalvik.system.NativeStart.main(Native Method)

This thread is waiting for the compositor-resume on the compositor thread to finish executing. However it is a Java thread that is in the middle of executing GLController.resumeCompositorIfValid, which is a function synchronized on the GLController instance.

The compositor thread is trying to reacquire a surface via a call to AndroidBridge::ProvideEGLSurface, which tries to call GLController.waitForValidSurface. However that function is also synchronized on the GLController instance, and so it cannot even enter the waitForValidSurface function, resulting in a deadlock.
Comment 6 Kartikaya Gupta (email:kats@mozilla.com) 2012-08-08 07:59:32 PDT
Created attachment 650120 [details] [diff] [review]
Patch

This fixes the problem. I don't think the GLController needs to hold on to the lock when calling the listener function. There might be better fixes though, it seems kind of wrong to me that the java UI thread gets blocked on the compositor thread, so maybe shunting some of that code onto a different thread would work better?
Comment 7 Kartikaya Gupta (email:kats@mozilla.com) 2012-08-08 08:10:05 PDT
This is indeed a regression from bug 770047, and is on beta as of
http://hg.mozilla.org/releases/mozilla-beta/rev/b40643c774f6
Comment 8 Ali Juma [:ajuma] 2012-08-08 08:22:09 PDT
Comment on attachment 650120 [details] [diff] [review]
Patch

(In reply to Kartikaya Gupta (:kats) from comment #6)
> Created attachment 650120 [details] [diff] [review]
> Patch
> 
> This fixes the problem. I don't think the GLController needs to hold on to
> the lock when calling the listener function. There might be better fixes
> though, it seems kind of wrong to me that the java UI thread gets blocked on
> the compositor thread, so maybe shunting some of that code onto a different
> thread would work better?

In this particular instance, we don't need the java UI thread blocked until the compositor resumes, so we could indeed move this code onto a different thread. However, the simple fix here seems to be the best approach for something we're going to want to uplift.

Also, note that there are other situations where the java UI thread *needs* to block on the compositor thread (e.g. during a surfaceDestroyed event), so we cannot eliminate all instances of the java thread blocking on the compositor thread.
Comment 9 Kartikaya Gupta (email:kats@mozilla.com) 2012-08-08 08:35:56 PDT
Comment on attachment 650120 [details] [diff] [review]
Patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 770047
User impact if declined: when the "don't keep activities" developer option is checked, fennec won't load pages properly from the awesomebar
Testing completed (on m-c, etc.): locally so far
Risk to taking this patch (and alternatives if risky): mobile only. should be fairly low risk. another option is to back out bug 770047 but then we get the black flash back.
String or UUID changes made by this patch: none
Comment 10 Kartikaya Gupta (email:kats@mozilla.com) 2012-08-08 09:11:38 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/add211f29374
Comment 11 Ryan VanderMeulen [:RyanVM] 2012-08-08 18:25:47 PDT
https://hg.mozilla.org/mozilla-central/rev/add211f29374
Comment 12 Lukas Blakk [:lsblakk] use ?needinfo 2012-08-09 10:15:51 PDT
Comment on attachment 650120 [details] [diff] [review]
Patch

mobile only, so willing to take this low risk patch for branches.
Comment 14 Adrian Tamas (:AdrianT) 2012-09-27 07:00:54 PDT
Unable to reproduce the issue on:

Firefox Mobile 16.0b5 / Firefox Mobile 15
Samsung Galaxy Nexus (Android 4.1.1)

Marking as verified on Firefox Mobile 16 and 15

Note You need to log in before you can comment on or make changes to this bug.