Last Comment Bug 770047 - (black-jellybean) Quick black screen flash on startup with 4.1 (Jelly Bean)
(black-jellybean)
: Quick black screen flash on startup with 4.1 (Jelly Bean)
Status: VERIFIED FIXED
[jellybean]
:
Product: Firefox for Android
Classification: Client Software
Component: General (show other bugs)
: Trunk
: ARM Android
: -- normal (vote)
: Firefox 17
Assigned To: Kartikaya Gupta (email:kats@mozilla.com)
:
Mentors:
: 769903 775247 (view as bug list)
Depends on: 783597 780699 781676 781907
Blocks:
  Show dependency treegraph
 
Reported: 2012-07-01 12:50 PDT by Aaron Train [:aaronmt]
Modified: 2013-12-27 14:24 PST (History)
15 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
affected
verified
verified
verified
+


Attachments
WIP - attempt at a fix (7.50 KB, patch)
2012-07-27 09:00 PDT, Kartikaya Gupta (email:kats@mozilla.com)
no flags Details | Diff | Review
Patch (8.24 KB, patch)
2012-07-27 13:53 PDT, Kartikaya Gupta (email:kats@mozilla.com)
sriram.mozilla: review+
akeybl: approval‑mozilla‑aurora+
lukasblakk+bugs: approval‑mozilla‑beta+
Details | Diff | Review
WIP (38.55 KB, patch)
2012-07-28 01:23 PDT, Sriram Ramasubramanian [:sriram]
bugmail.mozilla: review-
Details | Diff | Review

Description Aaron Train [:aaronmt] 2012-07-01 12:50:37 PDT
On startup of Nightly/Aurora/Beta there is a quick black flash seen immediately after one see's about:home.

Tracking for 15?

--
Samsung Galaxy Nexus (Android 4.1)
Nightly (07/01)
Comment 1 Aaron Train [:aaronmt] 2012-07-01 12:51:27 PDT
*** Bug 769903 has been marked as a duplicate of this bug. ***
Comment 2 Rob Campbell [:rc] (:robcee) 2012-07-12 11:56:41 PDT
confirmed here. Android 4.1.1, Galaxy Nexus, Nightly 07/12.
Comment 3 Chris Lord [:cwiiis] 2012-07-12 16:37:26 PDT
+1 - It only happens on application startup, not when switching to it after it's started up. It happens whether launching with a link or launching the app directly.
Comment 4 Kartikaya Gupta (email:kats@mozilla.com) 2012-07-18 13:12:24 PDT
*** Bug 775247 has been marked as a duplicate of this bug. ***
Comment 5 Kartikaya Gupta (email:kats@mozilla.com) 2012-07-18 13:13:12 PDT
Updated title to make this bug easier to find
Comment 6 JP Rosevear [:jpr] 2012-07-19 06:16:12 PDT
(In reply to Chris Lord [:cwiiis] from comment #3)
> +1 - It only happens on application startup, not when switching to it after
> it's started up. It happens whether launching with a link or launching the
> app directly.

Here too obviously, doesn't affect functionality, but *super* ugly.
Comment 7 Aaron Train [:aaronmt] 2012-07-20 13:26:20 PDT
Nexus S users just got 4.1.1 [1], just got it on my phone. Confirming same issue on my Nexus S.

[1] http://www.androidcentral.com/android-411-jelly-bean-ota-now-appearing-some-nexus-s-variations
Comment 8 Sriram Ramasubramanian [:sriram] 2012-07-20 13:29:48 PDT
Hypothesis:
The background texture is specified in windowBackground -- which is responsible for showing a screen right when the application icon is tapped in launcher.
The setContentView() happens "after" Gecko initialization currently.
So, the windowBackground and the setContentView() arent working in tandem!

One possible solution:
Remove windowBackground attribute.
Loss: Faster showing of the screen when icon is tapped.
Comment 9 Joe Drew (not getting mail) 2012-07-24 11:51:14 PDT
This happens well before Gecko starts up.
Comment 10 Joe Drew (not getting mail) 2012-07-24 15:36:46 PDT
I know this happens before Gecko starts up because it happens before loadGeckoLibs runs. I can start this in gdb, and the black flash happens before I even get a command prompt, which makes me think it's in Java.

I have also entirely failed to get the Android developer tools in Eclipse up and running so that I can debug this before Firefox starts on my handset. More trying tomorrow...
Comment 11 Joe Drew (not getting mail) 2012-07-25 16:00:47 PDT
I gave up on debugging from the first launch in Eclipse, and just put a 30s delay at the beginning of GeckoApp.initialize().

I now know some things it definitely isn't, but I don't know what it is. Sriram had a theory that it's from quickly opening and closing the profile migration dialog, but I commented out that code and it didn't fix anything.

I thought this might have something to do with about:home, so I hooked my Galaxy Nexus up to an Eideticker box (thanks mdas!) and captured the output. Frequently, but not every time, about:home appears immediately after the black flash - i.e., the very next frame is about:home.

At this point, I am running up against the limits of my Android knowledge. I'm happy to learn more and continue debugging this, but I also wouldn't be offended if someone took this bug off my hands.
Comment 12 Kartikaya Gupta (email:kats@mozilla.com) 2012-07-26 07:05:48 PDT
With absolutely no evidence to back this up, I would try disabling the GfxInfoThread code and see if that helps any. My half-baked theory is that the triple-buffering in JB tries to do something clever if it detects the app create a GL context and that causes some reinit to happen resulting in the black screen.
Comment 13 Kartikaya Gupta (email:kats@mozilla.com) 2012-07-26 07:11:20 PDT
Actually, scratch that. The flash happens on 14 as well so it can't be because of that.
Comment 14 Joe Drew (not getting mail) 2012-07-26 15:25:01 PDT
Some other things it isn't:
 - GfxInfoThread :(
 - Flash
 - Form Assistant popup

I give up :(
Comment 15 Sriram Ramasubramanian [:sriram] 2012-07-27 00:59:56 PDT
Confirming that this black screen is from GL layer initialization.
I did a trace view and the time of flashing was just around GL layer initialization (took some 40 ms or so).
I suspected it and commented out code here: https://hg.mozilla.org/mozilla-central/file/tip/mobile/android/base/GeckoApp.java#l1827 to avoid GL layer initialization. The app started without any screen flickering on JB. (And the app crashed ofcourse :P).
Comment 16 Sriram Ramasubramanian [:sriram] 2012-07-27 01:00:22 PDT
Adding kats...
Comment 17 Kartikaya Gupta (email:kats@mozilla.com) 2012-07-27 06:53:43 PDT
Commenting out that line kills a lot more than just GL layer initialization, it kills all of gecko startup...
Comment 18 Kartikaya Gupta (email:kats@mozilla.com) 2012-07-27 07:39:24 PDT
I isolated the flash to this line of code:

https://hg.mozilla.org/mozilla-central/file/399aff2fb74f/mobile/android/base/GeckoApp.java#l1869

Commenting it out makes it go away. I suspect it is GL-related, and android is doing something that causes the flash when we add the SurfaceView to the layout. I can dive through the android code to try to see what's going on, but I suspect the best way to fix it will likely involve moving that line of code so that the flash happens at a less noticeable time (either earlier during startup, or when we navigate to the first non-about:home page)
Comment 19 Kartikaya Gupta (email:kats@mozilla.com) 2012-07-27 09:00:33 PDT
Created attachment 646597 [details] [diff] [review]
WIP - attempt at a fix

I tried to fix this by moving the layerview directly into the XML layout, so that hopefully the flash happens way earlier on startup and isn't noticeable. However for some reason this patch is throwing an NPE because the findViewById(R.id.layer_view) is returning null. I have no idea why. Sriram, any thoughts?

Also note that the Log.e I added to GeckoViewsFactory exposes some other errors that should be fixed.
Comment 20 Sriram Ramasubramanian [:sriram] 2012-07-27 11:02:29 PDT
NPE is because of this:
When LayoutInflater inflates, it requires this constructor:
CustomView(Context, AttributeSet);

You have supplied this: return new LayerView(context);

Please change it to take AttributeSet. The AttributeSet passed from GeckoViewsFactory can be null -- which is not a problem. (But I guess, in your case, it's better to pass the attrs you get to the LayerView, and the LayerView can call its super with attrs, so that the layout_params are set properly).
Comment 21 Kartikaya Gupta (email:kats@mozilla.com) 2012-07-27 13:53:44 PDT
Created attachment 646712 [details] [diff] [review]
Patch
Comment 22 Kartikaya Gupta (email:kats@mozilla.com) 2012-07-27 13:57:16 PDT
Pushed to try at https://tbpl.mozilla.org/?tree=Try&rev=68b3ae18a8f3
Comment 23 Sriram Ramasubramanian [:sriram] 2012-07-27 14:03:37 PDT
Comment on attachment 646712 [details] [diff] [review]
Patch

Review of attachment 646712 [details] [diff] [review]:
-----------------------------------------------------------------

This is awesome! :)
I'm loving LayerView :)
Comment 24 Sriram Ramasubramanian [:sriram] 2012-07-27 23:45:28 PDT
I have some doubts:
Why should GeckoApp holds mLayerController? Why can't it hold mLayerView and get the controller from it. After this patch -- LayerController.setView() and LayerView.connect() -- I feel we can restructure it this way.

In GeckoApp.java:
LayerView mLayerView; is stored.

getLayoutController() {
  return mLayerView.getLayoutController();
}

In LayerView.java:
public LayerView() {
  LayerController = new LayerController(this);
}

In LayerController.java: 
public LayerController(LayerView view) {
   setView(view);
}

-----
Basically GeckoApp holds mLayerView instead of mLayerController -- thereby it feels like holding a view than a controller for the view.
Comment 25 Sriram Ramasubramanian [:sriram] 2012-07-27 23:46:02 PDT
On a side note: Do we need those calls to getLayoutController() in other places? Is there a way to avoid it?
Comment 26 Sriram Ramasubramanian [:sriram] 2012-07-27 23:57:00 PDT
Looking into the code again:
1. Most of the places we do: layerController.getView().doSomething(). Instead layerView.doSomething() would be better.
2. getViewportMetrics() can be at LayerView and inturn return mController.getViewportMetrics().
3. convertLayerPointToViewPoint() and its converse can be at View level.

Basically the Controller and Renderer are more like "presentation" of the LayerView and can be known only to the view. The Activity and other dependent Views can get required values from LayerView.
Comment 27 Sriram Ramasubramanian [:sriram] 2012-07-28 01:23:35 PDT
Created attachment 646825 [details] [diff] [review]
WIP

I couldn't stop myself from experimenting this :P
This places LayerController and LayerClient inside LayerView. GeckoApp knows only about LayerView (just like a SurfaceView). And required calls are through LayerView.

GeckoLayerClient initialization happens at LayerView's creation -- not sure if this is right.

There are so many checks for LayerController being null. What's the possibility of this happening -- given that initialization of LayerController happens at creation of LayerView -- which inturn happens during inflation of XML -- which is during onCreate -- which makes me feel LayerController can never be null.
Comment 28 Sriram Ramasubramanian [:sriram] 2012-07-28 01:24:06 PDT
Note: Robocop tests might need to be changed to make sure it's not dependent on getLayerClient().
Comment 29 Kartikaya Gupta (email:kats@mozilla.com) 2012-07-28 07:13:59 PDT
These are some good ideas but they should be done in a separate bug since they are refactoring that should not be uplifted, whereas this bug will need to be uplifted.

Also I have a long series of changes in the dependency list of bug 776030 that I've been working on which changes a lot of the gfx code. I will try to do this on top of that work, it will be much easier. One of the patches merges LayerController into GeckoLayerClient, for example.
Comment 30 Kartikaya Gupta (email:kats@mozilla.com) 2012-07-28 09:36:48 PDT
Comment on attachment 646825 [details] [diff] [review]
WIP

Minusing this for now, but I can move it to a new bug and rebase it on top of bug 777351.
Comment 31 Kartikaya Gupta (email:kats@mozilla.com) 2012-07-30 05:37:50 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/14ed511f2d6b
Comment 32 Aaron Train [:aaronmt] 2012-07-30 10:06:54 PDT
Looks good on mozilla-inbound
Comment 33 Kartikaya Gupta (email:kats@mozilla.com) 2012-07-30 10:22:39 PDT
Comment on attachment 646712 [details] [diff] [review]
Patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): none
User impact if declined: there is an unsightly black flash when starting fennec
Testing completed (on m-c, etc.): on inbound
Risk to taking this patch (and alternatives if risky): mobile only. in theory it shouldn't be too risky by itself, but these types of changes (which change the order in which things are initialized) have a history of exposing more prominently races that already exist in the code. this might result in a higher crash rate, for example. i suggest letting bake on nightly/aurora for a few days before uplifting to beta.
String or UUID changes made by this patch: none
Comment 34 Alex Keybl [:akeybl] 2012-07-30 12:20:55 PDT
Comment on attachment 646712 [details] [diff] [review]
Patch

[Triage Comment]
Approving for Aurora 16, but I don't think we need to risk any new regressions on Beta (esp. given the fact that 14 was affected).
Comment 35 Sriram Ramasubramanian [:sriram] 2012-07-30 12:39:06 PDT
Giving that there are many jelly bean devices now -- and even Galaxy Nexus S got it! :O -- this is going to affect a lot of users. Why don't we want this for beta? The release is one month away, if I'm right.
Comment 36 Alex Keybl [:akeybl] 2012-07-30 16:36:57 PDT
(In reply to Sriram Ramasubramanian [:sriram] from comment #35)
> Giving that there are many jelly bean devices now -- and even Galaxy Nexus S
> got it! :O -- this is going to affect a lot of users. Why don't we want this
> for beta? The release is one month away, if I'm right.

It's the tradeoff between a black Flash and the possibility for bad race conditions that possibly aren't caught by the crash reporter. We'd need somebody from product to weigh in on the necessity of this fix.
Comment 37 Alex Keybl [:akeybl] 2012-07-30 16:38:40 PDT
(this feels cosmetic)
Comment 38 Asa Dotzler [:asa] 2012-07-30 16:44:47 PDT
This is the first thing I noticed on upgrading to Jelly Bean and it's super-ugly. It's viscerally bad. 

I don't know what our testing coverage is likely to catch or not catch so it's hard to make a call here. Can we get it into _a_ beta and pull it from the final beta if we're seeing a spike in crashes?

Also, is the risk as limited to Jelly Bean as the reward is?
Comment 39 Matt Brubeck (:mbrubeck) 2012-07-30 16:51:08 PDT
I think this is not purely cosmetic but is also a perceived-performance issue, since our startup no longer feels instant and seamless.
Comment 40 Sriram Ramasubramanian [:sriram] 2012-07-30 16:52:55 PDT
The change that happened here is that the View in inflated in XML (like all other views) and then accessed in Java -- instead of creating in java and inflating into exisiting view.

The "creating in java" caused the black screen (of not death) in Jelly Bean. Now that it's in XML, we won't have any issues.
Comment 41 Mark Finkle (:mfinkle) (use needinfo?) 2012-07-30 21:19:07 PDT
I'd like to see this patch in the next beta. That means we can observe Nightly and Aurora for a few days.

Matt is right about this being a perceived-performance issue and that is the _only_ reason I'd consider it for Fx15. I don't care about the super-ugly factor. We would just let it ride the trains if that was it.

Also, the patch is not too risky. Let's just watch Nightly and Aurora for any issues.
Comment 42 Asa Dotzler [:asa] 2012-07-30 22:49:27 PDT
(In reply to Mark Finkle (:mfinkle) from comment #41)
> Matt is right about this being a perceived-performance issue and that is the
> _only_ reason I'd consider it for Fx15. I don't care about the super-ugly
> factor. We would just let it ride the trains if that was it.

The perceived-performance issue is the super-ugly factor. If feels (and is) a broken start-up experience. That's just ugly.
Comment 43 Ed Morley [:emorley] 2012-07-31 06:13:36 PDT
https://hg.mozilla.org/mozilla-central/rev/14ed511f2d6b
Comment 44 Kartikaya Gupta (email:kats@mozilla.com) 2012-07-31 07:21:56 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/4c7fd2c69d1a
Comment 45 Alex Keybl [:akeybl] 2012-08-01 10:07:53 PDT
(In reply to Mark Finkle (:mfinkle) from comment #41)
> Also, the patch is not too risky. Let's just watch Nightly and Aurora for
> any issues.

OK - risk of startup regressions (without feedback from beta users) is what I'm most worried about. Sounds like this may be worth the risk, and we think we'll catch most new regressions through crash reporter.

We'll leave this nominated for beta and check back before the next build.
Comment 46 Lukas Blakk [:lsblakk] use ?needinfo 2012-08-03 11:30:28 PDT
Comment on attachment 646712 [details] [diff] [review]
Patch

Let's get it into beta 4 and see what happens, with time to backout if needed.
Comment 47 Karen Rudnitski [:kar] 2012-08-05 11:27:43 PDT
Jumping into this thread now, for what it's worth. I'd really like to see this make it into the next beta so there's time to pull it if crash rates spike as a result. Whether it feels cosmetic or not, it's a very bad user experience and taints overall perception of the browser upon startup, not a place we want to be. Jelly Bean has been getting lots of press and upgrades are happening all around, so we want to do what we can to get the best experience we can.
Comment 48 Lukas Blakk [:lsblakk] use ?needinfo 2012-08-06 20:06:18 PDT
This is approved for mozilla-beta and needs to be landed before EOD Pacific tomorrow to make it into the next beta.  Please land asap.
Comment 49 Kartikaya Gupta (email:kats@mozilla.com) 2012-08-07 07:19:49 PDT
Sorry for the delay, I was gone for the long weekend. Landed on beta:

https://hg.mozilla.org/releases/mozilla-beta/rev/b40643c774f6
Comment 50 Kartikaya Gupta (email:kats@mozilla.com) 2012-08-08 08:12:44 PDT
Please note that this patch caused bug 769269 to be un-fixed by exposing an error in the code. Bug 780699 is tracking the regression and I have a patch that seems to work, but may need to be tweaked and tested more.
Comment 51 Geoff Brown [:gbrown] 2012-08-10 13:11:57 PDT
A Talos Ts regression on Beta was noted this week and blamed on this patch: see bug 781907.

Note You need to log in before you can comment on or make changes to this bug.