Closed Bug 1494748 Opened Last year Closed 11 months ago

Content area frozen until force-kill and restart

Categories

(Firefox for Android :: General, defect, P1, major)

Firefox 64
All
Android
defect

Tracking

()

VERIFIED FIXED
Firefox 66
Tracking Status
geckoview64 --- wontfix
firefox62 - wontfix
firefox63 + wontfix
firefox64 + wontfix
firefox65 + verified
firefox66 + verified

People

(Reporter: dietrich, Assigned: JanH)

References

Details

(Keywords: regression, Whiteboard: [priority:high][geckoview:p1])

Attachments

(3 files, 2 obsolete files)

Nightly, Sept 25, Pixel 2

For a couple of weeks I've been seeing the following a few times per day:

1. open Nightly by touching the icon on the homescreen

2. notice content area is a white screen, even though last url is in the url bar

3. type in search terms or url and hit enter, or try the reload button

Expected: page loads

Actual: nothing happens, and the browser doesn't work until i force quit and start it again.

The app chrome is responsive, but content area isn't.
Interesting we will do some further investigation into this to see if it's caused by anything in particular.

If you could help by answering a few questions:
1. Is this reproducible with certain pages only or is the webpage irrelevant? If it's reproducible with certain pages only could you please provide a link to one of them?
2. When opening nightly are you starting it from the background or as a new instance(app not running in the background)?
3. Do you have a large number of tabs opened beside the initial page that seems to not work?
Flags: needinfo?(dietrich)
OS: Unspecified → Android
Hardware: Unspecified → ARM64
Version: unspecified → Firefox 64
Adding a NI on Mira as a reminder to investigate this further during the next week.
Flags: needinfo?(mirabela.lobontiu)

> 1. Is this reproducible with certain pages only or is the webpage
> irrelevant? If it's reproducible with certain pages only could you please
> provide a link to one of them?

Any page, doesn't matter which

> 2. When opening nightly are you starting it from the background or as a new
> instance(app not running in the background)?

By my recollection, this only happens when I open the app from the homescreen icon.

However, Focus is my default browser, so it could just be that the way I use browsers means I'm not seeing the bug when opening links from other apps - because they open in Focus.

I do open from homescreen enough that I'm 99.99% sure it is already running in background, and not opening from a cold start, when this bug occurs.

> 3. Do you have a large number of tabs opened beside the initial page that
> seems to not work?

Usually have 7 - 15 tabs open. When the browser is in this state, none of them work. The reload button is disabled, the url bar contents can be modified but has no effect, etc.
Flags: needinfo?(dietrich)
Attached file logcat
Hi Dietrich,

I was able to reproduce the issue today, with Nightly 64.0a1 (2018-09-30) on Google Pixel (Android 9).

STR, not 100% reproducible:
 1. From Gmail app tap on link https://daleharvey.github.io/testapp/ and perform the steps, in Custom tab.
 2. Complete the "Username" field.

Expected results:
 The fields are completed as asked.

Actual results:
 Nightly crashed. After I restarted Nightly, the screen was white. I could see the last URL in the URL bar, but nothing worked. I had to force quit.

Note:
 I could reproduce only once the issue.
 I managed to obtain a logcat (attached).
Flags: needinfo?(mirabela.lobontiu)
Duplicate of this bug: 1495837
(In reply to Bogdan Surd, QA [:BogdanS, PTO 1st Oct > 5th Oct] from comment #1)
> Interesting we will do some further investigation into this to see if it's
> caused by anything in particular.
> 
> If you could help by answering a few questions:
> 1. Is this reproducible with certain pages only or is the webpage
> irrelevant? If it's reproducible with certain pages only could you please
> provide a link to one of them?
> 2. When opening nightly are you starting it from the background or as a new
> instance(app not running in the background)?
> 3. Do you have a large number of tabs opened beside the initial page that
> seems to not work?

I can reproduce this in release 62 (see about:support text in Bug 1495837) reliably. Basically every day I have to close all tabs and reload everything manually. Pixel 2, started with 62 release.

Open new tabs renders properly, this seems to have something to do with painting of older tabs. Not sure what we do with tabs when the app falls into the background. The bug appears to be in that area of the code.
[Tracking Requested - why for this release]:
Major regression for latest Google Pixel 2 devices in 62.
Susheel can you have someone look into this?
Flags: needinfo?(sdaswani)
Hi all,

I have retested, leaving Nightly in background overnight, and I was not able to reproduce the issue.

Tested with:
Browser / Version: Firefox Release 62.0.3
Operating System: Google Pixel (Android 9)

Thanks!
QA Contact: sdaswani
Flags: needinfo?(sdaswani)
Whiteboard: --do_not_change--[priority:high]
Keywords: regression
Adam, have you seen this on your Pixel 2?
Flags: needinfo?(astevenson)
Nope, I haven't noticed any issues with release or Nightly.
Flags: needinfo?(astevenson)
This is wontfix for 62 at this stage.
Mirabela does SV not have a pixel 2?
Flags: needinfo?(mirabela.lobontiu)
We do have. Will check
Flags: needinfo?(mirabela.lobontiu) → needinfo?(ioana.chiorean)
I'm still seeing this every day, multiple times. Happy to do debugging of any kind. Can try to catch something in logcat, whatever you recommend.
I am not reproducing this at this point with regular browsing behavior (as mentioned by Dietrich) on Pixel 2 with Android 9 with Nightly 64.0b1 2018.10.11. I do remember seeing it once or twice per quarter but with heavy profiles after visiting pages with many plugins and after a quit and reopen would work perfectly. 
I will take the device home and try the steps from Bug 1495837) with both the release and nightly

Dietrich - for sure any logs you might get will be useful. Also - not to break your privacy - are the pages u visited heavily in plugings/videos content?  or just news page ( I know it might sound funny the question but curios if they affects this) 
Another question - if you pan down the page - will show the content of the page behind under the url in an curve?  Thanks!
Wontfix for 63 as we are in RC week and we are still at the investigation/reproduction phase for this regression.
This is something that I would consider blocking or taking in a dot release for if we figure it out. Breaking the browser until the user reboots or force kills Firefox is a showstopper.
Flags: needinfo?(pascalc)
Kevin, I am going to mark it as fix-optional for 63 so as to keep it on the radar for a potential dot release, but given that there is no patch so far, no assignee, that we already shipped 62 with this regression, that the bug is hard to reproduce and that we ship 63 in less than a week, I don't consider it as a bug blocking the release. I would of course consider a patch for rc2 if it would land today or tomorrow.

Susheel, do you have a status update on this bug?
Flags: needinfo?(pascalc) → needinfo?(sdaswani)
pascal i don't think the softvision devs have even picked it up yet. i'll ask Andrei to look at it ASAP.
Flags: needinfo?(sdaswani) → needinfo?(andrei.a.lazar)
Assignee: nobody → andrei.a.lazar
Flags: needinfo?(andrei.a.lazar)
See Also: → 1496684
Duplicate of this bug: 1502178
I also get this on my Essential PH-1.
This is with Android P.

It's being suggested that not killing the app on swipe-to-kill is due to Location Services ("stumble") being enabled, which seems plausible.
I'll turn off Location Services and see if I get the content area freezing issue tomorrow morning.

FWIW, *new* content tabs seem to work, but reloading the frozen tab doesn't. Loading the frozen tab's url in a new tab seems to work.

Interesting:
1. Switch to a working tab, it shows the page
2. Switch to a frozen tab, it shows as white
3. Home button to minimize the browser, works
4. Open Fennec by clicking on its icon, url bar displays as for the current (frozen) tab, but content paints the last working tab that was rendered!
You will still get the content freezing. Disabling location services will allow swipe closing Firefox, instead of needing to go into the app settings and choosing to stop the Firefox process. I believe the new always present notification for location services causes the main problem of the failure to draw the content to be magnified.
See Also: → 1503739
Hi Pascal and David, we think this issues is on the Gecko side. Petru said:
"We’ve tried to link the most important ones in https://bugzilla.mozilla.org/show_bug.cgi?id=1496684 where I’ve pinged :snorp and now it seems that this issue is investigated on Gecko side.
The `See also` section for it contains
https://bugzilla.mozilla.org/show_bug.cgi?id=1494748
https://bugzilla.mozilla.org/show_bug.cgi?id=1497963

https://bugzilla.mozilla.org/show_bug.cgi?id=1500925

 

Other tickets for probably the same issue:
https://bugzilla.mozilla.org/show_bug.cgi?id=1495837

https://bugzilla.mozilla.org/show_bug.cgi?id=1502178


Similar issue observed on Focus:
https://github.com/mozilla-mobile/focus-android/issues/3679"

Unassigning from Andrei.
Assignee: andrei.a.lazar → nobody
Dylan, any intuition on whether this is related to bug 1496684?
Flags: needinfo?(droeh)
(In reply to David Bolter [:davidb] (NeedInfo me for attention) from comment #26)
> Dylan, any intuition on whether this is related to bug 1496684?

Judging from the comments, I think this is the same bug as described in comment 13 of bug 1496684, yes (note: there might be two separate issues being discussed in that bug at the moment, trying to figure that out right now). Essentially if Fennec gets killed while a custom tab is open, re-launching Fennec puts it in a bad state that's exactly what is described here: a white screen and no working interaction on all existing tabs, but opening new tabs seems to work fine.
Flags: needinfo?(droeh)
Priority: -- → P1
Whiteboard: --do_not_change--[priority:high] → [geckoview:klar] [priority:high]
Duplicate of this bug: 1505447
I can replicate this issue.
Phone: Lg G6
Android version: 8.0.0
Android security patch level: September 1, 2018
Kernel version: 3.18.71
Play Services: 14.3.66
Can reproduce issue on: Firefox android 64, Firefox android 63
(In reply to Kevin Brosnan [:kbrosnan] from comment #24)
> You will still get the content freezing. Disabling location services will
> allow swipe closing Firefox, instead of needing to go into the app settings
> and choosing to stop the Firefox process. I believe the new always present
> notification for location services causes the main problem of the failure to
> draw the content to be magnified.

So interestingly I don't believe I've seen content freezing since disabling Locations Services. I suspect that Fennec is being fully killed by the memory reclaimer now, so it (re)starts cleanly now when selected, instead of exhibiting content freezing.

(needinfo for fyi)
Flags: needinfo?(kbrosnan)
Flags: needinfo?(kbrosnan)
Removing [geckoview:klar] whiteboard tag because Focus does not seem to be affected. Focus doesn't use Fennec's Location Services and the Focus issue mentioned in comment 25 had a different cause (and has been fixed).
Hardware: ARM64 → All
Whiteboard: [geckoview:klar] [priority:high] → [priority:high]
(In reply to Jeff Gilbert [:jgilbert] from comment #30)
> (In reply to Kevin Brosnan [:kbrosnan] from comment #24)
> > You will still get the content freezing. Disabling location services will
> > allow swipe closing Firefox, instead of needing to go into the app settings
> > and choosing to stop the Firefox process. I believe the new always present
> > notification for location services causes the main problem of the failure to
> > draw the content to be magnified.
> 
> So interestingly I don't believe I've seen content freezing since disabling
> Locations Services. I suspect that Fennec is being fully killed by the
> memory reclaimer now, so it (re)starts cleanly now when selected, instead of
> exhibiting content freezing.
> 
> (needinfo for fyi)

I turned off location services a few days ago to test, can confirm same - haven't seen this bug since.
(In reply to Dylan Roeh (:droeh) from comment #27)
> (In reply to David Bolter [:davidb] (NeedInfo me for attention) from comment
> #26)
> > Dylan, any intuition on whether this is related to bug 1496684?
> 
> Judging from the comments, I think this is the same bug as described in
> comment 13 of bug 1496684, yes (note: there might be two separate issues
> being discussed in that bug at the moment, trying to figure that out right
> now). Essentially if Fennec gets killed while a custom tab is open,
> re-launching Fennec puts it in a bad state that's exactly what is described
> here: a white screen and no working interaction on all existing tabs, but
> opening new tabs seems to work fine.

Taking the custom tabs case as an example, what I think happens is that swiping the main browser UI activity (BrowserApp) away completely destroys it and also clears its saved instance state. However because the Recents list contains another task for the app (the task with the custom tab activity), Android [1] doesn't kill the process, so Gecko keeps running and all app-wide singletons remain alive, too, which explains why the tabs in the UI reappear when Firefox is launched again - the Tabs manager is such a singleton.

Now as far as I can tell, BrowserApp/GeckoApp itself doesn't store very much as part of the saved instance state (and in any case nothing that looks vital), however it might be possible that some crucial bit of GeckoView state is lost and not correctly recreated when the activity starts afresh, but with Gecko already running.


[1] Ignoring any manufacturers who might have customised the behaviour here in weird ways.
(In reply to Jan Henning [:JanH] from comment #33)
> (In reply to Dylan Roeh (:droeh) from comment #27)
> > (In reply to David Bolter [:davidb] (NeedInfo me for attention) from comment
> > #26)
> > > Dylan, any intuition on whether this is related to bug 1496684?
> > 
> > Judging from the comments, I think this is the same bug as described in
> > comment 13 of bug 1496684, yes (note: there might be two separate issues
> > being discussed in that bug at the moment, trying to figure that out right
> > now). Essentially if Fennec gets killed while a custom tab is open,
> > re-launching Fennec puts it in a bad state that's exactly what is described
> > here: a white screen and no working interaction on all existing tabs, but
> > opening new tabs seems to work fine.
> 
> Taking the custom tabs case as an example, what I think happens is that
> swiping the main browser UI activity (BrowserApp) away completely destroys
> it and also clears its saved instance state. However because the Recents
> list contains another task for the app (the task with the custom tab
> activity), Android [1] doesn't kill the process, so Gecko keeps running and
> all app-wide singletons remain alive, too, which explains why the tabs in
> the UI reappear when Firefox is launched again - the Tabs manager is such a
> singleton.
> 
> Now as far as I can tell, BrowserApp/GeckoApp itself doesn't store very much
> as part of the saved instance state (and in any case nothing that looks
> vital), however it might be possible that some crucial bit of GeckoView
> state is lost and not correctly recreated when the activity starts afresh,
> but with Gecko already running.
> 
> 
> [1] Ignoring any manufacturers who might have customised the behaviour here
> in weird ways.

Yeah, I'm pretty much in agreement with all of this -- anything that keeps the Fennec process alive after swipe-to-kill will have the same (broken) result, and any time the Fennec process is killed we recover without issue.
I'm not familiar with how exactly GeckoView works in that regard, but I think the problem is that with the saved instance state missing, GeckoApp has its GeckoView create a new browser <window> instead of reconnecting to the existing one containing all the previous tabs.
Duplicate of this bug: 1506165
See Also: → 1506357
Per discussion with snorp, I think the best approach here is just to make sure the process is killed when a user swipes-to-kill Fennec; that should guarantee we can recover correctly the next time the user opens Fennec (or a custom tab or PWA).
Assignee: nobody → droeh
Attachment #9024494 - Flags: review?(snorp)
Comment on attachment 9024494 [details] [diff] [review]
Kill Fennec process when Fennec is swiped-to-kill

Review of attachment 9024494 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not objecting to the idea itself of killing the process after swiping away *any* associated task (instead of the default Android behaviour of only killing after the *last* task has been removed), however the implementation here seems rather unfortunate:

We've only just got rid of the MediaControlService running all the time in bug 1416169, because having a service running continuously (even a service that is not a foreground service) means that Android keeps us unnecessarily alive while in background under memory pressure, which amongst other things is unfair to other apps, but now this patch proposed to reintroduce just such a service.

Is there no possibility of storing another reference to the required data for GeckoView somewhere off the Application object, so we can fall back to that if GeckoApp is launched into an already running process (which had already hosted GeckoApp before - the sequence of custom tab starting first and GeckoApp starting second is already working fine today), but the normal saved instance state has been lost?

::: mobile/android/base/java/org/mozilla/gecko/FennecKiller.java
@@ +10,5 @@
> +import android.os.IBinder;
> +import android.os.Process;
> +import android.util.Log;
> +
> +public class FennecKiller extends Service {

You're not overriding onStartCommand to return START_NOT_STICKY, so the service will use the default behaviour of starting as a sticky service. Doesn't this mean that Android might want to try re-starting the service again after it has been killed?
Since bug 1496684 is marked as a blocking issue for 64 release, tracking this as a blocker also.
Severity: normal → major
I think that trying to store required data for the GV -- assuming that's sufficient -- would entail adding GV API that we don't need or want, which I'd rather avoid. 

Thanks for the feedback on onStartCommand, though; I thought START_NOT_STICKY was the default behavior, I'll address that along with whatever snorp brings up in review.
Dylan, does this bug affect GV or just Fennec?
Flags: needinfo?(droeh)
Whiteboard: [priority:high] → [priority:high][geckoview]
(In reply to Chris Peterson [:cpeterson] from comment #41)
> Dylan, does this bug affect GV or just Fennec?

As far as I can tell just Fennec; I've heard occasional reports of similar behavior in Focus, but nothing I've been able to reproduce personally, and the causes I suspect here are very Fennec-specific.
Flags: needinfo?(droeh)
(In reply to Jan Henning [:JanH] from comment #38)
> 
> Is there no possibility of storing another reference to the required data
> for GeckoView somewhere off the Application object, so we can fall back to
> that if GeckoApp is launched into an already running process (which had
> already hosted GeckoApp before - the sequence of custom tab starting first
> and GeckoApp starting second is already working fine today), but the normal
> saved instance state has been lost?
> 

You mean, can we hook up to the existing window that Fennec uses, eliminating the whole process killing stuff? It might be possible, but IIRC, the biggest problem here is the various init messages that the Java frontend expects. We'd need to add some sort of reinitialization stuff in browser.js, I'd think.
Comment on attachment 9024494 [details] [diff] [review]
Kill Fennec process when Fennec is swiped-to-kill

Review of attachment 9024494 [details] [diff] [review]:
-----------------------------------------------------------------

While this is certainly not optimal as Jan has said, I don't think we have a better solution right now. r+ with the START_NOT_STICKY change.

::: mobile/android/base/java/org/mozilla/gecko/FennecKiller.java
@@ +10,5 @@
> +import android.os.IBinder;
> +import android.os.Process;
> +import android.util.Log;
> +
> +public class FennecKiller extends Service {

Yeah, we should do this.

@@ +22,5 @@
> +    }
> +
> +    @Override
> +    public void onTaskRemoved(Intent intent) {
> +        if (intent != null && intent.getComponent() != null && 

whitespace
Attachment #9024494 - Flags: review?(snorp) → review+
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #43)
> (In reply to Jan Henning [:JanH] from comment #38)
> > 
> > Is there no possibility of storing another reference to the required data
> > for GeckoView somewhere off the Application object, so we can fall back to
> > that if GeckoApp is launched into an already running process (which had
> > already hosted GeckoApp before - the sequence of custom tab starting first
> > and GeckoApp starting second is already working fine today), but the normal
> > saved instance state has been lost?
> > 
> 
> You mean, can we hook up to the existing window that Fennec uses,
> eliminating the whole process killing stuff? It might be possible, but IIRC,
> the biggest problem here is the various init messages that the Java frontend
> expects. We'd need to add some sort of reinitialization stuff in browser.js,
> I'd think.

We're already in a similar situation if BrowserApp/GeckoApp has been destroyed with the saved instance state still intact [1], aren't we? As I already mentioned, BrowserApp/GeckoApp themselves store very little as part of the saved instance state and I believe that therefore the main bit that is lost is GeckoView's memory that there already is an existing window it could connect to.
So I think that if we could get Fennec's GeckoView to connect to the already open window even without the saved instance state available, things should probably just work.

[1] I.e. with "Don't keep activities" enabled, respectively after an background OOM kill.
Wontfix for 63 as we are unlikely to have another dot release before 64 ships in 3 weeks from now and we would need the patch to land and be uplifted to beta first.
Pushed by droeh@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/555521c70c0f
Add a service to kill the Fennec process when the user swipes-to-kill Fennec. r=snorp
https://hg.mozilla.org/mozilla-central/rev/555521c70c0f
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → Firefox 65
Depends on: 1509342
Sorina is this something you or Ioana can help verify? Thanks!
Flags: qe-verify+
Flags: needinfo?(sorina.florean)
Hi Liz,

 I was able to reproduce this on Google Pixel (Android 9) and the latest Beta build following the steps:
1. Enable "Mozilla Location Service" and "Always restore";
2. Search for cnn.com and wait for the page to load; 
3. Close Firefox Beta and open again.

The issue is not reproducible on the latest Nightly (65.0a1 - 2018/11/28), on the same device, following the same steps. Cnn.com is loaded and the content is visible, also reloading the page works. I will mark 65:Verified.
Flags: qe-verify+
Flags: needinfo?(sorina.florean)
Flags: needinfo?(ioana.chiorean)
Dylan, is this ready for uplift (presumably with 1509342), or do you think this is risky and we should better wait until this has baked in nightly and consider it for a 64 dot release instead?
Flags: needinfo?(droeh)
(In reply to Julien Cristau [:jcristau] from comment #51)
> Dylan, is this ready for uplift (presumably with 1509342), or do you think
> this is risky and we should better wait until this has baked in nightly and
> consider it for a 64 dot release instead?

Since we've already seen one crash with 1509342's patch applied, I think it's better to hold back at the moment.
Flags: needinfo?(droeh)
Thanks Dylan.  I'm assuming the same is true for bug 1496684, and the hope was that this patch here would fix that issue too?  Or should we treat them separately?
Flags: needinfo?(droeh)
(In reply to Julien Cristau [:jcristau] from comment #53)
> Thanks Dylan.  I'm assuming the same is true for bug 1496684, and the hope
> was that this patch here would fix that issue too?  Or should we treat them
> separately?

It's likely that there will be one fix for both this and bug 1496684 as they are different manifestations of the same underlying issue, but I'd like to keep them separate until I know for certain.
Flags: needinfo?(droeh)
Was backed out in bug 1510587:
https://hg.mozilla.org/mozilla-central/rev/e9a34518f078
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I'll keep tracking this for 64, just not as a release blocker.
So, further research on this shows it isn't actually just a failure due to missing Activity/View lifecycle callbacks -- even in cases when we get them (onDestroy for the Activity, onSaveInstanceState for the View), we still fail in the same way when reopening Fennec.

A simple fix here is killing the process in BrowserApp.onDestroy(), with a fallback in onCreate() that kills and restarts the process if we detect we've been created twice without an onDestroy() call in between. Thoughts, snorp?
Attachment #9024494 - Attachment is obsolete: true
Attachment #9029713 - Flags: feedback?(snorp)
Comment on attachment 9029713 [details] [diff] [review]
Don't recreate BrowserApp without killing the Fennec process

Review of attachment 9029713 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Dylan Roeh (:droeh) from comment #57)
> So, further research on this shows it isn't actually just a failure due to
> missing Activity/View lifecycle callbacks -- even in cases when we get them
> (onDestroy for the Activity, onSaveInstanceState for the View), we still
> fail in the same way when reopening Fennec.

At the risk of sounding like a broken record, the problem happens when Android clears the savedInstanceState (because the user swipes the activity away from the task switcher), but doesn't kill the app process because there's still a service running (the stumbler), respectively other activities of ours (like custom tabs or PWAs) are still present in the task switcher.
Hence all of our singletons (including most importantly the Tabs manager) are still alive, as is Gecko with the window containing all of our tabs.
So in the case of BrowserApp for example, the next time it starts it finds the Gecko is already running and Tabs still has a number of tabs open, but because the savedInstanceState has been lost, its GeckoView instance doesn't know how to reconnect to the already open Gecko window and instead opens a new one, which makes all previous tabs inaccessible and presumably also confuses the EventDispatcher, which is why context menu are broken as well.

> A simple fix here is killing the process in BrowserApp.onDestroy(), with a
> fallback in onCreate() that kills and restarts the process if we detect
> we've been created twice without an onDestroy() call in between. Thoughts,
> snorp?

The only big drawback I can think of is that this will make Firefox rather unusable with "Don't keep activities" enabled, because the app will be killed each time BrowserApp is backgrounded - and most importantly, this is also what happens when opening our settings. On the other hand "Don't keep activities" *is* a developer option, so hopefully not too many people have activated it without knowing what they're doing. If we can discount that scenario, then this might indeed be acceptable as a temporary workaround.

::: mobile/android/base/java/org/mozilla/gecko/BrowserApp.java
@@ +618,5 @@
>      public void onCreate(Bundle savedInstanceState) {
>          final Context appContext = getApplicationContext();
>  
> +        if (sIsCreated) {
> +            GeckoApplication.shutdown(new Intent(Intent.ACTION_MAIN, null, appContext, getClass()));

In case you weren't planning to do this anyway, I think the production version of this patch should preserve the original launch intent instead of throwing it away.
(In reply to Jan Henning [:JanH] from comment #58)
> Comment on attachment 9029713 [details] [diff] [review]
> Don't recreate BrowserApp without killing the Fennec process
> 
> Review of attachment 9029713 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> (In reply to Dylan Roeh (:droeh) from comment #57)
> > So, further research on this shows it isn't actually just a failure due to
> > missing Activity/View lifecycle callbacks -- even in cases when we get them
> > (onDestroy for the Activity, onSaveInstanceState for the View), we still
> > fail in the same way when reopening Fennec.
> 
> At the risk of sounding like a broken record, the problem happens when
> Android clears the savedInstanceState (because the user swipes the activity
> away from the task switcher), but doesn't kill the app process because
> there's still a service running (the stumbler), respectively other
> activities of ours (like custom tabs or PWAs) are still present in the task
> switcher.
> Hence all of our singletons (including most importantly the Tabs manager)
> are still alive, as is Gecko with the window containing all of our tabs.
> So in the case of BrowserApp for example, the next time it starts it finds
> the Gecko is already running and Tabs still has a number of tabs open, but
> because the savedInstanceState has been lost, its GeckoView instance doesn't
> know how to reconnect to the already open Gecko window and instead opens a
> new one, which makes all previous tabs inaccessible and presumably also
> confuses the EventDispatcher, which is why context menu are broken as well.

Thanks, I had actually misunderstood what was going on here -- I thought it simply boiled down to missing some lifecycle callbacks.

> > A simple fix here is killing the process in BrowserApp.onDestroy(), with a
> > fallback in onCreate() that kills and restarts the process if we detect
> > we've been created twice without an onDestroy() call in between. Thoughts,
> > snorp?
> 
> The only big drawback I can think of is that this will make Firefox rather
> unusable with "Don't keep activities" enabled, because the app will be
> killed each time BrowserApp is backgrounded - and most importantly, this is
> also what happens when opening our settings. On the other hand "Don't keep
> activities" *is* a developer option, so hopefully not too many people have
> activated it without knowing what they're doing. If we can discount that
> scenario, then this might indeed be acceptable as a temporary workaround.

Agreed.

> ::: mobile/android/base/java/org/mozilla/gecko/BrowserApp.java
> @@ +618,5 @@
> >      public void onCreate(Bundle savedInstanceState) {
> >          final Context appContext = getApplicationContext();
> >  
> > +        if (sIsCreated) {
> > +            GeckoApplication.shutdown(new Intent(Intent.ACTION_MAIN, null, appContext, getClass()));
> 
> In case you weren't planning to do this anyway, I think the production
> version of this patch should preserve the original launch intent instead of
> throwing it away.

That's a good point, thanks.
Duplicate of this bug: 1512850
Depends on: 1496684
See Also: 1496684
Per discussion with snorp on slack, we're going to see if stopping the stumbler service when Fennec is killed will reduce the frequency of this bug so that we can work on a better solution than the patch I currently have up.
Attachment #9030818 - Flags: review?(snorp)
Comment on attachment 9030818 [details] [diff] [review]
Stop stumbler service when Fennec is killed

Review of attachment 9030818 [details] [diff] [review]:
-----------------------------------------------------------------

Oh, you can just do 'android:stopWithTask="true"' in the service xml.
Attachment #9030818 - Flags: review?(snorp) → review-
Comment on attachment 9030818 [details] [diff] [review]
Stop stumbler service when Fennec is killed

Review of attachment 9030818 [details] [diff] [review]:
-----------------------------------------------------------------

Dylan explained that with this method we don't end up killing the stumbler when we're in a custom tab or something, which I guess seems reasonable.
Attachment #9030818 - Flags: review- → review+
Whiteboard: [priority:high][geckoview] → [priority:high][geckoview:p1]
Pushed by droeh@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f32605134927
Stop stumbler service when Fennec is killed. r=snorp
Flagging as leave-open because the above patch only mitigates the frequency rather than fixing it.
Keywords: leave-open
gv64=unaffected because this issue shouldn't affect GV in Focus. Whether we want to uplift to a Fennec 64.0.x dot release, I don't know.
Duplicate of this bug: 1514575
Duplicate of this bug: 1514586
Having same bug and it's best recorded here, in this bug duplicate: https://bug1514586.bmoattachments.org/attachment.cgi?id=9032089

It was introduces in FF for Android 63, at same time geolocation nagging was introduced and longpress on links stopped working. FF for Android was by far worse Mozilla release and almost forced me to stop using damn Firefox.

You can get around this issue by closing tab that won't load any new pages and open new tab.

Samsung S9
Android 8.0.0
Firefox for Android 63 and 64.
Duplicate of this bug: 1515583
I experience this opening nytimes.com/video and a new tab (Nightly Home) 

Without Location Services, Custom Tabs, PWA's or Session Restore.

See also: Bug 1510340
(In reply to Bogdan Tara[:bogdan_tara] from comment #67)
> https://hg.mozilla.org/mozilla-central/rev/f32605134927

Should we take this on Beta to at least help with the frequency on 65?
Flags: needinfo?(droeh)
Target Milestone: Firefox 65 → ---
(In reply to Ryan VanderMeulen [:RyanVM] from comment #74)
> (In reply to Bogdan Tara[:bogdan_tara] from comment #67)
> > https://hg.mozilla.org/mozilla-central/rev/f32605134927
> 
> Should we take this on Beta to at least help with the frequency on 65?

Yeah, I think this is safe to uplift.
Flags: needinfo?(droeh)
Comment on attachment 9030818 [details] [diff] [review]
Stop stumbler service when Fennec is killed

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: None

User impact if declined: This patch mitigates a bug that puts Fennec in a bad state; declining will allow that to continue happening at the same frequency.

Is this code covered by automated tests?: No

Has the fix been verified in Nightly?: No

Needs manual test from QE?: No

If yes, steps to reproduce: 

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): Just ensures the stumbler service stops when Fennec is swipe-to-kill'ed.

String changes made/needed:
Attachment #9030818 - Flags: approval-mozilla-beta?
Comment on attachment 9030818 [details] [diff] [review]
Stop stumbler service when Fennec is killed

[Triage Comment]
Stops the stumbler service when Fennec is swipe-killed. It's a band aid to this bug, but it should at least help reduce the frequency. Approved for 65.0b8.
Attachment #9030818 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 9030818 [details] [diff] [review]
Stop stumbler service when Fennec is killed

Uplifted to Beta:
https://hg.mozilla.org/releases/mozilla-beta/rev/8230dc950f98

Clearing the approval so the bug won't show up on the "approved, needs uplift" queries since we can't really call Fx65 fixed by this patch.
Attachment #9030818 - Flags: approval-mozilla-beta+
Duplicate of this bug: 1517161

Dylan, what's left on this bug?

Flags: needinfo?(droeh)

(In reply to David Bolter [:davidb] (NeedInfo me for attention) from comment #80)

Dylan, what's left on this bug?

Talked on slack; since this is Fennec-specific and unlikely to benefit GV, I'm unassigning myself.

Assignee: droeh → nobody
Flags: needinfo?(droeh)
Assignee: nobody → jh+bugzilla
The arguably most interesting bit of state of BrowserApp/GeckoApp, namely the
currently open tabs, are living partly in Gecko and partly in the Tabs
manager singleton, the lifetimes of both of which are tied to the lifetime of
the app process.

If GeckoApp is therefore being restored into an app process in which it had
already executed earlier on, meaning that we have some open tabs, it relies on
the savedInstanceState in order to correctly reconnect its GeckoView instance
with the correct previous GeckoSession.

We can however end up in a state where we don't have a savedInstanceState (e.g.
because the user swiped away the BrowserApp activity in the task switcher), but
the app process keeps running throughout (if another activity of ours is still
present in the task switcher, e.g. a custom tab, or else if a service is active,
then standard Android keeps the process running even if the user swipes away an
activity).

Currently, this means if GeckoApp is subsequently recreated, the Android UI sees
all the Android-side tabs in the Tabs manager, and Gecko in fact still has the
Window open that is containing all those tabs, but without the savedInstance-
State GeckoApp doesn't know anything about that Window and proceeds to open a
fresh session instead.

This means that all previous tabs will appear unresponsive, while freshly opened
tabs will load, but they won't be correctly saved in the session store, their
context menu isn't working, etc., because we're not really expecting to handle
multiple Gecko-side Windows.

To fix this, we disable automatic state-saving for GeckoApp's GeckoView instance
and instead do it manually, so we can keep another reference to the saved state
in GeckoApplication, and therefore be able to retrieve it as long as the app
process keeps running.

Tested with
a) "Don't keep activities" enabled. In that case opening Fennec, going into our Settings and back again correctly restores GeckoApp with a savedInstanceState present.
b) "Don't keep activities" disabled. Open Fennec, open a custom tab, then swipe Fennec itself away. If your OEM didn't do any customisation in this area, this should keep the app process itself running - verify by switching to the custom tab and noting that it immediately appears and doesn't reload. Then relaunch Fennec - the contents of the current tab should reappear 1.) at all and 2.) without any signs of reloading. The context menu should work.
c) Swipe away any activities belonging to Fennec, then launch it again. Put it into the background and execute adb shell am kill <package name> to simulate an OOM-kill (Edit: Not sure how that might behave if the app actually has a service running, e.g. Nightly the Stumbler service. With a local build, which doesn't run the stumbler by default, it does work to simulate OOM-kills, though). Now launch it again. Everything should still work normally with a savedInstanceState present, but the process having been killed in the meantime.

My patch fixes b), while also not breaking a) or c).

Keywords: leave-open
Attachment #9029713 - Attachment is obsolete: true

Comment on attachment 9036137 [details]
Bug 1494748 - Ensure GeckoView saved state lives at least as long as the app process. r?snorp

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: unknown, but apparently some API26 changes re services made it worse

User impact if declined: in combination with custom tabs, standalone PWAs, or anything that somehow uses a (foreground?) service (e.g. the Stumbler, file downloads, ...), there is a risk that if the user swipe-closes Fennec (maybe in other circumstances, too), the next time they launch the app all existing tabs are unresponsive, context menus are broken, etc.

Is this code covered by automated tests?: No

Has the fix been verified in Nightly?: No

Needs manual test from QE?: No

If yes, steps to reproduce:

List of other uplifts needed: none

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): The root cause of this problem has been finally understood and the fix doesn't require any large scale re-architecturing. We simply hook into Android's state saving system to get a second reference to GeckoApp's saved GeckoView state, so we can be sure that as long the app process isn't killed (and therefore Gecko is still running and all the user's tabs are open), GeckoApp will always have a saved state it can use to recreate its GeckoView's state if necessary.

String changes made/needed: none

Attachment #9036137 - Flags: approval-mozilla-beta?
Duplicate of this bug: 1518979
Pushed by mozilla@buttercookie.de:
https://hg.mozilla.org/integration/autoland/rev/025a428fbec4
Ensure GeckoView saved state lives at least as long as the app process. r=snorp

We should try to verify this fix ASAP after it goes out in Nightlies. I may spin a Fennec b12 build for this fix if everything looks good with it.

Flags: qe-verify+
Flags: needinfo?(andrei.bodea)
Status: REOPENED → RESOLVED
Closed: Last year11 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 66

Hello,
I will set the whole bug to verified fixed after I will check this bug on the beta uplift, for now I cleared the 66 flag as I re-tested this issue on the latest Nightly build 66.0a1 but I couldn't reproduce this issue.
Devices used: Samsung Galaxy Note 9 (Android 8.1.0), Google Pixel XL (Android 9/P), Google Pixel 3 XL(Android 9/P).

Note that I tried the following scenarios:

  • Mozilla Location Service and Always restore[enabled]

    -Open ~7-14 web pages
    -Swap between them and close Firefox
    -Open Firefox again

    -Open few web pages
    -On the last opened web page, tap o refresh button and close Firefox
    -Open Firefox again

    -Open few web pages
    -On the last opened web page, tap PWA button and add it to the springboard
    -Close Firefox
    -Open Firefox again

    -Open few web pages
    -Add as PWA few of them
    -Minimize the Firefox and let it in background for ~2 hours
    -Close/Swipe the Firefox
    -Relaunch Firefox

    -Open few web pages
    -Turn off the internet connection.
    -Turn on the internet connection.
    -Close/Swipe the Firefox
    -Relaunch Firefox

    -Launch Firefox
    -Close Firefox
    -Open Firefox again and open few web pages
    -On the last opened page tap on refresh button and close Firefox
    -Relaunch Firefox

    As Csheany said in Comment 73, I also included nytimes.com/videos web page in every scenario from above, played a lot of videos at once but I couldn't reproduce it.
    Also, I tried few scenarios with more than 50 tabs opened, but everything worked as expected when I launched the FF again

And I tried the same scenarios from above with the - Mozilla Location Service{disabled] and Always restore[enabled]

Thanks,
Andrei

Flags: qe-verify+
Flags: needinfo?(andrei.bodea)
Flags: qe-verify+

Comment on attachment 9036137 [details]
Bug 1494748 - Ensure GeckoView saved state lives at least as long as the app process. r?snorp

[Triage Comment]
This sounds like a pretty bad bug for affected users. Thanks for getting to the bottom of it, Jan! Approved for 65.0b12. Let's definitely get another QA pass of this on Beta since it's late in the cycle.

Attachment #9036137 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Verified as fixed on latest Beta build - 65.0b12 following the scenarios presented in comment 89.
Devices: Google Pixel (Android 9), Samsung Galaxy Note 8 (Android 8.0.0).

Status: RESOLVED → VERIFIED
Flags: qe-verify+
Duplicate of this bug: 1522558
See Also: → 1572359
You need to log in before you can comment on or make changes to this bug.