Last Comment Bug 718240 - Fennec carries out a full session restore when killed
: Fennec carries out a full session restore when killed
Status: RESOLVED FIXED
:
Product: Firefox for Android
Classification: Client Software
Component: General (show other bugs)
: Trunk
: ARM Android
: -- normal (vote)
: Firefox 14
Assigned To: Brian Nicholson (:bnicholson)
:
Mentors:
: 722409 (view as bug list)
Depends on: 722409 753625
Blocks: 697858 721008 730303
  Show dependency treegraph
 
Reported: 2012-01-14 15:40 PST by bug.zilla
Modified: 2012-05-09 23:24 PDT (History)
12 users (show)
nhirata.bugzilla: blocking‑fennec1.0?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed
fixed
+


Attachments
patch (2.96 KB, patch)
2012-02-27 15:26 PST, Brian Nicholson (:bnicholson)
mark.finkle: review-
Details | Diff | Review
patch v2 (4.12 KB, patch)
2012-03-16 17:25 PDT, Brian Nicholson (:bnicholson)
mark.finkle: review+
Details | Diff | Review

Description bug.zilla 2012-01-14 15:40:22 PST
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
Build ID: 20111220165912

Steps to reproduce:

Kill Fennec using an Android task killer such as Advanced Task Killer


Actual results:

After killing and upon a restart of Fennec it carries out a session restore by restoring all the tabs of the last session. This can be extremely slow and resource heavy often leading to crashes.


Expected results:

Fennec should never carry out a session restore if the application is killed.

Please do not underestimate the popularity of these task killers. Advanced Task Killer alone has tens of millions of users (https://market.android.com/details?id=com.rechild.advancedtaskkiller&hl=en)

A lot of users set tasks / applications to be killed when the screen goes off.
What this means is that when Fennec is restarted it does a full session restore of the last session which can be very, very slow.
Comment 1 Martijn Wargers [:mwargers] (not working for Mozilla) 2012-01-15 02:35:36 PST
What I think is the problem here with Native Fennec is that it always restore all the tabs after a crash (or when it is killed, which is from the application's point of view, the same).
I don't think Fennec should be doing that, because if there is a page in there that causes Fennec to crash, then we keep on repeating the crash.
So basically you need something similar like desktop Firefox, where a session restore dialog page comes up after the second crash, that asks which ones of the tabs you want to restore.

bug.zilla@mail.com, do you agree with this or should I open a new bug about this?
Comment 2 Mark Finkle (:mfinkle) (use needinfo?) 2012-01-15 07:19:08 PST
(In reply to Martijn Wargers [:mw22] (QA - IRC nick: mw22) from comment #1)
> What I think is the problem here with Native Fennec is that it always
> restore all the tabs after a crash (or when it is killed, which is from the
> application's point of view, the same).
> I don't think Fennec should be doing that, because if there is a page in
> there that causes Fennec to crash, then we keep on repeating the crash.
> So basically you need something similar like desktop Firefox, where a
> session restore dialog page comes up after the second crash, that asks which
> ones of the tabs you want to restore.

Sounds like bug 701092
Comment 3 bug.zilla 2012-01-15 08:17:16 PST
Hi Martijn,

ideally Fennec should be able to distinguish the difference between:

(i) A genuine crash = session restore dialogue box; and
(ii) Ending the process through a task killer = start a new session

I appreciate that from Fennec's point of view both (i) and (ii) look the same, so it will be difficult to distinguish.

Could you please open a new bug if possible? (or just merge with https://bugzilla.mozilla.org/show_bug.cgi?id=701092 if it's the same).

My concern is that with task killers quite so ubiquitous this will cause a real problem for users.
Comment 4 Gervase Markham [:gerv] 2012-02-10 08:28:57 PST
Doesn't the task killer at least send some sort of signal to the process that we can catch "you are about to be killed"?

Gerv
Comment 5 Brian Nicholson (:bnicholson) 2012-02-10 11:33:49 PST
The arguments against session restore that I see here are 1) we're too slow because we do a full session restore, and 2) restoring tabs can cause a crash loop. Case #1 has been fixed by lazily loading restored tabs (bug 721341), and case #2 has been fixed by limiting the restore attempts (bug 701092). Since the performance/UI problems have been fixed, I'm marking this bug as invalid.
Comment 6 bug.zilla 2012-02-21 10:48:35 PST
Brian,

I'm not suggesting re-opening this at the moment, but please be aware that even by enabling "Do not load tabs until selected" (i.e. https://bugzilla.mozilla.org/show_bug.cgi?id=721341), startup is still slow if the last viewed page is particularly "big" or script-heavy.

I've solved this by never killing Fennec, but I'm not convinced "regular" users will do that and instead will just think that Fennec is "slow".

Thanks
Comment 7 Brian Nicholson (:bnicholson) 2012-02-23 14:58:12 PST
I actually agree that we shouldn't do a restore for most kills (see https://bugzilla.mozilla.org/show_bug.cgi?id=722409#c4 for a discussion of when session restore takes place).

IMO, session restore should be used *only* for OOM kills; for all other cases, we can simply go to about:home. That is, if the user reboots, Force Quits, or a crash actually happens, we could always show about:home with "tabs from last time". Session restore would then only be used for seamlessly restoring from an OOM kill. I know this is completely different from desktop, but Force Quitting is relatively common on Android considering the popularity of task killers, and doing a session restore after a Force Quit feels awkward.

AFAIK, we can't differentiate between Force Quits and actual Fennec crashes, so we need to follow the same behavior for both. Unless we're completely incompetent, FQ's will be far more common than crashes, so I think we should give the best user experience for FQ's rather than crashes.
Comment 8 bug.zilla 2012-02-24 14:16:37 PST
Brian, very helpful, I agree.

Since there's no way to differentiate between FQs and crashes and since FQs are likely to be much, much more common due to:

(1) Task Killers (there must be millions of installations of these); and
(2) OOMs (and this is not the case just on low end phones)

then:

(1) If OOM occurs then Session Restore
(2) Everything else should be about:home and the usual "your tabs from last time"

I wouldn't worry about the inconsistency of treatment with the desktop version - what's important is not to make users think that Fennec is "slow".

Thanks again.
Comment 9 Brian Nicholson (:bnicholson) 2012-02-24 14:55:37 PST
Adding uiwanted for UX's opinion.
Comment 10 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-02-27 09:10:45 PST
Just a comment: I thought that this was the way it was suppose to be?  Quit will close all the tabs, kill would restore the tabs?
Comment 11 bug.zilla 2012-02-27 12:39:34 PST
I think we're saying that a kill should not restore the tabs (as it takes too long) and instead it should go to about:home.

Restoring tabs should only be used in case of OOMs / FQs.
Comment 12 Brian Nicholson (:bnicholson) 2012-02-27 13:00:12 PST
(In reply to bug.zilla from comment #11)
> I think we're saying that a kill should not restore the tabs (as it takes
> too long) and instead it should go to about:home.
> 
> Restoring tabs should only be used in case of OOMs / FQs.

Actually, I was suggesting that we restore tabs only for OOM kills only - not FQs.
Comment 13 Ian Barlow (:ibarlow) 2012-02-27 13:55:05 PST
From a UX perspective, I agree with Brian here. While there are still cases where we want to restore your session (like OOM), if someone deliberately kills the app we shouldn't bog them down with a full session restore.
Comment 14 bug.zilla 2012-02-27 15:03:46 PST
(In reply to Brian Nicholson (:bnicholson) from comment #12)
> (In reply to bug.zilla from comment #11)
> > I think we're saying that a kill should not restore the tabs (as it takes
> > too long) and instead it should go to about:home.
> > 
> > Restoring tabs should only be used in case of OOMs / FQs.
> 
> Actually, I was suggesting that we restore tabs only for OOM kills only -
> not FQs.

Sorry, you're right.
Comment 15 Brian Nicholson (:bnicholson) 2012-02-27 15:26:34 PST
Created attachment 601091 [details] [diff] [review]
patch

This patch disables session restore for crashes/force quits, so only OOM kills restore sessions.
Comment 16 Mark Finkle (:mfinkle) (use needinfo?) 2012-02-27 15:47:32 PST
Comment on attachment 601091 [details] [diff] [review]
patch

looks like you are disabling normal session restore altogether. what happens if we crash in the foreground?
Comment 17 Brian Nicholson (:bnicholson) 2012-02-27 15:55:14 PST
(In reply to Mark Finkle (:mfinkle) from comment #16)
> Comment on attachment 601091 [details] [diff] [review]
> patch
> 
> looks like you are disabling normal session restore altogether. what happens
> if we crash in the foreground?

We show about:home with tabs from last time for all crashes (background and foreground).
Comment 18 Mark Finkle (:mfinkle) (use needinfo?) 2012-02-29 17:24:15 PST
I do not like this approach. I do not want to lose restore for foreground crashes.

This patch disables the session restore for all cases except background OOM kills because we disable the gecko session restore preference and only use the "force restore" code that  uses the bundle to determine if we were OOM killed. Right?

If to, then I want to remind readers that we also have a timeout for session restore. If it has been more than 60 minutes since Firefox crashed or was task killed, we will not restore the session. If "60 minutes" became "60 seconds" we would be very close the to intentions of this patch. A user would have to quickly try to re-open Firefox for the session to be restored. OOM would always be restored because we force that from Java. Right?
Comment 19 Brad Lassey [:blassey] (use needinfo?) 2012-03-05 07:50:17 PST
(In reply to Brian Nicholson (:bnicholson) from comment #7)
> AFAIK, we can't differentiate between Force Quits and actual Fennec crashes,
> so we need to follow the same behavior for both.

When Android is going to OOM kill us, it asks us to save our instance to a bundle. As long as we always do that, the way to differentiate an OOM kill from any other kill is the existence of that bundle in onCreate().
Comment 20 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-03-05 08:38:30 PST
Just to note, I don't think I ever seen a Crash Reporter for a kill.  Might that be a way to distinguish the difference as well?
Comment 21 Brian Nicholson (:bnicholson) 2012-03-05 11:08:13 PST
(In reply to Brad Lassey [:blassey] from comment #19)
> (In reply to Brian Nicholson (:bnicholson) from comment #7)
> > AFAIK, we can't differentiate between Force Quits and actual Fennec crashes,
> > so we need to follow the same behavior for both.
> 
> When Android is going to OOM kill us, it asks us to save our instance to a
> bundle. As long as we always do that, the way to differentiate an OOM kill
> from any other kill is the existence of that bundle in onCreate().

Right, but is there any way to differentiate between the user killing Fennec manually (Force closing or task killer) and Fennec crashing? Those are the two cases that are tied together. If the user chooses to Force close Fennec, we don't want to do a session restore, but if Fennec actually crashes, we probably do.
Comment 22 Brad Lassey [:blassey] (use needinfo?) 2012-03-05 15:12:53 PST
sure, we catch crashes with the crash reporter. Some cookie could be left saying there was a crash. SIGKILL doesn't get caught by the crash reporter.
Comment 23 Brian Nicholson (:bnicholson) 2012-03-13 12:45:09 PDT
The current plan, as described in comment 18, is to have a very short (~30-60 second) window in which we will restore the session. The idea is that for the majority of crashes, users will reopen Fennec immediately and will want their sessions back. On the other hand, for most of the task killer/force quit cases, users will wait >30 seconds before reopening Fennec. This is a temporary compromise until bug 735399 is fixed.

Adding dependency on bug 722409 since this change requires the Java-side timeout introduced there.
Comment 24 Brian Nicholson (:bnicholson) 2012-03-16 17:25:51 PDT
Created attachment 606801 [details] [diff] [review]
patch v2

This patch adds a 30 second timeout for crash session restores. OOM kills still have no timeouts for restoring.

I can't test this on about:home due to bug 736644, but regular web pages work correctly.
Comment 25 Brian Nicholson (:bnicholson) 2012-03-16 17:29:47 PDT
*** Bug 722409 has been marked as a duplicate of this bug. ***
Comment 26 Mark Finkle (:mfinkle) (use needinfo?) 2012-03-16 19:20:52 PDT
Comment on attachment 606801 [details] [diff] [review]
patch v2


>diff --git a/mobile/android/base/GeckoProfile.java b/mobile/android/base/GeckoProfile.java

>+    public boolean shouldRestoreSession() {
>         Log.w(LOGTAG, "zerdatime " + SystemClock.uptimeMillis() + " - start check sessionstore.js exists");
>         File dir = getDir();
>-        boolean hasSession = (dir != null && new File(dir, "sessionstore.js").exists());
>+        File sessionFile = new File(dir, "sessionstore.js");
>+        boolean shouldRestore = (dir != null &&
>+                                 sessionFile.exists() &&
>+                                 (System.currentTimeMillis() - sessionFile.lastModified() < SESSION_TIMEOUT));

Let's do some early returns:

>         File dir = getDir();
>+        if (dir == null)
>+            return false;
>+
>+        File sessionFile = new File(dir, "sessionstore.js");
>+        if (!sessionFile.exists())
>+            return false;
>+
>+        boolean shouldRestore = (System.currentTimeMillis() - sessionFile.lastModified() < SESSION_TIMEOUT));

I'm old and this is easier to read and does slightly less work
Comment 27 Brian Nicholson (:bnicholson) 2012-03-19 11:21:41 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/f8b7d116a096
Comment 28 Mounir Lamouri (:mounir) 2012-03-20 04:00:16 PDT
https://hg.mozilla.org/mozilla-central/rev/f8b7d116a096
Comment 29 Marco Bonardo [::mak] 2012-03-22 06:44:58 PDT
This has been backed out, but nobody annotated that, please always mark backouts
https://hg.mozilla.org/mozilla-central/rev/c1762efb09cf
Comment 30 Brian Nicholson (:bnicholson) 2012-03-22 14:05:02 PDT
Sorry Marco.

Looks like this patch wasn't the cause of the Tp regression, so I relanded it on inbound:
http://hg.mozilla.org/integration/mozilla-inbound/rev/3cf12ee16e7f
Comment 31 Marco Bonardo [::mak] 2012-03-23 05:50:07 PDT
https://hg.mozilla.org/mozilla-central/rev/3cf12ee16e7f
Comment 32 Mark Finkle (:mfinkle) (use needinfo?) 2012-03-24 08:57:36 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/29f7979e28c2

Note You need to log in before you can comment on or make changes to this bug.