Closed Bug 1263110 Opened 5 years ago Closed 5 years ago

Session restore can cause crash loops (again)

Categories

(Firefox for Android :: General, defect)

All
Android
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 49
Tracking Status
firefox48 --- affected
firefox49 --- fixed

People

(Reporter: JanH, Assigned: JanH)

References

Details

Attachments

(4 files, 1 obsolete file)

I've experienced a few media-related crashes (thankfully fixed now) where Firefox would crash immediately when loading that page. While session restore is usually very useful, in this case it makes it a bit difficult to break out of the resulting crash loop.

Therefore I propose to add on option to the crash reporter which optionally allows you to not automatically restore your tabs. Instead, your old session will appear on the "Tabs from last time" page from where you can then manually restore any tabs as needed.
mfinkle mentioned that we used to try and detect successive crashes and stop doing a session restore in such a case (see bug 701092) and that this probably somehow got lost when part of the session startup code was moved to Java.

I'll try reimplementing that approach first before resorting to adding another checkbox.
See Also: → 701092
Summary: Add crash reporter option to not restore your tabs automatically → Session restore can cause crash loops (again)
This proposed patch restores crash loop detection functionality and seems to be working quite nicely except for the fact that to get it working reliably, I had to turn on Context.MODE_MULTI_PROCESS for accessing GeckoApp's shared preferences.

This is because the - already existing - GeckoApp.PREFS_CRASHED flag is written to by the crash reporter, but read from and reset within GeckoApp, both of which are running in different processes. Without Context.MODE_MULTI_PROCESS, there can be some strange failures where each process will often fail to notice the changes made by the other process. The trouble with that solution however is that the usage of Context.MODE_MULTI_PROCESS has been deprecated as of API 23, so I've got no idea what approach to take.
See Also: → 1171213
Does it help to write the shared preference with commit() (synchronous) instead of apply() (asynchronous)? In theory the preference should always be written before the app is restarted so this shouldn't require multi-process access?
I'd already tried that for the crash reporter and have now also experimented with using this within GeckoApp as well, but without success.

The problem seems to be that if I keep the activity in the task switcher alive (i.e. I don't swipe it away before restarting Firefox, or choose the restart option in the crash reporter), the shared prefs won't be necessarily be reloaded from disc, but instead, some cached value seems to be used:
- Firefox crashes (http://people.mozilla.org/~tmielczarek/crashme/)
- The crash reporter sets PREFS_CRASHED to true (for local builds you need to export MOZ_CRASHREPORTER=1 and run ./mach configure if re-building to get the crash reporter)
- Firefox starts, but thinks PREFS_CRASHED is still false, which also seems to get written out to disc during one of the calls to editor.apply()/commit(), overwriting the previous PREFS_CRASHED = true, even though the reset code is never called.
- I crash Firefox again.
- The crash reporter runs, probably reads some sort of cached value and thinks PREFS_CRASHED is still true, so despite setting PREFS_CRASHED to true again, nothing gets written to disc because the shared prefs editors believes that in fact nothing has changed. The crash reporter also sets PREFS_WAS_STOPPED to true, but because GeckoApp's onPause() handler already set it to true, that operation is a no-op as well.
- And so on...

This also corresponds with the description of Context.MODE_MULTI_PROCESS in the SDK:
> SharedPreference loading flag: when set, the file on disk will be checked
> for modification even if the shared preferences instance is already
> loaded in this process.
After discussing this with Sebastian, it seems it might be possible to implement this by having the crash reporter sending a broadcast to a broadcast receiver running within the main process, however this could have unfortunate side effects and produce a crash loop of its own if we should ever find ourselves in a state where just running broadcasting to that broadcast receiver provokes another crash.
So instead I'm going to look at setting that shared pref crash flag from within the main process at the point the crash reporter is getting invoked, i.e. somewhere in GeckoAppShell and/or the CrashHandler.
The crash reporter runs in its own process but uses GeckoApp's shared prefs both to store its own settings and to signal to the main process that it has crashed, which can be somewhat problematic because each process might fail to notice settings changes made by the other process. As the simple solution of enabling Context.MODE_MULTI_PROCESS for accessing the shared prefs is now deprecated, we'll devise an alternative solution instead.

In Part 1 we move the settings that are used exclusively by the crash reporter into a separate shared prefs instance.

Review commit: https://reviewboard.mozilla.org/r/47227/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/47227/
Attachment #8742770 - Flags: review?(mark.finkle)
Attachment #8742771 - Flags: review?(mark.finkle)
Attachment #8742772 - Flags: review?(mark.finkle)
Bug 701092 originally implemented some functionality to detect successive crashes and then turn off session restore for the next start, however that functionality got lost when parts of the startup session restore code were moved to Java.

This patch re-implements this functionality within the Java UI. Unlike the previous implementation, we don't reset the crash counter in onPause(), because often enough onPause() will execute even after a crash. Instead, we check in onResume() whether our last foreground activity cycle crashed or not.

To avoid cross-process writes and reads to shared preferences, the crash reporter no longer sets the relevant flags in GeckoApp's shared prefs directly, but instead writes an empty CRASHED file to the Mozilla directory as a flag, which is then checked for by the main process during startup.

Alternative solutions considered were:
- Using Context.MODE_MULTI_PROCESS for accessing the shared prefs. Works, but forces the shared preferences to always be re-read from storage and is also deprecated from API 23 onwards.
- Using a ContentProvider for managing the cross-process shared prefs as suggested in Google's documentation. Seems somewhat over-engineered for this use case.
- Sending a broadcast from the crash reporter to signal the main process, so it can update the relevant shared prefs from the correct process. Doesn't work reliably immediately after crashing - sometimes the broadcast never arrives.
- Setting the crash flags directly in the crash handling functions in GeckoAppShell. Could work even when not building the crash reporter, however doesn't work easily for native crashes, because those are handled internally by Gecko without going through the Java crash handling code.

Review commit: https://reviewboard.mozilla.org/r/47229/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/47229/
The number of recent successive crashes is now tracked wholly within Java, so we can remove the old Gecko pref and the associated reset code.

Review commit: https://reviewboard.mozilla.org/r/47231/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/47231/
It turns out that despite this promising function in GeckoAppShell (https://dxr.mozilla.org/mozilla-central/rev/1f16d3da9280e40ada252acf8110b91ee1edbb08/mobile/android/base/java/org/mozilla/gecko/GeckoAppShell.java#374) Gecko handles crashes in native code all by itself now without involving anything on the Java side (at least nothing that I could find).
I then experimented with using a broadcast receiver (the crash reporter sending a broadcast to the main process which would then in turn set the PREFS_CRASHED flag from the correct process), but it turns out that sending a broadcast to the main process immediately after a crash isn't very reliable - sometimes the broadcast would simply get lost.
So instead the crash reporter is now simply writing an empty CRASHED file into the Mozilla directory which is then checked for during startup.
Comment on attachment 8742772 [details]
MozReview Request: Bug 1263110 - Part 3 - Remove the remains of the old crash loop detection. r=mfinkle,sebastian

https://reviewboard.mozilla.org/r/47231/#review44585

::: widget/android/nsAppShell.cpp
(Diff revision 1)
>          // We really want to send a notification like profile-before-change,
>          // but profile-before-change ends up shutting some things down instead
>          // of flushing data
>          nsIPrefService* prefs = Preferences::GetService();
>          if (prefs) {
> -            // reset the crash loop state

Yes we should clean up this code.
Attachment #8742772 - Flags: review?(mark.finkle) → review+
Comment on attachment 8742771 [details]
MozReview Request: Bug 1263110 - Part 2 - Implement crash loop detection in Java. r=mfinkle,sebastian

https://reviewboard.mozilla.org/r/47229/#review44587

Overall, this approach looks OK to me, but we should question the need for syncing the Gecko prefs.

I want Sebastian to take a better look at the details.

::: mobile/android/components/SessionStore.js:73
(Diff revision 1)
>  
>      this._loadState = STATE_STOPPED;
>  
>      this._interval = Services.prefs.getIntPref("browser.sessionstore.interval");
>      this._maxTabsUndo = Services.prefs.getIntPref("browser.sessionstore.max_tabs_undo");
> +

Do we really need to keep the Gecko prefs up to date?
Attachment #8742771 - Flags: review?(mark.finkle) → review+
Comment on attachment 8742770 [details]
MozReview Request: Bug 1263110 - Part 1 - Move crash reporter settings into their own shared pref. r=mfinkle,sebastian

https://reviewboard.mozilla.org/r/47227/#review44589

I like the approach, but let's get Sebastian to look at the details.
Attachment #8742770 - Flags: review?(mark.finkle) → review+
Flags: needinfo?(s.kaspari)
Attachment #8739964 - Attachment is obsolete: true
https://reviewboard.mozilla.org/r/47229/#review44587

> Do we really need to keep the Gecko prefs up to date?

Maybe the comment needs rewording, but what I'm doing is actually the other way round: Copy changes from the Gecko settings into the Java shared prefs, so changes in about:config actually have some effect on the Java code. There's indeed no need for the reverse path, because those settings are never modified on the Java side and I consequentially haven't implemented it either.
Comment on attachment 8742770 [details]
MozReview Request: Bug 1263110 - Part 1 - Move crash reporter settings into their own shared pref. r=mfinkle,sebastian

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47227/diff/1-2/
Comment on attachment 8742771 [details]
MozReview Request: Bug 1263110 - Part 2 - Implement crash loop detection in Java. r=mfinkle,sebastian

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47229/diff/1-2/
Comment on attachment 8742772 [details]
MozReview Request: Bug 1263110 - Part 3 - Remove the remains of the old crash loop detection. r=mfinkle,sebastian

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47231/diff/1-2/
Comment on attachment 8742773 [details]
MozReview Request: DON'T LAND - Debug logging

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47233/diff/1-2/
Mozreview's handling of interdiffs involving a rebase is somewhat suboptimal at the moment, so here's what I've actually changed:
- Crash reporter in part 1: Directly instantiate the app prefs editor for the crash flags, as we're accessing them in a write-only manner.
- Part 2: Call incrementCrashCount directly from the if condition, reworded the comment in sessionstore.js.
I was wondering - should we add some sort of "Well, this is embarrassing" doorhanger to explain that session restoring has been temporarily turned off?
Flags: needinfo?(alam)
Comment on attachment 8742770 [details]
MozReview Request: Bug 1263110 - Part 1 - Move crash reporter settings into their own shared pref. r=mfinkle,sebastian

https://reviewboard.mozilla.org/r/47227/#review46053
Attachment #8742770 - Flags: review+
Comment on attachment 8742771 [details]
MozReview Request: Bug 1263110 - Part 2 - Implement crash loop detection in Java. r=mfinkle,sebastian

https://reviewboard.mozilla.org/r/47229/#review46055

::: mobile/android/base/java/org/mozilla/gecko/GeckoApp.java:1718
(Diff revision 2)
> +     * @return True if the crash reporter ran after the last session.
> +     */
> +    protected boolean updateCrashedState() {
> +        try {
> +            File crashFlag = new File(GeckoProfileDirectories.getMozillaDirectory(this), "CRASHED");
> +            if (crashFlag.exists() && crashFlag.delete()) {

Only returning true if we were able to delete the flag file is smart :)
Attachment #8742771 - Flags: review+
Comment on attachment 8742772 [details]
MozReview Request: Bug 1263110 - Part 3 - Remove the remains of the old crash loop detection. r=mfinkle,sebastian

https://reviewboard.mozilla.org/r/47231/#review46057
Attachment #8742772 - Flags: review+
LGTM.

(In reply to Jan Henning [:JanH] from comment #20)
> I was wondering - should we add some sort of "Well, this is embarrassing"
> doorhanger to explain that session restoring has been temporarily turned off?

Personally I don't think we should invest too much (and pile up code that is not running often) for this edge case (This only happens if we restore and crash again, right?).
In fact I still wonder if the added complexity for automatic "restore or not restore" is worth it. Especially if it would be much simpler (code-wise) to add a checkbox to the crash reporter (for the user to decide). But as we have it now and it's working: r+ :)
Flags: needinfo?(s.kaspari)
Comment on attachment 8742770 [details]
MozReview Request: Bug 1263110 - Part 1 - Move crash reporter settings into their own shared pref. r=mfinkle,sebastian

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47227/diff/2-3/
Attachment #8742770 - Attachment description: MozReview Request: Bug 1263110 - Part 1 - Move crash reporter settings into their own shared pref. r=mfinkle → MozReview Request: Bug 1263110 - Part 1 - Move crash reporter settings into their own shared pref. r=mfinkle,sebastian
Attachment #8742771 - Attachment description: MozReview Request: Bug 1263110 - Part 2 - Implement crash loop detection in Java. r=mfinkle → MozReview Request: Bug 1263110 - Part 2 - Implement crash loop detection in Java. r=mfinkle,sebastian
Attachment #8742772 - Attachment description: MozReview Request: Bug 1263110 - Part 3 - Remove the remains of the old crash loop detection. r=mfinkle → MozReview Request: Bug 1263110 - Part 3 - Remove the remains of the old crash loop detection. r=mfinkle,sebastian
Comment on attachment 8742771 [details]
MozReview Request: Bug 1263110 - Part 2 - Implement crash loop detection in Java. r=mfinkle,sebastian

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47229/diff/2-3/
Comment on attachment 8742772 [details]
MozReview Request: Bug 1263110 - Part 3 - Remove the remains of the old crash loop detection. r=mfinkle,sebastian

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47231/diff/2-3/
Comment on attachment 8742773 [details]
MozReview Request: DON'T LAND - Debug logging

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47233/diff/2-3/
Had to re-add the NoMozillaDirectoryException import in GeckoApp just after ahunt had finally removed it.
(In reply to Jan Henning [:JanH] from comment #20)
> I was wondering - should we add some sort of "Well, this is embarrassing"
> doorhanger to explain that session restoring has been temporarily turned off?

I don't think this is really necessary. Our doorhangers only popup when there's an action that needs to be taken. We do one better here, by just stopping the crash loop so there's really nothing the user needs to do apart from pick which tabs they want to pick up from (if any).

Thanks JanH!
Flags: needinfo?(alam)
You need to log in before you can comment on or make changes to this bug.