Closed Bug 1298219 Opened 3 years ago Closed 3 years ago

about:sessionrestore shows a crashed tab after a failed shutdown

Categories

(Firefox :: General, defect, P1)

defect

Tracking

()

VERIFIED FIXED
Firefox 51
Tracking Status
e10s + ---
firefox50 + verified
firefox51 + verified

People

(Reporter: mstange, Assigned: mconley)

References

Details

(Keywords: regression)

Attachments

(4 files)

Attached file about:support output
STR:
 1. Have a session with multiple tabs open.
 2. Kernel panic your machine, or maybe force quit Firefox.
 3. Start Firefox again.
 4. Repeat step 2 and step 3 twice or maybe three times.

Actual results:
The browser launches with one tab, about:sessionrestore, whose content is "Bad news first: This tab has crashed".
I just reproduced this on my Windows machine after getting bit by bug 1263595. I didn't have to do a full system shutdown - just a failed Firefox shutdown appeared to be enough.
Summary: about:sessionrestore shows a crashed tab after force-quitting the machine twice → about:sessionrestore shows a crashed tab after a failed shutdown
Assignee: nobody → mconley
Hey RyanVM,

I think I might need SV help with this as well. Do you know if somebody could work with me to try to reproduce this reliably?
Flags: needinfo?(ryanvm)
Justin, interested in trying to help Mike out with another one today? :)
Flags: needinfo?(ryanvm) → needinfo?(jwilliams)
If you're not already aware of it, this add-on is useful for predictably crashing the browser: https://addons.mozilla.org/en-US/firefox/addon/crash-me-now-simple/
Will do.
Flags: needinfo?(jwilliams)
We were able to reproduce this issue twice on Ubuntu 16.04 after downloading all of the addons in the attached profile. This bug does not reproduce without the addons. I have no idea what may be causing this but Im inching towards one of the addons. We will continue looking at this until the eod.
Here's my list of enabled add-ons - let's see if there's a common culprit:

Name: Bugzilla Socorro Lens
Version: 0.3
Enabled: true
ID: bugzilla-socorro-lens@ashughes.com

Name: Chrome Store Foxified
Version: 2.3
Enabled: true
ID: Chrome-Store-Foxified@jetpack

Name: Cycle Collector Graph Analyzer, about:cc
Version: 0.0.1
Enabled: true
ID: cycle.collector.graph.analyzer@pettay.fi

Name: Differo
Version: 0.3.2
Enabled: true
ID: differo@grigory.ca

Name: DOM Inspector
Version: 2.0.16.1-signed
Enabled: true
ID: inspector@mozilla.org

Name: Emoji Cheatsheet for GitHub, Basecamp etc.
Version: 1.2.0
Enabled: true
ID: jid1-Xo5SuA6qc1DFpw@jetpack

Name: Fangs
Version: 1.0.8.1-signed.1-signed
Enabled: true
ID: {21D01944-2878-4eb3-A72A-83E8D1E6D4A6}

Name: FlyWeb
Version: 1.0.0
Enabled: true
ID: flyweb@mozilla.org

Name: geckoprofiler
Version: 1.16.20
Enabled: true
ID: jid0-edalmuivkozlouyij0lpdx548bc@jetpack

Name: Imgur Uploader
Version: 1.0.6.1-signed
Enabled: true
ID: giorgio@gilestro.tk

Name: Mass Password Reset
Version: 1.05.1-signed.1-signed
Enabled: true
ID: masspasswordreset@johnathan.nightingale

Name: Mozilla Tree Status
Version: 1.0.2
Enabled: true
ID: mozilla-tree-status@jsantell.com

Name: Multi-process staged rollout
Version: 1.0
Enabled: true
ID: e10srollout@mozilla.org

Name: No More 404s
Version: 1.5.6
Enabled: true
ID: wayback_machine@mozilla.org

Name: No Twitter Moments
Version: 1.1
Enabled: true
ID: no-twitter-moments@mikeconley.ca

Name: Pocket
Version: 1.0.4
Enabled: true
ID: firefox@getpocket.com

Name: RB DoubleClick
Version: 0.0.1
Enabled: true
ID: rb-doubleclick@mikeconley.ca

Name: Reviewboard Collapser
Version: 1.0
Enabled: true
ID: reviewboard-collapser@webextensions

Name: Statuser
Version: 0.1.4
Enabled: true
ID: @statuser

Name: tab-crasher
Version: 0.1.1-signed.1-signed
Enabled: true
ID: jid1-KrxrrWKPOoE4Mw@jetpack

Name: Test Pilot
Version: 0.8.5-tag-2016-08-12
Enabled: true
ID: @testpilot-addon

Name: Web Compat
Version: 1.0
Enabled: true
ID: webcompat@mozilla.org

Name: wontfix
Version: 1.0
Enabled: true
ID: jid1-SDCOUnA2RvYHFg@jetpack
Common add-ons:

Bugzilla Socorro Lens
DOM Inspector
FlyWeb (system add-on)
geckoprofiler
Multi-process staged rollout (system add-on)
Pocket (system add-on)
Web Compat
wontfix
(WebCompat is also a system add-on)

Bugzilla Socorro Lens and DOM Inspector should be in AMO.

geckoprofiler: https://github.com/bgirard/Gecko-Profiler-Addon/blob/master/geckoprofiler-signed.xpi?raw=true
wontfix: http://potch.github.io/wontfix/wontfix.xpi
geckoprofiler seems like the culprit but this is not reproducible every time.
Attached file about:support
I can also reliably reproduce this bug, as my about:support output shows, however I don't have gecko profiler or bugzilla socorro lens installed. I am running Fedora 24 x64.
50.0a2 is also affected. It doesn't seem to be caused by any specific addon, any large number will do (20+). Longer startup time might have something to do with it.

I narrowed it down to this regression window:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=7c669d5d63ef&tochange=f44bb9de08ad

Most likely regressed by Bug 1261842.
[Tracking Requested - why for this release]:

This isn't an amazing experience: The end-result is essentially a loss in a user session, since we end up unloading about:sessionrestore for about:tabcrashed.

We should definitely fix this soon.
Duplicate of this bug: 1298907
Okay, I have a way of reproducing this locally now; I needed to enable the crash reporter in my local build with MOZ_CRASHREPORTER.
Is there any way a test could have caught this issue? Seems like the sort of thing that would be good to have a test for (that session restore on startup loads and remains interactive).
(In reply to Chris Lord [:cwiiis] from comment #21)
> Is there any way a test could have caught this issue? Seems like the sort of
> thing that would be good to have a test for (that session restore on startup
> loads and remains interactive).

Yeah, I'm in the process of writing a test for this - I should have it up today.
Priority: -- → P1
(In reply to Marcia Knous [:marcia - use ni] from comment #18)
> Tracking 51+ for all the reasons in Comment 16.

Just a heads up that this should probably track 50 for exactly the same reasons.
Comment on attachment 8786973 [details]
Bug 1298219 - Don't fire oop-browser-crashed event if the browser has already flipped remoteness and moved on.

https://reviewboard.mozilla.org/r/75824/#review74880
Attachment #8786973 - Flags: review?(wmccloskey) → review+
I'm worried the test is going to fail intermittently a lot. Can you do a bunch of re-triggers to see what happens?
(In reply to Bill McCloskey (:billm) from comment #27)
> I'm worried the test is going to fail intermittently a lot. Can you do a
> bunch of re-triggers to see what happens?

Thanks for getting me to push to try - it really didn't like my first crack at the test[1]! I've modified the test so that:

1) It only runs if e10s is enabled, naturally
2) It cleans up the minidumps that are created so that harness doesn't complain

I've done a new push with 20 retriggers on the test directory, and it seems happy: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a04a80654435a1acf8ce092f4e36e9c7d807710e

[1]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8aadec9b9a9d10741f9badf1988cf8130b53eb4e
Flags: needinfo?(mconley)
(In reply to Bill McCloskey (:billm) from comment #27)
> I'm worried the test is going to fail intermittently a lot. Can you do a
> bunch of re-triggers to see what happens?

Out of curiosity, where do you see a potential race?
Flags: needinfo?(wmccloskey)
It seems like if the Content.spawn finishes its setTimeout before updateBrowserRemoteness runs, then we will get a oop-browser-crashed notification and the test will fail.
Flags: needinfo?(wmccloskey)
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5948ae1f4218
Don't fire oop-browser-crashed event if the browser has already flipped remoteness and moved on. r=billm
https://hg.mozilla.org/integration/autoland/rev/98b0e9b88212
Regression test. r=billm
As RyanVM pointed out in IRC, ASAN builds don't have the crashreporter enabled. I've modified the browser.ini run-if rule accordingly and re-autolanded.
Flags: needinfo?(mconley)
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/804c98e6fef4
Don't fire oop-browser-crashed event if the browser has already flipped remoteness and moved on. r=billm
https://hg.mozilla.org/integration/autoland/rev/bd4d11e10ee2
Regression test. r=billm
https://hg.mozilla.org/mozilla-central/rev/804c98e6fef4
https://hg.mozilla.org/mozilla-central/rev/bd4d11e10ee2
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 51
Comment on attachment 8786973 [details]
Bug 1298219 - Don't fire oop-browser-crashed event if the browser has already flipped remoteness and moved on.

Approval Request Comment
[Feature/regressing bug #]:

Bug 1261842

[User impact if declined]:

Users that have their browser start at a page that is non-remote (about:newtab, about:sessionrestore, about:config, chrome://, etc), run the risk of having that tab suddenly show the Tab Crashed page soon after startup.

[Describe test coverage new/current, TreeHerder]:

An automated test was added for this case.

[Risks and why]: 

Low risk - this makes us not fire the oop-browser-crashed event if the <xul:browser> has flipped remoteness to non-remote and loaded a page in it.

[String/UUID change made/needed]:

None.
Attachment #8786973 - Flags: approval-mozilla-aurora?
Comment on attachment 8787297 [details]
Bug 1298219 - Regression test.

See above.
Attachment #8787297 - Flags: approval-mozilla-aurora?
Hi Markus, could you please verify this issue is fixed as expected on a latest Nightly build? Thanks!
Flags: needinfo?(mstange)
Comment on attachment 8786973 [details]
Bug 1298219 - Don't fire oop-browser-crashed event if the browser has already flipped remoteness and moved on.

Fixes a regression in session restore, Aurora50+
Attachment #8786973 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I never had a reliable way of reproducing this, and I haven't hit this anymore because my machine has stopped crashing. Justin, can you verify it?
Flags: needinfo?(mstange) → needinfo?(jwilliams)
Attachment #8787297 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I also never had a reliable way of reproducing this. From the comments above it looks like Mike and Kestrel did. NI them
Flags: needinfo?(mconley)
Flags: needinfo?(kestrel)
Flags: needinfo?(jwilliams)
Verified as fixed on Nightly 51.0a1 (2016-09-13) 20160913030425 Ubuntu 16.04.

Verified as fixed on Aurora 50.0a2 (2016-09-14) 20160914004005 Ubuntu 16.04.
Flags: needinfo?(kestrel)
I've not been able to reproduce this on Nightly on my Windows box, so I'm going to verify this as fixed on Nightly 51.0a1 (2016-09-12) on Windows 7 (64-bit).
Status: RESOLVED → VERIFIED
Flags: needinfo?(mconley)
You need to log in before you can comment on or make changes to this bug.