Closed Bug 1391594 Opened 7 years ago Closed 7 years ago

Async shutdown crash by infinite running JS code inside a window via setTimeout

Categories

(Core :: DOM: Content Processes, defect, P1)

57 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox56 --- unaffected
firefox57 + wontfix
firefox58 + fixed

People

(Reporter: whimboo, Assigned: mconley)

References

Details

(Keywords: crash, hang, perf, Whiteboard: [reserve-photon-performance])

Attachments

(5 files, 2 obsolete files)

[Tracking Requested - why for this release]:

Quitting Firefox via the menu after running the following JS code in a window like scratch pad, the shutdown takes forever and finally Firefox is killed. Usually the long running script dialog is shown and the script can be canceled. But starting with Firefox 57 this dialog is not coming up.

window.setTimeout(function() {
  while (true) {}
}, 0);

Logging output from a debug build:

WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"SessionStore: flushing all windows","state":{"total":1,"current":0},"filename":"resource:///modules/sessionstore/SessionStore.jsm","lineNumber":1594,"stack":["resource:///modules/sessionstore/SessionStore.jsm:ssi_onQuitApplicationGranted:1594","resource:///modules/sessionstore/SessionStore.jsm:ssi_observe:775","chrome://global/content/globalOverlay.js:goQuitApplication:65","chrome://browser/content/hiddenWindow.xul:oncommand:1"]}] Barrier: quit-application-granted

So it looks to be SessionStore related.
Flags: needinfo?(mdeboer)
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(ehsan)
(In reply to Henrik Skupin (:whimboo) from comment #0)
> running the following JS code in a
> window like scratch pad,
> window.setTimeout(function() {
>   while (true) {}
> }, 0);

Against which window? Chrome or content? There are different cut-off points for them for showing the slow script dialog. I expect the chrome one is so high we don't hit it before shutdown times out.
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(hskupin)
In this case it was the content environment.
Flags: needinfo?(hskupin)
Attached file HTML testcase
It should be enough to load this HTML page.
Attachment #8898757 - Attachment mime type: text/plain → text/html
Flags: needinfo?(ehsan)
As far as I can tell, the bug here is that there may not be a window during shutdown to show the hang UI inside.  In <https://searchfox.org/mozilla-central/rev/48ea452803907f2575d81021e8678634e8067fc2/browser/modules/ProcessHangMonitor.jsm#276> we should probably deal with the case where there are no windows, and call terminateScript or some such.  Note that this is purely based on code inspection...

Mike, what do you think?
Component: Session Restore → DOM: Content Processes
Flags: needinfo?(mdeboer) → needinfo?(mconley)
Product: Firefox → Core
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #4)
> Mike, what do you think?

If there are no windows, and maybe if we're in the process of shutting down, and we get one of these process hang messages, I think it makes sense to kill the script immediately.
Flags: needinfo?(mconley)
Whiteboard: [qf]
Whiteboard: [qf] → [reserve-photon-performance][triage]
I'm not sure where this fits within the scope of Photon Performance, since it's like... shutdown-y, but I don't want to lose this, and florian and I are about to finish our MVP's for Photon in 57, and this might be a nice and quick slam dunk.
Whiteboard: [reserve-photon-performance][triage] → [photon-performance] [triage]
Flags: qe-verify-
Priority: -- → P3
Whiteboard: [photon-performance] [triage] → [reserve-photon-performance]
See Also: → 1399135
Assignee: nobody → mconley
Status: NEW → ASSIGNED
Priority: P3 → P1
Whiteboard: [reserve-photon-performance] → [photon-performance]
Whiteboard: [photon-performance] → [reserve-photon-performance]
Since you've chosen to fix this bug by special casing for the shutdown case, couldn't we fall into the same situation if we were on Mac and the last browser window were about to be closed?  It seems like we may get into the same problem there since then the browser wouldn't shut down since the hidden window would stay alive, but inside updateWindows() we wouldn't have any windows to enumerate again...
Flags: needinfo?(mconley)
Attachment #8914457 - Attachment is obsolete: true
Attachment #8914457 - Flags: review?(wmccloskey)
This new set of patches also tries to handle the "no windows" case.
Flags: needinfo?(mconley)
Comment on attachment 8914458 [details]
Bug 1391594 - If a browser window unloads, make sure ProcessHangMonitor removes event handlers.

https://reviewboard.mozilla.org/r/185790/#review192036

::: browser/modules/ProcessHangMonitor.jsm:421
(Diff revision 2)
> +      case "TabRemotenessChange": {
> -      this.updateWindow(win);
> +        this.updateWindow(win);
> +        break;
> +      }
> +      case "unload": {
> +        this.untrackWindow(win);

Is this necessary? We already remove event listeners when a window is closed.
http://searchfox.org/mozilla-central/rev/8efd128b48cdae3a6c6862795ce618aa1ca1c0b4/dom/base/nsGlobalWindow.cpp#2110

I'd rather not add extra code if we don't need it.
Comment on attachment 8914534 [details]
Bug 1391594 - Unilaterally terminate script and plugin hangs during shutdown or when there are no windows.

https://reviewboard.mozilla.org/r/185850/#review192042

This looks good.

::: browser/modules/ProcessHangMonitor.jsm:493
(Diff revision 2)
>          this.updateWindow(win);
>          break;
>        }
>        case "unload": {
>          this.untrackWindow(win);
> +        this.updateWindows();

Now I see why you added this. However, I think that domwindowclosed might be a more reliable notification. https://bugzilla.mozilla.org/show_bug.cgi?id=1392352#c25 describes some issues with the unload event in XUL windows. In any case, I think domwindowclosed fits better with our existing domwindowopened notification.
Attachment #8914534 - Flags: review?(wmccloskey) → review+
Comment on attachment 8914459 [details]
Bug 1391594 - Add a test that checks that hangs are dealt with on shutdown.

https://reviewboard.mozilla.org/r/185792/#review192044
Attachment #8914459 - Flags: review?(wmccloskey) → review+
Hey ritu, this was marked tracking 57+, but this issue has been shipping on the release channel probably for as long as e10s has shipped. These patches aren't _super_ risky, but I personally think this is safe to let slip until 58. Are you okay if we wontfix this for 57 and just try for 58, considering how far along we are in the beta cycle?
Flags: needinfo?(rkothari)
Hi Mike, yes, that makes sense. I am told that we are not focusing on dramatically reducing shutdown crash rates in 57 so wontfix for that release seems reasonable. This is now tracked for 58.
Flags: needinfo?(rkothari)
Comment on attachment 8914458 [details]
Bug 1391594 - If a browser window unloads, make sure ProcessHangMonitor removes event handlers.

https://reviewboard.mozilla.org/r/185790/#review193350

Going to r- to get it out of my queue for now. If you want to use the unload event instead of a domwindowclosed notification, just let me know.
Attachment #8914458 - Flags: review?(wmccloskey) → review-
Comment on attachment 8914458 [details]
Bug 1391594 - If a browser window unloads, make sure ProcessHangMonitor removes event handlers.

https://reviewboard.mozilla.org/r/185790/#review192036

> Is this necessary? We already remove event listeners when a window is closed.
> http://searchfox.org/mozilla-central/rev/8efd128b48cdae3a6c6862795ce618aa1ca1c0b4/dom/base/nsGlobalWindow.cpp#2110
> 
> I'd rather not add extra code if we don't need it.

You're right - I actually forgot that we did this and thought we had to do it manually for some reason!
Hey billm, I've realized there's another case I'm not dealing with here: the case where there are two windows A and B, and a hang notification belongs to a browser in only B, and then B is closed. It doesn't look like we handle that here.

In order to fix that, I'll need a way of determining if there are any hang reports that don't have browsers in any active windows.

I _could_ iterate the browsers of each window and check each active hang report... basically we're talking about the (worst case), number of browsers multiplied by the number of active hang reports if we do this naively.

I guess I'm trying to avoid calling into isReportForBrowser for each browser since I know that crossing the JS<->Native border in a loop isn't super cheap.

What we _could_ do is expose the osPid of the hung process on each nsIHangReport, and then collect the osPid's on each browser's nsITabParent and take the difference to compute the hang reports without browsers.

Do you think it matters? Should I be safe to just do the naive approach that I listed first?
Flags: needinfo?(wmccloskey)
Nevermind - I think I have a better idea.

Each hang is, at its root, associated with a single guilty TabChild / TabParent pair. I think I can probably expose the guilty <xul:browser> on the nsIHangReport. From there, on domwindowclose, I can then loop through each active report and see if any of their guilty <xul:browser>'s belong to the closing window; if so, we can terminate them immediately.

I also considered maybe just doing something lower level - if a TabParent is told to be destroyed, have it tell the ProcessHangMonitor to terminate any associated hang reports. I think I'm going to try the first one - especially because it might be handy to have the guilty <xul:browser> on the report so that we can more directly blame the offender in the notification bar.
Flags: needinfo?(wmccloskey)
(In reply to Mike Conley (:mconley) (:⚙️) - Backlogged on reviews from comment #25)
> 
> Each hang is, at its root, associated with a single guilty TabChild /
> TabParent pair. I think I can probably expose the guilty <xul:browser> on
> the nsIHangReport.
> 

Note that I think I can only do this for script hangs in content, as opposed to plugin hangs. Not sure what to do about plugin hangs for now.
Attachment #8914458 - Attachment is obsolete: true
Comment on attachment 8914534 [details]
Bug 1391594 - Unilaterally terminate script and plugin hangs during shutdown or when there are no windows.

https://reviewboard.mozilla.org/r/185850/#review195712

::: browser/modules/ProcessHangMonitor.jsm:274
(Diff revision 3)
> +    this.stopAllHangs();
> +    this.updateWindows();
> +  },
> +
> +  stopAllHangs() {
> +    for (let report of this._activeReports) {

I just realized: doesn't this also need to iterate over _pausedReports?
Comment on attachment 8918485 [details]
Bug 1391594 - If a window closes while hosting a <xul:browser> hung on script, clear the hang.

https://reviewboard.mozilla.org/r/189340/#review195714

::: browser/modules/ProcessHangMonitor.jsm:282
(Diff revision 1)
>    },
>  
> +  onWindowClosed(win) {
> +    // If there are any script hangs for browsers that are in this window
> +    // that is closing, we can stop them now.
> +    for (let report of this._activeReports) {

I think we also should look at _pausedReports here too.

::: browser/modules/ProcessHangMonitor.jsm:283
(Diff revision 1)
>  
> +  onWindowClosed(win) {
> +    // If there are any script hangs for browsers that are in this window
> +    // that is closing, we can stop them now.
> +    for (let report of this._activeReports) {
> +      if (report.hangType == report.SLOW_SCRIPT &&

For plugin hangs, the conservative thing is probably to kill the plugin if any window closes.

::: browser/modules/ProcessHangMonitor.jsm:284
(Diff revision 1)
> +  onWindowClosed(win) {
> +    // If there are any script hangs for browsers that are in this window
> +    // that is closing, we can stop them now.
> +    for (let report of this._activeReports) {
> +      if (report.hangType == report.SLOW_SCRIPT &&
> +          report.scriptBrowser.ownerGlobal == win) {

I'm a bit worried about this since accessing scriptBrowser iterates over all tabs in the given content process. But it would be a lot of work to do better than this, and it's not that common a scenario.

It probably would be good to deal with the case when scriptBrowser throws or returns null. In those cases it's probably a good idea to just stop the script.
Attachment #8918485 - Flags: review?(wmccloskey) → review+
Comment on attachment 8918486 [details]
Bug 1391594 - Test that closing a window with a <xul:browser> with a hanging script causes that script to terminate.

https://reviewboard.mozilla.org/r/189342/#review195718
Attachment #8918486 - Flags: review?(wmccloskey) → review+
Comment on attachment 8914534 [details]
Bug 1391594 - Unilaterally terminate script and plugin hangs during shutdown or when there are no windows.

https://reviewboard.mozilla.org/r/185850/#review195712

> I just realized: doesn't this also need to iterate over _pausedReports?

Yeah, you're absolutely right. For some reason, I thought that pausedReports were a subset of activeReports, but [it looks like I was wrong!](http://searchfox.org/mozilla-central/rev/dca019c94bf3a840ed7ff50261483410cfece24f/browser/modules/ProcessHangMonitor.jsm#148-149). Good eye.
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/aa3c417fbc15
Unilaterally terminate script and plugin hangs during shutdown or when there are no windows. r=billm
https://hg.mozilla.org/integration/autoland/rev/6c524db228f2
Add a test that checks that hangs are dealt with on shutdown. r=billm
https://hg.mozilla.org/integration/autoland/rev/5da2257baf23
If a window closes while hosting a <xul:browser> hung on script, clear the hang. r=billm
https://hg.mozilla.org/integration/autoland/rev/55b8c0216d9b
Test that closing a window with a <xul:browser> with a hanging script causes that script to terminate. r=billm
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: