Closed Bug 914521 Opened 6 years ago Closed 6 years ago

Crash [@ nsDocShell::InternalLoad] doing sync XHR while navigating a window that is also closing

Categories

(Core :: Document Navigation, defect, critical)

x86_64
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla26
Tracking Status
firefox24 + wontfix
firefox25 + fixed
firefox26 + fixed

People

(Reporter: jruderman, Assigned: bholley)

References

(Blocks 2 open bugs)

Details

(Keywords: crash, regression, testcase)

Attachments

(3 files)

Attached file testcase
No description provided.
Attached file stack+
On the ~third time through the loop, it crashes due to a null mScriptGlobal.

Nightly: bp-2ed9927e-43b4-433b-bada-558192130910
Regression window(m-c)
Good:
http://hg.mozilla.org/mozilla-central/rev/6066b9d92032
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20130522 Firefox/24.0 ID:20130522071950
Bad:
http://hg.mozilla.org/mozilla-central/rev/00b264c7cced
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20130522 Firefox/24.0 ID:20130522092949
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6066b9d92032&tochange=00b264c7cced

Regression window(m-i)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/8d56511bfcec
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20130521 Firefox/24.0 ID:20130521104616
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/75abbd7e7e24
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20130521 Firefox/24.0 ID:20130521104716
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=8d56511bfcec&tochange=75abbd7e7e24

Triggered by Bug 841312
Bobby, can you take a look?
Flags: needinfo?(bobbyholley+bmo)
(In reply to Boris Zbarsky [:bz] from comment #3)
> Bobby, can you take a look?

I can track it given it has reproducible test case and is a recent regression.But unless absolutely critical i don't think we can do anything for this in Fx24 as we have shipped our final beta and gone-to-build with RC.
So basically, docshell does a slightly more complicated version of the following [1]:

if (mScriptGlobal) {
  mScriptGlobal->DispatchSyncPopState();
  mScriptGlobal->DispatchAsyncHashChange();
}

However, the synchronous popstate event triggers a handler that opens a sync XHR, causing us to spin the event loop. The inner event loop triggers a window.close(), which is dispatched to the event loop. But since it's nested, we end up in ReallyCloseWindow() before returning from DispatchSyncPopState(). ReallCloseWindow() tears down the docshell, which calls nsDocShell::Destroy(), which nulls out mScriptGlobal. Pretty textbook event loop spinning bug.

We could save a ref to mScriptGlobal in a local variable, or null-check it. Both of those seem like wack-a-mole though. And I can't immediately think of anything all that much better. We could defer the window destruction until after the nested event loop finishes (which is what we used to do with termination functions), but that's probably observable from script, and doesn't really seem correct.

Any ideas Boris?

[1] http://mxr.mozilla.org/mozilla-central/source/docshell/base/nsDocShell.cpp#9217
Flags: needinfo?(bobbyholley+bmo) → needinfo?(bzbarsky)
Per spec, sync XHR should only spin very small parts of the event loop...

In the short term, I think we should just use a stack var.
Flags: needinfo?(bzbarsky)
Note also MMAdeathGrip earlier in the function.
Attachment #803477 - Flags: review?(bzbarsky)
Comment on attachment 803477 [details] [diff] [review]
Hold a stack reference to mScriptGlobal when dispatching sync events. v1

r=me
Attachment #803477 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky [:bz] from comment #6)
> Per spec, sync XHR should only spin very small parts of the event loop...
Note, showModalDialog tends to work as a spec-compliant spin-event-loop-at-odd-times.
B2G and Android presumably don't like the window.open in the crashtest, which makes sense. Disabling the test for those platforms and repushing:

https://hg.mozilla.org/integration/mozilla-inbound/rev/742d5330f0c3
https://hg.mozilla.org/mozilla-central/rev/742d5330f0c3
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
So, it seems that the test here makes Cipc timeout (https://tbpl.mozilla.org/?tree=Try&rev=7e1feb223053). This wasn't noticed because Cipc tests have not been running correctly for the past four weeks or so (bug 917576). Would someone be able to look into this please?
Flags: needinfo?(bobbyholley+bmo)
It's probably just an issue with opening windows in that configuration. Just disable the test (like we do for Android and B2G) for Cipc, however you do that.
Flags: needinfo?(bobbyholley+bmo)
Needs an uplift request for FF25
Assignee: nobody → bobbyholley+bmo
Comment on attachment 803477 [details] [diff] [review]
Hold a stack reference to mScriptGlobal when dispatching sync events. v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 841312
User impact if declined: crashes
Testing completed (on m-c, etc.): baked on m-c and m-a
Risk to taking this patch (and alternatives if risky): extremely low risk 
String or IDL/UUID changes made by this patch: None
Attachment #803477 - Flags: approval-mozilla-beta?
Note that there were 2 followups to disable the crashtest in various configurations (b2g, android, and Cipc). Keep that in mind when uplifting.
Depends on: 917576
Attachment #803477 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Whiteboard: [Check comment20 before uplift]
The Android/B2G disabling came along for the ride when this was re-landed. I folded in the Cipc disabling from bug 917576, though.

https://hg.mozilla.org/releases/mozilla-beta/rev/5d92f187ec81
Whiteboard: [Check comment20 before uplift]
You need to log in before you can comment on or make changes to this bug.