Closed Bug 1316837 Opened 8 years ago Closed 8 years ago

Assertion failure: mSuspendDepth != 0, at nsGlobalWindow.cpp:11900

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox51 --- unaffected
firefox52 --- fixed
firefox53 --- fixed

People

(Reporter: cbook, Assigned: bkelly)

References

()

Details

(Keywords: assertion)

Attachments

(3 files, 2 obsolete files)

Found via bughunter and reproduced on latest windows tinderbox trunk debug build.

Steps to reproduce: 

-> Load https://eregister.britishdesign.ru/uss/en_US/
-->> Assertion failure in debug trunk build

seems nightly only, regression from bug 1300659 maybe ? 

Assertion failure: mSuspendDepth != 0, at c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/src/dom/base/nsGlobalWindow.cpp:11900
#01: mozilla_dump_image[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0xd6a4cb]
#02: mozilla_dump_image[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0xd89524]
#03: mozilla_dump_image[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x1d13312]
#04: XRE_AddStaticComponent[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x1e8184]
#05: NS_StringSetIsVoid[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x211817]
#06: mozilla::net::LoadInfo::TriggeringPrincipal[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x5ba961]
#07: mozilla::net::LoadInfo::TriggeringPrincipal[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x5baa97]
#08: mozilla::net::LoadInfo::TriggeringPrincipal[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x59b09e]
#09: mozilla::net::LoadInfo::TriggeringPrincipal[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x59b056]
#10: mozilla::net::LoadInfo::TriggeringPrincipal[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x59ad9f]
#11: mozilla_dump_image[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x1d6d221]
#12: mozilla_dump_image[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x1dc21b7]
#13: XRE_RunAppShell[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x25850e4]
#14: mozilla::net::LoadInfo::TriggeringPrincipal[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x5ba9c4]
#15: mozilla::net::LoadInfo::TriggeringPrincipal[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x59b09e]
#16: mozilla::net::LoadInfo::TriggeringPrincipal[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x59b056]
#17: mozilla::net::LoadInfo::TriggeringPrincipal[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x59ad9f]
#18: XRE_InitChildProcess[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\xul.dll +0x2584c7e]
#19: ???[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\firefox.exe +0x18b5]
#20: ???[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\firefox.exe +0x1638]
#21: ???[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\firefox.exe +0x20a4]
#22: TargetNtUnmapViewOfSection[c:\bughunter\firefox-52.0a1.en-US.win32\firefox\firefox.exe +0x32bc1]
#23: BaseThreadInitThunk[C:\Windows\system32\kernel32.dll +0x4ee1c]
#24: RtlInitializeExceptionChain[C:\Windows\SYSTEM32\ntdll.dll +0x637eb]
#25: RtlInitializeExceptionChain[C:\Windows\SYSTEM32\ntdll.dll +0x637be]
looks like a recent regression could you take a look ?
Flags: needinfo?(bugs)
Flags: needinfo?(bkelly)
Attached file bughunter stack
Looks like a child frame is getting added while the window is in modal state from sync xhr, but we're not correctly applying that state to the new child.
Assignee: nobody → bkelly
Blocks: 1303167
Status: NEW → ASSIGNED
Flags: needinfo?(bugs)
Flags: needinfo?(bkelly)
Attached patch fix-1.patch (obsolete) — Splinter Review
It seems that if you use `document.write()` while the window is in the `Suspend()` state then the new inner window won't inherit the suspended state.

This patch avoids the problem in this page by making sure the window gets `Resume()` called before the `document.write()` can happen.  Thats kind of just papering over the issue, though.

I'm going to see if I can figure out a better fix.

Still, it seems reasonable to get rid of this extra runnable in sync xhr.  We might want to do both.  Lets see if this breaks anything else:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=78612efd5c977458c8c85f176545130f15095699
So, whats happening here is:

1) sync xhr runs putting the window in a suspended state
2) sync xhr completes and queues a runnable to resume the window
3) js script uses document.open() and document.write() which replaces the window
4) runnable attempts to resume the window, but the new window is not suspended

In this case I think its a bug that the sync xhr allows js to code to run while the window is suspended.  Note that sync xhr fires pending events immediately and does not wait like it does for the resume.

That being said, I'm also slightly concerned we might have more cases like this where a runnable is dispatched that assumes the window stays in a particular state but its possible for a navigation or document.open() to replace the inner window.  To help catch those issues I'm going to convert some assertions into MOZ_DIAGNOSTIC_ASSERT().

Finally, I'll write a test for this particular sync xhr issue.
By the way, I think we probably hit this same thing before, but the old code used a non-fatal NS_ASSERTION.  I changed it to a fatal MOZ_ASSERT() to catch stuff like this.
Actually, comment 5 may be wrong.  Let me investigate a bit more.
Actually, the issue is we're calling Resume() on an inner window that is in the bfcache.  This iterates the current inner window's children which includes an iframe in this case.  This child frame doesn't have the current suspend state, though, because the new window was never suspended.

We should absolutely not iterate these children from a bfcache inner.  I added some assertions to see if we are doing this anywhere in our current test coverage:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2037f06becb1eba5aff25b8cdc7415b68bcfb4dc
This patch short-circuits suspend/resume if they are called on inner windows that are not the current inner window.  This should be safe because we specifically do not allow suspended windows in the bfcache and other doomed windows won't become active again AFAIK.

We could also put some asserts here and try to fix the cases that call this on the non-current inner window.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9fa3c3831271e9ae73332bdf4122281010a3d26b
Attachment #8809798 - Attachment is obsolete: true
Comment on attachment 8809949 [details] [diff] [review]
P1 Only allow suspend/resume/freeze/thaw on current inner windows. r=smaug

Based on the try from comment 8 I'm fairly certain this try push will work as well.  So going to flag now.

See comment 9 for description.

I will add a test on Monday.
Attachment #8809949 - Flags: review?(bugs)
Comment on attachment 8809949 [details] [diff] [review]
P1 Only allow suspend/resume/freeze/thaw on current inner windows. r=smaug

># HG changeset patch
># User Ben Kelly <ben@wanderview.com>
># Parent  d38d06f85ef59c5dbb5d4a1a8d895957a78714de
>Bug 1316837 P1 Only allow suspend/resume/freeze/thaw on current inner windows. r=smaug
>
>diff --git a/dom/base/nsGlobalWindow.cpp b/dom/base/nsGlobalWindow.cpp
>--- a/dom/base/nsGlobalWindow.cpp
>+++ b/dom/base/nsGlobalWindow.cpp
>@@ -11838,17 +11838,27 @@ nsGlobalWindow::CloneStorageEvent(const 
>   RefPtr<StorageEvent> event = StorageEvent::Constructor(this, aType, dict);
>   return event.forget();
> }
> 
> void
> nsGlobalWindow::Suspend()
> {
>   MOZ_ASSERT(NS_IsMainThread());
>-  MOZ_ASSERT(IsInnerWindow());
>+  MOZ_DIAGNOSTIC_ASSERT(IsInnerWindow());
>+
>+  // We can only safely suspend windows that are the current inner window.  If
>+  // its not the current inner, then we are in one of two different cases.
>+  // Either we are in the bfcache or we are doomed window that is going away.
>+  // We do not allow suspended windows in bfcache,
I don't understand this comment. We do freeze windows which go to bfcache, and freezing means also suspending.

> nsGlobalWindow::Resume()
> {
>   MOZ_ASSERT(NS_IsMainThread());
>-  MOZ_ASSERT(IsInnerWindow());
>+  MOZ_DIAGNOSTIC_ASSERT(IsInnerWindow());
>+
>+  // We can only safely resume a window if its the current inner window.  If
>+  // its not the current inner, then we are in one of two different cases.
>+  // Either we are in the bfcache or we are doomed window that is going away.
>+  // We do not allow suspended windows in bfcache, so Resume should never be
>+  // needed in that case.  If the window is doomed then there is no point in
>+  // resuming it.
>+  if (!AsInner()->IsCurrentInnerWindow()) {
>+    return;
>+  }
So based on your comment 8, this means we don't necessarily resume the window we suspended, meaning that we leave window broken state in bfcache.
What if someone then does history.back()? We end up staying in suspended state and the page just doesn't work.

Should we perhaps drop the window from bfcache in this kind of case? Better would be to resume it, and its nested iframe, but not child docshells of the current inner window
Attachment #8809949 - Flags: review?(bugs) → review-
I'm talking about this check:

https://dxr.mozilla.org/mozilla-central/source/dom/base/nsDocument.cpp#8263

We must decide if a document will go into bfcache before its frozen.  If its already suspended we refuse to put it in the bfcache.

If we decide it can go into the bfcache then Freeze() is called while the window is still the current inner window.  Note that inner comes from GetCurrentInnerWindowInternal() here:

https://dxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#13496

When a document is restored from bfcache the window is first set back to the current inner window and then thawed.  Again, note the inner comes from GetCurrentInnerWindowInernal() here:

https://dxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#13539

So I believe we can safely say Suspend/Resume should never be called on an inner window that is not the current inner window.  This patch makes it just ignore calls that attempt to do that, but I would also be open to added diagnostic assertions instead so we can fix places that try to do it.  I think that might be better for the long term health of the code.

Does that help?  What do you think?
Flags: needinfo?(bugs)
The two places I know of that might try to Suspend/Resume on an inner that is not the current inner are:

1. The sync xhr nsResumeTimeoutsRunnable thing.
2. My nsGlobalWindow::MaybeApplyBackPressure() code could definitely try to call Resume() on a non-current inner and might also try to call Suspend on a non-current inner.

Not sure what else might try it.  Since the calls are exposed to chrome script its a bit harder to trace all cases.  This is why adding diagnostic assertions might be a good idea.  That way we can see what triggers in non-DEBUG builds up to the beta channel.  Release channel would not trigger the assertions, though.
(In reply to Ben Kelly [:bkelly] from comment #12)
> I'm talking about this check:
> 
> https://dxr.mozilla.org/mozilla-central/source/dom/base/nsDocument.cpp#8263
> 
> We must decide if a document will go into bfcache before its frozen.  If its
> already suspended we refuse to put it in the bfcache.

Sure, but the comment says "We do not allow suspended windows in bfcache". And windows certainly are suspended when they are in bfcache. So just rephrase the comment.



Ok, XHR can't be used when it is from non-current inner window. There are explicit CheckInnerWindowCorrectness() checks in many place.
So, looks like the patch should work.
Flags: needinfo?(bugs)
Comment on attachment 8809949 [details] [diff] [review]
P1 Only allow suspend/resume/freeze/thaw on current inner windows. r=smaug

>+  // We can only safely resume a window if its the current inner window.  If
>+  // its not the current inner, then we are in one of two different cases.
>+  // Either we are in the bfcache or we are doomed window that is going away.
>+  // We do not allow suspended windows in bfcache,
Could you rephrase "We do not allow suspended windows in bfcache" to say something that we do not allow suspended
windows to enter bfcache or so.
Attachment #8809949 - Flags: review- → review+
Updated the comments as requested.
Attachment #8809949 - Attachment is obsolete: true
Attachment #8810465 - Flags: review+
This test triggers the assertion from comment 0.  It passes with the P1 patch applied.  I did this as a mochitest because its verifying some unique gecko stuff rather than web platform behavior.
Attachment #8810467 - Flags: review?(bugs)
Attachment #8810467 - Flags: review?(bugs) → review+
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b8cf305046e5
P1 Only allow suspend/resume/freeze/thaw on current inner windows. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/659a966e70ea
P2 Verify that navigating to a window with an iframe right after sync xhr does not trigger assertions. r=smaug
Comment on attachment 8810465 [details] [diff] [review]
P1 Only allow suspend/resume/freeze/thaw on current inner windows. r=smaug

Approval Request Comment
[Feature/regressing bug #]: Bug 1303167
[User impact if declined]: If a user or page navigates while the window is suspended we may incorrectly suspend frames in the new page.  This is a bit of corner case, but its worth closing it now before it hits release channels.
[Describe test coverage new/current, TreeHerder]: The P2 patch in this bug adds a new mochitest that triggers the issue.
[Risks and why]: Minimal.  We are adding additional checking for a rare condition, but not modifying the common case.
[String/UUID change made/needed]: None
Attachment #8810465 - Flags: approval-mozilla-aurora?
Comment on attachment 8810467 [details] [diff] [review]
P2 Verify that navigating right after sync xhr does not trigger assertions. r=smaug

See comment 20.
Attachment #8810467 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/b8cf305046e5
https://hg.mozilla.org/mozilla-central/rev/659a966e70ea
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment on attachment 8810465 [details] [diff] [review]
P1 Only allow suspend/resume/freeze/thaw on current inner windows. r=smaug

don't freeze/resume a non-current inner window, fixing a new assertion failure on 52, take this in aurora
Attachment #8810465 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8810467 [details] [diff] [review]
P2 Verify that navigating right after sync xhr does not trigger assertions. r=smaug

testonly update for bug 1316837, should land together with the P1 patch
Attachment #8810467 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: