Closed Bug 894215 Opened 8 years ago Closed 3 years ago

"Assertion failure: window" [@ nsDOMWindowList::IndexedGetter] with adoptNode, navigation

Categories

(Core :: DOM: Navigation, defect)

x86_64
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- fixed

People

(Reporter: jruderman, Assigned: bzbarsky)

Details

(Keywords: assertion, sec-moderate, testcase, Whiteboard: [post-critsmash-triage][adv-main62+])

Attachments

(7 files, 1 obsolete file)

Attached file testcase
Assertion failure: window, at dom/base/nsDOMWindowList.cpp:102
Attached file stack
Attached file alt stack
Without the last part of the testcase, it usually hits the same assert from ssi_getSelectedPageStyle in SessionStore.jsm.  But that's on a longer timer.
Hrm.  We can add a null-check, but why do we have a child docshell that has a null window???
Flags: needinfo?(bugs)
So we're about to destroy the docshell we access, so GetInterface returns null.
But why do we have such docshell as a child when we call w[0]...
Flags: needinfo?(bugs)
Right, that was the question.  Why are we about to destroy a docshell that we have as a child?
Does anyone have any suggestion of a security rating for this?
Keywords: sec-high
Olli, should this be sec-moderate until we know that this is anything besides a null crash?
I'm going to reduce this to sec-moderate as it looks like just a null deref.  Feel free to increase it if you want.
Keywords: sec-highsec-moderate
This doesn't crash for me anymore, so I'd hazard a guess that this was fixed by Window WebIDL or something similar. Jesse, can you confirm that the testcase no longer works?
Flags: needinfo?(jruderman)
> so I'd hazard a guess that this was fixed by Window WebIDL or something similar

Unlikely.  The stack shown is the one we'd still take for indexed access on a window, and the same MOZ_ASSERT is still there...
I can still reproduce this bug.
Flags: needinfo?(jruderman)
Group: core-security → dom-core-security
This still reproduces on trunk. Note that the testcase requires you to set the "security.data_uri.unique_opaque_origin" and "security.data_uri.block_toplevel_data_uri_navigations" prefs to false now due to the restrictions on data URIs that were put in place.
Has Regression Range: --- → no
OK, so I think I know what's going on.  The sequence of events is as follows:

1) We load a page, with a subframe.

2) We start loading a new page.

3) We call nsDocShell::Embed on the new page's content viewer.  This calls
   SetupNewViewer, which calls nsDocShell::DestroyChildren etc.  This nulls
   out the parent pointer on the subframe docshell but it's still around,
   because we are planning to put the first page into bfcache.  It _also_
   captures the list of child docshells and stores it in the SHEntry
   (see nsDocShell::CaptureState).

4) This code runs:

    document.adoptNode(page1root);

   The adoptNode call removes page1root from the DOM of its document,
   which is the document that we are navigating away from.  At this point
   it has _not_ yet been "fully" placed in bfcache, because Destroy() hasn't
   been called on the viewer.  Specifically, the SHEntry has been stored
   on the documentviewer, but we have not called
   nsSHEntryShared::SetContentViewer yet.  So we are not yet watching for mutations
   to drop the document from bfcache.  Before returning from the adoptNode call,
   we end up calling Destroy() on the subframe's docshell (via
   nsDocument::MaybeInitializeFinalizeFrameLoaders being called from a
   scriptblocker coming off the stack).  This nulls out its mScriptGlobal pointer,
   which is why we have a docshell that has a null window.

5) This code runs:

    w.history.back();

   When back() is called, that lands in nsDocShell::InternalLoad, which does:

           nsCOMPtr<nsISHEntry> viewerEntry;
           prevViewer->GetHistoryEntry(getter_AddRefs(viewerEntry));
           if (viewerEntry == aSHEntry) {
             // Make sure this viewer ends up in the right place
             mContentViewer->SetPreviousViewer(nullptr);
             prevViewer->Destroy();
           }

   This Destroy() call places prevViewer's document in the bfcache.  Then we
   keep going with the back() navigation and load it from bfcache.

6) When going back via bfcache we grab the set of child shells from the shentry.
   This was stored back in step 3, so contains the now-torn-down docshell.

Upshot: The fix for bug 292962 didn't actually fix it very well.  There's this time period while a new page is loading when we have already grabbed some bfcache state but don't have the mutation observers set up yet, and if the old page is mutated during this time, bad things will happen.

At least the "holds weak refs to docshells" bit got fixed in bug 319551, so we get a null-deref here, not a UAF....

Anyway, we should fix things so we drop bfcache in this case.  That _may_ be enough, since even nsSHEntryShared::RemoveFromBFCacheAsync (which is what we do on mutations) calls DropPresentationState() sync unless it fails to dispatch the runnable.  And DropPresentationState() is what clears out the child shell list in the shentry.
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
Comment on attachment 8976779 [details] [diff] [review]
part 2.  Make sure we don't bfcache pages that get mutated after we capture their state

This part still has some issues I'm investigating...
Attachment #8976779 - Flags: review?(bugs)
Attachment #8976778 - Flags: review?(bugs) → review+
There are two issues being fixed here.  First, if DisallowBFCaching is called
before CanSavePresentation, we should really return false from
CanSavePresentation.  Otherwise we'll end up doing a bunch of state-capturing
work for no reason.

Second, if DisallowBFCaching is called between CanSavePresentation and
nsDocumentViewer::Destroy, we need to actually tear down the viewer state.
What we do right now is avoid putting the viewer into the SHEntry, but still
not tear down its presshell and so forth, which leads to asserts in
~nsDocumentViewer when this case is hit.
Attachment #8977079 - Flags: review?(bugs)
Attachment #8976779 - Attachment is obsolete: true
Comment on attachment 8977079 [details] [diff] [review]
part 2.  Fix the document "disallow bfcaching" mechanism to work without asserting

ahaa, 8 line context wasn't enough for DocumentViewer.
Attachment #8977079 - Flags: review?(bugs) → review+
Comment on attachment 8977080 [details] [diff] [review]
part 3.  Make sure we don't bfcache pages that get mutated after we capture their state

I wonder if we could have reused the MutationObserver session history has. But this isn't too complicated either.
Attachment #8977080 - Flags: review?(bugs) → review+
> I wonder if we could have reused the MutationObserver session history has.

Hmm.  I guess with the new setup where I just disallow bfcaching on the document (as opposed to the earlier attempt where it did something else) we could do that...  The logic might still end up more complicated and harder to follow, though.  In particular, we would need to add the observer when we set up the shentry, then remove it while we sanitize, then readd it.
https://hg.mozilla.org/mozilla-central/rev/131a40749cb9
https://hg.mozilla.org/mozilla-central/rev/4df8820be2aa
https://hg.mozilla.org/mozilla-central/rev/7f98616d7914
Group: dom-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
This grafts cleanly to both Beta and ESR60, though I'm not sure what the urgency is here for uplifting given how old the bug is. Is this something we should consider or just let it ride the trains?
Flags: needinfo?(bzbarsky)
Flags: in-testsuite?
Given the limited severity of the bug and the risk of any changes to this code, I don't think we should be uplifting anything here.
Flags: needinfo?(bzbarsky)
Depends on: 1462885
Flags: qe-verify+
Whiteboard: [post-critsmash-triage]
Hi Jesse,

Unfortunately I didn't managed to reproduce this issue by using an affected build (tried on the latest release 61, and with old asan builds).

I tried to reproduce this issue by modifying the prefs Ryan provided in Comment 12, but without luck.

Could you help us by verifying if this issue is fixed in Fx62?.

Thanks a lot!
Flags: needinfo?(jruderman)
It seems that Jesse is unavailable at this time (per abillings). Removing the qe-verify flag for now, as QA cannot reproduce the issue and therefore can't confirm whether it's fixed. 

Perhaps someone else could give it a try?
Flags: qe-verify+
Flags: needinfo?(jruderman)
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main62+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.