Racing subframe loads where a history load loses the race leaks

RESOLVED WORKSFORME

Status

()

Core
Document Navigation
RESOLVED WORKSFORME
6 years ago
5 years ago

People

(Reporter: Jesse Ruderman, Assigned: khuey)

Tracking

(Blocks: 1 bug, {mlk, testcase})

Trunk
mlk, testcase
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink:P3][fuzzblocker:session-history-leaks])

Attachments

(2 attachments)

(Reporter)

Description

6 years ago
Created attachment 543705 [details]
testcase

This testcase causes nsDocument and nsGlobalWindow objects to leak all the way through shutdown.
OS: Linux → All
Hardware: x86_64 → All
Version: 1.9.2 Branch → Trunk
I've been investigating this.  Taking the bug so we don't duplicate effort here.
Assignee: nobody → khuey
This was a fun one.

The leak here is caused by a cycle between a docshell and a nsSHEntry.  The docshell owns the nsSHEntry (through mOSHE) and the nsSHEntry owns the docshell (through mChildShells).  The leak is just an observable consequence of a far deeper bug though.

The key code here is:

> function b3()
> {
>   frameA.history.forward();
>   frameB.location.hash = "#g";
>   setTimeout(b4, 800);
> }

We navigate frame A forwards to "data:text/html,2".  The results in us finding frame A's next nsSHEntry from the tree and sticking it in docShellA.mLSHE before we return from history.forward().  This does not, however, update the nsSHistory's index.  That does not occur until the navigation of frame A completes, which happens asynchronously.

Then we come along and navigate frameB by setting location.hash.  Because this is a hash navigation we end up in the short-circuited load case in nsDocShell::InternalLoad.  This load completes *synchronously*.  Thus, when we go to fix up the history for the hash navigation the nsSHistory is still on frameA's old state (the entries for "data:text/html,1").

We add the session history tree for this hash navigation which results in the nsSHTransaction corresponding to frameA's "data:text/html,2" state being removed from the chain.  The nsSHTransaction is destroyed, and it releases the nsSHEntry corresponding to the parent docshell.  That nsSHEntry is destroyed and it sets all of its children's parent ptrs to null before it releases them.

The fun part here is that frameA's "data:text/html,2" nsSHEntry stays alive, because it's frameA's mLSHE.  It survives, but gets its parent ptr nulled.

Then the hash navigation finishes, we set the timeout, etc.  Eventually, the frameA navigation completes and frameA calls SetHistoryEntry(&mOSHE, mLSHE).  SetHistoryEntry does | nsISHEntry *newRootEntry = GetRootSHEntry(aEntry); | where aEntry is mLSHE (frame A's nsSHEntry for "data:text/html,2").  But aEntry looks like a root SHEntry because its parent ptr was nulled out earlier!  So the SetChildHistoryEntry dance ends up sticking this nsSHEntry onto the parent docshell (so it is now mOSHE for both the parent docshell and frameA's docshell).

Then, sometime later, we unload the parent page and when we freeze everything to stick it into the bfcache frameA's docshell gets attached to the parent's mOSHE ... which is frameA's mOSHE and the cycle is completed.

So the real underlying bug here is that subframe navigations race against each other when updating the state of the session history tree.
Created attachment 544859 [details] [diff] [review]
WIP

This is not perfect, but I think we should do something like this anyway.
This might fix the leak, but it still leaves the session history tree in a broken state, right?
Right. That is why the patch is not perfect :)
I won't pretend to understand comment 2, but it sounds epic.

If we're leaking nsDocument and nsGlobalWindow, that sounds like the leak can be quite large, is that right?  How frequent is it?
Blocks: 669241
Whiteboard: [MemShrink]
I don't think this leak is frequent.  It is pretty large though.
Anyways, I don't think I'm the right owner for this now.
Assignee: khuey → nobody
The leak can be as big as you want; we're leaking the two subframes and everything in them.

As for frequency, the leak requires that a history forward navigation (already sort of rare) race against another load and lose the race (in this testcase the other load is sync, so it always wins).  So I'm guessing this is not that common.
Actually, I think I have a workable plan here.
Assignee: nobody → khuey
Whiteboard: [MemShrink] → [MemShrink:P3]
Summary: Frame navigation leak → Racing subframe loads where a history load loses the race leaks
(Reporter)

Updated

6 years ago
Whiteboard: [MemShrink:P3] → [MemShrink:P3][fuzzblocker:session-history-leaks]
(Reporter)

Comment 11

5 years ago
WFM with the testcase in comment 0.

Kyle, does that make sense, given comment 2?  Should we file any new bugs on the issues you discovered while researching this?
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → WORKSFORME
Even if the testcase doesn't work anymore, comment 2 is still a problem.

If you want to close this out and file a separate bug on comment 2 I can live with that.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
(Reporter)

Comment 13

5 years ago
Kyle, please file a separate bug on comment 2.
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago5 years ago
Resolution: --- → WORKSFORME
(Reporter)

Comment 14

5 years ago
I filed bug 830071.  Kyle, hopefully you can fill it in with details.
You need to log in before you can comment on or make changes to this bug.