Last Comment Bug 335998 - (strongparent) parentNode of element gets lost
(strongparent)
: parentNode of element gets lost
Status: VERIFIED FIXED
: testcase
Product: Core
Classification: Components
Component: DOM: Core & HTML (show other bugs)
: Trunk
: x86 Windows XP
: P3 normal with 18 votes (vote)
: mozilla9
Assigned To: Olli Pettay [:smaug] (vacation Aug 25-28)
:
Mentors:
http://home.arcor.de/martin.honnen/mo...
: 413127 446248 456371 (view as bug list)
Depends on: 653716 454570 461640 463410 463424 508373 523080 523081 523082 523083 588681 652970 653420 653497 654182 662186 664430 664434 664438 664444 664447 664452 664455 664463 664464 664467 664470 664472 664506 665317 673806 708572
Blocks: acid3 413125 448993 465767 615034 674276 CVE-2011-3671
  Show dependency treegraph
 
Reported: 2006-04-30 06:13 PDT by Martin Honnen
Modified: 2014-04-26 03:18 PDT (History)
59 users (show)
jst: blocking1.9.2-
jst: wanted1.9.2+
dsicore: wanted1.9.1-
dbaron: blocking1.9-
reed: wanted1.9+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
wanted


Attachments
Strong parent pointer (4.75 KB, patch)
2008-05-21 10:51 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
browser test leaks still one document (et al.) (10.56 KB, patch)
2008-08-05 16:39 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
WIP, doesn't leak (34.46 KB, patch)
2008-09-26 14:14 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
root of the content subtree owns document (40.38 KB, patch)
2008-10-05 10:30 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
up-to-date + missing null checks to xul tree handling. (37.89 KB, patch)
2009-03-18 09:08 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
up-to-date (38.56 KB, patch)
2009-07-20 12:45 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
v14 (39.11 KB, patch)
2009-07-27 13:31 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
v16 (39.88 KB, patch)
2009-08-04 14:23 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
v18 (43.58 KB, patch)
2009-08-06 08:45 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
v19 (43.49 KB, patch)
2009-08-07 11:29 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
WIP (14.36 KB, patch)
2011-04-27 11:26 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
WIP2 (15.34 KB, patch)
2011-04-27 12:59 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
WIP3, yet more unlinking (15.88 KB, patch)
2011-04-27 14:11 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
WIP4, for backup (40.87 KB, patch)
2011-05-04 13:56 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
WIP5 (55.39 KB, patch)
2011-06-08 09:20 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
WIP6 (55.13 KB, patch)
2011-06-15 07:41 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
patch (14.60 KB, patch)
2011-06-15 09:51 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
peterv: review+
Details | Diff | Splinter Review
patch (14.57 KB, patch)
2011-07-25 11:21 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review

Description Martin Honnen 2006-04-30 06:13:20 PDT
The test case at
<http://home.arcor.de/martin.honnen/mozillaBugs/domLevel2/parentNodeProblem1.html>
has three div elements, |div id="div1"| contains |div id="div2"| as a child node which itself contains |div id="div3"| as a child node.
The script code in the document in the onload handler first stores a reference to |div id="div3"| in a global variable, then removes all child nodes of |div id="div1"| without storing a reference to them.
Then the parentNode of the stored |div id="div3"| is checked and initially is the |div id="div2"| element in Mozilla and other implementations (IE, Opera).
However with Mozilla if you then use the button to check the parentNode again and again Mozilla loses the parentNode and reports it as null. Other browsers don't do that, the parentNode of the stored |div id="div3"| element remains the |div id="div2"| element.

I see the problem with a recent trunk nightly Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060429 Minefield/3.0a1 as well as with Firefox 1.5 release as well as with Mozilla 1.7 builds.
Comment 1 Boris Zbarsky [:bz] 2006-04-30 08:41:27 PDT
So nodes only hold weak refs to parent nodes....  

Could we maybe have nsGenericElement::GetSCCIndex ensure the parent is wrapped?  Not sure whether creating JSObjects at that point is ok or whether the effect on performance would be ok.
Comment 2 Peter Van der Beken [:peterv] 2006-04-30 08:59:12 PDT
I think this used to work because we used to make the scope chain follow the DOM parent chain, but we changed that for IE compat (see bug 147058).
Comment 3 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-04-30 10:23:45 PDT
You don't want to hack this by creating wrappers that could just go away -- you'll just fix it most of the time, except when GC runs.  Unless you want to use the wrapper preservation mechanism on all wrappers that are the root of a fragment.

I think what you really want is a mechanism where the document manages all of the fragments that it owns; enforcing WRONG_DOCUMENT_ERR and implementing importNode and adoptNode would be a nice first step there.  (This might even be required if you want to do it using wrapper preservation.)
Comment 4 Jonas Sicking (:sicking) No longer reading bugmail consistently 2006-04-30 13:21:00 PDT
having every document manage all its disconnected trees sound expensive since during normal DOM manipulation a lot of small trees are created and destroyed (think single-element trees).
Comment 5 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2006-04-30 13:51:53 PDT
A hashtable insertion or removal for operations like that shouldn't be too bad.  How many of those do JS and XPConnect do in the normal execution of such an operation?
Comment 6 Jonas Sicking (:sicking) No longer reading bugmail consistently 2006-05-01 01:59:15 PDT
We would have to sprinkle hash-adding functions all over the place, at least in all the document.create* implementations and all the UnbindFromTree implementations.
Comment 7 Boris Zbarsky [:bz] 2006-05-01 08:43:17 PDT
So here's another fundamental question.  This bug is about keeping any node in a subtree alive keeping the whole subtree alive, right?  For JS I think we clearly want that.  Do we want it for C++ or python or whatever as well?  If so, we should consider non-JS-bound solutions (eg actually hold strong refs both ways and use graydon's cycle collector thing or something).
Comment 8 Jonas Sicking (:sicking) No longer reading bugmail consistently 2006-05-01 08:48:28 PDT
I don't necessarily care about the C++ case. Python would be nice though the same way python would be nice to have similar behaviour where nodes keep their document around.
Comment 9 Boris Zbarsky [:bz] 2006-05-01 08:58:29 PDT
I just feel that if we're going to try to solve this for multiple languages it may possibly be better to change the actual DOM tree impl as needed instead of doing per-language things.
Comment 10 Jonas Sicking (:sicking) No longer reading bugmail consistently 2006-05-01 17:55:22 PDT
Yeah, i agree. What i'm worried about though is small leaks blowing up into huge ones. We do occationally leak elements, but we don't want that to cause entire documents to leak.
Comment 11 Boris Zbarsky [:bz] 2006-05-01 20:45:16 PDT
Actually, generally speaking leaking an element already leaks the document.  Certainly if we leak the element's JSObject too.
Comment 12 Jonas Sicking (:sicking) No longer reading bugmail consistently 2006-05-02 00:53:24 PDT
I'd think it's more rare that we leak the JSObject since we have GC goodness there.
Comment 13 Jesse Ruderman 2006-05-11 19:35:25 PDT
> enforcing WRONG_DOCUMENT_ERR and implementing importNode and adoptNode

Yikes, that would break a lot of my code, and probably other people's code too.
Comment 14 Boris Zbarsky [:bz] 2006-05-11 19:38:30 PDT
Please see the relevant bug.  Don't start that discussion in this bug.
Comment 15 Jesse Ruderman 2006-05-11 19:55:10 PDT
I guess that's bug 47903.  Does this bug depend on it?
Comment 16 Olli Pettay [:smaug] (vacation Aug 25-28) 2008-01-07 04:57:35 PST
Now that cycle collector works pretty well and document::destroy is cleaned up 
etc. making child nodes to own their parent might work quite well.
Comment 17 Olli Pettay [:smaug] (vacation Aug 25-28) 2008-01-19 08:32:25 PST
*** Bug 413127 has been marked as a duplicate of this bug. ***
Comment 18 :Gijs Kruitbosch (away 26-29 incl.) 2008-01-19 08:38:11 PST
Yeah, so now we actually *do* enforce importNode and adoptNode, and at least for ChatZilla this is a major issue (bug 413125) on trunk. Ironically it's an issue because we can't find the ownerDocument of a node anymore so as to be able to adoptNode new child nodes... :-(
Comment 19 Jonas Sicking (:sicking) No longer reading bugmail consistently 2008-01-19 09:22:44 PST
I've never seen us leak just elemenets anyway so in reality we're likely leaking big on shutdown if we leak an element.
Comment 20 Jonas Sicking (:sicking) No longer reading bugmail consistently 2008-01-19 09:24:19 PST
I would be very worried about making such a change at this point in the release cycle. But for mozilla2 we should definitely do that. Though things will be very different then since we'll be using mmGC.
Comment 21 Jonas Sicking (:sicking) No longer reading bugmail consistently 2008-01-19 09:28:21 PST
(In reply to comment #19)
> I've never seen us leak just elemenets anyway so in reality we're likely
> leaking big on shutdown if we leak an element.

Upps, sorry, this was meant for another bug.
Comment 22 :Gijs Kruitbosch (away 26-29 incl.) 2008-01-19 09:29:13 PST
(In reply to comment #21)
> (In reply to comment #19)
> > I've never seen us leak just elemenets anyway so in reality we're likely
> > leaking big on shutdown if we leak an element.
> 
> Upps, sorry, this was meant for another bug.
> 

But comment #20 *was* meant for this bug? :-(
Comment 23 Jonas Sicking (:sicking) No longer reading bugmail consistently 2008-01-19 09:47:27 PST
yes
Comment 24 Olli Pettay [:smaug] (vacation Aug 25-28) 2008-05-21 10:51:12 PDT
Created attachment 321973 [details] [diff] [review]
Strong parent pointer

This fixes the testcase.
Running chrome test doesn't show leaks. browser test leaks even without the patch. Running mochitest and waiting results from TryServer/Talos.
Comment 25 Olli Pettay [:smaug] (vacation Aug 25-28) 2008-05-21 12:03:55 PDT
Comment on attachment 321973 [details] [diff] [review]
Strong parent pointer

This seems to cause one nsDocument leak with browser test and 2 nsDocument leaks with mochitest. Not too bad, I'd say. Going to look at what causes those leaks.
Comment 26 Jonas Sicking (:sicking) No longer reading bugmail consistently 2008-05-21 15:56:21 PDT
Wouldn't you also want to make the ownerDocument pointer (i.e. nsNodeInfoManager::mDocument) strong to ensure that things like comment 18 really work.

Though I still think we should wait until mmgc for this...
Comment 27 Damon Sicore (:damons) 2008-06-04 16:43:12 PDT
marking wanted1.9.1? to get this in the triage queue.  
Comment 28 Damon Sicore (:damons) 2008-06-10 16:19:58 PDT
Ok, after discussing this with Jonas, we don't think this has to be in the 1.9.1 release, but Smaug, you are welcome to take this.
Comment 29 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2008-08-05 16:34:11 PDT
*** Bug 446248 has been marked as a duplicate of this bug. ***
Comment 30 Olli Pettay [:smaug] (vacation Aug 25-28) 2008-08-05 16:39:25 PDT
Created attachment 332444 [details] [diff] [review]
browser test leaks still one document (et al.)
Comment 31 Jonas Sicking (:sicking) No longer reading bugmail consistently 2008-09-10 09:31:00 PDT
So given that this is part of Acid3, I guess we shouldn't wait for mmgc (or bome or whatever it'll be).

We definitely need to track down the leaks that appear if you enable owning parent/ownerDocument references though.
Comment 32 Jesse Ruderman 2008-09-10 12:26:19 PDT
> (or bome or whatever it'll be).

Do you know what the time line is for experimenting with and possibly integrating Boehm GC?
Comment 33 Graydon Hoare :graydon 2008-09-10 13:20:04 PDT
(In reply to comment #32)
> > (or bome or whatever it'll be).
> 
> Do you know what the time line is for experimenting with and possibly
> integrating Boehm GC?

Benjamin started, and I am presently continuing conducting the experiment. The browser executes for quite a while with Boehm managing the heap and treating malloc/free memory as uncollectable (poorly but functionally). In this mode, however, it buys us nothing over jemalloc, and in fact performs worse. In the "real" mode -- treating such memory as collectable -- it executes for only a brief period before crashing. 

I am investigating this, but I would not hold your breath at the moment. Any estimate I made about how long it will take to fix "all" the things that might be causing it to crash would be complete fiction. Adjustments to global memory behavior are very tricky to get right, as you know.
Comment 34 Brendan Eich [:brendan] 2008-09-22 10:26:10 PDT
*** Bug 456371 has been marked as a duplicate of this bug. ***
Comment 35 Olli Pettay [:smaug] (vacation Aug 25-28) 2008-09-24 15:20:13 PDT
Making nsRange Ccollectable drops mochitest leaks to 1 document (+other stuff).
...now trying to find which test causes that leak.
Comment 36 Olli Pettay [:smaug] (vacation Aug 25-28) 2008-09-24 16:04:48 PDT
test_bug382990.xul causes leaks
Comment 37 Olli Pettay [:smaug] (vacation Aug 25-28) 2008-09-26 14:14:27 PDT
Created attachment 340645 [details] [diff] [review]
WIP, doesn't leak

This doesn't leak any refcounted objects in mochitest or chrome, and nothing new
in browser-chrome (and those very few leaks are image cache related).
I'm not happy with NODE_OWNS_OWNER_DOCUMENT. Traverse/unlink should go via
nodeinfo/nodeinfomanager, but more important thing was to track down leaks.
Next to figure out how to modify nodeinfo handling (and even without crashes 
which I got when trying the most obvious changes).
Comment 38 Olli Pettay [:smaug] (vacation Aug 25-28) 2008-10-05 10:30:30 PDT
Created attachment 341824 [details] [diff] [review]
root of the content subtree owns document

Peter, want to look at this patch and give comments?
I was trying to reduce traversing: only root of a content subtree owns
document and each content object owns parent content. So traversing doesn't go
via nodeinfo. (Attr nodes aren't handled)
Passes tests without leaks. Running now in try server - an older version of the
patch didn't seem to cause bad perf regressions, though try server isn't
too reliable.
Comment 39 Boris Zbarsky [:bz] 2008-10-05 11:31:16 PDT
Don't we already skip traversing DOM nodes whose GetCurrentDoc is currently being rendered?  If so, and if doing more traversing would make the patch cleaner, we should perhaps do that.
Comment 40 Olli Pettay [:smaug] (vacation Aug 25-28) 2008-10-05 12:04:35 PDT
(In reply to comment #39)
> Don't we already skip traversing DOM nodes whose GetCurrentDoc is currently
> being rendered?
That is sort of different thing, and yes, we don't traverse in that case.
But there are documents which are never 'rendered'.
If we can easily skip traversing element->ownerDocument almost every case, 
it might be worth to try.

>  If so, and if doing more traversing would make the patch
> cleaner, we should perhaps do that.
Most of the patch is just adding new stuff to CC. AddRefing/Releasing
parent/ownerDoc is pretty simple, IMO.
Perhaps I should try to add some code to see how much we eventually traverse, when document is being CC'd.

Anyway, all comments welcome.
Comment 41 Bill Gianopoulos [:WG9s] 2008-10-28 03:24:06 PDT
(In reply to comment #38)
> Created an attachment (id=341824) [details]
> root of the content subtree owns document

This patch no longer applies cleanly.
Comment 42 Bill Gianopoulos [:WG9s] 2008-11-06 06:46:43 PST
If and when the patch for bug 457022 lands and actually sticks, that severely bitrots portions of this patch to the point where I am not certain how to correctly modify and apply it.
Comment 43 Peter Van der Beken [:peterv] 2008-11-06 07:19:28 PST
There's nothing incompatible between the two patches, it's just a case of merging. This patch will probably get updated once it has been reviewed. I'm currently reviewing (also by splitting off some chunks in separate bug). There's no need to keep commenting about the fact that the patch doesn't apply, it happens.
Comment 44 Olli Pettay [:smaug] (vacation Aug 25-28) 2009-02-06 04:13:46 PST
Comment on attachment 341824 [details] [diff] [review]
root of the content subtree owns document

This needs a new patch.
Part of the patch has already landed (nsRange at least)
Comment 45 Olli Pettay [:smaug] (vacation Aug 25-28) 2009-03-18 09:08:51 PDT
Created attachment 368031 [details] [diff] [review]
up-to-date + missing null checks to xul tree handling.
Comment 46 Olli Pettay [:smaug] (vacation Aug 25-28) 2009-07-20 12:45:19 PDT
Created attachment 389534 [details] [diff] [review]
up-to-date

...but something is now leaking. Try server shows that one document and its elements are leaking
Comment 47 Johnny Stenback (:jst, jst@mozilla.com) 2009-07-22 17:12:32 PDT
Not blocking 1.9.2 on this.
Comment 48 Bill Gianopoulos [:WG9s] 2009-07-22 17:17:54 PDT
(In reply to comment #47)
> Not blocking 1.9.2 on this.

I thought 1.9.2 was supposed to pass ACID3.
Comment 49 Reed Loden [:reed] (use needinfo?) 2009-07-22 17:24:45 PDT
(In reply to comment #48)
> (In reply to comment #47)
> > Not blocking 1.9.2 on this.
> 
> I thought 1.9.2 was supposed to pass ACID3.

That's what I thought, too (and I read somewhere, I think?). Re-nom'ing.
Comment 50 Mike Shaver (:shaver -- probably not reading bugmail closely) 2009-07-22 17:30:34 PDT
If the only reason to fix this is that someone said somewhere that 1.9.2 would pass ACID3, I almost hope we _don't_ fix it.  We should fix bugs that affect users and web developers before we start implementing things just to get another point on ACID3.

I certainly wouldn't hold the release of 1.9.2 if this were the last bug remaining, speaking only for myself.
Comment 51 Johnny Stenback (:jst, jst@mozilla.com) 2009-07-22 17:50:22 PDT
(In reply to comment #48)
> (In reply to comment #47)
> > Not blocking 1.9.2 on this.
> 
> I thought 1.9.2 was supposed to pass ACID3.

I haven't heard anyone make that decision, and I certainly don't think it'd be made now given the short schedule. I'd still gladly consider a patch for this, but I'm with shaver here, I wouldn't hold the release for a point on ACID3.
Comment 52 Bill Gianopoulos [:WG9s] 2009-07-22 18:00:04 PDT
(In reply to comment #51)
> > I thought 1.9.2 was supposed to pass ACID3.
> 
> I haven't heard anyone make that decision, and I certainly don't think it'd be
> made now given the short schedule. I'd still gladly consider a patch for this,
> but I'm with shaver here, I wouldn't hold the release for a point on ACID3.

Well that is kind of what was said to answer why Firefox 3.5 was going to be released not passing ACID3.  I agree that the pass level we have now is about all that is needed, the test that we fail are for features that probably never belonged being part of ht e test in the first place.

However, besides technical issues there are also political issues.

Given the close schedule after 3.5 for this next branching I can see that there is now way that it will include the SVG font support, blocking for ACID3 issues probably makes no sense.  However, having another release with out announcing a commitment to make the next version pass ACID3 with a sort of slushy date associated with that would probably be a political mistake.
Comment 53 Mike Shaver (:shaver -- probably not reading bugmail closely) 2009-07-22 18:06:21 PDT
I would rather make a political mistake than have someone work on this instead of a bug or feature that will actually affect people.  In fact, I would describe *that* as a political mistake, since it would be making a poor choice due to (theoretical) political pressure.

FF3.6 will be released not passing ACID3 for the same reason that FF3.5 was released without passing ACID3, and FF3, and FF2 for that matter -- because there were other things that we felt were more important than the items required to pass ACID3.  (Since all the things in ACID3 were standards before FF2 was released, IIRC, we could have chosen to implement all of those features, on their own merits.  We didn't, though dbaron's post on the topic points to some interesting analysis on such stuff.)

I would consider it a gift if people didn't use ACID3 to justify a blocking nomination again!
Comment 54 Boris Zbarsky [:bz] 2009-07-22 18:51:56 PDT
I would also strongly oppose a "commitment to make the next version pass ACID3", if that matters.  Or any version.  If we pass it, it'll be because the things it tests stand on their merits (or someone whose goal is actually passing ACID3 decides to do something instead of just complaining), not because someone chose to test them.
Comment 55 Olli Pettay [:smaug] (vacation Aug 25-28) 2009-07-27 13:31:42 PDT
Created attachment 390890 [details] [diff] [review]
v14

Still one leak when tryserver runs mochitest.
Can't reproduce on my linux system, testing elsewhere....
Comment 56 Olli Pettay [:smaug] (vacation Aug 25-28) 2009-08-04 14:23:28 PDT
Created attachment 392569 [details] [diff] [review]
v16

Passes tryserver tests at least on Windows.

This includes two hacks.
- unbound native anon doesn't own the document. This is because of the bug in
  generated content handling. There is occasionally a loop
  content->a_property_which_owns_the_content->content
  Fixing this could be a followup, IMO, because it needs some changes to
  generated content handling.
- Running mochitest (or even a small part of it) using --close-when-done and
  --autorun leaks without the change to quit.js.
  Note, there is no leak without --close-when-done.
  So far I haven't found what causes the extra reference to an element.
  DEBUG_CC isn't too helpful here.
  I think something somewhere assumes normal shutdown, not a special case like
  quit.js where a content page calls Quit().
Comment 57 Olli Pettay [:smaug] (vacation Aug 25-28) 2009-08-06 08:45:26 PDT
Created attachment 392950 [details] [diff] [review]
v18

The iframe related leak was the nsContentList cache after all.

Passed tryserver tests without leaks.

Unbind has a small hack for native anonymous content (followup bug 508373).

If we want to try to make nodeinfomanager to own doc etc., that could be 
hopefully done in a followup bug. The node-owns-doc part of this patch is pretty small to backout. (Or if really wanted, I could do that in this bug.)

This is something which should land to 1.9.3 pretty early, IMO.
Comment 58 Boris Zbarsky [:bz] 2009-08-06 08:59:25 PDT
Is there a reason to addref/release our GetParent() instead of our GetNodeParent()?
Comment 59 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-08-06 09:31:59 PDT
Why all the magic with NS_MAYBEADDREFOWNERDOC? Why not simply make nsNodeInfoManager hold a strong reference to its document?
Comment 60 Olli Pettay [:smaug] (vacation Aug 25-28) 2009-08-06 10:59:32 PDT
(In reply to comment #59)
> Why all the magic with NS_MAYBEADDREFOWNERDOC? Why not simply make
> nsNodeInfoManager hold a strong reference to its document?
Because at least originally that just didn't work as easily as
the ownerdoc approach. Way more leaks.
But the nodeinfomanager approach is perhaps something to do.

(In reply to comment #58)
> Is there a reason to addref/release our GetParent() instead of our
> GetNodeParent()?
That doesn't really matter here.
Comment 61 Boris Zbarsky [:bz] 2009-08-06 11:04:06 PDT
GetParent() is one extra branch, no?  I agree it's probably not a big deal if we're getting of the MAYBEADDREF stuff.  Otherwise, it can avoid that stuff for kids of the document...
Comment 62 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-08-06 14:10:55 PDT
(In reply to comment #60)
> (In reply to comment #59)
> > Why all the magic with NS_MAYBEADDREFOWNERDOC? Why not simply make
> > nsNodeInfoManager hold a strong reference to its document?
> Because at least originally that just didn't work as easily as
> the ownerdoc approach. Way more leaks.

Any idea why? It seems like a much cleaner solution.
Comment 63 Olli Pettay [:smaug] (vacation Aug 25-28) 2009-08-07 04:02:08 PDT
It isn't really that clean solution. Currently there is the assumption that
nsINode has always nodeinfo and nodeinfo belongs to nodeinfomanager.
So unlinking should unlink only mDocument in nodeinfomanager.
I could try that once bug 508373 is fixed (hoping someone hacking layout would do it :p), because the hack for that bug
isn't possible with nodeinfo-approach.
Comment 64 Olli Pettay [:smaug] (vacation Aug 25-28) 2009-08-07 11:29:33 PDT
Created attachment 393225 [details] [diff] [review]
v19

Without the hack for bug 508373.

I started the nodeinfo-approach, but it is a bit tricky because of all the
bindingmanager and nodeinfomanager ownership issues.
Comment 65 Olli Pettay [:smaug] (vacation Aug 25-28) 2009-08-08 03:25:18 PDT
(In reply to comment #64)
> I started the nodeinfo-approach
But IMHO, that could be a followup bug.

Note, the changes to cached content list should be enough.
No need to change nsContentList::NodeWillBeDestroyed in any way.
Comment 66 Peter Van der Beken [:peterv] 2009-08-19 13:34:46 PDT
I split off the boxobject CC stuff to bug 461640 (it has a reviewed piece of attachment 393225 [details] [diff] [review]). I'll also split off the cached content list stuff to a separate bug.

I have to say I also really dislike the NS_MAYBEADDREFOWNERDOC business.
Comment 67 Bill Gianopoulos [:WG9s] 2009-08-29 16:27:18 PDT
(In reply to comment #66)
> I have to say I also really dislike the NS_MAYBEADDREFOWNERDOC business.

Is it the whole idea or the name you are objecting to?  Changing MAYBE to CONDITIONALLY would certainly describe it better, although adding 8 characters to the function name.
Comment 68 Peter Van der Beken [:peterv] 2009-10-18 02:20:05 PDT
I discussed this with Jonas last week, here's what we would like to see: nsNode holds strong reference to its nsNodeInfo, nsNodeInfo holds strong reference to its nsNodeInfoManager, nsNodeInfoManager holds strong reference to the document, nsDocument holds strong reference to its nsBindingManager, nsNodeInfoManager doesn't hold the nsBindingManager anymore (partly backing out the patch from bug 415192). We unlink from CC if possible. Does that setup work? If it doesn't because it leaks, we should find out what those leaks are and fix them (I can help). Do we still need NS_MAYBEADDREFOWNERDOC if we do that?
Comment 69 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-10-18 03:16:41 PDT
(In reply to comment #68)
> We unlink from CC if possible.

Specifically, unlinking unlinks the strong reference from the nodeinfomanager to nsDocument. The strong links node->nodeinfo->nodeinfomanager aren't touched during unlinking.
Comment 70 Olli Pettay [:smaug] (vacation Aug 25-28) 2009-10-18 04:40:00 PDT
So need to fix bug 415192 in some other way. Maybe the new setup just fixes it.

document->bindingmanager must never be unlinked, and document needs to always have
nodeinfomanager.
But if nodeinfomanager->document is unlinked and since bindingmanager doesn't
own document, I think that should work.
Comment 71 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-10-18 04:56:00 PDT
bug 415192 was caused by people holding on to an element and then having the elements owner-doc get garbage collected, which meant that an actively used element lost its bindingmanager. However with the proposed setup, an actively used element will hold a strong reference to its owner document (through element->nodeinfo->nodeinfomanager->document), which means that the document won't get garbage collected and thus we won't loose the binding manager.
Comment 72 Boris Zbarsky [:bz] 2009-10-18 10:41:00 PDT
So the idea is to have node->nodeinfo->nodeinfomanager strong, nodeinfomanager->document strong but breakable via cc?  And nodes always holding strong refs to parent nodes?  We're sure that nodes don't need the binding manager after the unlink phase of cc?  If so, that sounds doable...
Comment 73 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-10-18 11:27:17 PDT
(In reply to comment #72)
> We're sure that nodes don't need the
> binding manager after the unlink phase of cc?  If so, that sounds doable...

Technically speaking nothing should need anything after unlink since unlink should kill it. Of course that isn't always true.

All code we could find got the bindingmanager by doing:

node->GetOwnerDoc()->BindingManager()

(with appropriate nullchecks). If this is a situation where the document is getting unlinked along with the element in question, the above code will already stop producing a bindingmanager as soon as the document is destroyed.


We talked about making unlinking instead of clearing the nodeinfomanager->nsdocument link, change it to be a weak link (appropriately cleared when the document is destroyed as it is now). I don't remember if we decided that this was needed or not.
Comment 74 Boris Zbarsky [:bz] 2009-10-18 12:29:07 PDT
The thing is, I'd really like to not have to null-check GetOwnerDoc() or BindingManager()....
Comment 75 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-10-18 17:27:20 PDT
Absolutely. The way to do that is, IMHO, to make sure that node destructors never call out to code that attempts to get ownerdoc/bindingmanager. We should audit for that. Though probably after this bug is fixed. Then make it so that when the document goes away it clears out all the stuff that the bindingmanager holds on to.

Then the only place where we'd need to nullcheck is in the dtor for the element. If we there can't get the ownerdoc we know that the bindingmanager is empty. Otherwise we do what we currently do.
Comment 76 Olli Pettay [:smaug] (vacation Aug 25-28) 2009-10-19 02:06:54 PDT
(In reply to comment #75)
> Absolutely. The way to do that is, IMHO, to make sure that node destructors
> never call out to code that attempts to get ownerdoc/bindingmanager.
Node destructor, or actually LastRelease, does always call GetOwnerDoc().
Comment 77 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-10-19 02:47:55 PDT
But AFAICT in every case where it does so, it does it in order to clear out things from hashes managed by the document. If the document, upon destruction, cleared out those hashes then it would be fine for LastRelease to do:

nsIDocument* owner = aNode->OwnerDoc();
if (owner) {
  owner->ClearXFor(aNode);
}

We'd have to audit all NodeWillBeDestroyed implementations to make sure they don't try to get the ownerdoc.

Anyhow, I think all this is a secondary project that should happen after this bug is fixed.
Comment 78 Peter Van der Beken [:peterv] 2009-10-19 02:52:09 PDT
(In reply to comment #77)
> If the document, upon destruction,
> cleared out those hashes then it would be fine for LastRelease to do:
> 
> nsIDocument* owner = aNode->OwnerDoc();
> if (owner) {
>   owner->ClearXFor(aNode);
> }

And probably do the same when unlinking a node (can't rely on unlink order, so nodeinfomanager->document might already be broken at that point).
Comment 79 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-10-19 02:58:14 PDT
Actually, when unlinking we shouldn't need to do anything since either

1) The document is getting unlinked as well
2) The document is not getting unlinked as well

If 1 then the document getting destroyed will clear the hashes. If 2 then the destructor should be able to get to the ownerdocument and remove appropriate items from hashes.
Comment 80 Peter Van der Beken [:peterv] 2009-10-19 03:15:08 PDT
(In reply to comment #79)
> 1) The document is getting unlinked as well
> 2) The document is not getting unlinked as well
> 
> If 1 then the document getting destroyed will clear the hashes. If 2 then the
> destructor should be able to get to the ownerdocument and remove appropriate
> items from hashes.

Given that unlinking is generally incomplete we should still try to unlink them in case we don't unlink enough to get the destructor to run.
Comment 81 Stebs 2010-04-13 06:36:28 PDT
FYI, Acid3 Test 27 (tied to this Bug) suddenly fails with 
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.3a5pre) Gecko/20100413 Minefield/3.7a5pre
All ok on nightly before that.
Comment 82 Stebs 2010-04-13 07:03:03 PDT
Hold on, something went (reproducably) wrong on my side, everything back to normal now (verified by others).
Mea culpa
Comment 83 d 2010-05-04 15:00:21 PDT
Is this still relevant for the Acid3 test?
Comment 84 Jonas Sicking (:sicking) No longer reading bugmail consistently 2010-05-04 15:02:31 PDT
Yes
Comment 85 d 2010-05-09 06:03:35 PDT
Ok. How does it affect the test?
Comment 86 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2010-05-09 08:21:26 PDT
It causes Acid3 numeric tests #26 and/or #27 to fail intermittently a small percentage of the time (depending on garbage collection timing).  See bug 410460 comment 51, etc.
Comment 87 Johnny Stenback (:jst, jst@mozilla.com) 2010-05-26 15:52:41 PDT
Smaug, I think we should get this finalized for 1.9.3, and I think you had planned to do so either way. If you don't agree, please let me know.
Comment 88 Jonas Sicking (:sicking) No longer reading bugmail consistently 2010-09-02 11:35:50 PDT
This feels risky enough that we want it in beta. Please let me know if you disagree.
Comment 89 Olli Pettay [:smaug] (vacation Aug 25-28) 2010-10-25 03:47:50 PDT
I'm afraid this might be too risky for FF4.

(Sorry that I've been slow with this. The old patch did work without leaks,
but since people didn't quite like it.
*So far* I haven't found other way which doesn't leak.)
Comment 90 Henri Sivonen (:hsivonen) 2010-10-25 23:39:20 PDT
(In reply to comment #89)
> I'm afraid this might be too risky for FF4.

Is this now only about the parentNode getting lost per bug summary or does this mean that GetOwnerDoc() can return null in Firefox 4?
Comment 91 Olli Pettay [:smaug] (vacation Aug 25-28) 2010-10-26 02:47:47 PDT
This is both.
And GetOwnerDoc() may return null even after this is fixed, at least after
unlink or during dtor.
Comment 92 Raul Malea 2010-10-31 04:40:43 PDT
Any news for this bug?
Comment 93 Johnny Stenback (:jst, jst@mozilla.com) 2010-10-31 12:37:03 PDT
Yes, smaug is close to having a landable fix for this, but we've decided it's too risky to take this for Firefox 4, so this will need to wait until the following release.
Comment 94 Matthias Versen [:Matti] 2011-03-24 12:00:07 PDT
We are at the beginning of a new cycle. Would that be the right time for smaug Patch ?
Comment 95 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-03-24 12:02:41 PDT
This is highly regression risky thing. I'd rather do this for Fx6.
Comment 96 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-03-24 12:03:19 PDT
Especially since there is only few weeks left Fx5 developing.
Comment 97 Boris Zbarsky [:bz] 2011-03-24 12:05:02 PDT
Matti, the right way to think about it is that we're halfway through the Fx5 cycle right now: week 3-4 of a 6-week cycle.
Comment 98 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-04-24 03:42:09 PDT
(In reply to comment #68)
> I discussed this with Jonas last week, here's what we would like to see: nsNode
> holds strong reference to its nsNodeInfo, nsNodeInfo holds strong reference to
> its nsNodeInfoManager, nsNodeInfoManager holds strong reference to the
> document, nsDocument holds strong reference to its nsBindingManager,
> nsNodeInfoManager doesn't hold the nsBindingManager anymore (partly backing out
> the patch from bug 415192).

I wonder if this setup causes too many cycles. As we saw during AllHands, 
we do create lots of cycles which could be break down without
cycle collector. NS_MAYBEADDREFOWNERDOC is one such way to break down cycles.
Comment 99 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-04-24 03:59:49 PDT
...but we may want to have something a bit different anyway.
Comment 100 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-04-27 11:26:27 PDT
Created attachment 528650 [details] [diff] [review]
WIP

This doesn't leak all the time when browsing :)
The idea is to keep nodeinfomanager alive as long as document via
document's nodeinfo. But document's nodeinfo doesn't cause
nodeinfomanager's mDocument to be strong. Other kinds of
nodeinfos keep mDocument alive.
This way GetOwnerDoc() returns always valid value.
(I wish we could get rid of special casing DocumentType)

Uploaded to tryserver to find new problematic cases.
Comment 101 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-04-27 12:59:34 PDT
Created attachment 528688 [details] [diff] [review]
WIP2
Comment 102 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-04-27 14:11:32 PDT
Created attachment 528707 [details] [diff] [review]
WIP3, yet more unlinking
Comment 103 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-05-04 13:56:23 PDT
Created attachment 530134 [details] [diff] [review]
WIP4, for backup
Comment 104 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-06-08 09:20:00 PDT
Created attachment 538035 [details] [diff] [review]
WIP5

Waiting for tryserver results. nsSessionStore or something close
to it seems to do something hacky with timers, so not sure if all the tests
pass yet.
Comment 105 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-06-15 07:41:05 PDT
Created attachment 539513 [details] [diff] [review]
WIP6

This doesn't seem to leak when running tests.

I need to split the patch and file separate bugs for the certain parts for the patch.
Comment 106 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-06-15 09:51:20 PDT
Created attachment 539573 [details] [diff] [review]
patch

This is the patch for this bug. WIP6 contains patches also for bugs which
this bug depend on.
Comment 107 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-06-15 09:55:55 PDT
Comment on attachment 539573 [details] [diff] [review]
patch

This certainly can't land before all the other related patches have landed.

The idea is that all the other nodeinfos, except document nodeinfo keep
document alive. That way GetOwnerDoc() returns the right value even
in dtors, and yet there isn't cycle document->ni->nim->document.
Comment 108 Jesse Ruderman 2011-06-18 13:38:54 PDT
I fuzzed WIP6 (along with the followup in bug 664463), and all I found was bug 665317. So I'd say this looks good :)
Comment 109 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-07-22 17:58:58 PDT
I pushed the patch to try, and there are some new leaks.
Probably some regression.
Comment 110 Peter Van der Beken [:peterv] 2011-07-25 01:44:05 PDT
Comment on attachment 539573 [details] [diff] [review]
patch

Review of attachment 539573 [details] [diff] [review]:
-----------------------------------------------------------------

I'll review this part already, since any leaks are probably due to problems outside of this patch. It's likely that this patch will lead to more leaks in the future though :-(.

::: content/base/src/nsNodeInfoManager.cpp
@@ +396,5 @@
>  nsNodeInfoManager::RemoveNodeInfo(nsNodeInfo *aNodeInfo)
>  {
>    NS_PRECONDITION(aNodeInfo, "Trying to remove null nodeinfo from manager!");
>  
> +  if (aNodeInfo == mDocumentNodeInfo) {

I'd make this null out mDocument, presumably this can only happen after we released mDocument? We should be able to assert mNonDocumentNodeInfos == 0?

@@ +406,5 @@
> +        // even if mDocument gets deleted.
> +        mDocument->Release();
> +        if (!mDocumentNodeInfo) {
> +          mDocument = nsnull;
> +        }

And then you can remove this.
Btw, I'm not really sure it's correct anyway, what if the document and the nodeinfo manager are also held by non-nodeinfo objects, it seems like we'd never null out mDocument potentially leading to a dangling pointer once the document is destroyed?
Comment 111 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-07-25 03:26:10 PDT
nsNodeInfoManager::mDocument is set to null when document is deleted.
And nsDocument::mNodeInfo keeps nodeinfomanager alive.
Comment 112 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-07-25 11:21:54 PDT
Created attachment 548230 [details] [diff] [review]
patch

Got all green on try with the previous patch + fix for animation controller unlink.
Comment 113 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-07-26 10:30:48 PDT
http://hg.mozilla.org/mozilla-central/rev/c53ee19aeffa

Marking this fixed now, but I may need to back this out if
bad leaks are found.
Also, atm I intend to back this out from the next Aurora, so the target
could be FF9.
Comment 114 Boris Zbarsky [:bz] 2011-07-26 10:32:14 PDT
Do we need a bug tracking that backout?
Comment 115 Boris Zbarsky [:bz] 2011-09-27 10:27:58 PDT
Olli, I assume things are looking good here and we don't need to do anything for aurora 9, right?
Comment 116 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-09-27 10:31:12 PDT
I'm not aware of any problems why we should back this out from A9.
Comment 117 Andrew McCreight [:mccr8] 2011-12-15 13:22:35 PST
Did this get backed out of 9?
Comment 118 Olli Pettay [:smaug] (vacation Aug 25-28) 2011-12-15 13:24:19 PST
Yes.
Comment 119 Jesse Ruderman 2011-12-20 19:44:01 PST
And then the backout got backed out. See bug 711794.
Comment 120 Al Billings [:abillings] 2012-03-28 16:15:22 PDT
Verified fixed in current builds.

Note You need to log in before you can comment on or make changes to this bug.