Bugzilla

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 1

•

18 years ago

Adding some more bugs that this blocks...

Blocks: 303260, 335896, 341431, 345026

Assignee

Comment 2

•

18 years ago

Btw, It would be great if we could use graydons cycle collector rather than nsDocument::Destroy to fix these leaks.

Updated

•

18 years ago

Blocks: 266754

Updated

•

18 years ago

Blocks: 361226

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 3

•

18 years ago

Is this easier now that Graydon's cycle collector is in?  I'd really like to see bug 335896 and bug 361226 go away, since they interfere with my testing.

Assignee

Comment 4

•

18 years ago

Oh, yes, this'd be awesome to fix now with the collector. Do we know which cycles cause leaks?

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 5

•

18 years ago

We should just go through the UnbindFromTree impls and hook up everything cleaned up by them to cycle collection.  As I recall, the main issue was XUL stuff (controllers, etc).

Assignee

Comment 6

•

18 years ago

I'll get on this one.

Assignee: general → bugmail

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

17 years ago

Blocks: 370810

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

17 years ago

Depends on: 367779, 368530, 368549

Peter Van der Beken [:peterv]

Updated

•

17 years ago

Depends on: 368773

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

17 years ago

Blocks: 359821

Jonas Sicking (:sicking) No longer reading bugmail consistently

Updated

•

17 years ago

Flags: blocking1.9+

Assignee

Updated

•

17 years ago

Target Milestone: --- → mozilla1.9alpha5

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 7

•

17 years ago

Jonas?  What's the story here?  Gonna make A5?  Let's change this if necessary to A6.

Assignee

Updated

•

17 years ago

Target Milestone: mozilla1.9alpha5 → mozilla1.9alpha6

Jonas Sicking (:sicking) No longer reading bugmail consistently

Updated

•

17 years ago

Whiteboard: [sg:critical]

Assignee

Comment 8

•

17 years ago

Attached patch Patch to fix (obsolete) — Details — Splinter Review

So this makes us not UnbindFromTree nodes in nsDocument::Destroy. It also makes us not kill the hash that keeps wrappers alive.

I audited all ::UnbindFromTree implementations and the only thing I found not hooked up to the cycle collector were these three things:

1. mControllers. There are piles of different controllers and many implemented
   in javascript. The common ownership seems to be
     mControllers->nsXULControllers->nsBaseCommandController->
     nsControllerCommandTable->{{stuff}}
   However there might be other ones too for all I know and there is plenty of
   "stuff".
   But I think that we want to kill all controllers from nsDocument::Destroy
   anyway, so this shouldn't be a problem. I simply walk the DOM and manually
   unhook all mControllers.
   I did add code to unhook mControllers in Unlink though since we already
   traverse it.
2. nsHTMLFormElement::mWebProgress. I made this a weak reference instead.
3. Access keys. A few elements unregister their access keys from UnbindFromTree.
   This makes the nsEventStateManager no longer hold strong refs to these nodes.
   However I can't see a way to get the nsEventStateManager to participate in a
   cycle so I think we should be fine there.

Boris, let me know if you won't have time to review this tonight or tomorrow and I'll try to get someone else to do it.

Attachment #268039 - Flags: superreview?

Attachment #268039 - Flags: review?(bzbarsky)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

17 years ago

Attachment #268039 - Flags: superreview? → superreview?(bzbarsky)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 9

•

17 years ago

Comment on attachment 268039 [details] [diff] [review]
Patch to fix

Arg, I'm still leaking with this :(

Attachment #268039 - Attachment is obsolete: true

Attachment #268039 - Flags: superreview?(bzbarsky)

Attachment #268039 - Flags: review?(bzbarsky)

Comment 10

•

17 years ago

Comment on attachment 268039 [details] [diff] [review]
Patch to fix

>Index: content/base/src/nsDocument.cpp
>-    // mRootContent.  If we did this (or at least the latter), we could remove
>-    // the silly null-checks in nsHTMLDocument::MatchLinks.

File a followup bug to remove the null-check?

>+  nsNodeUtils::KillControllers(this);

I think we should work on making all controllers participate in cycle collection (the JS ones do automatically, right?) and removing this code, possibly.

>Index: content/base/src/nsNodeUtils.cpp
>+nsNodeUtils::KillControllers(nsINode* aRoot)

>+  if (aRoot->IsNodeOfType(nsINode::eXUL)) {
>+    nsXULElement* xul = NS_STATIC_CAST(nsXULElement*, aRoot);

Ugh.  Maybe nsXULElement::FromContent should take an nsINode* instead.  Followup bug?

Thinking about why this change starts to leak.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 11

•

17 years ago

We might want to add nsXTFElementWrapper and stuff it points to(?) to cycle collection.  I doubt that's your issue, though.

Are you leaking on documents with subframes?  Or just on starting and quitting the browser?

Could we be leaking through XBL?  Does calling 
ChangeDocumentFor to null for each node on the binding manager help the leak?

Past that, I don't really know what could be going on.  Stylesheets don't hold strong refs into the DOM, that I can see...

Assignee

Comment 12

•

17 years ago

Attached patch Patch v2 (obsolete) — Details — Splinter Review

This also kills all XBL bindings from a document when it's being destroyed. It's unfortunate, but it's no worse then we are currently. I'll file a bug on getting it removed.

Attachment #268305 - Flags: superreview?(jst)

Attachment #268305 - Flags: review?(jst)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 13

•

17 years ago

Attached patch Patch v2.1 (obsolete) — Details — Splinter Review

Forgot to do a final diff before attaching

Attachment #268305 - Attachment is obsolete: true

Attachment #268307 - Flags: superreview?(jst)

Attachment #268307 - Flags: review?(jst)

Attachment #268305 - Flags: superreview?(jst)

Attachment #268305 - Flags: review?(jst)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 14

•

17 years ago

I added nsXTFElementWrapper, but the stuff that it points to is implemented by extensions. But at least this way XTF extensions could take part of cycle collection.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 15

•

17 years ago

Right.  And if it's implemented in JS (which I think is the idea), it participates automatically, right?

Assignee

Comment 16

•

17 years ago

Yeah, though XForms uses C++

Johnny Stenback (:jst)

Comment 17

•

17 years ago

Comment on attachment 268307 [details] [diff] [review]
Patch v2.1

r+sr=jst

Attachment #268307 - Flags: superreview?(jst)

Attachment #268307 - Flags: superreview+

Attachment #268307 - Flags: review?(jst)

Attachment #268307 - Flags: review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 18

•

17 years ago

Checked in. Thanks for the review!

Martijn Wargers (dead)

Comment 19

•

17 years ago

So this bug is fixed then?

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

17 years ago

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → FIXED

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 20

•

17 years ago

RLk on the Firefox Linux tinderbox jumped from 3KB to 982KB around the time this was checked in :(

Assignee

Comment 21

•

17 years ago

Backed out :(

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 22

•

17 years ago

Have to push this unfortunately :(

Target Milestone: mozilla1.9alpha6 → mozilla1.9beta1

Jonas Sicking (:sicking) No longer reading bugmail consistently

Updated

•

17 years ago

Target Milestone: mozilla1.9 M8 → ---

Assignee

Comment 23

•

17 years ago

Attached patch Sync to tip (obsolete) — Details — Splinter Review

Same as above but synced to tip. Testing now I don't seem to leak any more.

Attachment #268307 - Attachment is obsolete: true

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 24

•

17 years ago

Comment on attachment 282791 [details] [diff] [review]
Sync to tip

got a=jst on landing to see if it still leaks.

Attachment #282791 - Flags: approval1.9+

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 25

•

17 years ago

It looks like this turned the talos tinderboxes red (crashing).

(It's really confusing to understand what the talos tinderboxes are doing because of bug 393338, though.)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 26

•

17 years ago

In particular, it seems (from the log) that it's the talos Tp test, although it's a little hard to tell from the log.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 27

•

17 years ago

I backed this out to see if it fixes the talos tinderbox red.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 28

•

17 years ago

[28 21:48:37] <dbaron> bhearsum, did you see what it was doing when it crashed?
[28 21:48:42] <dbaron> bhearsum, (which test it was running?)
[28 21:48:45] <bhearsum> it was running through the pageset
[28 21:48:45] <bhearsum> nothing special
[28 21:48:48] <bhearsum> tp

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 29

•

17 years ago

I think this patch made http://www.somethingawful.com/d/photoshop-phriday/ leak.

Assignee

Comment 30

•

17 years ago

I have the leak pinned down. Gotta get Tp2 installed to try to reproduce the crash.

Status: REOPENED → ASSIGNED

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 31

•

17 years ago

Attached patch patch v3 (obsolete) — Details — Splinter Review

Turns out the crash and the leak was caused by the same problem. The leak problem was that frame-elements holds a strong reference to the nsFrameLoader, which holds a strong reference to the child-frames nsDocShell which holds a strong reference to its document. So if there are any references from the inner document to the outer we have a cycle, and since we still can't cyclecollect nsDocShells we leak.

I fixed this by destroying the nsFrameLoader while recursively "destroying" the content tree. We should additionally hook up nsDocShell to the cycle collector, but that's more work so I'll do that separately.

Not destroying the frameloader also caused crashes as we are currently relying on nsFrameLoader::Destroy to null out mParentDocument on the inner document. This is not very reliable as the outer document can get destroyed before the frameloader. I bet it's possible to come up with testcases that crashes because of this.

Instead I made the nsDocument code that deals with the sub-document hash guarantee that if a subdocument is removed from the hash (as happens when the outer document is destroyed) it nulls out the mParentDocument pointer of the subdocument.

Attachment #282791 - Attachment is obsolete: true

Attachment #283108 - Flags: superreview?(jst)

Attachment #283108 - Flags: review?(jst)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

17 years ago

Attachment #283108 - Attachment is patch: true

Attachment #283108 - Attachment mime type: application/octet-stream → text/plain

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 32

•

17 years ago

Comment on attachment 283108 [details] [diff] [review]
patch v3

Oops, i still have some #if 0 code in there. Removed locally.

Johnny Stenback (:jst)

Updated

•

17 years ago

Attachment #283108 - Flags: superreview?(jst)

Attachment #283108 - Flags: superreview+

Attachment #283108 - Flags: review?(jst)

Attachment #283108 - Flags: review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 33

•

17 years ago

Had to back this out since it caused perf regressions on the talos Tp runs. My theory is this:

So for a lot of documents we now rely on the cycle collector to delete the document. This could cause Tp regressions for two reasons:

1. We used to tear down the document as soon as it was navigated away from (modulo things like bfcache, but we'd still throw out one document right at the end of each navigation). This had a higher chance of being between two page-loads of the Tp test.
2. Cycle collection is probably slower than normal tear-down since we call unlink on each node in the collected cycle. For cycles that involve the document-node these cycles can presumably be pretty big.

One way to fix 1 would be to make Tp force a cycle collection between page-loads. During normal browsing this should normally happen, but since Tp tests continuously loads page after page chances of cycle collecting between two pages gets pretty small.

Not sure if 2 is an actual problem as long as we are able to schedule cycle collections in situations where the user isn't bothered. Fixing bug 373462 should take care of that hopefully.

alice nodelman [:alice] [:anode]

Comment 34

•

17 years ago

There's a third possible reason:

3.  Talos is not measuring the right thing.

Exactly what quantity does Talos measure?  Note that the old Tp test was not affected, and it usually shows 3-5% across-the-board regressions just fine.

Comment 35

•

17 years ago

The numbers that you see reported by Talos on the Firefox tinderbox page are generated by Tp2 - so effectively you are seeing the same page load time collection as on the other tinderbox test machines.

The difference that we see in results is usually presumed to be because of the complexity of the talos page set - compared to the 40 circa year 2000 web pages being used on test tinderboxes talos uses ~400 pages that are now about 6 months old.

We are also starting to build up a history of talos showing regressions where the test tinderboxes do not.  We are not yet totally, 100% convinced that this is because the talos numbers are correct, but we have seen regression causing patches backed out causing an improvement in talos results.  At the very least, I believe that talos regressions are worthy of investigation and as they are starting to develop a history they will become more and more trustworthy.

alice nodelman [:alice] [:anode]

Comment 36

•

17 years ago

OK.  Note that we've had issues with Tp2 in the past that I'm not sure we ever resolved, both in terms of measurement methodology (it had some bugs in the number-crunching, pure and simple) and in terms of what it's measuring (loads from file:// and loads from http:// often have somewhat different performance characteristics).  Tp2 is useful, but not a replacement for Tp.

Jonas, if this is Tp2 there should be no bfcache involved: the test runs in an iframe.  It also means that the time is measured from right before window.location is set to when onload fires on the subframe.  So I would have really expected cycle collection to happen during the timing, both before and now.

Comment 37

•

17 years ago

Just for clarity the tp2 test that talos is running loads from http:// with pages served by a local apache install.

I can also say things about Vlad's new pageloader, but then I fear that I'm getting farther and farther off topic.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 38

•

17 years ago

Jonas, I wonder whether it helps to change the UnbindFromTree() calls in nsGenericElement::Unlink to be shallow.  Right now if we happen to unlink in more or less root-to-leaf order we end up unbinding the leaves once for each ancestor.

And because we know that all descendants will be unlinked if we're being unlinked, we don't have to worry about leaving dangling document pointers or whatnot... I think.

Assignee

Comment 39

•

17 years ago

Unfortunately doing a shallow unbind would not be entierly correct. Imagine a situation like (ext is an external reference):

   Y         D -
    \       /    \
 +-->X  B  C     |
 |    \ | /      |
ext    \|/      /
        A <----

In this case we would cycle collect the A-B-C-D cycle, but only shallowly unbinding X, leaving Y to still believe that it still is connected to A.

This might actually be good enough as in most cases Y will only care about if it's connected to A or not when it's part of a document. I think forms and SVG are cases that care, but only when connected to the document.

However ideally cycle collecting shouldn't be timed as part of the Tp tests as we should try to move it to a point when it won't interfere with the user (see also bug 373462).

The weird part is that when I do some logging it looks like at least locally on my machine cycle collection happens between pageloads. I logged timing-start, timing-end, cyclecollection and document-dtor. In basically all cases cyclecollection and document-dtor happened between timing-end and timing-start.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 40

•

17 years ago

I'm not sure what that diagram represents.  Are those DOM nodes?  Which way are the parent/child relationships?

In general, since parents own their kids we can't collect a kid while the parent is alive.  So if we're unlinking a node, that means all its ancestors are being collected as well.  We already have that situation on branch right now:  if you build up a DOM in JS, never stick it in a document, and then forget about it, we'll GC that and as the nodes go away we'll do shallow unbinds on the kids.

I guess we could have an issue where a kid is holding a weak ref to some ancestor and doesn't clear the ref because while the ancestor is being cycle-collected the kid isn't?  Or something like that?

Ideally we'd do a deep unbind on the root of any subtree being collected, and shallow unbinds on the rest of it, right?

As for timing this as part of Tp and interference with the user...  In practice it interferes with the user, imo.  We might as well be measuring the extent (and trying to mitigate it).

Assignee

Comment 41

•

17 years ago

In the graph all nodes except 'ext' are DOM elements. A is the root of the tree.

I didn't realize that we do shallow unbinds from the dtor of nsGenericElement, but we do indeed. This is actually a bit risky as not necessarily all descendants in the tree will be destroyed. So for example an <input> element that is a descendant of a <form> will not notice that the form is going away and will keep a weak pointer to it :(

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 42

•

17 years ago

I think we should make that pointer in particular a strong pointer that gets nulled out on Unlink.

Assignee

Updated

•

17 years ago

Attachment #282791 - Flags: approval1.9+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 43

•

17 years ago

Now that bug 373462 is fixed, is this patch ready to land (after beta)?

Depends on: 373462

Assignee

Updated

•

17 years ago

Priority: -- → P1

Mike Schroepfer

Comment 44

•

17 years ago

should we land this?

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

17 years ago

Depends on: 403835

Mike Schroepfer

Comment 45

•

17 years ago

Jonas - we makin progress here?

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

17 years ago

Whiteboard: [sg:critical] → [sg:critical] Blocked by bug 403835

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 46

•

17 years ago

Attached patch Patch v3.5 (obsolete) — Details — Splinter Review

This syncs to tip and also adds a couple of performance improvements:

1. Call DestroyLinkMap first in the nsDocument unlink function
2. Don't do deep unbind in the nsGenericElement unlink function

Both these are just mimicking what we're doing in destructors of these classes.

Got r/sr=jst over irl

Attachment #283108 - Attachment is obsolete: true

Attachment #290639 - Flags: superreview+

Attachment #290639 - Flags: review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 47

•

17 years ago

Attached patch Patch v4 — Details — Splinter Review

So this moves back to using recursive UnbindFromTree. We definitely still have to do that when IsInDoc is true, but we can possibly do non-recusive when it's not. But I'll do that as a separate patch instead, if we feel that we want to.

I also added a SaveState call to nsGenericHTMLFormElement::DestroyContent so that we're still saving state where we used to.

Attachment #290639 - Attachment is obsolete: true

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 48

•

17 years ago

Looks like it finally stuck!

/me does victory dance

Status: ASSIGNED → RESOLVED

Closed: 17 years ago → 17 years ago

Resolution: --- → FIXED

Mike Schroepfer

Comment 49

•

17 years ago

Congrats!

Jeff Walden [:Waldo]

Comment 50

•

17 years ago

Tests for this should be associated with the dependencies, so nothing to do here.

Flags: in-testsuite-

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

17 years ago

Depends on: 406840

Jonas Sicking (:sicking) No longer reading bugmail consistently

Updated

•

17 years ago

Depends on: 406684

Assignee

Updated

•

17 years ago

Depends on: 406852

Jonas Sicking (:sicking) No longer reading bugmail consistently

Updated

•

17 years ago

Whiteboard: [sg:critical] Blocked by bug 403835 → [sg:critical]

Assignee

Updated

•

17 years ago

Depends on: 408268

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

17 years ago

Blocks: 370811

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 51

•

17 years ago

So does this mean that the InZombieDocument() checks in presshell need updating?

Assignee

Comment 52

•

17 years ago

Not really sure what the purpose of that code is, so not sure. Could be that things work fine since the document is 'intact', or we might want to test mIsGoingAway

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 53

•

17 years ago

roc, mats, do you remember what the point of the InZombieDocument() stuff is?

I'd do the CVS archeology, but I'm not sure I'll have time for it until a few days from now.  :(

Comment 54

•

17 years ago

I think at least part of it was fixing the bug where app shortcut keys didn't work at all during the transition between documents (assuming the content area was focused) -- originally bug 110718, with some later regression fixes on top.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 55

•

17 years ago

Jonas, can you check on that, please?  That's been a major annoyance in that past; I'm hoping we didn't regress it...

Assignee

Comment 56

•

17 years ago

It seems to work fine for me.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

17 years ago

Depends on: 412920

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

16 years ago

Depends on: 419132

Martijn Wargers (dead)

Updated

•

16 years ago

Depends on: 415192

Daniel Veditz [:dveditz]

Updated

•

14 years ago

Group: core-security