Last Comment Bug 795221 - Zombie compartment after visiting www.moosejaw.com due to expandos on cssRules[n].style
: Zombie compartment after visiting www.moosejaw.com due to expandos on cssRule...
Status: RESOLVED FIXED
[MemShrink:P1]
: mlk, regression
Product: Core
Classification: Components
Component: DOM (show other bugs)
: 17 Branch
: x86_64 Linux
: -- normal (vote)
: mozilla18
Assigned To: Boris Zbarsky [:bz]
: Virgil Dicu [:virgil] [QA]
Mentors:
http://www.moosejaw.com
: 798143 (view as bug list)
Depends on: 810103
Blocks: ZombieCompartments 753517 774980
  Show dependency treegraph
 
Reported: 2012-09-27 23:54 PDT by Nicholas Nethercote [:njn]
Modified: 2014-01-10 10:41 PST (History)
21 users (show)
bzbarsky: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
verified
+
verified


Attachments
Minimal testcase (100 bytes, text/html)
2012-09-28 09:33 PDT, Boris Zbarsky [:bz]
no flags Details
part 1. Implement cycle collection for nsCSSStyleSheet objects, so we don't leak through them. nsCSSStyleSheet has a pointer to a nsCSSStyleSheetInner. The (14.18 KB, patch)
2012-09-28 18:13 PDT, Boris Zbarsky [:bz]
bugs: review-
Details | Diff | Review
part 2. Hook up <style> elements to cycle collect their stylesheet. (5.00 KB, patch)
2012-09-28 18:16 PDT, Boris Zbarsky [:bz]
bugs: review+
Details | Diff | Review
part 3. Implement cycle collection for GroupRule objects. (6.57 KB, patch)
2012-09-28 18:16 PDT, Boris Zbarsky [:bz]
bugs: review-
Details | Diff | Review
part 4. Hook up <link> elements to cycle collect their stylesheet. (8.16 KB, patch)
2012-09-28 18:17 PDT, Boris Zbarsky [:bz]
bugs: review+
Details | Diff | Review
part 5. Hook up <svg:style> elements to cycle collect their stylesheet. (3.75 KB, patch)
2012-09-28 18:17 PDT, Boris Zbarsky [:bz]
bugs: review+
Details | Diff | Review
part 6. Hook up xml-stylesheet PIs to cycle collect their stylesheet. (3.58 KB, patch)
2012-09-28 18:18 PDT, Boris Zbarsky [:bz]
bugs: review+
Details | Diff | Review
Interdiff for part 1 to change how we traverse mNext (2.23 KB, patch)
2012-10-01 08:00 PDT, Boris Zbarsky [:bz]
bugs: review+
Details | Diff | Review
Part 1 updated to smaug's review comments (14.30 KB, patch)
2012-10-01 08:01 PDT, Boris Zbarsky [:bz]
dbaron: review+
Details | Diff | Review
Interdiff for part 3 to move all the refcounting to the subclasses. (8.23 KB, patch)
2012-10-01 08:02 PDT, Boris Zbarsky [:bz]
bugs: review+
Details | Diff | Review
Part 3 updated to smaug's comments (13.68 KB, patch)
2012-10-01 08:03 PDT, Boris Zbarsky [:bz]
dbaron: review+
Details | Diff | Review
Roll-up patch against aurora (not what I'd land; I would be landing the 6 separate changesets) (44.48 KB, patch)
2012-10-07 20:01 PDT, Boris Zbarsky [:bz]
akeybl: approval‑mozilla‑beta+
Details | Diff | Review

Description Nicholas Nethercote [:njn] 2012-09-27 23:54:17 PDT
A reader on my blog found this problem (https://blog.mozilla.org/nnethercote/2012/09/19/memshrink-progress-week-65-66/#comment-7640).

Steps to reproduce:

- Start Firefox with a new profile.
- Navigate to moosejaw.com.
- Navigate to about:compartments and hit reload a few times to force GC/CC.
- about:compartments says the following:

    User Compartments
    about:blank
    http://www.moosejaw.com/moosejaw/shop/home____
    null-principal
    resource://gre-resources/hiddenWindow.html

When I look at about:memory, the remaining compartment is moderately large (this is on Linux64):

│  │   ├──7.21 MB (08.09%) -- window(http://www.moosejaw.com/moosejaw/shop/home____)
│  │   │  ├──7.21 MB (08.09%) -- js/compartment(http://www.moosejaw.com/moosejaw/shop/home____)
│  │   │  │  ├──3.19 MB (03.58%) ++ gc-heap
│  │   │  │  ├──2.18 MB (02.44%) ── script-data
│  │   │  │  ├──1.03 MB (01.15%) ++ type-inference
│  │   │  │  └──0.82 MB (00.92%) ++ (5 tiny)
│  │   │  └──0.00 MB (00.00%) ── dom/other
Comment 1 Nicholas Nethercote [:njn] 2012-09-27 23:56:32 PDT
I should add: the reporter says he cannot reproduce it on Aurora/Beta/Release, so this may be a new leak.
Comment 2 Loic 2012-09-28 02:30:30 PDT
I'm able to reproduce it. As soon as the tab with moosejaw.com is closed, ghost compartments appear and stay in memory even if GC/CC are forced a few times.
The regression is in Firefox 17 too.

Mozregression range:

m-c
good=2012-08-25
bad=2012-08-26
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f077de66e52d&tochange=b3cce81fef1a

Bisecting surely needed.
Comment 3 Nicholas Nethercote [:njn] 2012-09-28 02:34:27 PDT
I saw two ghost windows the first time I tried this, but then I couldn't reproduce them again.  No matter;  the zombie compartment is good enough for diagnosing this.
Comment 4 Alice0775 White 2012-09-28 07:14:45 PDT
Regression window(cached m-i)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/67ff83142ba5
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Firefox/17.0 ID:20120824174558
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/89e186d2a171
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Firefox/17.0 ID:20120824181259
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=67ff83142ba5&tochange=89e186d2a171

In local build
Last good: 75f3cd90e743
First bad: c526d9dfb684

Triggered by c526d9dfb684	Boris Zbarsky — Bug 753517 part 4. Set up auto-generation of CSS2Properties.webidl from nsCSSPropList.h and enable Paris bindings for CSSStyleDeclaration and CSS2Properties. r=khuey,peterv,dbaron
Comment 5 Boris Zbarsky [:bz] 2012-09-28 09:33:18 PDT
Specifically, this is due to DOMCSSDeclarationImpl being converted to the new bindings.

Minimal testcase coming up.
Comment 6 Boris Zbarsky [:bz] 2012-09-28 09:33:43 PDT
Created attachment 665946 [details]
Minimal testcase
Comment 7 Boris Zbarsky [:bz] 2012-09-28 11:15:53 PDT
OK, so what happened here is we started preserving wrappers for CSS declarations attached to rules in sheets.

So now I think we have a cycle as follows: DOMCSSDeclarationImpl -> preserved wrapper -> window (global) -> document (own prop) -> nsDocument -> nsCSSStyleSheet (multiple paths) -> StyleRule -> DOMCSSStyleRule -> DOMCSSDeclarationImpl.
Comment 8 Boris Zbarsky [:bz] 2012-09-28 18:13:37 PDT
Created attachment 666134 [details] [diff] [review]
part 1.  Implement cycle collection for nsCSSStyleSheet objects, so we don't leak through them.   nsCSSStyleSheet has a pointer to a nsCSSStyleSheetInner.  The

nsCSSStyleSheetInner is shared across multiple stylesheets, in
general.  The nsCSSStyleSheetInner owns the rules and the child
stylesheets.

What this means is that a given rule object is effectively owned by
multiple sheets.  However, cycles can only form through rule objects
that have been JS-wrapped, and if we're JS-wrapping a rule object that
means we have ensured that it's owned by only one stylesheet.
Therefore, we only traverse and unlink mInner if it's uniquely owned
by our sheet.

Similarly, if our child sheets or any of their rules have been
JS-wrapped, that means that we must have an mInner that we own
outright.
Comment 9 Boris Zbarsky [:bz] 2012-09-28 18:16:32 PDT
Created attachment 666135 [details] [diff] [review]
part 2.  Hook up <style> elements to cycle collect their stylesheet.
Comment 10 Boris Zbarsky [:bz] 2012-09-28 18:16:47 PDT
Created attachment 666136 [details] [diff] [review]
part 3.  Implement cycle collection for GroupRule objects.
Comment 11 Boris Zbarsky [:bz] 2012-09-28 18:17:36 PDT
Created attachment 666138 [details] [diff] [review]
part 4.  Hook up <link> elements to cycle collect their stylesheet.
Comment 12 Boris Zbarsky [:bz] 2012-09-28 18:17:54 PDT
Created attachment 666139 [details] [diff] [review]
part 5.  Hook up <svg:style> elements to cycle collect their stylesheet.
Comment 13 Boris Zbarsky [:bz] 2012-09-28 18:18:14 PDT
Created attachment 666140 [details] [diff] [review]
part 6.  Hook up xml-stylesheet PIs to cycle collect their stylesheet.
Comment 14 Boris Zbarsky [:bz] 2012-09-28 18:19:25 PDT
Nicholas, I love your blog readers!
Comment 15 Nicholas Nethercote [:njn] 2012-09-28 20:59:14 PDT
> Nicholas, I love your blog readers!

I too love them when they produce bug reports this straightforward.  I like them less when they moan that "Firefox's memory consumption is really high after a few hours" and they won't do even small amounts of follow-up detective work :/
Comment 16 Olli Pettay [:smaug] 2012-10-01 06:07:15 PDT
Comment on attachment 666134 [details] [diff] [review]
part 1.  Implement cycle collection for nsCSSStyleSheet objects, so we don't leak through them.   nsCSSStyleSheet has a pointer to a nsCSSStyleSheetInner.  The

>+NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN(nsCSSStyleSheet)
>+  tmp->DropMedia();
>+  // We do not unlink mNext; our parent will handle that.  If we
>+  // unlinked it here, our parent would not be able to walk its list
>+  // of child sheets and null out the back-references to it, if we got
>+  // unlinked before it does.
>+  tmp->DropRuleCollection();
>+  tmp->UnlinkInner();
>+NS_IMPL_CYCLE_COLLECTION_UNLINK_END
>+NS_IMPL_CYCLE_COLLECTION_TRAVERSE_BEGIN(nsCSSStyleSheet)
>+  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR(mMedia)
>+  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR_AMBIGUOUS(mNext, nsIStyleSheet)
>+  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_RAWPTR(mRuleCollection)
>+  tmp->TraverseInner(cb);
>+NS_IMPL_CYCLE_COLLECTION_TRAVERSE_END

So this traverses mNext but doesn't unlink it.
Would it be possible for the parent to handle also traversing.
I hope that would make the setup easier to understand.
Comment 17 Olli Pettay [:smaug] 2012-10-01 06:26:49 PDT
Comment on attachment 666138 [details] [diff] [review]
part 4.  Hook up <link> elements to cycle collect their stylesheet.


>+static PLDHashOperator
>+TraverseSheet(URIPrincipalAndCORSModeHashKey*,
>+              nsCSSStyleSheet* sheet,
>+              void* closure)
aSheet, aClosure
Comment 18 Olli Pettay [:smaug] 2012-10-01 06:53:48 PDT
Comment on attachment 666136 [details] [diff] [review]
part 3.  Implement cycle collection for GroupRule objects.


>+  NS_DECL_CYCLE_COLLECTION_CLASS(GroupRule)
>+  // XXXbz this is duplicating a refcount that lives on Rule.h, but I
>+  // don't see how we can avoid that.
>+  NS_DECL_CYCLE_COLLECTING_ISUPPORTS
>+

This odd.
Per IRC conversation refcounting should happen in the classes inheriting Rule.
Comment 19 Boris Zbarsky [:bz] 2012-10-01 08:00:30 PDT
Created attachment 666556 [details] [diff] [review]
Interdiff for part 1 to change how we traverse mNext
Comment 20 Boris Zbarsky [:bz] 2012-10-01 08:01:09 PDT
Created attachment 666557 [details] [diff] [review]
Part 1 updated to smaug's review comments
Comment 21 Boris Zbarsky [:bz] 2012-10-01 08:02:40 PDT
Created attachment 666558 [details] [diff] [review]
Interdiff for part 3 to move all the refcounting to the subclasses.
Comment 22 Boris Zbarsky [:bz] 2012-10-01 08:03:16 PDT
Created attachment 666559 [details] [diff] [review]
Part 3 updated to smaug's comments
Comment 23 Manoj 2012-10-01 20:22:12 PDT
(In reply to Nicholas Nethercote [:njn] from comment #15)
> > Nicholas, I love your blog readers!
> 
> I too love them when they produce bug reports this straightforward.  I like
> them less when they moan that "Firefox's memory consumption is really high
> after a few hours" and they won't do even small amounts of follow-up
> detective work :/

Nicholas, I think Boris was being sarcastic ;)
Comment 24 Boris Zbarsky [:bz] 2012-10-02 07:15:22 PDT
No, I most certainly was not.  The blog reader who reported this caught a serious memory leak bug, and did so before we shipped it to users.  That sort of thing is _very_ helpful.
Comment 25 Manoj 2012-10-02 20:38:29 PDT
I stand corrected.
Comment 26 Boris Zbarsky [:bz] 2012-10-04 17:03:00 PDT
*** Bug 798143 has been marked as a duplicate of this bug. ***
Comment 27 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2012-10-07 14:38:51 PDT
Comment on attachment 666557 [details] [diff] [review]
Part 1 updated to smaug's review comments

>+void
>+nsCSSStyleSheet::UnlinkInner()
>+{

>+  // mDocument, but don't want to work with deleted objects.  And we
>+  // don't want to do any addrefing in the process.

Is the need to avoid AddRef calls just for general reference counting performance, or something relating to CC?  Please explain in the comment.

>+void
>+nsCSSStyleSheet::TraverseInner(nsCycleCollectionTraversalCallback &cb)
>+{

>+  for (int32_t i = 0; i < mInner->mOrderedRules.Count(); ++i) {
>+    NS_CYCLE_COLLECTION_NOTE_EDGE_NAME(cb, "mOrderedRules[i]");
>+    cb.NoteXPCOMChild(mInner->mOrderedRules[i]->GetExistingDOMRule());
>+  }

Please put mInner->mOrderedRules in a local variable to save refetching it every time.  And maybe its Count() as well.


r=dbaron
Comment 28 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2012-10-07 14:52:10 PDT
Comment on attachment 666559 [details] [diff] [review]
Part 3 updated to smaug's comments

>+NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN(GroupRule)
>+  tmp->mRules.EnumerateForwards(SetParentRuleReference, nullptr);
>+  tmp->mRules.EnumerateForwards(SetStyleSheetReference, nullptr);

I think the SetStyleSheetReference enumeration is unneeded, since if the
group rule is being unlinked then presumably either (a) its containing
style sheet is being unlinked, which will call
GroupRule::SetStyleSheet(nsnull), which will do the same enumeration or
(b) it's no longer in a style sheet, in which case the style sheet
should already be null.

Also, I think if you don't remove it, unlinking deeply nested group
rules will be O(N^2) in the nesting depth.

So I think you should remove that one line and add a comment explaining
why it's not there.  But you should also check my logic carefully first.


>+NS_IMPL_CYCLE_COLLECTION_TRAVERSE_BEGIN(GroupRule)
>+  for (int32_t i = 0; i < tmp->mRules.Count(); ++i) {
>+    NS_CYCLE_COLLECTION_NOTE_EDGE_NAME(cb, "mRules[i]");
>+    cb.NoteXPCOMChild(tmp->mRules[i]->GetExistingDOMRule());
>+  }

Put tmp->mRules in a local, and maybe also its Count().

r=dbaron
Comment 29 Boris Zbarsky [:bz] 2012-10-07 19:07:54 PDT
> Is the need to avoid AddRef calls just for general reference counting performance, or
> something relating to CC? 

Mostly because I wasn't sure whether it was safe to refcount inside traverse.  I'll add a comment.

> Please put mInner->mOrderedRules in a local variable to save refetching it every time.
> And maybe its Count() as well.

Done.

> So I think you should remove that one line and add a comment explaining
> why it's not there.  But you should also check my logic carefully first.

I think your logic is off because in case (a) we have no guarantee on the order of the unlink calls.  So we might be getting unlinked before the sheet is, and since we're about to clear tmp->mRules this is our last chance to call SetStyleSheetReference on them.  I should have documented that, and will do so now.

That said, you're right about the O(N^2) bit.  I'll add a check that our own stylesheet is non-null around that call, and a check that aSheet != GetStyleSheet() before setting on the kids in SetStyleSheet.  That should address that problem.

> Put tmp->mRules in a local, and maybe also its Count().

Done.
Comment 30 Andrew McCreight [:mccr8] 2012-10-07 19:12:59 PDT
(In reply to Boris Zbarsky (:bz) from comment #29)
> Mostly because I wasn't sure whether it was safe to refcount inside
> traverse.  I'll add a comment.

You shouldn't change the refcount of another object during a CC. The same object is probably technically okay somehow, but there are assertions that should complain.
Comment 31 Andrew McCreight [:mccr8] 2012-10-07 19:14:06 PDT
(If you already traversed an object x, then if you later change the refcount of x, then the CC will make incorrect conclusions, because it saves away ref counts while building the graph)
Comment 32 Boris Zbarsky [:bz] 2012-10-07 19:15:30 PDT
> You shouldn't change the refcount of another object during a CC. 

Yeah, this is definitely for "another object".  Good to know my paranoia was not misplaced!
Comment 33 Boris Zbarsky [:bz] 2012-10-07 19:16:16 PDT
Er, wait.  Those refcounts would be during unlink, not traverse.  So maybe they would be safe after all...
Comment 34 Boris Zbarsky [:bz] 2012-10-07 19:22:40 PDT
And also, they would not be adding a refcount.  Just doing an addref followed by two releases, on an object I'm trying to unlink.
Comment 35 Andrew McCreight [:mccr8] 2012-10-07 19:29:14 PDT
That doesn't seem unsound, though it won't cause any additional objects to be Unlinked: the CC decides what is going to be unlinked, adds those to a list, and just goes through the list, unlinking everything.

There may be assertions that complain about changing ref counts on the CC thread, I don't recall exactly how those go.
Comment 36 Boris Zbarsky [:bz] 2012-10-07 19:32:21 PDT
In any case, it's easy enough to not do the addrefing here, so I'm sticking with it.  ;)
Comment 38 Boris Zbarsky [:bz] 2012-10-07 20:01:59 PDT
Created attachment 668992 [details] [diff] [review]
Roll-up patch against aurora (not what I'd land; I would be landing the 6 separate changesets)

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 753517 
User impact if declined: Leaks on at least one site; quite possibly on others as
   well.  These are leaks for the lifetime of the process, so if you load the
   site a few times things get pretty bad.
Testing completed (on m-c, etc.): Passes regression tests and the new leak
   tests in this changeset.
Risk to taking this patch (and alternatives if risky): Risk should be
   reasonably low.  The patch just adds several more classes to the cycle 
   collector; the main risk would be if the additions are done incorrectly. 
   There is a slight risk of increasing cycle collector pauses.  If that
   happens, we can mitigate via adding skippability for CC purposes to CSS
   rules, but I avoided that for now to keep complexity down.  The obvious
   alternatives are shipping the leak or disabling the WebIDL bindings for CSS
   declarations.  I'm not sure whether anything on Aurora depends on those yet,
   but could do some try runs to find out if needed.
String or UUID changes made by this patch: None.

NOTE: this is an approval request for Aurora 17.  So if this is handled after the merge tomorrow, please consider it a request for Beta 18.
Comment 40 Alex Keybl [:akeybl] 2012-10-08 16:29:57 PDT
Comment on attachment 668992 [details] [diff] [review]
Roll-up patch against aurora (not what I'd land; I would be landing the 6 separate changesets)

BZ and I discussed - we'll land for b2 (and will approve after b1 goes to build).
Comment 41 Nicholas Nethercote [:njn] 2012-10-10 17:42:19 PDT
I assume the approval-mozilla-beta implies approval-mozilla-aurora as well, right?
Comment 42 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-10-10 17:43:19 PDT
It landed in time for the merge, so it's already on Aurora.
Comment 44 Danial Horton 2012-11-03 07:17:35 PDT
fwiw, this is also reproducible in 16 (but not 15) and can be tripped up on github as well as a couple other sites
Comment 45 Boris Zbarsky [:bz] 2012-11-03 11:48:09 PDT
Whatever you're seeing in 16 is not this bug: this bug was due to a code change that happened in 17.

So whatever you're seeing, I recommend filing a bug with steps to reproduce, assuming it's also present in current beta builds...
Comment 46 Virgil Dicu [:virgil] [QA] 2012-11-15 07:11:43 PST
Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/17.0 Firefox/17.0

Marking as verified with 17 beta 6. Could previously reproduce the issue. No longer reproducible, also checked a few popular sites (youtube, facebook, wikipedia), no compartments spotted.
Comment 47 Virgil Dicu [:virgil] [QA] 2012-12-12 05:53:03 PST
Mozilla/5.0 (X11; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0

Verified in Firefox 18 beta 3 as well.
Comment 48 Tracy Walker [:tracy] 2014-01-10 10:41:41 PST
mass remove verifyme requests greater than 4 months old

Note You need to log in before you can comment on or make changes to this bug.