Closed Bug 795221 Opened 12 years ago Closed 12 years ago

Zombie compartment after visiting www.moosejaw.com due to expandos on cssRules[n].style

Categories

(Core :: DOM: Core & HTML, defect)

17 Branch
x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla18
Tracking Status
firefox17 + verified
firefox18 + verified

People

(Reporter: n.nethercote, Assigned: bzbarsky)

References

()

Details

(Keywords: memory-leak, regression, Whiteboard: [MemShrink:P1])

Attachments

(10 files, 2 obsolete files)

100 bytes, text/html
Details
5.00 KB, patch
smaug
: review+
Details | Diff | Splinter Review
8.16 KB, patch
smaug
: review+
Details | Diff | Splinter Review
3.75 KB, patch
smaug
: review+
Details | Diff | Splinter Review
3.58 KB, patch
smaug
: review+
Details | Diff | Splinter Review
2.23 KB, patch
smaug
: review+
Details | Diff | Splinter Review
14.30 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
8.23 KB, patch
smaug
: review+
Details | Diff | Splinter Review
13.68 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
44.48 KB, patch
Details | Diff | Splinter Review
A reader on my blog found this problem (https://blog.mozilla.org/nnethercote/2012/09/19/memshrink-progress-week-65-66/#comment-7640).

Steps to reproduce:

- Start Firefox with a new profile.
- Navigate to moosejaw.com.
- Navigate to about:compartments and hit reload a few times to force GC/CC.
- about:compartments says the following:

    User Compartments
    about:blank
    http://www.moosejaw.com/moosejaw/shop/home____
    null-principal
    resource://gre-resources/hiddenWindow.html

When I look at about:memory, the remaining compartment is moderately large (this is on Linux64):

│  │   ├──7.21 MB (08.09%) -- window(http://www.moosejaw.com/moosejaw/shop/home____)
│  │   │  ├──7.21 MB (08.09%) -- js/compartment(http://www.moosejaw.com/moosejaw/shop/home____)
│  │   │  │  ├──3.19 MB (03.58%) ++ gc-heap
│  │   │  │  ├──2.18 MB (02.44%) ── script-data
│  │   │  │  ├──1.03 MB (01.15%) ++ type-inference
│  │   │  │  └──0.82 MB (00.92%) ++ (5 tiny)
│  │   │  └──0.00 MB (00.00%) ── dom/other
I should add: the reporter says he cannot reproduce it on Aurora/Beta/Release, so this may be a new leak.
Keywords: regression
I'm able to reproduce it. As soon as the tab with moosejaw.com is closed, ghost compartments appear and stay in memory even if GC/CC are forced a few times.
The regression is in Firefox 17 too.

Mozregression range:

m-c
good=2012-08-25
bad=2012-08-26
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f077de66e52d&tochange=b3cce81fef1a

Bisecting surely needed.
Version: unspecified → 17 Branch
I saw two ghost windows the first time I tried this, but then I couldn't reproduce them again.  No matter;  the zombie compartment is good enough for diagnosing this.
Summary: Zombie compartment and two ghost windows after visiting www.moosejaw.com → Zombie compartment after visiting www.moosejaw.com
Regression window(cached m-i)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/67ff83142ba5
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Firefox/17.0 ID:20120824174558
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/89e186d2a171
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Firefox/17.0 ID:20120824181259
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=67ff83142ba5&tochange=89e186d2a171

In local build
Last good: 75f3cd90e743
First bad: c526d9dfb684

Triggered by c526d9dfb684	Boris Zbarsky — Bug 753517 part 4. Set up auto-generation of CSS2Properties.webidl from nsCSSPropList.h and enable Paris bindings for CSSStyleDeclaration and CSS2Properties. r=khuey,peterv,dbaron
Depends on: 753517
Blocks: 753517
No longer depends on: 753517
Component: General → DOM
Specifically, this is due to DOMCSSDeclarationImpl being converted to the new bindings.

Minimal testcase coming up.
Attached file Minimal testcase
OK, so what happened here is we started preserving wrappers for CSS declarations attached to rules in sheets.

So now I think we have a cycle as follows: DOMCSSDeclarationImpl -> preserved wrapper -> window (global) -> document (own prop) -> nsDocument -> nsCSSStyleSheet (multiple paths) -> StyleRule -> DOMCSSStyleRule -> DOMCSSDeclarationImpl.
Assignee: nobody → bzbarsky
nsCSSStyleSheetInner is shared across multiple stylesheets, in
general.  The nsCSSStyleSheetInner owns the rules and the child
stylesheets.

What this means is that a given rule object is effectively owned by
multiple sheets.  However, cycles can only form through rule objects
that have been JS-wrapped, and if we're JS-wrapping a rule object that
means we have ensured that it's owned by only one stylesheet.
Therefore, we only traverse and unlink mInner if it's uniquely owned
by our sheet.

Similarly, if our child sheets or any of their rules have been
JS-wrapped, that means that we must have an mInner that we own
outright.
Attachment #666134 - Flags: review?(dbaron)
Attachment #666134 - Flags: review?(bugs)
Attachment #666136 - Flags: review?(dbaron)
Attachment #666136 - Flags: review?(bugs)
Nicholas, I love your blog readers!
Whiteboard: [MemShrink] → [need review][MemShrink]
> Nicholas, I love your blog readers!

I too love them when they produce bug reports this straightforward.  I like them less when they moan that "Firefox's memory consumption is really high after a few hours" and they won't do even small amounts of follow-up detective work :/
Assignee: bzbarsky → nobody
Component: DOM → General
Component: General → DOM
Assignee: nobody → bzbarsky
Comment on attachment 666134 [details] [diff] [review]
part 1.  Implement cycle collection for nsCSSStyleSheet objects, so we don't leak through them.   nsCSSStyleSheet has a pointer to a nsCSSStyleSheetInner.  The

>+NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN(nsCSSStyleSheet)
>+  tmp->DropMedia();
>+  // We do not unlink mNext; our parent will handle that.  If we
>+  // unlinked it here, our parent would not be able to walk its list
>+  // of child sheets and null out the back-references to it, if we got
>+  // unlinked before it does.
>+  tmp->DropRuleCollection();
>+  tmp->UnlinkInner();
>+NS_IMPL_CYCLE_COLLECTION_UNLINK_END
>+NS_IMPL_CYCLE_COLLECTION_TRAVERSE_BEGIN(nsCSSStyleSheet)
>+  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR(mMedia)
>+  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR_AMBIGUOUS(mNext, nsIStyleSheet)
>+  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_RAWPTR(mRuleCollection)
>+  tmp->TraverseInner(cb);
>+NS_IMPL_CYCLE_COLLECTION_TRAVERSE_END

So this traverses mNext but doesn't unlink it.
Would it be possible for the parent to handle also traversing.
I hope that would make the setup easier to understand.
Attachment #666134 - Flags: review?(bugs) → review-
Attachment #666135 - Flags: review?(bugs) → review+
Comment on attachment 666138 [details] [diff] [review]
part 4.  Hook up <link> elements to cycle collect their stylesheet.


>+static PLDHashOperator
>+TraverseSheet(URIPrincipalAndCORSModeHashKey*,
>+              nsCSSStyleSheet* sheet,
>+              void* closure)
aSheet, aClosure
Attachment #666138 - Flags: review?(bugs) → review+
Attachment #666139 - Flags: review?(bugs) → review+
Attachment #666140 - Flags: review?(bugs) → review+
Comment on attachment 666136 [details] [diff] [review]
part 3.  Implement cycle collection for GroupRule objects.


>+  NS_DECL_CYCLE_COLLECTION_CLASS(GroupRule)
>+  // XXXbz this is duplicating a refcount that lives on Rule.h, but I
>+  // don't see how we can avoid that.
>+  NS_DECL_CYCLE_COLLECTING_ISUPPORTS
>+

This odd.
Per IRC conversation refcounting should happen in the classes inheriting Rule.
Attachment #666136 - Flags: review?(bugs) → review-
Attachment #666134 - Attachment is obsolete: true
Attachment #666134 - Flags: review?(dbaron)
Attachment #666559 - Flags: review?(dbaron)
Attachment #666136 - Attachment is obsolete: true
Attachment #666136 - Flags: review?(dbaron)
Attachment #666556 - Flags: review?(bugs) → review+
Attachment #666558 - Flags: review?(bugs) → review+
(In reply to Nicholas Nethercote [:njn] from comment #15)
> > Nicholas, I love your blog readers!
> 
> I too love them when they produce bug reports this straightforward.  I like
> them less when they moan that "Firefox's memory consumption is really high
> after a few hours" and they won't do even small amounts of follow-up
> detective work :/

Nicholas, I think Boris was being sarcastic ;)
No, I most certainly was not.  The blog reader who reported this caught a serious memory leak bug, and did so before we shipped it to users.  That sort of thing is _very_ helpful.
Whiteboard: [need review][MemShrink] → [need review][MemShrink:P1]
I stand corrected.
Blocks: 774980
Summary: Zombie compartment after visiting www.moosejaw.com → Zombie compartment after visiting www.moosejaw.com due to expandos on cssRules[n].style
Keywords: mlk
Comment on attachment 666557 [details] [diff] [review]
Part 1 updated to smaug's review comments

>+void
>+nsCSSStyleSheet::UnlinkInner()
>+{

>+  // mDocument, but don't want to work with deleted objects.  And we
>+  // don't want to do any addrefing in the process.

Is the need to avoid AddRef calls just for general reference counting performance, or something relating to CC?  Please explain in the comment.

>+void
>+nsCSSStyleSheet::TraverseInner(nsCycleCollectionTraversalCallback &cb)
>+{

>+  for (int32_t i = 0; i < mInner->mOrderedRules.Count(); ++i) {
>+    NS_CYCLE_COLLECTION_NOTE_EDGE_NAME(cb, "mOrderedRules[i]");
>+    cb.NoteXPCOMChild(mInner->mOrderedRules[i]->GetExistingDOMRule());
>+  }

Please put mInner->mOrderedRules in a local variable to save refetching it every time.  And maybe its Count() as well.


r=dbaron
Attachment #666557 - Flags: review?(dbaron) → review+
Comment on attachment 666559 [details] [diff] [review]
Part 3 updated to smaug's comments

>+NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN(GroupRule)
>+  tmp->mRules.EnumerateForwards(SetParentRuleReference, nullptr);
>+  tmp->mRules.EnumerateForwards(SetStyleSheetReference, nullptr);

I think the SetStyleSheetReference enumeration is unneeded, since if the
group rule is being unlinked then presumably either (a) its containing
style sheet is being unlinked, which will call
GroupRule::SetStyleSheet(nsnull), which will do the same enumeration or
(b) it's no longer in a style sheet, in which case the style sheet
should already be null.

Also, I think if you don't remove it, unlinking deeply nested group
rules will be O(N^2) in the nesting depth.

So I think you should remove that one line and add a comment explaining
why it's not there.  But you should also check my logic carefully first.


>+NS_IMPL_CYCLE_COLLECTION_TRAVERSE_BEGIN(GroupRule)
>+  for (int32_t i = 0; i < tmp->mRules.Count(); ++i) {
>+    NS_CYCLE_COLLECTION_NOTE_EDGE_NAME(cb, "mRules[i]");
>+    cb.NoteXPCOMChild(tmp->mRules[i]->GetExistingDOMRule());
>+  }

Put tmp->mRules in a local, and maybe also its Count().

r=dbaron
Attachment #666559 - Flags: review?(dbaron) → review+
> Is the need to avoid AddRef calls just for general reference counting performance, or
> something relating to CC? 

Mostly because I wasn't sure whether it was safe to refcount inside traverse.  I'll add a comment.

> Please put mInner->mOrderedRules in a local variable to save refetching it every time.
> And maybe its Count() as well.

Done.

> So I think you should remove that one line and add a comment explaining
> why it's not there.  But you should also check my logic carefully first.

I think your logic is off because in case (a) we have no guarantee on the order of the unlink calls.  So we might be getting unlinked before the sheet is, and since we're about to clear tmp->mRules this is our last chance to call SetStyleSheetReference on them.  I should have documented that, and will do so now.

That said, you're right about the O(N^2) bit.  I'll add a check that our own stylesheet is non-null around that call, and a check that aSheet != GetStyleSheet() before setting on the kids in SetStyleSheet.  That should address that problem.

> Put tmp->mRules in a local, and maybe also its Count().

Done.
(In reply to Boris Zbarsky (:bz) from comment #29)
> Mostly because I wasn't sure whether it was safe to refcount inside
> traverse.  I'll add a comment.

You shouldn't change the refcount of another object during a CC. The same object is probably technically okay somehow, but there are assertions that should complain.
(If you already traversed an object x, then if you later change the refcount of x, then the CC will make incorrect conclusions, because it saves away ref counts while building the graph)
> You shouldn't change the refcount of another object during a CC. 

Yeah, this is definitely for "another object".  Good to know my paranoia was not misplaced!
Er, wait.  Those refcounts would be during unlink, not traverse.  So maybe they would be safe after all...
And also, they would not be adding a refcount.  Just doing an addref followed by two releases, on an object I'm trying to unlink.
That doesn't seem unsound, though it won't cause any additional objects to be Unlinked: the CC decides what is going to be unlinked, adds those to a list, and just goes through the list, unlinking everything.

There may be assertions that complain about changing ref counts on the CC thread, I don't recall exactly how those go.
In any case, it's easy enough to not do the addrefing here, so I'm sticking with it.  ;)
Flags: in-testsuite+
Whiteboard: [need review][MemShrink:P1] → [MemShrink:P1]
Target Milestone: --- → mozilla18
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 753517 
User impact if declined: Leaks on at least one site; quite possibly on others as
   well.  These are leaks for the lifetime of the process, so if you load the
   site a few times things get pretty bad.
Testing completed (on m-c, etc.): Passes regression tests and the new leak
   tests in this changeset.
Risk to taking this patch (and alternatives if risky): Risk should be
   reasonably low.  The patch just adds several more classes to the cycle 
   collector; the main risk would be if the additions are done incorrectly. 
   There is a slight risk of increasing cycle collector pauses.  If that
   happens, we can mitigate via adding skippability for CC purposes to CSS
   rules, but I avoided that for now to keep complexity down.  The obvious
   alternatives are shipping the leak or disabling the WebIDL bindings for CSS
   declarations.  I'm not sure whether anything on Aurora depends on those yet,
   but could do some try runs to find out if needed.
String or UUID changes made by this patch: None.

NOTE: this is an approval request for Aurora 17.  So if this is handled after the merge tomorrow, please consider it a request for Beta 18.
Attachment #668992 - Flags: approval-mozilla-aurora?
Comment on attachment 668992 [details] [diff] [review]
Roll-up patch against aurora (not what I'd land; I would be landing the 6 separate changesets)

BZ and I discussed - we'll land for b2 (and will approve after b1 goes to build).
Attachment #668992 - Flags: approval-mozilla-aurora? → approval-mozilla-beta?
Attachment #668992 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
I assume the approval-mozilla-beta implies approval-mozilla-aurora as well, right?
It landed in time for the merge, so it's already on Aurora.
fwiw, this is also reproducible in 16 (but not 15) and can be tripped up on github as well as a couple other sites
Whatever you're seeing in 16 is not this bug: this bug was due to a code change that happened in 17.

So whatever you're seeing, I recommend filing a bug with steps to reproduce, assuming it's also present in current beta builds...
Depends on: 810103
Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/17.0 Firefox/17.0

Marking as verified with 17 beta 6. Could previously reproduce the issue. No longer reproducible, also checked a few popular sites (youtube, facebook, wikipedia), no compartments spotted.
Mozilla/5.0 (X11; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0

Verified in Firefox 18 beta 3 as well.
Keywords: verifyme
QA Contact: virgil.dicu
mass remove verifyme requests greater than 4 months old
Keywords: verifyme
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: