Last Comment Bug 671053 - tbpl.mozilla.org causes a zombie compartment for 2-3 minutes because an XHR is holding things alive
: tbpl.mozilla.org causes a zombie compartment for 2-3 minutes because an XHR i...
Status: VERIFIED FIXED
[MemShrink:P1]
: regression
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: Trunk
: All All
: -- normal with 1 vote (vote)
: mozilla7
Assigned To: Kyle Huey [:khuey] (khuey@mozilla.com)
:
Mentors:
Depends on:
Blocks: 623948 ZombieCompartments
  Show dependency treegraph
 
Reported: 2011-07-12 13:14 PDT by Andrew McCreight [:mccr8]
Modified: 2013-12-27 14:28 PST (History)
24 users (show)
khuey: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
wontfix
-
fixed


Attachments
Patch (817 bytes, patch)
2011-07-13 17:12 PDT, Kyle Huey [:khuey] (khuey@mozilla.com)
mcmanus: review+
asa: approval‑mozilla‑aurora+
asa: approval‑mozilla‑beta-
Details | Diff | Review

Description Andrew McCreight [:mccr8] 2011-07-12 13:14:56 PDT
After you close a tab with tbpl.mozilla.org, a compartment sticks around for 2-3 minutes, even if you use "minimize memory usage" many times.  This is also seems to be about the same length of time as the refresh timer is set for, so perhaps it is related to the timer.
Comment 1 Andrew McCreight [:mccr8] 2011-07-12 13:16:37 PDT
See early comments in Bug 668871 for discussion of this.
Comment 2 Alon Zakai (:azakai) 2011-07-12 14:08:54 PDT
I tried calling ClearAllTimeouts() in nsGlobalWindow::PostHandleEvent when NS_PAGE_UNLOAD is called (which happens when the tab is closed). This didn't help with the issue I was sad to see.
Comment 3 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-07-13 10:24:08 PDT
Poking around during the initial cycle collection ... we don't appear to have any DOM timeouts sitting around.
Comment 4 Andrew McCreight [:mccr8] 2011-07-13 10:28:40 PDT
When I looked into this before, I saw that these objects were rooting the TBPL window (copied over from the other bug):

    0x22679a0 [nsXPCWrappedJS (nsIDOMEventListener) [1/3]]
        --[]-> 0x7f5b55052be0 [JS Object (Function) (global=7f5b60ab4088)]
        --[]-> 0x7f5b5509a480 [JS Object (Function) (global=7f5b60ab4088)]
        --[]-> 0x7f5b60ab4088 [JS Object (Window) (global=7f5b60ab4088)]

    0x1e9d0c0 [nsJSContext [45/47]]
        --[mContext]-> 0x275c840 [JSContext]
        --[[global object]]-> 0x7f5b60a07028 [JS Object (Proxy) (global=7f5b60ab4088)]
        --[]-> 0x7f5b60ab4088 [JS Object (Window) (global=7f5b60ab4088)]

I think somebody in IRC said that the nsJSContext was expected, given the other thing.  jlebar also saw the window being held by nsXPCWrappedJS (nsIDOMEventListener).
Comment 5 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-07-13 11:16:34 PDT
My current theory is that this is related to XHR.  Loading TBPL in a tab creates five XHRs, three go away when the tab is closed.  A docshell is destroyed immediately after the last two XHRs die, and "Minimize Memory" after that docshell is gone results in the tbpl windows and compartment being collected.

Still investigating what's going on with the XHR ...
Comment 6 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-07-13 12:11:21 PDT
Yes, this is definitely the XHR.  There is an XHR to 'http://hg.mozilla.org/mozilla-central/json-pushes?full=1&maxhours=24' which is being held alive by Necko.  The XHR holds onto an event listener (which the the nsXPCWrappedJS above) which holds onto the Window ...
Comment 7 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-07-13 15:39:00 PDT
I understand this now, and it's a regression from Bug 623948.  A patch later today.
Comment 8 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-07-13 17:12:14 PDT
Created attachment 545784 [details] [diff] [review]
Patch

Bug 623948 added a reference from nsHttpConnections to an nsIInterfaceRequestor used for callbacks.  The problem is that nsHttpConnections can live several minutes after the channel is closed.  The fix is to drop the reference when the http transaction ends, instead of when the connection is finally torn down or reused.
Comment 9 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-07-13 22:59:03 PDT
Upgrading this to MemShrink:P1 per conversation with njn because it obscures any other time-limited compartment leak and is very easy to hit.
Comment 10 Jason Duell [:jduell] (needinfo? me) 2011-07-13 23:13:20 PDT
Comment on attachment 545784 [details] [diff] [review]
Patch

Almost +r'd this--seems simple enough.  But want to make sure Patrick or Honza looks over it to make sure we don't need mCallbacks at some point after CloseTransaction.
Comment 11 Patrick McManus [:mcmanus] 2011-07-14 05:54:24 PDT
There is a test I want to run first before completing the review.. I'll have it done today.
Comment 12 Honza Bambas (:mayhemer) 2011-07-14 08:17:38 PDT
CloseTransaction is always called with a failure code.  Right after these new lines we also close the connection with call of Close on it self.  That method releases the connection as callbacks from socket transport and also from security info object ; those two should be AFAIK only two consumers of nsHttpConnection as callbacks.

More importantly, before the HTTP syn retry patch, nsHttpConnection::GetInterface code was querying mTransaction, if it were there (!), for callbacks.  And we release mTransaction just above the new lines.

So IMO the patch is OK and fixes this HTTP syn retry patch leak regression.
Comment 13 Patrick McManus [:mcmanus] 2011-07-14 10:50:36 PDT
Comment on attachment 545784 [details] [diff] [review]
Patch

Thanks - the ref needs to be held across init/activate, but its fine to drop it at closetransaction.
Comment 14 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-07-14 11:08:56 PDT
Thanks for the quick reviews everyone.
Comment 15 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-07-14 11:13:38 PDT
http://hg.mozilla.org/mozilla-central/rev/a4d88ea926ec

Setting in-testsuite? so I write a test after I write the lifetime testing code.
Comment 16 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-07-14 12:55:14 PDT
Comment on attachment 545784 [details] [diff] [review]
Patch

I think we should take this on branches.  The fix is pretty simple, and we haven't shipped the regression yet.
Comment 17 Patrick McManus [:mcmanus] 2011-07-14 13:11:59 PDT
just to be clear, this delayed freeing of memory is going to scale based on numbers of live connections - not number of transactions or xhrs or something.. and the former is a fairly low number (maxxing out at 6 per unique host).

with that being said you might want to get a little air time with the fix before promoting to beta. or you might not :)
Comment 18 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-07-14 14:51:20 PDT
http://hg.mozilla.org/releases/mozilla-aurora/rev/c8961d4e0471
Comment 19 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-07-14 16:09:19 PDT
Accepting for Aurora but rejecting for Beta seems ok to me.  Arguably the biggest problem with this bug is that it makes the diagnosis of zombie compartments (bug 668871) much harder, but the per-compartment reporters you use to diagnose them only landed in FF7.
Comment 20 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-07-14 20:03:19 PDT
(In reply to comment #19)
> Accepting for Aurora but rejecting for Beta seems ok to me.  Arguably the
> biggest problem with this bug is that it makes the diagnosis of zombie
> compartments (bug 668871) much harder, but the per-compartment reporters you
> use to diagnose them only landed in FF7.

This, combined with the fact that it's time limited to a short time was why it was approval-beta-'d.
Comment 21 Justin Lebar (not reading bugmail) 2011-07-19 13:14:33 PDT
I have a TBPL zombie compartment sticking around for 5+ minutes.  Linux-64 2011-07-19 nightly.
Comment 22 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-07-19 13:15:17 PDT
File a new bug please.
Comment 23 Justin Lebar (not reading bugmail) 2011-07-19 13:30:04 PDT
Filed bug 672619.
Comment 24 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-07-19 16:18:06 PDT
(In reply to comment #21)
> I have a TBPL zombie compartment sticking around for 5+ minutes.  Linux-64
> 2011-07-19 nightly.

The "It's All Text" add-on appears to be at fault for jlebar's case.
Comment 25 Virgil Dicu [:virgil] [QA] 2011-08-19 08:51:57 PDT
Mozilla/5.0 (X11; Linux x86_64; rv:7.0) Gecko/20100101 Firefox/7.0

Could anyone provide a test case or clear STRs in order to have this issue verified?
Comment 26 Andrew McCreight [:mccr8] 2011-08-19 08:58:03 PDT
Well, to confirm the fixing of the initial symptom, you can do the following:

1. load http://tbpl.mozilla.org/
2. open about:memory in a new tab, look for a line that has something like compartment(http://tbpl.mozilla.org/) in it
3. close the tbpl tab
4. in about:memory, click "minimize memory usage" a couple of times
There should not be a line in about:memory that says compartment(http://tbpl.mozilla.org/) any more
Comment 27 Virgil Dicu [:virgil] [QA] 2011-08-22 01:18:35 PDT
Thanks for the feedback.

Mozilla/5.0 (Windows NT 5.1; rv:7.0) Gecko/20100101 Firefox/7.0
Mozilla/5.0 (Windows NT 6.1; rv:7.0) Gecko/20100101 Firefox/7.0
Mozilla/5.0 (X11; Linux x86_64; rv:7.0) Gecko/20100101 Firefox/7.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0) Gecko/20100101 Firefox/7.0

The issue is no longer reproducible when following the STRs in comment 26.
Setting status to verified fixed.

Note You need to log in before you can comment on or make changes to this bug.