Closed Bug 671053 Opened 13 years ago Closed 13 years ago

tbpl.mozilla.org causes a zombie compartment for 2-3 minutes because an XHR is holding things alive

Categories

(Core :: Networking: HTTP, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla7
Tracking Status
firefox6 - wontfix
firefox7 - fixed

People

(Reporter: mccr8, Assigned: khuey)

References

Details

(Keywords: regression, Whiteboard: [MemShrink:P1])

Attachments

(1 file)

After you close a tab with tbpl.mozilla.org, a compartment sticks around for 2-3 minutes, even if you use "minimize memory usage" many times.  This is also seems to be about the same length of time as the refresh timer is set for, so perhaps it is related to the timer.
Whiteboard: [MemShrink]
See early comments in Bug 668871 for discussion of this.
Assignee: general → khuey
Whiteboard: [MemShrink] → [MemShrink:P2]
I tried calling ClearAllTimeouts() in nsGlobalWindow::PostHandleEvent when NS_PAGE_UNLOAD is called (which happens when the tab is closed). This didn't help with the issue I was sad to see.
Poking around during the initial cycle collection ... we don't appear to have any DOM timeouts sitting around.
When I looked into this before, I saw that these objects were rooting the TBPL window (copied over from the other bug):

    0x22679a0 [nsXPCWrappedJS (nsIDOMEventListener) [1/3]]
        --[]-> 0x7f5b55052be0 [JS Object (Function) (global=7f5b60ab4088)]
        --[]-> 0x7f5b5509a480 [JS Object (Function) (global=7f5b60ab4088)]
        --[]-> 0x7f5b60ab4088 [JS Object (Window) (global=7f5b60ab4088)]

    0x1e9d0c0 [nsJSContext [45/47]]
        --[mContext]-> 0x275c840 [JSContext]
        --[[global object]]-> 0x7f5b60a07028 [JS Object (Proxy) (global=7f5b60ab4088)]
        --[]-> 0x7f5b60ab4088 [JS Object (Window) (global=7f5b60ab4088)]

I think somebody in IRC said that the nsJSContext was expected, given the other thing.  jlebar also saw the window being held by nsXPCWrappedJS (nsIDOMEventListener).
My current theory is that this is related to XHR.  Loading TBPL in a tab creates five XHRs, three go away when the tab is closed.  A docshell is destroyed immediately after the last two XHRs die, and "Minimize Memory" after that docshell is gone results in the tbpl windows and compartment being collected.

Still investigating what's going on with the XHR ...
Yes, this is definitely the XHR.  There is an XHR to 'http://hg.mozilla.org/mozilla-central/json-pushes?full=1&maxhours=24' which is being held alive by Necko.  The XHR holds onto an event listener (which the the nsXPCWrappedJS above) which holds onto the Window ...
Component: JavaScript Engine → DOM
QA Contact: general → general
Summary: tbpl.mozilla.org leaks a compartment for 2-3 minutes after the tab is closed → tbpl.mozilla.org leaks a compartment for 2-3 minutes after the tab is closed because an XHR is holding things alive
I understand this now, and it's a regression from Bug 623948.  A patch later today.
Blocks: 623948
Component: DOM → Networking: HTTP
Keywords: regression
QA Contact: general → networking.http
Attached patch PatchSplinter Review
Bug 623948 added a reference from nsHttpConnections to an nsIInterfaceRequestor used for callbacks.  The problem is that nsHttpConnections can live several minutes after the channel is closed.  The fix is to drop the reference when the http transaction ends, instead of when the connection is finally torn down or reused.
Attachment #545784 - Flags: review?
Attachment #545784 - Flags: review? → review?(mcmanus)
Upgrading this to MemShrink:P1 per conversation with njn because it obscures any other time-limited compartment leak and is very easy to hit.
Whiteboard: [MemShrink:P2] → [MemShrink:P1]
Comment on attachment 545784 [details] [diff] [review]
Patch

Almost +r'd this--seems simple enough.  But want to make sure Patrick or Honza looks over it to make sure we don't need mCallbacks at some point after CloseTransaction.
OS: Mac OS X → All
Hardware: x86 → All
There is a test I want to run first before completing the review.. I'll have it done today.
CloseTransaction is always called with a failure code.  Right after these new lines we also close the connection with call of Close on it self.  That method releases the connection as callbacks from socket transport and also from security info object ; those two should be AFAIK only two consumers of nsHttpConnection as callbacks.

More importantly, before the HTTP syn retry patch, nsHttpConnection::GetInterface code was querying mTransaction, if it were there (!), for callbacks.  And we release mTransaction just above the new lines.

So IMO the patch is OK and fixes this HTTP syn retry patch leak regression.
Comment on attachment 545784 [details] [diff] [review]
Patch

Thanks - the ref needs to be held across init/activate, but its fine to drop it at closetransaction.
Attachment #545784 - Flags: review?(mcmanus) → review+
Thanks for the quick reviews everyone.
http://hg.mozilla.org/mozilla-central/rev/a4d88ea926ec

Setting in-testsuite? so I write a test after I write the lifetime testing code.
Status: NEW → RESOLVED
Closed: 13 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla8
Comment on attachment 545784 [details] [diff] [review]
Patch

I think we should take this on branches.  The fix is pretty simple, and we haven't shipped the regression yet.
Attachment #545784 - Flags: approval-mozilla-beta?
Attachment #545784 - Flags: approval-mozilla-aurora?
just to be clear, this delayed freeing of memory is going to scale based on numbers of live connections - not number of transactions or xhrs or something.. and the former is a fairly low number (maxxing out at 6 per unique host).

with that being said you might want to get a little air time with the fix before promoting to beta. or you might not :)
Attachment #545784 - Flags: approval-mozilla-beta?
Attachment #545784 - Flags: approval-mozilla-beta-
Attachment #545784 - Flags: approval-mozilla-aurora?
Attachment #545784 - Flags: approval-mozilla-aurora+
Accepting for Aurora but rejecting for Beta seems ok to me.  Arguably the biggest problem with this bug is that it makes the diagnosis of zombie compartments (bug 668871) much harder, but the per-compartment reporters you use to diagnose them only landed in FF7.
Summary: tbpl.mozilla.org leaks a compartment for 2-3 minutes after the tab is closed because an XHR is holding things alive → tbpl.mozilla.org causes a zombie compartment for 2-3 minutes because an XHR is holding things alive
(In reply to comment #19)
> Accepting for Aurora but rejecting for Beta seems ok to me.  Arguably the
> biggest problem with this bug is that it makes the diagnosis of zombie
> compartments (bug 668871) much harder, but the per-compartment reporters you
> use to diagnose them only landed in FF7.

This, combined with the fact that it's time limited to a short time was why it was approval-beta-'d.
I have a TBPL zombie compartment sticking around for 5+ minutes.  Linux-64 2011-07-19 nightly.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
File a new bug please.
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
(In reply to comment #21)
> I have a TBPL zombie compartment sticking around for 5+ minutes.  Linux-64
> 2011-07-19 nightly.

The "It's All Text" add-on appears to be at fault for jlebar's case.
Mozilla/5.0 (X11; Linux x86_64; rv:7.0) Gecko/20100101 Firefox/7.0

Could anyone provide a test case or clear STRs in order to have this issue verified?
Well, to confirm the fixing of the initial symptom, you can do the following:

1. load http://tbpl.mozilla.org/
2. open about:memory in a new tab, look for a line that has something like compartment(http://tbpl.mozilla.org/) in it
3. close the tbpl tab
4. in about:memory, click "minimize memory usage" a couple of times
There should not be a line in about:memory that says compartment(http://tbpl.mozilla.org/) any more
Thanks for the feedback.

Mozilla/5.0 (Windows NT 5.1; rv:7.0) Gecko/20100101 Firefox/7.0
Mozilla/5.0 (Windows NT 6.1; rv:7.0) Gecko/20100101 Firefox/7.0
Mozilla/5.0 (X11; Linux x86_64; rv:7.0) Gecko/20100101 Firefox/7.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0) Gecko/20100101 Firefox/7.0

The issue is no longer reproducible when following the STRs in comment 26.
Setting status to verified fixed.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.