Ghost windows on washingtonpost.com with e10s enabled

RESOLVED FIXED in Firefox 41

Status

Product: Core
Component: Networking
Priority: --
Severity: major
Status: RESOLVED FIXED
Reported: 3 years ago
Last modified: 2 years ago
People

(Reporter: mccr8, Assigned: valentin)

Tracking

(Blocks: 1 bug, {memory-leak, perf})

Version: Trunk
Target Milestone: mozilla41
Keywords: memory-leak, perf
Points: ---

Firefox Tracking Flags

(e10s m8+, firefox40 affected, firefox41 fixed)

Details

(Whiteboard: [MemShrink:P1])

Attachments

(1 attachment)

(Reporter)

Description

3 years ago
With e10s, if you go to a story on washingtonpost.com, then scroll all the way to the bottom so various comment junk starts loading, then navigate to a new page while the loading is in progress, you leak the page forever. I can't reproduce without e10s enabled.

I'm still investigating, but the problem seems to be an orphan object element being held alive by a networking channel. (The problem could be in how the DOM handles the channels, of course.)
(Reporter)

Updated

3 years ago
tracking-e10s: --- → ?
(Reporter)

Comment 1

3 years ago
For at least one of the orphan object elements, it seems like what is holding it alive is an ObjectInterfaceRequestorShim. The only thing referring to that is something allocated in mozilla::net::nsHttpHandler::NewProxiedChannel2(), which I think is an HttpChannelChild.

As for the channel, it looks like there's an nsDocument http://js.washingtonpost.com/pb/resources/js/echo2/sdk/v3/gui.pack.css that is holding it alive, something allocated under HttpChannelChild() (maybe a ChannelEventQueue?) and an nsPerformance object. The nsPerformance object is being referred to directly only by its wrapper, which seems weird to me. I don't know about the ChannelEventQueue, but the nsDocument and nsPerformance object are entrained (via a very long path) by the orphan object elements, so there's your cycle.

I'm not really sure what is going wrong here. Maybe something should have cleared the channel-to-shim edge?

My information here is coming from conservative heap scanning and allocation stacks, along with the cycle collector log. I should be able to rerun the analysis while the browser is running, which would let me poke around with a debugger.
(Reporter)

Comment 2

3 years ago
jdm, do you have any ideas about what might be going wrong here? I'm guessing something akin to bug 1157468 where cleanup isn't being done in some condition.
(Reporter)

Updated

3 years ago
Flags: needinfo?(josh)
(Reporter)

Updated

3 years ago
See Also: → bug 1158648
So ObjectInterfaceRequestorShim is used as the notification callbacks on the channel nsObjectLoadingContent::OpenChannel opens.  It's also used as the channel's streamlistener.

Normally necko is responsible for nulling out both the stream listener and the callbacks after OnStopRequest.  And in particular, HttpBaseChannel::ReleaseListeners nulls them both out and should be called from OnStopRequest.  It also nulls out mProgressSink, which is another pointer to the same object.

And if the channel is being held by something other than the stack, it would seem like AsyncOpen succeeded, so we should have gotten OnStopRequest.

I guess we don't know which member of the channel is pointing to the shim?
HttpBaseChannel::ReleaseListeners() is not used on the child process.  One thing the channel doesn't throw away is the load group reference.  There are several places we remove the channel (self) from the load group, never readd (AFAIK) and never release the load group after that.
OK, I think we simply don't clean up mProgressSink correctly on the child.
The loadgroup should not be an issue.

But if we don't clean up mProgressSink/mCallbacks/mListener on the child, that should cause all sorts of leaks...

Comment 7

3 years ago
Sorry, I don't think I'll be able to provide any useful information here before I leave on vacation.
Flags: needinfo?(josh)
Jason, this sounds kinda bad. The leaks cause really bad cycle collector performance, which can lead to frequent, long pauses.

Is this something that could be fixed easily? Could you find someone to work on it?
Flags: needinfo?(jduell.mcbugs)
(Assignee)

Updated

3 years ago
Assignee: nobody → valentin.gosu
(Assignee)

Updated

3 years ago
Flags: needinfo?(jduell.mcbugs)
Whiteboard: [MemShrink] → [MemShrink:P1]
tracking-e10s: ? → m8+

Updated

3 years ago
Severity: normal → major
Keywords: mlk, perf
(Assignee)

Comment 9

3 years ago
Created attachment 8622127 [details] [diff] [review]
Listeners are not released in OnStopRequest in e10s mode
Attachment #8622127 - Flags: review?(jduell.mcbugs)
(Assignee)

Comment 10

3 years ago
Releasing the listeners in OnStopRequest seems to ameliorate the issue - the entries for washingtonpost.com disappear from about:memory after a GC, CC, and minimize memory usage.
That's assuming I'm testing this correctly. Can someone check with the try build?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7681f9e4f7ab
(Reporter)

Comment 11

3 years ago
I'm unable to reproduce the leak with the attached patch. Thanks!
I wonder if this should be fixed on b2g also.
Attachment #8622127 - Flags: review?(jduell.mcbugs) → review+
(Assignee)

Updated

3 years ago
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/02eb7fe4cdd1
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox41: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
(Reporter)

Comment 15

3 years ago
Valentin, could you get approval for landing this on Aurora? People are running e10s there and it would be nice to get it fixed.
Flags: needinfo?(valentin.gosu)
(Assignee)

Comment 16

3 years ago
Comment on attachment 8622127 [details] [diff] [review]
Listeners are not released in OnStopRequest in e10s mode

Approval Request Comment
[Feature/regressing bug #]:
e10s/HttpChannelChild
[User impact if declined]:
Rather large and long lasting memory leaks in the content process
[Describe test coverage new/current, TreeHerder]:
All green on TreeHerder. Has been on m-c for a few days.
[Risks and why]:
Low risk. Just clears some RefPtrs in OnStopRequest.
[String/UUID change made/needed]:
None.
Flags: needinfo?(valentin.gosu)
Attachment #8622127 - Flags: approval-mozilla-aurora?
status-firefox40: --- → affected
Comment on attachment 8622127 [details] [diff] [review]
Listeners are not released in OnStopRequest in e10s mode

We are going to merge aurora into beta next Monday. e10s is going to be disabled in beta. So, not taking it, as it could have potential regressions.
Attachment #8622127 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
(Reporter)

Updated

2 years ago
Blocks: 1181677