Closed Bug 1171127 Opened 9 years ago Closed 9 years ago

Ghost windows on washingtonpost.com with e10s enabled

Categories

(Core :: Networking, defect)

Priority: Not set
Severity: major

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla41

Tracking Status:
e10s: m8+
firefox40: affected
firefox41: fixed

People

(Reporter: mccr8, Assigned: valentin)

References

(Blocks 1 open bug)

Details

(Keywords: memory-leak, perf, Whiteboard: [MemShrink:P1])

Attachments

(1 file)

With e10s, if you go to a story on washingtonpost.com, scroll all the way to the bottom so various comment junk starts loading, then navigate to a new page while the loading is in progress, the page is leaked forever. I can't reproduce this without e10s enabled.

I'm still investigating, but the problem seems to be an orphan object element being held alive by a networking channel. (The problem could be in how the DOM handles the channels, of course.)
tracking-e10s: --- → ?
For at least one of the orphan object elements, it seems like what is holding it alive is an ObjectInterfaceRequestorShim. The only thing referring to that is something allocated in mozilla::net::nsHttpHandler::NewProxiedChannel2(), which I think is an HttpChannelChild.

As for the channel, it looks like it is being held alive by an nsDocument (http://js.washingtonpost.com/pb/resources/js/echo2/sdk/v3/gui.pack.css), by something allocated under HttpChannelChild() (maybe a ChannelEventQueue?), and by an nsPerformance object. The nsPerformance object is being referred to directly only by its wrapper, which seems weird to me. I don't know about the ChannelEventQueue, but the nsDocument and the nsPerformance object are entrained (via a very long path) by the orphan object elements, so there's your cycle.

I'm not really sure what is going wrong here. Maybe something should have cleared the channel-to-shim edge?

My information here is coming from conservative heap scanning and allocation stacks, along with the cycle collector log. I should be able to rerun the analysis while the browser is running, which would let me poke around with a debugger.
jdm, do you have any ideas about what might be going wrong here? I'm guessing something akin to bug 1157468 where cleanup isn't being done in some condition.
Flags: needinfo?(josh)
See Also: → 1158648
So ObjectInterfaceRequestorShim is used as the notification callbacks on the channel that nsObjectLoadingContent::OpenChannel opens.  It's also used as the channel's stream listener.

Normally necko is responsible for nulling out both the stream listener and the callbacks after OnStopRequest.  In particular, HttpBaseChannel::ReleaseListeners nulls them both out and should be called after OnStopRequest.  It also nulls out mProgressSink, which is another pointer to the same object.

And if the channel is being held by something other than the stack, it would seem like AsyncOpen succeeded, so we should have gotten OnStopRequest.

I guess we don't know which member of the channel is pointing to the shim?
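(For reference, a minimal sketch of the cleanup described above, using the member names mentioned in this comment; this is an approximation, not the exact tree code.)

// Sketch only: ReleaseListeners() drops every strong reference the channel
// holds to the shim once the load is finished.
void HttpBaseChannel::ReleaseListeners()
{
  mListener = nullptr;      // the shim, acting as the nsIStreamListener
  mCallbacks = nullptr;     // the shim again, as the notification callbacks
  mProgressSink = nullptr;  // another pointer to the same object
}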
HttpBaseChannel::ReleaseListeners() is not used in the child process.  One thing the channel doesn't throw away is the load group reference.  There are several places where we remove the channel (self) from the load group, never re-add it (AFAIK), and never release the load group after that.
OK, I think we simply don't clean up mProgressSink correctly on the child.
The loadgroup should not be an issue.

But if we don't clean up mProgressSink/mCallbacks/mListener on the child, that should cause all sorts of leaks...
Sorry, I don't think I'll be able to provide any useful information here before I leave on vacation.
Flags: needinfo?(josh)
Jason, this sounds kinda bad. The leaks cause really bad cycle collector performance, which can lead to frequent, long pauses.

Is this something that could be fixed easily? Could you find someone to work on it?
Flags: needinfo?(jduell.mcbugs)
Assignee: nobody → valentin.gosu
Flags: needinfo?(jduell.mcbugs)
Whiteboard: [MemShrink] → [MemShrink:P1]
Severity: normal → major
Keywords: mlk, perf
Releasing the listeners in OnStopRequest seems to ameliorate the issue: the entries for washingtonpost.com disappear from about:memory after a GC, CC, and minimize memory usage.
That's assuming I'm testing this correctly. Can someone check with the try build?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7681f9e4f7ab
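(Roughly, the change amounts to doing on the child what HttpBaseChannel::ReleaseListeners() already does on the parent side once OnStopRequest has been delivered. A sketch follows, with approximate method and parameter names; see the attached patch for the real change.)

// Sketch of the shape of the fix, not the actual patch.
void HttpChannelChild::OnStopRequest(const nsresult& aChannelStatus)
{
  if (mListener) {
    mListener->OnStopRequest(this, mListenerContext, aChannelStatus);
  }
  // Clear mListener/mCallbacks/mProgressSink on the child too, so the channel
  // no longer entrains the ObjectInterfaceRequestorShim (and, through it, the
  // leaked page).
  ReleaseListeners();
}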
I'm unable to reproduce the leak with the attached patch. Thanks!
I wonder if this should be fixed on b2g also.
Attachment #8622127 - Flags: review?(jduell.mcbugs) → review+
https://hg.mozilla.org/mozilla-central/rev/02eb7fe4cdd1
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Valentin, could you get approval for landing this on Aurora? People are running e10s there and it would be nice to get it fixed.
Flags: needinfo?(valentin.gosu)
Comment on attachment 8622127 [details] [diff] [review]
Listeners are not released in OnStopRequest in e10s mode

Approval Request Comment
[Feature/regressing bug #]:
e10s/HttpChannelChild
[User impact if declined]:
Rather large and long lasting memory leaks in the content process
[Describe test coverage new/current, TreeHerder]:
All green on TreeHerder. Has been on m-c for a few days.
[Risks and why]:
Low risk. Just clears some refPtrs in OnStopRequest.
[String/UUID change made/needed]:
None.
Flags: needinfo?(valentin.gosu)
Attachment #8622127 - Flags: approval-mozilla-aurora?
Comment on attachment 8622127 [details] [diff] [review]
Listeners are not released in OnStopRequest in e10s mode

We are going to merge aurora into beta next Monday. e10s is going to be disabled in beta, so we're not taking this, as it could have potential regressions.
Attachment #8622127 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Blocks: GhostWindows