Open Bug 1277376 Opened 5 years ago Updated 1 month ago

twitter can leak overlay windows until all twitter tabs are closed


(Core :: DOM: Core & HTML, defect, P3)

48 Branch



Tracking Status
platform-rel --- -


(Reporter: bkelly, Unassigned)



(Whiteboard: [MemShrink:P3] btpp-active [platform-rel-Twitter])


(1 file, 4 obsolete files)

I use twitter heavily and have noticed it often starts to use a lot of memory.  When this occurs I tend to see some detached twitter windows and a lot of orphaned nodes in the main twitter window.  For example:

│    ├──188.28 MB (15.77%) -- top(, id=266)
│    │  ├──161.30 MB (13.51%) -- active
│    │  │  ├──148.79 MB (12.46%) -- window(
│    │  │  │  ├───84.65 MB (07.09%) -- dom
│    │  │  │  │   ├──64.27 MB (05.38%) ── orphan-nodes
│    │  │  │  │   ├──16.50 MB (01.38%) ── element-nodes
│    │  │  │  │   └───3.88 MB (00.33%) ++ (4 tiny)
│    │  │  │  ├───33.80 MB (02.83%) ++ layout
│    │  │  │  ├───24.03 MB (02.01%) ++ js-compartment(
│    │  │  │  └────6.30 MB (00.53%) ++ (2 tiny)
│    │  │  └───12.51 MB (01.05%) ++ (3 tiny)


│    │   ├───7.14 MB (00.60%) -- top(none)/detached
│    │   │   ├──0.95 MB (00.08%) ++ window(
│    │   │   ├──0.95 MB (00.08%) ++ window(
│    │   │   ├──0.93 MB (00.08%) ++ window(
│    │   │   ├──0.92 MB (00.08%) ++ window(
│    │   │   ├──0.92 MB (00.08%) ++ window(
│    │   │   ├──0.91 MB (00.08%) ++ window(
│    │   │   ├──0.78 MB (00.07%) ++ window(
│    │   │   └──0.78 MB (00.07%) ++ window(

I can usually get this to reproduce in a short period of time by doing the following steps:

1) Open a new browser window (e10s or non-e10s)
2) Launch in a tab
3) Click on a tweet to expand its card in an overlay
4) Close the overlay by clicking the "x" in the upper right.
5) Click on "notifications" at the top of the page
6) Click back on "home" at the top of the page
7) Open a new tab with about:memory, minimize, and measure.
8) If you see a detached twitter window, you are done.  Otherwise repeat from step (3).

From CC logs it seems these windows are being held alive via a message handler in the main window:

    $ /c/devel/heapgraph/ cc-edges.4988.1464807065.log 0000018E96C38000
    Parsing cc-edges.4988.1464807065.log. Done loading graph.
    0000018E8383F240 [JS Object (Window)]
        --[UnwrapDOMObject(obj)]--> 0000018E96C38000 [nsGlobalWindow # 174 inner
        Root 0000018E8383F240 is a marked GC object.
    bkelly@kosh /c/devel/tmp/cclogs/twitter-orphans
    $ /c/devel/heapgraph/ gc-edges.4988.1464807065.log -bro 0000018E8383F240
    Parsing gc-edges.4988.1464807065.log. Done loading graph.
    via mCallback :
    0000018E88433A40 [Function g]
        --[fun_environment]--> 0000018E8A56F480 [Call <no private>]
        --[e]--> 0000018E94B5D580 [Proxy <no private>]
        --[private]--> 0000018E89AEC2C0 [Proxy <no private>]
        --[private]--> 0000018E8383F240 [Window <no private>] 

Where the mCallback is:

ownload | new post

    $ /c/devel/heapgraph/ cc-edges.4988.1464807065.log 0000018E85588E40
    Parsing cc-edges.4988.1464807065.log. Done loading graph.
    0000018E9703FE00 [nsTimeout]
        --[mWindow]--> 0000018E859CAC00 [nsGlobalWindow # 58 inner]
        --[mListenerManager]--> 0000018E84A4ACA0 [EventListenerManager]
        --[mListeners event=onmessage listenerType=3 [i]]--> 0000018E85588E40 [CallbackObject]
        Root 0000018E9703FE00 is a ref counted object with 1 unknown edge(s).
        known edges:
           0000018E859CAC00 [nsGlobalWindow # 58 inner] --[]--> 0000018E9703FE00 

Looking at the site code it does use a lot of message event listeners.  It seems to register one for each "card" or tweet.  These are removed in the "destroyCard()" function.

There is a mechanism for detecting when cards are no longer visible and calling destroyCard().  I'm not sure yet if this is through mutation observers, scroll events, or something else.

I believe the steps leak the window because they prevent a destroyCard() from getting called for one of the tweets in the overlay window.

Its unclear to me yet if this is a page bug or we are not firing some event that is expected.  Filing this in DOM for now, but it might be tech evangelism.
FWIW, I see the detached window, but only for some time. Then it goes away.
Same with some orphan nodes. I see lots of them for some time and then they go away.
Whiteboard: [MemShrink] → [MemShrink] btpp-active
Whiteboard: [MemShrink] btpp-active → [MemShrink] btpp-active [platform-rel-Twitter]
platform-rel: --- → ?
I'm not actively working this, but I think we need to do something here.  Many sites like twitter, gmail, facebook use "cross domain messaging" technique.  We seem to hold the iframes alive longer than we need to degrading performance on these sites.
Assignee: bkelly → nobody
Do we understand succinctly what the cycle/etc is?
Flags: needinfo?(bkelly)
(In reply to Kyle Huey [:khuey] ( from comment #3)
> Do we understand succinctly what the cycle/etc is?

I believe our conclusion was that a cross-compartment message event handler was holding the iframe alive.  But these iframes are otherwise unreferenced and can't effectively receive messages any more.  We need to cut the message event handler in that case.
Flags: needinfo?(bkelly)
I pretty much always have junk like this lying around:

│    ├───22.24 MB (02.08%) -- top(none)/detached
│    │   ├──13.52 MB (01.26%) ++ window(,t/am=nhEPBNyTcX8wrmEUkJG-QmHee89nS8qPnOBF_QkDIFKrAPxv9v8APg_0oi0U/rt=h/d=1/rs=AHGWq9Cljn1CoKGkFsgMVkkOFiQOnZjjdg)
│    │   └───8.72 MB (00.82%) -- (6 tiny)
│    │       ├──4.52 MB (00.42%) ++ window(
│    │       ├──0.97 MB (00.09%) ++ window(
│    │       ├──0.90 MB (00.08%) ++ window(
│    │       ├──0.90 MB (00.08%) ++ window(
│    │       ├──0.88 MB (00.08%) ++ window(
│    │       └──0.55 MB (00.05%) ++ window(
We could try making a stripped down test using the xdm library:
We won't actually fire the event, so it wouldn't hurt to blow away any "message" event listeners in FreeInnerObject.
Would help to have a reasonable pass/fail condition so we know if it's worth it though.
If this is an easy patch to write, I could run with it under my current workload to see if it helps.
(In reply to Kyle Huey [:khuey] ( from comment #7)
> We won't actually fire the event, so it wouldn't hurt to blow away any
> "message" event listeners in FreeInnerObject.
> ef24c234ed53b3ba50a1734f6b946942e4434b5b/dom/base/PostMessageEvent.cpp#76
Last that I was looking at this, the message listener isn't in the Window object which is about to die, but in the parent Window object.
Given the sites we're seeing this on lets go with P1.
Whiteboard: [MemShrink] btpp-active [platform-rel-Twitter] → [MemShrink:P1] btpp-active [platform-rel-Twitter]
I checked this again today and got the following trace for a detached twitter window:

0000019B7D651920 [JS Object (Window)]
    --[UnwrapDOMObject(obj)]--> 0000019B545D6C00 [nsGlobalWindow # 2147484160 inner https://twitter.

    Root 0000019B7D651920 is a marked GC object.

via persistent-Object :
0000019B76A490A0 [HTMLScriptElement <no private>]
    --[shape]--> 0000019B83C88588 [shape]
    --[base]--> 0000019B7ADE1380 [base_shape]
    --[global]--> 0000019B7D651920 [Window <no private>]

It seems we're leaking an HTMLScriptElement somewhere via a PersistentRoot.  Not sure how to track this down.

The interesting thing, though, is that I don't see anything about message event listeners in here.
See Also: → 1287547
I triggered the detached overlay windows again.  They still seem tied back to a mCallback:

$ /c/devel/heapgraph/ cc-edges.5176.1468955077.log 000001EC9AA22400
Parsing cc-edges.5176.1468955077.log. Done loading graph.

000001EC7F63BA60 [JS Object (Window)]
    --[UnwrapDOMObject(obj)]--> 000001EC9AA22400 [nsGlobalWindow # 2147484652 inner https://twitter.

    Root 000001EC7F63BA60 is a marked GC object.

bkelly@kosh /c/devel/tmp/cclogs/twitter
$ /c/devel/heapgraph/ gc-edges.5176.1468955077.log -bro 000001EC7F63BA60
Parsing gc-edges.5176.1468955077.log. Done loading graph.

via mCallback :
000001EC896B9D00 [Function g]
    --[fun_environment]--> 000001EC70D69D00 [Call <no private>]
    --[e]--> 000001EA5FE944C0 [Proxy <no private>]
    --[private]--> 000001EC88B406C0 [Proxy <no private>]
    --[private]--> 000001EC7F63BA60 [Window <no private>]

Which is held in the onmessage event handler list.
platform-rel: ? → -
So I was thinking today that if the message event handling code was implemented in chrome script, then I think these references would be cleaned up when the outer window destruction triggers:

Maybe we could create a chrome proxy wrapper when a cross-compartment target is passed to addEventListener().  That would then allow the cross-compartment cutting code to clean things up here.

Assuming I understand this stuff properly, which I very well might not.

Olli, what do you think?
Flags: needinfo?(bugs)
Using chrome wrapper sounds odd.
And this same issue can happen with any callbacks.

We should clear the callback objects coming from "destroyed" compartments

(trying to find the bug where that was discussed)
Flags: needinfo?(bugs)
Yea, that would be preferable.  I just didn't know if we had an easy way to iterate over all callback objects in order to check for "destroyed" compartments.
mccr8, what do you think of adding another JSHolder hashtable. It would be used only for callbacks?

Though, one could argue we could just trace through all the jsholders and clear the js values which are coming from destroyed compartments.
A question is how to detect a compartment is destroyed
Flags: needinfo?(continuation)
> A question is how to detect a compartment is destroyed

Today we do it when a window is destroyed I believe.
Oh, you said "how" not "when".  I thought we could use this general logic:

Is there a reason NukeCrossCompartmentWrappers() style logic would not work here?
I'm a bit worried about performance. Going through all the jsholders whenever closing a window.
I guess we could collect closed compartments to some list, and next time we do forget_skippable in such way that we go through jsholders anyhow, we clear the jsvalues.
I have a WIP patch coming hopefully later today.
NukeCrossCompartmentWrappers() is what you want. I talked about this with Olli in IRC a bit. I'm also a little worried about performance here. We already have problems with quadratic behavior in NukeCrossCompartmentWrappers() (see bug 816784), and this function will make it worse. It has to iterate over all the holders, and do virtual callbacks on each one, for every window that is closed. When you shut down, you close a bunch of windows at once.
Flags: needinfo?(continuation)
We can't use NukeCrossCompartmentWrappers here since there are no ccw.

I'm now testing generations for this stuff and looks like globalwindows are usually destroyed properly after two CCs. So if we only rarely go through the held js stuff and clear only cases which we know are safe (like, if we explicitly make CallbackObjects safe and modify the code), that is at least one possibility there.
testing this approach.
er, I was wrong, we do have ccw here too (or may not have them), but we don't want to kill all the
ccw (that would be super risky), only the ones we know should be safe-ish to kill. Like, we don't end up calling event listeners anymore if the callback is from dying compartment, so that should be quite safe one to kill.

But, we have still one issue. CallbackObjects keep the global object alive explicitly. So need to clear the C++->JS_in_drying_compartment edges.

So, I have added something very special for callback objects.
Attached patch jsclearance.diff (obsolete) — Splinter Review

Let's if this compiles everywhere and what all this breaks.

The tiny bit worrisome part is that if iframe sets parent.onmessage, this will clear also that, even though that js function would be actually available still in the parent. (addEventListener doesn't have that issue)
But since window.dispatchEvent(new MessageEvent("message")); in parent context wouldn't call it, I hope it is ok. One would really need to explicitly call onfoo.
Attached patch v3 (obsolete) — Splinter Review
Missed to throw an exception in one case, but in general tryserver results look pretty good.

So, bkelly, feel tree to try out the opt builds (either from previous try run or this new one.)

If we end up doing this(, where the scary part is onfoo EventHandlers being cleared), we could also warn in Web Console if there are other CCWs pointing to dying js compartment and thus keeping it alive. That would be a separate bug of course, but might be rather useful for web devs.
Attachment #8783091 - Attachment is obsolete: true
er, that patch doesn't compile.

And I think I want to fix the onfoo handling somehow. Then the patch should be safe to land.
Attached patch v4 (obsolete) — Splinter Review
Don't clear EventHandlers

This is more conservative, since JS could access onfoo EventHandlers from EventTargets. EventListeners and other callbacks can be still optimized out by default.
Attached patch v5Splinter Review
Yet safer version. Only clear the Js values if they are _all_ coming from the dying compartment.

Note, javascript.options.asyncstack, which is enabled by default in Nightlies, may prevent this optimization since the stack may be coming from the other compartment.
Attachment #8783312 - Attachment is obsolete: true
Attachment #8783317 - Attachment is obsolete: true
I've been testing with the build from comment 30 most of the day.  Unfortunately it does not seem to prevent the twitter detached windows.

I dug into the gc/cc log again:

    $ /c/devel/heapgraph/ cc-edges.6740.1471894898.log 000002372E192400
    Parsing cc-edges.6740.1471894898.log. Done loading graph.
    000002374F9D1560 [JS Object (Window)]
        --[UnwrapDOMObject(obj)]--> 000002372E192400 [nsGlobalWindow # 285 inner
        Root 000002374F9D1560 is a marked GC object.
    bkelly@kosh /c/devel/tmp/cclogs/twitter
    $ /c/devel/heapgraph/ gc-edges.6740.1471894898.log -bro 000002374F9D1560
    Parsing gc-edges.6740.1471894898.log. Done loading graph.
    via mCallback :
    0000023753FC41C0 [Function g]
        --[fun_environment]--> 0000023735C26E00 [Call <no private>]
        --[e]--> 0000023731D9F200 [Proxy <no private>]
        --[private]--> 0000023731D9F060 [Proxy <no private>]
        --[private]--> 000002374F9D1560 [Window <no private>] 

Unfortunately the message handler itself is a same-compartment function.  It entrains the dead cross-compartment global through a closure fun_environment.

I wonder if this `e` variable in the fun_environment is even used or if its just getting sucked along with the rest of the closure environment.
From inspecting the fun_environment I'm fairly confident this is the `function g()` from the gc logs:

The `e` variable is `callerWindow`.
Looking at that code, the `callerWindow` is going to be captures by a number of other closures as well.  It seems highly likely this will leak no matter what we do to onmessage.

I think the question now is, should we keep the callback ccw cutting patch anyway.  It seems like a good thing, but maybe the complexity and perf costs don't match the pay off?
There should be very little performance cost, since the slow path is taken only when we actually have dying window and it has stayed in dying state for some time. And its compartment is handled just once.
But yes, the patch does add complexity.

I'm just testing locally whether the patch catches other issues.
Looks like it does at least in our chrome code (since for that I don't get any principal uri)
(In reply to Olli Pettay [:smaug] (vacation Aug 25-28) from comment #35)
> I'm just testing locally whether the patch catches other issues.
> Looks like it does at least in our chrome code (since for that I don't get
> any principal uri)

Sounds good to me.  We should probably land it in a separate bug, though.
I think perhaps the xdm code could avoid this sort of thing if it used a MessageChannel instead of the window directly.  Maybe we could prototype that and see if twitter will take it upstream.
Does the issue remain? If so, sounds like we should get it prioritize soon-ish.
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #38)
> Does the issue remain? If so, sounds like we should get it prioritize
> soon-ish.

Yes it does, but its unclear what we can do on our end.  The twitter script is holding on to the iframes after they are removed from the page.
Priority: -- → P3
Seems like there's not a lot we can do from the platform side, this is more of a tech evangelism issue.
Whiteboard: [MemShrink:P1] btpp-active [platform-rel-Twitter] → [MemShrink:P3] btpp-active [platform-rel-Twitter]
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.