Memory leak in iframe when attaching functions to events
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
People
(Reporter: grzegorz.w.1597, Assigned: bzbarsky)
References
(Regression)
Details
(Keywords: regression)
Attachments
(4 files)
2.46 KB,
text/html
|
Details | |
2.02 KB,
text/html
|
Details | |
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr68+
|
Details | Review |
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr68+
|
Details | Review |
User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0
Steps to reproduce:
- create IFrame with button
- attach function from global context to buttons onclick event, with local function work ok
- remove iframe
Reproduce witch attached file:
Press "single" button, it will create iframe, attach action to button and remove it from body after 450 ms. "start" button will do it repeatedly in 500ms interval, pressing it again will stop it.
Unchecking "useBug" will change attaching global function to local function.
Leak start to appear after commit
https://hg.mozilla.org/mozilla-central/rev/1b528e3cac94793db32568726080b6cc1209b0e7
from bug 1523843. I bisected sources with attached script.
Commenting line
return aOptions.setExistingCompartment(data.compartment);
prevents memory leak
Workaround is using addEventListener or wrapping function call in closure.
Actual results:
Data from iframe is not freed.
Expected results:
Memory freed after removing iframe.
Comment 1•5 years ago
|
||
:bz, according to the comment 0 bisection, this looks like a regression introduced by bug 1523843, can you please take a look?
Updated•5 years ago
|
Assignee | ||
Comment 2•5 years ago
|
||
So just to make sure we're on the same page, here's what I see.
- I load the testcase from a file:// URL; the "useBug" and "useEventListener" checkboxes are both checked.
- I click the "start" button and wait for the iframe to flicker 20 times, then click the "stop" button.
- I load about:memory in a separate tab, click "Minimize memory usage", then click "Measure".
- I click on the "file:// content" link over on the right of about:memory.
- I see a bunch of
top(none)/detached/window(file:///Users/bzbarsky/test.html)
window objects still live.
If I uncheck "useEventListener" before step 2, that does not seem to affect the behavior; I still get the live detached windows.
If I uncheck "useBug", then the detached windows get collected when I minimize memory, no matter whether "useEventListener" is checked or not.
This doesn't quite match my understanding of the comment 0 description, so I would like to understand whether we're really seeing different things or whether I misunderstood the description.
Assignee | ||
Comment 3•5 years ago
|
||
Oh, and I do see that creating a new compartment for the subframe while following steps 1-5 with "useBug" checked makes the problem go away, so this definitely seems like a regression from bug 1523843.
Assignee | ||
Comment 4•5 years ago
|
||
OK, if I ask for a "full" GC/CC log in about:memory, the buttons inside the iframe show up as "garbage" in the trace after the ==========
bit and get collected.
If I ask for a "concise" GC/CC log, then they don't show up in the trace after ==========
at all (though they do show up earlier, via the GetParent()
edge from their child CharacterData and via the mFirstChild
edge from the <body>
. Which you'd think would be enough to collect them, since their refcount is 2 (!).
Comment 5•5 years ago
|
||
If an object doesn't show up after ====, that means the number of references is equal to the refcount (as you noticed) and the object is not garbage. It isn't garbage because some other object holding onto it is being held alive. You need to use the find_roots.py script in my heapgraph repo to figure out what. https://github.com/amccreight/heapgraph/
If it is being collected from a verbose GC log but not a concise that likely means some CC optimization is incorrectly being applied. Probably the easiest way to figure this out is to run the find_roots.py script on the concise log to figure out what object is holding the buttons alive, then find the equivalent object in the full CC log and see what the extra reference is (assuming that it is a refcounted object acting as the root.). If a GCed object is holding the button alive, then you can see what find_roots.py on the GC log, and the same rooting object, says, with the GC log from the concise case.
Assignee | ||
Comment 6•5 years ago
|
||
If I uncheck "useEventListener" before step 2, that does not seem to affect the behavior; I still get the live detached windows.
Ah, I see. This is due to a typo in the testcase. useEventListener
starts out false, but the checkbox state is initially set to useBug
, so it's checked. If I fix that state initialization, then the bug reproduces precisely when useEventListener
is false and useBug
is true.
I think we're definitely on the same page here. Thank you for finding this and for the bug report! Analysis coming up.
Assignee | ||
Comment 7•5 years ago
|
||
Thank you for finding this and for the bug report!
And for bisecting it to the right change, of course!
Assignee | ||
Comment 8•5 years ago
|
||
The issue in the provided testcase is that after bug 1523843 the callback and callback global of a CallbackObject
may actually be from different (same-compartment) globals on the web. Before bug 1523843 this could only happen in system code with compartment-sharing...
In particular, this line in the testcase:
b.onclick = testFunction;
calls into the onclick
setter from the subframe, so the current global in that setter is the subframe global. The function passed in, on the other hand, is from the parent global (before bug 1523843 we got a CCW from the subframe global to the parent global instead).
Now we go to cycle-collect this nsJSEventListener. Its CanSkip implementation returns true if the CallbackObject
's callable is not gray. Before bug 1523843 this was false: the callable was the CCW, and the only thing holding on to the CCW was the CallbackObject
itself, so it was in fact gray.
But now the callable is testFunction
, which is of course black, because it's a global function in the parent window, so is definitely live. So we skip CCing the JSEventHandler
, miss the edge to the CallbackObject
, and now have a cycle we don't break: the CallbackObject
references its mCallbackGlobal
, which is the subframe's window, which references the subframe document via mDoc
, which via mFirstChild
/mNextSibling
chains reaches the button b
, which of course holds a ref to the JSEventHandler
.
Before bug 1523843 this specific chain could not arise, because even if the callable CCW were not gray it would then have mCallbackGlobal
as its actual global, so would be keeping it alive anyway.
But a similar leak can be produced before bug 1523843 by having the callback be non-gray (and in fact not a CCW) but making its mIncumbentJSGlobal
be a different global from the callback's global and entrain things as described above.
Assignee | ||
Comment 9•5 years ago
|
||
Assignee | ||
Comment 10•5 years ago
|
||
So really, JSEventHandler::IsBlackForCC
is kinda buggy. It says:
// We can claim to be black if all the things we reference are
// effectively black already.
but then only tests the color of mTypedHandler.Ptr()->mCallback
, but references all the other things that CallbackObject
traces or traverses: mCallbackGlobal
, mCreationStack
, mIncumbentJSGlobal
, and mIncumbentGlobal
.
We can pretty easily add color-testing for the other three JS members (mCallbackGlobal
, mCreationStack
, mIncumbentJSGlobal
). I guess mIncumbentJSGlobal
, if black, keeps mIncumbentGlobal
alive, so it's good enough to just check mIncumbentJSGlobal
's color.
Assignee | ||
Comment 11•5 years ago
|
||
Assignee | ||
Comment 12•5 years ago
|
||
Assignee | ||
Comment 13•5 years ago
|
||
What I don't know is how to write a test for this, or whether it's even sanely possible... Andrew, Olli, any ideas?
Comment 14•5 years ago
|
||
I don't think we have too good ways to test some particular leaks, - well if one can make the leak a shutdown leak, then it should be easier.
We should or could add some CC log analysis thingie.
Assignee | ||
Comment 15•5 years ago
|
||
Yeah, I can't really make this one a shutdown leak, because once the parent page is unloaded the callback is no longer black and the cycle gets collected.
Comment 16•5 years ago
|
||
One could perhaps utilize ghost window detection for the test case. mccr8 is more familiar with that.
But that is perhaps a tad heavy way.
Comment 17•5 years ago
|
||
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #13)
What I don't know is how to write a test for this, or whether it's even sanely possible... Andrew, Olli, any ideas?
I didn't read through the details in this bug very closely, but I've written some tests along these lines before. The basic idea is that you can hack up a weak reference to any (non-old DOM bindings) object by using it as a key in a weak map, then use SpecialPowers.exactGC() (with some additional forceCC and forceGC in the callback), and then do SpecialPowers.nondeterministicGetWeakMapKeys() to see if anything is still in the weak map. An example of this is js/xpconnect/tests/mochitest/test_weakmaps.html, but if you look for SpecialPowers.nondeterministicGetWeakMapKeys() you'll find a few more examples.
Comment 18•5 years ago
•
|
||
Ah, that is a nice trick. Had totally forgotten it.
Assignee | ||
Comment 19•5 years ago
|
||
Nice, the suggestion in comment 17 works beatifully for a test!
Comment 20•5 years ago
|
||
Assignee | ||
Comment 21•5 years ago
|
||
Comment on attachment 9075880 [details]
Bug 1561887 part 2. Fix cycle-collection skipping of JSEventHandler to work correctly. r=smaug
Beta/Release Uplift Approval Request
- User impact if declined: Memory leaks for the lifetime of the website on some sites.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Medium
- Why is the change risky/not risky? (and alternatives if risky): It's changing some cycle collector logic, which is always a bit risky. That said, I think as cycle collection patches go this is pretty safe.
- String changes made/needed:
Assignee | ||
Updated•5 years ago
|
Comment 22•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/955593e562a5
https://hg.mozilla.org/mozilla-central/rev/e4a2b09bc49c
Updated•5 years ago
|
Comment 23•5 years ago
|
||
Comment on attachment 9075878 [details]
Bug 1561887 part 1. Add an overload of JS::ObjectIsMarkedGray that works on TenuredHeap. r=sfink
Cycle collector fix to avoid hanging on to memory for too long on some websites. Thanks for writing a patch too. Approved for 69.0b4.
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 24•5 years ago
|
||
Fwiw, esr60 is affected too, for the more convoluted testcase. That said, I agree there's not much point in uplifting to 60.
It's worth uplifting to 68.
Comment 25•5 years ago
|
||
Sure, let's revisit 68 after this has had some bake time on Beta.
Assignee | ||
Comment 26•5 years ago
|
||
Comment on attachment 9075880 [details]
Bug 1561887 part 2. Fix cycle-collection skipping of JSEventHandler to work correctly. r=smaug
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Fixing a site-lifetime memory leak that is somewhat easy to trigger.
- User impact if declined: Unbounded memory usage growth on some websites.
- Fix Landed on Version: 70
- Risk to taking this patch: Medium
- Why is the change risky/not risky? (and alternatives if risky): It's changing some cycle collector logic, which is always a bit risky. That said, I think as cycle collection patches go this is pretty safe.
- String or UUID changes made by this patch: None
Assignee | ||
Updated•5 years ago
|
Comment 27•5 years ago
|
||
bugherder uplift |
Updated•5 years ago
|
Updated•5 years ago
|
Comment 28•5 years ago
|
||
I was trying to reproduce the bug based on the steps from comment 2, but I really can't understand step 4 (I click on the "file:// content" link over on the right of about:memory.)
Could you please clarify it to me?
Thanks.
Assignee | ||
Comment 29•5 years ago
|
||
The idea is to look for the memory report for the process that hosts the testcase in about:memory. In about:memory, once you have done a measurement, over on the right there is a box titled "Process index" that lists the processes with links you can click to go to the report for the relevant process. If you load the testcase from file://, that process will be labeled "file:// Content (pid NNNNN)" for some number NNNNN.
Updated•5 years ago
|
Comment 30•5 years ago
|
||
Verified as fixed on the latest Firefox Nightly 70.0a1 (2019-07-15) and on Firefox 69.0b4 on Windows 10 x64, Mac OS X 10.14 and on Ubuntu 18.04 x64.
Comment 31•5 years ago
|
||
Comment on attachment 9075878 [details]
Bug 1561887 part 1. Add an overload of JS::ObjectIsMarkedGray that works on TenuredHeap. r=sfink
This has baked on Nightly and Beta for a couple weeks without any known issues and has been verified by QA. Approved for 68.1esr.
Updated•5 years ago
|
Comment 32•5 years ago
|
||
bugherder uplift |
Updated•3 years ago
|
Description
•