Closed Bug 1561887 Opened 5 years ago Closed 5 years ago

Memory leak in iframe when attaching functions to events

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla70

Tracking Flags:

Tracking

Status

firefox-esr60

---

wontfix

firefox-esr68

69+

fixed

firefox67

---

wontfix

firefox68

---

wontfix

firefox69

verified

firefox70

verified

People

(Reporter: grzegorz.w.1597, Assigned: bzbarsky)

References

(Regression)

Details

(Keywords: regression)

Attachments

(4 files)

index.html 5 years ago grzegorz.w.1597 2.46 KB, text/html		Details
Modified testcase that leaks before bug 1523843 5 years ago Boris Zbarsky [:bzbarsky] 2.02 KB, text/html		Details
Bug 1561887 part 1. Add an overload of JS::ObjectIsMarkedGray that works on TenuredHeap. r=sfink 5 years ago Boris Zbarsky [:bzbarsky] 47 bytes, text/x-phabricator-request	RyanVM : approval-mozilla-beta+ RyanVM : approval-mozilla-esr68+	Details \| Review
Bug 1561887 part 2. Fix cycle-collection skipping of JSEventHandler to work correctly. r=smaug 5 years ago Boris Zbarsky [:bzbarsky] 47 bytes, text/x-phabricator-request	RyanVM : approval-mozilla-beta+ RyanVM : approval-mozilla-esr68+	Details \| Review

grzegorz.w.1597

Reporter

Description

•

5 years ago

Attached file index.html — Details

User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0

Steps to reproduce:

create IFrame with button
attach function from global context to buttons onclick event, with local function work ok
remove iframe

Reproduce witch attached file:
Press "single" button, it will create iframe, attach action to button and remove it from body after 450 ms. "start" button will do it repeatedly in 500ms interval, pressing it again will stop it.
Unchecking "useBug" will change attaching global function to local function.

Leak start to appear after commit
https://hg.mozilla.org/mozilla-central/rev/1b528e3cac94793db32568726080b6cc1209b0e7
from bug 1523843. I bisected sources with attached script.
Commenting line

return aOptions.setExistingCompartment(data.compartment);
prevents memory leak

Workaround is using addEventListener or wrapping function call in closure.

Actual results:

Data from iframe is not freed.

Expected results:

Memory freed after removing iframe.

Adrian Florinescu [:aflorinescu]

Comment 1

•

5 years ago

:bz, according to the comment 0 bisection, this looks like a regression introduced by bug 1523843, can you please take a look?

Blocks: 1523843

status-firefox67: --- → affected

status-firefox68: --- → affected

status-firefox69: --- → affected

Component: Untriaged → DOM: Core & HTML

Flags: needinfo?(bzbarsky)

Keywords: regression

Product: Firefox → Core

Regressed by: 1523843

Neha Kochar [:neha]

Updated

•

5 years ago

status-firefox67: affected → wontfix

Boris Zbarsky [:bzbarsky]

Assignee

Comment 2

•

5 years ago

So just to make sure we're on the same page, here's what I see.

I load the testcase from a file:// URL; the "useBug" and "useEventListener" checkboxes are both checked.
I click the "start" button and wait for the iframe to flicker 20 times, then click the "stop" button.
I load about:memory in a separate tab, click "Minimize memory usage", then click "Measure".
I click on the "file:// content" link over on the right of about:memory.
I see a bunch of top(none)/detached/window(file:///Users/bzbarsky/test.html) window objects still live.

If I uncheck "useEventListener" before step 2, that does not seem to affect the behavior; I still get the live detached windows.

If I uncheck "useBug", then the detached windows get collected when I minimize memory, no matter whether "useEventListener" is checked or not.

This doesn't quite match my understanding of the comment 0 description, so I would like to understand whether we're really seeing different things or whether I misunderstood the description.

Flags: needinfo?(grzegorz.w.1597)

Boris Zbarsky [:bzbarsky]

Assignee

Comment 3

•

5 years ago

Oh, and I do see that creating a new compartment for the subframe while following steps 1-5 with "useBug" checked makes the problem go away, so this definitely seems like a regression from bug 1523843.

No longer blocks: 1523843

Status: UNCONFIRMED → NEW

Ever confirmed: true

Boris Zbarsky [:bzbarsky]

Assignee

Comment 4

•

5 years ago

OK, if I ask for a "full" GC/CC log in about:memory, the buttons inside the iframe show up as "garbage" in the trace after the ========== bit and get collected.

If I ask for a "concise" GC/CC log, then they don't show up in the trace after ========== at all (though they do show up earlier, via the GetParent() edge from their child CharacterData and via the mFirstChild edge from the <body>. Which you'd think would be enough to collect them, since their refcount is 2 (!).

Flags: needinfo?(bzbarsky)

Andrew McCreight [:mccr8]

Comment 5

•

5 years ago

If an object doesn't show up after ====, that means the number of references is equal to the refcount (as you noticed) and the object is not garbage. It isn't garbage because some other object holding onto it is being held alive. You need to use the find_roots.py script in my heapgraph repo to figure out what. https://github.com/amccreight/heapgraph/

If it is being collected from a verbose GC log but not a concise that likely means some CC optimization is incorrectly being applied. Probably the easiest way to figure this out is to run the find_roots.py script on the concise log to figure out what object is holding the buttons alive, then find the equivalent object in the full CC log and see what the extra reference is (assuming that it is a refcounted object acting as the root.). If a GCed object is holding the button alive, then you can see what find_roots.py on the GC log, and the same rooting object, says, with the GC log from the concise case.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 6

•

5 years ago

If I uncheck "useEventListener" before step 2, that does not seem to affect the behavior; I still get the live detached windows.

Ah, I see. This is due to a typo in the testcase. useEventListener starts out false, but the checkbox state is initially set to useBug, so it's checked. If I fix that state initialization, then the bug reproduces precisely when useEventListener is false and useBug is true.

I think we're definitely on the same page here. Thank you for finding this and for the bug report! Analysis coming up.

Flags: needinfo?(grzegorz.w.1597)

Boris Zbarsky [:bzbarsky]

Assignee

Comment 7

•

5 years ago

Thank you for finding this and for the bug report!

And for bisecting it to the right change, of course!

Boris Zbarsky [:bzbarsky]

Assignee

Comment 8

•

5 years ago

The issue in the provided testcase is that after bug 1523843 the callback and callback global of a CallbackObject may actually be from different (same-compartment) globals on the web. Before bug 1523843 this could only happen in system code with compartment-sharing...

In particular, this line in the testcase:

b.onclick = testFunction;

calls into the onclick setter from the subframe, so the current global in that setter is the subframe global. The function passed in, on the other hand, is from the parent global (before bug 1523843 we got a CCW from the subframe global to the parent global instead).

Now we go to cycle-collect this nsJSEventListener. Its CanSkip implementation returns true if the CallbackObject's callable is not gray. Before bug 1523843 this was false: the callable was the CCW, and the only thing holding on to the CCW was the CallbackObject itself, so it was in fact gray.

But now the callable is testFunction, which is of course black, because it's a global function in the parent window, so is definitely live. So we skip CCing the JSEventHandler, miss the edge to the CallbackObject, and now have a cycle we don't break: the CallbackObject references its mCallbackGlobal, which is the subframe's window, which references the subframe document via mDoc, which via mFirstChild/mNextSibling chains reaches the button b, which of course holds a ref to the JSEventHandler.

Before bug 1523843 this specific chain could not arise, because even if the callable CCW were not gray it would then have mCallbackGlobal as its actual global, so would be keeping it alive anyway.

But a similar leak can be produced before bug 1523843 by having the callback be non-gray (and in fact not a CCW) but making its mIncumbentJSGlobal be a different global from the callback's global and entrain things as described above.

Assignee: nobody → bzbarsky

Boris Zbarsky [:bzbarsky]

Assignee

Comment 9

•

5 years ago

Attached file Modified testcase that leaks before bug 1523843 — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 10

•

5 years ago

So really, JSEventHandler::IsBlackForCC is kinda buggy. It says:

  // We can claim to be black if all the things we reference are
  // effectively black already.

but then only tests the color of mTypedHandler.Ptr()->mCallback, but references all the other things that CallbackObject traces or traverses: mCallbackGlobal, mCreationStack, mIncumbentJSGlobal, and mIncumbentGlobal.

We can pretty easily add color-testing for the other three JS members (mCallbackGlobal, mCreationStack, mIncumbentJSGlobal). I guess mIncumbentJSGlobal, if black, keeps mIncumbentGlobal alive, so it's good enough to just check mIncumbentJSGlobal's color.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 11

•

5 years ago

Attached file Bug 1561887 part 1. Add an overload of JS::ObjectIsMarkedGray that works on TenuredHeap. r=sfink — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 12

•

5 years ago

Attached file Bug 1561887 part 2. Fix cycle-collection skipping of JSEventHandler to work correctly. r=smaug — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 13

•

5 years ago

What I don't know is how to write a test for this, or whether it's even sanely possible... Andrew, Olli, any ideas?

Flags: needinfo?(continuation)

Flags: needinfo?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 14

•

5 years ago

I don't think we have too good ways to test some particular leaks, - well if one can make the leak a shutdown leak, then it should be easier.
We should or could add some CC log analysis thingie.

Flags: needinfo?(bugs)

Boris Zbarsky [:bzbarsky]

Assignee

Comment 15

•

5 years ago

Yeah, I can't really make this one a shutdown leak, because once the parent page is unloaded the callback is no longer black and the cycle gets collected.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 16

•

5 years ago

One could perhaps utilize ghost window detection for the test case. mccr8 is more familiar with that.
But that is perhaps a tad heavy way.

Andrew McCreight [:mccr8]

Comment 17

•

5 years ago

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #13)

What I don't know is how to write a test for this, or whether it's even sanely possible... Andrew, Olli, any ideas?

I didn't read through the details in this bug very closely, but I've written some tests along these lines before. The basic idea is that you can hack up a weak reference to any (non-old DOM bindings) object by using it as a key in a weak map, then use SpecialPowers.exactGC() (with some additional forceCC and forceGC in the callback), and then do SpecialPowers.nondeterministicGetWeakMapKeys() to see if anything is still in the weak map. An example of this is js/xpconnect/tests/mochitest/test_weakmaps.html, but if you look for SpecialPowers.nondeterministicGetWeakMapKeys() you'll find a few more examples.

Flags: needinfo?(continuation)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 18

•

5 years ago

•

Edited

Ah, that is a nice trick. Had totally forgotten it.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 19

•

5 years ago

Nice, the suggestion in comment 17 works beatifully for a test!

Pulsebot

Comment 20

•

5 years ago

Pushed by bzbarsky@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/955593e562a5 part 1. Add an overload of JS::ObjectIsMarkedGray that works on TenuredHeap. r=sfink https://hg.mozilla.org/integration/autoland/rev/e4a2b09bc49c part 2. Fix cycle-collection skipping of JSEventHandler to work correctly. r=smaug

Boris Zbarsky [:bzbarsky]

Assignee

Comment 21

•

5 years ago

Comment on attachment 9075880 [details]
Bug 1561887 part 2. Fix cycle-collection skipping of JSEventHandler to work correctly. r=smaug

Beta/Release Uplift Approval Request

User impact if declined: Memory leaks for the lifetime of the website on some sites.
Is this code covered by automated tests?: Yes
Has the fix been verified in Nightly?: No
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Medium
Why is the change risky/not risky? (and alternatives if risky): It's changing some cycle collector logic, which is always a bit risky. That said, I think as cycle collection patches go this is pretty safe.
String changes made/needed:

Attachment #9075880 - Flags: approval-mozilla-beta?

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

5 years ago

Attachment #9075878 - Flags: approval-mozilla-beta?

Andrei Ciure[:aciure]

Comment 22

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/955593e562a5
https://hg.mozilla.org/mozilla-central/rev/e4a2b09bc49c

Status: NEW → RESOLVED

Closed: 5 years ago

status-firefox70: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla70

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

status-firefox68: affected → wontfix

status-firefox-esr60: --- → unaffected

status-firefox-esr68: --- → affected

Ryan VanderMeulen [:RyanVM]

Comment 23

•

5 years ago

Comment on attachment 9075878 [details]
Bug 1561887 part 1. Add an overload of JS::ObjectIsMarkedGray that works on TenuredHeap. r=sfink

Cycle collector fix to avoid hanging on to memory for too long on some websites. Thanks for writing a patch too. Approved for 69.0b4.

Attachment #9075878 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

Attachment #9075880 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

Flags: in-testsuite+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 24

•

5 years ago

Fwiw, esr60 is affected too, for the more convoluted testcase. That said, I agree there's not much point in uplifting to 60.

It's worth uplifting to 68.

Ryan VanderMeulen [:RyanVM]

Comment 25

•

5 years ago

Sure, let's revisit 68 after this has had some bake time on Beta.

status-firefox-esr60: unaffected → wontfix

tracking-firefox69: --- → +

tracking-firefox70: --- → +

tracking-firefox-esr68: --- → 69+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 26

•

5 years ago

Comment on attachment 9075880 [details]
Bug 1561887 part 2. Fix cycle-collection skipping of JSEventHandler to work correctly. r=smaug

ESR Uplift Approval Request

If this is not a sec:{high,crit} bug, please state case for ESR consideration: Fixing a site-lifetime memory leak that is somewhat easy to trigger.
User impact if declined: Unbounded memory usage growth on some websites.
Fix Landed on Version: 70
Risk to taking this patch: Medium
Why is the change risky/not risky? (and alternatives if risky): It's changing some cycle collector logic, which is always a bit risky. That said, I think as cycle collection patches go this is pretty safe.
String or UUID changes made by this patch: None

Attachment #9075880 - Flags: approval-mozilla-esr68?

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

5 years ago

Attachment #9075878 - Flags: approval-mozilla-esr68?

Cosmin Sabou [:CosminS]

Comment 27

•

5 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/0af17a6e1a99
https://hg.mozilla.org/releases/mozilla-beta/rev/ad83746d1c80

status-firefox69: affected → fixed

Maria Berlinger [:mberlinger], Services QA

Updated

•

5 years ago

Flags: qe-verify+

Brindusa Tot, Desktop QA

Updated

•

5 years ago

QA Whiteboard: [qa-triaged]

Hani Yacoub, Desktop QA

Comment 28

•

5 years ago

I was trying to reproduce the bug based on the steps from comment 2, but I really can't understand step 4 (I click on the "file:// content" link over on the right of about:memory.)

Could you please clarify it to me?
Thanks.

QA Whiteboard: [qa-triaged]

Flags: needinfo?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Assignee

Comment 29

•

5 years ago

The idea is to look for the memory report for the process that hosts the testcase in about:memory. In about:memory, once you have done a measurement, over on the right there is a box titled "Process index" that lists the processes with links you can click to go to the report for the relevant process. If you load the testcase from file://, that process will be labeled "file:// Content (pid NNNNN)" for some number NNNNN.

Flags: needinfo?(bzbarsky)

Brindusa Tot, Desktop QA

Updated

•

5 years ago

QA Whiteboard: [qa-triaged]

Hani Yacoub, Desktop QA

Comment 30

•

5 years ago

Verified as fixed on the latest Firefox Nightly 70.0a1 (2019-07-15) and on Firefox 69.0b4 on Windows 10 x64, Mac OS X 10.14 and on Ubuntu 18.04 x64.

Status: RESOLVED → VERIFIED

QA Whiteboard: [qa-triaged]

status-firefox69: fixed → verified

status-firefox70: fixed → verified

Flags: qe-verify+

Ryan VanderMeulen [:RyanVM]

Comment 31

•

5 years ago

Comment on attachment 9075878 [details]
Bug 1561887 part 1. Add an overload of JS::ObjectIsMarkedGray that works on TenuredHeap. r=sfink

This has baked on Nightly and Beta for a couple weeks without any known issues and has been verified by QA. Approved for 68.1esr.

Attachment #9075878 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

Attachment #9075880 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 32

•

5 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-esr68/rev/19f09bb6f2af
https://hg.mozilla.org/releases/mozilla-esr68/rev/bde8d8dcdcf1

status-firefox-esr68: affected → fixed

BugBot [:suhaib / :marco/ :calixte]

Updated

•

3 years ago

Has Regression Range: --- → yes

You need to log in before you can comment on or make changes to this bug.