XHR with unknown edge in CC graph keeps its reflector alive past ~CycleCollectedJSRuntime, which transitively leaks js::ScriptSource objects

NEW
Unassigned

Status

()

3 years ago
3 years ago

People

(Reporter: fitzgen, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(6 attachments, 2 obsolete attachments)

Discovered via the try push in bug 1211723 comment #76:

> Here is a try push without this patch series that puts all ScriptSources
> into a linked list on the runtime and asserts this list is empty upon
> destruction. If we are already leaking ScriptSources, it should show up in
> this push.
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=41855b016a72

Which results in the following assertion failure, confirming that the `ScriptSource`s are outliving their JSRuntime:

> 18:02:54    ERROR -  PROCESS-CRASH | runner.py | application crashed [@ mozilla::LinkedList<js::ScriptSource>::~LinkedList()]
> 18:02:54     INFO -  Crash dump filename: /var/folders/n5/8vlsdw_12tddqc4b5pnh50t000000w/T/tmp4PXDcq.mozrunner/minidumps/64028645-A8CD-432A-A777-5D8333F3F330.dmp
> 18:02:54     INFO -  Operating system: Mac OS X
> 18:02:54     INFO -                    10.10.5 14F27
> 18:02:54     INFO -  CPU: amd64
> 18:02:54     INFO -       family 6 model 69 stepping 1
> 18:02:54     INFO -       4 CPUs
> 18:02:54     INFO -  Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
> 18:02:54     INFO -  Crash address: 0x0
> 18:02:54     INFO -  Process uptime: 8 seconds
> 18:02:54     INFO -  Thread 0 (crashed)
> 18:02:54     INFO -   0  XUL!mozilla::LinkedList<js::ScriptSource>::~LinkedList() [LinkedList.h:41855b016a72 : 329 + 0x0]
> 18:02:54     INFO -      rax = 0x0000000000000000   rdx = 0x00007fff794851f8
> 18:02:54     INFO -      rcx = 0x0000000000000000   rbx = 0x00007fff79485c50
> 18:02:54     INFO -      rsi = 0x00004e0000004e00   rdi = 0x00004d0000004e03
> 18:02:54     INFO -      rbp = 0x00007fff5fbfee20   rsp = 0x00007fff5fbfee10
> 18:02:54     INFO -       r8 = 0x00007fff5fbfedc0    r9 = 0x00007fff78a30300
> 18:02:54     INFO -      r10 = 0x0000000000000001   r11 = 0x0000000000000206
> 18:02:54     INFO -      r12 = 0x000000010f613000   r13 = 0x000000010f616b78
> 18:02:54     INFO -      r14 = 0x000000010f616a48   r15 = 0x000000010f613440
> 18:02:54     INFO -      rip = 0x00000001065076f2
> 18:02:54     INFO -      Found by: given as instruction pointer in context
> 18:02:54     INFO -   1  XUL!JSRuntime::~JSRuntime() [LinkedList.h:41855b016a72 : 328 + 0x5]
> 18:02:54     INFO -      rbx = 0x000000010f616b18   rbp = 0x00007fff5fbfeea0
> 18:02:54     INFO -      rsp = 0x00007fff5fbfee30   r12 = 0x000000010f613000
> 18:02:54     INFO -      r13 = 0x000000010f616b78   r14 = 0x000000010f616a48
> 18:02:54     INFO -      r15 = 0x000000010f613440   rip = 0x00000001064b59f5
> 18:02:54 INFO - Found by: call frame info

If the source is in the compressed source set, and is accessed again after the JSRuntime shuts down, I think this would lead to a UAF.
Still can't reproduce locally, but here is a try push with some logging of ScriptSource::incref and ScriptSource::decref: https://treeherder.mozilla.org/#/jobs?repo=try&revision=224b244cd640
Circling back to this. Was able to reproduce on a loaner machine, but I don't have debug symbols. Here is a try push with a patch to upload full symbols to the debug server: https://treeherder.mozilla.org/#/jobs?repo=try&revision=01c80f405f72
Created attachment 8743548 [details]
stack when we usually run js::ScriptSourceObject::finalize

The leaks are coming about because we are not finalizing `js::ScriptSourceObject`s when doing a final GC under ~JSRuntime. Attached is the stack we *usually* see when running `js::ScriptSourceObject::finalize`. Now we just need to figure out where things go wrong when doing s DESTROY_RUNTIME gc.
Perhaps we are just leaking ScriptSourceObjects past the final gc in ~JSRuntime? This try push adds a bunch of printfs to dump the GC retaining paths of each ScriptSourceObject that survives the DESTROY_RUNTIME gc.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d169a5698ea5
(In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #4)
> Perhaps we are just leaking ScriptSourceObjects past the final gc in
> ~JSRuntime? This try push adds a bunch of printfs to dump the GC retaining
> paths of each ScriptSourceObject that survives the DESTROY_RUNTIME gc.
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=d169a5698ea5

That doesn't yield any retaining paths after the DESTROY_RUNTIME gc, but to be extra paranoid, here is one that dumps paths right before the runtime's final gc to verify whether this logging is working at all.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=06ebb69070b8

If there really aren't any leaking ScriptSourceObjects, then that means we aren't calling finalizers some of the time, despoite finalizing the object.

Comment 7

3 years ago
(In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #5)
> If there really aren't any leaking ScriptSourceObjects, then that means we
> aren't calling finalizers some of the time, despoite finalizing the object.

... we don't try to run those finalizers off-main-thread, do we?
Er -- actual better version of that try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c188660d113f

/me sighs
Ok! Progress! (Finally!)

For some reason, I can't find the leaking ScriptSourceObjects after the final runtime GC -- I think we force-free the zones? However, I have a set of SSOs found via iterating all cells from all zones *before* the final GC. This set of SSOs are only *potentially* leaking, because they can get finalized in the GC that is about to happen. Well, I'm also logging SSO finalizations, so I can remove from this set all the SSOs that did get finalized and what I am left with is the set of SSOs that did not get their finalizers run and presumably leaked past the final DESTROY_RUNTIME gc.

I counted 9 SSOs in this leaking set. That matches the 9 ScriptSources I counted in the mozilla::LinkedList<ScriptSource> on the JSRuntime that I added for debugging. Pretty strong evidence that these are indeed the ScriptSourceObjects that are leaking and whose finalizers are not being run, resulting in leaking ScriptSources as well!

Will attach their logged retaining paths in one second.
Created attachment 8744635 [details]
logged retaining paths of SSOs known to be leaking
* 0x1132763c0 -- The only thing that really stands out to me about these paths are the `aLambda` edge. I DXR'd and the only (non irrelevant test) uses in tree are lazy module import getters: https://dxr.mozilla.org/mozilla-central/search?tree=mozilla-central&q=aLambda&redirect=true

* 0x113276a00 -- The `scheduleWalkerLoop` and `schedulePromise` edges point to toolkit/modules/Promise-backend.js which IIRC is the implementation Promise.jsm. This file uses `defineLazyServiceGetter` which uses the `XPCOMUtils.defineLazyGetter` hinted at by the previous leaking SSO. https://dxr.mozilla.org/mozilla-central/rev/fc15477ce628599519cb0055f52cc195d640dc94/toolkit/modules/Promise-backend.js#56 Promise.jsm is used in the devtools server/transport which Marionette builds on top of. Perhaps this is why we only see this failure in marionette tests?

* 0x113276a40 -- Seeing `resolve` and `walkerLoop` edges. 99% sure this is more Promise.jsm stuff.

The others all seem to be self-hosting related, which makes sense in that if we are leaking any self hosted functions (which leaking Promise.jsm stuff is sure to do since there are arrays and bound functions) then we end up leaking the self hosted sources too.

I think we may need an "abort/cancel all promises for abrubt shutdown" function on Promise.jsm or something so we can break these retaining paths and delete the lazy module loaders.
Created attachment 8745444 [details]
leaking promises and dom exceptions

mAllocationStack: this is a DOM promise holding a SavedFrame stack alive

mStack: this is a DOMException holding a SavedFrame stack alive

The weakmap entry values: not 100% sure where these are coming from yet

Newest theory: the DOMExceptions and Promises are held by some runnable in Gecko's event queue, which are never going to run because we are about to restart. Perhaps the nsIAppStartup.quit method needs to clear the event queue?

It owuld really help if I could get CC logs out of marionette, but it seems to spawn firefox with a fresh environment :-/ I think it is time to hack up the test runner...
Created attachment 8745542 [details]
cc-edges.29089-8.log

Modified mozrunner to set up the CC logging environment variables that wouldn't get passed through marionette's test runner.

Here is the CC's path to the first root resulting in a leaking ScriptSourceObject:

> 0x121e83290 [nsXPCWrappedJS (nsIAsyncShutdownBarrier)]
>     --[mJSObj]--> 0x1254741a0 [JS Object (Object)]
>     --[_moduleBarrier]--> 0x125471300 [JS Object (Proxy)]
>     --[private]--> 0x125474060 [JS Object (Object)]
>     --[_promiseToBlocker]--> 0x125423e80 [JS Object (Map)]
>     --[value]--> 0x125440520 [JS Object (Object)]
>     --[unboxed_object]--> 0x125477ab0 [JS Object (Function - Barrier/this.client.addBlocker/blocker.getOrigin)]
>     --[fun_environment]--> 0x125845800 [JS Object (Call)]
>     --[topFrame]--> 0x125475370 [JS Object (XPCWrappedNative_NoHelper)]
>     --[js::GetObjectPrivate(obj)]--> 0x121e41ac0 [XPCWrappedNative]
>     --[mIdentity]--> 0x121ffbe80 [JSStackFrame]
>     --[mStack]--> 0x125476400 [JS Object (SavedFrame)]
> 
>     Root 0x121e83290 is a ref counted object with 1 unknown edge(s).
>     known edges:
>        0x121e83290 [nsXPCWrappedJS (nsIAsyncShutdownBarrier)] --[self]--> 0x121e83290

Here is the next:

> 0x121e83290 [nsXPCWrappedJS (nsIAsyncShutdownBarrier)]
>     --[mJSObj]--> 0x1254741a0 [JS Object (Object)]
>     --[_moduleBarrier]--> 0x125471300 [JS Object (Proxy)]
>     --[private]--> 0x125474060 [JS Object (Object)]
>     --[_conditionToPromise]--> 0x125423e50 [JS Object (Map)]
>     --[value]--> 0x1254719a0 [JS Object (Proxy)]
>     --[private]--> 0x125e31d80 [JS Object (Object)]
>     --[{private:internals:14}]--> 0x125e5f040 [JS Object (Object)]
>     --[handlers]--> 0x125e6f040 [JS Object (Array)]
>     --[objectElements[0]]--> 0x125e74a40 [JS Object (Object)]
>     --[onResolve]--> 0x1254719e0 [JS Object (Proxy)]
>     --[private]--> 0x125477bf0 [JS Object (Function - PromiseSet.prototype.add/<)]
>     --[fun_environment]--> 0x125e371c0 [JS Object (Call)]
>     --[indirection]--> 0x1254719c0 [JS Object (Proxy)]
>     --[private]--> 0x125474560 [JS Object (Object)]
>     --[resolve]--> 0x125477b50 [JS Object (Function)]
>     --[nativeReserved[0]]--> 0x1254753a0 [JS Object (Promise)]
>     --[UnwrapDOMObject(obj)]--> 0x124e79380 [Promise]
>     --[mAllocationStack]--> 0x1254765e0 [JS Object (SavedFrame)]
> 
>     Root 0x121e83290 is a ref counted object with 1 unknown edge(s).
>     known edges:
>        0x121e83290 [nsXPCWrappedJS (nsIAsyncShutdownBarrier)] --[self]--> 0x121e83290

And the last one:

> 0x121e83290 [nsXPCWrappedJS (nsIAsyncShutdownBarrier)]
>     --[mJSObj]--> 0x1254741a0 [JS Object (Object)]
>     --[_moduleBarrier]--> 0x125471300 [JS Object (Proxy)]
>     --[private]--> 0x125474060 [JS Object (Object)]
>     --[_conditionToPromise]--> 0x125423e50 [JS Object (Map)]
>     --[key]--> 0x125471880 [JS Object (Proxy)]
>     --[private]--> 0x125477880 [JS Object (Function - Sanitizer.onStartup</<)]
>     --[fun_environment]--> 0x125e37220 [JS Object (Call)]
>     --[callee_slot]--> 0x125e27a40 [JS Object (Function - Sanitizer.onStartup<)]
>     --[fun_environment]--> 0x125464490 [JS Object (Block)]
> 
>     Root 0x121e83290 is a ref counted object with 1 unknown edge(s).
>     known edges:
>        0x121e83290 [nsXPCWrappedJS (nsIAsyncShutdownBarrier)] --[self]--> 0x121e83290

------------------------------------------------------------------------------------------------

These three objects, which are rooted in the GC by the CC and "live" past the JSRuntime's final GC, are keeping alive about six ScriptSourceObjects, which then leak 1-6 ScriptSources permanently since the SSO's finalizers don't run.

Phew!
Marco, Yoric:

The nsIAsyncShutdownBarrier added in bug 1244650 (commit dae4f2a15d70) is holding JSObjects alive during the JSRuntime's final GC, which results in the whole set of other JSObjects reachable from that original leaking set to leak, which prevents the finalizers of those things from running, which ends up leaking ScriptSource instances and their associated source text. See comment 13.

https://dxr.mozilla.org/mozilla-central/rev/fc15477ce628599519cb0055f52cc195d640dc94/toolkit/components/places/Shutdown.h#111

Is there any way we can (a) not hold any JSObjects in this nsIAsyncShutdownBarrier at all, or (b) clear them out at some point before JS_DestroyRuntime gets called?

This is blocking me right now, so your help is very appreciated. Thanks!
Flags: needinfo?(mak77)
Flags: needinfo?(dteller)
Blocks: 1244650
I assume marionette here is just opening/closing Firefox and thus all the shutdown topics are properly notified.
That is strange, we null out both mParentClient and mBarrier at profile-before-change (cause the blocker is spinning over it)... on the other side those are nsMainThreadPtrHandle and, if we are off the main-thread, those could be released "at a later time", that means the runnable freeing those could be scheduled later than we expect...
I don't think setting those to null earlier (like at the end of BlockShutdown) would help, would it? We would still enqueue the proxy release runnable...
We need both the parent and the barrier to be able to properly block shutdown, and since we can be invoked off the main thread, we need to proxy release.

I really don't have good ideas atm on how to better solve this in a thread-safe manner...
Flags: needinfo?(mak77)
At which point in shutdown does the final GC happen? could we spin the events loop before doing it to be sure all proxy releases are done?
I wonder if there could be idiom problems with event loop spinning now that we have the microtask queue?  nsThread::ProcessNextEvent on the main thread will always end up invoking CycleCollectedJSRuntime::AfterProcessTask which performs a microtask checkpoint even if no event was processed.  If the micro-task queue enqueues an event, this can end up faking out things like NS_ProcessPendingEvents() which breaks as soon as !processedEvent.  (And ShutdownXPCOM does use NS_ProcessPendingEvents).

If this is happening, then modifying the NS_ProcessPendingEvents loop's clause:
    if (NS_FAILED(rv) || !processedEvent) {
      break;
    }
to
    if (NS_FAILED(rv) || (!processedEvent && !hasPendingEvents(aThread)) {
      break;
    }
should fix the problem.

At first glance it's not clear how to get the event loop into this state since the microtask queue always runs to completion and it should run after every event.  But it wouldn't surprise me if given the way native event loops invert event processing that it could happen somehow.
Marionette is using nsIAppStartup.quit() with the eRestart flag set in the failing test case.

It would make sense to me to simply empty the event and micro task queues in this situation, but I don't know the full repercussions this would have on gecko code.

The GC happens under ~JSRuntime, which is under JS_DestroyRuntime, which is under ~CycleCollectedJSRuntime[0]. That presumably runs when the CC determines that there are no more references to the CycleCollectedJSRuntime instance. Perhaps the nsIAsyncShutdownBarrier should hold a reference to the CycleCollectedJSRuntime until it finishes its work?

[0] https://dxr.mozilla.org/mozilla-central/source/xpcom/base/CycleCollectedJSRuntime.cpp#471
(In reply to Andrew Sutherland [:asuth] from comment #17)
> If this is happening, then modifying the NS_ProcessPendingEvents loop's
> clause:
>     if (NS_FAILED(rv) || !processedEvent) {
>       break;
>     }
> to
>     if (NS_FAILED(rv) || (!processedEvent && !hasPendingEvents(aThread)) {
>       break;
>     }
> should fix the problem.

This change had no effect on the leaks, unfortunately.
For posterity:

Originally, I wasn't able to reproduce this locally, only on my loaner. Using my local builds on the loaner wouldn't reproduce either.

I was able to start reproducing the failure locally with this change:

--- a/testing/marionette/driver.js
+++ b/testing/marionette/driver.js
@@ -2753,6 +2753,12 @@ GeckoDriver.prototype.quitApplication = function(cmd, resp) {
   resp.send();
 
   this.sessionTearDown();
+
+  try {
+    let promiseJsm = Cu.import("resource://gre/modules/Promise.jsm", {});
+    promiseJsm.PromiseWalker = null;
+  } catch (_) { }
+
   Services.startup.quit(flags);
 };

Clearly this change will break various functionality, but we shouldn't end up leaking either way.
Ok, I think the Promise related issues are actually a /different/ bug of the same class.

Here is what I am getting as the CC's paths to the GC's roots keeping ScriptSourceObjects alive past the final GC and therefore leaking the ScriptSource objects on my loaner machine without the Promise.jsm changes:

A bunch of SSOs kept alive from this object:

0x127794800 [DOMEventTargetHelper ]
    --[mListenerManager]--> 0x126fc5fb0 [EventListenerManager]
    --[mListeners event=onerror listenerType=1 [i]]--> 0x1253a0670 [JSEventHandler handlerName=onerror]
    --[mTypedHandler.Ptr()]--> 0x1254855c0 [CallbackObject]
    --[mCreationStack]--> 0x12a500460 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x12a5002e0 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x12a500280 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x12a500220 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x12a5001c0 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x12a500160 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x1266c4160 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x1266c4100 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x1266c40a0 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x126398fa0 [JS Object (SavedFrame)]
    --[global]--> 0x114d45d80 [JS Object (BackstagePass)]

    Root 0x127794800 is a ref counted object with 1 unknown edge(s).
    known edges:
       0x128ed7a60 [JS Object (XMLHttpRequest)] --[UnwrapDOMObject(obj)]--> 0x127794800

A handful more SSOs kept alive from this one:

0x127794800 [DOMEventTargetHelper ]
    --[mListenerManager]--> 0x126fc5fb0 [EventListenerManager]
    --[mListeners event=onerror listenerType=1 [i]]--> 0x1253a0670 [JSEventHandler handlerName=onerror]
    --[mTypedHandler.Ptr()]--> 0x1254855c0 [CallbackObject]
    --[mCallback]--> 0x12a55ab00 [JS Object (Function - fetchRegionDefault/</request.onerror)]

    Root 0x127794800 is a ref counted object with 1 unknown edge(s).
    known edges:
       0x128ed7a60 [JS Object (XMLHttpRequest)] --[UnwrapDOMObject(obj)]--> 0x127794800

And those two GC roots are keeping alive all the leaking SSOs.
(In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #18)
> The GC happens under ~JSRuntime, which is under JS_DestroyRuntime, which is
> under ~CycleCollectedJSRuntime[0]. That presumably runs when the CC
> determines that there are no more references to the CycleCollectedJSRuntime
> instance. Perhaps the nsIAsyncShutdownBarrier should hold a reference to the
> CycleCollectedJSRuntime until it finishes its work?

I'm still trying to understand the issue. The implementation of nsIAsyncShutdownBarrier is in JS, so how can the JSRuntime be taken down while we're still running JS code?

Also, I'm almost sure that we still hold instances of DOM Promise, which themselves require the CycleCollectedJSRuntime.
Flags: needinfo?(dteller) → needinfo?(nfitzgerald)
Created attachment 8747785 [details] [diff] [review]
Remove event handlers from `request` in `fetchCountryCode` and `fetchRegionDefault`

When we didn't remove these handlers, we ended up leaking a bunch of stuff past
shutdown. We should really fix XHR to not hold GC roots past shutdown and
JS_DestroyRuntime, but this patch unblocks my other work.
Attachment #8747785 - Flags: review?(MattN+bmo)
Created attachment 8747794 [details] [diff] [review]
Clear JS holders in ~CycleCollectedJSRuntime; r=?

:smaug says this may be the more general fix. We'll find out soon...

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e64ff886f0e1
Created attachment 8747827 [details] [diff] [review]
Clear JS holders in ~CycleCollectedJSRuntime; r=?

Fix build.

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8fa0d34f2efc
Attachment #8747794 - Attachment is obsolete: true
Flags: needinfo?(nfitzgerald)
Comment on attachment 8747785 [details] [diff] [review]
Remove event handlers from `request` in `fetchCountryCode` and `fetchRegionDefault`

Review of attachment 8747785 [details] [diff] [review]:
-----------------------------------------------------------------

This seems to be paving over a platform problem since most code in the tree doesn't null out its listeners therefore this is just going to postpone the problem to some other listener. i.e. I think we shouldn't need to null out the listeners as a general pattern so changing this one consumer to do so just adds confusion to readers about why we don't do this elsewhere.

I haven't reviewed search stuff in a long time so I'll defer to Florian who is more of an owner of this code since I'm not a fan of this one-off workaround. At a minimum I think `cleanUpAndResolve` should have a comment explaining why we're doing this but don't for other browser XHR consumers.

::: toolkit/components/search/nsSearchService.js
@@ +699,3 @@
>      // This notification is just for tests...
>      Services.obs.notifyObservers(request, SEARCH_SERVICE_TOPIC, "geoip-lookup-xhr-starting");
>      request.timeout = 100000; // 100 seconds as the last-chance fallback

Did you try lowering this timeout to 1 second to see if it also fixes the problem? Maybe the XHR is still waiting for a response by the time the test ends since automation points this to a fake hostname. Or maybe you're right that the XHR would hold the reference regardless of the XHR state.
Attachment #8747785 - Flags: review?(MattN+bmo) → review?(florian)
Comment on attachment 8747785 [details] [diff] [review]
Remove event handlers from `request` in `fetchCountryCode` and `fetchRegionDefault`

I agree with everything Matt said in comment 26: this seems to be a workaround for a platform issue, and it'll be confusing to anybody looking at this code, so either a better solution should be found on the platform side, or a very descriptive comment should be added to avoid confusion.
Attachment #8747785 - Flags: review?(florian) → review-
(In reply to Matthew N. [:MattN] (behind on reviews) from comment #26)
> Comment on attachment 8747785 [details] [diff] [review]
> Remove event handlers from `request` in `fetchCountryCode` and
> `fetchRegionDefault`
> 
> Review of attachment 8747785 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This seems to be paving over a platform problem

True and fair. Fortunately :smaug's suggestion worked!

> I think we shouldn't need to null out the listeners as a general pattern

You won't leak at the platform level (after this patch, at the very least), but you may still end up holding lots of objects alive much longer than you might have expected that could have otherwise been collected. Garbage collectors don't make memory management go away, they just make it so when you mess up you end up bloating memory rather than creating critical security bugs.
Comment on attachment 8747827 [details] [diff] [review]
Clear JS holders in ~CycleCollectedJSRuntime; r=?

:smaug, this patch worked! Do you want to review this, or would you prefer to bounce it to another reviewer since you pretty much authored this patch?
Attachment #8747827 - Flags: review?(bugs)
Attachment #8747827 - Flags: review?(bugs) → review+
Thanks Andrew (I would have forwarded the request to you anyhow)
Thanks!
Need to null check the JS class map here, since that map goes away in ~XPCJSRuntime which happens before ~CycleCollectedJSRuntime which this XPCWrappedJSClass is being GC'd and destroyed under:

https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCWrappedJSClass.cpp#172
Flags: needinfo?(nfitzgerald)
Created attachment 8748330 [details] [diff] [review]
Null check GetWrappedJSClassMap in ~nsXPCWrappedJSClass

Going to keep mole-whacking until I get a green try push...

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1d6685a96be6
Attachment #8748330 - Flags: review?(continuation)
Attachment #8747785 - Attachment is obsolete: true
Comment on attachment 8748330 [details] [diff] [review]
Null check GetWrappedJSClassMap in ~nsXPCWrappedJSClass

This looks fine to me, but I'm not too sure what I know about XPConnect shut down. Maybe Blake could look at it.
Attachment #8748330 - Flags: review?(continuation) → review?(mrbkap)
Here is a try push with only this bug's patches, as the next ones in the series seem to have suffered from a bad rebase:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=db7454e8c214
Comment on attachment 8748330 [details] [diff] [review]
Null check GetWrappedJSClassMap in ~nsXPCWrappedJSClass

Review of attachment 8748330 [details] [diff] [review]:
-----------------------------------------------------------------

This seems a bit wonky to me in that it seems like we're wallpapering over a leak. That being said, after chatting on IRC with you and mccr8, I'm convinced that this isn't too bad of a wallpaper and it does make sense to do it.
Attachment #8748330 - Flags: review?(mrbkap) → review+
I think the ClearJSHolders is just changing us from "destroy all GC things, which clears all JS holders" into "clear all JS holders, then destroy all GC things", so we're just doing the same thing slightly earlier. It certainly could be hiding leaks but it is preserving the status quo at least.
Blocks: 1268992
No longer blocks: 1211723
Going to just continue leaking the script sources in bug 1211723, which avoids the UAF pitfalls that would otherwise happen.
Assignee: nfitzgerald → nobody
Status: ASSIGNED → NEW
Component: JavaScript Engine → XPCOM
Flags: needinfo?(nfitzgerald)
Summary: `ScriptSource`s outlive their JSRuntime in marionette tests → XHR with unknown edge in CC graph keeps its reflector alive past ~CycleCollectedJSRuntime, which transitively leaks js::ScriptSource objects
You need to log in before you can comment on or make changes to this bug.