1261106 - XHR with unknown edge in CC graph keeps its reflector alive past ~CycleCollectedJSRuntime, which transitively leaks js::ScriptSource objects

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Description

•

8 years ago

Discovered via the try push in bug 1211723 comment #76:

> Here is a try push without this patch series that puts all ScriptSources
> into a linked list on the runtime and asserts this list is empty upon
> destruction. If we are already leaking ScriptSources, it should show up in
> this push.
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=41855b016a72

Which results in the following assertion failure, confirming that the `ScriptSource`s are outliving their JSRuntime:

> 18:02:54    ERROR -  PROCESS-CRASH | runner.py | application crashed [@ mozilla::LinkedList<js::ScriptSource>::~LinkedList()]
> 18:02:54     INFO -  Crash dump filename: /var/folders/n5/8vlsdw_12tddqc4b5pnh50t000000w/T/tmp4PXDcq.mozrunner/minidumps/64028645-A8CD-432A-A777-5D8333F3F330.dmp
> 18:02:54     INFO -  Operating system: Mac OS X
> 18:02:54     INFO -                    10.10.5 14F27
> 18:02:54     INFO -  CPU: amd64
> 18:02:54     INFO -       family 6 model 69 stepping 1
> 18:02:54     INFO -       4 CPUs
> 18:02:54     INFO -  Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
> 18:02:54     INFO -  Crash address: 0x0
> 18:02:54     INFO -  Process uptime: 8 seconds
> 18:02:54     INFO -  Thread 0 (crashed)
> 18:02:54     INFO -   0  XUL!mozilla::LinkedList<js::ScriptSource>::~LinkedList() [LinkedList.h:41855b016a72 : 329 + 0x0]
> 18:02:54     INFO -      rax = 0x0000000000000000   rdx = 0x00007fff794851f8
> 18:02:54     INFO -      rcx = 0x0000000000000000   rbx = 0x00007fff79485c50
> 18:02:54     INFO -      rsi = 0x00004e0000004e00   rdi = 0x00004d0000004e03
> 18:02:54     INFO -      rbp = 0x00007fff5fbfee20   rsp = 0x00007fff5fbfee10
> 18:02:54     INFO -       r8 = 0x00007fff5fbfedc0    r9 = 0x00007fff78a30300
> 18:02:54     INFO -      r10 = 0x0000000000000001   r11 = 0x0000000000000206
> 18:02:54     INFO -      r12 = 0x000000010f613000   r13 = 0x000000010f616b78
> 18:02:54     INFO -      r14 = 0x000000010f616a48   r15 = 0x000000010f613440
> 18:02:54     INFO -      rip = 0x00000001065076f2
> 18:02:54     INFO -      Found by: given as instruction pointer in context
> 18:02:54     INFO -   1  XUL!JSRuntime::~JSRuntime() [LinkedList.h:41855b016a72 : 328 + 0x5]
> 18:02:54     INFO -      rbx = 0x000000010f616b18   rbp = 0x00007fff5fbfeea0
> 18:02:54     INFO -      rsp = 0x00007fff5fbfee30   r12 = 0x000000010f613000
> 18:02:54     INFO -      r13 = 0x000000010f616b78   r14 = 0x000000010f616a48
> 18:02:54     INFO -      r15 = 0x000000010f613440   rip = 0x00000001064b59f5
> 18:02:54 INFO - Found by: call frame info

If the source is in the compressed source set, and is accessed again after the JSRuntime shuts down, I think this would lead to a UAF.

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 1

•

8 years ago

Still can't reproduce locally, but here is a try push with some logging of ScriptSource::incref and ScriptSource::decref: https://treeherder.mozilla.org/#/jobs?repo=try&revision=224b244cd640

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 2

•

8 years ago

Circling back to this. Was able to reproduce on a loaner machine, but I don't have debug symbols. Here is a try push with a patch to upload full symbols to the debug server: https://treeherder.mozilla.org/#/jobs?repo=try&revision=01c80f405f72

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 3

•

8 years ago

Attached file stack when we usually run js::ScriptSourceObject::finalize — Details

The leaks are coming about because we are not finalizing `js::ScriptSourceObject`s when doing a final GC under ~JSRuntime. Attached is the stack we *usually* see when running `js::ScriptSourceObject::finalize`. Now we just need to figure out where things go wrong when doing s DESTROY_RUNTIME gc.

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 4

•

8 years ago

Perhaps we are just leaking ScriptSourceObjects past the final gc in ~JSRuntime? This try push adds a bunch of printfs to dump the GC retaining paths of each ScriptSourceObject that survives the DESTROY_RUNTIME gc.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d169a5698ea5

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 5

•

8 years ago

(In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #4)
> Perhaps we are just leaking ScriptSourceObjects past the final gc in
> ~JSRuntime? This try push adds a bunch of printfs to dump the GC retaining
> paths of each ScriptSourceObject that survives the DESTROY_RUNTIME gc.
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=d169a5698ea5

That doesn't yield any retaining paths after the DESTROY_RUNTIME gc, but to be extra paranoid, here is one that dumps paths right before the runtime's final gc to verify whether this logging is working at all.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=06ebb69070b8

If there really aren't any leaking ScriptSourceObjects, then that means we aren't calling finalizers some of the time, despoite finalizing the object.

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 6

•

8 years ago

Better version of the last try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=37c1797bd941

Jim Blandy :jimb

Comment 7

•

8 years ago

(In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #5)
> If there really aren't any leaking ScriptSourceObjects, then that means we
> aren't calling finalizers some of the time, despoite finalizing the object.

... we don't try to run those finalizers off-main-thread, do we?

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 8

•

8 years ago

Er -- actual better version of that try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c188660d113f

/me sighs

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 9

•

8 years ago

Ok! Progress! (Finally!)

For some reason, I can't find the leaking ScriptSourceObjects after the final runtime GC -- I think we force-free the zones? However, I have a set of SSOs found via iterating all cells from all zones *before* the final GC. This set of SSOs are only *potentially* leaking, because they can get finalized in the GC that is about to happen. Well, I'm also logging SSO finalizations, so I can remove from this set all the SSOs that did get finalized and what I am left with is the set of SSOs that did not get their finalizers run and presumably leaked past the final DESTROY_RUNTIME gc.

I counted 9 SSOs in this leaking set. That matches the 9 ScriptSources I counted in the mozilla::LinkedList<ScriptSource> on the JSRuntime that I added for debugging. Pretty strong evidence that these are indeed the ScriptSourceObjects that are leaking and whose finalizers are not being run, resulting in leaking ScriptSources as well!

Will attach their logged retaining paths in one second.

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 10

•

8 years ago

Attached file logged retaining paths of SSOs known to be leaking — Details

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 11

•

8 years ago

* 0x1132763c0 -- The only thing that really stands out to me about these paths are the `aLambda` edge. I DXR'd and the only (non irrelevant test) uses in tree are lazy module import getters: https://dxr.mozilla.org/mozilla-central/search?tree=mozilla-central&q=aLambda&redirect=true

* 0x113276a00 -- The `scheduleWalkerLoop` and `schedulePromise` edges point to toolkit/modules/Promise-backend.js which IIRC is the implementation Promise.jsm. This file uses `defineLazyServiceGetter` which uses the `XPCOMUtils.defineLazyGetter` hinted at by the previous leaking SSO. https://dxr.mozilla.org/mozilla-central/rev/fc15477ce628599519cb0055f52cc195d640dc94/toolkit/modules/Promise-backend.js#56 Promise.jsm is used in the devtools server/transport which Marionette builds on top of. Perhaps this is why we only see this failure in marionette tests?

* 0x113276a40 -- Seeing `resolve` and `walkerLoop` edges. 99% sure this is more Promise.jsm stuff.

The others all seem to be self-hosting related, which makes sense in that if we are leaking any self hosted functions (which leaking Promise.jsm stuff is sure to do since there are arrays and bound functions) then we end up leaking the self hosted sources too.

I think we may need an "abort/cancel all promises for abrubt shutdown" function on Promise.jsm or something so we can break these retaining paths and delete the lazy module loaders.

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 12

•

8 years ago

Attached file leaking promises and dom exceptions — Details

mAllocationStack: this is a DOM promise holding a SavedFrame stack alive

mStack: this is a DOMException holding a SavedFrame stack alive

The weakmap entry values: not 100% sure where these are coming from yet

Newest theory: the DOMExceptions and Promises are held by some runnable in Gecko's event queue, which are never going to run because we are about to restart. Perhaps the nsIAppStartup.quit method needs to clear the event queue?

It owuld really help if I could get CC logs out of marionette, but it seems to spawn firefox with a fresh environment :-/ I think it is time to hack up the test runner...

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 13

•

8 years ago

Attached file cc-edges.29089-8.log — Details

Modified mozrunner to set up the CC logging environment variables that wouldn't get passed through marionette's test runner.

Here is the CC's path to the first root resulting in a leaking ScriptSourceObject:

> 0x121e83290 [nsXPCWrappedJS (nsIAsyncShutdownBarrier)]
>     --[mJSObj]--> 0x1254741a0 [JS Object (Object)]
>     --[_moduleBarrier]--> 0x125471300 [JS Object (Proxy)]
>     --[private]--> 0x125474060 [JS Object (Object)]
>     --[_promiseToBlocker]--> 0x125423e80 [JS Object (Map)]
>     --[value]--> 0x125440520 [JS Object (Object)]
>     --[unboxed_object]--> 0x125477ab0 [JS Object (Function - Barrier/this.client.addBlocker/blocker.getOrigin)]
>     --[fun_environment]--> 0x125845800 [JS Object (Call)]
>     --[topFrame]--> 0x125475370 [JS Object (XPCWrappedNative_NoHelper)]
>     --[js::GetObjectPrivate(obj)]--> 0x121e41ac0 [XPCWrappedNative]
>     --[mIdentity]--> 0x121ffbe80 [JSStackFrame]
>     --[mStack]--> 0x125476400 [JS Object (SavedFrame)]
> 
>     Root 0x121e83290 is a ref counted object with 1 unknown edge(s).
>     known edges:
>        0x121e83290 [nsXPCWrappedJS (nsIAsyncShutdownBarrier)] --[self]--> 0x121e83290

Here is the next:

> 0x121e83290 [nsXPCWrappedJS (nsIAsyncShutdownBarrier)]
>     --[mJSObj]--> 0x1254741a0 [JS Object (Object)]
>     --[_moduleBarrier]--> 0x125471300 [JS Object (Proxy)]
>     --[private]--> 0x125474060 [JS Object (Object)]
>     --[_conditionToPromise]--> 0x125423e50 [JS Object (Map)]
>     --[value]--> 0x1254719a0 [JS Object (Proxy)]
>     --[private]--> 0x125e31d80 [JS Object (Object)]
>     --[{private:internals:14}]--> 0x125e5f040 [JS Object (Object)]
>     --[handlers]--> 0x125e6f040 [JS Object (Array)]
>     --[objectElements[0]]--> 0x125e74a40 [JS Object (Object)]
>     --[onResolve]--> 0x1254719e0 [JS Object (Proxy)]
>     --[private]--> 0x125477bf0 [JS Object (Function - PromiseSet.prototype.add/<)]
>     --[fun_environment]--> 0x125e371c0 [JS Object (Call)]
>     --[indirection]--> 0x1254719c0 [JS Object (Proxy)]
>     --[private]--> 0x125474560 [JS Object (Object)]
>     --[resolve]--> 0x125477b50 [JS Object (Function)]
>     --[nativeReserved[0]]--> 0x1254753a0 [JS Object (Promise)]
>     --[UnwrapDOMObject(obj)]--> 0x124e79380 [Promise]
>     --[mAllocationStack]--> 0x1254765e0 [JS Object (SavedFrame)]
> 
>     Root 0x121e83290 is a ref counted object with 1 unknown edge(s).
>     known edges:
>        0x121e83290 [nsXPCWrappedJS (nsIAsyncShutdownBarrier)] --[self]--> 0x121e83290

And the last one:

> 0x121e83290 [nsXPCWrappedJS (nsIAsyncShutdownBarrier)]
>     --[mJSObj]--> 0x1254741a0 [JS Object (Object)]
>     --[_moduleBarrier]--> 0x125471300 [JS Object (Proxy)]
>     --[private]--> 0x125474060 [JS Object (Object)]
>     --[_conditionToPromise]--> 0x125423e50 [JS Object (Map)]
>     --[key]--> 0x125471880 [JS Object (Proxy)]
>     --[private]--> 0x125477880 [JS Object (Function - Sanitizer.onStartup</<)]
>     --[fun_environment]--> 0x125e37220 [JS Object (Call)]
>     --[callee_slot]--> 0x125e27a40 [JS Object (Function - Sanitizer.onStartup<)]
>     --[fun_environment]--> 0x125464490 [JS Object (Block)]
> 
>     Root 0x121e83290 is a ref counted object with 1 unknown edge(s).
>     known edges:
>        0x121e83290 [nsXPCWrappedJS (nsIAsyncShutdownBarrier)] --[self]--> 0x121e83290

------------------------------------------------------------------------------------------------

These three objects, which are rooted in the GC by the CC and "live" past the JSRuntime's final GC, are keeping alive about six ScriptSourceObjects, which then leak 1-6 ScriptSources permanently since the SSO's finalizers don't run.

Phew!

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 14

•

8 years ago

Marco, Yoric:

The nsIAsyncShutdownBarrier added in bug 1244650 (commit dae4f2a15d70) is holding JSObjects alive during the JSRuntime's final GC, which results in the whole set of other JSObjects reachable from that original leaking set to leak, which prevents the finalizers of those things from running, which ends up leaking ScriptSource instances and their associated source text. See comment 13.

https://dxr.mozilla.org/mozilla-central/rev/fc15477ce628599519cb0055f52cc195d640dc94/toolkit/components/places/Shutdown.h#111

Is there any way we can (a) not hold any JSObjects in this nsIAsyncShutdownBarrier at all, or (b) clear them out at some point before JS_DestroyRuntime gets called?

This is blocking me right now, so your help is very appreciated. Thanks!

Flags: needinfo?(mak77)

Flags: needinfo?(dteller)

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Updated

•

8 years ago

Blocks: 1244650

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 15

•

8 years ago

I assume marionette here is just opening/closing Firefox and thus all the shutdown topics are properly notified.
That is strange, we null out both mParentClient and mBarrier at profile-before-change (cause the blocker is spinning over it)... on the other side those are nsMainThreadPtrHandle and, if we are off the main-thread, those could be released "at a later time", that means the runnable freeing those could be scheduled later than we expect...
I don't think setting those to null earlier (like at the end of BlockShutdown) would help, would it? We would still enqueue the proxy release runnable...
We need both the parent and the barrier to be able to properly block shutdown, and since we can be invoked off the main thread, we need to proxy release.

I really don't have good ideas atm on how to better solve this in a thread-safe manner...

Flags: needinfo?(mak77)

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 16

•

8 years ago

At which point in shutdown does the final GC happen? could we spin the events loop before doing it to be sure all proxy releases are done?

Andrew Sutherland [:asuth] (he/him)

Comment 17

•

8 years ago

I wonder if there could be idiom problems with event loop spinning now that we have the microtask queue?  nsThread::ProcessNextEvent on the main thread will always end up invoking CycleCollectedJSRuntime::AfterProcessTask which performs a microtask checkpoint even if no event was processed.  If the micro-task queue enqueues an event, this can end up faking out things like NS_ProcessPendingEvents() which breaks as soon as !processedEvent.  (And ShutdownXPCOM does use NS_ProcessPendingEvents).

If this is happening, then modifying the NS_ProcessPendingEvents loop's clause:
    if (NS_FAILED(rv) || !processedEvent) {
      break;
    }
to
    if (NS_FAILED(rv) || (!processedEvent && !hasPendingEvents(aThread)) {
      break;
    }
should fix the problem.

At first glance it's not clear how to get the event loop into this state since the microtask queue always runs to completion and it should run after every event.  But it wouldn't surprise me if given the way native event loops invert event processing that it could happen somehow.

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 18

•

8 years ago

Marionette is using nsIAppStartup.quit() with the eRestart flag set in the failing test case.

It would make sense to me to simply empty the event and micro task queues in this situation, but I don't know the full repercussions this would have on gecko code.

The GC happens under ~JSRuntime, which is under JS_DestroyRuntime, which is under ~CycleCollectedJSRuntime[0]. That presumably runs when the CC determines that there are no more references to the CycleCollectedJSRuntime instance. Perhaps the nsIAsyncShutdownBarrier should hold a reference to the CycleCollectedJSRuntime until it finishes its work?

[0] https://dxr.mozilla.org/mozilla-central/source/xpcom/base/CycleCollectedJSRuntime.cpp#471

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 19

•

8 years ago

(In reply to Andrew Sutherland [:asuth] from comment #17)
> If this is happening, then modifying the NS_ProcessPendingEvents loop's
> clause:
>     if (NS_FAILED(rv) || !processedEvent) {
>       break;
>     }
> to
>     if (NS_FAILED(rv) || (!processedEvent && !hasPendingEvents(aThread)) {
>       break;
>     }
> should fix the problem.

This change had no effect on the leaks, unfortunately.

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 20

•

8 years ago

For posterity:

Originally, I wasn't able to reproduce this locally, only on my loaner. Using my local builds on the loaner wouldn't reproduce either.

I was able to start reproducing the failure locally with this change:

--- a/testing/marionette/driver.js
+++ b/testing/marionette/driver.js
@@ -2753,6 +2753,12 @@ GeckoDriver.prototype.quitApplication = function(cmd, resp) {
   resp.send();
 
   this.sessionTearDown();
+
+  try {
+    let promiseJsm = Cu.import("resource://gre/modules/Promise.jsm", {});
+    promiseJsm.PromiseWalker = null;
+  } catch (_) { }
+
   Services.startup.quit(flags);
 };

Clearly this change will break various functionality, but we shouldn't end up leaking either way.

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 21

•

8 years ago

Ok, I think the Promise related issues are actually a /different/ bug of the same class.

Here is what I am getting as the CC's paths to the GC's roots keeping ScriptSourceObjects alive past the final GC and therefore leaking the ScriptSource objects on my loaner machine without the Promise.jsm changes:

A bunch of SSOs kept alive from this object:

0x127794800 [DOMEventTargetHelper ]
    --[mListenerManager]--> 0x126fc5fb0 [EventListenerManager]
    --[mListeners event=onerror listenerType=1 [i]]--> 0x1253a0670 [JSEventHandler handlerName=onerror]
    --[mTypedHandler.Ptr()]--> 0x1254855c0 [CallbackObject]
    --[mCreationStack]--> 0x12a500460 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x12a5002e0 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x12a500280 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x12a500220 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x12a5001c0 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x12a500160 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x1266c4160 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x1266c4100 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x1266c40a0 [JS Object (SavedFrame)]
    --[**UNKNOWN SLOT 5**]--> 0x126398fa0 [JS Object (SavedFrame)]
    --[global]--> 0x114d45d80 [JS Object (BackstagePass)]

    Root 0x127794800 is a ref counted object with 1 unknown edge(s).
    known edges:
       0x128ed7a60 [JS Object (XMLHttpRequest)] --[UnwrapDOMObject(obj)]--> 0x127794800

A handful more SSOs kept alive from this one:

0x127794800 [DOMEventTargetHelper ]
    --[mListenerManager]--> 0x126fc5fb0 [EventListenerManager]
    --[mListeners event=onerror listenerType=1 [i]]--> 0x1253a0670 [JSEventHandler handlerName=onerror]
    --[mTypedHandler.Ptr()]--> 0x1254855c0 [CallbackObject]
    --[mCallback]--> 0x12a55ab00 [JS Object (Function - fetchRegionDefault/</request.onerror)]

    Root 0x127794800 is a ref counted object with 1 unknown edge(s).
    known edges:
       0x128ed7a60 [JS Object (XMLHttpRequest)] --[UnwrapDOMObject(obj)]--> 0x127794800

And those two GC roots are keeping alive all the leaking SSOs.

David Teller [:Yoric] - still alive but not very active

Comment 22

•

8 years ago

(In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #18)
> The GC happens under ~JSRuntime, which is under JS_DestroyRuntime, which is
> under ~CycleCollectedJSRuntime[0]. That presumably runs when the CC
> determines that there are no more references to the CycleCollectedJSRuntime
> instance. Perhaps the nsIAsyncShutdownBarrier should hold a reference to the
> CycleCollectedJSRuntime until it finishes its work?

I'm still trying to understand the issue. The implementation of nsIAsyncShutdownBarrier is in JS, so how can the JSRuntime be taken down while we're still running JS code?

Also, I'm almost sure that we still hold instances of DOM Promise, which themselves require the CycleCollectedJSRuntime.

Flags: needinfo?(dteller) → needinfo?(nfitzgerald)

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 23

•

8 years ago

Attached patch Remove event handlers from `request` in `fetchCountryCode` and `fetchRegionDefault` (obsolete) — Details — Splinter Review

When we didn't remove these handlers, we ended up leaking a bunch of stuff past
shutdown. We should really fix XHR to not hold GC roots past shutdown and
JS_DestroyRuntime, but this patch unblocks my other work.

Attachment #8747785 - Flags: review?(MattN+bmo)

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 24

•

8 years ago

Attached patch Clear JS holders in ~CycleCollectedJSRuntime; r=? (obsolete) — Details — Splinter Review

:smaug says this may be the more general fix. We'll find out soon...

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e64ff886f0e1

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 25

•

8 years ago

Attached patch Clear JS holders in ~CycleCollectedJSRuntime; r=? — Details — Splinter Review

Fix build.

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8fa0d34f2efc

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Updated

•

8 years ago

Attachment #8747794 - Attachment is obsolete: true

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Updated

•

8 years ago

Flags: needinfo?(nfitzgerald)

Matthew N. [:MattN]

Comment 26

•

8 years ago

Comment on attachment 8747785 [details] [diff] [review]
Remove event handlers from `request` in `fetchCountryCode` and `fetchRegionDefault`

Review of attachment 8747785 [details] [diff] [review]:
-----------------------------------------------------------------

This seems to be paving over a platform problem since most code in the tree doesn't null out its listeners therefore this is just going to postpone the problem to some other listener. i.e. I think we shouldn't need to null out the listeners as a general pattern so changing this one consumer to do so just adds confusion to readers about why we don't do this elsewhere.

I haven't reviewed search stuff in a long time so I'll defer to Florian who is more of an owner of this code since I'm not a fan of this one-off workaround. At a minimum I think `cleanUpAndResolve` should have a comment explaining why we're doing this but don't for other browser XHR consumers.

::: toolkit/components/search/nsSearchService.js
@@ +699,3 @@
>      // This notification is just for tests...
>      Services.obs.notifyObservers(request, SEARCH_SERVICE_TOPIC, "geoip-lookup-xhr-starting");
>      request.timeout = 100000; // 100 seconds as the last-chance fallback

Did you try lowering this timeout to 1 second to see if it also fixes the problem? Maybe the XHR is still waiting for a response by the time the test ends since automation points this to a fake hostname. Or maybe you're right that the XHR would hold the reference regardless of the XHR state.

Attachment #8747785 - Flags: review?(MattN+bmo) → review?(florian)

Florian Quèze [:florian]

Comment 27

•

8 years ago

Comment on attachment 8747785 [details] [diff] [review]
Remove event handlers from `request` in `fetchCountryCode` and `fetchRegionDefault`

I agree with everything Matt said in comment 26: this seems to be a workaround for a platform issue, and it'll be confusing to anybody looking at this code, so either a better solution should be found on the platform side, or a very descriptive comment should be added to avoid confusion.

Attachment #8747785 - Flags: review?(florian) → review-

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 28

•

8 years ago

(In reply to Matthew N. [:MattN] (behind on reviews) from comment #26)
> Comment on attachment 8747785 [details] [diff] [review]
> Remove event handlers from `request` in `fetchCountryCode` and
> `fetchRegionDefault`
> 
> Review of attachment 8747785 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This seems to be paving over a platform problem

True and fair. Fortunately :smaug's suggestion worked!

> I think we shouldn't need to null out the listeners as a general pattern

You won't leak at the platform level (after this patch, at the very least), but you may still end up holding lots of objects alive much longer than you might have expected that could have otherwise been collected. Garbage collectors don't make memory management go away, they just make it so when you mess up you end up bloating memory rather than creating critical security bugs.

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 29

•

8 years ago

Comment on attachment 8747827 [details] [diff] [review]
Clear JS holders in ~CycleCollectedJSRuntime; r=?

:smaug, this patch worked! Do you want to review this, or would you prefer to bounce it to another reviewer since you pretty much authored this patch?

Attachment #8747827 - Flags: review?(bugs)

Andrew McCreight [:mccr8]

Updated

•

8 years ago

Attachment #8747827 - Flags: review?(bugs) → review+

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 30

•

8 years ago

Thanks Andrew (I would have forwarded the request to you anyhow)

Pulsebot

Comment 31

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/b343790604a9

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 32

•

8 years ago

Thanks!

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 33

•

8 years ago

I had to back this out for xpcshell failures: 
https://treeherder.mozilla.org/logviewer.html#?job_id=27144694&repo=mozilla-inbound

https://hg.mozilla.org/integration/mozilla-inbound/rev/104d808770ee

Flags: needinfo?(nfitzgerald)

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 34

•

8 years ago

Need to null check the JS class map here, since that map goes away in ~XPCJSRuntime which happens before ~CycleCollectedJSRuntime which this XPCWrappedJSClass is being GC'd and destroyed under:

https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCWrappedJSClass.cpp#172

Flags: needinfo?(nfitzgerald)

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 35

•

8 years ago

Attached patch Null check GetWrappedJSClassMap in ~nsXPCWrappedJSClass — Details — Splinter Review

Going to keep mole-whacking until I get a green try push...

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1d6685a96be6

Attachment #8748330 - Flags: review?(continuation)

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Updated

•

8 years ago

Attachment #8747785 - Attachment is obsolete: true

Andrew McCreight [:mccr8]

Comment 36

•

8 years ago

Comment on attachment 8748330 [details] [diff] [review]
Null check GetWrappedJSClassMap in ~nsXPCWrappedJSClass

This looks fine to me, but I'm not too sure what I know about XPConnect shut down. Maybe Blake could look at it.

Attachment #8748330 - Flags: review?(continuation) → review?(mrbkap)

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 37

•

8 years ago

Here is a try push with only this bug's patches, as the next ones in the series seem to have suffered from a bad rebase:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=db7454e8c214

Blake Kaplan (:mrbkap) (inactive)

Comment 38

•

8 years ago

Comment on attachment 8748330 [details] [diff] [review]
Null check GetWrappedJSClassMap in ~nsXPCWrappedJSClass

Review of attachment 8748330 [details] [diff] [review]:
-----------------------------------------------------------------

This seems a bit wonky to me in that it seems like we're wallpapering over a leak. That being said, after chatting on IRC with you and mccr8, I'm convinced that this isn't too bad of a wallpaper and it does make sense to do it.

Attachment #8748330 - Flags: review?(mrbkap) → review+

Andrew McCreight [:mccr8]

Comment 39

•

8 years ago

I think the ClearJSHolders is just changing us from "destroy all GC things, which clears all JS holders" into "clear all JS holders, then destroy all GC things", so we're just doing the same thing slightly earlier. It certainly could be hiding leaks but it is preserving the status quo at least.

Pulsebot

Comment 40

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/07b0959c9c35
https://hg.mozilla.org/integration/mozilla-inbound/rev/bbc385c27e0e

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 41

•

8 years ago

Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/b37420e92c68 for intermittent-yet-frequent xpcshell failures like https://treeherder.mozilla.org/logviewer.html#?job_id=27234831&repo=mozilla-inbound

Flags: needinfo?(nfitzgerald)

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Updated

•

8 years ago

Blocks: 1268992

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Updated

•

8 years ago

No longer blocks: 1211723

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 42

•

8 years ago

Going to just continue leaking the script sources in bug 1211723, which avoids the UAF pitfalls that would otherwise happen.

Assignee: nfitzgerald → nobody

Status: ASSIGNED → NEW

Component: JavaScript Engine → XPCOM

Flags: needinfo?(nfitzgerald)

Summary: `ScriptSource`s outlive their JSRuntime in marionette tests → XHR with unknown edge in CC graph keeps its reflector alive past ~CycleCollectedJSRuntime, which transitively leaks js::ScriptSource objects

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Andrew McCreight [:mccr8]

Updated

•

1 year ago

Component: XPCOM → Cycle Collector

stack when we usually run js::ScriptSourceObject::finalize 8 years ago Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] 4.62 KB, text/plain		Details
logged retaining paths of SSOs known to be leaking 8 years ago Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] 48.34 KB, text/plain		Details
leaking promises and dom exceptions 8 years ago Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] 17.32 KB, text/plain		Details
cc-edges.29089-8.log 8 years ago Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] 386.94 KB, text/x-log		Details
Remove event handlers from `request` in `fetchCountryCode` and `fetchRegionDefault` 8 years ago Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] 4.17 KB, patch	florian : review-	Details \| Diff \| Splinter Review
Clear JS holders in ~CycleCollectedJSRuntime; r=? 8 years ago Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] 1.10 KB, patch		Details \| Diff \| Splinter Review
Clear JS holders in ~CycleCollectedJSRuntime; r=? 8 years ago Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] 4.07 KB, patch	mccr8 : review+	Details \| Diff \| Splinter Review
Null check GetWrappedJSClassMap in ~nsXPCWrappedJSClass 8 years ago Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] 1012 bytes, patch	mrbkap : review+	Details \| Diff \| Splinter Review