Closed Bug 633016 Opened 13 years ago Closed 13 years ago

Memory corruption on scriptObject->getGlobal()

Categories

(Core :: JavaScript Engine, defect)

x86_64
Windows Server 2003
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- final+

People

(Reporter: sfink, Assigned: sfink)

Details

(Whiteboard: [hardblocker][has patch][fixed-in-tracemonkey])

Attachments

(2 files, 3 obsolete files)

I thought this was part of bug 631742 since I always observed it when enumerating JSDScripts, but it turns out this is a separate and larger issue.

The symptom is crashing in JSObject::getParent() because the 'this' object is corrupt. The stack includes a cross-compartment call using a JSScript* to set the compartment.

This has started happening fairly recently for me when running the Firebug test suite, and happens often and reliably.
I believe I have the root cause. Imagine these actions:

1. EvalKernel creates a new script object
2. JSD observes the script creation and constructs a JSDScript wrapper for it. Note that script->u.object == NULL because it's an eval-created script.
3. EvalKernel is done with the script and sets script->u.nextToGC. JSD does not get any sort of notification, as it would have had the script been murdered during GC.
4. JSD enumerates all of its JSDScripts and constructs a jsdScript object for each one (jsdScript::FromPtr).
5. To create this jsdScript object, it must do a cross-compartment call (for JSD_GetClosestPC which calls JS_LineNumberToPC).
6. The cross-compartment call uses the JSScript* object to set the scope chain via JS_EnterCrossCompartmentCallScript().
7. The latter function checks whether the script already has an owning object it can use as the scope. If not, it constructs a dummy global object in the appropriate compartment and uses that.

In our case, script->u.object is not NULL -- because script->u.nextToGC is set, and the space is shared. So it ends up using that pointer for its scope, and when it calls getGlobal() on it, it may or may not fall on its face depending on the current state of that object.

Note that in step 5, the cross-compartment call probably isn't strictly necessary, since js_LineNumberToPC doesn't really need to run in a matching compartment. But this still seems very brittle. What's happening is that JSD is allowed to observe JSScript objects when they are created, but assumes that it will be informed when they are destroyed. In the EvalKernel case, this isn't done.

I don't know if any/all of the other stuff in DestroyScript should be executed in this case.

Just to confirm, if I comment out the target->u.object check in JS_EnterCrossCompartmentCallScript (so it unconditionally creates a dummy global object) then my crash goes away.
blocking2.0: --- → ?
Assignee: nobody → general
Component: JavaScript Debugging APIs → JavaScript Engine
QA Contact: jsd → general
The cross-compartment call in jsd_xpc.cpp, as well as the JS_EnterCrossCompartmentCallScript() JSAPI entry, were added in changeset 6539f1fcda72 for bug 617870 and bug 609141. (On Dec 14 2010; not sure why they only just started getting triggered for me. FBTest change, maybe?)
(In reply to comment #2)
> The cross-compartment call in jsd_xpc.cpp, as well as the
> JS_EnterCrossCompartmentCallScript() JSAPI entry, were added in changeset
> 6539f1fcda72 for bug 617870 and bug 609141. (On Dec 14 2010; not sure why they
> only just started getting triggered for me. FBTest change, maybe?)

Jan 18 we added FBTest for jsd.asyncOn() when we discovered it failed in some cases. We filed Bug 626830 which landed on Feb 8 at 5pm. Every nightly build since has crashed. I guess that your fix on Bug 626830 revealed this one.
Assignee: general → sphink
Fixes the observed problem by invoking create/destroy script hooks when eval-cached scripts are reused/returned to the cache.

This patch also clears traps on eval-cached scripts when done with them.
Attachment #511254 - Flags: review?(lw)
Comment on attachment 511254 [details] [diff] [review]
Give cached scripts a lifetime from the point of view of JSD

Nice detectiving!  Only nits:

>     if (evalType == DIRECT_EVAL && caller->isFunctionFrame() && !caller->isEvalFrame())
>         script = EvalCacheLookup(cx, linearStr, caller, staticLevel, principals, scopeobj, bucket);
> 
>+    if (script)
>+        js_CallNewScriptHook(cx, script, NULL);
>+
>     /*
>      * We can't have a callerFrame (down in js::Execute's terms) if we're in
>      * global code (or if we're an indirect eval).
>      */
>     JSStackFrame *callerFrame = (staticLevel != 0) ? caller : NULL;
>     if (!script) {

Perhaps, to lexically highlight the dichotomy, you could make the "if (!script)" be the 'else' branch of the 'if (script)' you added (and move that callerFrame comment into the else branch).

Second, could you precede said 'if' with a comment, something to the effect of:

"Although, from the JS engine perspective, a script in the eval cache is alive, from a jsdbgapi user's perspective, each execution of eval() creates and destroys a script. In addition to hiding implementation details, this prevents jsdbgapi clients from calling JS_GetScriptObject on a scripts in the eval cache, which is invalid since script->u.object aliases script->u.nextToGC."

followed by a MUST_FLOW_THROUGH("destroy") (with MUST_FLOW_LABEL("destroy") right before the call to js_CallDestroyScriptHook).
Attachment #511254 - Flags: review?(lw) → review+
blocking2.0: ? → final+
Whiteboard: [hardblocker][has patch]
Comment on attachment 511254 [details] [diff] [review]
Give cached scripts a lifetime from the point of view of JSD

>     JSScript *script = NULL;
>     JSScript **bucket = EvalCacheHash(cx, linearStr);
>     if (evalType == DIRECT_EVAL && caller->isFunctionFrame() && !caller->isEvalFrame())
>         script = EvalCacheLookup(cx, linearStr, caller, staticLevel, principals, scopeobj, bucket);
> 
>+    if (script)
>+        js_CallNewScriptHook(cx, script, NULL);

Please put the +'ed lines in a braced then-block for the first if shown above, since only in that if's consequent could script become non-null.

/be
(In reply to comment #5)
> Comment on attachment 511254 [details] [diff] [review]
> Give cached scripts a lifetime from the point of view of JSD
> 
> Nice detectiving!  Only nits:
> 
> >     if (evalType == DIRECT_EVAL && caller->isFunctionFrame() && !caller->isEvalFrame())
> >         script = EvalCacheLookup(cx, linearStr, caller, staticLevel, principals, scopeobj, bucket);
> > 
> >+    if (script)
> >+        js_CallNewScriptHook(cx, script, NULL);
> >+
> >     /*
> >      * We can't have a callerFrame (down in js::Execute's terms) if we're in
> >      * global code (or if we're an indirect eval).
> >      */
> >     JSStackFrame *callerFrame = (staticLevel != 0) ? caller : NULL;
> >     if (!script) {
> 
> Perhaps, to lexically highlight the dichotomy, you could make the "if
> (!script)" be the 'else' branch of the 'if (script)' you added (and move that
> callerFrame comment into the else branch).

See comment 6. The "if (!script)" after callerFrame's decl cannot be moved up, since callerFrame is needed both in that if's consequent, and further in a successor block. But the if (script) I mentioned should go into the if (evalType == DIRECT_EVAL && ...) consequent, since otherwise script is null.

/be
OS: Linux → Windows Server 2003
This is the patch I intend to land when I'll be awake long enough to watch the tree. It just addresses the review comments, no need to re-review.

I mainly just wanted to point out to jjb that this is a behavior change that you might want to be aware of: previously, it looks like eval'd code would notify you of a script created the first time that text was eval'd, and probabilistically thereafter. (I think the cache gets flushed on GC or something.) Each create/destroy pair thus corresponded to at least one but possibly multiple eval calls. After this patch, you should see exactly one create/destroy pair for every eval.

(The cache key is actually more than just the text of the script; it incorporates various other bits of information that are required to be the same in order to reuse the same script object.)

It probably doesn't matter much most of the time, but I know you've worked a lot at digging into the exact meaning of the events that the JS engine throws at you, and this might explain some weirdness you've seen. Or not.

I'm sure someone else will correct the inevitable inaccuracies in my description above...
When eyeballing this patch for the last time before I landed it, it looked wrong. It appears that it would double-call js_CallDestroyScriptHook for any eval-cached script -- once when put onto the nextToGC list, once when GC trashed it.

Am I confusing myself, or is this one better? (I think I remember you explicitly saying that I'd need to avoid calling the destroy hooks.)
Attachment #511254 - Attachment is obsolete: true
Attachment #511309 - Attachment is obsolete: true
Attachment #511438 - Flags: review?(lw)
Comment on attachment 511438 [details] [diff] [review]
Give cached scripts a lifetime from the point of view of JSD

Oh gosh, yeah, we talked about it but I glossed right over it; good catch.
Attachment #511438 - Flags: review?(lw) → review+
http://hg.mozilla.org/tracemonkey/rev/1b55728f51ad
Whiteboard: [hardblocker][has patch] → [hardblocker][has patch][fixed-in-tracemonkey]
I think I know a (hopefully the) leak: the js_CallDestroyScriptHook call was removed from js_DestroyScriptFromGC assuming it was only called from js_DestroyScriptsToGC (which are scripts we've already called the hook on), but this function is also called from fun_finalize and script_finalize.

So the fix seems to be to switch fun_finalize/script_finalize to use js_DestroyScript.  This also makes me think that js_DestroyScriptFromGC is a bad name.  Perhaps we should name it js_DestroyScriptInEvalCache instead so there are no mistakes like this in the future?
Another note: you'll want s/destroy:/MUST_FLOW_LABEL(destroy)/ to avoid warnings.
(In reply to comment #14)
> Another note: you'll want s/destroy:/MUST_FLOW_LABEL(destroy)/ to avoid
> warnings.

Hm. That's a new one to me. It seems to be only used in one place currently (jstracer.cpp). Ok, will do.

(In reply to comment #13)
> I think I know a (hopefully the) leak: the js_CallDestroyScriptHook call was
> removed from js_DestroyScriptFromGC assuming it was only called from
> js_DestroyScriptsToGC (which are scripts we've already called the hook on), but
> this function is also called from fun_finalize and script_finalize.
> 
> So the fix seems to be to switch fun_finalize/script_finalize to use
> js_DestroyScript.  This also makes me think that js_DestroyScriptFromGC is a
> bad name.  Perhaps we should name it js_DestroyScriptInEvalCache instead so
> there are no mistakes like this in the future?

I really thought I had grepped for other occurrences, but I guess not.

I've made this change. (I called it js_DestroyCachedScript since it could be used by any hypothetical script caching mechanism, not just the eval cache.) Unfortunately, that's not the problem. Oddly, it appears to be excess calls to js_Call*Destroy*ScriptHook that cause the leak.

Oh well, shouldn't be too hard to track down.
Yow! Uh, timeless, I may need some help here...

I wasn't imagining things. The leak is induced by extra js_CallDestroyScriptHook invocations. This is because it invokes the JSDScript cleanup handler (jsd_DestroyScriptHookProc), which first checks whether it has a jsdScript cleanup handler (jsds_ScriptHookProc, stored in JSDContext.scriptHook) and invokes it if so, then nukes the JSDScript. The jsdScript cleanup handler destroys the jsdScript; it is the one that is not getting invoked and hence triggering our leak. It does not get invoked because it is not set (jsdc->scriptHook is NULL at the time when the leaked scripts are destroyed.)

The JSDScript is the only thing that contains a reference to the jsdScript, so there is no other way to clean it up. (But it's just stashed in a private data field, so the JSDScript doesn't know to clean it up directly.)

If the extra js_CallDestroyScriptHooks are *not* invoked, then some stale JSDScripts will hang around until the owning JSDContext is destroyed, at which time they will be iterated through and properly destroyed (as in, their private jsdScripts will be nuked too.)

What I'm puzzled about is why JSDContext.scriptHook is ever unset, since there are lots of ways of creating jsdScript interfaces to the JSDScripts, and scriptHook is the only way to get rid of them. (jsdScript::FromPtr is called when looking at a script for a stack frame or a value.)

On the other hand, it's unclear when JSDContext.scriptHook is supposed to be set. It can only be set via jsdService::SetScriptHook, which is never invoked from jsd code. That means it'll only get called if the debugger client calls it through the IDL. Which means that JSD will always leak memory unless the debugger client calls 'jsd.scriptHook = function () {}'.

Then again, all real debuggers probably do set the script hook, and it's only my test case that doesn't. (It does the minimum necessary to set a few breakpoints and step through some code.) So perhaps the memory leak isn't really a hardblocking issue.

I have a patch that seems to address all issues. With it, I am not seeing any leaks. I'll split it up into 3 parts, though, because it's reasonable to take any of them independently and the reviews for them are rather different.
Add a new js_DestroyCachedScript for use in the EvalKernel case, to leave the other js_DestroyScriptFromGC as they were. This doesn't seem to have any effect on leaking, but is obviously more correct.
Attachment #511657 - Flags: review?(lw)
On second thought, I split off one of the pieces into bug 633459, so now there are only two pieces here.
Attachment #511438 - Attachment is obsolete: true
Force the jsds_ScriptHookProc to be installed for every JSDContext so that script lifetimes can be tracked. This resolves the memory leak introduced/exposed by the other patch. (It's fine if the debugger client sets their own script hook; the logic is already there to handle that properly.)
Attachment #511663 - Flags: review?(timeless)
Attachment #511657 - Flags: review?(lw) → review+
cdleary-bot mozilla-central merge info:
http://hg.mozilla.org/mozilla-central/rev/1b55728f51ad
http://hg.mozilla.org/mozilla-central/rev/0c7b2e76db07 (backout)
Note: not marking as fixed because last changeset is a backout.
Comment on attachment 511663 [details] [diff] [review]
Set the script creation callback to manage jsdScripts

I've been recommended to switch to brendan for this review given timeless's timezone and the desire to have this make it into beta12.
Attachment #511663 - Flags: review?(timeless) → review?(brendan)
Comment on attachment 511663 [details] [diff] [review]
Set the script creation callback to manage jsdScripts

r=me, lets keep a review up for timeless to double check and we can land pending that due to scheduling pressure
Attachment #511663 - Flags: review?(timeless)
Attachment #511663 - Flags: review?(brendan)
Attachment #511663 - Flags: review+
Sorry, multitasking. r=me, please land, timeless will review later
http://hg.mozilla.org/tracemonkey/rev/ced73d81ce18
http://hg.mozilla.org/tracemonkey/rev/51e2af8b2698

timeless, let me know if I need to backout.
Whiteboard: [hardblocker][has patch] → [hardblocker][has patch][fixed-in-tracemonkey]
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Attachment #511663 - Flags: review?(timeless)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: