Closed Bug 402087 Opened 17 years ago Closed 17 years ago

Setting GC-Zeal before JS_CompileScript() causes null-deref (obj->map == 0x0) in JSOP_DEFFUN

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: wes, Assigned: igor)

Details

(Keywords: fixed1.8.0.15, fixed1.8.1.12, testcase, Whiteboard: [sg:critical?])

Attachments

(5 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1.8) Gecko/20071008 Firefox/2.0.0.8
Build Identifier: Spidermonkey Post 1.8 -- CVS-most-recent on Friday, Oct 26 2007

If I set GC-Zeal in my embedding before calling JS_CompileScript(), many strange things happen. Sometimes, things will work perfectly. Other times, I will get nonsensical assertions in JS_ExecuteScript(), lose my C debugging symbols while walking through the code with gdb, etc.

If I unset GC-Zeal, or use the debugger to not allow js_GC to run, the program behaves as expected.  The good news is, the behaviour is quite deterministic. A particular embedding/JS-code combination tends to yield the same behaviour consistently.

I have reduced the input JavaScript and C embedding to the smallest forms (nearly trivial) which demonstrates this bug reliably on my platform (Solaris 10 zone, dual UltraSPARC-II CPUs, in a zone, distro gcc 3.4.3).

I have not been able to replicate this bug my modifying jc.c or using JS_EvaluateScript. I HAVE been able to replicate it using JS_CompileFileHandle().

I always use gcZeal = 2.

Reproducible: Always

Steps to Reproduce:
1. Compile latest and greatest CVS SpiderMonkey
2. Compile the attached program
3. Observe that it throws a non-sensical assertion
Actual Results:  
Assertion failure: OBJ_GET_CLASS(cx, obj) == &js_FunctionClass, at jsinterp.c:4878


Expected Results:  
"exec ok\n" on stdout

GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10"...
(gdb) run
Starting program: /export/home/wes/cvs/surelynx/src/jseng/go2
warning: Temporarily disabling breakpoints for unloaded shared library "/usr/lib/ld.so.1"
warning: Lowest section in /usr/lib/libpthread.so.1 is .dynamic at 00000074
warning: Lowest section in /usr/lib/libthread.so.1 is .dynamic at 00000074
warning: Lowest section in /usr/lib/libdl.so.1 is .dynamic at 00000094
Assertion failure: OBJ_GET_CLASS(cx, obj) == &js_FunctionClass, at jsinterp.c:4878

Program received signal SIGABRT, Aborted.
0xfeec11e4 in _lwp_kill () from /usr/lib/libc.so.1
(gdb) bt
#0  0xfeec11e4 in _lwp_kill () from /usr/lib/libc.so.1
#1  0xfee5fde0 in raise () from /usr/lib/libc.so.1
#2  0xfee40000 in abort () from /usr/lib/libc.so.1
#3  0xff288040 in JS_Assert (s=0xff2c2040 "OBJ_GET_CLASS(cx, obj) == &js_FunctionClass", file=0xff2c10c8 "jsinterp.c",
    ln=4878) at jsutil.c:63
#4  0xff1ffe48 in js_Interpret (cx=0x2a7a8, pc=0x328cc "}", result=0xffbff8e4) at jsinterp.c:4878
#5  0xff1e6dcc in js_Execute (cx=0x2a7a8, chain=0xfecc2000, script=0x32890, down=0x0, flags=0, result=0xffbffa20)
    at jsinterp.c:1633
#6  0xff18efd8 in JS_ExecuteScript (cx=0x2a7a8, obj=0xfecc2000, script=0x32890, rval=0xffbffa20) at jsapi.c:4714
#7  0x00010e50 in main (argc=1, argv=0xffbffaa4) at go2.c:37
(gdb)
Can't confirm the bug, but since it is GC-related, I'm CCing Igor.
My embedding (a much larger program than what I attached) stopped exhibiting its 
buggy symptoms this afternoon when I moved JS_SetGCZeal() between Compile and Execute, just like in the example I uploaded (originally it was between JS_InitStandardClasses() and JS_CompileFileHandle()). That's good, it means the work-around appears stable at first blush and I can continue developing.

I also tried a few other things in the debugger to "flush out" the problem -- so far no luck.  I tried putting a LocalRootScope around CompileScript, and I tried manually rooting everything return value from js_NewGCThing() from the debugger.

The technique I used from the debugger for manually rooting is correct, I THINK:

<hr/><pre>Breakpoint 6, js_NewGCThing (cx=0x2a7a8, flags=1, nbytes=8) at jsgc.c:1321
1321        rt = cx->runtime;
(gdb) finish
Run till exit from #0  js_NewGCThing (cx=0x2a7a8, flags=1, nbytes=8) at jsgc.c:1321
0xff28333c in js_NewString (cx=0x2a7a8, chars=0x324e0, length=1) at jsstr.c:2461
2461        str = (JSString *) js_NewGCThing(cx, GCX_STRING, sizeof(JSString));
Value returned is $28 = (void *) 0xfecc3bc8
(gdb) print JS_AddRoot(cx, ( *((int *)malloc(4)) = 0xfecc3bc8))
$29 = 1
(gdb) c
Continuing.<hr/></pre>

.....unfortunately, even doing this the symptoms of the bug did no go away if JS_SetGCZeal() was called before JS_CompileScript().

js_GC() only gets called twice in the sample program, though. Once during JS_CompileScript(), and once during JS_CreateNewScriptObject().

It's called in JS_CompileScript() from Statements->Statement->FunctionStmt->FunctionDef->FunctionBody->Statements->  js_PeekToken(); js_GetToken->js_AtomizeChars->js_AtomizeString->js_NewStringCopyN->js_NewString->js_NewGCThing->js_GC
I believe that Wes' embedding code is correct (I can minimize it a bit; here's the minimized version), and can reproduce this bug on my Mac OS X 10.4 machine.  Here's a slightly smaller version (no NSPR/JS_THREADSAFE stuff; the crash moves but still happens)
Attachment #286987 - Attachment is obsolete: true
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Setting GC-Zeal before JS_CompileScript() causes unstable runtime → Setting GC-Zeal before JS_CompileScript() causes null-deref (obj->map == 0x0) in JSOP_DEFFUN
Crowder and Wes just did a bunch of gdb-ing and figured out what's happening here:

* JS_CompileScript creates a JSFunction for the function "go".  At the time JS_CompileScript returns, this function is unrooted, and it's not an atom, so it's vulnerable to GC regardless of GC_LAST_DITCH's special behavior regarding gcKeepAtoms.

* JS_NewScriptObject allocates a new JSObject by calling js_NewGCThing.  With gczeal, this triggers garbage collection, which collects the function object.

Catch-22.  Without a NewScriptObject, the function is vulnerable to GC.  But calling JS_NewScriptObject triggers GC.

Possible fix:  JS_CreateNewScriptObject could temporarily root the JSScript before calling js_NewObject.  (A JSScript isn't a jsval, so this would require a JSTVU_TRACE TempValueRooter.)
This behavior is not exhibited by js.c, because it does not ever use JS_NewScriptObject() to root a script before executing it.  Execution of a script puts it on a StackFrame which causes it to be traced, thus circumventing this bug.
OS: Other → All
Hardware: Other → All
Version: unspecified → Trunk
More JSParsedObjectBox help needed. Igor, can you patch?

/be
Assignee: general → igor
(In reply to comment #7)
> More JSParsedObjectBox help needed.

No, this is API botch unrelated to JSParsedObjectBox. 

JS_CompileUCScriptForPrincipals and friends returns JSScript *. As the script object contains GC things, it must be rooted in presence of a possible direct or indirect invocations of the GC. Since the JSScript * is not a GC thing, there is no public API to root it. Thus the only thing that an embedding can do is to pass the script into SpiderMonkey API hoping that the API will ensure the rooting of the script argument on its own before the GC has a chance to run. This is exactly what JS_NewScriptObject failed to do. 

As Brian has pointed out, fixing this requires to add a temp value tracer for a script.


But since there is no public API to root the script. 

Attached patch fix v1 (obsolete) — Splinter Review
The fix adds JS_PUSH_TEMP_ROOT_SCRIPT and uses it in JS_NewScriptObject.
Attachment #287180 - Flags: review?(brendan)
Status: NEW → ASSIGNED
Flags: blocking1.9?
Flags: blocking1.8.1.10?
Comment on attachment 287180 [details] [diff] [review]
fix v1

>         if (!JS_SetPrivate(cx, obj, script))

JS_SetPrivate is infallible, and I think we count on that elsewhere. Simplify accordingly here and r/a=me. Thanks for fixing, the ancient API history here indeed is broken. First came JSScript (opaque, held ref-counted atoms), then JS_NewScriptObject later.

/be
Attachment #287180 - Flags: review?(brendan)
Attachment #287180 - Flags: review+
Attachment #287180 - Flags: approval1.9+
Attached patch fix v2Splinter Review
The new version assumes infallible JS_SetPrivate.
Attachment #287180 - Attachment is obsolete: true
Attachment #287336 - Flags: review+
Attachment #287336 - Flags: approval1.9+
I mark the bug as security-sensitive as it is a GC hazard and the browser code uses JS_NewScriptObject.  
Group: security
Comment on attachment 287336 [details] [diff] [review]
fix v2

The patch fixes a GC hazard.
Attachment #287336 - Flags: approvalM9?
Comment on attachment 287336 [details] [diff] [review]
fix v2

a=endgame drivers for M9
Attachment #287336 - Flags: approvalM9? → approvalM9+
I checked in the patch from comment 11 to CVS trunk:

http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1194346980&maxdate=1194347220&who=igor%mir2.org
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Flags: in-testsuite-
Flags: in-litmus-
Clearing blocking 1.9 ? flag as the bug was fixed based on 1.9 approval for the checked-in patch.
Flags: blocking1.9?
Flags: wanted1.8.1.x+
Flags: blocking1.8.1.12?
Flags: blocking1.8.1.12+
Whiteboard: [sg:critical?]
Igor, does the patch in this bug apply to the 1.8 branch? If so, can you request approval? If not, please attach a patch for the 1.8 branch.
Keywords: testcase
This is a version of the fix for 1.8.1 branch. The important part here is changesin jsapi.c. They come from trunk's patch as it applies as-is in jsapi.c. Then there is the extra temp root support for a script. It is a hand-merge with the trunk's version. 

The rest of changes is a synchronization of the way the temp roots are declared on the trunk with 1.8.1 branch. These are safe changes since they are essentially a move of code from one header to another and it is sufficient to compile the code to verify them.
Attachment #294455 - Flags: review?(brendan)
Comment on attachment 294455 [details] [diff] [review]
fix for 1.8 branch

Looks good, thanks.

/be
Attachment #294455 - Flags: review?(brendan) → review+
Attachment #294455 - Flags: approval1.8.1.12?
Comment on attachment 294455 [details] [diff] [review]
fix for 1.8 branch

approved for 1.8.1.12, a=dveditz for release-drivers
Attachment #294455 - Flags: approval1.8.1.12? → approval1.8.1.12+
I checked in the patch from comment 18 to MOZILLA_1_8_BRANCH:

Checking in jsapi.c;
/cvsroot/mozilla/js/src/jsapi.c,v  <--  jsapi.c
new revision: 3.214.2.39; previous revision: 3.214.2.38
done
Checking in jscntxt.h;
/cvsroot/mozilla/js/src/jscntxt.h,v  <--  jscntxt.h
new revision: 3.80.4.22; previous revision: 3.80.4.21
done
Checking in jsgc.c;
/cvsroot/mozilla/js/src/jsgc.c,v  <--  jsgc.c
new revision: 3.104.2.35; previous revision: 3.104.2.34
done
Checking in jsgc.h;
/cvsroot/mozilla/js/src/jsgc.h,v  <--  jsgc.h
new revision: 3.33.4.12; previous revision: 3.33.4.11
done
Checking in jsprvtd.h;
/cvsroot/mozilla/js/src/jsprvtd.h,v  <--  jsprvtd.h
new revision: 3.17.20.5; previous revision: 3.17.20.4
done
Keywords: fixed1.8.1.12
Wesley, can you verify that this is fixed for you now in branch?
Al, I'm not actually sure how to test this as I don't work with the browser.  Triggering the bug requires the JavaScript garbage collector to run at the exact moment between script compilation and rooting -- so likely between JS_Compile*Script*() and JS_NewScriptObject().  This may occur in JS_Evaluate*Script() as well.

If it is possible to turn on gcZeal in the browser, that would have the side-effect of testing this bug, as JS_NewScriptObject() will itself trigger JS_GC() when rt->gcZeal = 2. However, I do not think that Firefox 2 has a way to do this (gcZeal is a post-JS-1.7 feature) and from what I understand, Firefox 3 will not run with gcZeal turned on due to other GC hazards. (My Firefox 3 info is second-hand, I was chatting with either bz or timeless about it in IRC a couple of days ago).  I suppose it may be possible to build Firefox 2 with -DWAY_TOO_MUCH_GC (?), but I really don't know.

I _can_ confirm with 100% certainty that Igor's november patch fixed the problem from my perspective (a heavy JSAPI user using CVS-bleeding-edge).  I wrote a test case specifically for this bug.
(In reply to comment #23)
> If it is possible to turn on gcZeal in the browser

in about:config javascript.options.gczeal = 2 (or 1, or 0, to disable).  This will make your browser stupendously slow.
> in about:config javascript.options.gczeal = 2 (or 1, or 0, to disable).

On the trunk. On the 1.8 branch you need to test outside the browser with a GC compiler define.
Group: security
Attached patch for 1.8.0Splinter Review
Attachment #310216 - Flags: review?(igor)
Attachment #310216 - Flags: review?(igor) → review+
Comment on attachment 310216 [details] [diff] [review]
for 1.8.0

a=asac for 1.8.0 branch
Attachment #310216 - Flags: approval1.8.0.15+
a=asac for 1.8.0.15
MOZILLA_1_8_0_BRANCH:

Checking in js/src/jsapi.c;
/cvsroot/mozilla/js/src/jsapi.c,v  <--  jsapi.c
new revision: 3.214.2.11.2.16; previous revision: 3.214.2.11.2.15
done
Checking in js/src/jscntxt.h;
/cvsroot/mozilla/js/src/jscntxt.h,v  <--  jscntxt.h
new revision: 3.80.4.2.2.12; previous revision: 3.80.4.2.2.11
done
Checking in js/src/jsgc.c;
/cvsroot/mozilla/js/src/jsgc.c,v  <--  jsgc.c
new revision: 3.104.2.3.2.21; previous revision: 3.104.2.3.2.20
done
Checking in js/src/jsgc.h;
/cvsroot/mozilla/js/src/jsgc.h,v  <--  jsgc.h
new revision: 3.33.12.5; previous revision: 3.33.12.4
done
Checking in js/src/jsprvtd.h;
/cvsroot/mozilla/js/src/jsprvtd.h,v  <--  jsprvtd.h
new revision: 3.17.28.2; previous revision: 3.17.28.1
done
Keywords: fixed1.8.0.15
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: