Closed Bug 546611 Opened 14 years ago Closed 14 years ago

TM: "Assertion failure: isNumber(*p) == (t == TT_DOUBLE), at ../jstracer.cpp" or "Assertion failure: isInt32(*p), at ../jstracer.cpp"

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
critical

Tracking

()

VERIFIED FIXED
Tracking Status
blocking2.0 --- betaN+
blocking1.9.2 --- .4+
status1.9.2 --- .4-fixed
blocking1.9.1 --- -
status1.9.1 --- unaffected

People

(Reporter: gkw, Assigned: dvander)

References

Details

(4 keywords, Whiteboard: [sg:critical])

Attachments

(2 files)

Assertion failure: isNumber(*p) == (t == TT_DOUBLE), at ../jstracer.cpp:3337

This assert came about from jsfunfuzz but I don't have a reproducible testcase. Filing in case someone can figure out the cause. This occurred in 32-bit js debug shell on TM changeset 7c63a6c5ca78 on 10.6.2.

Exception Type:  EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000000
Crashed Thread:  0  Dispatch queue: com.apple.main-thread

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0   js-dbg-32-tm-darwin                 0x001302bd JS_Assert + 71
1   js-dbg-32-tm-darwin                 0x0015a31d js::TraceRecorder::import(nanojit::LIns*, int, long*, js::TraceType_, char const*, unsigned int, JSStackFrame*) + 243
2   js-dbg-32-tm-darwin                 0x0015abd2 js::TraceRecorder::get(long*) + 374
3   js-dbg-32-tm-darwin                 0x00177036 js::TraceRecorder::setProp(long&, JSPropCacheEntry*, JSScopeProperty*, long&, nanojit::LIns*&) + 796
4   js-dbg-32-tm-darwin                 0x001775f6 js::TraceRecorder::record_SetPropHit(JSPropCacheEntry*, JSScopeProperty*) + 96
5   js-dbg-32-tm-darwin                 0x000b0f24 js_SetPropertyHelper + 2161
6   js-dbg-32-tm-darwin                 0x0008af2b js_Interpret + 85475
7   js-dbg-32-tm-darwin                 0x0009f416 js_Execute + 1401
8   js-dbg-32-tm-darwin                 0x0001198a JS_ExecuteScript + 54
9   js-dbg-32-tm-darwin                 0x0000a650 Process(JSContext*, JSObject*, char*, int) + 458 (js.cpp:446)
10  js-dbg-32-tm-darwin                 0x0000b226 ProcessArgs(JSContext*, JSObject*, char**, int) + 1909 (js.cpp:791)
11  js-dbg-32-tm-darwin                 0x0000b765 main + 953 (js.cpp:4876)
12  js-dbg-32-tm-darwin                 0x0000285d _start + 208
13  js-dbg-32-tm-darwin                 0x0000278c start + 40
Hrm, not good. Is it happening more than once?
(In reply to comment #2)
> Hrm, not good. Is it happening more than once?

Apparently, no. It occurred once and never did occur again thus far. I don't have a core dump either.
This happens reliably for me in Thunderbird gloda unit tests on x86_64 just-checked-out mozilla central rev 46484930f4f3, debug build:

Assertion failure: isNumber(*p) == (t == TT_DOUBLE), at comm-central/mozilla/js/src/jstracer.cpp:3559


I just pulled the tracemonkey branch and still see it, rev 15df4512f52f:

Assertion failure: isNumber(*p) == (t == TT_DOUBLE), at /home/visbrero/rev_control/hg/comm-central/mozilla/js/src/jstracer.cpp:3337


For example, in a comm-central build with a mozilla-central mozilla/ dir, in objdir/mailnews/db/gloda/test invoking the following will do it:

make SOLO_FILE=test_query_messages_imap_online.js check-one


It has actually been like this for several weeks or maybe even a few months.  I never bothered saying anything before because I assumed it would be a short-lived problem and Thunderbird is currently targeting mozilla-1.9.2 anyways, but I just tried again and saw it was still happening...

The unit test and exercised code in question are nowhere near trivial enough for me to attempt to try and reduce them without knowing what the trace is of.  Having said that, please let me know what I can do to help!
(In reply to comment #4)
> The unit test and exercised code in question are nowhere near trivial enough
> for me to attempt to try and reduce them without knowing what the trace is of. 
> Having said that, please let me know what I can do to help!

I think in cases like this where the testcase is difficult to reduce/obtain, having a core dump *might* be useful. gal, dvander, would it help if asuth is able to send you a dump?
That would be incredibly helpful. I could also try the STR - what OS were you using in comment #4, asuth?
I'm not sure what the general policy is for getting a core dump out of an xpcshell test dying on an assertion.  I used check-interactive with gdb attached and then used gdb's generate-core-file.  You can find it here:

http://clicky.visophyte.org/scratch/tracemonkey-tt-double-core-1.bz2

If there's a better way to get a more useful core, please let me know.

It's 244 megs when un-bzipped, 25 bzipped.

I'm on fedora12 x86_64.  I believe the same problem also occurred on Ubuntu 9.04 x86_64.
I see the same issue on 64 bit windows 7, running the same gloda tests w/ a debug build.
(In reply to comment #8)

Is this a 64-bit build on Windows 7, or a 32-bit build on Windows 7?
32 bit build running on 64 bit windows 7 (64 bit windows 7 probably irrelevant, I know, but better tmi than not enough i.)
I also see this on Linux 32bit. 3 gloda tests fail reproducibly.
Keywords: testcase-wanted
I also see this on a Mac 32 bit build running on a Mac 32 bit system. Updating to all/all.
OS: Mac OS X → All
Hardware: x86 → All
I can reproduce this. Investigating.
Attached patch fixSplinter Review
This problem was exposed by line 2777 in this file:

mozilla/dist/bin/modules/gloda/datastore.js

>          for each ([attrID, values] in
>              this._convertToDBValuesAndGroupByAttributeID(attrDef,

Here, |attrID| (and |values|) were not declared with |let| or |var|, so they are binding as globals.

This exposed a corner case in the trace recorder where the interpreter could change global types behind its back. Now it will simply abort the trace.
Assignee: general → dvander
Status: NEW → ASSIGNED
Attachment #427644 - Flags: review?(jorendorff)
Comment on attachment 427644 [details] [diff] [review]
fix

Good start, but this isn't enough because we can also get here from other places.

Suppose we record a call to a JSNative. At record time, it's well behaved, but at run time, it calls JS_SetProperty to modify the global object. This goes through JSObject::setProperty to js_SetProperty to js_SetPropertyHelper. At some point we need to deep-bail.

The fix in setElem is good, but a jsapi-test and a fix are needed for the more general case.
Attachment #427644 - Flags: review?(jorendorff)
fwiw, this does fix the mailnews tests, so it's an excellent start from our pov :-)
The testcase in the patch in comment #15 asserts in "Assertion failure: isInt32(*p), at ../jstracer.cpp" fwiw.

autoBisect shows this is probably related to bug 515749:

The first bad revision is:
changeset:   35427:538a07fdf3d2
user:        David Anderson
date:        Fri Dec 11 19:10:36 2009 -0800
summary:     Lazily import stack and global slots (bug 515749, original patch and r=gal).
Blocks: 515749
Keywords: testcase
Summary: TM: "Assertion failure: isNumber(*p) == (t == TT_DOUBLE), at ../jstracer.cpp" → TM: "Assertion failure: isNumber(*p) == (t == TT_DOUBLE), at ../jstracer.cpp" or "Assertion failure: isInt32(*p), at ../jstracer.cpp"
(In reply to comment #16)
> (From update of attachment 427644 [details] [diff] [review])
> Good start, but this isn't enough because we can also get here from other
> places.
> 
> Suppose we record a call to a JSNative. At record time, it's well behaved, but
> at run time, it calls JS_SetProperty to modify the global object. This goes
> through JSObject::setProperty to js_SetProperty to js_SetPropertyHelper. At
> some point we need to deep-bail.

See js_NativeSet (and js_NativeGet), first line. This is already covered, AFAIK.

/be
I think the problem was described backwards: It's well-defined at run-time, the set will cause the trace to abort (as Brendan said, LeaveTraceIfGlobalObject).

The problem is when these paths are taken at record time. It's okay if the global shape changes because we'll throw out the trace. But if a single type changes, the assert will botch.
Just to clarify, David is exactly right. Sorry for the confusion.
I pushed the fix for gloda not having used a 'let' where it should have to comm-central:
http://hg.mozilla.org/comm-central/rev/06c2c95bdcc6

This obviously does not have any impact on the underlying bug, but it seemed worth noting here.
Group: core-security
Attachment #427644 - Flags: review+
Whiteboard: [sg:critical]
beltzner, we kinda screwed up here a little. A dup of this bug with a test case was open for quite a while. We didn't realize that it was a dup until a few minutes ago. This is a pretty bad bug, likely exploitable, but maybe a notch less bad than a GC rooting bug. The patch is a trivial 1-liner that stops us from tracing this particular case. The only relevant downside risk is a performance regression. However, we never emitted correct code for this in the first place. I recommend the next dot release, but the ship might have sailed. Sorry. Not our week ...
blocking1.9.1: --- → ?
blocking1.9.2: --- → ?
blocking2.0: --- → ?
Correction: This might stop code from tracing that traced before and accidentally worked. So there is a chance for perf regressions. global[] is unlikely to be a frequent use case, but still.
I filed a bug to improve performance for the globalobj[string] case. bug 561939. I don't want to link dependencies since this one is sensitive.
Based on comment 24 I'm pretty sure we have to block on this.
blocking1.9.1: ? → .10+
blocking1.9.2: ? → .4+
No averse SunSpider or v8 effects, landing.
Attached patch 1.9.2 fixSplinter Review
Attachment #441698 - Flags: review?(gal)
Attachment #441698 - Flags: approval1.9.2.4?
Comment on attachment 441698 [details] [diff] [review]
1.9.2 fix

Please don't check in the test cases until we ship the fix.
Attachment #441698 - Flags: review?(gal) → review+
Attachment #441698 - Flags: approval1.9.2.4? → approval1.9.2.4+
Comment on attachment 441698 [details] [diff] [review]
1.9.2 fix

a=beltzner for mozilla1.9.2 please land on default and GECKO1924_20100413_RELBRANCH

Andreas said he'd check to see if this affects 1.9.1.x
1.9.1 seems safe (tested) because of the guardNotGlobalObject() call inside SETELEM.

Filed follow-up bug 561945 and bug 561954.
blocking1.9.1: .10+ → -
Gary or Andrew, can you verify that this is fixed in 1.9.2 now?
(In reply to comment #35)
> Gary or Andrew, can you verify that this is fixed in 1.9.2 now?

Not I!  The code in question that triggered the bug is no longer so present in Thunderbird and I do not believe ever was triggered on mozilla-1.9.2 because of differing policies on what chrome gets traced.  The analysis/unit test seems pretty compelling to me, though :)
(In reply to comment #35)
> Gary or Andrew, can you verify that this is fixed in 1.9.2 now?

$ cat 1testcase.js | ./js-dbg-32-192-darwin -j -i
js> function f() { }
js> f(this);
js> var obj = this;
js> a1 = 20;
20
js> a2 = 20;
20
js> 
js> for (var i = 1; i < 10; i++)
  a2 = 20;
20
js> 
js> for (var i = 1; i < 10; i++) {
  obj['a' + i] = "";
  f(a2);
}
js> 
js> 
$ ./js-dbg-32-192-darwin -j 1testcase.js
$

The testcase in the patch in comment #15 no longer asserts in 1.9.2 debug shell builds.
Status: RESOLVED → VERIFIED
Keywords: verified1.9.2
Thanks, Gary!
Group: core-security
blocking2.0: ? → betaN+
Filter on qa-project-auto-change:

Bug in removed tracer code, setting in-testsuite- flag.
Flags: in-testsuite-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: