Last Comment Bug 546611 - TM: "Assertion failure: isNumber(*p) == (t == TT_DOUBLE), at ../jstracer.cpp" or "Assertion failure: isInt32(*p), at ../jstracer.cpp"
: TM: "Assertion failure: isNumber(*p) == (t == TT_DOUBLE), at ../jstracer.cpp"...
Status: VERIFIED FIXED
[sg:critical]
: assertion, regression, testcase, verified1.9.2
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- critical (vote)
: ---
Assigned To: David Anderson [:dvander]
:
Mentors:
: 546931 (view as bug list)
Depends on:
Blocks: jsfunfuzz 515749
  Show dependency treegraph
 
Reported: 2010-02-17 02:43 PST by Gary Kwong [:gkw] [:nth10sd]
Modified: 2013-03-11 12:42 PDT (History)
16 users (show)
choller: in‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
betaN+
.4+
.4-fixed
-
unaffected


Attachments
fix (1.34 KB, patch)
2010-02-18 13:18 PST, David Anderson [:dvander]
gal: review+
Details | Diff | Review
1.9.2 fix (1.02 KB, patch)
2010-04-26 20:48 PDT, David Anderson [:dvander]
gal: review+
mbeltzner: approval1.9.2.4+
Details | Diff | Review

Description Gary Kwong [:gkw] [:nth10sd] 2010-02-17 02:43:28 PST
Assertion failure: isNumber(*p) == (t == TT_DOUBLE), at ../jstracer.cpp:3337

This assert came about from jsfunfuzz but I don't have a reproducible testcase. Filing in case someone can figure out the cause. This occurred in 32-bit js debug shell on TM changeset 7c63a6c5ca78 on 10.6.2.

Exception Type:  EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000000
Crashed Thread:  0  Dispatch queue: com.apple.main-thread

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0   js-dbg-32-tm-darwin                 0x001302bd JS_Assert + 71
1   js-dbg-32-tm-darwin                 0x0015a31d js::TraceRecorder::import(nanojit::LIns*, int, long*, js::TraceType_, char const*, unsigned int, JSStackFrame*) + 243
2   js-dbg-32-tm-darwin                 0x0015abd2 js::TraceRecorder::get(long*) + 374
3   js-dbg-32-tm-darwin                 0x00177036 js::TraceRecorder::setProp(long&, JSPropCacheEntry*, JSScopeProperty*, long&, nanojit::LIns*&) + 796
4   js-dbg-32-tm-darwin                 0x001775f6 js::TraceRecorder::record_SetPropHit(JSPropCacheEntry*, JSScopeProperty*) + 96
5   js-dbg-32-tm-darwin                 0x000b0f24 js_SetPropertyHelper + 2161
6   js-dbg-32-tm-darwin                 0x0008af2b js_Interpret + 85475
7   js-dbg-32-tm-darwin                 0x0009f416 js_Execute + 1401
8   js-dbg-32-tm-darwin                 0x0001198a JS_ExecuteScript + 54
9   js-dbg-32-tm-darwin                 0x0000a650 Process(JSContext*, JSObject*, char*, int) + 458 (js.cpp:446)
10  js-dbg-32-tm-darwin                 0x0000b226 ProcessArgs(JSContext*, JSObject*, char**, int) + 1909 (js.cpp:791)
11  js-dbg-32-tm-darwin                 0x0000b765 main + 953 (js.cpp:4876)
12  js-dbg-32-tm-darwin                 0x0000285d _start + 208
13  js-dbg-32-tm-darwin                 0x0000278c start + 40
Comment 2 David Anderson [:dvander] 2010-02-17 02:45:10 PST
Hrm, not good. Is it happening more than once?
Comment 3 Gary Kwong [:gkw] [:nth10sd] 2010-02-17 02:46:41 PST
(In reply to comment #2)
> Hrm, not good. Is it happening more than once?

Apparently, no. It occurred once and never did occur again thus far. I don't have a core dump either.
Comment 4 Andrew Sutherland [:asuth] 2010-02-17 06:19:40 PST
This happens reliably for me in Thunderbird gloda unit tests on x86_64 just-checked-out mozilla central rev 46484930f4f3, debug build:

Assertion failure: isNumber(*p) == (t == TT_DOUBLE), at comm-central/mozilla/js/src/jstracer.cpp:3559


I just pulled the tracemonkey branch and still see it, rev 15df4512f52f:

Assertion failure: isNumber(*p) == (t == TT_DOUBLE), at /home/visbrero/rev_control/hg/comm-central/mozilla/js/src/jstracer.cpp:3337


For example, in a comm-central build with a mozilla-central mozilla/ dir, in objdir/mailnews/db/gloda/test invoking the following will do it:

make SOLO_FILE=test_query_messages_imap_online.js check-one


It has actually been like this for several weeks or maybe even a few months.  I never bothered saying anything before because I assumed it would be a short-lived problem and Thunderbird is currently targeting mozilla-1.9.2 anyways, but I just tried again and saw it was still happening...

The unit test and exercised code in question are nowhere near trivial enough for me to attempt to try and reduce them without knowing what the trace is of.  Having said that, please let me know what I can do to help!
Comment 5 Gary Kwong [:gkw] [:nth10sd] 2010-02-17 08:09:37 PST
(In reply to comment #4)
> The unit test and exercised code in question are nowhere near trivial enough
> for me to attempt to try and reduce them without knowing what the trace is of. 
> Having said that, please let me know what I can do to help!

I think in cases like this where the testcase is difficult to reduce/obtain, having a core dump *might* be useful. gal, dvander, would it help if asuth is able to send you a dump?
Comment 6 David Anderson [:dvander] 2010-02-17 11:18:38 PST
That would be incredibly helpful. I could also try the STR - what OS were you using in comment #4, asuth?
Comment 7 Andrew Sutherland [:asuth] 2010-02-17 11:44:33 PST
I'm not sure what the general policy is for getting a core dump out of an xpcshell test dying on an assertion.  I used check-interactive with gdb attached and then used gdb's generate-core-file.  You can find it here:

http://clicky.visophyte.org/scratch/tracemonkey-tt-double-core-1.bz2

If there's a better way to get a more useful core, please let me know.

It's 244 megs when un-bzipped, 25 bzipped.

I'm on fedora12 x86_64.  I believe the same problem also occurred on Ubuntu 9.04 x86_64.
Comment 8 David :Bienvenu 2010-02-17 14:06:43 PST
I see the same issue on 64 bit windows 7, running the same gloda tests w/ a debug build.
Comment 9 David Anderson [:dvander] 2010-02-17 14:21:35 PST
(In reply to comment #8)

Is this a 64-bit build on Windows 7, or a 32-bit build on Windows 7?
Comment 10 David :Bienvenu 2010-02-17 14:55:21 PST
32 bit build running on 64 bit windows 7 (64 bit windows 7 probably irrelevant, I know, but better tmi than not enough i.)
Comment 11 David :Bienvenu 2010-02-18 07:31:31 PST
*** Bug 546931 has been marked as a duplicate of this bug. ***
Comment 12 Ben Bucksch (:BenB) 2010-02-18 08:19:27 PST
I also see this on Linux 32bit. 3 gloda tests fail reproducibly.
Comment 13 Mark Banner (:standard8) 2010-02-18 08:35:23 PST
I also see this on a Mac 32 bit build running on a Mac 32 bit system. Updating to all/all.
Comment 14 David Anderson [:dvander] 2010-02-18 10:19:43 PST
I can reproduce this. Investigating.
Comment 15 David Anderson [:dvander] 2010-02-18 13:18:22 PST
Created attachment 427644 [details] [diff] [review]
fix

This problem was exposed by line 2777 in this file:

mozilla/dist/bin/modules/gloda/datastore.js

>          for each ([attrID, values] in
>              this._convertToDBValuesAndGroupByAttributeID(attrDef,

Here, |attrID| (and |values|) were not declared with |let| or |var|, so they are binding as globals.

This exposed a corner case in the trace recorder where the interpreter could change global types behind its back. Now it will simply abort the trace.
Comment 16 Jason Orendorff [:jorendorff] 2010-02-18 14:58:59 PST
Comment on attachment 427644 [details] [diff] [review]
fix

Good start, but this isn't enough because we can also get here from other places.

Suppose we record a call to a JSNative. At record time, it's well behaved, but at run time, it calls JS_SetProperty to modify the global object. This goes through JSObject::setProperty to js_SetProperty to js_SetPropertyHelper. At some point we need to deep-bail.

The fix in setElem is good, but a jsapi-test and a fix are needed for the more general case.
Comment 17 David :Bienvenu 2010-02-18 16:29:56 PST
fwiw, this does fix the mailnews tests, so it's an excellent start from our pov :-)
Comment 18 Gary Kwong [:gkw] [:nth10sd] 2010-02-18 17:19:55 PST
The testcase in the patch in comment #15 asserts in "Assertion failure: isInt32(*p), at ../jstracer.cpp" fwiw.

autoBisect shows this is probably related to bug 515749:

The first bad revision is:
changeset:   35427:538a07fdf3d2
user:        David Anderson
date:        Fri Dec 11 19:10:36 2009 -0800
summary:     Lazily import stack and global slots (bug 515749, original patch and r=gal).
Comment 19 Brendan Eich [:brendan] 2010-02-18 19:04:15 PST
(In reply to comment #16)
> (From update of attachment 427644 [details] [diff] [review])
> Good start, but this isn't enough because we can also get here from other
> places.
> 
> Suppose we record a call to a JSNative. At record time, it's well behaved, but
> at run time, it calls JS_SetProperty to modify the global object. This goes
> through JSObject::setProperty to js_SetProperty to js_SetPropertyHelper. At
> some point we need to deep-bail.

See js_NativeSet (and js_NativeGet), first line. This is already covered, AFAIK.

/be
Comment 20 David Anderson [:dvander] 2010-02-18 19:20:47 PST
I think the problem was described backwards: It's well-defined at run-time, the set will cause the trace to abort (as Brendan said, LeaveTraceIfGlobalObject).

The problem is when these paths are taken at record time. It's okay if the global shape changes because we'll throw out the trace. But if a single type changes, the assert will botch.
Comment 21 Jason Orendorff [:jorendorff] 2010-02-22 11:58:10 PST
Just to clarify, David is exactly right. Sorry for the confusion.
Comment 22 Andrew Sutherland [:asuth] 2010-03-16 18:52:08 PDT
I pushed the fix for gloda not having used a 'let' where it should have to comm-central:
http://hg.mozilla.org/comm-central/rev/06c2c95bdcc6

This obviously does not have any impact on the underlying bug, but it seemed worth noting here.
Comment 23 Andreas Gal :gal 2010-04-26 19:23:48 PDT
*** Bug 558633 has been marked as a duplicate of this bug. ***
Comment 24 Andreas Gal :gal 2010-04-26 19:31:40 PDT
beltzner, we kinda screwed up here a little. A dup of this bug with a test case was open for quite a while. We didn't realize that it was a dup until a few minutes ago. This is a pretty bad bug, likely exploitable, but maybe a notch less bad than a GC rooting bug. The patch is a trivial 1-liner that stops us from tracing this particular case. The only relevant downside risk is a performance regression. However, we never emitted correct code for this in the first place. I recommend the next dot release, but the ship might have sailed. Sorry. Not our week ...
Comment 25 Andreas Gal :gal 2010-04-26 19:35:42 PDT
Correction: This might stop code from tracing that traced before and accidentally worked. So there is a chance for perf regressions. global[] is unlikely to be a frequent use case, but still.
Comment 26 Andreas Gal :gal 2010-04-26 19:41:23 PDT
I filed a bug to improve performance for the globalobj[string] case. bug 561939. I don't want to link dependencies since this one is sensitive.
Comment 27 Mike Beltzner [:beltzner, not reading bugmail] 2010-04-26 20:04:32 PDT
Based on comment 24 I'm pretty sure we have to block on this.
Comment 28 David Anderson [:dvander] 2010-04-26 20:07:37 PDT
No averse SunSpider or v8 effects, landing.
Comment 30 David Anderson [:dvander] 2010-04-26 20:48:37 PDT
Created attachment 441698 [details] [diff] [review]
1.9.2 fix
Comment 31 Andreas Gal :gal 2010-04-26 20:51:33 PDT
Comment on attachment 441698 [details] [diff] [review]
1.9.2 fix

Please don't check in the test cases until we ship the fix.
Comment 32 Mike Beltzner [:beltzner, not reading bugmail] 2010-04-26 21:02:04 PDT
Comment on attachment 441698 [details] [diff] [review]
1.9.2 fix

a=beltzner for mozilla1.9.2 please land on default and GECKO1924_20100413_RELBRANCH

Andreas said he'd check to see if this affects 1.9.1.x
Comment 34 David Anderson [:dvander] 2010-04-26 22:03:18 PDT
1.9.1 seems safe (tested) because of the guardNotGlobalObject() call inside SETELEM.

Filed follow-up bug 561945 and bug 561954.
Comment 35 [On PTO until 6/29] 2010-04-29 17:45:19 PDT
Gary or Andrew, can you verify that this is fixed in 1.9.2 now?
Comment 36 Andrew Sutherland [:asuth] 2010-04-29 18:42:54 PDT
(In reply to comment #35)
> Gary or Andrew, can you verify that this is fixed in 1.9.2 now?

Not I!  The code in question that triggered the bug is no longer so present in Thunderbird and I do not believe ever was triggered on mozilla-1.9.2 because of differing policies on what chrome gets traced.  The analysis/unit test seems pretty compelling to me, though :)
Comment 37 Gary Kwong [:gkw] [:nth10sd] 2010-04-29 21:00:51 PDT
(In reply to comment #35)
> Gary or Andrew, can you verify that this is fixed in 1.9.2 now?

$ cat 1testcase.js | ./js-dbg-32-192-darwin -j -i
js> function f() { }
js> f(this);
js> var obj = this;
js> a1 = 20;
20
js> a2 = 20;
20
js> 
js> for (var i = 1; i < 10; i++)
  a2 = 20;
20
js> 
js> for (var i = 1; i < 10; i++) {
  obj['a' + i] = "";
  f(a2);
}
js> 
js> 
$ ./js-dbg-32-192-darwin -j 1testcase.js
$

The testcase in the patch in comment #15 no longer asserts in 1.9.2 debug shell builds.
Comment 38 [On PTO until 6/29] 2010-04-30 14:04:56 PDT
Thanks, Gary!
Comment 39 Christian Holler (:decoder) 2013-03-11 12:42:27 PDT
Filter on qa-project-auto-change:

Bug in removed tracer code, setting in-testsuite- flag.

Note You need to log in before you can comment on or make changes to this bug.