Closed Bug 636795 Opened 10 years ago Closed 10 years ago

PROCESS-CRASH | test_index_messages_imap_online_to_offline.js (and others) crash after latest tracemonkey merge [@ js::TraceRecorder::determineSlotType

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla2.0
Tracking Status
blocking2.0 --- .x+

People

(Reporter: standard8, Assigned: dvander)

References

Details

(Keywords: crash, regression, Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 1 obsolete file)

Somewhere in the latest tracemonkey, a crash has been introduced into the Thunderbird xpcshell-tests. Regression range:

http://hg.mozilla.org/mozilla-central/rev/032e7061575a

Example Log:

http://tinderbox.mozilla.org/showlog.cgi?tree=Thunderbird&errorparser=unittest&logfile=1298659317.1298660517.29120.gz&buildtime=1298659317&buildname=WINNT%205.2%20comm-central%20test%20xpcshell&fulltext=1

Stack:

0  mozjs.dll!js::TraceRecorder::determineSlotType(js::Value *) [jstracer.cpp:0f1f5cc97ed5 : 4154 + 0x2]
    eip = 0x012d2827   esp = 0x0012dd4c   ebp = 0x00000002   ebx = 0x00000001
    esi = 0x02e00298   edi = 0x04a1f000   eax = 0x00000000   ecx = 0x00000000
    edx = 0x0324f3a0   efl = 0x00010246
    Found by: given as instruction pointer in context
 1  mozjs.dll!js::VisitFrameSlots<js::DetermineTypesVisitor> [jstracer.cpp:0f1f5cc97ed5 : 1871 + 0x1f]
    eip = 0x012d62d0   esp = 0x0012dd60   ebp = 0x0012ddc8
    Found by: call frame info
 2  mozjs.dll!js::TraceRecorder::snapshot(js::ExitType) [jstracer.cpp:0f1f5cc97ed5 : 4268 + 0x46]
    eip = 0x012daeba   esp = 0x0012dd7c   ebp = 0x032060a0   ebx = 0x0489c180
    Found by: call frame info
 3  mozjs.dll!js::TraceRecorder::guardPropertyCacheHit(nanojit::LIns *,JSObject *,JSObject *,js::PropertyCacheEntry *,js::PCVal &) [jstracer.cpp:0f1f5cc97ed5 : 9704 + 0x8]
    eip = 0x012dcfd0   esp = 0x0012ddd4   ebp = 0x0323a970   ebx = 0x037235d0
    Found by: call frame info
 4  mozjs.dll!js::TraceRecorder::test_property_cache(JSObject *,nanojit::LIns *,JSObject * &,js::PCVal &) [jstracer.cpp:0f1f5cc97ed5 : 9694 + 0x19]
    eip = 0x012e572b   esp = 0x0012ddf4   ebp = 0x0323a970   ebx = 0x037235d0
    Found by: call frame info
 5  mozjs.dll!js::TraceRecorder::record_JSOP_CALLPROP() [jstracer.cpp:0f1f5cc97ed5 : 16152 + 0x16]
    eip = 0x012e8252   esp = 0x0012de24   ebp = 0x04a7c078   ebx = 0x04a7c078
    Found by: call frame info
 6  mozjs.dll!js::TraceRecorder::monitorRecording(JSOp) [jsopcode.tbl:0f1f5cc97ed5 : 448 + 0x6]
    eip = 0x012eb239   esp = 0x0012de5c   ebp = 0x0332f09c   ebx = 0x00000000
    Found by: call frame info
 7  mozjs.dll!js::Interpret(JSContext *,JSStackFrame *,unsigned int,JSInterpMode) [jsinterp.cpp:0f1f5cc97ed5 : 2719 + 0xb]
    eip = 0x0124b36b   esp = 0x0012de7c   ebp = 0x0012e5d8   ebx = 0x0273d460
    Found by: call frame info
 8  mozcrt19.dll!__dllonexit [onexit.c:678a973a7e1c : 276 + 0x5]
    eip = 0x7813d208   esp = 0x0012e5c4   ebp = 0x79425ff0   ebx = 0x03723400
    Found by: call frame info
 9  mozjs.dll!js::ctypes::jsvalToPtrExplicit [CTypes.cpp:0f1f5cc97ed5 : 1508 + 0xc]
    eip = 0x01349d9d   esp = 0x0012e5c8   ebp = 0x79425ff0
    Found by: stack scanning
10  mozjs.dll!js::RunScript(JSContext *,JSScript *,JSStackFrame *) [jsinterp.cpp:0f1f5cc97ed5 : 653 + 0xa]
    eip = 0x012492a6   esp = 0x0012e5e0   ebp = 0x79425ff0
    Found by: call frame info
11  mozjs.dll!SendToGenerator [jsiter.cpp:0f1f5cc97ed5 : 1286 + 0x14]
    eip = 0x0125912d   esp = 0x0012e608   ebp = 0x04a0cf20
    Found by: call frame info with scanning
12  mozjs.dll!generator_op [jsiter.cpp:0f1f5cc97ed5 : 1399 + 0x23]
    eip = 0x0125940f   esp = 0x0012e644   ebp = 0x049d7870   ebx = 0x02e001b8
    Found by: call frame info
13  mozjs.dll!generator_next [jsiter.cpp:0f1f5cc97ed5 : 1414 + 0x14]
    eip = 0x01259475   esp = 0x0012e664   ebp = 0x0012edf0
    Found by: call frame info
14  mozjs.dll!js::Interpret(JSContext *,JSStackFrame *,unsigned int,JSInterpMode) [jsinterp.cpp:0f1f5cc97ed5 : 4791 + 0xf]
    eip = 0x012536e9   esp = 0x0012e674   ebp = 0x0012edf0   ebx = 0x0273d460
    Found by: call frame info
15  mozjs.dll!js::RunScript(JSContext *,JSScript *,JSStackFrame *) [jsinterp.cpp:0f1f5cc97ed5 : 653 + 0xa]
    eip = 0x012492a6   esp = 0x0012edf8   ebp = 0x794267f0   ebx = 0x0338a020
    Found by: call frame info with scanning
16  mozjs.dll!js::Invoke(JSContext *,js::CallArgs const &,unsigned int) [jsinterp.cpp:0f1f5cc97ed5 : 740 + 0x11]
    eip = 0x01249da4   esp = 0x0012ee20   ebp = 0x0320c230
    Found by: call frame info
17  mozjs.dll!js::ExternalInvoke(JSContext *,js::Value const &,js::Value const &,unsigned int,js::Value *,js::Value *) [jsinterp.cpp:0f1f5cc97ed5 : 856 + 0xc]
    eip = 0x0124a4e1   esp = 0x0012ee6c   ebp = 0x0012f10c   ebx = 0x00000000
    Found by: call frame info
18  mozjs.dll!JS_CallFunctionValue [jsapi.cpp:0f1f5cc97ed5 : 5173 + 0x4c]
    eip = 0x01208bad   esp = 0x0012eea8   ebp = 0x0012f10c   ebx = 0x00002801
    Found by: call frame info
19  xul.dll!nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS *,unsigned short,XPTMethodDescriptor const *,nsXPTCMiniVariant *) [xpcwrappedjsclass.cpp:0f1f5cc97ed5 : 1672 + 0x32]
    eip = 0x00a01428   esp = 0x0012eed4   ebp = 0x0012f10c
    Found by: call frame info
20  xul.dll!nsXPCWrappedJS::CallMethod(unsigned short,XPTMethodDescriptor const *,nsXPTCMiniVariant *) [xpcwrappedjs.cpp:0f1f5cc97ed5 : 588 + 0x12]
    eip = 0x009fec2b   esp = 0x0012f178   ebp = 0x0012f19c
    Found by: call frame info
21  xul.dll!PrepareAndDispatch [xptcstubs.cpp:0f1f5cc97ed5 : 114 + 0x14]
    eip = 0x00cc237f   esp = 0x0012f1a4   ebp = 0x0012f250
    Found by: call frame info
22  xul.dll!SharedStub [xptcstubs.cpp:0f1f5cc97ed5 : 141 + 0x4]
    eip = 0x00cc23e9   esp = 0x0012f258   ebp = 0x0012f26c   ebx = 0x00000000
    Found by: call frame info
23  xul.dll!nsThread::ProcessNextEvent(int,int *) [nsThread.cpp:0f1f5cc97ed5 : 633 + 0x5]
    eip = 0x00cba70f   esp = 0x0012f274   ebp = 0x0012f26c
    Found by: call frame info with scanning
Requesting blocking as we're seeing this in several tests and its now seen on multiple platforms.
blocking2.0: --- → ?
Keywords: crash, regression
OS: Windows Server 2003 → All
(In reply to comment #0)
> Somewhere in the latest tracemonkey, a crash has been introduced into the
> Thunderbird xpcshell-tests. Regression range:
> 
> http://hg.mozilla.org/mozilla-central/rev/032e7061575a

Does this mean that 032e7061575a is proven to be exactly the regressing changeset?
blocking2.0: ? → final+
Whiteboard: [hardblocker]
(In particular, 032e7061575a was not part of the TM merge)
Assignee: general → dvander
Sorry, that was a bad paste, it should have been:

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a9a670d16a92&tochange=0f1f5cc97ed5

I'm just starting work on seeing if I can narrow it down a bit.
On my local debug build, I'm seeing this just before the crash:

Assertion failure: *(uint32 *)slot != 0, at /Users/moztest/comm/main/src/mozilla/js/src/jsvalue.h:209
Hg bisect says:

The first bad revision is:
changeset:   63085:b19abe19a212
user:        brendan@mozilla.org
date:        Tue Feb 22 22:25:10 2011 -0800
summary:     Unqualified function invocation doesn't use the global object the property was gotten from as |this| (bug 634590, r=gal).
Blocks: 634590
Thanks for the bisect - that bug changed the interpreter, but either JIT. I can reproduce an assert in jsvalue.h, I'll try from before that patch.
Confirmed that bug 634590 regressed this.
The problem is that CALLNAME pushes undefined while the interpreter may push an object from a |this| hook. If we bail out before the CALL, the types don't match.

It's not enough to abort in CALLNAME, we need to move or duplicate the callee guard there, I think.
We could move the test from CALL into all CALL* ops (CALLNAME, CALLPROP, CALLGNAME, etc).
will take a patch for Firefox 4
blocking2.0: final+ → .x+
Whiteboard: [hardblocker]
David, do you have an ETA for a fix for this?

(I'm asking as the failures are holding the Thunderbird tree closed, we can work around the issue, but I'd prefer to know the ETA first).
I will try to have a fix for this today.
Attached patch fix (obsolete) — Splinter Review
Attachment #515771 - Flags: review?(gal)
Comment on attachment 515771 [details] [diff] [review]
fix

diff --git a/toolkit/components/places/tests/unit/default.sqlite b/toolkit/components/places/tests/unit/default.sqlite ?

We call a method for every function we inline into? That will be extremely slow. scopeObj will be mostly == globalObj, no?
No. There is a long chain of conditions before we reach that guard, including:
 (1) The callee must be found on a global.
 (2) The callee must not be constant (i.e. callobj intervened).
 (3) The tree is not in compile-and-go code.

The function call is also doing the same check the interpreter would do in ComputeImplicitThis().
Also, this applies only to CALLNAME.
(In reply to comment #17)
> Also, this applies only to CALLNAME.

and CALLGNAME -- but never in compileAndGo scripts -- which saves the day.

Good patch for a hard case, I think Andreas will r+ as soon as I can get his attention.

On IRC I suggested blaming XBL (or JS_CloneFunctionObject at least) by name in the comments, since it is hard to see how the bad case can arise otherwise.

/be
(In reply to comment #18)
> On IRC I suggested blaming XBL (or JS_CloneFunctionObject at least) by name in
> the comments, since it is hard to see how the bad case can arise otherwise.

I am unclear if you are talking about a case other than the one exercised by our failing Thunderbird test.  If you are talking about our failing case, it's worth noting that there should be no XBL involved.
The line of code this was failing on was a btoa() call in mailnews/test/resources/messageInjection.js, which was apparently scoped by a different global than that file.
Ah, yeah.  I wanted a free btoa() implementation and JS modules get it for free, so I stole it from a JS module.  (messageInjection.js is evaluated in the global xpcshell namespace.)
(In reply to comment #21)
> Ah, yeah.  I wanted a free btoa() implementation and JS modules get it for
> free, so I stole it from a JS module.  (messageInjection.js is evaluated in the
> global xpcshell namespace.)

And that uses the XPConnect JS module loader, which makes non-compileAndGo scripts and functions?

And not the JSSubScriptLoader, which calls JS_Evaluate*Script* which does make compileAndGo-optimized scripts?

Just making sure here.

/be
(David: so the comment should fairly blame pre-compilation in general and the consequent loss of compileAndGo, and not blame XBL or JS_CloneFunctionObject in particular.)

/be
(In reply to comment #22)
> (In reply to comment #21)
> > Ah, yeah.  I wanted a free btoa() implementation and JS modules get it for
> > free, so I stole it from a JS module.  (messageInjection.js is evaluated in the
> > global xpcshell namespace.)
> 
> And that uses the XPConnect JS module loader, which makes non-compileAndGo
> scripts and functions?

The compileAndGo semantics are currently unknown to me, so I'll just elaborate and point at the JS script compilation points per my understanding.  My apologies if this is already well-known to all active parties:

The btoa came from a Components.utils.import() loaded module.  That uses xpcIJSModuleLoader whose impl is mozJSComponentLoader which uses:
  JS_CompileScriptForPrincipalsVersion(
    cx, global, jsPrincipals, buf, fileSize32, nativePath.get(), 1,
    JSVERSION_LATEST)

http://mxr.mozilla.org/mozilla-central/source/js/src/xpconnect/loader/mozJSComponentLoader.cpp#1128
 
> And not the JSSubScriptLoader, which calls JS_Evaluate*Script* which does make
> compileAndGo-optimized scripts?

messageInjection.js got loaded via xpcshell's Load method (exposed as "load"), which uses:
  JS_CompileFileHandleForPrincipals(cx, obj, filename.ptr(),                                                 
    file, gJSPrincipals).

http://mxr.mozilla.org/mozilla-central/source/js/src/xpconnect/shell/xpcshell.cpp#491

However, it is worth noting that xpcshell is bootstrapped via one or more calls to:
  JS_EvaluateScriptForPrincipals(cx, obj, gJSPrincipals, argv[i],
    strlen(argv[i]), "-e", 1, &rval);

http://mxr.mozilla.org/mozilla-central/source/js/src/xpconnect/shell/xpcshell.cpp#1309
Comment on attachment 515771 [details] [diff] [review]
fix

Please remove the unrelated change to default.sqlite
Attachment #515771 - Flags: review?(gal) → review+
I pushed this to Thunderbird's try server earlier. It seems to fix the gloda items, however, there's several new failures.

(I'm assuming that this patch doesn't rely on anything in tracemonkey repo, but just applied straight to mozilla-central)

I'm seeing a failure in make check:

TEST-UNEXPECTED-FAIL | jit_test.py | /builds/slave/tryserver-macosx/build/mozilla/js/src/jit-test/tests/v8-v5/check-earley-boyer.js: 
...
FAILURES:
    -j /builds/slave/tryserver-macosx/build/mozilla/js/src/jit-test/tests/v8-v5/check-earley-boyer.js

In xpcshell-tests there are crases in a couple of places:

TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux64-opt-unittest-xpcshell/build/xpcshell/tests/dom/src/json/test/unit/test_encode.js | test failed (with xpcshell return code: 1), see following log:

Thread 0 (crashed)
 0  libxul.so!js::TraceRecorder::record_JSOP_CALLNAME [jsvalue.h:ec8a059bea58 : 487 + 0x0]
    rbx = 0x01ac2f40   r12 = 0x80208120   r13 = 0x01ac2f40   r14 = 0x00000039
    r15 = 0x01a24df8   rip = 0x8ad71d65   rsp = 0x5b224600   rbp = 0x00000000
    Found by: given as instruction pointer in context
 1  libxul.so!js::TraceRecorder::monitorRecording [jsopcode.tbl:ec8a059bea58 : 180 + 0x7]
    rbx = 0x00000000   r12 = 0x00000039   r13 = 0x01ac2f40   r14 = 0x00000039
    r15 = 0x01a24df8   rip = 0x8ad72e0d   rsp = 0x5b2246a0   rbp = 0x00000000
    Found by: call frame info


TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux64-opt-unittest-xpcshell/build/xpcshell/tests/dom/src/json/test/unit/test_wrappers.js | test failed (with xpcshell return code: 1), see following log:

 0  libxul.so!js::TraceRecorder::record_JSOP_CALLNAME [jsvalue.h:ec8a059bea58 : 487 + 0x0]
    rbx = 0x01beff40   r12 = 0x32008120   r13 = 0x01beff40   r14 = 0x00000039
    r15 = 0x01b51df8   rip = 0x42b82d65   rsp = 0x9d129570   rbp = 0x00000000
    Found by: given as instruction pointer in context
 1  libxul.so!js::TraceRecorder::monitorRecording [jsopcode.tbl:ec8a059bea58 : 180 + 0x7]
    rbx = 0x00000000   r12 = 0x00000039   r13 = 0x01beff40   r14 = 0x00000039
    r15 = 0x01b51df8   rip = 0x42b83e0d   rsp = 0x9d129610   rbp = 0x00000000
    Found by: call frame info
Attached patch v2Splinter Review
Whoops, thanks. scopeChainProp() can return a NULL vp, in which case you have to read |nr.v| instead.
Attachment #515771 - Attachment is obsolete: true
Attachment #515952 - Flags: review?(gal)
Comment on attachment 515952 [details] [diff] [review]
v2

default.sqlite is _still_ in this patch. Be careful when you push.
Attachment #515952 - Flags: review?(gal) → review+
Comment on attachment 515952 [details] [diff] [review]
v2

Thanks, I have no idea where that's coming from but I won't check it in.
Attachment #515952 - Flags: approval2.0?
FTR this is looking good on Thunderbird's try server tests. Thanks for the update.
dvander: you nominated this for approval without giving us an assessment of the risk v. reward - can you please do so?
Blocks: 637876
(In reply to comment #31)
> dvander: you nominated this for approval without giving us an assessment of the
> risk v. reward - can you please do so?

It fixes a recent regression caused by bug 634590. The biggest risk is that it could not completely fix the regression caused by bug 634590. The reward is that the Thunderbird tree can re-open, its test suite is crashing.
Sorry, elaboration on the crash: it's near-NULL, not security sensitive and only affects chrome code, so we decided not to block on this, but to take a patch if one is ready because of thunderbird.
(In reply to comment #32)
> The reward is
> that the Thunderbird tree can re-open, its test suite is crashing.

FTR we're now fixing the Thunderbird tree on a known good version of mozilla-central until this is resolved (at risk of missing more regressions, although we  do have one or two extra builders watching occasionally). However, I'm not sure I'd want to ship a version of Thunderbird on gecko with this crasher as we'd have to use our own relbranch which is possible, but concerning.
(In reply to comment #32)
> (In reply to comment #31)
> > dvander: you nominated this for approval without giving us an assessment of the
> > risk v. reward - can you please do so?
> 
> It fixes a recent regression caused by bug 634590. The biggest risk is that it
> could not completely fix the regression caused by bug 634590. The reward is
> that the Thunderbird tree can re-open, its test suite is crashing.

FWIW (not much these days!), I think this patch is good to go in Fx4.

/be
and also due to fixing to a known good revision, it means we can't test the latest changes, e.g. bug 636465 comment 11.
Attachment #515952 - Flags: approval2.0? → approval2.0+
David (Anderson) pushed this to the tracemonkey repo earlier:

http://hg.mozilla.org/tracemonkey/rev/657915e80805

I spoke to him on irc and he was happy for me to push it directly to mozilla-central if it was green on tracemonkey, which I've now done:

http://hg.mozilla.org/mozilla-central/rev/8b4dbf4b6ac5

Thanks for all the work on fixing this. I'll keep an eye out for any more issues.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: fixed-in-tracemonkey
Target Milestone: --- → mozilla2.0
Blocks: 671947
You need to log in before you can comment on or make changes to this bug.