Last Comment Bug 261321 - missing SAVE_SP before js_ConcatStrings call from JSOP_ADD: case in js_Interpret
: missing SAVE_SP before js_ConcatStrings call from JSOP_ADD: case in js_Interpret
Status: RESOLVED FIXED
: crash, fixed-aviary1.0, fixed1.7.5, js1.5
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: P1 critical (vote)
: mozilla1.8alpha4
Assigned To: Brendan Eich [:brendan]
: Phil Schwartau
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2004-09-23 19:02 PDT by timeless
Modified: 2005-10-13 19:23 PDT (History)
2 users (show)
brendan: blocking1.7.5+
brendan: blocking‑aviary1.0+
bob: in‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch against aviary branch (1.29 KB, patch)
2004-09-23 19:13 PDT, Brendan Eich [:brendan]
brendan: review+
brendan: approval‑aviary+
brendan: approval1.7.5+
brendan: approval1.8a4+
Details | Diff | Splinter Review

Description timeless 2004-09-23 19:02:42 PDT
<timeless> we just got a crash at js_AllocGCThing
 <timeless> thing = 0x3
 <timeless> or at least, i think that's what it is... silly optimizer is making 
it really hard for me to figure out what's happening
 <timeless> the crash is dereferencing 0x3, and it looks like that's: + cx-
>runtime->gcFreeList  0x00000003 {next=??? flagp=??? } JSGCThing *
 <timeless> anyway, you made the last spooky changes to the code - For the gory 
details, see Mozilla bug 162779
 <timeless> fred says this might be reproducable
 <timeless> unfortunately it takes about an hour to crash :(
 <timeless> i don't suppose you have any ideas how gcFreeList could end up w/ 3 
instead of 0 or a valid (possibly tagged) pointer?
 <timeless> and i don't suppose you have time w/in the next few days, ambarish 
seems to be eager to pay MF to work on a reproducable crash
 <timeless> this one (minus the fact that our code is complicated, and that it 
takes an hour to trigger) seems like a good candidate to me
 <brendan> aren't you guys using js+xpconnect from multiple threads?
 <brendan> and there are known xpconnect thread safety bugs?
 <timeless> there are, but this code doesn't
 <timeless> really the only time we use threads are for prompts and proxy. no 
one is really using the proxy. i'm fairly certain that there were no dialogs 
popping up during this code execution
 <brendan> this is the kind of bug where it helps to get a short reproducible 
testcase, and run under purify or valgrind
 <brendan> if you can't shorten the testcase, maybe you could use mec to rewind 
from the crash
 <timeless> yeah... too bad i've had absolutely no luck trying to come up with 
anything resembling that
 * brendan doesn't know if mec is still being maintained
 <timeless> what's mec?
... michael elizabeth chastain
 <timeless> qtrace is the new name, i'll keep that in mind for some other time, 
but it's still limited to linux
 <brendan> and you're on windows?
 <timeless> yeah
 <timeless> fwiw i finally got pc2line to tell me exactly which line of js died
 * timeless blesses jsshell
 <timeless> (not that it was really so useful to know that it was this s += c; 
instead of some other one, but...)
 <brendan> you need better tools, lacking a short reproducible test case
 <timeless> yeah
 <timeless> unfortunately DumpJSStack and friends are only on raistlin or debug 
builds, and this is neither
 <timeless> i really need some way to cause this problem to happen a *lot* 
faster
 <timeless> then purify would work, as would anything else
 <timeless> i have this feeling that qtrace (the program formerly known as mec) 
would run out of space if it had to sift through an hour's worth of mozilla 
execution
 <brendan> have you looked closely at the memory that seems corrupted
 <brendan> i mean like 10 words around the 0x3
 <brendan> and tried to see what might be valid, what's not, and what pattern 
there might be to what's not
 <timeless> well, runtime->gcLocksHash->ops looks fine
 <timeless> gcFreeAtoms is 0
 <timeless> which i presume is reasonable
 <timeless> gcLevel is 0
 <timeless> gcNumber is 2822, which i suppose is reasonable
 <timeless> gcPoke is 1
 <timeless> gcRunning is 0
 <brendan> so gcFreeList could be trashed if a free JSGCThing's next link were 
trashed
 <brendan> 3 in next suggests the thing was not free, but contained a JSString 
(whose length member overlays next)
 <timeless> fwiw 3 matches the left side
 <brendan> if so, what 6 chars (3 jschars) were at rt->gcFreeList->flagp
 <brendan> left side?
 <timeless> small excerpt from stack trace:
 <timeless> js_AllocGCThing
 <timeless> js_NewString
 <timeless> js_ConcatStrings <-
 <timeless> in there
 <timeless> + left 0x1521a1b8 {length=3 chars=0x152cd918 "8081" } JSString *
 <timeless> + ls 0x152cd918 "8081" unsigned short *
 <timeless> + rs 0x130c14bc "1/kelev/scripts/kelevcommon.js" unsigned short *
 <brendan> what is *rt->gcFreeList
 <timeless> checking
 <timeless> err, cx->runtime->gcFreeList is 0x3
 <timeless> * that doesn't work :)
 <brendan> oh, ok
 <brendan> but that strongly suggests that left was on gcFreeList, then 
allocated from it
 <brendan> where in js_AllocGCThing is the stack?
 * brendan thinks this is about to become clear
 <timeless> warning, i don't quite trust devenv here
 <timeless>         if (!thing) {
 <timeless>             if (!tried_gc) {
 <timeless>                 rt->gcPoke = JS_TRUE;
 <timeless>                 js_GC(cx, GC_KEEP_ATOMS | GC_ALREADY_LOCKED);
 <timeless>                 tried_gc = JS_TRUE;
 <timeless> ^^ it claims tried_gc = JS_TRUE
 <timeless> *but*
 <brendan> so is js_GC on the stack, or has it just returned
 <timeless> it isn't on the stack
 <brendan> i bet it just returned
 <timeless> the line preceding it definitely claims that
 <timeless> problem is, i can't figure out what code it's actually 
executing 'next' which would dereference that 3
 <brendan> is this stack showing a crash?
 <timeless> ok if i paste 8 or so lines of disassembly?
 <brendan> if so, at what pc?
 <brendan> yeah, sure
 <timeless>                 rt->gcPoke = JS_TRUE;
 <timeless>                 js_GC(cx, GC_KEEP_ATOMS | GC_ALREADY_LOCKED);
 <timeless> 00AEA696  push        5    
 <timeless> 00AEA698  push        dword ptr [cx] 
 <timeless> 00AEA69B  mov         byte ptr [esi+60h],1 
 <timeless> 00AEA69F  call        js_GC (0AE9E7Eh) 
 <timeless> 00AEA6A4  pop         ecx  
 <timeless> 00AEA6A5  pop         ecx  
 <brendan> and the source line the debugger thinks the crash occurred at
 <timeless>                 tried_gc = JS_TRUE;
 <timeless> 00AEA6A6  mov         dword ptr [tried_gc],1 
 <timeless> 00AEA6AD  mov         edx,dword ptr [esi+44h] 
 <timeless> 00AEA6B0  cmp         edx,ebx 
 <timeless> 00AEA6B2  je          retry (0AEA648h) 
 <timeless> 00AEA6B4  mov         eax,dword ptr [edx] 
 <brendan> what is $pc or $eip
 <timeless> it's blaming that last line
 <timeless> edx is 3
 <brendan> gimme the register in hex
 <brendan> eip
 <timeless>  eip,x 0x00aea6b4 unsigned long
 <timeless> oh right
 <timeless> Unhandled exception at 0x00aea6b4 (js3250.dll) in mozilla.exe: 
0xC0000005: Access violation reading location 0x00000003.
 <brendan> things are reordered
 <timeless> yeah
 <timeless> very much so
 <brendan> what's eax?
 <timeless> mind if i just paste all the basic registers?
 <brendan> in hex, please
 <timeless> EAX = 00000000 EBX = 00000000 ECX = 009C45D0 EDX = 00000003 ESI = 
009E8008 
 <timeless> EDI = 00000001 EIP = 00AEA6B4 ESP = 0012E760 EBP = 0012E770 EFL = 
00000206 
 <brendan> go to js_ConcatStrings
 <brendan> what is left in hex?
 <timeless> i already pasted that a while ago, but here goes:
 <timeless> + left 0x1521a1b8 {length=3 chars=0x152cd918 "8081" } JSString *
 <timeless> (ignore everything paste the {, it's the debugger hinting about 
other bits)
 <brendan> thanks for repasting
 <brendan> lemme think
 <brendan> 1023 is 3ff -- what is *(JSGCPageInfo *)0x1521a000) ?
 <brendan> er, lose the final )
 <brendan> if your debugger doesn't know what JSGCPageInfo is, just dump two 32-
bit words at 0x1521a000
 <brendan> you there?
 <timeless> sorry
 <timeless> + *(JSGCPageInfo *)0x1521a000 {split=0x15219000 "" 
flags=0x15218ea0 "e" } JSGCPageInfo
 <brendan> ok, lemme calculate
 <brendan> what is the byte at 15218ED7
 <brendan> in hex, of course
 <timeless>  *(unsigned char*) 0x15218ED7,x 0x03 unsigned char
 <brendan> a mutable string
 <brendan> a live mutable string
 <brendan> the 3 here is not related to the 3 in rt->gcFreeList, btw
 <timeless> ok, good to know
 <brendan> can you give me the top five frames of the stack?
 <timeless>   js3250.dll!js_AllocGCThing(JSContext * cx=0x01acb5b8, unsigned 
int flags=3)  Line 530 + 0xe C
 <timeless>   js3250.dll!js_NewString(JSContext * cx=0x01acb5b8, unsigned short 
* chars=0x152cd918, unsigned int length=4, unsigned int gcflag=2)  Line 2439 C
 <timeless> > js3250.dll!js_ConcatStrings(JSContext * cx=0x01acb5b8, JSString * 
left=0x1521a1b8, JSString * right=0x152cd918)  Line 182 + 0xc C
 <timeless>   js3250.dll!js_Interpret(JSContext * cx=0x002a01a8, long * 
result=0x152404e0)  Line 2417 + 0xe C
 <timeless>   js3250.dll!js_Invoke(JSContext * cx=0x00000000, unsigned int 
argc=2752936, unsigned int flags=354682080)  Line 958 + 0xa C
 <brendan> go to the js_Interpret frame
 <brendan> print sp and fp->sp
 <brendan> and confirm that we're in the JSOP_ADD: case
 <timeless> we are in JSOP_ADD
 <timeless> + fp->sp 0x115dcfac long *
 <timeless> + sp 0x002a01a8 long *
 <brendan> something's screwy -- optimized
 <brendan> what's cx->fp->sp
 * timeless knew that was coming
 <timeless> + cx->fp->sp 0x002a0218 long *
Comment 1 Brendan Eich [:brendan] 2004-09-23 19:10:46 PDT
Patch momentarily. I'm going to self-approve for 1.8a4, 1.7, and aviary.  This
is the very model of a safe (virtual-)one-line fix.

/be
Comment 2 Brendan Eich [:brendan] 2004-09-23 19:13:24 PDT
Created attachment 159927 [details] [diff] [review]
patch against aviary branch

This patch will apply to trunk and 1.7 cleanly.

/be
Comment 3 Brendan Eich [:brendan] 2004-09-23 19:14:48 PDT
Comment on attachment 159927 [details] [diff] [review]
patch against aviary branch

Obviously correct.

/be
Comment 4 Brendan Eich [:brendan] 2004-09-23 19:18:04 PDT
Fixed everywhere.

Big thanks to timeless for digging into this!

/be

Note You need to log in before you can comment on or make changes to this bug.