Closed Bug 572428 Opened 14 years ago Closed 14 years ago

Crash [@ js_CallGCMarker]

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta2+
status1.9.2 --- unaffected
status1.9.1 --- unaffected

People

(Reporter: gkw, Assigned: igor)

References

Details

(Keywords: crash, regression, testcase, Whiteboard: [ccbr][sg:critical] fixed-in-tracemonkey)

Crash Data

Attachments

(1 file, 7 obsolete files)

try {
    evalcx("")
    f
} catch(e) {}
gc()
for each(b in <x/>) {}
for each(l in [0, 0]) {
    gc()
}

crashes js opt shell on TM tip without -j at js_CallGCMarker.

s-s and assuming [sg:critical?] because it seems to access memory at a scary location (0x780065).

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00780065
0x0005322d in js_CallGCMarker ()
(gdb) bt
#0  0x0005322d in js_CallGCMarker ()
#1  0x00111cf9 in js_TraceXML ()
#2  0x00053290 in js_CallGCMarker ()
#3  0x00053635 in js::ConservativeGCStackMarker::markWord ()
#4  0x00053a8b in js::ConservativeGCStackMarker::markRange ()
#5  0x000544e9 in js_TraceRuntime ()
#6  0x000548c4 in js_GC ()
#7  0x0000f9ba in JS_GC ()
#8  0x000061e3 in GC ()
#9  0x00060b5c in js_Interpret ()
#10 0x000677d6 in js_Execute ()
#11 0x000109cc in JS_ExecuteScript ()
#12 0x00005046 in Process ()
#13 0x000091e7 in shell ()
#14 0x000095f8 in main ()
(gdb) x/i $eip
0x5322d <_Z15js_CallGCMarkerP8JSTracerPvj+637>: mov    (%eax),%eax
(gdb) x/b $eax
0x780065:       Cannot access memory at address 0x780065
This is awesome. This is exactly the same crash we are seeing on the tinderbox. Should be easy to fix.
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   43310:7ee77bc4bc8a
user:        Igor Bukanov
date:        Fri Jun 04 16:22:28 2010 +0200
summary:     bug 516832 - conservative stack scanning. This is based on the initial work by Anreas Gal and Gregor Wagner. r=brendan,gal
Blocks: 516832
Assignee: general → igor
I cannot reproduce the crash on Linux debug/optimized 32/64 bit. So this patch adds extra asserts to check that the things are right. Perhaps this will capture the problem on Mac debug build?
fwiw, comment #0 was reproduced in a 32-bit optimized Mac build when passed in as a CLI argument on TM changesets 7ee77bc4bc8a and c287fcea6684:

$ ./js-opt-32-tm-darwin 572428.js 
Segmentation fault
(In reply to comment #3)
> Created an attachment (id=451711) [details]
> extra assert coverage in jsgc.cpp
> 
> I cannot reproduce the crash on Linux debug/optimized 32/64 bit. So this patch
> adds extra asserts to check that the things are right. Perhaps this will
> capture the problem on Mac debug build?

Nope, the problem doesn't show up in a Mac debug build, either with or without this extra assert patch.
blocking2.0: --- → ?
(In reply to comment #4)
> fwiw, comment #0 was reproduced in a 32-bit optimized Mac build when passed in
> as a CLI argument on TM changesets 7ee77bc4bc8a and c287fcea6684:
> 
> $ ./js-opt-32-tm-darwin 572428.js 
> Segmentation fault

Is it a thread-safe build of js shell?
Gary: could you in debugger at the moment of the crash go to the js_TraceXML frame and do something like print *xml? Also could your print *ainfo in the markWord frame?
Gary: could you also run the test using the latest TM tip under valgrind to see if it would also crash. If yes, could you also apply the patch from https://bugzilla.mozilla.org/show_bug.cgi?id=572678 and run under valgrind again (note that you would need to pass --enable-valgrind with that patch when compiling the build that crashes).
(In reply to comment #6)
> Is it a thread-safe build of js shell?

Don't think so. I'm compiling with `CC="gcc-4.2 -arch i386" CXX="g++-4.2 -arch i386" HOST_CC="gcc-4.2" HOST_CXX="g++-4.2" RANLIB=ranlib AR=ar AS=$CC LD=ldSTRIP="strip -x -S" CROSS_COMPILE=1sh ../configure --target=i386-apple-darwin8.0.0 --disable-optimize --enable-debug` on Mac 10.6..

(In reply to comment #7)
> Gary: could you in debugger at the moment of the crash go to the js_TraceXML
> frame and do something like print *xml? Also could your print *ainfo in the
> markWord frame?

Nope. I can't reproduce in a debug build. Opt shells don't seem to have symbols, unless another configuration setting, whose name currently eludes me, is passed in, if I'm not wrong.

There isn't yet Valgrind on 10.6 for me to test it out...
Gary: could you try to compile an optimized build and run TM tip with this patch applied? It just enables asserts in optimized build for jsgc.cpp. Hopefully some of those asserts will be triggered.
(In reply to comment #10)
> Created an attachment (id=452057) [details]
> enabling asserts for jsgc.cpp in optimized builds
> 
> Gary: could you try to compile an optimized build and run TM tip with this
> patch applied? It just enables asserts in optimized build for jsgc.cpp.
> Hopefully some of those asserts will be triggered.

There you go:

Assertion failure: trc->debugPrinter || trc->debugPrintArg, at ../jsgc.cpp:2370

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x00109a92 in JS_Assert ()
(gdb) bt
#0  0x00109a92 in JS_Assert ()
#1  0x000563ab in js_CallGCMarker ()
#2  0x0002036a in js_pinned_atom_tracer ()
#3  0x0002dabd in JS_DHashTableEnumerate ()
#4  0x0002029c in js_TraceAtomState ()
#5  0x00057c45 in js_TraceRuntime ()
#6  0x000580f7 in GCUntilDone ()
#7  0x00059652 in js_GC ()
#8  0x0000fa3a in JS_GC ()
#9  0x000061b3 in GC ()
#10 0x00064d2c in js_Interpret ()
#11 0x0006c126 in js_Execute ()
#12 0x000104cc in JS_ExecuteScript ()
#13 0x00005016 in Process ()
#14 0x000091b7 in shell ()
#15 0x000095c8 in main ()
(In reply to comment #11)
> There you go:

Sorry for a bad assert-enabling patch. It is not that simple to enable asserts just for jsgc.cpp as those depends on many other places cooperating with them.
Gary: could you try this patch instead of the bogus previous one? This one saves initial xml instances and then tries to mark them conservatively. The idea is to check if it is really XML that causes problems.

If this would still crash in an optimized build, could you give a stack?
Attachment #451711 - Attachment is obsolete: true
Attachment #452057 - Attachment is obsolete: true
(In reply to comment #13)
> Created an attachment (id=452265) [details]
> another attempt to find out the problem
> 
> Gary: could you try this patch instead of the bogus previous one? This one
> saves initial xml instances and then tries to mark them conservatively. The
> idea is to check if it is really XML that causes problems.
> 
> If this would still crash in an optimized build, could you give a stack?

I don't seem to get anything with a shell build using this patch..:

$ ./js-opt-32-tm-darwin 572428.js
$
This crash seems to be impacting jsfunfuzz on Mac pretty badly... :(
Gary: could you provide a disassembly of jsgc.cpp for the optimized build? You can get one via something like:

cd shell build dir
objdump -d jsgc.o > output.S
(In reply to comment #16)
> Gary: could you provide a disassembly of jsgc.cpp for the optimized build? You
> can get one via something like:
> 
> cd shell build dir
> objdump -d jsgc.o > output.S

There isn't objdump in Mac OS X 10.6.x though. I somehow couldn't really get MacPorts' version of binutils to work. :(
(In reply to comment #16)
> Gary: could you provide a disassembly of jsgc.cpp for the optimized build? You
> can get one via something like:
> 
> cd shell build dir
> objdump -d jsgc.o > output.S

Can gdb somehow get some information for you?
(In reply to comment #18)
> Can gdb somehow get some information for you?

I think that was a wrong idea. So please forget about it for now.
Attached patch another patch for testing (obsolete) — Splinter Review
Since with the previous patch there was no crash it may indicate that the bug or miscompilation happens in jsxml.cpp So this patch just removes the changes for jsgc.cpp.

Gary, could you try this one?
(In reply to comment #20)
> Created an attachment (id=452595) [details]
> another patch for testing
> 
> Since with the previous patch there was no crash it may indicate that the bug
> or miscompilation happens in jsxml.cpp So this patch just removes the changes
> for jsgc.cpp.
> 
> Gary, could you try this one?

(again, no debug info because this is an opt build)

Yup, this crashes too:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00680063
0x000534fd in js_CallGCMarker ()
(gdb) bt
#0  0x000534fd in js_CallGCMarker ()
#1  0x00108d19 in js_TraceXML ()
#2  0x00053560 in js_CallGCMarker ()
#3  0x00053905 in js::ConservativeGCStackMarker::markWord ()
#4  0x00053d3b in js::ConservativeGCStackMarker::markRange ()
#5  0x00054799 in js_TraceRuntime ()
#6  0x00054b74 in js_GC ()
#7  0x0000fdba in JS_GC ()
#8  0x00006533 in GC ()
#9  0x00060d18 in js_Interpret ()
#10 0x00068216 in js_Execute ()
#11 0x0001084c in JS_ExecuteScript ()
#12 0x00005396 in Process ()
#13 0x00009537 in shell ()
#14 0x00009948 in main ()
(gdb) x/i $eip
0x534fd <_Z15js_CallGCMarkerP8JSTracerPvj+637>: mov    (%eax),%eax
(gdb) x/b $eax
0x680063:       Cannot access memory at address 0x680063
(In reply to comment #21)
> Yup, this crashes too:

So the crash happens when XML pointer is found on the stack but not via some array. Interesting.
Attached patch another patch for testing (obsolete) — Splinter Review
Gary: does this patch crash?
Attachment #452265 - Attachment is obsolete: true
Attachment #452595 - Attachment is obsolete: true
(In reply to comment #23)
> Created an attachment (id=452655) [details]
> another patch for testing
> 
> Gary: does this patch crash?

$ ./js-opt-32-tm-darwin ~/Desktop/2interesting/572428.js 
Cannot find xml thing
Abort trap
Attached patch another patch for testing (obsolete) — Splinter Review
Gary: does this crash? If so, what it prints?
Attachment #452655 - Attachment is obsolete: true
Attached patch another patch for testing (obsolete) — Splinter Review
Gary: please ignore the last patch, try this one instead.
Attachment #452659 - Attachment is obsolete: true
(In reply to comment #26)
> Created an attachment (id=452660) [details]
> another patch for testing
> 
> Gary: please ignore the last patch, try this one instead.

$ ./js-opt-32-tm-darwin ~/Desktop/2interesting/572428.js 
new_xml=0x70a000
gcNumber=1
finalized_xml=0x70a000
new_xml=0x705000
new_xml=0x705048
new_xml=0x705090
gcNumber=2
finalized_xml=0x705048
finalized_xml=0x705090
gcNumber=3
Cannot find xml thing 0x705fc0
Abort trap
Gary: thanks a lot for your efforts! 

Here is a bug: I have missed that if the free list in the arena is empty, it does not mean that all things in the arena are occupied. The free list is assembled on the first allocation from the arena based on the mark bitmap from the last allocation. Thus if there were no allocations from the arena prior the GC, the free list will be empty yet the the arena may contain free things. 

So to fix the bug it is necessary somehow to preserve the information from the mark bitmap at least to the point of the conservative stack scan.
Whiteboard: [ccbr][sg:critical?] → [ccbr][sg:critical]
(In reply to comment #28)
> Here is a bug: I have missed that if the free list in the arena is empty, it
> does not mean that all things in the arena are occupied. 

This is bogus - the free list for finalizable gc things are assembled during the finalization.
Ok, I have found the reason for the bug. sizeof(JSXML) is 72 bytes on i386. That is not a power of two. As such there is unused space at the end of the arena between the last JSXML and the end of the arena but the code does not take that into account and tries to treat as JSXML that area that starts after the last JSXML in the arena.

This tells me that we definitely should increase the test coverage for the conservative scanner. A simple thing thing to do is to start to rely on it. Another possibility is to add self-fuzzing code that would tries to mark in addition to the stack a various random words calculated from the address of recent allocations plus some random offset.
Attached patch v1Splinter Review
Gary: could you check that this fixes the problem and fuzz it little bit?
Attachment #452660 - Attachment is obsolete: true
Attachment #452718 - Flags: review?
Attachment #452718 - Flags: review? → review?(wagnerg)
(In reply to comment #31)
> Created an attachment (id=452718) [details]
> v1
> 
> Gary: could you check that this fixes the problem and fuzz it little bit?

Quick test shows that this seems to fix the problem (no more crash). A little fuzzing is now underway...
Comment on attachment 452718 [details] [diff] [review]
v1

>@@ -1127,27 +1127,40 @@ ConservativeGCStackMarker::dumpConservat
> void
> ConservativeGCStackMarker::markWord(jsuword w)
> {
> #define RETURN(x) do { CONSERVATIVE_METER(stats.x++); return; } while (0)
> 
>     JSRuntime *rt = trc->context->runtime;
> 
>     CONSERVATIVE_METER(stats.unique++);
> 
>-    /* If this is an odd addresses or a tagged integer, skip it. */
>+    /*
>+     * We assume that the compiler never uses sub-word alignment to store
>+     * pointers and does not tag pointers on its own. Thus we exclude words
>+     * with JSVAL_INT (any odd words) or JSVAL_SPECIAL tags as they never
>+     * points to GC things. We also exclude words with a double tag that point

point to GC things.

>+     * into non-double GC thing arena. But, for example, on 32-bit platforms

"into a non-double"   or "thing arenas"

>+     * we cannot exclude a pointer into an object arena tagged with
>+     * JSVAL_STRING. The latter is 4 and a compiler can store a pointer
>+     * not to the object but rather a pointer to its second field.
>+     */
Attachment #452718 - Flags: review?(wagnerg) → review+
http://hg.mozilla.org/tracemonkey/rev/e13ca20af0c5
Whiteboard: [ccbr][sg:critical] → [ccbr][sg:critical] fixed-in-tracemonkey
(In reply to comment #32)
> (In reply to comment #31)
> > Created an attachment (id=452718) [details] [details]
> > v1
> > 
> > Gary: could you check that this fixes the problem and fuzz it little bit?
> 
> Quick test shows that this seems to fix the problem (no more crash). A little
> fuzzing is now underway...

A little fuzz-test overnight seems to show no real problems (yet).
Looks like Linux64 Bo/Bd (JSAPI, make check) tests have been failing since this checkin.
http://hg.mozilla.org/tracemonkey/rev/08a8426164c4 - followup to fix a bad typo
blocking2.0: ? → beta2+
http://hg.mozilla.org/mozilla-central/rev/e13ca20af0c5
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Crash Signature: [@ js_CallGCMarker]
Flags: in-testsuite-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: