Closed Bug 1143966 Opened 8 years ago Closed 8 years ago
jsshell startup crash with Ubuntu 12
.04 LTS (gcc 4 .7)
I'm seeing a startup crash on mozilla-central since rev 2b9f5019abf1: Program received signal SIGSEGV, Segmentation fault. js::gc::GetBackgroundAllocKind (kind=js::gc::OBJECT16) at js/src/jsgc.h:219 219 MOZ_ASSERT(!IsBackgroundFinalized(kind)); (gdb) bt #0 js::gc::GetBackgroundAllocKind (kind=js::gc::OBJECT16) at js/src/jsgc.h:219 #1 0x086f6b2d in js::NewObjectWithGivenTaggedProto (cxArg=cxArg@entry=0x96b7560, clasp=clasp@entry=0x9651cc0 <JSRuntime::createSelfHostingGlobal(JSContext*)::shgClass>, proto=proto@entry=..., parentArg=..., allocKind=js::gc::OBJECT16, newKind=newKind@entry=js::SingletonObject) at js/src/jsobj.cpp:1180 #2 0x081537f8 in js::NewObjectWithGivenTaggedProto (cx=cx@entry=0x96b7560, clasp=clasp@entry=0x9651cc0 <JSRuntime::createSelfHostingGlobal(JSContext*)::shgClass>, proto=..., parent=..., newKind=newKind@entry=js::SingletonObject) at js/src/jsobjinlines.h:587 #3 0x08209a88 in NewObjectWithGivenProto (newKind=js::SingletonObject, parent=..., proto=..., clasp=0x9651cc0 <JSRuntime::createSelfHostingGlobal(JSContext*)::shgClass>, cx=0x96b7560) at js/src/jsobjinlines.h:611 #4 js::GlobalObject::createInternal (cx=cx@entry=0x96b7560, clasp=clasp@entry=0x9651cc0 <JSRuntime::createSelfHostingGlobal(JSContext*)::shgClass>) at js/src/vm/GlobalObject.cpp:241 #5 0x0829e35b in JSRuntime::createSelfHostingGlobal (cx=cx@entry=0x96b7560) at js/src/vm/SelfHosting.cpp:1129 #6 0x082a3022 in JSRuntime::initSelfHosting (this=this@entry=0x968aba0, cx=cx@entry=0x96b7560) at js/src/vm/SelfHosting.cpp:1161 #7 0x086a4b6c in js::NewContext (rt=0x968aba0, stackChunkSize=8192) at js/src/jscntxt.cpp:127 #8 0x086a4bd9 in JS_NewContext (rt=0x968aba0, stackChunkSize=stackChunkSize@entry=8192) at js/src/jsapi.cpp:674 #9 0x0804d9ba in NewContext (rt=<optimized out>) at js/src/shell/js.cpp:5590 #10 0x080684f5 in main (argc=1, argv=0xffffd4c4, envp=0xffffd4cc) at js/src/shell/js.cpp:6337 Bisection says: The first bad revision is: changeset: 233529:2b9f5019abf1 user: Emanuel Hoogeveen date: Fri Mar 13 02:13:00 2015 +0100 summary: Bug 1139552 - Convert js::gc::AllocKind to an enum class and eliminate non-AllocKind indexing. r=terrence The crash only reproduces with the following compiler/build flags combination: Compiler: gcc (Ubuntu/Linaro 4.7.3-2ubuntu1~12.04) 4.7.3 Build Flags: --enable-debug --enable-optimize --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --target=i686-pc-linux-gnu
I can reproduce this with GCC 4.7 without any changes. I can also reproduce it with GCC 4.8 if I remove the 'inline' keyword from GetBackgroundAllocKind. So it seems the underlying bug is still there, but the inlining saves it. Here's the disassembly (Intel style, because AT&T makes my eyes glaze over): Dump of assembler code for function js::gc::GetBackgroundAllocKind(js::gc::AllocKind): 0x0868c5a0 <+0>: push ebp 0x0868c5a1 <+1>: mov ebp,esp 0x0868c5a3 <+3>: push ebx 0x0868c5a4 <+4>: sub esp,0x14 0x0868c5a7 <+7>: call 0x806902b <__x86.get_pc_thunk.bx> 0x0868c5ac <+12>: add ebx,0xf70a48 0x0868c5b2 <+18>: cmp al,0x16 0x0868c5b4 <+20>: ja 0x868c5cd <js::gc::GetBackgroundAllocKind(js::gc::AllocKind)+45> => 0x0868c5b6 <+22>: cmp BYTE PTR [ebx+eax*1-0xbef1fb],0x0 0x0868c5be <+30>: jne 0x868c5f9 <js::gc::GetBackgroundAllocKind(js::gc::AllocKind)+89> 0x0868c5c0 <+32>: cmp al,0xb 0x0868c5c2 <+34>: ja 0x868c5d2 <js::gc::GetBackgroundAllocKind(js::gc::AllocKind)+50> 0x0868c5c4 <+36>: add esp,0x14 0x0868c5c7 <+39>: add eax,0x1 0x0868c5ca <+42>: pop ebx 0x0868c5cb <+43>: pop ebp 0x0868c5cc <+44>: ret 0x0868c5cd <+45>: call 0x868c4b0 <js::gc::IsBackgroundFinalized(js::gc::AllocKind)> 0x0868c5d2 <+50>: mov DWORD PTR [esp],0xde 0x0868c5d9 <+57>: lea edx,[ebx-0xbf7348] 0x0868c5df <+63>: lea eax,[ebx-0xbf720c] 0x0868c5e5 <+69>: call 0x871bf42 <MOZ_ReportAssertionFailure(char const*, char const*, int)> 0x0868c5ea <+74>: mov DWORD PTR ds:0x0,0xde 0x0868c5f4 <+84>: call 0x804aa20 <abort@plt> 0x0868c5f9 <+89>: mov DWORD PTR [esp],0xdd 0x0868c600 <+96>: lea edx,[ebx-0xbf7348] 0x0868c606 <+102>: lea eax,[ebx-0xc932e1] 0x0868c60c <+108>: call 0x871bf42 <MOZ_ReportAssertionFailure(char const*, char const*, int)> 0x0868c611 <+113>: mov DWORD PTR ds:0x0,0xdd 0x0868c61b <+123>: call 0x804aa20 <abort@plt> From this, I think the problem is that it's using eax instead of al in the offending cmp. *($ebx+$al-0xbef1fb) == 0x100, which would indeed truncate down to 0 and pass the assertion - but *($ebx+$eax-0xbef1fb) cannot be accessed. Unfortunately I have no idea how to work around this except by locally disabling optimizations (I haven't tried this yet, but I imagine it would work). I also couldn't say why GCC is failing in this way, or if it might be doing the same thing in other places.
Oh, interesting. If I wrap GetBackgroundAllocKind in a #pragma GCC optimize ("O0") block, I get a different assertion failure, with kind == 127 (!). So maybe the problematic situation is created higher up the stack.
Disregard that. I think the calling conventions used by -O3 and -O0 just aren't compatible - the AllocKind was reading random data. Using -O1 does work, and |-O2 -fno-expensive-optimizations|. -O3 fails, but none of the exposed options are responsible - I can add all of them to |-O2 -fno-expensive-optimizations|, and it won't change the generated code at all.
This patch fixes the problem for me locally. It might need more work or it might not be the best solution, and I'd like to see if I can get a reduced testcase, but could you confirm that this fixes the crashes for you as well?
Assignee: nobody → emanuel.hoogeveen
Attachment #8579042 - Flags: feedback?(choller)
So what happens here is that it stores the AllocKind on the stack as a uint8_t: 0x08199dc3 <+67>: mov BYTE PTR [esp+0xc],cl Then it reads it back off the stack as a uint32_t, so the high bits are no longer zero: 0x08199ddc <+92>: call 0x86fb710 0x086fb710 <+0>: push ebp 0x086fb711 <+1>: mov ebp,esp 0x086fb719 <+9>: mov eax,DWORD PTR [ebp+0x14] It inlines CanBeFinalizedInBackground(), and does the right thing for the IsBackgroundFinalized() check in there: 0x086fb735 <+37>: movzx edx,al 0x086fb738 <+40>: cmp BYTE PTR [ebx+edx*1-0xc43c1b],0x0 But then it calls into GetBackgroundAllocKind() (not inlined in GCC 4.7), and does the wrong thing: 0x086d9eb6 <+22>: cmp BYTE PTR [ebx+eax*1-0xc43c1b],0x0 I'm not sure what's at fault here: is the second check 'forgetting' that the high bits might not be zero, or is the optimization transforming the load and first (inline) check but forgetting to update the second?
This still reproduces in GCC 4.8.3, but doesn't appear to in GCC 4.9.1. It might just be dodging the bug in a different way though. It'd be really nice to have a reduced testcase, but I'm not sure how to approach reduction (my first naive attempt failed).
Could it be https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64037 ?
Well spotted! Yes, it seems to be fixed in 4.8 trunk. It just missed 4.8.4, so I'm trying that now to make sure. As for where that leaves us.. Terrence suggested just changing the underlying type to be word-sized, so this problem can't happen. That will make AllocKind arrays bigger again, but otherwise I don't think there's much of a disadvantage. I don't know whether x64 is sensitive to this, either. That is to say, would we need to make the underlying type intptr_t/uintptr_t/size_t or would int/unsigned suffice? Terrence, what would you prefer? I'm wondering if a signed type might be slightly nicer so we can do things like MOZ_ASSERT(kind >= AllocKind::OBJECT0) again*. * For context, with an unsigned type this is a tautological compare, which makes warn-as-err builds fail, so I removed these assertions in bug 1139552 and added a static_assert that size_t(AllocKind::OBJECT0) == 0.
I'd just remove the type specifier for now and add a fixme referencing this bug. We can revisit in a few months when gcc4.9 is standard. Maybe by then the MSVC issues with range iteration will also be fixed.
I confirmed that the problem still exists on GCC 4.8.4. From the bug jandem linked, I believe it also exists on GCC 4.9.2, though just removing the inline keyword from GetBackgroundAllocKind() wasn't enough to trigger it there. It should be fixed in 4.8.5 and 4.9.3, whenever those are released.
AllocKind is far from the only enum class with a less-than-word-sized underlying type in the tree by the way: https://dxr.mozilla.org/mozilla-central/search?q=%22enum+class%22&case=true&redirect=true I guess those aren't causing known problems though :\
O_O Good point! Some days I wonder how anything works at all, ever.
Comment on attachment 8580271 [details] [diff] [review] Remove type specifier from AllocKind to avoid miscompilations on GCC. Review of attachment 8580271 [details] [diff] [review]: ----------------------------------------------------------------- Great!
Attachment #8580271 - Flags: review?(terrence) → review+
You need to log in before you can comment on or make changes to this bug.