Closed Bug 1143966 Opened 7 years ago Closed 7 years ago

jsshell startup crash with Ubuntu 12.04 LTS (gcc 4.7)

Categories

(Core :: JavaScript Engine, defect)

x86
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox39 --- fixed

People

(Reporter: decoder, Assigned: ehoogeveen)

Details

(Keywords: crash, Whiteboard: [fuzzblocker])

Attachments

(1 file, 1 obsolete file)

I'm seeing a startup crash on mozilla-central since rev 2b9f5019abf1:


Program received signal SIGSEGV, Segmentation fault.
js::gc::GetBackgroundAllocKind (kind=js::gc::OBJECT16) at  js/src/jsgc.h:219
219	    MOZ_ASSERT(!IsBackgroundFinalized(kind));
(gdb) bt 
#0  js::gc::GetBackgroundAllocKind (kind=js::gc::OBJECT16) at  js/src/jsgc.h:219
#1  0x086f6b2d in js::NewObjectWithGivenTaggedProto (cxArg=cxArg@entry=0x96b7560, clasp=clasp@entry=0x9651cc0 <JSRuntime::createSelfHostingGlobal(JSContext*)::shgClass>, proto=proto@entry=..., 
    parentArg=..., allocKind=js::gc::OBJECT16, newKind=newKind@entry=js::SingletonObject) at  js/src/jsobj.cpp:1180
#2  0x081537f8 in js::NewObjectWithGivenTaggedProto (cx=cx@entry=0x96b7560, clasp=clasp@entry=0x9651cc0 <JSRuntime::createSelfHostingGlobal(JSContext*)::shgClass>, proto=..., parent=..., 
    newKind=newKind@entry=js::SingletonObject) at  js/src/jsobjinlines.h:587
#3  0x08209a88 in NewObjectWithGivenProto (newKind=js::SingletonObject, parent=..., proto=..., clasp=0x9651cc0 <JSRuntime::createSelfHostingGlobal(JSContext*)::shgClass>, cx=0x96b7560)
    at  js/src/jsobjinlines.h:611
#4  js::GlobalObject::createInternal (cx=cx@entry=0x96b7560, clasp=clasp@entry=0x9651cc0 <JSRuntime::createSelfHostingGlobal(JSContext*)::shgClass>)
    at  js/src/vm/GlobalObject.cpp:241
#5  0x0829e35b in JSRuntime::createSelfHostingGlobal (cx=cx@entry=0x96b7560) at  js/src/vm/SelfHosting.cpp:1129
#6  0x082a3022 in JSRuntime::initSelfHosting (this=this@entry=0x968aba0, cx=cx@entry=0x96b7560) at  js/src/vm/SelfHosting.cpp:1161
#7  0x086a4b6c in js::NewContext (rt=0x968aba0, stackChunkSize=8192) at  js/src/jscntxt.cpp:127
#8  0x086a4bd9 in JS_NewContext (rt=0x968aba0, stackChunkSize=stackChunkSize@entry=8192) at  js/src/jsapi.cpp:674
#9  0x0804d9ba in NewContext (rt=<optimized out>) at  js/src/shell/js.cpp:5590
#10 0x080684f5 in main (argc=1, argv=0xffffd4c4, envp=0xffffd4cc) at  js/src/shell/js.cpp:6337


Bisection says:

The first bad revision is:
changeset:   233529:2b9f5019abf1
user:        Emanuel Hoogeveen
date:        Fri Mar 13 02:13:00 2015 +0100
summary:     Bug 1139552 - Convert js::gc::AllocKind to an enum class and eliminate non-AllocKind indexing. r=terrence


The crash only reproduces with the following compiler/build flags combination:

Compiler: gcc (Ubuntu/Linaro 4.7.3-2ubuntu1~12.04) 4.7.3
Build Flags: --enable-debug --enable-optimize --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --target=i686-pc-linux-gnu
I can reproduce this with GCC 4.7 without any changes. I can also reproduce it with GCC 4.8 if I remove the 'inline' keyword from GetBackgroundAllocKind. So it seems the underlying bug is still there, but the inlining saves it. Here's the disassembly (Intel style, because AT&T makes my eyes glaze over):

Dump of assembler code for function js::gc::GetBackgroundAllocKind(js::gc::AllocKind):
   0x0868c5a0 <+0>:	push   ebp
   0x0868c5a1 <+1>:	mov    ebp,esp
   0x0868c5a3 <+3>:	push   ebx
   0x0868c5a4 <+4>:	sub    esp,0x14
   0x0868c5a7 <+7>:	call   0x806902b <__x86.get_pc_thunk.bx>
   0x0868c5ac <+12>:	add    ebx,0xf70a48
   0x0868c5b2 <+18>:	cmp    al,0x16
   0x0868c5b4 <+20>:	ja     0x868c5cd <js::gc::GetBackgroundAllocKind(js::gc::AllocKind)+45>
=> 0x0868c5b6 <+22>:	cmp    BYTE PTR [ebx+eax*1-0xbef1fb],0x0
   0x0868c5be <+30>:	jne    0x868c5f9 <js::gc::GetBackgroundAllocKind(js::gc::AllocKind)+89>
   0x0868c5c0 <+32>:	cmp    al,0xb
   0x0868c5c2 <+34>:	ja     0x868c5d2 <js::gc::GetBackgroundAllocKind(js::gc::AllocKind)+50>
   0x0868c5c4 <+36>:	add    esp,0x14
   0x0868c5c7 <+39>:	add    eax,0x1
   0x0868c5ca <+42>:	pop    ebx
   0x0868c5cb <+43>:	pop    ebp
   0x0868c5cc <+44>:	ret    
   0x0868c5cd <+45>:	call   0x868c4b0 <js::gc::IsBackgroundFinalized(js::gc::AllocKind)>
   0x0868c5d2 <+50>:	mov    DWORD PTR [esp],0xde
   0x0868c5d9 <+57>:	lea    edx,[ebx-0xbf7348]
   0x0868c5df <+63>:	lea    eax,[ebx-0xbf720c]
   0x0868c5e5 <+69>:	call   0x871bf42 <MOZ_ReportAssertionFailure(char const*, char const*, int)>
   0x0868c5ea <+74>:	mov    DWORD PTR ds:0x0,0xde
   0x0868c5f4 <+84>:	call   0x804aa20 <abort@plt>
   0x0868c5f9 <+89>:	mov    DWORD PTR [esp],0xdd
   0x0868c600 <+96>:	lea    edx,[ebx-0xbf7348]
   0x0868c606 <+102>:	lea    eax,[ebx-0xc932e1]
   0x0868c60c <+108>:	call   0x871bf42 <MOZ_ReportAssertionFailure(char const*, char const*, int)>
   0x0868c611 <+113>:	mov    DWORD PTR ds:0x0,0xdd
   0x0868c61b <+123>:	call   0x804aa20 <abort@plt>

From this, I think the problem is that it's using eax instead of al in the offending cmp. *($ebx+$al-0xbef1fb) == 0x100, which would indeed truncate down to 0 and pass the assertion - but *($ebx+$eax-0xbef1fb) cannot be accessed.

Unfortunately I have no idea how to work around this except by locally disabling optimizations (I haven't tried this yet, but I imagine it would work). I also couldn't say why GCC is failing in this way, or if it might be doing the same thing in other places.
Oh, interesting. If I wrap GetBackgroundAllocKind in a #pragma GCC optimize ("O0") block, I get a different assertion failure, with kind == 127 (!). So maybe the problematic situation is created higher up the stack.
Disregard that. I think the calling conventions used by -O3 and -O0 just aren't compatible - the AllocKind was reading random data.

Using -O1 does work, and |-O2 -fno-expensive-optimizations|. -O3 fails, but none of the exposed options are responsible - I can add all of them to |-O2 -fno-expensive-optimizations|, and it won't change the generated code at all.
This patch fixes the problem for me locally. It might need more work or it might not be the best solution, and I'd like to see if I can get a reduced testcase, but could you confirm that this fixes the crashes for you as well?
Assignee: nobody → emanuel.hoogeveen
Attachment #8579042 - Flags: feedback?(choller)
So what happens here is that it stores the AllocKind on the stack as a uint8_t:

0x08199dc3 <+67>:	mov    BYTE PTR [esp+0xc],cl

Then it reads it back off the stack as a uint32_t, so the high bits are no longer zero:

0x08199ddc <+92>:	call   0x86fb710
0x086fb710 <+0>:	push   ebp
0x086fb711 <+1>:	mov    ebp,esp
0x086fb719 <+9>:	mov    eax,DWORD PTR [ebp+0x14]

It inlines CanBeFinalizedInBackground(), and does the right thing for the IsBackgroundFinalized() check in there:

0x086fb735 <+37>:	movzx  edx,al
0x086fb738 <+40>:	cmp    BYTE PTR [ebx+edx*1-0xc43c1b],0x0

But then it calls into GetBackgroundAllocKind() (not inlined in GCC 4.7), and does the wrong thing:

0x086d9eb6 <+22>:	cmp    BYTE PTR [ebx+eax*1-0xc43c1b],0x0

I'm not sure what's at fault here: is the second check 'forgetting' that the high bits might not be zero, or is the optimization transforming the load and first (inline) check but forgetting to update the second?
This still reproduces in GCC 4.8.3, but doesn't appear to in GCC 4.9.1. It might just be dodging the bug in a different way though. It'd be really nice to have a reduced testcase, but I'm not sure how to approach reduction (my first naive attempt failed).
Well spotted! Yes, it seems to be fixed in 4.8 trunk. It just missed 4.8.4, so I'm trying that now to make sure.

As for where that leaves us.. Terrence suggested just changing the underlying type to be word-sized, so this problem can't happen. That will make AllocKind arrays bigger again, but otherwise I don't think there's much of a disadvantage.

I don't know whether x64 is sensitive to this, either. That is to say, would we need to make the underlying type intptr_t/uintptr_t/size_t or would int/unsigned suffice? Terrence, what would you prefer?

I'm wondering if a signed type might be slightly nicer so we can do things like MOZ_ASSERT(kind >= AllocKind::OBJECT0) again*.

* For context, with an unsigned type this is a tautological compare, which makes warn-as-err builds fail, so I removed these assertions in bug 1139552 and added a static_assert that size_t(AllocKind::OBJECT0) == 0.
Flags: needinfo?(terrence)
I'd just remove the type specifier for now and add a fixme referencing this bug. We can revisit in a few months when gcc4.9 is standard. Maybe by then the MSVC issues with range iteration will also be fixed.
Flags: needinfo?(terrence)
I confirmed that the problem still exists on GCC 4.8.4. From the bug jandem linked, I believe it also exists on GCC 4.9.2, though just removing the inline keyword from GetBackgroundAllocKind() wasn't enough to trigger it there. It should be fixed in 4.8.5 and 4.9.3, whenever those are released.
Attachment #8579042 - Attachment is obsolete: true
Attachment #8579042 - Flags: feedback?(choller)
Attachment #8580271 - Flags: review?(terrence)
Status: NEW → ASSIGNED
AllocKind is far from the only enum class with a less-than-word-sized underlying type in the tree by the way:
https://dxr.mozilla.org/mozilla-central/search?q=%22enum+class%22&case=true&redirect=true

I guess those aren't causing known problems though :\
O_O

Good point! Some days I wonder how anything works at all, ever.
Comment on attachment 8580271 [details] [diff] [review]
Remove type specifier from AllocKind to avoid miscompilations on GCC.

Review of attachment 8580271 [details] [diff] [review]:
-----------------------------------------------------------------

Great!
Attachment #8580271 - Flags: review?(terrence) → review+
https://hg.mozilla.org/mozilla-central/rev/1d9012617957
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
You need to log in before you can comment on or make changes to this bug.