Closed Bug 1269705 Opened 8 years ago Closed 7 years ago

Crash [@ js::ToPrimitiveSlow] with OOM and SIGBUS

Categories

(Core :: JavaScript Engine, defect)

ARM
Linux
defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 909094
Tracking Status
firefox47 --- fix-optional
firefox48 --- fix-optional
firefox49 + wontfix
firefox50 + wontfix

People

(Reporter: decoder, Assigned: sfink)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, testcase, Whiteboard: [jsbugmon:])

Crash Data

The following testcase crashes on mozilla-central revision 77cead2cd203 (build with --enable-optimize --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --target=i686-pc-linux-gnu --disable-tests --enable-simulator=arm --disable-debug, run with --fuzzing-safe --thread-count=2 --disable-oom-functions --ion-eager --baseline-eager --ion-offthread-compile=off --ion-extra-checks --arm-asm-nop-fill=1 --arm-sim-icache-checks --arm-hwcap=vfp):

function testSetTypedFloat32Array(k) {
    var ar = new Float32Array(8);
    var t = k + 555;
    var L = ar[k+7] = t & 5;
    if (t !== -1) {
  testSetTypedFloat32Array('#2: y = 1; y /= -1; y === -1. Actual: ' + (t));
}
}
try {
for (var i = 0; i <= 10; i++) {
    testSetTypedFloat32Array(0);
}
} catch(exc1) {}
function assertEqArr(a1, a2) {}
var order = "";
var o1 = { toString: function() { order += "b"; return print +  (this      )     -  98        >=  a1    ; }};
var o2 = { valueOf:  function() { order += "a"; return 1; }};
var res = "xyz-xyz".split(o1, o2);



Backtrace:

Program terminated with signal SIGBUS, Bus error.
#0  js::ToPrimitiveSlow (cx=cx@entry=0xf7275040, preferredType=preferredType@entry=JSTYPE_VOID, vp=vp@entry=...) at js/src/jsobj.cpp:3087
(gdb) bt 16
#0  js::ToPrimitiveSlow (cx=cx@entry=0xf7275040, preferredType=preferredType@entry=JSTYPE_VOID, vp=vp@entry=...) at js/src/jsobj.cpp:3087
#1  0x084e3c8e in ToPrimitive (vp=..., cx=0xf7275040) at js/src/jsobj.h:1034
#2  AddOperation (res=..., rhs=..., lhs=..., cx=0xf7275040) at js/src/vm/Interpreter.cpp:1334
#3  js::AddValues (cx=cx@entry=0xf7275040, lhs=lhs@entry=..., rhs=rhs@entry=..., res=res@entry=...) at js/src/vm/Interpreter.cpp:4410
#4  0x082c5eeb in js::jit::DoBinaryArithFallback (cx=0xf7275040, payload=payload@entry=0x0, stub_=stub_@entry=0x78780010, lhs=lhs@entry=..., rhs=rhs@entry=..., ret=ret@entry=...)
    at js/src/jit/SharedIC.cpp:928
#5  0x08370f89 in js::jit::Simulator::softwareInterrupt (this=0xf721c000, instr=0xf7202d34) at js/src/jit/arm/Simulator-arm.cpp:2380
#6  0x0837130d in js::jit::Simulator::decodeType7 (this=this@entry=0xf721c000, instr=instr@entry=0xf7202d34) at js/src/jit/arm/Simulator-arm.cpp:3502
#7  0x083727fc in js::jit::Simulator::instructionDecode (this=this@entry=0xf721c000, instr=instr@entry=0xf7202d34)
    at js/src/jit/arm/Simulator-arm.cpp:4424
[...]
(gdb) info reg
eax            0xffa01158       -6287016
ecx            0xf51f4298       -182500712
edx            0xf51f4298       -182500712
ebx            0x94e3418        156120088
esp            0xffa00fa0       0xffa00fa0
ebp            0xffa01158       0xffa01158
esi            0xf7275040       -148418496
edi            0x25     37
eip            0x83faa78        0x83faa78 <js::ToPrimitiveSlow(JSContext*, JSType, JS::MutableHandle<JS::Value>)+24>
eflags         0x10286  [ PF SF IF RF ]
cs             0x23     35
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x0      0
gs             0x63     99
(gdb) x /i $pc
=> 0x83faa78 <js::ToPrimitiveSlow(JSContext*, JSType, JS::MutableHandle<JS::Value>)+24>:        call   0x807eb60 <__x86.get_pc_thunk.bx>


This could be a hit on the stack space boundary (call instruction + esp looks like it could be near end of address space), but I'm not entirely sure. Marking s-s until investigated by a developer.
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:bisect]
JSBugMon: Cannot process bug: Unable to automatically reproduce, please track manually.
Whiteboard: [jsbugmon:bisect] → [jsbugmon:]
Steve, can you take a look at this?
Flags: needinfo?(sphink)
Tracking for 49 - are other versions affected?
Definitely looking like stack exhaustion so far. Which is strange, since there's a JS_CHECK_RECURSION 3 frames up. Looking...
There are some crashes on 47 and 48 showing on crash-stats, and this issue is still unrated, so if we come up with a fix then please ask for sec-approval and uplift. Thanks
Assignee: nobody → sphink
This particular test will only reproduce on a 32-bit arm simulator build for me. It does not reproduce on either 64-bit or 32-bit linux.

https://crash-stats.mozilla.com/signature/?_sort=-date&signature=js%3A%3AToPrimitiveSlow#aggregations is 82% on FennecAndroid (which makes sense, and indicates that it's not just a simulator issue), but also 17% on desktop firefox. But none of those are SIGBUS. (And note that those are marked with bug 1248343, which is a dupe of bug 1232685, which was fixed in 47. And yet, these crashes are in 47 and later.)

Anyway, I can reproduce on arm-sim, so I'll look into it there.
Bug 1286943 is related -- the script in comment 0 produces a different error message on an arm-sim build than a 32-bit linux build. So far, it seems like the crash in this bug occurs after you've OOMed once, and then you try to report the error. (The arm-sim error appears to be correct, btw; it matches a linux64 build. But I think this bug requires 32 bit.)

Also note that I can trigger the bug much, much faster by doing:

  ulimit -S -s 128 # limit stack size
  ulimit -S -v 500000 # limit memory size

(the -S isn't actually needed, it's just there to prevent you from painting yourself into a corner.)
Ok, documenting my current state of confusion:

What is happening is that the test case has an infinite loop that eats up the C stack. It ends up seg faulting inside of jemalloc, when it tries to write to an address just below the end of the stack. I do not understand why it doesn't just extend the stack.

ulimit -s says the stack is max 8MB.
/proc/<pid>/maps says the stack is currently fff7e000-ffffe000 (512KB)
JS_CHECK_RECURSION would cut things off at around 0xffefd001, which corresponds to a 1027KB stack.
The stack pointer just before going into the crashing code is 0xfff7e0b0, very near the bottom of the mapped range.
The stack pointer at the point of the crash is 0xfff7dfd0, which is indeed in the unmapped range.
gdb won't let me read or write that address either.

I don't know why the kernel isn't growing the stack automatically when that address is touched. There must be something else that is limiting the stack to 512KB?

Needinfo? glandium because he's the only person I can think of who might have a clue about this type of craziness. glandium: do you know of additional limits?
Flags: needinfo?(sphink) → needinfo?(mh+mozilla)
Oh. Interesting. /proc/<pid>/limits lists a hard and soft stack limit of 512KB. Never mind, it looks like I set the limit myself and was running ulimit -s from the wrong shell. :(
Flags: needinfo?(mh+mozilla)
So the easy reproductions I have were from artificially limiting the stack space, which unfortunately makes them invalid. Because I can get a very similar-looking crash when ulimit -s says 8MB. Unfortunately, that generates a truncated 4GB core file that gdb can't make much sense of.
What does /proc/$pid/maps look like when the crash happens?
Ok, so the thing that I can reproduce is another instance of bug 909094: I have ulimit -s set to the default of 8MB. It grows the stack to about 1MB (I'm printing out stack sizes in CallJSNative every time it grows) and the seg faults. Note that this bug refers to SIGBUS, not SIGSEGV, and I'm not sure when you would see a SIGBUS instead, but it seems plausible.

If I wait until the stack grows to 800KB or so and then dump out /proc/self/maps, I see a piece of the heap getting in the way of the stack:

f7ffd000-f7ffe000 rw-p 00022000 fd:00 3984528                            /usr/lib/ld-2.22.so
f8000000-fff00000 rw-p 00000000 00:00 0 
fff37000-ffffe000 rw-p 00000000 00:00 0                                  [stack]

This matches the behavior of the test case. It first OOMs on the heap, thus creating enough pressure for the kernel to decide grab a piece of the stack memory. Then it recurses, growing the stack until it hits the mmapped region.

Given that it appears we are seeing this in the field, it seems like we may need to more seriously consider the mitigations discussed in that bug. Note that this is probably primarily an Android issue, since it has a 32-bit address space as well as low physical memory. So it's annoying if prefaulting the stack is going to commit all of its memory.

Perhaps we could completely defeat the kernel's normal stack mechanism, for the main JS running thread at least, by mmapping the whole stack (up to the rlimit, or up to our own JS_CHECK_RECURSION limits plus a big chunk of slop space) and madvise(MADV_DONTNEED) it?
Marking this fix-optional for now but if this gets a rating and needs uplift, please let relman know or reset the flag to affected.
Group: javascript-core-security
Hi Steve, is this something that will be fixed in time for 50 or 51? It's a pretty low volume crash so I have tagged it as fix-optional.
Flags: needinfo?(sphink)
This is a pretty low volume crash on current release 49.x and also on beta 50. We are a week away from RC, too late, this is now a wontfix for 50.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(sphink)
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.