The default bug view has changed. See this FAQ.

Crash [@ js::ToPrimitiveSlow] with OOM and SIGBUS

NEW
Assigned to

Status

()

Core
JavaScript Engine
--
critical
11 months ago
5 months ago

People

(Reporter: decoder, Assigned: sfink, NeedInfo)

Tracking

(Depends on: 1 bug, Blocks: 2 bugs, {crash, regression, testcase})

Trunk
ARM
Linux
crash, regression, testcase
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox47 fix-optional, firefox48 fix-optional, firefox49+ fix-optional, firefox50+ wontfix)

Details

(Whiteboard: [jsbugmon:], crash signature)

(Reporter)

Description

11 months ago
The following testcase crashes on mozilla-central revision 77cead2cd203 (build with --enable-optimize --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --target=i686-pc-linux-gnu --disable-tests --enable-simulator=arm --disable-debug, run with --fuzzing-safe --thread-count=2 --disable-oom-functions --ion-eager --baseline-eager --ion-offthread-compile=off --ion-extra-checks --arm-asm-nop-fill=1 --arm-sim-icache-checks --arm-hwcap=vfp):

function testSetTypedFloat32Array(k) {
    var ar = new Float32Array(8);
    var t = k + 555;
    var L = ar[k+7] = t & 5;
    if (t !== -1) {
  testSetTypedFloat32Array('#2: y = 1; y /= -1; y === -1. Actual: ' + (t));
}
}
try {
for (var i = 0; i <= 10; i++) {
    testSetTypedFloat32Array(0);
}
} catch(exc1) {}
function assertEqArr(a1, a2) {}
var order = "";
var o1 = { toString: function() { order += "b"; return print +  (this      )     -  98        >=  a1    ; }};
var o2 = { valueOf:  function() { order += "a"; return 1; }};
var res = "xyz-xyz".split(o1, o2);



Backtrace:

Program terminated with signal SIGBUS, Bus error.
#0  js::ToPrimitiveSlow (cx=cx@entry=0xf7275040, preferredType=preferredType@entry=JSTYPE_VOID, vp=vp@entry=...) at js/src/jsobj.cpp:3087
(gdb) bt 16
#0  js::ToPrimitiveSlow (cx=cx@entry=0xf7275040, preferredType=preferredType@entry=JSTYPE_VOID, vp=vp@entry=...) at js/src/jsobj.cpp:3087
#1  0x084e3c8e in ToPrimitive (vp=..., cx=0xf7275040) at js/src/jsobj.h:1034
#2  AddOperation (res=..., rhs=..., lhs=..., cx=0xf7275040) at js/src/vm/Interpreter.cpp:1334
#3  js::AddValues (cx=cx@entry=0xf7275040, lhs=lhs@entry=..., rhs=rhs@entry=..., res=res@entry=...) at js/src/vm/Interpreter.cpp:4410
#4  0x082c5eeb in js::jit::DoBinaryArithFallback (cx=0xf7275040, payload=payload@entry=0x0, stub_=stub_@entry=0x78780010, lhs=lhs@entry=..., rhs=rhs@entry=..., ret=ret@entry=...)
    at js/src/jit/SharedIC.cpp:928
#5  0x08370f89 in js::jit::Simulator::softwareInterrupt (this=0xf721c000, instr=0xf7202d34) at js/src/jit/arm/Simulator-arm.cpp:2380
#6  0x0837130d in js::jit::Simulator::decodeType7 (this=this@entry=0xf721c000, instr=instr@entry=0xf7202d34) at js/src/jit/arm/Simulator-arm.cpp:3502
#7  0x083727fc in js::jit::Simulator::instructionDecode (this=this@entry=0xf721c000, instr=instr@entry=0xf7202d34)
    at js/src/jit/arm/Simulator-arm.cpp:4424
[...]
(gdb) info reg
eax            0xffa01158       -6287016
ecx            0xf51f4298       -182500712
edx            0xf51f4298       -182500712
ebx            0x94e3418        156120088
esp            0xffa00fa0       0xffa00fa0
ebp            0xffa01158       0xffa01158
esi            0xf7275040       -148418496
edi            0x25     37
eip            0x83faa78        0x83faa78 <js::ToPrimitiveSlow(JSContext*, JSType, JS::MutableHandle<JS::Value>)+24>
eflags         0x10286  [ PF SF IF RF ]
cs             0x23     35
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x0      0
gs             0x63     99
(gdb) x /i $pc
=> 0x83faa78 <js::ToPrimitiveSlow(JSContext*, JSType, JS::MutableHandle<JS::Value>)+24>:        call   0x807eb60 <__x86.get_pc_thunk.bx>


This could be a hit on the stack space boundary (call instruction + esp looks like it could be near end of address space), but I'm not entirely sure. Marking s-s until investigated by a developer.

Updated

11 months ago
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:bisect]

Comment 1

11 months ago
JSBugMon: Cannot process bug: Unable to automatically reproduce, please track manually.

Updated

11 months ago
Whiteboard: [jsbugmon:bisect] → [jsbugmon:]
Steve, can you take a look at this?
Flags: needinfo?(sphink)
Tracking for 49 - are other versions affected?
tracking-firefox49: --- → +
(Assignee)

Comment 4

10 months ago
Definitely looking like stack exhaustion so far. Which is strange, since there's a JS_CHECK_RECURSION 3 frames up. Looking...
There are some crashes on 47 and 48 showing on crash-stats, and this issue is still unrated, so if we come up with a fix then please ask for sec-approval and uplift. Thanks
status-firefox47: --- → fix-optional
status-firefox48: --- → affected
status-firefox50: --- → affected
tracking-firefox50: --- → +
Assignee: nobody → sphink
(Assignee)

Comment 6

8 months ago
This particular test will only reproduce on a 32-bit arm simulator build for me. It does not reproduce on either 64-bit or 32-bit linux.

https://crash-stats.mozilla.com/signature/?_sort=-date&signature=js%3A%3AToPrimitiveSlow#aggregations is 82% on FennecAndroid (which makes sense, and indicates that it's not just a simulator issue), but also 17% on desktop firefox. But none of those are SIGBUS. (And note that those are marked with bug 1248343, which is a dupe of bug 1232685, which was fixed in 47. And yet, these crashes are in 47 and later.)

Anyway, I can reproduce on arm-sim, so I'll look into it there.
(Assignee)

Comment 7

8 months ago
Bug 1286943 is related -- the script in comment 0 produces a different error message on an arm-sim build than a 32-bit linux build. So far, it seems like the crash in this bug occurs after you've OOMed once, and then you try to report the error. (The arm-sim error appears to be correct, btw; it matches a linux64 build. But I think this bug requires 32 bit.)

Also note that I can trigger the bug much, much faster by doing:

  ulimit -S -s 128 # limit stack size
  ulimit -S -v 500000 # limit memory size

(the -S isn't actually needed, it's just there to prevent you from painting yourself into a corner.)
(Assignee)

Comment 8

8 months ago
Ok, documenting my current state of confusion:

What is happening is that the test case has an infinite loop that eats up the C stack. It ends up seg faulting inside of jemalloc, when it tries to write to an address just below the end of the stack. I do not understand why it doesn't just extend the stack.

ulimit -s says the stack is max 8MB.
/proc/<pid>/maps says the stack is currently fff7e000-ffffe000 (512KB)
JS_CHECK_RECURSION would cut things off at around 0xffefd001, which corresponds to a 1027KB stack.
The stack pointer just before going into the crashing code is 0xfff7e0b0, very near the bottom of the mapped range.
The stack pointer at the point of the crash is 0xfff7dfd0, which is indeed in the unmapped range.
gdb won't let me read or write that address either.

I don't know why the kernel isn't growing the stack automatically when that address is touched. There must be something else that is limiting the stack to 512KB?

Needinfo? glandium because he's the only person I can think of who might have a clue about this type of craziness. glandium: do you know of additional limits?
Flags: needinfo?(sphink) → needinfo?(mh+mozilla)
(Assignee)

Comment 9

8 months ago
Oh. Interesting. /proc/<pid>/limits lists a hard and soft stack limit of 512KB. Never mind, it looks like I set the limit myself and was running ulimit -s from the wrong shell. :(
Flags: needinfo?(mh+mozilla)
(Assignee)

Comment 10

8 months ago
So the easy reproductions I have were from artificially limiting the stack space, which unfortunately makes them invalid. Because I can get a very similar-looking crash when ulimit -s says 8MB. Unfortunately, that generates a truncated 4GB core file that gdb can't make much sense of.
What does /proc/$pid/maps look like when the crash happens?
(Assignee)

Comment 12

8 months ago
Ok, so the thing that I can reproduce is another instance of bug 909094: I have ulimit -s set to the default of 8MB. It grows the stack to about 1MB (I'm printing out stack sizes in CallJSNative every time it grows) and the seg faults. Note that this bug refers to SIGBUS, not SIGSEGV, and I'm not sure when you would see a SIGBUS instead, but it seems plausible.

If I wait until the stack grows to 800KB or so and then dump out /proc/self/maps, I see a piece of the heap getting in the way of the stack:

f7ffd000-f7ffe000 rw-p 00022000 fd:00 3984528                            /usr/lib/ld-2.22.so
f8000000-fff00000 rw-p 00000000 00:00 0 
fff37000-ffffe000 rw-p 00000000 00:00 0                                  [stack]

This matches the behavior of the test case. It first OOMs on the heap, thus creating enough pressure for the kernel to decide grab a piece of the stack memory. Then it recurses, growing the stack until it hits the mmapped region.

Given that it appears we are seeing this in the field, it seems like we may need to more seriously consider the mitigations discussed in that bug. Note that this is probably primarily an Android issue, since it has a 32-bit address space as well as low physical memory. So it's annoying if prefaulting the stack is going to commit all of its memory.

Perhaps we could completely defeat the kernel's normal stack mechanism, for the main JS running thread at least, by mmapping the whole stack (up to the rlimit, or up to our own JS_CHECK_RECURSION limits plus a big chunk of slop space) and madvise(MADV_DONTNEED) it?
(Assignee)

Updated

8 months ago
Depends on: 909094
Marking this fix-optional for now but if this gets a rating and needs uplift, please let relman know or reset the flag to affected.
status-firefox48: affected → fix-optional
status-firefox49: affected → fix-optional
Group: javascript-core-security

Comment 14

6 months ago
Hi Steve, is this something that will be fixed in time for 50 or 51? It's a pretty low volume crash so I have tagged it as fix-optional.
status-firefox50: affected → fix-optional
Flags: needinfo?(sphink)

Comment 15

5 months ago
This is a pretty low volume crash on current release 49.x and also on beta 50. We are a week away from RC, too late, this is now a wontfix for 50.
status-firefox50: fix-optional → wontfix
You need to log in before you can comment on or make changes to this bug.