Closed Bug 861989 Opened 11 years ago Closed 3 years ago

Crash on Heap with asm.js and timeout

Tracking

()

Status:

RESOLVED WORKSFORME

People

(Reporter: decoder, Unassigned)

Details

(Keywords: crash, testcase, Whiteboard: [jsbugmon:ignore])

Attachments

(1 file, 3 obsolete files)

Check that the program counter, rather than the fault address, is within the asm.js module. 11 years ago Douglas Crosher [:dougc] 1.57 KB, patch		Details \| Diff \| Splinter Review
Hack patch for qemu to workaround it hitting a SIGSEGV in the signal handling paths. 11 years ago Douglas Crosher [:dougc] 749 bytes, patch		Details \| Diff \| Splinter Review
Revised hack patch for qemu-arm-user. 11 years ago Douglas Crosher [:dougc] 1.22 KB, patch		Details \| Diff \| Splinter Review
Revised patch that better manages the page protection in the qemu signal handler paths. 11 years ago Douglas Crosher [:dougc] 5.76 KB, patch		Details \| Diff \| Splinter Review

Christian Holler (:decoder)

Reporter

Description

•

11 years ago

The following testcase crashes on mozilla-central revision 261d6997d1d1 (run with --ion-eager):


const USE_ASM = "'use asm';";
function asmCompile() {
    return Function.apply(null, arguments);
}
function asmLink(f) {
        return f.apply(null, Array.slice(arguments, 1));
}
var g = asmLink(asmCompile(USE_ASM + "function f() {} function g() { while(1) { f() } } return g"));
timeout(1);
g();

Christian Holler (:decoder)

Reporter

Comment 1

•

11 years ago

Seems to cause an invalid jump:

Program received signal SIGSEGV, Segmentation fault.
0xf5bfd050 in ?? ()
(gdb) x /i $pc
=> 0xf5bfd050:  Cannot access memory at address 0xf5bfd050

Whiteboard: [jsbugmon:ignore]

Christian Holler (:decoder)

Reporter

Updated

•

11 years ago

Flags: needinfo?(nicolas.b.pierron)

Douglas Crosher [:dougc]

Comment 2

•

11 years ago

Attached patch Check that the program counter, rather than the fault address, is within the asm.js module. (obsolete) — Details — Splinter Review

One of the issues with qemu-user is that the faulting address passed to the SIGSEGV handler is incorrect.  It seems more appropriate to be checking the program counter anyway, so here is a potential fix and workaround.

Christian Holler (:decoder)

Reporter

Comment 3

•

11 years ago

Thanks dougc :) I will test the patch asap.

Group: core-security

Flags: needinfo?(nicolas.b.pierron)

Douglas Crosher [:dougc]

Comment 4

•

11 years ago

Attached patch Hack patch for qemu to workaround it hitting a SIGSEGV in the signal handling paths. (obsolete) — Details — Splinter Review

The other issue is that qemu does not emulate protected memory well in the 'user' mode.  AsmJS uses mprotect on the code to force a break.  Unfortunately this trips up qemu internally while actually processing a SIGSEGV.  This workaround tries to remove the protection in these paths, and this will probably work out ok since the handler will remove the protection anyway.  This patch is relative to qemu-linaro-1.4.0-2013.03.

Christian Holler (:decoder)

Reporter

Comment 5

•

11 years ago

(In reply to Douglas Crosher [:dougc] from comment #4)
> Created attachment 737951 [details] [diff] [review]
> Hack patch for qemu to workaround it hitting a SIGSEGV in the signal
> handling paths.

Should I try to forward this patch to the qemu maintainers?

Douglas Crosher [:dougc]

Comment 6

•

11 years ago

(In reply to Christian Holler (:decoder) from comment #5)
> (In reply to Douglas Crosher [:dougc] from comment #4)
> > Created attachment 737951 [details] [diff] [review]
> > Hack patch for qemu to workaround it hitting a SIGSEGV in the signal
> > handling paths.
> 
> Should I try to forward this patch to the qemu maintainers?

No, I would not both them as it's not a comprehensive fix, and
only works for us because it is known that the signal handler
will remove the protection anyway.  Further the problem has
not been analysed well and there is a concern that in some
cases it might remove the protection too aggressively and we
might miss a signal.

Combined these patches allow the example to run correctly.
They also allow more of the asm.js jit-tests to work.
There is still one asm.js jit-test (testTimeout2.js)
that does not break out of its tight loop, and it appears
that for this test that qemu-user just does not hit the
protected code memory - perhaps an optimization for a
tight loop.

I would note that qemu-user is not a very faithful
emulation and slow and would suggest reconsidering
testing using the qemu system emulator and/or real
hardware.

Christian Holler (:decoder)

Reporter

Comment 7

•

11 years ago

(In reply to Douglas Crosher [:dougc] from comment #6)
> 
> I would note that qemu-user is not a very faithful
> emulation and slow and would suggest reconsidering
> testing using the qemu system emulator and/or real
> hardware.

Real hardware is in fact highly unreliable when it comes to fuzz testing, much more unreliable than any emulation (and while emulation is slower, it can easily be scaled on our servers). If the qemu-user problems cannot be fixed, then we will have to switch back to qemu-system emulation which I was previously using. The advantage of qemu-user was primarily to be a little faster and save some memory.

Luke Wagner [:luke]

Comment 8

•

11 years ago

(In reply to Douglas Crosher [:dougc] from comment #2)
I don't think this is a valid fix: we already guarded that module.containsPC(pc) a few statements above which means all faults will end up taking the operation callback, even the out-of-bounds loads/stores.  The reason for writing module.containsPC(faultingAddress) is that we are specifically interested in the case where we fault trying to load pc itself (because we mprotected the code in TriggerOperationCallbackForAsmJSCode.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Comment 9

•

11 years ago

> Real hardware is in fact highly unreliable when it comes to fuzz testing,
> much more unreliable than any emulation (and while emulation is slower, it
> can easily be scaled on our servers). If the qemu-user problems cannot be
> fixed, then we will have to switch back to qemu-system emulation which I was
> previously using. The advantage of qemu-user was primarily to be a little
> faster and save some memory.

While real ARM hardware is slower, jsfunfuzz testing is getting more reliable on a pandaboard as we fix issues - we've had to write something into the Python harness to kill processes starting to use more than ~500MB memory since there was only 1GB on the pands.

Thus, going forward, ARM hardware is going to get faster and we can tweak our harnesses to be more reliable. I'm not sure if qemu will have more of such issues in the future too.

I've also heard from the js team that a Chromebook Series 3 (ARM) with 2GB RAM may be a more powerful option currently.

Douglas Crosher [:dougc]

Comment 10

•

11 years ago

(In reply to Luke Wagner [:luke] from comment #8)
> (In reply to Douglas Crosher [:dougc] from comment #2)
> I don't think this is a valid fix: we already guarded that
> module.containsPC(pc) a few statements above which means all faults will end
> up taking the operation callback, even the out-of-bounds loads/stores.  The
> reason for writing module.containsPC(faultingAddress) is that we are
> specifically interested in the case where we fault trying to load pc itself
> (because we mprotected the code in TriggerOperationCallbackForAsmJSCode.

Good point, thanks.  Yes, there is already a guard checking the
'pc', and we need to differentiate between a protected code fault
and an errant read or write by code running within the asm.js
module.

An errant read or write should not trigger the callback exit - it
might be able to do so once or twice, but we need to prevent it
looping on the fault.

Could I suggest switching back to using the qemu system emulator.

If you really need qemu-user-arm running then it might still be
possible to address this by accounting for the page protection
within a special build of the Mozilla code, or by doing so
in qemu-user.

Douglas Crosher [:dougc]

Comment 11

•

11 years ago

Attached patch Revised hack patch for qemu-arm-user. (obsolete) — Details — Splinter Review

This patch attempts to also set the siginfo si_addr to the program counter when the fault is caused by a PROT_EXEC protection failure.  This makes it unnecessary to patch the asm.js signal handler.

Qemu maintains information about the pages in use and their protection state.  This made it easy to check the protection of the page currently being executed when setting up the signal context and to copy the program counter into the fault address if necessary.  The interrupt and thread safety of the change has not been assessed.

I'll try to write a simple example to reproduce the issues and send it off to the qemu forum.

Attachment #737944 - Attachment is obsolete: true

Attachment #737951 - Attachment is obsolete: true

Douglas Crosher [:dougc]

Comment 12

•

11 years ago

Attached patch Revised patch that better manages the page protection in the qemu signal handler paths. — Details — Splinter Review

A good number of tests were still causing qemu to fail.
This patch is hopefully an incremental improvement.  It
reworks the management of the page protection in the signal
handler paths.  It looks up the page protection state in
the qemu internal page tables and only removes the read
protection if necessary.  Further it restores the page
protection when done so is a more faithful emulation.
Interrupt and thread safety are still a concern, and
there are still a number of tests for which the page
protection mechanism will not interrupt the code and
these tests do not terminate.

Attachment #738321 - Attachment is obsolete: true

Nobody; OK to take it and work on it

Assignee

Updated

•

10 years ago

Assignee: general → nobody

BMO Automation

Comment 13

•

6 years ago

Per policy at https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Inactive_Bugs. If this bug is not an enhancement request or a bug not present in a supported release of Firefox, then it may be reopened.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → INACTIVE

Benjamin Bouvier [:bbouvier] (inactive)

Updated

•

6 years ago

Status: RESOLVED → REOPENED

Resolution: INACTIVE → ---

Negritas Sergiu, Desktop QA

Comment 14

•

3 years ago

Hello! I have tried to reproduce the issue with Firefox 78.7.0esr, 85.0 and 87.0a1 (2021-02-03) but unfortunately I wasn't able to reproduce it.
Christian does this issue still occur? And if so can you please provide a new testcase or some updated steps to reproduce?

Flags: needinfo?(choller)

Christian Holler (:decoder)

Reporter

Comment 15

•

3 years ago

Likely no longer relevant, closing as WFM.

Status: REOPENED → RESOLVED

Closed: 6 years ago → 3 years ago

Flags: needinfo?(choller)

Resolution: --- → WORKSFORME

You need to log in before you can comment on or make changes to this bug.