Closed
Bug 861989
Opened 11 years ago
Closed 3 years ago
Crash on Heap with asm.js and timeout
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: decoder, Unassigned)
Details
(Keywords: crash, testcase, Whiteboard: [jsbugmon:ignore])
Attachments
(1 file, 3 obsolete files)
The following testcase crashes on mozilla-central revision 261d6997d1d1 (run with --ion-eager): const USE_ASM = "'use asm';"; function asmCompile() { return Function.apply(null, arguments); } function asmLink(f) { return f.apply(null, Array.slice(arguments, 1)); } var g = asmLink(asmCompile(USE_ASM + "function f() {} function g() { while(1) { f() } } return g")); timeout(1); g();
Reporter | ||
Comment 1•11 years ago
|
||
Seems to cause an invalid jump: Program received signal SIGSEGV, Segmentation fault. 0xf5bfd050 in ?? () (gdb) x /i $pc => 0xf5bfd050: Cannot access memory at address 0xf5bfd050
Whiteboard: [jsbugmon:ignore]
Reporter | ||
Updated•11 years ago
|
Flags: needinfo?(nicolas.b.pierron)
Comment 2•11 years ago
|
||
One of the issues with qemu-user is that the faulting address passed to the SIGSEGV handler is incorrect. It seems more appropriate to be checking the program counter anyway, so here is a potential fix and workaround.
Reporter | ||
Comment 3•11 years ago
|
||
Thanks dougc :) I will test the patch asap.
Group: core-security
Flags: needinfo?(nicolas.b.pierron)
Comment 4•11 years ago
|
||
The other issue is that qemu does not emulate protected memory well in the 'user' mode. AsmJS uses mprotect on the code to force a break. Unfortunately this trips up qemu internally while actually processing a SIGSEGV. This workaround tries to remove the protection in these paths, and this will probably work out ok since the handler will remove the protection anyway. This patch is relative to qemu-linaro-1.4.0-2013.03.
Reporter | ||
Comment 5•11 years ago
|
||
(In reply to Douglas Crosher [:dougc] from comment #4) > Created attachment 737951 [details] [diff] [review] > Hack patch for qemu to workaround it hitting a SIGSEGV in the signal > handling paths. Should I try to forward this patch to the qemu maintainers?
Comment 6•11 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #5) > (In reply to Douglas Crosher [:dougc] from comment #4) > > Created attachment 737951 [details] [diff] [review] > > Hack patch for qemu to workaround it hitting a SIGSEGV in the signal > > handling paths. > > Should I try to forward this patch to the qemu maintainers? No, I would not both them as it's not a comprehensive fix, and only works for us because it is known that the signal handler will remove the protection anyway. Further the problem has not been analysed well and there is a concern that in some cases it might remove the protection too aggressively and we might miss a signal. Combined these patches allow the example to run correctly. They also allow more of the asm.js jit-tests to work. There is still one asm.js jit-test (testTimeout2.js) that does not break out of its tight loop, and it appears that for this test that qemu-user just does not hit the protected code memory - perhaps an optimization for a tight loop. I would note that qemu-user is not a very faithful emulation and slow and would suggest reconsidering testing using the qemu system emulator and/or real hardware.
Reporter | ||
Comment 7•11 years ago
|
||
(In reply to Douglas Crosher [:dougc] from comment #6) > > I would note that qemu-user is not a very faithful > emulation and slow and would suggest reconsidering > testing using the qemu system emulator and/or real > hardware. Real hardware is in fact highly unreliable when it comes to fuzz testing, much more unreliable than any emulation (and while emulation is slower, it can easily be scaled on our servers). If the qemu-user problems cannot be fixed, then we will have to switch back to qemu-system emulation which I was previously using. The advantage of qemu-user was primarily to be a little faster and save some memory.
Comment 8•11 years ago
|
||
(In reply to Douglas Crosher [:dougc] from comment #2) I don't think this is a valid fix: we already guarded that module.containsPC(pc) a few statements above which means all faults will end up taking the operation callback, even the out-of-bounds loads/stores. The reason for writing module.containsPC(faultingAddress) is that we are specifically interested in the case where we fault trying to load pc itself (because we mprotected the code in TriggerOperationCallbackForAsmJSCode.
Comment 9•11 years ago
|
||
> Real hardware is in fact highly unreliable when it comes to fuzz testing,
> much more unreliable than any emulation (and while emulation is slower, it
> can easily be scaled on our servers). If the qemu-user problems cannot be
> fixed, then we will have to switch back to qemu-system emulation which I was
> previously using. The advantage of qemu-user was primarily to be a little
> faster and save some memory.
While real ARM hardware is slower, jsfunfuzz testing is getting more reliable on a pandaboard as we fix issues - we've had to write something into the Python harness to kill processes starting to use more than ~500MB memory since there was only 1GB on the pands.
Thus, going forward, ARM hardware is going to get faster and we can tweak our harnesses to be more reliable. I'm not sure if qemu will have more of such issues in the future too.
I've also heard from the js team that a Chromebook Series 3 (ARM) with 2GB RAM may be a more powerful option currently.
Comment 10•11 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #8) > (In reply to Douglas Crosher [:dougc] from comment #2) > I don't think this is a valid fix: we already guarded that > module.containsPC(pc) a few statements above which means all faults will end > up taking the operation callback, even the out-of-bounds loads/stores. The > reason for writing module.containsPC(faultingAddress) is that we are > specifically interested in the case where we fault trying to load pc itself > (because we mprotected the code in TriggerOperationCallbackForAsmJSCode. Good point, thanks. Yes, there is already a guard checking the 'pc', and we need to differentiate between a protected code fault and an errant read or write by code running within the asm.js module. An errant read or write should not trigger the callback exit - it might be able to do so once or twice, but we need to prevent it looping on the fault. Could I suggest switching back to using the qemu system emulator. If you really need qemu-user-arm running then it might still be possible to address this by accounting for the page protection within a special build of the Mozilla code, or by doing so in qemu-user.
Comment 11•11 years ago
|
||
This patch attempts to also set the siginfo si_addr to the program counter when the fault is caused by a PROT_EXEC protection failure. This makes it unnecessary to patch the asm.js signal handler. Qemu maintains information about the pages in use and their protection state. This made it easy to check the protection of the page currently being executed when setting up the signal context and to copy the program counter into the fault address if necessary. The interrupt and thread safety of the change has not been assessed. I'll try to write a simple example to reproduce the issues and send it off to the qemu forum.
Attachment #737944 -
Attachment is obsolete: true
Attachment #737951 -
Attachment is obsolete: true
Comment 12•11 years ago
|
||
A good number of tests were still causing qemu to fail. This patch is hopefully an incremental improvement. It reworks the management of the page protection in the signal handler paths. It looks up the page protection state in the qemu internal page tables and only removes the read protection if necessary. Further it restores the page protection when done so is a more faithful emulation. Interrupt and thread safety are still a concern, and there are still a number of tests for which the page protection mechanism will not interrupt the code and these tests do not terminate.
Attachment #738321 -
Attachment is obsolete: true
Assignee | ||
Updated•10 years ago
|
Assignee: general → nobody
Comment 13•6 years ago
|
||
Per policy at https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Inactive_Bugs. If this bug is not an enhancement request or a bug not present in a supported release of Firefox, then it may be reopened.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INACTIVE
Updated•6 years ago
|
Status: RESOLVED → REOPENED
Resolution: INACTIVE → ---
Comment 14•3 years ago
|
||
Hello! I have tried to reproduce the issue with Firefox 78.7.0esr, 85.0 and 87.0a1 (2021-02-03) but unfortunately I wasn't able to reproduce it.
Christian does this issue still occur? And if so can you please provide a new testcase or some updated steps to reproduce?
Flags: needinfo?(choller)
Reporter | ||
Comment 15•3 years ago
|
||
Likely no longer relevant, closing as WFM.
Status: REOPENED → RESOLVED
Closed: 6 years ago → 3 years ago
Flags: needinfo?(choller)
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•