Closed Bug 1283121 Opened 8 years ago Closed 5 years ago

wasm: implement signal handlers for unaligned floating-point accesses on ARM

Categories

(Core :: JavaScript: WebAssembly, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox50 --- affected

People

(Reporter: bbouvier, Assigned: lth)

References

Details

Attachments

(2 files, 4 obsolete files)

See https://bugzilla.mozilla.org/show_bug.cgi?id=1268024#c32 for more context. Emulating an integer memory access in the signal handler for unaligned access is not too hard; for float memory access, it's harder as we have no way to retrieve the content of the FPU from the signal handler without using ptrace, it seems.

As a temporary workaround, we forbid compilation to everything below ARMv7 and assume ARMv7 does the right thing. There are modes for which v7 won't do the right thing too, so this needs to be fixed later.
Some notes from discussion on IRC with sunfish:

15:08 	<sunfish>	bbouvier: my notes from last time I looked into this indicate that armv6 also has a mode where it traps on alignment
15:09 	<sunfish>	bbouvier: so it'd only be armv5 and armv6/7 in legacy mode that we wouldn't support
15:10 	<sunfish>	which i think we're totally ok with
15:16 	<sunfish>	oh, never mind. there are three levels here.
15:17 	<sunfish>	armv5 silently masks off the low bits, so it's totally unusable
15:17 	<sunfish>	armv6 introduced trapping on misaligned accesses, which is theoretically supportable, but is the thing we're talking about
15:18 	<sunfish>	and then there's the-hardware-takes-care-of-it
15:18 	<sunfish>	or, the-os-takes-care-of-it

I've checked on the rpi: by default, the kernel takes care of it for integer memory accesses (/proc/cpu/alignment set to 2).
Priority: -- → P3
Test case: This causes wasm/spec/memory_redundancy.wast to fail (the program has unaligned f32 loads and stores) on my dev board [1].

We'll need this for shipping on ARM, I expect.

[1] Nvidia Jetson TK1 4xTegra @ 2.3GHz, Ubuntu 14.
Priority: P3 → P2
Test case: This causes wasm/spec/float_memory.wast to iloop - the program has unaligned floating point accesses and the signal handler does not advance the PC when it handles the SIGBUS resulting from the unaligned access.  Same system as comment 3.
Test case: wasm/memory.js, as comment 4.
This is something I've seen in the past: according to the mode you're using to handle unaligned accesses (at the kernel level), or whether this is the simulator or a real device, the program might segfault at the expected PC or at PC+/-4. I haven't clearly identified the actual reason for this behavior...
Test case (sort of): wasm/errors.js.  This is similar to comment 4 and comment 5 but the FP accesses are all  out of bounds, they don't need to be emulated but we need to invoke the OOB handler instead.
There's the worrisome possibility that /proc/cpu/alignment is set to 0, which means the kernel will quietly swallow the exception and return garbage on an unaligned access.

https://www.kernel.org/doc/Documentation/arm/mem_alignment
https://gcc.gnu.org/ml/gcc/2010-05/msg00142.html
https://wiki.debian.org/ArmEabiFixes#word_accesses_must_be_aligned_to_a_multiple_of_their_size

One might hope that, with newer hardware supporting unaligned accesses so that the kernel does not have to, this is not an issue, but we probably want to check this on startup -- not that that's really sufficient, I expect the value can change at any time.
Not a fix, but a stopgap solution that allows testing: throw on unaligned accesses that trap, do not hang.
Attachment #8829347 - Flags: review?(bbouvier)
Comment on attachment 8829347 [details] [diff] [review]
bug1283121-throw-on-unaligned.patch

Review of attachment 8829347 [details] [diff] [review]:
-----------------------------------------------------------------

wow, very good catch.

To make sure this never happens again, and since this exit and the interrupt/OOB ones are always generated, and none of these is the first one to be generated, can you please try to add an assertion in CodeSegment::create that the linkData.{interrupt,outOfBounds,unalignedAccess}Offset are not 0? (we have plenty of tests with empty module, so if at least one offset is 0, that will show up during testing)
Attachment #8829347 - Flags: review?(bbouvier) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #11)
> Comment on attachment 8829347 [details] [diff] [review]
> bug1283121-throw-on-unaligned.patch
> 
> Review of attachment 8829347 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> wow, very good catch.
> 
> To make sure this never happens again, and since this exit and the
> interrupt/OOB ones are always generated, and none of these is the first one
> to be generated, can you please try to add an assertion in
> CodeSegment::create that the
> linkData.{interrupt,outOfBounds,unalignedAccess}Offset are not 0? (we have
> plenty of tests with empty module, so if at least one offset is 0, that will
> show up during testing)

Sure thing.
https://hg.mozilla.org/integration/mozilla-inbound/rev/3a9a465bd55ed2b5f39cd75be7e1e33cd07cc632
Bug 1283121 - Wasm ARM: Throw on trapping unaligned access, do not hang.  r=bbouvier
Keywords: leave-open
I think we're not targeting Wasm on ARM in earnest until the next iteration?  Also, I understand there's some discussion going on around how much effort we want to spend here.  So P3 but useful not to lose sight of this.
Priority: P2 → P3
Per policy at https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Inactive_Bugs. If this bug is not an enhancement request or a bug not present in a supported release of Firefox, then it may be reopened.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INACTIVE
Status: RESOLVED → REOPENED
Resolution: INACTIVE → ---
Component: JavaScript Engine: JIT → Javascript: Web Assembly

The Google Pixel 2 traps on unaligned FP accesses in ARMv7 mode, see bug 1517351 comment 15 et seq. THat hardware is mainstream enough that we probably have to deal with this problem. Increasing priority.

Blocks: 1517351
Status: REOPENED → NEW
Priority: P3 → P2

ARM-32 failures on Pixel 2 and Moto G5:

wasm/spec/address.wast.js fails because of an unaligned f32 load
wasm/spec/call_indirect.wast.js fails because of an unaligned f64 store
wasm/spec/memory_redundancy.wast.js fails because of an unaligned f32 load
wasm/spec/float_memory.wast.js fails because of an unaligned f32 load

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4bc15a1b5c336927fb3987fb9314f59b66204425&selectedJob=221078434

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4188c079bd4a2c8f10236333f717c4752d654930

These tests are all currently disabled on ARM for this reason.

Blocks: 1510659
Blocks: 1513231

Fairly grubby code to emulate trapping FP loads. Completely untested, not even test compiled, but everything used here is part of the Linux ABI in principle, so this is not entirely insane.

Assignee: nobody → lhansen

This compiles but does not run (it segfaults w/o much information when it should have emulated the instruction). Hard to debug locally unless you have a device that traps on unaligned FP accesses...

Attachment #9035664 - Attachment is obsolete: true

Wohoo, I have a device that has the problem.

(This is a very mainstream system, BTW, a Samsung Galaxy Tab S2, so the problem with trapping unaligned FP accesses appears to be widespread in the wild.)

Also my LG/Google Nexus 5x with Android 8.1. I think at this point the inescapable conclusion is that this problem is common or possibly even the majority case.

Simple test case (temporary home): https://axis-of-eval.org/sandbox/oob-arm.html, runs quickly and the result is reported in the window.

Works locally on my Nexus 5X. Needs:

  • cleanup & maybe generalization
  • test cases
  • testing on various android versions (the VFP_MAGIC may have changed at
    various points in the past)
  • scrutiny
  • maybe additional ifdeffery so that we don't expose ARM Linux to it, just
    Android
Attachment #9035917 - Attachment is obsolete: true

This is green on the Pixel 2 on try with my new test cases and after reinstating the test cases that were disabled. (https://treeherder.mozilla.org/#/jobs?repo=try&revision=9b21e231d93f13c7abde89a02f1542cf6f082ec0)

Primary worries for reviews:

  • Not enough guards / risk of double-faulting because I've forgotten to check some precondition. I'm pretty sure I got "this is a known wasm instance with a PC that is for sure valid and a memory address that will for sure be in-bound when I read the data" right, but is there more?
  • Unnecessary kernel dependencies
  • Overbroad kernel assumptions
  • "Unknown unknowns"

Bob, can you put a couple of Moto G5s back in the pool for me to use? I'll want to run jittest-10 in both opt and debug mode, just to be sure.

Attachment #9036332 - Attachment is obsolete: true
Flags: needinfo?(bob)
Attachment #9036554 - Flags: review?(luke)
Attachment #9036554 - Flags: review?(jseward)

Done, I am on pto today. If you can kick these off soon before I have to leave, we can make sure they are working properly. Otherwise I'll check in around 9AM PT.

Flags: needinfo?(bob)
Comment on attachment 9036554 [details] [diff] [review]
bug1283121-emulate-fp-accesses-v4.patch

Review of attachment 9036554 [details] [diff] [review]:
-----------------------------------------------------------------

r+ if at least |pc| is asserted to be 0 % 4.

::: js/src/wasm/WasmInstance.cpp
@@ +1143,5 @@
>           lastByteOffset < memoryMappedSize();
>  }
>  
> +bool Instance::memoryAccessInBounds(uint8_t* addr,
> +                                    unsigned numBytes) const {

Does this correctly handle (reject) the case where [addr, +numBytes) wraps around the end of the address space?  I asumme that a wasm memory itself can't wrap around the end of the address space.

::: js/src/wasm/WasmSignalHandlers.cpp
@@ +573,5 @@
> +#if defined(__linux__) && defined(__arm__)
> +    // wasmLoadImpl() and wasmStoreImpl() in MacroAssembler-arm.cpp emit plain
> +    // unconditional VLDR and VSTR instructions that do not use the PC as the
> +    // base register.
> +    uint32_t instr = *(uint32_t*)pc;

This appears to assume that |pc| refers to an ARM-encoded instruction.  Please assert this; possibly release-assert so that we can't end up doing memory accesses on behalf of who-knows-what Thumb instruction by accident.

@@ +580,5 @@
> +    bool isVSTR = masked == 0x0D000A00;
> +    DebugOnly<bool> oob = true;
> +    if (isVLDR || isVSTR) {
> +      bool isUnconditional = (instr >> 28) == 0xE;
> +      MOZ_RELEASE_ASSERT(isUnconditional);

To release-assert this (and below) is to imply that we can only get here as a result of a fault with a wasm-jit generated instruction.  If we take a fault elsewhere in Gecko and end up in here, the release assert could cause the system to fail unnecessarily.  So do we have that guarantee?  If not, it would seem safer to simply ignore unexpected instructions.
Attachment #9036554 - Flags: review?(jseward) → review+

(In reply to Julian Seward [:jseward] from comment #26)

Comment on attachment 9036554 [details] [diff] [review]
bug1283121-emulate-fp-accesses-v4.patch

Review of attachment 9036554 [details] [diff] [review]:

r+ if at least |pc| is asserted to be 0 % 4.

::: js/src/wasm/WasmInstance.cpp
@@ +1143,5 @@

      lastByteOffset < memoryMappedSize();

}

+bool Instance::memoryAccessInBounds(uint8_t* addr,

  •                                unsigned numBytes) const {
    

Does this correctly handle (reject) the case where [addr, +numBytes) wraps
around the end of the address space? I asumme that a wasm memory itself
can't wrap around the end of the address space.

The wasm memory can't wrap but it's possible this method should be more paranoid. Will investigate.

::: js/src/wasm/WasmSignalHandlers.cpp
@@ +573,5 @@

+#if defined(linux) && defined(arm)

  • // wasmLoadImpl() and wasmStoreImpl() in MacroAssembler-arm.cpp emit plain
  • // unconditional VLDR and VSTR instructions that do not use the PC as the
  • // base register.
  • uint32_t instr = (uint32_t)pc;

This appears to assume that |pc| refers to an ARM-encoded instruction.

Yes.

Please assert this; possibly release-assert so that we can't end up doing
memory accesses on behalf of who-knows-what Thumb instruction by accident.

I can assert pc % 4 == 0 at most and will do that, just because. For the instructions of interest, Arm and Thumb have the same encoding (the only difference is that Arm allows the instruction to be conditional), so further tests on arm-vs-thumb are not interesting.

@@ +580,5 @@

  • bool isVSTR = masked == 0x0D000A00;
  • DebugOnly<bool> oob = true;
  • if (isVLDR || isVSTR) {
  •  bool isUnconditional = (instr >> 28) == 0xE;
    
  •  MOZ_RELEASE_ASSERT(isUnconditional);
    

To release-assert this (and below) is to imply that we can only get here as
a result of a fault with a wasm-jit generated instruction.

That is what the earlier guards are supposed to guarantee - that the faulting instruction is in a wasm code segment.

If we take a
fault elsewhere in Gecko and end up in here, the release assert could cause
the system to fail unnecessarily. So do we have that guarantee? If not, it
would seem safer to simply ignore unexpected instructions.

If we fail the earlier guards then we return false and the superstructure in this file will invoke the next handler.

Memo to self: in JSContext.cpp there is code that checks for certain Android versions (to disable the JIT in some cases). It might be possible to reduce the risk of going wrong in the instruction emulation by checking the OS version and only trying to emulate if the OS is new "enough".

Comment on attachment 9036554 [details] [diff] [review]
bug1283121-emulate-fp-accesses-v4.patch

Review of attachment 9036554 [details] [diff] [review]:
-----------------------------------------------------------------

Really great work here; awesome to have this old hole finally fixed and verified on hardware.

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +5907,5 @@
>        ScratchRegisterScope scratch(asMasm());
>        ma_add(memoryBase, ptr, scratch);
>  
> +      // WasmSignalHandler.cpp depends on this being a single, unconditional
> +      // VLDR with a base pointer other than PC.

How about "See HandleUnalignedTrap() in WasmSignalHandler.cpp" where we depend on ..."?  (ditto below)

::: js/src/wasm/WasmSignalHandlers.cpp
@@ +543,2 @@
>  static MOZ_MUST_USE bool HandleTrap(CONTEXT* context,
> +                                    bool isSigbus = false,

How about naming this isUnalignedSignal, making it an OS-specific question of whether a given signal means "unaligned".  I'm thinking of ARM-on-Windows here (although maybe a non-issue b/c ARM64).

@@ +569,5 @@
>                       trap == Trap::IndirectCallBadSig);
> +
> +  // Handle trapping unaligned accesses.
> +  if (isSigbus) {
> +#if defined(__linux__) && defined(__arm__)

Can you hoist this whole #ifdef into a new HandleUnalignedTrap() function above in the above #ifdef (the one with ReadFPR32() et al).  Can you also have an #else that defines HandleUnalignedTrap() to `return false` so that we can call HandleUnalignedTrap() from HandleTrap() without any #ifdefs?

Also, I think it would be useful to also require `trap == Trap::OutOfBounds`?  Actually, if `isUnalignedSignal && trap != Trap::OutOfBounds` I think we shouldn't even handle the signal at all.  So that suggests the control structure:

```
if (isUnalignedSignal) {
  if (trap != Trap::OutOfBounds) return false;
  if (HandleUnalignedTrap()) return true;
}
```

@@ +578,5 @@
> +    uint32_t masked = instr & 0x0F300E00;
> +    bool isVLDR = masked == 0x0D100A00;
> +    bool isVSTR = masked == 0x0D000A00;
> +    DebugOnly<bool> oob = true;
> +    if (isVLDR || isVSTR) {

It seems like we should be able to release assert it's one of these two since we know we are at a wasm-generated trap for an out-of-bounds.  If it's false, that means we're generating a new form of OOB-trapping load/store and need to add a new pattern here.

@@ +585,5 @@
> +      bool isDouble = (instr & 0x00000100) != 0;
> +      bool isAdd = (instr & 0x00800000) != 0;
> +      uint32_t dBit = (instr >> 22) & 1;
> +      uint32_t offs = (instr & 0xFF) << 2;
> +      uint32_t rn = (instr >> 16) & 0xF;

This is some nice Scandanavian Metal here ;)

@@ +627,5 @@
> +        }
> +      }
> +    }
> +#  ifdef DEBUG
> +    if (!oob) {

With this hoisted into a function, I think you can kill `oob` and have the `!memoryAccessInBounds()` case `return false`.

@@ +631,5 @@
> +    if (!oob) {
> +      // If we get to this point, we did not understand the instruction or we
> +      // could not access the FP register.  This is probably a bug / missing
> +      // feature or an incompatibility with the kernel.  It is best to MOZ_CRASH
> +      // only in debug builds and throw an OOB error in release builds.

I like this logic.  I wonder if the __android_log_print() is necessary though: the MOZ_CRASH() will __android_log_print() its string before crashing and the instruction is asserted to be fine; the only way to reach here is that we're not able to find the f32/f64 register in WriteFPR*.
Attachment #9036554 - Flags: review?(luke) → review+

(In reply to Lars T Hansen [:lth] from comment #24)

  • Not enough guards / risk of double-faulting because I've forgotten to check some precondition. I'm pretty sure I got "this is a known wasm instance with a PC that is for sure valid and a memory address that will for sure be in-bound when I read the data" right, but is there more?

That seems sufficient to me. Note that double-faulting is uniformly and conservatively handled by sAlreadyHandlingTrap; if you have a bug here it'll just end in a proper crash report.

(In reply to Luke Wagner [:luke] from comment #29)

Comment on attachment 9036554 [details] [diff] [review]
bug1283121-emulate-fp-accesses-v4.patch

Great feedback, thanks. Some followup questions:

@@ +578,5 @@

  • uint32_t masked = instr & 0x0F300E00;
  • bool isVLDR = masked == 0x0D100A00;
  • bool isVSTR = masked == 0x0D000A00;
  • DebugOnly<bool> oob = true;
  • if (isVLDR || isVSTR) {

It seems like we should be able to release assert it's one of these two
since we know we are at a wasm-generated trap for an out-of-bounds. If it's
false, that means we're generating a new form of OOB-trapping load/store and
need to add a new pattern here.

That is a little risky. We know there may be device configurations that also trap on unaligned integer loads and stores, even if we haven't seen any, and by my later comment I think it would be better to just throw a range error if we encounter one of those.

@@ +631,5 @@

  • if (!oob) {
  •  // If we get to this point, we did not understand the instruction or we
    
  •  // could not access the FP register.  This is probably a bug / missing
    
  •  // feature or an incompatibility with the kernel.  It is best to MOZ_CRASH
    
  •  // only in debug builds and throw an OOB error in release builds.
    

I like this logic. I wonder if the __android_log_print() is necessary
though: the MOZ_CRASH() will __android_log_print() its string before
crashing and the instruction is asserted to be fine; the only way to reach
here is that we're not able to find the f32/f64 register in WriteFPR*.

As above: it's here in case we encounter a fault from unaligned integer access.

But as you point out, having moved the code into a function makes it easier to create better logic around these things, and I think I see a good compromise here.

(In reply to Lars T Hansen [:lth] from comment #31)

It seems like we should be able to release assert it's one of these two
since we know we are at a wasm-generated trap for an out-of-bounds. If it's
false, that means we're generating a new form of OOB-trapping load/store and
need to add a new pattern here.

That is a little risky. We know there may be device configurations that also trap on unaligned integer loads and stores, even if we haven't seen any, and by my later comment I think it would be better to just throw a range error if we encounter one of those.

Ah hah; I hadn't thought about trapping integer loads. You're right; it would be best if we trapped in those cases.

Updated after reviews. Green in opt and debug on pixel 2 and moto g5. Will land it after a more general try run.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f7e0aca6faf936cf877722d0421acfbbdcbf5369

Bob, you should be able to reclaim the G5s now. Thanks!

Attachment #9036554 - Attachment is obsolete: true
Flags: needinfo?(bob)
Attachment #9036696 - Flags: review+

done. thanks!

Flags: needinfo?(bob)
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1230184adda1
Handle SIGBUS for unaligned FP load/store on ARM Linux. r=luke, r=jseward

Do you know how this affects JS with typed arrays?

(In reply to Jan de Mooij [:jandem] from comment #36)

Do you know how this affects JS with typed arrays?

Should not affect JS at all. Float32Array and Float64Array are always aligned, and for DataView I'm pretty sure we're always careful to use memcpy for access. This is basically a consequence of wasm allowing unaligned accesses even when the instruction says the access is aligned, combined with the desire always to emit the best possible code for accesses that say they are aligned. (See commit msg for more, or lmk if you need more information.)

(In reply to Lars T Hansen [:lth] from comment #37)

Float32Array and Float64Array are always aligned, and for DataView I'm pretty sure we're always careful to use memcpy for access.

Ah right. I thought TypedArray views could be unaligned but you're right. Great!

Status: NEW → RESOLVED
Closed: 6 years ago5 years ago
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.