Closed Bug 1661016 Opened 4 years ago Closed 4 years ago

Cranelift: test failures when tiering on aarch64 simulator

Categories

(Core :: JavaScript: WebAssembly, defect, P1)


Tracking


RESOLVED FIXED
82 Branch
Tracking Status
firefox82 --- fixed

People

(Reporter: bbouvier, Assigned: bbouvier)

References

Details

Attachments

(2 files, 1 obsolete file)

  • Run the test suite with --args="--wasm-compiler=baseline+cranelift"

There are a few test failures, all hitting the same assertion: Assertion failure: instCache_[offset] == instValue, at /home/ben/code/mozilla/gecko/js/src/jit/arm64/vixl/MozCachingDecoder.h:77

    --test-wasm-await-tier2 wasm/baseline-abs-addr-opt.js
    --test-wasm-await-tier2 wasm/bce.js
    --test-wasm-await-tier2 wasm/atomic.js
    --test-wasm-await-tier2 wasm/memory.js
    --no-baseline --no-blinterp --test-wasm-await-tier2 wasm/table-gc.js

Some of them are intermittent (e.g. table-gc), but most can be reliably reproduced. Investigating.

After investigation, what happens is the following:

  • during tiering, a background thread compiles wasm, requests an executable page, marks it executable, and flushes its icache.
  • on simulated aarch64, it seems an icache-flushing request is first pushed to all the simulators (see the sketch after this list)
  • then, if the current thread has a simulator, we execute the icache flush immediately
  • otherwise (which is our case here: we're a background thread, and the simulator is probably on the main thread), nothing happens
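
A rough sketch of that dispatch logic, in C++ with purely illustrative names (the real code lives in vixl's MozCachingDecoder/Simulator and differs in detail):

    // Illustrative sketch only; types and function names are hypothetical.
    void FlushSimulatedICache(void* start, size_t length) {
      // 1. Queue the flush request on every simulator instance (one per thread).
      for (Simulator* sim : AllSimulators()) {
        sim->queuePendingICacheFlush(start, length);
      }
      // 2. If the calling thread owns a simulator, drain its queue right away.
      if (Simulator* current = Simulator::Current()) {
        current->drainPendingICacheFlushes();
      }
      // 3. Otherwise (a background compiler thread with no simulator), nothing
      //    runs the flush, and the main thread's simulator keeps serving stale
      //    cached instructions -- which is exactly the failure described above.
    }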

The issue is that icache-flushing requests are never executed on the main thread. Unfortunately, there is no single obvious place to execute pending icache-flushing requests in this case:

  • the C++ code hit when entering the simulator might never be reached, because we might be running wasm code on the main thread, and wasm baseline may racily call into wasm optimized code. Or the simulator could be executing JS that calls into wasm through the jit entries.
  • there's a lock on the function that flushes the pending requests, so we can't just do it on every single simulated instruction

That's one issue.

There was an ongoing discussion about this being a real issue on aarch64 devices: on real hardware, the icache flushing happens immediately. Nbp says the icache flush only happens for the current core, but https://bugzilla.mozilla.org/show_bug.cgi?id=1529933#c7 seems to indicate that this is safe and the icache flush is propagated to all the cores before being executed. Lars, can you confirm this please?

Flags: needinfo?(lhansen)

Yeah, this is subtle but I think this is fundamentally a simulator/hardware mismatch. On the hardware, we depend on synchronization edges (in the locks that manage the compilation queue) to create a causal relationship between the writing threads and the thread that performs the mprotect, and we depend on the fence and the mprotect and branch mispredicts to ensure that other cores do not read stale instructions. The simulator does not simulate this relationship properly due to the asynchrony. The comment in ReprotectRegion also points to this discussion.

(It's not obvious to me that ReprotectRegion actually needs to do the iflush at all, because the mprotect probably has that effect on real hardware; that's the gist of the discussion - mprotect has global icache-flushing effects. But maybe that's a discussion for another time.)

I don't know how to approach this precisely but it seems that the simulator needs some kind of signal at the point of the mprotect that the addresses are being flushed globally and You Must Do This Now.

Flags: needinfo?(lhansen)

Before this patch, pending icache-flushing requests were never satisfied when
the request was emitted from a thread other than the main thread.

See comments around sPendingRequests to see how it's done efficiently.
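
A minimal sketch of that kind of mechanism, assuming an atomic fast-path flag so a frequently-reached check stays cheap; all names here are illustrative, not the actual identifiers from the patch:

    #include <atomic>
    #include <mutex>
    #include <utility>
    #include <vector>

    // Illustrative sketch; the real mechanism is documented around
    // sPendingRequests in the patch.
    static std::atomic<bool> sHasPendingFlushes{false};
    static std::mutex sPendingLock;
    static std::vector<std::pair<void*, size_t>> sPendingFlushes;

    void QueueICacheFlush(void* start, size_t length) {
      std::lock_guard<std::mutex> guard(sPendingLock);
      sPendingFlushes.emplace_back(start, length);
      sHasPendingFlushes.store(true, std::memory_order_release);
    }

    // Called from a frequently-reached point in the simulator.
    void MaybeDrainICacheFlushes() {
      if (!sHasPendingFlushes.load(std::memory_order_acquire)) {
        return;  // fast path: no lock taken
      }
      std::lock_guard<std::mutex> guard(sPendingLock);
      for (auto& request : sPendingFlushes) {
        InvalidateSimulatedICache(request.first, request.second);  // hypothetical helper
      }
      sPendingFlushes.clear();
      sHasPendingFlushes.store(false, std::memory_order_relaxed);
    }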

Assignee: nobody → bbouvier
Status: NEW → ASSIGNED

Huge thanks to Anton Kirilov and the rest of the ARM team, who helped us resolve the situation in the real hardware case.

In theory, each executing thread would need to actually execute an ISB instruction. Instead, the compiling (background) thread can call membarrier(2) with MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE (note: this requires a prior membarrier call to register that we're going to use this command); this has the same effect on every thread of the process as if each had executed an ISB instruction. However, this is a bit heavyweight, so we should do it only when really necessary, i.e. when we're compiling on a background thread. It's not required when the current thread is the same one that will execute the code, which should generally be the case for the JS JIT, as far as I know. It's fine to do it for wasm, because wasm compilation should remain pretty rare.
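
A minimal sketch of that membarrier(2) sequence on Linux (4.16 or later); error handling is elided here, and a real caller must handle ENOSYS/EINVAL by falling back, as the follow-up comment below shows:

    #include <linux/membarrier.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    // Call once, early (e.g. at startup): register intent to use the
    // SYNC_CORE command for this process.
    static bool RegisterSyncCoreMembarrier() {
      return syscall(__NR_membarrier,
                     MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0) == 0;
    }

    // Call after a background thread has written and reprotected new code:
    // every thread of the process resynchronizes its instruction stream,
    // with the same effect as if each had executed an ISB.
    static bool BroadcastSyncCoreMembarrier() {
      return syscall(__NR_membarrier,
                     MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0) == 0;
    }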

Attachment #9172352 - Attachment description: Bug 1661016: Flush pending icache invalidation request when using the caching decoder; r?nbp → Bug 1661016: aarch64: Invalidate icache when compiling on a background thread; r?nbp
Attachment #9172352 - Attachment description: Bug 1661016: aarch64: Invalidate icache when compiling on a background thread; r?nbp → Bug 1661016: aarch64: Invalidate icache when compiling on a background thread; r=nbp
Attachment #9172352 - Attachment description: Bug 1661016: aarch64: Invalidate icache when compiling on a background thread; r=nbp → Bug 1661016: aarch64: Invalidate icache when compiling on a background thread; r?nbp,lth
Attachment #9172352 - Attachment description: Bug 1661016: aarch64: Invalidate icache when compiling on a background thread; r?nbp,lth → Bug 1661016: aarch64: Invalidate icache when compiling on a background thread; r=nbp,lth

Comments put in the wrong bug (in bug 1660944):

A piece of news here: JS engine initialization creates JIT code for atomic ops (including flushing the icache for that JITted code). That may happen on a non-main thread, so with the current patch we think the icache flush needs to be broadcast to all the threads; but we never checked for membarrier availability before, and in my case that check happens to fail (kernel too old).

The proper fix might be to add another parameter to EnsureIAndDCacheCoherency so it knows whether the icache invalidation must be broadcast. Wasm compilation can then set this parameter to "please broadcast the invalidation", since the code may run on other threads, while other callers don't have to pay for the heavyweight membarrier.
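
A hypothetical sketch of what such a parameter could look like; the actual EnsureIAndDCacheCoherency signature in the tree and the eventual patch may differ:

    // Hypothetical API shape; names other than EnsureIAndDCacheCoherency are mine.
    enum class CacheFlushBroadcast {
      No,   // code will only run on the compiling thread: cheap local flush
      Yes   // code may run on other threads: also do the SYNC_CORE membarrier
    };

    void EnsureIAndDCacheCoherency(void* start, size_t length,
                                   CacheFlushBroadcast broadcast) {
      LocalCleanAndInvalidate(start, length);  // hypothetical dc/ic + isb loop
      if (broadcast == CacheFlushBroadcast::Yes) {
        BroadcastSyncCoreMembarrier();  // from the membarrier sketch above
      }
    }

    // Wasm tier-2 compilation on a background thread would pass
    // CacheFlushBroadcast::Yes; other JIT callers keep the cheap local path.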

to which Lars responded:

Not completely clear to me why icache invalidation is required there. The memory used should not previously have been used for code anywhere in the process. Of course, if we can do it then that's safer, but I'm unconvinced that heroics are necessary.

It's actually pretty easy to do, so I've done it and will try-run it before uploading the patch.

Attachment #9173331 - Attachment is obsolete: true
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/01323b03cc1b
aarch64: Invalidate icache when compiling on a background thread; r=nbp,lth
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 82 Branch
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/304aadc6f2c1
aarch64: slightly refactor the check that membarrier is available; r=lth
Regressions: 1663895
Regressions: 1672619