ARM64: Improve code generation for atomics by avoiding fences


Currently we generate an ARM-like fence + op + fence instruction sequence for ARM64 seq_cst atomics. This is safe, but we can do better. Specifically, the canonical reference [1] gives optimized sequences that avoid fences entirely by using only the acquire and release forms of the memory instructions, even for seq_cst operations.
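To illustrate the guarantee any seq_cst mapping must preserve, here is a sketch of the store-buffering litmus test. It is written in C++ rather than JS and assumes a C++11 toolchain; the helper name `run_sb_litmus` is hypothetical. With seq_cst on all four accesses, the outcome where both threads read 0 is forbidden. On ARM64, mainstream compilers typically emit STLR for these stores and LDAR for these loads, with no DMB. That works because the architecture does not let an LDAR complete ahead of an earlier STLR from the same thread.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the store-buffering (SB) litmus test `trials`
// times and returns how often the outcome r1 == 0 && r2 == 0 was observed.
// Because every access is memory_order_seq_cst, that outcome is forbidden,
// so a conforming implementation always returns 0.
int run_sb_litmus(int trials) {
  int forbidden = 0;
  for (int i = 0; i < trials; ++i) {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
      x.store(1, std::memory_order_seq_cst);   // typically STLR on ARM64
      r1 = y.load(std::memory_order_seq_cst);  // typically LDAR on ARM64
    });
    std::thread t2([&] {
      y.store(1, std::memory_order_seq_cst);
      r2 = x.load(std::memory_order_seq_cst);
    });
    // join() makes r1 and r2 visible to this thread.
    t1.join();
    t2.join();
    if (r1 == 0 && r2 == 0) ++forbidden;
  }
  return forbidden;
}
```

If the stores are weakened to `memory_order_release` and the loads to `memory_order_acquire`, the C++ model permits the r1 == r2 == 0 outcome. The seq_cst mapping therefore relies on the specific ordering between STLR and a later LDAR, not on acquire/release semantics alone.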

The optimized sequences are about to become legal following an update to the memory model. In short, the old memory model disallowed using release-acquire atomic operations to implement seq_cst operations in JS. The new model is slightly weaker and allows them, and the change was motivated by the desire to use them on ARM64. (Apparently Chrome already uses the weaker atomics here.)

