Provide reliable implementation of jit/AtomicOperations for tier-1 platforms
Categories
(Core :: JavaScript Engine: JIT, enhancement, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox66 | --- | fixed |
People
(Reporter: lth, Assigned: lth)
References
(Blocks 1 open bug)
Details
Attachments
(2 files, 8 obsolete files)
90.75 KB,
patch
|
nbp
:
review+
froydnj
:
review+
|
Details | Diff | Splinter Review |
63.25 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
Assignee | ||
Updated•7 years ago
|
Assignee | ||
Comment 1•7 years ago
|
||
Assignee | ||
Comment 2•6 years ago
|
||
Assignee | ||
Comment 3•6 years ago
|
||
Assignee | ||
Comment 4•6 years ago
|
||
Assignee | ||
Comment 5•6 years ago
|
||
Assignee | ||
Comment 6•6 years ago
|
||
Assignee | ||
Comment 7•6 years ago
|
||
Assignee | ||
Comment 8•6 years ago
|
||
Assignee | ||
Comment 9•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 10•6 years ago
|
||
Assignee | ||
Comment 11•6 years ago
|
||
Assignee | ||
Comment 12•6 years ago
|
||
Assignee | ||
Comment 13•6 years ago
|
||
Comment 14•6 years ago
|
||
Comment 15•6 years ago
|
||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 16•6 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #14)
Comment on attachment 9016313 [details] [diff] [review]
bug1394420-jit-generate-atomics-v7.patchReview of attachment 9016313 [details] [diff] [review]:
I did not look super-closely at the generators, but from a superficial look,
they looked fine.::: js/src/jit/shared/AtomicOperations-shared-jit.cpp
@@ +295,5 @@+#endif
default:
MOZ_CRASH("Unknown size");
- }
- masm.memoryBarrier(sync.barrierAfter);
I think the semantics imposed by unconditionally generating barriers is
stronger than we need on e.g. x86. So this approach is not currently as
performant as we could be?
It doesn't do that. The sync object has a number of bitvectors specifying the barriers that are actually required for the context of the Synchronization. So if the Synchronization comes from an unordered access, barrierAfter will be zero, and no fences will be emitted. If the Synchronization comes from Synchronization::Load(), no code will be emitted on x86, due to the definition of memoryBarrier(). Only for Synchronization::Store and Synchronization::Full will there be a bit here that forces a fence emitted, as desired.
@@ +302,5 @@
- return start;
+}+static uint32_t
+GenStore(MacroAssembler& masm, Scalar::Type size, Synchronization sync)
sync
is apparently unused in this function, which seems suspicious.
Nice catch.
@@ +385,5 @@
MOZ_CRASH("Unknown size");
}
offset += direction == CopyDir::DOWN ? 1 : -1;
- }
- masm.memoryBarrier(sync.barrierAfter);
We only need barriers before and after all the instructions, not in between?
(I'm not sure what the semantics are here, so asking mostly for my own
edification?
A good question. Right now all the callers of GenCopy() pass Synchronization::None(), so it's a moot point. For WebAssembly I expect the spec will say that the copy will have a barrier at the end. I'll remove the sync for now.
@@ +750,5 @@
- uint32_t nop = GenNop(masm);
+#endif- uint32_t load8SeqCst = GenLoad(masm, SIZE8, Synchronization::Full());
- uint32_t load16SeqCst = GenLoad(masm, SIZE16, Synchronization::Full());
Is it worth having local synchronization objects that can be passed in here,
rather than continuously repeating Sychronization::FOO()?
Well... matter of taste? Let me think about it.
@@ +875,5 @@
- AtomicCompilerFence = (void(*)())(code + nop);
+#endif- AtomicLoad8SeqCst = (uint8_t()(const uint8_t addr))(code + load8SeqCst);
- AtomicLoad16SeqCst = (uint16_t()(const uint16_t addr))(code + load16SeqCst);
Drive-by comments, but this is not code I own, so feel free to ignore!
- I think the use of variable names in the function type clutters things up.
Fair point, though they're a useful mnemonic device.
- I think all of these would be better served with a template like:
template<typename R, typename... Args>
R()(Args...)
Frob(uint8_t ptr, uint32_t offset)
{
return (R(*))(Args...)(code + offset);
}if that's even valid syntax. Then these turn into things like:
AtomicLoad8SeqCst = Frob<uint8_t(const uint8_t*)>(code, load8SeqCst);
which seems a little more readable? I guess the main shortening there comes
from taking out the variable name, so maybe it's not worth much.
Hm, yes. I guess the code's been written and is unlikely to be rewritten, so I may give this one a pass. Removing the variable names in the casts would help by itself, will give that a try.
Assignee | ||
Comment 17•6 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #15)
Comment on attachment 9016313 [details] [diff] [review]
bug1394420-jit-generate-atomics-v7.patchReview of attachment 9016313 [details] [diff] [review]:
r=me with AtomicOperations-shared-jit.h changes.
::: js/src/jit/shared/AtomicOperations-shared-jit.cpp
@@ +46,5 @@
+// Selected registers match the argument registers exactly, and none of them
+// overlap the result register.
+
+static const LiveRegisterSet NonVolatileRegs;nit: Prefix these with their namespaces: ::js::jit::NonVolatileRegs as there
is already a NonVolatileRegs in wasm namespace.
Renamed as AtomicNonVolatileRegs.
@@ +167,5 @@
> > + ABIArg arg = iter->abi.next(t);
> > + switch (arg.kind()) {
> > + case ABIArg::GPR: {
> > + if (arg.gpr() != reg) {
> > + masm.movePtr(arg.gpr(), reg);
While I agree that with the predefined registers enumerated above these
functions are going to behave correctly given the current set of arguments.
However, I suspect that this might become a unnoticed pit-fall for the next
person adding a new function here, which is erasing the argument value with
an already loaded argument.
I agree this is a little brittle.
Instead of having GenGprArg, I would recommend to add "getABIArg" to the
MacroAssembler, using the MoveResolver like done in passABIArg. Then Add a
function to resolve all the Move instructions [1].[1] https://searchfox.org/mozilla-central/source/js/src/jit/x64/MacroAssembler- x64.cpp#342-352
Given the very fixed functionality of this code and how long this patch has been languishing, I won't do so now, but I agree with this in principle. ISTR there were in fact some bugs during development resulting from the unchecked register constraints.
For now, I will:
- beef up the comments a little to make it even clearer that these are hard constraints
- add a pointer to the language you use above so that if the complexity of this code grows much (eg, MIPS will tend to need more registers) we will reconsider.
> ::: js/src/jit/shared/AtomicOperations-shared-jit.h
> @@ +210,5 @@
> > + T oldval = *addr; /* good initial approximation */ \
> > + for (;;) { \
> > + T nextval = (T)AtomicCmpXchg64SeqCst((uint64_t*)addr, \
> > + (uint64_t)val, \
> > + (uint64_t)oldval); \
>
> Swap oldval and val in this function call.
Nice catch, this was a cut-and-paste from MSVC code, which has reversed the arguments for this function according to what's normal.
Ditto the next two.
Comment 18•6 years ago
|
||
Comment 19•6 years ago
|
||
Backed out 2 changesets (bug 1394420) for failing testAtomicOperations.cpp, ESling and jit failures
push that caused the backout: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&revision=b2ffeeac7326d673ff09474ce3007e84687c5a62
backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/c311812c530c03588f7ac2fac4fd9f6c38103529
Assignee | ||
Comment 20•6 years ago
•
|
||
The problem here is that while ReturnReg (as defined in the assembler) is identical to the C/C++ return register on x86, ReturnReg64 is not, at least on Linux. So more register definitions are needed. There is a risk that some of this is platform-dependent of course. Investigating further.
Specifically, the C/C++ 64-bit return register is edx:eax, while our ReturnReg64 is edi:eax.
Assignee | ||
Comment 21•6 years ago
|
||
A better link: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=b2ffeeac7326d673ff09474ce3007e84687c5a62. The apparent ARM failures are emulator failures, thus the same bug. Thus it looks like only x86 is affected.
Assignee | ||
Comment 22•6 years ago
•
|
||
OK, several problems found:
- on x86 (32-bit), the ReturnReg64 is incorrect, this is bug 1521512, I've worked around it
- on ARM (32-bit), the LR is also a temp register that is used by the atomic core operations so the return address must be saved and restored
- on ARM and ARM64, the argBase value (for reading arguments from the stack) incorrectly incorporated an adjustment for the pushed return address that is only appropriate for x86/x64
In general, prologue and epilogue are a little platform-dependent and therefore now have proper ifdef nestes with an #error for unsupported platforms, just like the register definitions.
Comment 23•6 years ago
|
||
Comment 24•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/2d921f6ce0a9
https://hg.mozilla.org/mozilla-central/rev/4e928a279f87
Comment 25•6 years ago
|
||
Description
•