Open Bug 1077036 Opened 5 years ago Updated 3 years ago
x86 atomics: Generate better code for byte-sized atomic operations (32-bit only)
Followup work on bug 979594.

Currently the code generator constrains the register allocator quite a lot in several ways when it implements atomic operations on byte arrays:

- it requires the use of specific registers that have byte parts (ebx, ecx)
- it requires the use of a full register with a byte part in situations where it could work around that and pick any full register

Also it does not conserve registers as well as it could:

- it does not allow use of the "high" registers (ah, bh)
- it does not allow the use of immediate values when it would sometimes be possible to do so
These optimization opportunities (apart from the immediate-value one) apply to 32-bit only; see bug 1138348 for the 64-bit case.
Hardware: x86_64 → x86
Summary: x86 atomics: Generate better code for byte-sized atomic operations → x86 atomics: Generate better code for byte-sized atomic operations (32-bit only)
I also think that, in the case of a properly in-range immediate argument, the final movzbl / movsbl in the atomic binops with a result (AND, OR, XOR) can be avoided, provided that the initial load performs the proper zero/sign extension. That would also be true for 16-bit operations and for x64.
Observe also that in the case of Atomics.sub() we sometimes end up with

  movl src, eax
  neg eax
  lock xadd al, ...
  movsbl al, eax

and it seems that if the initial mov is necessary then it would be desirable to combine it with the neg. I'm sensing there may be some LEA solution.
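For readers less familiar with the sequence above, here is a minimal sketch in C11 of what it computes: a byte-sized fetch-and-subtract expressed as a fetch-and-add of the negated operand, with the old byte sign-extended into the result. The function names (sub_i8_reference, sub_i8_via_neg_xadd, demo) are made up for illustration and are not taken from the code generator; the mapping from each C statement to an instruction in the comments is an assumption about what a compiler might emit, not a guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Portable sign extension of a byte (what movsbl al, eax does). */
int32_t sign_extend_i8(uint8_t byte) {
    return byte >= 128 ? (int32_t)byte - 256 : (int32_t)byte;
}

/* Reference semantics: subtract directly, return the old byte sign-extended. */
int32_t sub_i8_reference(_Atomic uint8_t *cell, int32_t value) {
    uint8_t old = atomic_fetch_sub(cell, (uint8_t)value);
    return sign_extend_i8(old);
}

/* The formulation from the comment above: negate, then exchange-and-add. */
int32_t sub_i8_via_neg_xadd(_Atomic uint8_t *cell, int32_t value) {
    uint32_t negated = 0u - (uint32_t)value;                /* movl src, eax ; neg eax */
    uint8_t old = atomic_fetch_add(cell, (uint8_t)negated); /* lock xadd al, ...       */
    return sign_extend_i8(old);                             /* movsbl al, eax          */
}

/* Usage example: both formulations agree on a case that crosses the sign boundary. */
void demo(void) {
    _Atomic uint8_t a = 0x80, b = 0x80;   /* 0x80 is -128 when read as int8 */
    assert(sub_i8_reference(&a, 1) == -128);
    assert(sub_i8_via_neg_xadd(&b, 1) == -128);
    assert(atomic_load(&a) == 0x7F && atomic_load(&b) == 0x7F);
}
```

Only the low byte of the negated operand takes part in the addition, which is why the negation can be done in a full 32-bit register without affecting the result.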