Closed
Bug 1251392
Opened 9 years ago
Closed 9 years ago
Baldr: Implement PopcntI.
Categories
(Core :: JavaScript Engine: JIT, defect)
Core
JavaScript Engine: JIT
Tracking
()
RESOLVED
FIXED
mozilla47
Tracking | Status | |
---|---|---|
firefox47 | --- | fixed |
People
(Reporter: mbx, Assigned: mbx)
References
Details
Attachments
(1 file, 3 obsolete files)
22.37 KB,
patch
|
sunfish
:
review+
|
Details | Diff | Splinter Review |
No description provided.
Assignee | ||
Comment 1•9 years ago
|
||
This is more annoying than I originally thought, popcnt is only in SSE4.2 (on Intel) and you have to check a CPU flag. AMD from what I can tell is slightly different.
Blocks: wasm
Assignee | ||
Comment 2•9 years ago
|
||
Implements popcnt for x86 only when the popcnt instruction is available. Does anyone have an idea how to implement popcnt efficiently when the popcnt instruction is not available?
Attachment #8723895 -
Flags: review?(sunfish)
Assignee | ||
Updated•9 years ago
|
Attachment #8723895 -
Flags: review?(sunfish) → feedback?(sunfish)
Comment 3•9 years ago
|
||
(In reply to Michael Bebenita [:mbx] from comment #2)
> Does anyone have an idea how to implement popcnt efficiently when the popcnt
> instruction is not available?
There's some code in mfbt/MathAlgorithms.h (only used when __builtin_popcount is not available):
inline uint_fast8_t
CountPopulation32(uint32_t aValue)
{
uint32_t x = aValue - ((aValue >> 1) & 0x55555555);
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
return (((x + (x >> 4)) & 0xf0f0f0f) * 0x1010101) >> 24;
}
For __builtin_popcount, Clang actually emits code that's very similar:
movl %edi, %eax
shrl %eax
andl $1431655765, %eax ## imm = 0x55555555
subl %eax, %edi
movl %edi, %eax
andl $858993459, %eax ## imm = 0x33333333
shrl $2, %edi
andl $858993459, %edi ## imm = 0x33333333
addl %eax, %edi
movl %edi, %eax
shrl $4, %eax
addl %edi, %eax
andl $252645135, %eax ## imm = 0xF0F0F0F
imull $16843009, %eax, %eax ## imm = 0x1010101
shrl $24, %eax
Comment 4•9 years ago
|
||
(What's nice about this algorithm is that it needs only 2 registers. We can use the output register so we need only 1 temp register.)
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → mbebenita
Comment 5•9 years ago
|
||
Comment on attachment 8723895 [details] [diff] [review]
popcnt.patch
Review of attachment 8723895 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/asmjs/Wasm.cpp
@@ +430,5 @@
> return DecodeUnaryOperator(f, expected, ExprType::I32);
> case Expr::I32Ctz:
> return f.fail("NYI: ctz");
> case Expr::I32Popcnt:
> + return DecodeUnaryOperator(f, expected, ExprType::I32);
This is the same as the code for I32Clz above, so you can just move the I32Popcnt case.
::: js/src/jit-test/tests/wasm/basic-integer.js
@@ +17,5 @@
> }
>
> testUnary('i32', 'clz', 40, 26);
> //testUnary('i32', 'ctz', 40, 0); // TODO: NYI
> +testUnary('i32', 'popcnt', 40, 2); // TODO: NYI
The NYI comment is no longer needed :).
::: js/src/jit/MIR.cpp
@@ +5142,5 @@
> +{
> + if (num()->isConstant()) {
> + int32_t n = num()->toConstant()->toInt32();
> + if (n == 0)
> + return MConstant::New(alloc, Int32Value(0));
Unlike ctz and clz, we don't need to special-case 0 for popcnt because CountPopulation32 does the right thing.
::: js/src/jit/MIR.h
@@ +6031,5 @@
> +class MPopcnt
> + : public MUnaryInstruction
> + , public BitwisePolicy::Data
> +{
> + bool operandIsNeverZero_;
Unlike ctz and clz, for popcnt we don't need to track operandIsNeverZero because hardware popcnt instructions and the software fallback all do the right thing for zero.
Attachment #8723895 -
Flags: feedback?(sunfish) → feedback+
Assignee | ||
Comment 6•9 years ago
|
||
Addressed comments and added assembly fallback when popcnt is not available.
Attachment #8723895 -
Attachment is obsolete: true
Assignee | ||
Comment 7•9 years ago
|
||
This version also adds support for ARM.
Attachment #8724215 -
Attachment is obsolete: true
Assignee | ||
Comment 8•9 years ago
|
||
Comment on attachment 8724321 [details] [diff] [review]
popcnt.patch
Popcnt has a weird prefix on x86, and I can't figure out how to encoded it nicely using the existing towByteOp helper method.
Attachment #8724321 -
Flags: review?(sunfish)
Assignee | ||
Updated•9 years ago
|
Comment 9•9 years ago
|
||
Comment on attachment 8724321 [details] [diff] [review]
popcnt.patch
Review of attachment 8724321 [details] [diff] [review]:
-----------------------------------------------------------------
This patch is missing the target-independent and x86 changes, that were in an earlier version of the patch.
Attachment #8724321 -
Flags: review?(sunfish)
Assignee | ||
Comment 10•9 years ago
|
||
Oops, forgot to squash.
Attachment #8724321 -
Attachment is obsolete: true
Attachment #8724630 -
Flags: review?(sunfish)
Comment 11•9 years ago
|
||
Comment on attachment 8724630 [details] [diff] [review]
popcnt.patch
Review of attachment 8724630 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good!
Attachment #8724630 -
Flags: review?(sunfish) → review+
Assignee | ||
Comment 12•9 years ago
|
||
Comment 13•9 years ago
|
||
Comment 14•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox47:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
You need to log in
before you can comment on or make changes to this bug.
Description
•