Closed
Bug 1251392
Opened 8 years ago
Closed 8 years ago
Baldr: Implement PopcntI.
Categories
(Core :: JavaScript Engine: JIT, defect)
Core
JavaScript Engine: JIT
Tracking
()
RESOLVED
FIXED
mozilla47
Tracking | Status | |
---|---|---|
firefox47 | --- | fixed |
People
(Reporter: mbx, Assigned: mbx)
References
Details
Attachments
(1 file, 3 obsolete files)
22.37 KB,
patch
|
sunfish
:
review+
|
Details | Diff | Splinter Review |
No description provided.
Assignee | ||
Comment 1•8 years ago
|
||
This is more annoying than I originally thought, popcnt is only in SSE4.2 (on Intel) and you have to check a CPU flag. AMD from what I can tell is slightly different.
Blocks: wasm
Assignee | ||
Comment 2•8 years ago
|
||
Implements popcnt for x86 only when the popcnt instruction is available. Does anyone have an idea how to implement popcnt efficiently when the popcnt instruction is not available?
Attachment #8723895 -
Flags: review?(sunfish)
Assignee | ||
Updated•8 years ago
|
Attachment #8723895 -
Flags: review?(sunfish) → feedback?(sunfish)
Comment 3•8 years ago
|
||
(In reply to Michael Bebenita [:mbx] from comment #2) > Does anyone have an idea how to implement popcnt efficiently when the popcnt > instruction is not available? There's some code in mfbt/MathAlgorithms.h (only used when __builtin_popcount is not available): inline uint_fast8_t CountPopulation32(uint32_t aValue) { uint32_t x = aValue - ((aValue >> 1) & 0x55555555); x = (x & 0x33333333) + ((x >> 2) & 0x33333333); return (((x + (x >> 4)) & 0xf0f0f0f) * 0x1010101) >> 24; } For __builtin_popcount, Clang actually emits code that's very similar: movl %edi, %eax shrl %eax andl $1431655765, %eax ## imm = 0x55555555 subl %eax, %edi movl %edi, %eax andl $858993459, %eax ## imm = 0x33333333 shrl $2, %edi andl $858993459, %edi ## imm = 0x33333333 addl %eax, %edi movl %edi, %eax shrl $4, %eax addl %edi, %eax andl $252645135, %eax ## imm = 0xF0F0F0F imull $16843009, %eax, %eax ## imm = 0x1010101 shrl $24, %eax
Comment 4•8 years ago
|
||
(What's nice about this algorithm is that it needs only 2 registers. We can use the output register so we need only 1 temp register.)
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → mbebenita
Comment 5•8 years ago
|
||
Comment on attachment 8723895 [details] [diff] [review] popcnt.patch Review of attachment 8723895 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/asmjs/Wasm.cpp @@ +430,5 @@ > return DecodeUnaryOperator(f, expected, ExprType::I32); > case Expr::I32Ctz: > return f.fail("NYI: ctz"); > case Expr::I32Popcnt: > + return DecodeUnaryOperator(f, expected, ExprType::I32); This is the same as the code for I32Clz above, so you can just move the I32Popcnt case. ::: js/src/jit-test/tests/wasm/basic-integer.js @@ +17,5 @@ > } > > testUnary('i32', 'clz', 40, 26); > //testUnary('i32', 'ctz', 40, 0); // TODO: NYI > +testUnary('i32', 'popcnt', 40, 2); // TODO: NYI The NYI comment is no longer needed :). ::: js/src/jit/MIR.cpp @@ +5142,5 @@ > +{ > + if (num()->isConstant()) { > + int32_t n = num()->toConstant()->toInt32(); > + if (n == 0) > + return MConstant::New(alloc, Int32Value(0)); Unlike ctz and clz, we don't need to special-case 0 for popcnt because CountPopulation32 does the right thing. ::: js/src/jit/MIR.h @@ +6031,5 @@ > +class MPopcnt > + : public MUnaryInstruction > + , public BitwisePolicy::Data > +{ > + bool operandIsNeverZero_; Unlike ctz and clz, for popcnt we don't need to track operandIsNeverZero because hardware popcnt instructions and the software fallback all do the right thing for zero.
Attachment #8723895 -
Flags: feedback?(sunfish) → feedback+
Assignee | ||
Comment 6•8 years ago
|
||
Addressed comments and added assembly fallback when popcnt is not available.
Attachment #8723895 -
Attachment is obsolete: true
Assignee | ||
Comment 7•8 years ago
|
||
This version also adds support for ARM.
Attachment #8724215 -
Attachment is obsolete: true
Assignee | ||
Comment 8•8 years ago
|
||
Comment on attachment 8724321 [details] [diff] [review] popcnt.patch Popcnt has a weird prefix on x86, and I can't figure out how to encoded it nicely using the existing towByteOp helper method.
Attachment #8724321 -
Flags: review?(sunfish)
Assignee | ||
Updated•8 years ago
|
Comment 9•8 years ago
|
||
Comment on attachment 8724321 [details] [diff] [review] popcnt.patch Review of attachment 8724321 [details] [diff] [review]: ----------------------------------------------------------------- This patch is missing the target-independent and x86 changes, that were in an earlier version of the patch.
Attachment #8724321 -
Flags: review?(sunfish)
Assignee | ||
Comment 10•8 years ago
|
||
Oops, forgot to squash.
Attachment #8724321 -
Attachment is obsolete: true
Attachment #8724630 -
Flags: review?(sunfish)
Comment 11•8 years ago
|
||
Comment on attachment 8724630 [details] [diff] [review] popcnt.patch Review of attachment 8724630 [details] [diff] [review]: ----------------------------------------------------------------- Looks good!
Attachment #8724630 -
Flags: review?(sunfish) → review+
Assignee | ||
Comment 12•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=867ebb5fead3
Comment 14•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/3a2df80faeb6
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox47:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
You need to log in
before you can comment on or make changes to this bug.
Description
•