Baldr: Implement PopcntI.

RESOLVED FIXED in Firefox 47

Status

()

defect
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: mbx, Assigned: mbx)

Tracking

unspecified
mozilla47
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox47 fixed)

Details

Attachments

(1 attachment, 3 obsolete attachments)

No description provided.
This is more annoying than I originally thought, popcnt is only in SSE4.2 (on Intel) and you have to check a CPU flag. AMD from what I can tell is slightly different.
Blocks: wasm
Posted patch popcnt.patch (obsolete) — Splinter Review
Implements popcnt for x86 only when the popcnt instruction is available. Does anyone have an idea how to implement popcnt efficiently when the popcnt instruction is not available?
Attachment #8723895 - Flags: review?(sunfish)
Attachment #8723895 - Flags: review?(sunfish) → feedback?(sunfish)
(In reply to Michael Bebenita [:mbx] from comment #2)
> Does anyone have an idea how to implement popcnt efficiently when the popcnt
> instruction is not available?

There's some code in mfbt/MathAlgorithms.h (only used when __builtin_popcount is not available):

  inline uint_fast8_t
  CountPopulation32(uint32_t aValue)
  {
    uint32_t x = aValue - ((aValue >> 1) & 0x55555555);
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    return (((x + (x >> 4)) & 0xf0f0f0f) * 0x1010101) >> 24;
  }

For __builtin_popcount, Clang actually emits code that's very similar:

	movl	%edi, %eax
	shrl	%eax
	andl	$1431655765, %eax       ## imm = 0x55555555
	subl	%eax, %edi
	movl	%edi, %eax
	andl	$858993459, %eax        ## imm = 0x33333333
	shrl	$2, %edi
	andl	$858993459, %edi        ## imm = 0x33333333
	addl	%eax, %edi
	movl	%edi, %eax
	shrl	$4, %eax
	addl	%edi, %eax
	andl	$252645135, %eax        ## imm = 0xF0F0F0F
	imull	$16843009, %eax, %eax   ## imm = 0x1010101
	shrl	$24, %eax
(What's nice about this algorithm is that it needs only 2 registers. We can use the output register so we need only 1 temp register.)
Assignee: nobody → mbebenita
Comment on attachment 8723895 [details] [diff] [review]
popcnt.patch

Review of attachment 8723895 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/asmjs/Wasm.cpp
@@ +430,5 @@
>          return DecodeUnaryOperator(f, expected, ExprType::I32);
>        case Expr::I32Ctz:
>          return f.fail("NYI: ctz");
>        case Expr::I32Popcnt:
> +        return DecodeUnaryOperator(f, expected, ExprType::I32);

This is the same as the code for I32Clz above, so you can just move the I32Popcnt case.

::: js/src/jit-test/tests/wasm/basic-integer.js
@@ +17,5 @@
>  }
>  
>  testUnary('i32', 'clz', 40, 26);
>  //testUnary('i32', 'ctz', 40, 0); // TODO: NYI
> +testUnary('i32', 'popcnt', 40, 2); // TODO: NYI

The NYI comment is no longer needed :).

::: js/src/jit/MIR.cpp
@@ +5142,5 @@
> +{
> +    if (num()->isConstant()) {
> +        int32_t n = num()->toConstant()->toInt32();
> +        if (n == 0)
> +            return MConstant::New(alloc, Int32Value(0));

Unlike ctz and clz, we don't need to special-case 0 for popcnt because CountPopulation32 does the right thing.

::: js/src/jit/MIR.h
@@ +6031,5 @@
> +class MPopcnt
> +    : public MUnaryInstruction
> +    , public BitwisePolicy::Data
> +{
> +    bool operandIsNeverZero_;

Unlike ctz and clz, for popcnt we don't need to track operandIsNeverZero because hardware popcnt instructions and the software fallback all do the right thing for zero.
Attachment #8723895 - Flags: feedback?(sunfish) → feedback+
Posted patch popcnt.patch (obsolete) — Splinter Review
Addressed comments and added assembly fallback when popcnt is not available.
Attachment #8723895 - Attachment is obsolete: true
Posted patch popcnt.patch (obsolete) — Splinter Review
This version also adds support for ARM.
Attachment #8724215 - Attachment is obsolete: true
Comment on attachment 8724321 [details] [diff] [review]
popcnt.patch

Popcnt has a weird prefix on x86, and I can't figure out how to encoded it nicely using the existing towByteOp helper method.
Attachment #8724321 - Flags: review?(sunfish)
Blocks: 1242803
No longer blocks: wasm
Comment on attachment 8724321 [details] [diff] [review]
popcnt.patch

Review of attachment 8724321 [details] [diff] [review]:
-----------------------------------------------------------------

This patch is missing the target-independent and x86 changes, that were in an earlier version of the patch.
Attachment #8724321 - Flags: review?(sunfish)
Posted patch popcnt.patchSplinter Review
Oops, forgot to squash.
Attachment #8724321 - Attachment is obsolete: true
Attachment #8724630 - Flags: review?(sunfish)
Comment on attachment 8724630 [details] [diff] [review]
popcnt.patch

Review of attachment 8724630 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!
Attachment #8724630 - Flags: review?(sunfish) → review+
https://hg.mozilla.org/mozilla-central/rev/3a2df80faeb6
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
You need to log in before you can comment on or make changes to this bug.