OdinMonkey: support addresses with constant offsets

RESOLVED FIXED in Firefox 39

Status

()

Core
JavaScript Engine: JIT
--
enhancement
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: sunfish, Assigned: sunfish)

Tracking

(Blocks: 2 bugs)

unspecified
mozilla39
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox39 fixed)

Details

Attachments

(2 attachments, 11 obsolete attachments)

3.00 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
173.45 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
(Assignee)

Description

4 years ago
In sqlite3VdbeExec in sqlite3, there are numerous pointers to structs, and numerous struct fields are accessed through them. The addressing for the fields all come out as adds, and they tend to congregate together in the optimizer, so they tend to show up in JS in big pools like this:

     $cmp4520 = $tmp4341 + 72 | 0;
     $aVar = $p + 72 | 0;
     $tmp3854 = $u + 4 | 0;
     $cmp6664 = $u + 8 | 0;
     $xClose = $u + 12 | 0;
     $x2146 = $p + 144 | 0;
     $errorAction$i = $p + 96 | 0;
     $zErrMsg$i3123 = $p + 56 | 0;
     $arrayidx6675 = $u + 24 | 0;
     $add2275 = $u + 32 | 0;
     $cmp5659 = $u + 16 | 0;
     $tobool6774 = $tmp4341 + 152 | 0;
     $tmp410 = $p + 160 | 0;
     $cmp$i$i1315 = $iValue$i6430 + 4 | 0;
     $zText$i$i = $iValue$i6430 + 8 | 0;
     $arrayidx$i$i = $iValue$i6430 + 12 | 0;
     $useMalloc$i$i = $iValue$i6430 + 25 | 0;
     $mallocFailed$i$i = $iValue$i6430 + 24 | 0;
     $cmp666 = $u + 76 | 0;
     $apArg = $p + 12 | 0;
     $apVal = $u + 72 | 0;
     $pVdbeFunc = $cmp6664 + 4 | 0;
     $s = $cmp6664 + 8 | 0;
     $or747 = $cmp6664 + 36 | 0;
     $xDel = $cmp6664 + 40 | 0;
     $zMalloc737 = $cmp6664 + 44 | 0;
     $z$i$i3520 = $cmp6664 + 12 | 0;
     $tobool806 = $cmp6664 + 52 | 0;
     [... it actually goes on like this for quite a while ...]

There are so many variables here that they basically all get spilled, and the result is that there are many reloads throughout the rest of the code. If we could fold these offsets into addresses, it would *greatly* reduce the amount of spill code produced.

This is similar to bug 915157, except that the loads and stores that use these pointers are not masked.

Comment 1

4 years ago
Emscripten's aggressive variable eliminator pass should remove most of those, does it help?
If 'let' were supported then many of these would not need to be hoisted to the start of the function.  I understand that 'let' is poorly supported so this is not a good option, but otherwise it would seem to be appropriate.

The ARM does not have the same addressing mode flexibility as the x86/x64 so it might not be possible to hide the addition of the offsets if they were unfolded, and it might not be without some cost even on the x86/x64.

It would seem to be a challenge for Emscripten.  Can it keep track of the number of live variables in each block, and just allocate this maximum at the start of the function, and then reuse the smaller set.  For example, if $cmp4520 and $aVar do not have an overlapping life then only one variable need be allocated for them at the start of the function.  Perhaps this is what Emscripten's 'aggressive variable eliminator pass' already does.

Can a special compiler pass move these into the scope in which they are alive, undoing the hoisting of them to the start of the function.  Supporting 'let' would avoid the need for such a pass and allow Emscripten to do the work.
(Assignee)

Comment 3

3 years ago
Created attachment 8540970 [details] [diff] [review]
asmjs-offset-wrapping.patch

This patch approaches constant offsets from a complementary perspective to the patch in bug 915157. I had previously been unsure about this approach, however, I recently made another attempt to work through the issues, and I think came up with something feasible.

The big idea here is to enable folding of offsets without regard to staying in bounds. To make this work, it extends the guard region at the end of the asm.js heap, and it teaches the signal handler to detect an out-of-bounds access which would have been in bounds under a uint32_t index wrap, so it can complete the access successfully, right there in the handler.

The offset is currently limited to 1024, but it could easily be extended. It could also theoretically be extended to negative offsets, if we're willing to live with the heap base register not pointing at the actual beginning of the heap address space. And if we do that, then we could extend this system to handle scaled indexing too, though with much larger guard regions.

It's not complete yet; it's not yet ported to all platforms which could support it, but the idea is illustrated. In short, it allows folding like this:

[Codegen] addl       $0x4, %eax
[Codegen] andl       $0xfffffffc, %eax
[Codegen] movl       0x0(%r15,%rax,1), %edi

After:

[Codegen] andl       $0xfffffffc, %eax
[Codegen] movl       0x4(%r15,%rax,1), %edi

It's particularly effective on SIMD loads and stores, because they don't have a mask, so the index operand is never used in a clobbering instruction, so we don't need to make a copy of it first.
Assignee: nobody → sunfish
Attachment #8540970 - Flags: feedback?(luke)
Comment on attachment 8540970 [details] [diff] [review]
asmjs-offset-wrapping.patch

Review of attachment 8540970 [details] [diff] [review]:
-----------------------------------------------------------------

Cool, this looks like a piece of the puzzle.

For the 32 bit ports, dehoisting the offset might practically depend on firstly optimizing away the bounds checks, so I expect the optimizations in this patch to be very x64 specific. If JS code is written to support optimizing away the bounds checks then this patch is probably not necessary anyway. The JIT's seem to all be using explicit bounds checks.

Supporting a scaled index might also be interesting to explore. With the buffer at zero it could support a base plus scaled index plus the offset, but only when proven to be positive. This would be specific to the 64 bit ports but might still be interesting to explore.

Have seen subtly code generation differences between transforming the code early before licm etc versus later, and perhaps these are significant - still exploring.

Hoisting the low bit masking before the addition of the offset might be more generally useful, and perhaps could be integrated elsewhere.

Comment 5

3 years ago
Comment on attachment 8540970 [details] [diff] [review]
asmjs-offset-wrapping.patch

Review of attachment 8540970 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent!  To wit, here is the cset that removed this folding the first time (out of laziness, when I discovered the wrap-around correctness issue): http://hg.mozilla.org/users/lwagner_mozilla.com/odinmonkey/rev/7d5adc716af4.

So will the aggregate effect on the problem you reported in comment 0 be that all the hoisted addition-plus-constants become dead?

::: js/src/jit/EffectiveAddressAnalysis.cpp
@@ +96,5 @@
> +
> +    if (ptr->isAdd()) {
> +        MDefinition *op1 = ptr->getOperand(1);
> +        if (op1->isConstant()) {
> +            int32_t imm = ptr->getOperand(1)->toConstant()->value().toInt32();

Here and below, is it necessarily the case that ptr is an int-specialized-add and value() is an int32?  It seems like the asm.js type system would ensure this at construction time and MIR optimizations shouldn't break this invariant, but I just wanted to check.

@@ +100,5 @@
> +            int32_t imm = ptr->getOperand(1)->toConstant()->value().toInt32();
> +            if (access->addOffset(imm))
> +                ins->replaceOperand(0, ptr->getOperand(0));
> +        }
> +    } else if (ptr->isBitAnd() && ptr->hasOneUse()) {

Since this pass runs after GVN, is it possible that what was originally a single-use bitand now has multiple uses?

::: js/src/jit/shared/Assembler-shared.h
@@ +749,5 @@
> +    Scalar::Type type_ : 8;
> +    AnyRegister::Code valueReg_ : 8; // the reg to load into or store from
> +    int32_t imm_;       // the immediate value to store from
> +    LoadOrStore isLoad_ : 8;
> +    bool isImm_;        // for stores, are we storing from register or from immediate

Nit: could you align all the comments to the same column?
Attachment #8540970 - Flags: feedback?(luke) → feedback+
(Assignee)

Comment 6

3 years ago
Created attachment 8542779 [details] [diff] [review]
asmjs-offset-wrapping.patch

This new patch fixes up some things in the old patch, adds MacOS support (which was missing in the first patch, because it has different signal-handling code), and more significantly, adds 32-bit x86 support.

In 32-bit x86, the basic idea is the same, fold away constant offsets so that we don't need explicit add instructions, but the mechanism is different. Instead of fixing things up in a signal handler, we fold the constant offset into the bounds check, and then we add a second-level out-of-line bounds check for the low end (since we can no longer just do a simple unsigned comparison).

This patch passes a variety of tests, though I'm planning to do a bunch more, and to request fuzzing.

(In reply to Luke Wagner [:luke] from comment #5)
> So will the aggregate effect on the problem you reported in comment 0 be
> that all the hoisted addition-plus-constants become dead?

Yes. Fortunately, there's a DCE pass after EffectiveAddressAnalysis which cleans them up.

> ::: js/src/jit/EffectiveAddressAnalysis.cpp
> @@ +96,5 @@
> > +
> > +    if (ptr->isAdd()) {
> > +        MDefinition *op1 = ptr->getOperand(1);
> > +        if (op1->isConstant()) {
> > +            int32_t imm = ptr->getOperand(1)->toConstant()->value().toInt32();
> 
> Here and below, is it necessarily the case that ptr is an
> int-specialized-add and value() is an int32?  It seems like the asm.js type
> system would ensure this at construction time and MIR optimizations
> shouldn't break this invariant, but I just wanted to check.

Yes, in asm.js I believe this is a safe assumption.

> @@ +100,5 @@
> > +            int32_t imm = ptr->getOperand(1)->toConstant()->value().toInt32();
> > +            if (access->addOffset(imm))
> > +                ins->replaceOperand(0, ptr->getOperand(0));
> > +        }
> > +    } else if (ptr->isBitAnd() && ptr->hasOneUse()) {
> 
> Since this pass runs after GVN, is it possible that what was originally a
> single-use bitand now has multiple uses?

Yes, it could have multiple uses. The comment just below, "Since we currently just mutate the BitAnd in place, this requires that we are its only user." addresses this. In theory we could clone the BitAnd, or we could do something fancy and look at all the BitAnd's users, however what's in the patch right now is simple and fairly effective.
Attachment #8540970 - Attachment is obsolete: true
Attachment #8542779 - Flags: review?(luke)
(In reply to Dan Gohman [:sunfish] from comment #6)
> Created attachment 8542779 [details] [diff] [review]
> asmjs-offset-wrapping.patch
> 
> This new patch fixes up some things in the old patch, adds MacOS support
> (which was missing in the first patch, because it has different
> signal-handling code), and more significantly, adds 32-bit x86 support.
> 
> In 32-bit x86, the basic idea is the same, fold away constant offsets so
> that we don't need explicit add instructions, but the mechanism is
> different. Instead of fixing things up in a signal handler, we fold the
> constant offset into the bounds check, and then we add a second-level
> out-of-line bounds check for the low end (since we can no longer just do a
> simple unsigned comparison).

It's a good insight that the low bounds check can be move out of main path.

This should help reduce register pressure on the x86, so would be expected to give a performance improvement for the x86.

The asm.js heap lengths are constrained to make bounds checking fast on the ARM. The (heap.length - offset) comparison will be slower on the ARM because the immediate can not be encoded in a single instruction. It might be possible to speculative compare the ptr to the next smallest immediate that can be encoded in a single instruction on the ARM - code would need to avoid this upper region for efficiency. The opportunities to dehoist the offset are very limited on the ARM anyway without placing the heap at absolute zero, so this might not be worth exploring on the ARM for now.

It's still a slow path optimization without giving consideration to avoiding the bounds checks, so do you have any suggestions for better eliminating the bounds checks?

The JITs can dehoist some of these bounds check as they can bail out, and the JITs can dehoist these offsets too, so Odin is not going to be competitive on heap accesses unless the issue of the bounds checks is addressed.
(Assignee)

Comment 8

3 years ago
(In reply to Douglas Crosher [:dougc] from comment #7)
> (In reply to Dan Gohman [:sunfish] from comment #6)
> > Created attachment 8542779 [details] [diff] [review]
> > asmjs-offset-wrapping.patch
> > 
> > This new patch fixes up some things in the old patch, adds MacOS support
> > (which was missing in the first patch, because it has different
> > signal-handling code), and more significantly, adds 32-bit x86 support.
> > 
> > In 32-bit x86, the basic idea is the same, fold away constant offsets so
> > that we don't need explicit add instructions, but the mechanism is
> > different. Instead of fixing things up in a signal handler, we fold the
> > constant offset into the bounds check, and then we add a second-level
> > out-of-line bounds check for the low end (since we can no longer just do a
> > simple unsigned comparison).
> 
> It's a good insight that the low bounds check can be move out of main path.
> 
> This should help reduce register pressure on the x86, so would be expected
> to give a performance improvement for the x86.
> 
> The asm.js heap lengths are constrained to make bounds checking fast on the
> ARM. The (heap.length - offset) comparison will be slower on the ARM because
> the immediate can not be encoded in a single instruction. It might be
> possible to speculative compare the ptr to the next smallest immediate that
> can be encoded in a single instruction on the ARM - code would need to avoid
> this upper region for efficiency. The opportunities to dehoist the offset
> are very limited on the ARM anyway without placing the heap at absolute
> zero, so this might not be worth exploring on the ARM for now.
> 
> It's still a slow path optimization without giving consideration to avoiding
> the bounds checks, so do you have any suggestions for better eliminating the
> bounds checks?
>
> The JITs can dehoist some of these bounds check as they can bail out, and
> the JITs can dehoist these offsets too, so Odin is not going to be
> competitive on heap accesses unless the issue of the bounds checks is
> addressed.

For the bounds-checking part, I think we could have a guard region after the heap and a signal handler in addition to a bounds check. However, my understanding is that ARM also doesn't have the address mode flexibility the technique in the patch here wants. ARM only has register+register or register+immediate addressing, not register+register+immediate, as one of the registers is the heap base. So while the patch here is great for x86 and x64, I think we'll need to do something else for ARM
(Assignee)

Comment 9

3 years ago
Created attachment 8542959 [details] [diff] [review]
asmjs-offset-wrapping.patch

Updated patch:
 - Fixes x64 with signal-handler OOB handling disabled -- bounds checks need to be repatched when the heap is resized
 - Added code to testZOOB.js to test store-from-immediate cases
 - Rebased so that it doesn't depend on any other patches
Attachment #8542779 - Attachment is obsolete: true
Attachment #8542779 - Flags: review?(luke)
Attachment #8542959 - Flags: review?(luke)
(Assignee)

Updated

3 years ago
Blocks: 876057
(Assignee)

Updated

3 years ago
Blocks: 1116773
(Assignee)

Comment 10

3 years ago
Created attachment 8542986 [details] [diff] [review]
asmjs-offset-wrapping.patch

And another version, with fixes for Mac and Windows. And I generalized the EAA code, taking some ideas from your patch above; recognizing heap[imm+ptr] in addition to heap[ptr+imm] enables a bunch more matching.
Attachment #8542959 - Attachment is obsolete: true
Attachment #8542959 - Flags: review?(luke)
Attachment #8542986 - Flags: review?(luke)

Comment 11

3 years ago
Comment on attachment 8542986 [details] [diff] [review]
asmjs-offset-wrapping.patch

Review of attachment 8542986 [details] [diff] [review]:
-----------------------------------------------------------------

I think I see bugs, but I might not be seeing the whole picture.

::: js/src/asmjs/AsmJSValidate.h
@@ +67,5 @@
> +// might also try to access a few bytes after this limit, so just inflate it by
> +// AsmJSPageSize.
> +static const size_t AsmJSMappedSize = 4 * 1024ULL * 1024ULL * 1024ULL +
> +                                      AsmJSMaxImmediateRange +
> +                                      AsmJSPageSize;

You get a page! You get a page!

::: js/src/jit/EffectiveAddressAnalysis.cpp
@@ +101,5 @@
> +        MDefinition *op1 = ptr->toAdd()->getOperand(1);
> +        if (op0->isConstantValue())
> +            mozilla::Swap(op0, op1);
> +        if (op1->isConstantValue()) {
> +            int32_t imm = op1->constantValue().toInt32();

IIUC, this only applies for SIMD/heap8 since we'll have MAsmJSLoad(MBitAnd(...)).  Adding a case for MAsmJSLoad(MBitAnd(MAdd(..., J), K)) would make sense, as long as J preserved K-alignment.

@@ +124,5 @@
> +                mozilla::Swap(op0, op1);
> +            if (op1->isConstantValue()) {
> +                uint32_t i = op1->constantValue().toInt32();
> +                uint32_t m = rhs->constantValue().toInt32();
> +                if (((~m + 1) & ~m) == 0 && ((i & m) == i) && ins->tryAddDisplacement(i))

So we have two operations that we're trying to say we can do in either order w/o changing the result: zero the high bits and add to the low bits.  The problem is that the low-bit addition can overflow into the high bits so it matters if you mask first or second.  E.g.
  ((0xf + 1) & 0xf) ==  0x0
  ((0xf & 0xf) + 1) == 0x10
I was trying to think of how the high bits don't matter or are handled by the signal handler, but I don't see it; the difference in order can result in in-bounds vs. out-of-bounds.  Also, the mask can be anything, not necessarily the size of the heap.

::: js/src/jit/shared/Assembler-shared.h
@@ +750,5 @@
> +    Scalar::Type type_ : 8;
> +    AnyRegister::Code valueReg_ : 8; // the reg to load into or store from
> +    int32_t imm_;                    // the immediate value to store from
> +    LoadOrStore isLoad_ : 8;
> +    bool isImm_;                     // for stores, are we storing from register or from immediate

Perhaps storeIsImm_?

@@ +799,5 @@
>      bool hasLengthCheck() const { return cmpDelta_ > 0; }
>      void *patchLengthAt(uint8_t *code) const { return code + (offset_ - cmpDelta_); }
>      unsigned opLength() const { return opLength_; }
> +    bool isLoad() const { return isLoad_ == Load; }
> +    bool isImm() const { return isImm_; }

Similarly, isStoreImm with assert(!isLoad)?

@@ +804,3 @@
>      Scalar::Type type() const { return type_; }
> +    AnyRegister valueReg() const { MOZ_ASSERT(!isImm_); return AnyRegister::FromCode(valueReg_); }
> +    int32_t imm() const { MOZ_ASSERT(isImm_); return imm_; }

Can this also be renamed to storedImm assert !isLoad()?

::: js/src/jit/x64/CodeGenerator-x64.cpp
@@ +265,5 @@
>  
>      if (ptr->isConstant()) {
>          int32_t ptrImm = ptr->toConstant()->toInt32();
>          MOZ_ASSERT(ptrImm >= 0);
> +        srcAddr = Operand(HeapReg, int32_t(uint32_t(ptrImm) + ins->mir()->offset()));

If ptrImm is INT32_MAX, then could this leave us with a negative displacement.  In that case, iirc, the negative is sign-extended to 64-bit, which would be bad.
Attachment #8542986 - Flags: review?(luke)
(In reply to Luke Wagner [:luke] from comment #11)
> Comment on attachment 8542986 [details] [diff] [review]
> asmjs-offset-wrapping.patch
...
> ::: js/src/jit/EffectiveAddressAnalysis.cpp
> @@ +101,5 @@
> > +        MDefinition *op1 = ptr->toAdd()->getOperand(1);
> > +        if (op0->isConstantValue())
> > +            mozilla::Swap(op0, op1);
> > +        if (op1->isConstantValue()) {
> > +            int32_t imm = op1->constantValue().toInt32();
> 
> IIUC, this only applies for SIMD/heap8 since we'll have
> MAsmJSLoad(MBitAnd(...)).  Adding a case for MAsmJSLoad(MBitAnd(MAdd(...,
> J), K)) would make sense, as long as J preserved K-alignment.

Also for 8 bit asm.js heap accesses which have no low bit mask.
 
> @@ +124,5 @@
> > +                mozilla::Swap(op0, op1);
> > +            if (op1->isConstantValue()) {
> > +                uint32_t i = op1->constantValue().toInt32();
> > +                uint32_t m = rhs->constantValue().toInt32();
> > +                if (((~m + 1) & ~m) == 0 && ((i & m) == i) && ins->tryAddDisplacement(i))
> 
> So we have two operations that we're trying to say we can do in either order
> w/o changing the result: zero the high bits and add to the low bits.  The
> problem is that the low-bit addition can overflow into the high bits so it
> matters if you mask first or second.  E.g.
>   ((0xf + 1) & 0xf) ==  0x0
>   ((0xf & 0xf) + 1) == 0x10
> I was trying to think of how the high bits don't matter or are handled by
> the signal handler, but I don't see it; the difference in order can result
> in in-bounds vs. out-of-bounds.  Also, the mask can be anything, not
> necessarily the size of the heap.

This transform is just for the case in which the low bits are being masked, not for the masking of the high bits.

E.g. (~0xf + 1) & 0xf => 1 not 0 so this transform would not be applicable.

Is it necessary to check that the isAdd() is known to be a precise 32 bit truncate-able operation?

Comment 13

3 years ago
(In reply to Douglas Crosher [:dougc] from comment #12)
> This transform is just for the case in which the low bits are being masked,
> not for the masking of the high bits.

D'oh, right, I forgot to invert due to the ~'s.  Right, then this patch makes a bunch more sense and simultaneously addresses my other comment.

Updated

3 years ago
Attachment #8542986 - Flags: review?(luke)

Comment 14

3 years ago
Comment on attachment 8542986 [details] [diff] [review]
asmjs-offset-wrapping.patch

Review of attachment 8542986 [details] [diff] [review]:
-----------------------------------------------------------------

I forgot that you're already planning some further changes to deal with the terrible unaligned-load-signal-handler madness, so I'll just leave f+ and a few other high-level comments I had after rereading the patch:

IIUC, since EAA runs after GVN, we won't get a chance to eliminate the redundant (i&M) exposed by this pass (which I expect would be fairly common due to struct access).  Do you see any downsides to running EAA before GVN/LICM?  It seems like it would help this case and also avoid the hasOneUse potential pitfall mentioned earlier.  The only case where I could imagine GVN helping is if Emscripten was emitting code like: H32[((i+J)+K)&M] (J,K constants) where we want ((i+J)+K) to be folded into ((i+(J+K)) before folding (J+K) into the load.  I don't know if Emscripten would emit this though, but I could imagine so if the +K was introduced in some final pass.  It seems like we could measure the number of emitted add/and nodes with EAA before/after to confirm.  I guess situations like this are why you see multiple CSE passes in LLVM?  I remember Alon wrote a mini-CSE for the Emterpreter to catch low-hanging fruit like this introduced in the final phase; it'd be neat to try that out.

Also, it'd be great to see the impact of this opt on asm.js macro benchmarks.

::: js/src/asmjs/AsmJSModule.cpp
@@ +785,5 @@
>              size_t scalarByteSize = TypedArrayElemSize(access.type());
> +            void *addr = access.patchLengthAt(code_);
> +            uint32_t disp = reinterpret_cast<uint32_t>(X86Assembler::getPointer(addr));
> +            X86Assembler::setPointer(addr,
> +                                     (void*)(heap->byteLength() + 1 - scalarByteSize + disp));

I was confused by the "+ disp" until I went back to CodeGenerator-x86 to see that we store -offset.  I guess "disp" here is "relative to byteLength" and not "the disp of the instruction"?  I guess once you have your brand new disassembler you can just sniff the displacement out of the insn directly w/o these hacks?

::: js/src/jit/EffectiveAddressAnalysis.cpp
@@ +124,5 @@
> +                mozilla::Swap(op0, op1);
> +            if (op1->isConstantValue()) {
> +                uint32_t i = op1->constantValue().toInt32();
> +                uint32_t m = rhs->constantValue().toInt32();
> +                if (((~m + 1) & ~m) == 0 && ((i & m) == i) && ins->tryAddDisplacement(i))

To save the reader from re-deriving the meaning, could you perhaps factor out an IsLowBitMask or IsOnesFollowedByZeros.  A natural place could be either next to IsPowerOfTwo in jsutil.h or mfbt/MathAlgorithms.h (perhaps moving IsPowerOfTwo along with it).
Attachment #8542986 - Flags: review?(luke) → feedback+
(In reply to Luke Wagner [:luke] from comment #14)
> Comment on attachment 8542986 [details] [diff] [review]
...
> IIUC, since EAA runs after GVN, we won't get a chance to eliminate the
> redundant (i&M) exposed by this pass (which I expect would be fairly common
> due to struct access).  Do you see any downsides to running EAA before
> GVN/LICM?

EAA does not run on the ARM either, so it's not the best place for these transforms.

For the ARM it is important to make the de-hosting decision before GVN because:

* it might be that only part of the offset can be de-hoisted and the rest might be hoistable. It is common in Emscripten generated code to see accesses with large, but close, offsets repeated, and in these cases a large base can be hoisted and the smaller differences de-hoisted.

* a multi-instruction sequence might be needed and some of the calculations might be hoistable.

There is already a good deal of folding occurring after range analysis, and bug 870743 does elimination of redundant bitand ops then, and then bug 1080477 does the offset dehoisting at the end of range analysis.

I see a need for multiple passes, but range analysis currently makes some transforms than preclude further code movement (for the bounds checking), so it might need some untangling.

Could the transform be done early at asm.js validation?  Bug 915157 makes a lot of decisions during validation, even deciding what portion of the offset can be de-hoisted on the ARM - and this varies depending the on array type.

Here are some run time comparisons using the zlib test. Emscripten aggressive-variable-elimination has been used to dehoist offsets.

Unpatched nightly x86: 47.40 sec
With the above patch:  43.23

So there is a good improvement from this patch for the x86, 43.23/47.40 = 91.20%, and I believe this is due to a lowering of register pressure when the offsets are dehoisted.

Index masking with dehoisting at asm.js validation: 38.54

So the bounds checking still has a significant cost, and index masking does significantly better: 38.54/47.40 = 81.31%

Using the partial constant propagation and late dehoisting: 40.00

This is not quite as good, and something to look into. Not sure yet if it means that dehoisting early is better for this test or if some opportunities have not been well optimized.
(In reply to Douglas Crosher [:dougc] from comment #15)
> (In reply to Luke Wagner [:luke] from comment #14)
> > Comment on attachment 8542986 [details] [diff] [review]
> ...
> > IIUC, since EAA runs after GVN, we won't get a chance to eliminate the
> > redundant (i&M) exposed by this pass (which I expect would be fairly common
> > due to struct access).  Do you see any downsides to running EAA before
> > GVN/LICM?
... 
> Index masking with dehoisting at asm.js validation: 38.54
...
> Using the partial constant propagation and late dehoisting: 40.00

Sorry, the above test did not have dehoisting enabled. With the late dehoisting enabled the result is as good if not better than early dehoisting at valitation: 37.85

This result is just an input into the discussion on early vs late dehoisting, and is only for the x86. Still need to explore why there is a difference here.
(Assignee)

Comment 17

3 years ago
Created attachment 8548554 [details] [diff] [review]
asmjs-offset-wrapping.patch

Here's an updated patch. I believe I've addressed all the review comments, and fixes a few additional bugs. Also, significantly, this patch includes an x86/x64 disassembler which is smart enough to handle all the heap accesses that Odin uses. It's a little rough still, but I'm posting this patch now to get feedback. The disassembler allows us to get the real effective address of the heap access (because both Mac and Linux sometimes give us something else), and it allows us to keep the AsmJSHeapAccess structure fairly minimal.

I've so far resisted the temptation to design a DSL or a system of tables for defining ISA information, however if this disassembler ever grows beyond its present limited purpose, that will be something to reconsider.

The EAA code is a little pessimistic with its hasOneUse() checks. I plan to generalize it in subsequent patches.

Good spot on the constant address plus offset overflow. I've now fixed it so that we don't fold offsets in that case.

> I was confused by the "+ disp" until I went back to CodeGenerator-x86 to see
> that we store -offset.  I guess "disp" here is "relative to byteLength" and
> not "the disp of the instruction"?  I guess once you have your brand new
> disassembler you can just sniff the displacement out of the insn directly
> w/o these hacks?

The disassembler doesn't actually help this problem much. The signal handler still needs to know where the compare immediate is so that it can do a patch, and once it knows that, it can just read the disp field out of it. One thing we could do though is invert the disp so that we store the plain offset and then subtract it in the signal handler instead of add it, if you think that's clearer.
Attachment #8542986 - Attachment is obsolete: true

Comment 18

3 years ago
Exciting stuff!  In case you weren't going to do it already, it'd be nice to split the disassembler into a separate patch and then have another separate patch that refactors AsmJSSignalHandler to use the disassembler.
(Assignee)

Comment 19

3 years ago
Created attachment 8551903 [details] [diff] [review]
asmjs-offset-wrapping.patch

Here's a new snapshot of the patch. A bunch of misc fixes, some more overflow paranoia, and I reorganized the heap length patching so that it's more clear what's stored in the displacement field before patching, and what the patching does.

I definitely need to split it up, as it's awkward to review in its current monolithic state. I also still need to tidy up the disassembler code, think more about integer overflows, and add more tests.
Attachment #8548554 - Attachment is obsolete: true
(Assignee)

Updated

3 years ago
Depends on: 1125185
(Assignee)

Updated

3 years ago
Depends on: 1125202
(Assignee)

Updated

3 years ago
Depends on: 1125236
(Assignee)

Comment 20

3 years ago
Created attachment 8553866 [details] [diff] [review]
asmjs-constant-offsets.patch

Ok, with all the assembler/disassembler parts split out into separate patches, and a variety of other updates, here's the new asm.js-specific portion of this feature.
Attachment #8553866 - Flags: review?(luke)
(Assignee)

Updated

3 years ago
Depends on: 1113338
(Assignee)

Comment 21

3 years ago
Comment on attachment 8553866 [details] [diff] [review]
asmjs-constant-offsets.patch

This patch will need to be updated once bug 1113338 -- partial loads and stores -- lands, so no r? for now.
Attachment #8553866 - Flags: review?(luke) → feedback?(luke)

Comment 22

3 years ago
Comment on attachment 8553866 [details] [diff] [review]
asmjs-constant-offsets.patch

Review of attachment 8553866 [details] [diff] [review]:
-----------------------------------------------------------------

Really looking forward to this patch.  From irc, also quite excited about whatever you settle on for how to do the &/+ reassocation before/during GVN to expose the & to elimination.

::: js/src/asmjs/AsmJSModule.cpp
@@ +833,5 @@
>          }
>      }
> +#elif defined(JS_CODEGEN_X64)
> +    if (maybePrevBuffer) {
> +        int32_t heapLength = int32_t(intptr_t(maybePrevBuffer->byteLength()));

If you hoist here, can you hoist in the x86 case too for symmetry?

::: js/src/jit/EffectiveAddressAnalysis.cpp
@@ +95,5 @@
> +    // Test whether m is just leading ones and trailing zeros.
> +    return (-m & ~m) == 0;
> +}
> +
> +template<typename MAsmJSHeapAccessTy>

"Ty" seems neither here nor there; how about "T" or "Type"?

@@ +110,5 @@
> +            // If both operands are constant, their addition may overflow the
> +            // displacement field, which we can't handle in codegen. Fortunately,
> +            // such things should usually be folded well before we get here.
> +            if (op1->isConstantValue())
> +                return;

Is there a chance that, after this pass (esp if you fold this into GVN), the non-const op could become const, even though we've added a displacement?  If there is wraparound, it seems like we could just clamp to UINT32_MAX (on x64) or hardcode a jump to oob (on x86).
Attachment #8553866 - Flags: feedback?(luke) → feedback+
It might be of interest to know how V8 turbo-fan currently optimizes these.

1. At the Typer phase (like Range Analysis) the nodes parallel the source code with addition and right shift nodes. The index scaling inherent in some access operations has not yet been expanded into a left shift or mask. Value numbering occurs for the first time in this stage, and in a few subsequent stages.

2. The Simplified Lowering phase makes an explicit node for the index scale - a left shift. There are 'reducer' operations applied that eliminate the shift and these also help with a masked index and move an offset closer to the access operation. This phase also applies value numbering. Some relevant transforms are:
      (x + (K << L)) & (-1 << L) => (x & (-1 << L)) + (K << L)
      (x >> K) << K => x & ~(2^K - 1)
      (x & K) & K => x & K

3. The Change Lowering phase repeats some of the above operations and again applies value numbering.

4. Pattern matching combines an offset into an access instruction addressing mode at a later stage when emitting instructions, and this also matches a scaled index. On the x86 the offset is combined with the constant base. The x64 has some issues to work through but it looks close.

The final pattern matching looks too late for the ARM and it might need to be a little earlier - when still going value numbering. However it is interesting that value number is repeated and after deriving type information whereas GVN appears to only be applied once in Ion and before Range Analysis. V8 turbon-fan has also defined a convention for transforms and at various stages.
(Assignee)

Updated

3 years ago
Attachment #8551903 - Attachment is obsolete: true
(Assignee)

Comment 24

3 years ago
Created attachment 8562241 [details] [diff] [review]
movslq.patch

This just renames movsxd to movslq, to be more consistent with AT&T style (GAS does accept movsxd, though llvm-mc does not, and it's typically disassembled as movslq).
Attachment #8562241 - Flags: review?(benj)
(Assignee)

Comment 25

3 years ago
Created attachment 8562250 [details] [diff] [review]
asmjs-constant-offsets.patch

The main patch, rebased on top of the partial loads/stores patch and the throw-on-OOB patch.

Throwing on OOB simplifies several things. Partial loads/stores complicate some things; particularly XYZ stores, since they go as two store instructions, and we don't want to do one part of a store and have it succeed if the other part of the store will fail. This requires a field in AsmJSHeapAccess and awareness in the signal handler.

I also implemented yet another approach to constant addresses, to address your concern above. We now fold constant addresses as offsets in EAA, before register allocation, so that the CodeGenerator isn't stuck having to do an add which it can't always do.

x64 no-signals mode was straightforward, with one exception. When there's an offset, there exists some index value near UINT32_MAX such that adding the offset to it causes it to wrap around the uint32_t space and end up back in bounds. The OffsetBoundsCheck mechanism handles this case. But on x64, the address mode ends up adding an index like 0x00000000ffffffff and an offset like 1 in a full 64-bit addition, producing 0x000000010000000 rather than the desired 0x0000000000000000. I fixed this by having the OffsetBoundsCheck sign-extend the index in place (the movslq) and then branch back to perform the heap access. This works, but it breaks the invariant that all "int32" registers always have their high 32 bits zeroed in asm.js code, which may be undesirable. I'm not yet convinced the approach here is best; I'm just including it for discussion.
Attachment #8553866 - Attachment is obsolete: true
Attachment #8562250 - Flags: feedback?(luke)

Comment 26

3 years ago
For the last issue, I think it'd be fine to make JS_NO_SIGNALS on x64 be semantically incorrect in out-of-bounds cases; I'd like to remove JS_NO_SIGNALS anyway.  Having a non-zero high word of an int32 reg would, I believe be a security problem if that int32 was used as the index of an MAsmJS(Load|Store)Heap since the index register is necessarily interpreted as an int64.
Comment on attachment 8562250 [details] [diff] [review]
asmjs-constant-offsets.patch

Review of attachment 8562250 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +12145,5 @@
> +
> +    bool tryAddDisplacement(int32_t o) {
> +        // Compute the new offset. Check for overflow and negative.
> +        int32_t newOffset = uint32_t(offset_) + o;
> +        if (newOffset < 0)

What is the offset_ is already negative or 'o' is negative making the newOffset negative?

Could 64 bit integers be used in these checks if necessary to support negative offsets?
Comment on attachment 8562241 [details] [diff] [review]
movslq.patch

Review of attachment 8562241 [details] [diff] [review]:
-----------------------------------------------------------------

Cool

::: js/src/jit/shared/BaseAssembler-x86-shared.h
@@ +1891,5 @@
>          m_formatter.oneByteOp64(OP_MOV_EAXIv, dst);
>          m_formatter.immediate64(imm);
>      }
>  
> +    void movslq_rr(RegisterID src, RegisterID dst)

wait, was it unused before this patch?!

@@ +1898,1 @@
>          m_formatter.oneByteOp64(OP_MOVSXD_GvEv, src, dst);

I'm fine with keeping MOVSXD in the OP_ name here, as this is how it is referred to in Intel's docs.  Just want to make sure you've done it on purpose?
Attachment #8562241 - Flags: review?(benj) → review+
(Assignee)

Comment 29

3 years ago
(In reply to Douglas Crosher [:dougc] from comment #27)
> ::: js/src/jit/MIR.h
> @@ +12145,5 @@
> > +
> > +    bool tryAddDisplacement(int32_t o) {
> > +        // Compute the new offset. Check for overflow and negative.
> > +        int32_t newOffset = uint32_t(offset_) + o;
> > +        if (newOffset < 0)
> 
> What is the offset_ is already negative or 'o' is negative making the
> newOffset negative?

offset_ won't ever be negative in the current patch; I'll add an assert for that. It's ok for o to be negative -- if we've already folded a positive offset, we can fold a negative offset on top of that as long as it doesn't make the final result offset negative.

> Could 64 bit integers be used in these checks if necessary to support
> negative offsets?

The problem with negative offsets is that our bounds checking mechanisms don't know how to deal with it yet. On x64, we currently only have a guard region after the heap, not before, and it'd take some amount of work to change this.
(Assignee)

Comment 30

3 years ago
(In reply to Benjamin Bouvier [:bbouvier] from comment #28)
> ::: js/src/jit/shared/BaseAssembler-x86-shared.h
> @@ +1891,5 @@
> >          m_formatter.oneByteOp64(OP_MOV_EAXIv, dst);
> >          m_formatter.immediate64(imm);
> >      }
> >  
> > +    void movslq_rr(RegisterID src, RegisterID dst)
> 
> wait, was it unused before this patch?!

Yeah. It turns out that Odin/Ion never actually sign-extends a 32-bit value to 64-bit. I was considering doing it in this bug, though Luke's right in comment 26, so I've now dropped it. We can delete this code altogether if you want, though it seems ok to leave in in case someone else needs it in the future.

> @@ +1898,1 @@
> >          m_formatter.oneByteOp64(OP_MOVSXD_GvEv, src, dst);
> 
> I'm fine with keeping MOVSXD in the OP_ name here, as this is how it is
> referred to in Intel's docs.  Just want to make sure you've done it on
> purpose?

Yep. It's not pretty having differing names, but I think this makes sense given how everything else is structured right now.
(Assignee)

Comment 31

3 years ago
Created attachment 8562851 [details] [diff] [review]
asmjs-constant-offsets.patch

This version adds SIMD full and partial load and store support in testZOOB.js. I also reorganized testZOOB.js and split out the error testcases into testAddressErrors.js to reduce clutter and speed it up.

I also removed the sign-extend hack, following comment 26. This means that x64 with JS_NO_SIGNALS does not handle accesses where index+offset wraps around to be in bounds correctly.
Attachment #8562250 - Attachment is obsolete: true
Attachment #8562250 - Flags: feedback?(luke)
Attachment #8562851 - Flags: review?(luke)
(In reply to Dan Gohman [:sunfish] from comment #29)
> (In reply to Douglas Crosher [:dougc] from comment #27)
...
> > What is the offset_ is already negative or 'o' is negative making the
> > newOffset negative?
> 
> offset_ won't ever be negative in the current patch; I'll add an assert for
> that. It's ok for o to be negative -- if we've already folded a positive
> offset, we can fold a negative offset on top of that as long as it doesn't
> make the final result offset negative.
> 
> > Could 64 bit integers be used in these checks if necessary to support
> > negative offsets?
> 
> The problem with negative offsets is that our bounds checking mechanisms
> don't know how to deal with it yet. On x64, we currently only have a guard
> region after the heap, not before, and it'd take some amount of work to
> change this.

There is still the case in which the index is proven to be within bounds, and it might still be useful to accept a negative offset for this case even without guard zones.
(In reply to Dan Gohman [:sunfish] from comment #31)
> Created attachment 8562851 [details] [diff] [review]
> asmjs-constant-offsets.patch
...
> I also removed the sign-extend hack, following comment 26. This means that
> x64 with JS_NO_SIGNALS does not handle accesses where index+offset wraps
> around to be in bounds correctly.

Such a semantic difference might be a fingerprinting privacy issue and might need review on this basis - unless the JS_NO_SIGNALS support is planned to be removed before this difference makes it to a release. You should think about other JS implementations too - none appear to use this protected guard zone strategy but will need to be efficient with the same code patterns and may not want to have a semantic difference for this edge case.

It appears that the complexity for dealing with this case is in the OOB path so it might be practical to handle this edge case correctly. To avoid too much code bloat, might it be possible for the OOB path to branch to some common code that uses the return address to disassemble the code sequence and emulate the load/store operation? This could save and restore the state to avoid destructively modifying the ptr register.
(Assignee)

Comment 34

3 years ago
Comment on attachment 8562851 [details] [diff] [review]
asmjs-constant-offsets.patch

Cancelling the review for now, as Luke pointed out that it mistakes the meaning of needsBoundsCheck() on asm.js heap accesses.
Attachment #8562851 - Flags: review?(luke)
(Assignee)

Comment 35

3 years ago
The movslq assembler patch: https://hg.mozilla.org/integration/mozilla-inbound/rev/f8556dc2b985
Whiteboard: [leave open]
https://hg.mozilla.org/mozilla-central/rev/f8556dc2b985
(Assignee)

Updated

3 years ago
Whiteboard: [leave open]
(Assignee)

Updated

3 years ago
Depends on: 1122856
(Assignee)

Comment 37

3 years ago
Created attachment 8566919 [details] [diff] [review]
asmjs-constant-offsets.patch

Updated patch, on top of the patch for bug 1116773 to fix the needBoundsCheck() ambiguity.

This patch preserves full support for JS_NO_SIGNALS by using the movslq trick, and zeroing out the high halves of the registers afterwards to preserve asm.js invariants.
Attachment #8562851 - Attachment is obsolete: true
(Assignee)

Comment 38

3 years ago
Comment on attachment 8566919 [details] [diff] [review]
asmjs-constant-offsets.patch

The needsBoundsCheck() issues are addressed, so submitting for review again. Thanks for your patience!
Attachment #8566919 - Flags: review?(luke)
(Assignee)

Comment 39

3 years ago
Created attachment 8567074 [details] [diff] [review]
asmjs-constant-offsets.patch

Now with refactoring to put all of the bounds-check branch codegen code in common helper functions, which also helped expose some inconsistencies between the different versions of the code.

This also undoes the recursion in x86's emitSimdLoad and emitSimdStore. I know that was my crazy idea, but it turned out to be less useful than I earlier thought.
Attachment #8566919 - Attachment is obsolete: true
Attachment #8566919 - Flags: review?(luke)
Attachment #8567074 - Flags: review?(luke)

Comment 40

3 years ago
Comment on attachment 8567074 [details] [diff] [review]
asmjs-constant-offsets.patch

Review of attachment 8567074 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent!  I really appreciate all the refactoring and general clean code.

On general request: I was wondering if it'd be simpler and also save some codesize (you'd be surprised the total amount of space spent on these out-of-bounds paths) to, instead of reusing OutOfLineLoadTypedArrayOutOfBounds so that we need two OOL paths, have a single OutOfLineAsmJSHeapOutOfBounds that does both the offset check (if non-zero offset) and handles the out-of-bounds case.

::: js/src/asmjs/AsmJSModule.cpp
@@ +786,5 @@
>      heapDatum() = heap->dataPointer();
>  
>  #if defined(JS_CODEGEN_X86)
>      uint8_t *heapOffset = heap->dataPointer();
> +    uint32_t heapLength = uint32_t(uintptr_t(heap->byteLength()));

(Here and bellow) can we just have: 'uint32_t heapLength = heap->byteLength()' since byteLength is non-negative and necessarily <4gb?  When we go past that we'll need to change all this code anyway.

::: js/src/asmjs/AsmJSSignalHandlers.cpp
@@ +596,5 @@
> +#if defined(JS_CODEGEN_X64)
> +    // Check x64 asm.js heap access invariants.
> +    MOZ_ASSERT(address.disp() >= 0);
> +    MOZ_ASSERT(address.base() == HeapReg.code());
> +    MOZ_ASSERT(address.scale() == 0);

How about making the new pc an outparam, returning a 'bool' indicating "everything is ok" and changing all these asserts into real dynamic checks.  That way, if anything doesn't check out, we safely crash and get a crash report.

::: js/src/jit/EffectiveAddressAnalysis.cpp
@@ +102,5 @@
> +{
> +    MDefinition *ptr = ins->ptr();
> +
> +    if (ptr->isConstantValue()) {
> +        // Look for heap[i] where i is a constant offset, and fold the offset.

Perhaps a short explanation of why this is good/assumed by x64 lowering/codegen?

::: js/src/jit/MIR.h
@@ +12152,5 @@
> +        // AsmJSCheckedImmediateRange. Otherwise, just keep it within what
> +        // the instruction set can support.
> +        if (size_t(newEnd) > (needsBoundsCheck()
> +                              ? AsmJSCheckedImmediateRange
> +                              : AsmJSImmediateRange))

Pardon by uber-nitting, but if you have a separate 'int32_t maxEnd = needsBoundsCheck() ? AsmJSCheckImmediateRange : AsmJSImmediateRange;' then you avoid ugly multi-line condition and requisite curlies.

::: js/src/jit/shared/Assembler-shared.h
@@ +777,5 @@
> +#if defined(JS_CODEGEN_X86)
> +    uint8_t opLength_;  // the length of the load/store instruction
> +#endif
> +#if defined(JS_CODEGEN_X64)
> +    uint8_t accessOffset_; // if is this e.g. the Z of an XYZ

Do you think we'll ever use this for anything other than the Z special case?  If not, then perhaps we could just have a bool isSimdZLoad()?  If so (perhaps more cases in the future from new vector types), then let's give it a very specific name like offsetWithinWholeSimdVector_.

@@ +822,5 @@
>  
>      uint32_t offset() const { return offset_; }
>      void setOffset(uint32_t offset) { offset_ = offset; }
>  #if defined(JS_CODEGEN_X86)
>      void *patchOffsetAt(uint8_t *code) const { return code + (offset_ + opLength_); }

'offset' was never a great name, but now it seems even more confusing since we have multiple offsets at play.  Since you're touching the relevant areas anyway, could you perhaps rename offset(_) to insnOffset(_) and rename patchOffsetAt() to patchHeapPtrImmAt() (alternate names welcome)?

::: js/src/jit/shared/CodeGenerator-x86-shared.cpp
@@ +409,5 @@
> +
> +    Label *pass = nullptr;
> +
> +    // If we have a non-zero offset, it's possible that |ptr| itself is out of
> +    // bounds, while adding the offset computs an in-bounds address. To catch

computes

::: js/src/jit/shared/CodeGenerator-x86-shared.h
@@ +73,5 @@
> +    bool needsBranch(const MAsmJSHeapAccess *mir) const;
> +    MOZ_WARN_UNUSED_RESULT
> +    uint32_t emitBranchFor(const MAsmJSHeapAccess *mir, const MInstruction *ins,
> +                           Register ptr, Label *fail);
> +    void cleanupAfterBranch(const MAsmJSHeapAccess *mir, Register ptr);

Could you inject "AsmJSOutOfBounds" into each of these methods?

::: js/src/jit/x64/Architecture-x64.h
@@ +262,5 @@
>      return false;
>  }
>  
> +// Support some constant-offset addressing. Note that this must stay within
> +// AsmJSMaxCheckedImmediateRange.

Can you statically assert instead (on all archs)?  It'd also be nice to have (again on all archs) a comment that says "See AsmJSMaxCheckedImmediateRange comment in AsmJSValidate.h" and then put a nice general comment there explaining the meaning and difference between AsmJSImmediateRange and AsmJSCheckedImmediateRange.

::: js/src/jit/x64/Lowering-x64.cpp
@@ +153,5 @@
>      MDefinition *ptr = ins->ptr();
>      MOZ_ASSERT(ptr->type() == MIRType_Int32);
>  
> +    // For simplicity (and since we don't care about getting maximum performance
> +    // in these cases) only allow constant operands when skipping bounds checks.

(Here and below) I think this comment is out of date; perhaps now should reference EAA comment.

::: js/src/jit/x86/CodeGenerator-x86.cpp
@@ +453,1 @@
>          before = masm.size();

Pre-existing, but I think it'd be cleaner to dehoist the 'uint32_t before = masm.size();' into both branches (also in the store case).

@@ +500,1 @@
>          loadAndNoteViewTypeElement(accessType, srcAddr, out);

Pre-existing, but I don't think (load|store)AndNoteViewTypeElement is buying us much in the way of code factoring and it often bugs me b/c I don't know what it does; perhaps we can remove it, inlining it into the few callers?

@@ +735,5 @@
>          memoryBarrier(ins->mir()->barrierAfter());
>          return;
>      }
>  
> +    Label *rejoin = GetJitContext()->temp->lifoAlloc()->new_<Label>();

Would 'new (alloc()) Label()' work instead?
Attachment #8567074 - Flags: review?(luke) → review+
(Assignee)

Comment 41

3 years ago
Created attachment 8567519 [details] [diff] [review]
asmjs-constant-offsets.patch

Carrying forward r+. I'm planning to wait until after the uplift to land this. Until then, here's the updated patch with comments addressed.

(In reply to Luke Wagner [:luke] from comment #40)
> Comment on attachment 8567074 [details] [diff] [review]
> asmjs-constant-offsets.patch
> 
> Review of attachment 8567074 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Excellent!  I really appreciate all the refactoring and general clean code.
> 
> On general request: I was wondering if it'd be simpler and also save some
> codesize (you'd be surprised the total amount of space spent on these
> out-of-bounds paths) to, instead of reusing
> OutOfLineLoadTypedArrayOutOfBounds so that we need two OOL paths, have a
> single OutOfLineAsmJSHeapOutOfBounds that does both the offset check (if
> non-zero offset) and handles the out-of-bounds case.

It would save a slow-path branch per access with non-zero offset on x86. Which is probably useful, though I'm inclined to do it in another patch.

> ::: js/src/asmjs/AsmJSSignalHandlers.cpp
> @@ +596,5 @@
> > +#if defined(JS_CODEGEN_X64)
> > +    // Check x64 asm.js heap access invariants.
> > +    MOZ_ASSERT(address.disp() >= 0);
> > +    MOZ_ASSERT(address.base() == HeapReg.code());
> > +    MOZ_ASSERT(address.scale() == 0);
> 
> How about making the new pc an outparam, returning a 'bool' indicating
> "everything is ok" and changing all these asserts into real dynamic checks. 
> That way, if anything doesn't check out, we safely crash and get a crash
> report.

As discussed on IRC, I chagned the code to use MOZ_RELEASE_ASSERT for this.

> ::: js/src/jit/x64/Architecture-x64.h
> @@ +262,5 @@
> >      return false;
> >  }
> >  
> > +// Support some constant-offset addressing. Note that this must stay within
> > +// AsmJSMaxCheckedImmediateRange.
> 
> Can you statically assert instead (on all archs)?  It'd also be nice to have
> (again on all archs) a comment that says "See AsmJSMaxCheckedImmediateRange
> comment in AsmJSValidate.h" and then put a nice general comment there
> explaining the meaning and difference between AsmJSImmediateRange and
> AsmJSCheckedImmediateRange.

Actually, this was all just an attempt to avoid a #include dependency. It now seems better to just do the #include, and get rid of AsmJSMaxCheckedImmediateRange altogether.

> ::: js/src/jit/x86/CodeGenerator-x86.cpp
> @@ +453,1 @@
> >          before = masm.size();
> 
> Pre-existing, but I think it'd be cleaner to dehoist the 'uint32_t before =
> masm.size();' into both branches (also in the store case).
> 
> @@ +500,1 @@
> >          loadAndNoteViewTypeElement(accessType, srcAddr, out);
> 
> Pre-existing, but I don't think (load|store)AndNoteViewTypeElement is buying
> us much in the way of code factoring and it often bugs me b/c I don't know
> what it does; perhaps we can remove it, inlining it into the few callers?

Fixing these two things led to a significant refactoring, as it also led me to remove the special constant-pointer code from x86 (EAA now folds these into offsets). The end result is that the x86 code now looks a lot more like the x64 code.

> @@ +735,5 @@
> >          memoryBarrier(ins->mir()->barrierAfter());
> >          return;
> >      }
> >  
> > +    Label *rejoin = GetJitContext()->temp->lifoAlloc()->new_<Label>();
> 
> Would 'new (alloc()) Label()' work instead?

No, because Label isn't a TempObject, though offhand I don't know why.
Attachment #8567074 - Attachment is obsolete: true
Attachment #8567519 - Flags: review+
(In reply to Dan Gohman [:sunfish] from comment #41)
> > > +    Label *rejoin = GetJitContext()->temp->lifoAlloc()->new_<Label>();
> > 
> > Would 'new (alloc()) Label()' work instead?
> 
> No, because Label isn't a TempObject, though offhand I don't know why.

Then can you use alloc().lifoAlloc()->new_<Label>()? That's faster and we may kill GetJitContext() at some point.
(Assignee)

Comment 43

3 years ago
(In reply to Jan de Mooij [:jandem] from comment #42)
> Then can you use alloc().lifoAlloc()->new_<Label>()? That's faster and we
> may kill GetJitContext() at some point.

That works, so I also updated the code in CodeGenerator.cpp that I had copied from. Thanks!

https://hg.mozilla.org/integration/mozilla-inbound/rev/11a0fa1a0122
(Assignee)

Updated

3 years ago
Blocks: 1136276
https://hg.mozilla.org/mozilla-central/rev/11a0fa1a0122
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox39: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Depends on: 1136551
Depends on: 1138205
You need to log in before you can comment on or make changes to this bug.