Closed Bug 1210554 Opened 9 years ago Closed 9 years ago

ARM64: Branch target out of range after inserting constant pools

Categories

(Core :: JavaScript Engine: JIT, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla45
Tracking Status: firefox45 --- fixed

People

(Reporter: jolesen, Assigned: jolesen)

References

Details

Attachments

(11 files, 4 obsolete files)

7.31 KB, patch, sfink: review+
9.36 KB, patch, sstangl: review+
7.05 KB, patch, sstangl: review+
16.80 KB, patch, sstangl: review+
20.98 KB, patch, sstangl: review+
2.17 KB, patch, sstangl: review+
1.49 KB, patch, sfink: review+
15.28 KB, patch
14.43 KB, patch, nbp: review+
12.27 KB, patch
15.47 KB, patch
Build SpiderMonkey with --enable-sim=arm64.
$  dist/bin/js --baseline-eager ../js/src/jit-test/tests/debug/Script-getLineOffsets-06.js 
Assertion failure: is_int19(imm19), at /Users/jolesen/gecko-dev/js/src/jit/arm64/vixl/Assembler-vixl.h:1638
Segmentation fault: 11

A branch target goes out of range once the sizes of the inserted constant pools are incorporated:

$ lldb -- dist/bin/js --baseline-eager ../js/src/jit-test/tests/debug/Script-getLineOffsets-06.js 
(lldb) target create "dist/bin/js"
Current executable set to 'dist/bin/js' (x86_64).
(lldb) settings set -- target.run-args  "--baseline-eager" "../js/src/jit-test/tests/debug/Script-getLineOffsets-06.js"
(lldb) r
Process 90592 launched: '/Users/jolesen/gecko-dev/obj-a64simdev/dist/bin/js' (x86_64)
Assertion failure: is_int19(imm19), at /Users/jolesen/gecko-dev/js/src/jit/arm64/vixl/Assembler-vixl.h:1638
Process 90592 stopped
* thread #1: tid = 0xb0c0d, 0x0000000100a4d323 js`vixl::Assembler::ImmCondBranch(imm19=313127) + 67 at Assembler-vixl.h:1638, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000100a4d323 js`vixl::Assembler::ImmCondBranch(imm19=313127) + 67 at Assembler-vixl.h:1638
   1635	  }
   1636	
   1637	  static Instr ImmCondBranch(int imm19) {
-> 1638	    VIXL_ASSERT(is_int19(imm19));
   1639	    return truncate_to_int19(imm19) << ImmCondBranch_offset;
   1640	  }
   1641	
(lldb) p/x imm19
(int) $0 = 0x0004c727
(lldb) bt
* thread #1: tid = 0xb0c0d, 0x0000000100a4d323 js`vixl::Assembler::ImmCondBranch(imm19=313127) + 67 at Assembler-vixl.h:1638, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000100a4d323 js`vixl::Assembler::ImmCondBranch(imm19=313127) + 67 at Assembler-vixl.h:1638
    frame #1: 0x0000000100a43ea4 js`vixl::Assembler::b(at=0x0000000105ecace8, imm19=313127, cond=eq) + 36 at MozAssembler-vixl.cpp:118
    frame #2: 0x0000000100a454b7 js`vixl::MozBaseAssembler::RetargetNearBranch(i=0x0000000105ecace8, byteOffset=1252508, final=false) + 247 at MozAssembler-vixl.cpp:497
    frame #3: 0x0000000100a021a5 js`js::jit::AssemblerBufferWithConstantPools<1024ul, 4ul, vixl::Instruction, vixl::MozBaseAssembler>::patchBranch(this=0x00007fff5fbf8e30, i=0x0000000105ecace8, curpool=842, branch=(offset = 1828)) + 405 at IonAssemblerBufferWithConstantPools.h:891
    frame #4: 0x00000001009cc268 js`js::jit::AssemblerBufferWithConstantPools<1024ul, 4ul, vixl::Instruction, vixl::MozBaseAssembler>::executableCopy(this=0x00007fff5fbf8e30, dest_="\x9f#") + 472 at IonAssemblerBufferWithConstantPools.h:914
    frame #5: 0x00000001009b5341 js`js::jit::Assembler::executableCopy(this=0x00007fff5fbf8d68, buffer="\x9f#") + 49 at Assembler-arm64.cpp:143
    frame #6: 0x000000010078ca28 js`js::jit::JitCode::copyFrom(this=0x0000000105ca3520, masm=0x00007fff5fbf8d68) + 72 at Ion.cpp:793
    frame #7: 0x000000010080b4cd js`js::jit::JitCode* js::jit::Linker::newCode<(js::AllowGC)1>(this=0x00007fff5fbf8b38, cx=0x0000000103745800, kind=BASELINE_CODE) + 637 at Linker.h:72

In the patchBranch() frame, the branch offset crosses the permitted +/- 1 MB boundary for an arm64 conditional branch:

(lldb) p/x Assembler::GetBranchOffset(ci)
(ptrdiff_t) $7 = 0x00000000000d0f28
(lldb) p/x offset
(ptrdiff_t) $8 = 0x0000000000131c9c

The difference is the accumulated size of the constant pools inserted between the branch and its target: the raw offset 0xd0f28 (855,848 bytes) is still within the +/- 1 MB (0x100000) range of a conditional branch, but the pool-adjusted offset 0x131c9c (1,252,508 bytes) is not.
See the discussion in bug 1207827. Constant pools and branches with limited range should be dealt with together.
Alright, here's the approach I am going with for handling limited-range branches in ARM64:

- Assembling a branch to a bound label is the easy case: If the label is in range, branch directly to the label, otherwise generate two instructions:

  b.cond !cond, skip
  b label
skip:

This is already implemented in MacroAssembler::B(Label* label, Condition cond).

- Assembling a branch to an unbound label is trickier: we don't know whether the label will be in range of the branch by the time it is bound. Optimistically assume that it will be, and emit a short-range branch. Also register the branch's deadline with the AssemblerBuffer, which will detect when the branch is about to go out of range.

Before the branch goes out of range, AssemblerBuffer will emit a branch veneer (preferably along with a constant pool). The branch veneer is an unconditional branch targeting the still unbound label:

original_branch:
  b.cond veneer

  ... lots of data

veneer:
  b label

The linked list of branches associated with the unbound label will contain both the original short-range branch and the veneer branch. Assembler::bind() can tell them apart because it won't be able to patch the short-range branch.
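Putting the two cases together, the branch-emission logic can be sketched as follows. This is a minimal sketch, assuming illustrative helper names; registerBranchDeadline() is the real buffer interface introduced in the patches below:

    // Sketch: conditional branch to a label, covering bound and unbound cases.
    void MacroAssembler::B(Label* label, Condition cond) {
      if (label->bound() && CondBranchCanReach(label)) {  // illustrative check
        // Bound and in range: the imm19 field (+/- 1 MB) reaches directly.
        b(label, cond);
      } else if (label->bound()) {
        // Bound but too far: invert the condition and hop over an
        // unconditional branch, which has a +/- 128 MB range.
        Label skip;
        b(&skip, InvertCondition(cond));  // b.cond !cond, skip
        b(label);                         // b label
        bind(&skip);                      // skip:
      } else {
        // Unbound: optimistically emit a short-range branch, linked into
        // the label's use list, and register the last buffer offset the
        // branch immediate could still reach as its deadline.
        BufferOffset branch = b(label, cond);
        BufferOffset deadline(branch.getOffset() + CondBranchForwardRange);
        armbuffer_.registerBranchDeadline(CondBranchRangeType, deadline);
      }
    }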
Assignee: nobody → jolesen
Also minor fixes to the AssemblerBuffer class:

- Tighten encapsulation / data hiding.
- Use consistent types size_t + void* for raw byte data.
This is the data structure that will be used to keep track of
unresolved forward short-range branches.
AssemblerBufferWithConstantPools geta a branchDeadlines_ member which
keeps track of forward branch to unbound labels.

Add a hasSpaceForInsts() method which collects the logic for checking
for available space in one place. Insert a constant pool both when
constant pool loads are about to go out of range, and when short-range
branch deadlines are about to expire.

Add registerBranchDeadline() and unregisterBranchDeadline() methods
that the assembler will use to add and remove branches to be tracked.
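A minimal sketch of how these pieces fit together, assuming helper names that may differ from the patch:

    // Sketch: can numInsts instructions plus numPoolEntries constant pool
    // entries be appended without flushing the current pool?
    bool hasSpaceForInsts(unsigned numInsts, unsigned numPoolEntries) const
    {
        size_t nextOffset = sizeExcludingCurrentPool() + numInsts * InstSize;
        // 1. Constant pool loads must still be able to reach the pool if
        //    it were dumped right after the new instructions.
        if (numPoolEntries && !poolLoadsInRange(nextOffset, numPoolEntries))
            return false;
        // 2. No registered short-range branch deadline may expire before
        //    there is still room to emit the guard branch and a veneer.
        if (!branchDeadlines_.empty()) {
            size_t deadline = branchDeadlines_.earliestDeadline().getOffset();
            if (nextOffset + guardSize_ * InstSize > deadline)
                return false;
        }
        return true;
    }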
Test the existing functionality of AssemblerBufferWithConstantPools
using a fake ISA that is much more constrained than ARM and ARM64.

Document the Assembler callbacks that are required to use
AssemblerBufferWithConstantPools, and implement mock versions for the
unit test.
This is the second part of the short branch handling in
AssemblerBufferWithConstantPools. The PatchShortRangeBranchToVeneer()
callback is called from finishPool() to patch short-range branches that
are about to expire.

Implement no-op versions of the callback for ARM and ARM64. These
versions will never be called as long as no short-range branches are
registered. They only exist to prevent linker errors in unoptimized
builds. In an optimized build, the unused function calls will be
optimized out because DeadlineSet<0>::empty() is hardwired to return
true.
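For reference, the ARM no-op version amounts to something like this sketch (the exact signature is whatever the buffer template expects):

    // Never called: ARM registers no short-range branches, and because
    // DeadlineSet<0>::empty() is hardwired to return true, optimized
    // builds drop the call entirely. This definition exists only to
    // satisfy the linker in unoptimized builds.
    void
    Assembler::PatchShortRangeBranchToVeneer(ARMBuffer*, unsigned rangeIdx,
                                             BufferOffset deadline,
                                             BufferOffset veneer)
    {
        MOZ_CRASH("Unexpected short-range branch");
    }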
We already have an ARM64 ImmBranchType which classifies the branch
instructions in the ISA. The /range/ classification is required because
we need unique small integers to pass to
AssemblerBufferWithConstantPools::registerBranchDeadline(). The b.cond
and cbz instructions have the same range, but different branch types.

Classify the 32 KB and 1 MB range branches as 'short-range'. Request
that these branch ranges be tracked by the new
AssemblerBufferWithConstantPools::NumShortBranchRanges facility.

Also add two functions for computing the maximum forward and backward
reach of branches, given their range enumerator.
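The resulting classification looks roughly like this (sketch; the comments give the ARM64 encodings, and the exact enumerator spelling may differ from the patch):

    // Sketch: branch ranges ordered so that the short ones come first and
    // double as range indexes for registerBranchDeadline().
    enum ImmBranchRangeType {
      TestBranchRangeType,    // tbz/tbnz: imm14, +/- 32 KB
      CondBranchRangeType,    // b.cond/cbz/cbnz: imm19, +/- 1 MB
      UncondBranchRangeType,  // b/bl: imm26, +/- 128 MB
      UnknownBranchRangeType,

      // Short-range branch types to be tracked by the assembler buffer.
      NumShortBranchRangeTypes = UncondBranchRangeType
    };

    // Maximum forward and (negative) backward reach, in bytes, of a
    // branch with the given range type.
    static ptrdiff_t ImmBranchMaxForwardOffset(ImmBranchRangeType range);
    static ptrdiff_t ImmBranchMinBackwardOffset(ImmBranchRangeType range);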
Instead of storing byte offsets in the branch instructions using a
label, store instruction offsets, just like the finished branches do.
Use a 0 pc offset to terminate the linked list instead of -1.

This increases the maximum distance between linked branches to be the
same as the range of the branch instructions. Previously, the
supported range was only 1/4 of what the branch instructions can
encode.

Provide protected functions for manipulating the linked list in
MozBaseAssembler, and rewrite Assembler::bind() and retarget() to use
them instead of decoding branches manually.

Move the LinkAndGet*OffsetTo functions into MozBaseAssembler. Our
version of these functions is completely different from the VIXL
versions.
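The new list encoding can be sketched as follows; this closely mirrors the helpers quoted in the reviews below, lightly simplified:

    // Sketch: a linked branch's immediate field holds the distance, in
    // instructions, to the next use of the same unbound label. A branch
    // can never target itself, so 0 is free to act as the terminator.
    BufferOffset MozBaseAssembler::NextLink(BufferOffset cur)
    {
      Instruction* link = getInstructionAt(cur);
      ptrdiff_t offset = link->ImmPCRawOffset();  // raw instruction count
      if (offset == 0)                            // end of the use list
        return BufferOffset();
      return BufferOffset(int(cur.getOffset() + offset * kInstructionSize));
    }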
Add a branch range argument to LinkAndGetOffsetTo(): ARM64 branches
can't encode arbitrary ranges, so the linked list of unbound label uses
needs some consideration. We can't assume that a newly assembled branch
instruction will be able to point backwards to label->offset().

Change LinkAndGetOffsetTo() to a normal function instead of a template.
We don't need the code duplication just to apply different scale
factors. Throw the premature microoptimizers a bone by replacing the
element_size template argument with its logarithm.

Implement Assembler::PatchShortRangeBranchToVeneer() to insert the
veneer branch after the original short-range branch in the linked list
of uses of the unbound label.

Fix Assembler::bind() to understand that not all branches can reach the
label. Verify that these branches jump to a veneer instead.

Register short-range branches in LinkAndGetOffsetTo(), and unregister
them again in Assembler::bind().
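The splice itself boils down to a fragment like this (sketch; ImmPCRawOffset() and EncodeOffset() match names quoted in the reviews below, the control flow is paraphrased):

    // Sketch: the pool machinery has reserved room for one unconditional
    // branch at 'veneer'. Hand the short branch's old next-link to the
    // veneer, then make the veneer the short branch's next link.
    Instruction* branchInst = getInstructionAt(branch);
    Instruction* veneerInst = getInstructionAt(veneer);
    ptrdiff_t next = branchInst->ImmPCRawOffset();  // 0 = end of list
    if (next == 0) {
      Assembler::b(veneerInst, 0);  // veneer becomes the end of the list
    } else {
      // Rebase the link so it still names the same instruction.
      ptrdiff_t delta = (branch.getOffset() - veneer.getOffset()) / kInstructionSize;
      Assembler::b(veneerInst, next + delta);
    }
    // Finally, point the short-range branch at the in-range veneer.
    branchInst->SetImmPCRawOffset(EncodeOffset(branch, veneer));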
When handed a call that had been disabled by ToggleCall(), this
function would crash.
Attachment #8685645 - Flags: review?(sphink)
Attachment #8685646 - Flags: review?(sstangl)
Attachment #8685647 - Flags: review?(nicolas.b.pierron)
Attachment #8685648 - Flags: review?(nicolas.b.pierron)
Attachment #8685649 - Flags: review?(nicolas.b.pierron)
Attachment #8685650 - Flags: review?(nicolas.b.pierron)
Attachment #8685651 - Flags: review?(sstangl)
Attachment #8685652 - Flags: review?(sstangl)
Attachment #8685653 - Flags: review?(sstangl)
Attachment #8685654 - Flags: review?(sstangl)
Attachment #8685645 - Flags: review?(sphink) → review+
The js::jit::AssemblerBuffer class has incompatible definitions in
the js/src/jit/shared and js/src/jit/x86-shared directories.
Attachment #8686175 - Flags: review?(sphink)
Comment on attachment 8685647 [details] [diff] [review]
Implement BranchDeadlineSet. r=nbp

Review of attachment 8685647 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/IonAssemblerBuffer.h
@@ +58,5 @@
>      }
> +
> +    // Comparator for use with std::sort etc.
> +    struct Less {
> +        bool operator()(BufferOffset a, BufferOffset b) const {

Any reason not to overload operator<(const BufferOffset& b)?

::: js/src/jit/shared/IonAssemblerBufferWithConstantPools.h
@@ +105,5 @@
> +    // The offsets in each vector are always kept in ascending order.
> +    //
> +    // This means that as forward branches are added to the assembler buffer,
> +    // their deadlines will always be appended to the vector corresponding to
> +    // their range. If long-range and short-range branches shared the same

Start the previous sentence with "Assuming that we have different vectors for different ranges, this means …"
And for the moment remove the hypothetical case of having multiple ranges.

@@ +145,5 @@
> +        earliest_ = BufferOffset();
> +        for (unsigned r = 0; r < NumRanges; r++) {
> +            auto& vec = vectorForRange(r);
> +            if (!vec.empty() &&
> +                (!earliest_.assigned() || vec[0].getOffset() < earliest_.getOffset())) {

style-nit: as the condition spans multiple lines, move the opening brace to a new line.
Attachment #8685647 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8685648 [details] [diff] [review]
Wire up branchDeadlines_ partially. No Asm callbacks yet. r=nbp

Review of attachment 8685648 [details] [diff] [review]:
-----------------------------------------------------------------

(re-ask for review when answered)

::: js/src/jit/shared/IonAssemblerBufferWithConstantPools.h
@@ +406,5 @@
>  };
>  
>  
>  // The InstSize is the sizeof(Inst) but is needed here because the buffer is
>  // defined before the Instruction.

nit: Merge this comment with the one below.

@@ +631,5 @@
>      static const unsigned NO_DATA = unsigned(-2);
>  
> +    // Check if it is possible to add numInst instructions and numPoolEntries
> +    // constant pool entries without needing to flush the current pool.
> +    bool hasSpaceForInsts(unsigned numInsts, unsigned numPoolEntries) const

I have a question about some unlikely case:

Let's say I have an assembler which needs 2 instructions to make any jump, and that instructions have size 1.
I have 4 jumps recorded in the list of deadlines, under 2 different range indexes:

  (0, BufferOffset(20)) (0, BufferOffset(22))
  (1, BufferOffset(21)) (1, BufferOffset(23)) 

As I understand it, the following will only trigger if the pool end goes beyond the first deadline.

So my question is, how do we ensure that we have enough instruction space for such cases?  Or is this example not realistic, so we should assert against it?

Similarly, we could end up in a loop of veneer forward branches if the size needed for encoding a jump is as large as the space between 2 deadlines.

@@ +773,5 @@
> +    // rangeIdx
> +    //   A number < NumShortBranchRanges identifying the range of the branch.
> +    //
> +    // deadline
> +    //   The highest buffer offset the the short-range branch can reach

typo-nit: s/the the/the/
Attachment #8685648 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8685649 [details] [diff] [review]
Implement constant pool test. r=nbp

Review of attachment 8685649 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

::: js/src/jsapi-tests/testAssemblerBuffer.cpp
@@ +319,5 @@
> +    //  12: pool header
> +    //  16: poolData
> +    //  20: 2
> +    //
> +    ab.putInt(1);

It was not clear to me at first that putInt(1) is used to write instructions.
Maybe it would make sense to use 0x22220000 as an instruction prefix for any random instruction.
Attachment #8685649 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8685650 [details] [diff] [review]
Add PatchShortRangeBranchToVeneer(). r=nbp

Review of attachment 8685650 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsapi-tests/testAssemblerBuffer.cpp
@@ +459,5 @@
> +    CHECK_EQUAL(*ab.getInst(br1), 0xb1bb00cc);
> +    CHECK_EQUAL(*ab.getInst(br2), 0xb1bb0d2d);
> +
> +    // Cancel one of the pending branches.
> +    ab.unregisterBranchDeadline(1, BufferOffset(off.getOffset() + TestAssembler::BranchRange));

I am not sure I understand this use case yet; do you have an example to clarify it?

@@ +488,5 @@
> +    CHECK_EQUAL(*ab.getInst(br2), 0xb3bb0000 + 20);         // br2 pc+20 (patched)
> +    CHECK_EQUAL(*ab.getInst(BufferOffset(28)), 0xb0bb0010); // br pc+16 (guard)
> +    CHECK_EQUAL(*ab.getInst(BufferOffset(32)), 0xffff0000); // pool header 0 bytes (not counting veneers.)
> +    CHECK_EQUAL(*ab.getInst(BufferOffset(36)), 0xb2bb00cc); // veneer1 w/ original 'cc' offset.
> +    CHECK_EQUAL(*ab.getInst(BufferOffset(40)), 0xb2bb0d2d); // veneer2 w/ original 'cc' offset.

comment-nit: veneer 2 w/ original 'd2d' offset.
Attachment #8685650 - Flags: review?(nicolas.b.pierron) → review+
(In reply to Nicolas B. Pierron [:nbp] from comment #17)
> Comment on attachment 8685648 [details] [diff] [review]
> I have a question about some unlikely case:
> 
> Let's say I have an assembler which needs 2 instructions to make any
> jump, and that instructions have size 1.
> I have 4 jumps recorded in the list of deadlines, under 2 different
> range indexes:
> 
>   (0, BufferOffset(20)) (0, BufferOffset(22))
>   (1, BufferOffset(21)) (1, BufferOffset(23)) 
> 
> As I understand it, the following will only trigger if the pool end
> goes beyond the first deadline.
> 
> So my question is, how do we ensure that we have enough instruction
> space for such cases?  Or is this example not realistic, so we should
> assert against it?
> 
> Similarly, we could end up in a loop of veneer forward branches if the
> size needed for encoding a jump is as large as the space between 2
> deadlines.

That's a good point. It is unlikely, but I don't think we can ignore it. ARM64 has two different branch ranges: the tbz/tbnz instructions can branch 32 KB ahead while the cbz/cbnz/b.cond conditional branches can branch 1 MB ahead. It would be possible to have a tbz and a cbz branch that happen to have the same deadline. In that case, we would only reserve space for one veneer, and we would not be able to add a veneer for the second branch.

This is a conservative fix: In hasSpaceForInsts(), make sure that there is room for NumShortBranchRanges veneers after the end of the pool, and that the last of the veneers is placed before the earliest deadline. Something like:

            if (deadline + guardSize_ < poolEnd + NumShortBranchRanges * guardSize_)
                return false;
 
This is assuming that guardSize_ is the size of an unconditional branch.

The problem I describe above is not exactly what you asked about. It fixes this case:

  (0, BufferOffset(20)) (1, BufferOffset(20))

I had initially thought that this was good enough because there is a limited density to the branches. The next deadlines in each range must be a whole instruction size after the earliest deadline. But you are right, that is not good enough. In the worst case, the deadlines can be packed so densely that we have NumShortBranchRanges new deadlines expiring for every one veneer inserted.

Assuming 2-byte branches as in your example, this worst case scenario is possible:

  (0, BufferOffset(20)) (1, BufferOffset(20))
  (0, BufferOffset(22)) (1, BufferOffset(22))
  (0, BufferOffset(24)) (1, BufferOffset(24))

The six veneers for these branches would have to be inserted no later than at offset 14:

14: veneer for (0,20)
16: veneer for (1,20)
18: veneer for (0,22)
20: veneer for (1,22)
22: veneer for (0,24)
24: veneer for (1,24)

This is an annoying problem because a situation like this is extremely unlikely to occur. I do think that it should be addressed, though. I don't want to build a JIT compiler that is just extremely likely to work.

It would be possible to compute the necessary margin by looking at the front of the deadline queues. Suppose the deadlines for all ranges were merge-sorted into one array d[]. Then we want the smallest i > 0 such that:

  d[i] - d[0] >= i * branchSize, or i = len(d)

The earliest deadline should be adjusted by (i-1)*branchSize to accommodate the 'decompression' of the dense branches.

We will find that i=1 99.9% of the time. With that assumption, it should be possible to compute i efficiently.
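A direct transcription of that search, assuming the merged array d[] is already sorted:

    // Sketch: by how many bytes must the earliest deadline be moved up to
    // leave room for veneers, when each veneer occupies branchSize bytes?
    static ptrdiff_t
    VeneerMargin(const ptrdiff_t* d, size_t len, ptrdiff_t branchSize)
    {
        // Find the smallest i > 0 such that d[i] - d[0] >= i * branchSize,
        // or i == len. Nearly always exits on the first iteration.
        size_t i = 1;
        while (i < len && d[i] - d[0] < ptrdiff_t(i) * branchSize)
            i++;
        // Adjust the earliest deadline by (i - 1) * branchSize.
        return ptrdiff_t(i - 1) * branchSize;
    }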
(In reply to Nicolas B. Pierron [:nbp] from comment #19)
> Comment on attachment 8685650 [details] [diff] [review]
> Add PatchShortRangeBranchToVeneer(). r=nbp
> 
> Review of attachment 8685650 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jsapi-tests/testAssemblerBuffer.cpp
> @@ +459,5 @@
> > +    CHECK_EQUAL(*ab.getInst(br1), 0xb1bb00cc);
> > +    CHECK_EQUAL(*ab.getInst(br2), 0xb1bb0d2d);
> > +
> > +    // Cancel one of the pending branches.
> > +    ab.unregisterBranchDeadline(1, BufferOffset(off.getOffset() + TestAssembler::BranchRange));
> 
> I am not sure I understand this use case yet; do you have an example to
> clarify it?

When you bind a label before the branches expire (the common case), Assembler::bind() is going to call unregisterBranchDeadline() because the branch won't need a veneer after all.

I'll clarify this in a comment in the test case.
Comment on attachment 8686175 [details] [diff] [review]
Remove testAssemblerBuffer from unified build.

Review of attachment 8686175 [details] [diff] [review]:
-----------------------------------------------------------------

I'm certainly ok with non-unifying testAssemblerBuffer, but I'd like to better understand the problem to be sure there isn't an additional fix needed to identify a similar problem more reliably.

Why are there two different definitions of jit::AssemblerBuffer? Or, more specifically, which one is testAssemblerBuffer using, and would we ever have anything else forcibly using that same one?

I don't know which patch testAssemblerBuffer is in, but is it testing an AssemblerBuffer that would never get used by the spidermonkey compiled along with this, in some configurations? Would it be better to skip the test in those cases?

Sorry, I don't want to dig up the relevant code, but I'd like to understand what the real issue is here. (That said, I'm fine with landing this as-is for now.)
Attachment #8686175 - Flags: review?(sphink) → review+
(In reply to Steve Fink [:sfink, :s:] from comment #22)
> Comment on attachment 8686175 [details] [diff] [review]
> Remove testAssemblerBuffer from unified build.
> 
> Review of attachment 8686175 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'm certainly ok with non-unifying testAssemblerBuffer, but I'd like to
> better understand the problem to be sure there isn't an additional fix
> needed to identify a similar problem more reliably.
> 
> Why are there two different definitions of jit::AssemblerBuffer? Or, more
> specifically, which one is testAssemblerBuffer using, and would we ever have
> anything else forcibly using that same one?
> 
> I don't know which patch testAssemblerBuffer is in, but is it testing an
> AssemblerBuffer that would never get used by the spidermonkey compiled along
> with this, in some configurations? Would it be better to skip the test in
> those cases?

Thanks, Steve

One version of jit::AssemblerBuffer is defined in jit/x86-shared/AssemblerBuffer-x86-shared.h
It is used when JS_CODEGEN_X86 or JS_CODEGEN_X64 is defined.

The other version is in jit/shared/IonAssemblerBuffer.h.
This one is used for JS_CODEGEN_ARM, JS_CODEGEN_ARM64, JS_CODEGEN_MIPS32, and JS_CODEGEN_MIPS64.

The tests in jsapi-tests/testAssemblerBuffer.cpp cover the second version. Since this version is supposed to be target-independent, the test exercises the code regardless of which JS_CODEGEN_* is set.

One AssemblerBuffer version is a template class, and the other version is a plain non-template class, so there is no chance of linkage collisions in jsapi-tests. But that is probably more luck than we deserve.

An alternative solution would be to disable testAssemblerBuffer.cpp under JS_CODEGEN_X86/JS_CODEGEN_X64.
Ok, thanks. Given that the jit/shared one is intended to be target-independent, it seems worth testing that it is, which is exactly what you're doing already by enabling testAssemblerBuffer.cpp for all arches. I'm fine with your patch as-is. Having the same name seems a little worrisome, but I'm fine relying on the linker to complain if that's ever an issue.
Updated patch after nbp review:
- Implement relational operator overloads for BufferOffset.
- Add size() and maxRangeSize() methods to BranchDeadlineSet.
- Tweak comment.
Attachment #8685647 - Attachment is obsolete: true
Updated patch after nbp review:
- Tweaked comments.
- Addressed nbp's "unlikely case" in hasSpaceForInsts().

Rather than implementing a lazy merge sort to detect how much headroom is
required for veneers, I realized that a more conservative approach can be much
faster:

- Since branches and veneers have the same size, there is no problem when we
  only have one branch range. Each branch deadline must be at least an
  instruction size after its predecessor within the same range class.

- In an architecture with multiple branch range classes, the same applies to the
  range class that happens to have the most deadlines registered. We would be
  able to emit all veneers for this range, but deadlines for the remaining ranges
  may cause trouble. Therefore, reserve space for all veneers that don't belong
  to the range with the most deadlines.

For ARM64, there will be two ranges: 1 MB for normal conditional branches and
cbz/cbnz, and 32 KB for tbz/tbnz. Most of the time, there will be more pending
1 MB branches, so we simply reserve space for all the tbz/tbnz branches. This is
unlikely to be more than a handful, so we won't lose much.
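With the size() and maxRangeSize() methods added above, the reservation in hasSpaceForInsts() comes out to something like this fragment (sketch; variable names are illustrative):

    // Sketch: deadlines in the most-populated range class are already at
    // least one instruction apart, so only the remaining deadlines need
    // extra headroom reserved for their veneers.
    size_t minority = branchDeadlines_.size() - branchDeadlines_.maxRangeSize();
    size_t deadline = branchDeadlines_.earliestDeadline().getOffset();
    // nextOffset is the buffer offset after the proposed instructions.
    if (nextOffset + (guardSize_ + minority) * InstSize > deadline)
        return false;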
Attachment #8687492 - Flags: review?(nicolas.b.pierron)
Attachment #8685648 - Attachment is obsolete: true
Updated patch:
- Use 0x2222xxxx as fake arithmetic instruction opcodes.
Attachment #8685649 - Attachment is obsolete: true
Updated patch after nbp review:
- Clarify use of unregisterBranchDeadline().
Attachment #8685650 - Attachment is obsolete: true
Attachment #8685646 - Flags: review?(sstangl) → review+
Comment on attachment 8685651 [details] [diff] [review]
Add enum ImmBranchRangeType. r=sstangl

Review of attachment 8685651 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm64/vixl/MozBaseAssembler-vixl.h
@@ +40,5 @@
>  
>  
>  class MozBaseAssembler;
> +typedef js::jit::AssemblerBufferWithConstantPools<1024, 4, Instruction, MozBaseAssembler,
> +                                                  NumShortBranchRangeTypes> ARMBuffer;

This patch probably won't apply by itself, since it's missing the relevant modifications to AssemblerBufferWithConstantPools.
Attachment #8685651 - Flags: review?(sstangl) → review+
Comment on attachment 8685652 [details] [diff] [review]
Change representation of unbound Label linked lists. r=sstangl

Review of attachment 8685652 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm64/Assembler-arm64.cpp
@@ +229,2 @@
>  
> +    while (branchOffset.assigned()) {

Oh this is so much nicer.

::: js/src/jit/arm64/vixl/MozAssembler-vixl.cpp
@@ +67,5 @@
> +    Instruction* link = getInstructionAt(cur);
> +    // Raw encoded offset.
> +    ptrdiff_t offset = link->ImmPCRawOffset();
> +    // End of the list is encoded as 0.
> +    if (offset == 0)

kEndOfLabelUseList?

@@ +71,5 @@
> +    if (offset == 0)
> +        return BufferOffset();
> +    // The encoded offset is the number of instructions to move.
> +    offset *= kInstructionSize;
> +    offset += cur.getOffset();

Prefer |ptrdiff_t nextOffset = cur.getOffset() + offset * kInstructionSize|.

@@ +108,2 @@
>    if (armbuffer_.oom())
> +    return 0;

kEndOfLabelUseList

@@ +118,5 @@
>    if (!label->used()) {
>      // The label is unbound and unused: store the offset in the label itself
>      // for patching by bind().
>      label->use(branch.getOffset());
> +    return 0;

kEndOfLabelUseList

@@ +128,2 @@
>    label->use(branch.getOffset());
> +  MOZ_ASSERT(offset != 0);

kEndOfLabelUseList. This code is much nicer!

@@ +143,5 @@
>  }
>  
> +ptrdiff_t
> +MozBaseAssembler::LinkAndGetPageOffsetTo(BufferOffset branch, Label* label)
> +{

There are a bunch of such changes to this file. They should all be changed back to only one line, since this file follows VIXL style, and non-comment lines are given 100 columns in both SM and VIXL styles.
Attachment #8685652 - Flags: review?(sstangl) → review+
Comment on attachment 8685653 [details] [diff] [review]
Dynamically track short-range branches. r=sstangl

Review of attachment 8685653 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm64/vixl/MozAssembler-vixl.cpp
@@ +111,3 @@
>  ptrdiff_t
> +MozBaseAssembler::LinkAndGetOffsetTo(BufferOffset branch, ImmBranchRangeType branchRange,
> +                                     unsigned elementBits, Label* label)

The name elementBits is misleading. Maybe "elementShift" would be clearer?

@@ +122,5 @@
>      return label_offset - branch_offset;
>    }
>  
> +  // Keep track of short-range branches targeting unbound labels. We may need
> +  // to insert veneers in PatchShortRangeBranchToVeneer() below.

PatchShortRangeBranchToVeneer doesn't actually insert a veneer -- it just gets passed the veneer's address. It's not clear from this patch what does actually create the veneer, but I assume that's coming up.

@@ +137,2 @@
>      label->use(branch.getOffset());
>      return 0;

kEndOfLabelUseList? Maybe?

@@ +145,5 @@
> +
> +  // What is the earliest buffer offset that would be reachable by the branch
> +  // we're about to add?
> +  ptrdiff_t earliestReachable =
> +    branch.getOffset() + Instruction::ImmBranchMinBackwardOffset(branchRange);

This looks like it might fit in the 100 column limit.

@@ +152,5 @@
> +  // new branch, we can simply insert the new branch at the front of the list.
> +  if (label->offset() >= earliestReachable) {
> +      ptrdiff_t offset = EncodeOffset(branch, BufferOffset(label));
> +      label->use(branch.getOffset());
> +      MOZ_ASSERT(offset != 0);

kEndOfLabelUseList

@@ +185,5 @@
> +  } while (next.assigned());
> +  SetNextLink(exbr, branch);
> +
> +  // This branch becomes the new end of the list.
> +  return 0;

kEndOfLabelUseList

@@ +491,5 @@
> +  // Verify that the branch range matches what's encoded.
> +  MOZ_ASSERT(Instruction::ImmBranchTypeToRange(branchInst->BranchType()) == branchRange);
> +
> +  // We want to insert veneer after branch in the linked list of instructions
> +  // that use the same unbound label.

This code doesn't actually do any inserting... the veneer space is already allocated and passed in via BufferOffset.

@@ +493,5 @@
> +
> +  // We want to insert veneer after branch in the linked list of instructions
> +  // that use the same unbound label.
> +  // The veneer should be an unconditional branch.
> +  ptrdiff_t offset = branchInst->ImmPCRawOffset();

instructionOffset?

@@ +496,5 @@
> +  // The veneer should be an unconditional branch.
> +  ptrdiff_t offset = branchInst->ImmPCRawOffset();
> +
> +  // If offset is 0, this is the end of the linked list.
> +  if (offset == 0) {

kEndOfLabelUseList

@@ +502,5 @@
> +  } else {
> +      // Make the offset relative to veneer so it targets the same instruction
> +      // as branchInst.
> +      offset *= kInstructionSize;
> +      offset += branch.getOffset() - veneer.getOffset();

ptrdiff_t byteOffset = (instructionOffset * kInstructionSize) + branch.getOffset() - veneer.getOffset();

@@ +507,5 @@
> +      Assembler::b(veneerInst, offset / kInstructionSize);
> +  }
> +
> +  // Now point branchInst at veneer. See also SetNextLink() above.
> +  branchInst->SetImmPCRawOffset(EncodeOffset(branch, veneer));

Does anything assert that veneer is actually in range of branchInst?
Attachment #8685653 - Flags: review?(sstangl) → review+
Comment on attachment 8685654 [details] [diff] [review]
Handle toggled calls in CodeFromJump(). r=sstangl

Review of attachment 8685654 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm64/Assembler-arm64.cpp
@@ +486,5 @@
>  
>  static JitCode*
>  CodeFromJump(JitCode* code, uint8_t* jump)
>  {
>      Instruction* branch = (Instruction*)jump;

Calling this branch is a bad idea, since it's not actually a branch. I haven't thought of any better names, though. Maybe just "inst".

@@ +506,3 @@
>          target = (uint8_t*)branch->ImmPCOffsetTarget();
> +    } else if (branch->IsLDR()) {
> +        // This is an ldr+blr call that is enabled.

MOZ_ASSERT that the nextInst()->IsBLR()?

@@ +507,5 @@
> +    } else if (branch->IsLDR()) {
> +        // This is an ldr+blr call that is enabled.
> +        target = (uint8_t*)branch->Literal64();
> +    } else if (branch->IsADR()) {
> +        // This is a disabled call: adr+nop. See ToggleCall().

MOZ_ASSERT that the nextInst()->IsNOP()?
Attachment #8685654 - Flags: review?(sstangl) → review+
(In reply to Sean Stangl [:sstangl] from comment #31)
> > +  // Now point branchInst at veneer. See also SetNextLink() above.
> > +  branchInst->SetImmPCRawOffset(EncodeOffset(branch, veneer));
> 
> Does anything assert that veneer is actually in range of branchInst?

Yes, SetImmPCRawOffset() eventually VIXL_ASSERT's that the offset can be encoded.

For security reasons, we may want to change those to release asserts, though.
Comment on attachment 8687492 [details] [diff] [review]
Wire up branchDeadlines_ partially. No Asm callbacks yet.

Review of attachment 8687492 [details] [diff] [review]:
-----------------------------------------------------------------

patch comment: s/geta a/get a/

::: js/src/jit/shared/IonAssemblerBufferWithConstantPools.h
@@ +652,5 @@
>      static const unsigned NO_DATA = unsigned(-2);
>  
> +    // Check if it is possible to add numInst instructions and numPoolEntries
> +    // constant pool entries without needing to flush the current pool.
> +    bool hasSpaceForInsts(unsigned numInsts, unsigned numPoolEntries) const

Do we have a way to assert the correctness of these arguments, as valid upper bounds of the generated code?
Attachment #8687492 - Flags: review?(nicolas.b.pierron) → review+
(In reply to Nicolas B. Pierron [:nbp] from comment #34)
> > +    // Check if it is possible to add numInst instructions and numPoolEntries
> > +    // constant pool entries without needing to flush the current pool.
> > +    bool hasSpaceForInsts(unsigned numInsts, unsigned numPoolEntries) const
> 
> Do we have a way to assert the correctness of these arguments, as valid
> upper bounds of the generated code?

Yes.

This function is normally called with (1,0) or (1,1) arguments before inserting a single instruction. It is called from enterNoPool() with the user-provided maxInst argument. There is a dedicated mechanism for verifying that the user doesn't attempt to insert more no-pool instructions than promised.

In general, the instruction encoding functions assert that the encoded offsets on constant pool loads and branches are in range.