Closed Bug 733336 Opened 12 years ago Closed 12 years ago

IonMonkey: Theoretical PIC bug on ARM

Categories

(Core :: JavaScript Engine, defect)

ARM
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: mjrosenb, Unassigned)

Details

Attachments

(1 file)

for PICs, we need to generate control flow altering instructions from the main codepath off to the actual ICs.  Currently the instructions used are all local branches, meaning that if we generate more than 32MB of data, the code to generate PICs will suddenly start throwing assertions.
Theoretically, we can replace all of these with moves into IP followed by a branch-to-register, however that will likely be slow for a PIC.  Similarly, we can use an ldr pc, [pc, #foo] to move execution to wherever we want to.
My idea is to use both ldr pc, [pc, #foo] and b bar.

The initial state will be
// machine code
branch:
ldr pc, address;

address:
tasty_tasty_data

void *branchLoc = &branch;

so we keep track of all branches in the routine, writing down the address where control flow is altered.  If we ever need to find out where control flow is altered to, we can pull the final address out of the ldr, then get to what i've labeled address.

When the target of the branch needs to be changed, and it can now be represented with a branch instruction, we modify the world to look like this:

// machine code
branch:
b dest;

address:
&branch;

void *branchLoc = &address|2;

namely, we re-point our label to the pool slot where the dest used to be stored, change that value to be a pointer to the new branch instruction.  From this label, we can get both the pool location (in case we need to go back to the old format later), and the actual address of the branch.

This should be compatible with THUMB code, namely, by looking at the bottom 2 bits, we can tell what state we're in:
00: should be pointing to an ldr instruction
01: pointing to a thumb ldr
10: pointing to the 2 after pool entry, which points to the branch, mask off the bottom bits to get the correct value
11: pointing to a thumb ldr with a different alignment.
There are a couple of cleanup fixes and a correctness fix I found while testing this patch included, I can pull that into a separate patch/bug if anyone wants it.
Attachment #626464 - Flags: review?(dvander)
Attachment #626464 - Flags: review?(Jacob.Bramley)
Comment on attachment 626464 [details] [diff] [review]
fix IC patching on ARM

Review of attachment 626464 [details] [diff] [review]:
-----------------------------------------------------------------

I didn't spot any major problems, but I made several minor comments. r+ with those things addressed.

::: js/src/ion/CodeGenerator.cpp
@@ +2307,5 @@
>  // An out-of-line path to call an inline cache function.
>  class OutOfLineCache : public OutOfLineCodeBase<CodeGenerator>
>  {
>      LInstruction *ins;
> +    RepatchLabel  repatchEntry_;

s/  / /

::: js/src/ion/IonCaches.cpp
@@ +66,5 @@
>          JS_ASSERT((uint64_t)raw_ <= UINT32_MAX);
>  #endif
>          new_off = masm->actualOffset((uintptr_t)raw_);
> +#ifdef JS_SMALL_BRANCH
> +        jumpTableEntry_ = reinterpret_cast<uint8 *>(masm->actualIndex(reinterpret_cast<size_t>(jumpTableEntry_)));

Why not overload actualIndex to take (and return) a 'uint8*'? Alternatively (and arguably even better), create a local jumpTableEntryOffset before this conditional block, then pass that into jumpTableEntry below:

size_t jumpTableEntryOffset = reinterpret_cast<size_t>(jumpTableEntry_);
if (masm != NULL) {
  jumpTableEntryOffset = masm->actualIndex(jumpTableEntryOffset);
}
jumpTableEntry_ = Assembler::PatchableJumpAddress(code, jumpTableEntryOffset);

@@ +149,5 @@
>      CodeOffsetJump rejoinOffset;
>  
>      void generate(JSContext *cx, MacroAssembler &masm, JSObject *obj, JSObject *holder,
>                    const Shape *shape, Register object, TypedOrValueRegister output,
> +                  RepatchLabel *failures, Label *nonPatchFailures = NULL)

'nonRepatchFailures' would be more consistent.

@@ +154,5 @@
>      {
>          // If there's a single jump to |failures|, we can patch the shape guard
>          // jump directly. Otherwise, jump to the end of the stub, so there's a
>          // common point to patch.
> +        bool multipleFailureJumps = nonPatchFailures != NULL && nonPatchFailures->used();

I think some parentheses would be nice here.

@@ -201,2 @@
>          rejoinOffset = masm.jumpWithPatch(&rejoin_);
> -        masm.bind(&rejoin_);

Why don't you bind this any more? Is it valid to leave the jump pointing to an unbound label?

@@ +240,5 @@
>      Linker linker(masm);
>      IonCode *code = linker.newCode(cx);
>      if (!code)
>          return false;
> +    getprop.rejoinOffset.fixup(&masm);

Other instances put a blank line before these fixups.

@@ +375,2 @@
>      CodeOffsetJump rejoinOffset = masm.jumpWithPatch(&rejoin_);
>      masm.bind(&rejoin_);

You still bind the rejoin label here.

::: js/src/ion/arm/Assembler-arm.cpp
@@ -383,5 @@
>      // or if we need a larger branch (or just need to use our pool entry)
>      Instruction *jump = (Instruction*)jump_.raw();
>      Assembler::Condition c;
>      jump->extractCond(&c);
> -    JS_ASSERT(jump->is<InstBranchImm>());

You should still assert that the instruction is as expected. (Assertions like these were very useful for catching code offset errors in JaegerMonkey.) In some cases it might need to be extended to allow "ldr pc, ..." as well as branches. I think JaegerMonkey has a suitable utility function which you could move into IonMonkey.

In this case, the comments state that the jump will always be a branch to start with, so the assertion should be fine as it is.

@@ +404,5 @@
>      for (size_t i = 0; i < jumps_.length(); i++) {
>          jumps_[i].fixOffset(m_buffer);
>      }
>  
> +    for (unsigned int i = 0; i < tmpDataRelocations_.length(); i++) {

I have no objection to making this unsigned, but why only change this loop, and not the others?

@@ +676,5 @@
>  {
>      for (size_t i = 0; i < codeLabels_.length(); i++) {
>          CodeLabel *label = codeLabels_[i];
> +        //Bind(code, label->dest(), code->raw() + label->src()->offset());
> +        JS_NOT_REACHED("dead code?");

Add a TODO or some other flag, so this doesn't get forgotten. (Even better, file another bug to track it.)

Actually, can you just delete the whole function? The comment at the top suggests that it isn't referenced anywhere, and if that's true then deleting it is probably the right thing to do.

@@ +1386,5 @@
>      }
> +
> +    bool isValidPoolHint() {
> +        // If this has already been patched with a valid instruction, the ONES field
> +        // will not be 0xf, since that is not a valid condition code.

Technically this is true. However, the bit pattern is used by some special instructions, most of which are coprocessor and NEON instructions, but BLX <immediate> also shares this bit pattern. In these instructions, those bits are not considered a condition code. (Well, that depends on exactly how you interpret the wording in the ARM ARM, but these instructions certainly can't be conditional.)

The test is safe if we don't use these instructions, but we should assert that somehow (in case we use them later) and we shouldn't say in a comment that 0xf isn't a valid condition code because it could be confusing.

One option (which I haven't thought through) would be to move the 'cond' bitfield to be adjacent to 'ONES', then check for 0xff, since 0xff****** _is_ always invalid (if I remember correctly). This fails if we ever store 0xf in 'cond'. I'm assuming that we don't.

@@ +1522,5 @@
> +      case PoolHintData::poolBranch:
> +        // Either this used to be a poolBranch, and the label was already bound, so it was
> +        // replaced with a real branch, or this may happen in the future.
> +        // If this is going to happen in the future, then the actual bits that are written here
> +        // don't matter (except the condition code), but if it does not get bound later,

You should explain why the condition code matters, otherwise this could be somewhat confusing. (Just say that condition 0xf is used as a flag.)

@@ +1988,5 @@
> +        // If the label has a use, then change this use to refer to
> +        // the bound label;
> +        BufferOffset branchOff(label->offset());
> +        // Since this was created with a RepatchLabel, the value written in the
> +        // instruction stream is not branch shaped, it is PoolHintData shaped.

Nice and clear, thanks!

@@ +2021,5 @@
>  
>  }
>  
>  
> +#if 0

Just delete it.

@@ +2113,4 @@
>  {
> +    // Retargeting calls is totally unsupported!
> +    if (i->is<InstBranchImm>())
> +        JS_ASSERT(i->is<InstBImm>());

Use this:
JS_ASSERT_IF(i->is<InstBranchImm>(), i->is<InstBImm>());

@@ +2122,5 @@
> +void
> +Assembler::retargetFarBranch(Instruction *i, uint8 **slot, uint8 *dest, Condition cond)
> +{
> +    int32 offset = reinterpret_cast<uint8*>(slot) - reinterpret_cast<uint8*>(i);
> +    if (!i->is<InstLDR>()) {

So, if I understand correctly, we can retarget B to LDR, but why not the other way around (in retargetNearBranch)? Does that just not come up very often?

In JM, it was common to retarget LDR to B but rare to go the other way, since the default branch target was usually out of range but almost always got patched to a short-range branch when the code was finalized.

::: js/src/ion/arm/Assembler-arm.h
@@ +914,5 @@
>        : data ((offset - 8) >> 2 & 0x00ffffff)
>      {
>          JS_ASSERT ((offset & 0x3) == 0);
>          JS_ASSERT ((offset - 8) >= -33554432);
>          JS_ASSERT ((offset - 8) <= 33554428);

Use JS_ASSERT(isInRange(offset)) here.

@@ +917,5 @@
>          JS_ASSERT ((offset - 8) >= -33554432);
>          JS_ASSERT ((offset - 8) <= 33554428);
>      }
> +    static bool isInRange(int offset)
> +    {

It might also be wise to JS_ASSERT((offset & 0x3) == 0);

@@ +920,5 @@
> +    static bool isInRange(int offset)
> +    {
> +        if ((offset - 8) < -33554432)
> +            return false;
> +        if ((offset - 8) >= 33554428)

It can be equal to 33554428, so use '>' rather than '>='. (I know, it will never really come up, but it might!)

::: js/src/ion/arm/CodeGenerator-arm.cpp
@@ +404,5 @@
>                  uint32 shift;
>                  JS_FLOOR_LOG2(shift, constant);
>                  uint32 rest = constant - (1 << shift);
>                  // See if the constant has one bit set, meaning it can be encoded as a bitshift
> +                if (signed(1 << shift) == constant) {

Put the cast in parentheses:

if ((signed)(1 << shift) [...]

I'd argue that 'constant' should be unsigned. I think this code will work anyway, but it relies on some implementation-defined C(++) behaviour and the sign semantics are wrong.

Note that JS_FLOOR_LOG2 casts the constant to uint32_t, so (although correct and well-defined) passing a signed integer is likely to cause trouble.

::: js/src/ion/shared/Assembler-shared.h
@@ +278,5 @@
> +    bool bound() const {
> +        return bound_;
> +    }
> +    void bind(int32 dest) {
> +        JS_ASSERT(!bound_);

You should assert that dest != INVALID_OFFSET, however unlikely it looks.

::: js/src/ion/shared/Assembler-x86-shared.h
@@ +454,5 @@
>          }
>          return j;
>      }
> +
> +    JmpSrc jSrc(Condition cond, RepatchLabel *label) {

jSrc and jmpSrc are already in the repository. Does this hunk actually belong to an older patch? There are several existing refences to jSrc in this patch which have not been changed.

@@ +464,5 @@
> +            label->use(j.offset());
> +        }
> +        return j;
> +    }
> +    JmpSrc jmpSrc(RepatchLabel *label) {

I don't like that jSrc and jmpSrc have different names, but very similar behaviour.

::: js/src/ion/shared/IonAssemblerBufferWithConstantPools.h
@@ +269,5 @@
> +    // A class for indexing into constant pools.
> +    // Each time a pool entry is added, one of these is generated.
> +    // This can be supplied to read and write that entry after the fact.
> +    // And it can be used to get the address of the entry once the buffer
> +    // has been finalized, and an executable copy allocated.

This comment should be before the class definition, or at least not after the 'private:' statement.

@@ +279,5 @@
> +        friend struct AssemblerBufferWithConstantPool;
> +        uint32 offset_ : offsetBits;
> +        uint32 kind_ : poolKindBits;
> +        PoolEntry(int offset, int kind) : kind_(kind), offset_(offset)
> +        {

The curly brackets should open on the previous line. (The same applies to other methods in this class.)

@@ +285,5 @@
> +      public:
> +        int32 encode() {
> +            int32 ret;
> +            memcpy(&ret, this, sizeof(int32));
> +            return ret;

Wouldn't uint32 make more sense? Other than that, this method is good.

@@ +299,5 @@
> +        int32 poolKind() const {
> +            return kind_;
> +        }
> +        int32 offset() const {
> +            return offset_;

Again, these quantities are stored as uint32 types, so returning uint32 types makes more sense. (If there's a specific reason why you haven't done that, add a comment.)

@@ +451,3 @@
>              Asm::insertTokenIntoTag(instSize, inst, token);
> +            int poolId = p - pools;
> +            JS_ASSERT(poolId <= 1+(1 << poolKindBits) && poolId >= 0);

Why can poolId be equal to 1+(1<<poolKindBits)? entryCount[] has 1<<poolKindBits entries, so the following assertions should be correct:

JS_ASSERT(poolId >= 0);
JS_ASSERT(poolId < (1 << poolKindBits));

If I've misunderstood and it is correct, please add a comment to explain why.

Also, it would be clearer written with two separate assertions.

@@ +457,3 @@
>          // Now inst is a valid thing to insert into the instruction stream
>          this->putBlob(instSize, inst);
> +        // figure out the offset within like-kinded pool entries

The comment doesn't match what the return statement does.

@@ +811,5 @@
>                  curpool--;
>              }
>              // Can't assert anything here, since the first pool may be after the target.
>          }
> +        Asm::retargetNearBranch(i, offset);

Can we never need a far-branch here?

@@ +936,5 @@
> +            if (poolGroup[idx].other == realPool) {
> +                return start + offset;
> +            }
> +            start = poolGroup[idx].other->addPoolSize(start);
> +        }

Add a comment to explain why you loop forward through pool groups first, then backwards when looking up the 'other' pools.

Also, you should explain somewhere (ideally with the 'other' declaraation) that the 'other' pool has backwards-references.
Attachment #626464 - Flags: review?(Jacob.Bramley) → review+
Status: NEW → ASSIGNED
Comment on attachment 626464 [details] [diff] [review]
fix IC patching on ARM

Review of attachment 626464 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/shared/Assembler-shared.h
@@ +47,5 @@
>  #include "ion/RegisterSets.h"
> +#if defined(JS_CPU_X64) || defined(JS_CPU_ARM)
> +// JS_SMALL_BRANCH means the range on a branch instruction
> +// is smaller than the whole address space
> +#    define JS_SMALL_BRANCH

nit: better name might be "JS_LIMITED_BRANCHES"

::: js/src/ion/shared/Assembler-x86-shared.h
@@ +1088,5 @@
>      uint32 actualOffset(uint32 x) {
>          return x;
>      }
>  
> +    uint32 actualIndex(uint32 x) {

What does this function do?
Attachment #626464 - Flags: review?(dvander) → review+
landed: http://hg.mozilla.org/projects/ionmonkey/rev/6ed2a75d11ae
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.