Last Comment Bug 649202 - Fast property access for typed arrays on ARM
: Fast property access for typed arrays on ARM
Status: RESOLVED FIXED
[inbound]
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Other Branch
: ARM All
: -- normal with 1 vote (vote)
: mozilla8
Assigned To: Marty Rosenberg [:mjrosenb]
:
Mentors:
Depends on: 649690 663485 678928
Blocks:
  Show dependency treegraph
 
Reported: 2011-04-11 18:05 PDT by Alon Zakai (:azakai)
Modified: 2011-08-14 19:35 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
first try at fast typed arrays for arm (58.12 KB, patch)
2011-07-20 11:35 PDT, Marty Rosenberg [:mjrosenb]
no flags Details | Diff | Splinter Review
Typed array benchmark (6.34 KB, application/octet-stream)
2011-07-20 11:52 PDT, Alon Zakai (:azakai)
no flags Details
patch againt central (58.74 KB, patch)
2011-07-20 14:58 PDT, Marty Rosenberg [:mjrosenb]
no flags Details | Diff | Splinter Review
Fixes some of the rarely used accesses to typed arrays (59.14 KB, patch)
2011-07-27 17:11 PDT, Marty Rosenberg [:mjrosenb]
no flags Details | Diff | Splinter Review
may fix some bugs that testing did not catch... (60.03 KB, patch)
2011-08-01 14:10 PDT, Marty Rosenberg [:mjrosenb]
Jacob.Bramley: review+
Details | Diff | Splinter Review
Turn on polyic typed arrays on arm. (740 bytes, patch)
2011-08-01 14:12 PDT, Marty Rosenberg [:mjrosenb]
no flags Details | Diff | Splinter Review
bundles all previous patches (60.68 KB, patch)
2011-08-05 18:11 PDT, Marty Rosenberg [:mjrosenb]
no flags Details | Diff | Splinter Review

Description Alon Zakai (:azakai) 2011-04-11 18:05:36 PDT
I am told that we don't have fast property access for typed arrays on ARM. Would be great if we did, it would allow us to do some very cool demos.

cc'ing people that know much more about this than me.
Comment 1 Marty Rosenberg [:mjrosenb] 2011-07-20 11:35:20 PDT
Created attachment 547175 [details] [diff] [review]
first try at fast typed arrays for arm

this is a patch against ionmonkey, but with all of the ionmonkey stuff pulled out, so it *should* apply cleanly against central. I'm about to try testing it against central, and making sure it still works correctly there.

So far the only typed array speed test I have run on arm (attachment for bug 652346)
goes from 27 seconds to 20 seconds with these changes.
Comment 2 Alon Zakai (:azakai) 2011-07-20 11:52:59 PDT
Created attachment 547186 [details]
Typed array benchmark

Very glad to hear about progress on this!

Here is another benchmark, maybe it'll be useful. It's a simple C++ memory access benchmark, compiled in 3 ways: no typed arrays (ta0.js), typed arrays method 1 (ta1.js) and method 2 (ta2.js).

Aside from comparing the last two with and without this patch, it might be interesting to compare the last two to the first. In theory, typed arrays should make this kind of benchmark faster. (Not necessarily both method 1 and method 2, but at least one of them, I would hope...)

Might also be interesting to look at the typed array version of Kraken, I had heard we had that somewhere.
Comment 3 Marty Rosenberg [:mjrosenb] 2011-07-20 14:58:31 PDT
Created attachment 547259 [details] [diff] [review]
patch againt central

This patch is explicitly against central, but it does not turn on typed arrays for arm.
As for the tests,
ta1.js:
old: user    2m24.380s
new: user    1m22.850s

ta2.js:
old: user    1m37.980s
new: user    1m8.400s

So it seems to be having about the expected speedup?
Comment 4 Alon Zakai (:azakai) 2011-07-20 15:06:05 PDT
Yes, looks good :)

One interesting thing is that ta2 was slightly slower than ta1 on my desktop (x86). But both with and without this patch apparently ta2 is faster on arm (although, less with the patch). This leads to a slightly worrying situation where one might optimize their scripts on desktop, but that doesn't simultaneously optimize for mobile as well.
Comment 5 Boris Zbarsky [:bz] (still a bit busy) 2011-07-20 19:00:23 PDT
That worrying situation is going to be true no matter what.  For example, the simple fact that the CPU cache sizes are different means that the right size/speed tradeoff is different... (that includes between desktop machines).
Comment 6 Marty Rosenberg [:mjrosenb] 2011-07-27 17:11:14 PDT
Created attachment 548982 [details] [diff] [review]
Fixes some of the rarely used accesses to typed arrays

After discovering that I could set the arguments to js in the js test, I found some more bugs.  Mostly, I left out bits, or swapped bits.
Comment 7 Marty Rosenberg [:mjrosenb] 2011-07-27 19:12:19 PDT
Comment on attachment 548982 [details] [diff] [review]
Fixes some of the rarely used accesses to typed arrays

Since I don't have commit access yet, if you could push this to the try server, then to a repo, that would be very helpful.
Comment 8 Marty Rosenberg [:mjrosenb] 2011-08-01 14:10:24 PDT
Created attachment 549915 [details] [diff] [review]
may fix some bugs that testing did not catch...

So I was going to implement some other improvements, and I noticed some things that looked like bugs in the current patch.  Fixing them does not cause any test to fail, so I'm lead to believe that they are now (more) correct.  Also, since jbramley is back from vacation, I'm asking him to review, since he is probably the best person to review this patch.  Also, same request as last time, push to try, & inbound if at all possible.
Side Note: this patch does not actually turn on typed array PIC's for arm.  I'll attach a one line patch that does that in a minute.
Comment 9 Marty Rosenberg [:mjrosenb] 2011-08-01 14:12:53 PDT
Created attachment 549917 [details] [diff] [review]
Turn on polyic typed arrays on arm.
Comment 10 Jacob Bramley [:jbramley] 2011-08-04 04:04:31 PDT
Comment on attachment 549915 [details] [diff] [review]
may fix some bugs that testing did not catch...

Review of attachment 549915 [details] [diff] [review]:
-----------------------------------------------------------------

It looks good to me in general. I like the re-organization and the consolidation of some of the back-end methods.

I'm giving r+ for the design, subject to the questions and corrections I've noted in my detailed review. I've not tested this on try, but I will if you still don't have access.

General nit-picking: Your comments are excellent, but I'd be much happier if they had capital letters and full stops (periods).

::: js/src/assembler/assembler/ARMAssembler.cpp
@@ +270,5 @@
>  }
>  
>  // Memory load/store helpers
> +// TODO: this does not take advantage of all of ARMv7's instruction encodings, it should.
> +void ARMAssembler::dataTransferN(bool isLoad, bool isSigned, int size, RegisterID srcDst, RegisterID base, int32_t offset)

Since you're writing new code here, it might be worth replacing 'srcDst' with 'rt' (as the ARM ARM calls it). If it's just me that gets confused by 'srcDst', please ignore me :-)

@@ +274,5 @@
> +void ARMAssembler::dataTransferN(bool isLoad, bool isSigned, int size, RegisterID srcDst, RegisterID base, int32_t offset)
> +{
> +    bool posOffset = offset >= 0;
> +    if (offset < 0)
> +        offset = - offset;

Careful with this. It's broken if offset is 0x80000000. See bug 643213.

@@ +276,5 @@
> +    bool posOffset = offset >= 0;
> +    if (offset < 0)
> +        offset = - offset;
> +    if (offset <= 0xfff)
> +        // LDR rd, [rb, +offset]

I'm nit-picking, but I'd use "#+offset" to match ARM assembly syntax. It just makes it clear that 'offset' is an immediate, and not a register.

@@ +283,5 @@
> +        // add upper bits of offset to the base, and store the result into the temp registe
> +        if (posOffset)
> +            add_r(ARMRegisters::S0, base, OP2_IMM | (offset >> 12) | (10 << 8));
> +        else
> +            sub_r(ARMRegisters::S0, base, OP2_IMM | (offset >> 12) | (10 << 8));

The bracing style isn't exactly consistent between Nitro and the rest of JM, but the Mozilla Coding Style asks for curly braces even for single-line if-else statements: https://developer.mozilla.org/En/Mozilla_Coding_Style_Guide#Control_structures
I've used them elsewhere in Nitro, and I've been bitten more than once by bugs in dangling else statements.

@@ +284,5 @@
> +        if (posOffset)
> +            add_r(ARMRegisters::S0, base, OP2_IMM | (offset >> 12) | (10 << 8));
> +        else
> +            sub_r(ARMRegisters::S0, base, OP2_IMM | (offset >> 12) | (10 << 8));
> +        // load using the lower bits of the register

I think you mean 'offset', not 'register'.

@@ +298,2 @@
>  
>  void ARMAssembler::dataTransfer32(bool isLoad, RegisterID srcDst, RegisterID base, int32_t offset)

Isn't this function redundant, now you've added dataTransferN?

@@ +326,5 @@
>              dtr_dr(isLoad, srcDst, base, ARMRegisters::S0);
>          }
>      }
>  }
> +/* this is large, ugly and obsolete.  dataTransferN is superior.*/

Why not delete it? That could be a separate bug, of course.

@@ +334,5 @@
> +        if (offset <= 0xfff) {
> +            if (isSigned)
> +                mem_imm_off(isLoad, true, 8, true, srcDst, base, offset);
> +            else
> +                dtrb_u(isLoad, srcDst, base, offset);

Why not use mem_imm_off to handle the unsigned case too? You do this in dataTransferN.

@@ +459,5 @@
>      fdtr_u(isLoad, srcDst, ARMRegisters::S0, 0);
>  }
>  
> +void ARMAssembler::floatTransfer(bool isLoad, FPRegisterID srcDst, RegisterID base, int32_t offset)
> +{

The alignment stuff here is broken. I have a fix in bug 669132 but I've been a bit slow about pushing it.

@@ +516,5 @@
> +        return;
> +    }
> +
> +    ldr_un_imm(ARMRegisters::S0, offset);
> +    // vldr/vstr do not allow indexed operations, so we get to do this *manually*.

s/indexed/register-indexed/

(Only because it confused me when I first read it.)

::: js/src/assembler/assembler/ARMAssembler.h
@@ +229,5 @@
>              DT_BYTE = (1 << 22),
>              DT_WB = (1 << 21),
>              // This flag is inlcuded in LDR and STR
>              DT_PRE = (1 << 24),
> +            // This flag makes the "half word" actually a half word

I'm not sure what is meant by this comment.

@@ +546,5 @@
> +        void mem_imm_off(bool isLoad, bool isSigned, int size, bool posOffset,
> +                         int rd, int rb, ARMWord offset, Condition cc = AL)
> +        {
> +            char const * mnemonic_act = (isLoad) ? ("ld") : ("st");
> +            char const * mnemonic_sign = (isSigned) ? ("?") : ("");

The signed-access mnemonics use 'S', not '?'. For example: LDRSB

However, they only apply to loads, and only to 8- and 16-bit accesses. You should assert that only valid combinations are requested, or perhaps explicitly ignore the isSigned flag for size==32.

@@ +547,5 @@
> +                         int rd, int rb, ARMWord offset, Condition cc = AL)
> +        {
> +            char const * mnemonic_act = (isLoad) ? ("ld") : ("st");
> +            char const * mnemonic_sign = (isSigned) ? ("?") : ("");
> +            /*sorry, this looks horrible*/

This comment can go.

@@ +549,5 @@
> +            char const * mnemonic_act = (isLoad) ? ("ld") : ("st");
> +            char const * mnemonic_sign = (isSigned) ? ("?") : ("");
> +            /*sorry, this looks horrible*/
> +            char const * mnemonic_size = 
> +                (size == 8) ? ("b") : ((size == 16) ? ("h") : (""));

I'd expand that to a switch statement or if-else chain. It'll take more lines of code, but be much more readable. Alternatively, use an array look-up like "sizes[(size/8)-1]", plus an assertion to check for bounds.

Maybe even set the whole mnemonic in this way. It might be verbose, but it's clear.

@@ +552,5 @@
> +            char const * mnemonic_size = 
> +                (size == 8) ? ("b") : ((size == 16) ? ("h") : (""));
> +            char const * off_sign = (posOffset) ? ("+") : ("-");
> +            js::JaegerSpew(js::JSpew_Insns, 
> +                           IPFX "**%s%s%sr, [%s, #%s%u]\n]", 

What is the "**" prefix for? Is that left-over debugging?

Also, I think the mnemonic will be wrong for LDRH, LDRB and their store equivalents. Would it not come out as "ldbr" and suchlike?

Finally, I think you have a stray square bracket after your newline.

@@ +556,5 @@
> +                           IPFX "**%s%s%sr, [%s, #%s%u]\n]", 
> +                           MAYBE_PAD, mnemonic_act, mnemonic_sign, mnemonic_size,
> +                           nameGpReg(rd), nameGpReg(rb), off_sign, offset);
> +            if (size == 32 || (size == 8 && !isSigned)) {
> +                /* all (the one) 32 bit ops and the signed 8 bit ops use the original encoding*/

s/signed/unsigned/

@@ +563,5 @@
> +                         (size == 8 ? DT_BYTE : 0) |
> +                         (posOffset ? DT_UP : 0), rd, rb, offset);
> +            } else {
> +                /* all 16 bit ops and 8 bit unsigned use the newer encoding*/
> +                /*these instructions don't exist before ARMv4*/

By the way, the oldest architecture we'll ever see here will be ARMv4T. That is an implicit assumption across Nitro (and many other modules).

@@ +573,5 @@
> +                         (posOffset ? DT_UP : 0), rd, rb, offset);
> +            }
> +        }
> +
> +        void mem_reg_off(bool isLoad, bool isSigned, int size, bool posOffset, int rd, int rb, int rm, Condition cc = AL)

Most of my comments on mem_imm_off apply here too.

@@ +659,5 @@
>  
>          // Data transfers like this:
> +        //  LDRSB rd, [rb, +offset]
> +        //  STRSB rd, [rb, +offset]
> +        // TODO: this instruction does not exist on arm v4 and earlier

It's present on ARMv4. We don't care about ARMv3 here.

@@ +662,5 @@
> +        //  STRSB rd, [rb, +offset]
> +        // TODO: this instruction does not exist on arm v4 and earlier
> +        void dtrsb_u(bool isLoad, int rd, int rb, ARMWord offset, Condition cc = AL)
> +        {
> +            ASSERT(isLoad); /*can only do signed byte loads, not stores*/

Just remove the flag, then, or ignore it. A signed, narrowing store is functionally equivalent to an unsigned store anyway, so it's still semantically correct to request a signed, narrowing store.

@@ +693,5 @@
>          }
>  
>          // Data transfers like this:
> +        //  LDRSB rd, [rb, -offset]
> +        //  STRSB rd, [rb, -offset]

Use "#-offset".

@@ +1449,5 @@
> +            FUITOD = 0x0eb80b40,
> +            FTOSID = 0x0ebd0b40,
> +            FTOSIZD = 0x0ebd0bc0,
> +            FMSTAT = 0x0ef1fa10,
> +            FDTR = 0x0d000b00

I'd like to see UAL mnemonics here (like VMOV, VADD etc), but they probably aren't worth adding for this patch.

@@ +1495,5 @@
> +            return true;
> +        }
> +
> +        /*VFP Code:*/
> +        /* this is horrible. There needs to be some sane way of distinguishing D from S from R*/

Agreed, but Nitro doesn't wrap registers in any sensible class. NanoJIT has a nice system which we might be able to borrow, but inserting it into Nitro would be a fiddly job.

@@ +1496,5 @@
> +        }
> +
> +        /*VFP Code:*/
> +        /* this is horrible. There needs to be some sane way of distinguishing D from S from R*/
> +        void emitVFPInst(ARMWord op, int rd, int rn, ARMWord op2)

Use vd, vm and vn, as in the ARM ARM.

Also, make these ARMWords. The callers of this function (i.e. fmem_imm) use DD(...) and suchlike to encode the registers, and 'int' is typically used for raw register numbers here. Actually, this function just becomes an elaborate ORR at that point, so perhaps the DD(...) bits should go in here.

@@ +1498,5 @@
> +        /*VFP Code:*/
> +        /* this is horrible. There needs to be some sane way of distinguishing D from S from R*/
> +        void emitVFPInst(ARMWord op, int rd, int rn, ARMWord op2)
> +        {
> +            ASSERT ( ((op2 & ~OP2_IMM) <= 0xfff) || (((op2 & ~OP2_IMMh) <= 0xfff)) );

VFP instructions don't have the magic "Operand 2", so this assertion doesn't make much sense.

@@ +1503,5 @@
> +            m_buffer.putInt(op | rn | rd | op2);
> +        }
> +
> +
> +        void fmem_imm(bool isLoad, bool isDouble, bool isUp, int dest, int rn, ARMWord offset, Condition cc = AL)

You're treating 'offset' as a word offset, rather than a byte offset. You should make this clear in a comment as most offsets in Nitro are byte-offsets.

The integer equivalent was called 'mem_imm_off'. Why is this one not 'fmem_imm_off'?

@@ +1521,5 @@
> +        }
> +
> +        /* WARNING: even for an int -> float conversion, all registers used
> +         * are VFP registers.
> +         */

Good warning, thanks! That one is a little unusual.

By the way, most of Nitro uses // comments. I know it's not what Jaeger Monkey does, but most of the ARM back-end does it so I just stick with the local convention.

@@ +1522,5 @@
> +
> +        /* WARNING: even for an int -> float conversion, all registers used
> +         * are VFP registers.
> +         */
> +        void vcvt(RegType srcType, RegType dstType, int src, int dest, Condition cc = AL)

Nice! I like the new (UAL) mnemonic usage.

@@ +1528,5 @@
> +            ASSERT(srcType != dstType);
> +            ASSERT(isFloatType(srcType) || isFloatType(dstType));
> +
> +            js::JaegerSpew(js::JSpew_Insns,
> +                           IPFX   "***vcvt.%s.%-15s, %s,%s\n", MAYBE_PAD, 

Debug prefix?

@@ +1541,5 @@
> +                            dblToFloat ? SD(dest) : DD(dest),
> +                            dblToFloat ? DM(src) : SM(src), 0);
> +            } else {
> +                bool unimplimented = true;
> +                ASSERT(unimplimented);

That looks weird to me. It's like an assertion that the statement is reached. It will also give an "unused variable" warning in release mode.

@@ +1545,5 @@
> +                ASSERT(unimplimented);
> +            }
> +        }
> +
> +        /* does r1*r2 -> dn, dn -> r1 * r2, r1 * r2 -> s1 * s1, etc.*/

For 64-bit values, use a colon rather than an asterisk, to avoid confusion with multiplication. It's probably best to use little-endian notation, so r2:r1 -> dn would take r2 as the high word.

@@ +1546,5 @@
> +            }
> +        }
> +
> +        /* does r1*r2 -> dn, dn -> r1 * r2, r1 * r2 -> s1 * s1, etc.*/
> +        // this only exists on arches that support VFPv2 and higher (VFPv1 is deprecated)

You can ignore VFPv1. Assume that you have an intersection of VFPv2 and VFPv3 unless you have specific tests.

@@ +1556,5 @@
> +                               nameGpReg(r1), nameGpReg(r2), nameFpRegD(rFP));
> +            else
> +                js::JaegerSpew(js::JSpew_Insns,
> +                               IPFX   "***%-15s %s, %s, %s\n", MAYBE_PAD, "vmov",
> +                               nameFpRegD(rFP), nameGpReg(r1), nameGpReg(r2));

Curly brackets.

::: js/src/assembler/assembler/MacroAssemblerARM.h
@@ +476,5 @@
> +    }
> +    void store16(TrustedImm32 imm, ImplicitAddress address)
> +    {
> +        if (imm.m_isPointer)
> +            m_assembler.ldr_un_imm(ARMRegisters::S1, imm.m_value);

That's only necessary if the immediate needs to be patchable, I think. The 'm_isPointer' name is a bit misleading. I'm guessing that the immediate doesn't need to be patchable as you don't return any labels. (The existing code does it this way, so I may be overlooking something.)

Also, since this is store16, you can always generate the immediate using MOVW (on ARMv7), and even on older architectures you can always do it in two arithmetics. 'move' should handle that, if the isPointer stuff can be disposed of. (Logically it cannot be a pointer if it's just 16 bits.)

@@ +513,5 @@
> +    {
> +        if (imm.m_isPointer)
> +            m_assembler.ldr_un_imm(ARMRegisters::S1, imm.m_value);
> +        else
> +            move(imm, ARMRegisters::S1);

Same comments as for store16.

@@ +652,5 @@
>      }
>  
>      Jump branch32(Condition cond, RegisterID left, Address right)
>      {
> +        /*If the load only takes a single instruction, then we could just do a load into*/

[...] load into PC? I'm not sure that's true. 'branch32' does a comparison, and I don't see how this can be optimized.

The 'right' address load could be optimized, but the instruction sequence needs to be consistent for some of these branches, so fiddling with them can have unexpected results.

@@ +1125,5 @@
> +
> +    void loadFloat(ImplicitAddress address, FPRegisterID dest)
> +    {
> +        // as long as this is a sane mapping, (*2) should just work
> +        dest = (FPRegisterID) (dest * 2);

You made an inline function for this earlier.

@@ +1187,5 @@
> +        m_assembler.baseIndexFloatTransfer(false, false, src,
> +                                           address.base, address.index,
> +                                           address.scale, address.offset);
> +    }
> +    /* Both ImmDouble storeFloat functions copied *directly* from x86.*/

Remove this comment. It gets confusing if one version changes.

@@ +1194,5 @@
> +        union {
> +            float f;
> +            uint32 u32;
> +        } u;
> +        u.f = imm.u.d;

This kind of type aliasing is explicitly allowed in GCC, but is not strictly correct in C(++). It's better to use memcpy to re-interpret the types like this. GCC knows that the memcpy is really just a no-op (so it's free), and it has the benefit of being semantically correct C(++).

::: js/src/methodjit/MethodJIT.cpp
@@ +485,5 @@
>  "   bxne    r0"                             "\n"
>  
>      /* Tidy up, then return '0' to represent an unhandled exception. */
>  "   mov     r0, sp"                             "\n"
> +/*"   blx  " SYMBOL_STRING_VMFRAME(PopActiveVMFrame) "\n"*/

Delete this, rather than leaving a comment.
Comment 11 Marty Rosenberg [:mjrosenb] 2011-08-04 11:56:50 PDT
> Since you're writing new code here, it might be worth replacing 'srcDst'
> with 'rt' (as the ARM ARM calls it). If it's just me that gets confused by
> 'srcDst', please ignore me :-)
> 
that confused me to no end when I was reading the old code.

> Careful with this. It's broken if offset is 0x80000000. See bug 643213.
yeah, I realized that when I reviewed that patch.

> I'm nit-picking, but I'd use "#+offset" to match ARM assembly syntax. It
> just makes it clear that 'offset' is an immediate, and not a register.
Sounds like a good idea

> 
> Isn't this function redundant, now you've added dataTransferN?
> 

Redundant? Yes.  Still used? yes.  I decided to make this patch as small as possible, and not go ripping up *all* of the memory accessing code, as you commented on later.
> Why not delete it? That could be a separate bug, of course.

> The alignment stuff here is broken. I have a fix in bug 669132 but I've been
> a bit slow about pushing it.
yeah, I noticed it was off, but I decided they should both be the same.  I can incorporate that patch into both double and float if you'd like.

> > +            // This flag makes the "half word" actually a half word
> 
> I'm not sure what is meant by this comment.
will fix.  "It distinguishes a 16 bit load from an 8 bit load, since the format of ldrh is also used to represent 8 bit signed transfers"

 
> The signed-access mnemonics use 'S', not '?'. For example: LDRSB
I have *no* clue why I typed '?'

> I'd expand that to a switch statement or if-else chain. It'll take more
> lines of code, but be much more readable. Alternatively, use an array
> look-up like "sizes[(size/8)-1]", plus an assertion to check for bounds.
> 
> Maybe even set the whole mnemonic in this way. It might be verbose, but it's
> clear.
> 
yeah, the idea behind that was "at least it is short", since most of the alternatives i'd thought of did not seem to be significantly clearer

> What is the "**" prefix for? Is that left-over debugging?
yup.  There were a couple of different functions that would dump some of these instructions into the spew, and I'd wanted to mark the ones that were likely wrong.

> 
> Also, I think the mnemonic will be wrong for LDRH, LDRB and their store
> equivalents. Would it not come out as "ldbr" and suchlike?
> 
> Finally, I think you have a stray square bracket after your newline.
> 
yeah, i didn't notice that while debugging.

> By the way, the oldest architecture we'll ever see here will be ARMv4T. That
> is an implicit assumption across Nitro (and many other modules).
Good to know! I had not heard any definitive answers on this before.

> Also, make these ARMWords. The callers of this function (i.e. fmem_imm) use
> DD(...) and suchlike to encode the registers, and 'int' is typically used
> for raw register numbers here. Actually, this function just becomes an
> elaborate ORR at that point, so perhaps the DD(...) bits should go in here.
I thought about that, but I opted not to, since for various instructions, the N, M, D registers may be
integer, double or single registers, each of which is encoded differently.
> 
> VFP instructions don't have the magic "Operand 2", so this assertion doesn't
> make much sense.
copied(incorrectly) from emitinst.  
> 
> @@ +1503,5 @@
> > +            m_buffer.putInt(op | rn | rd | op2);
> > +        }
> > +
> > +
> > +        void fmem_imm(bool isLoad, bool isDouble, bool isUp, int dest, int rn, ARMWord offset, Condition cc = AL)
> 
> You're treating 'offset' as a word offset, rather than a byte offset. You
> should make this clear in a comment as most offsets in Nitro are
> byte-offsets.
> 
I've almost considered making up separate types for every possible encoding of numbers so the compiler will complain if we use them incorrectly.

> Good warning, thanks! That one is a little unusual.
I hadn't seen those instructions before, and had *no* clue that you could store integer values in floating point registers


> 
> By the way, most of Nitro uses // comments. I know it's not what Jaeger
> Monkey does, but most of the ARM back-end does it so I just stick with the
> local convention.
sorry, habit more than anything else.

> That looks weird to me. It's like an assertion that the statement is
> reached. It will also give an "unused variable" warning in release mode.
yeah, evidently there is a JS_ASSERT_UNREACHED as well.  That seems like a nicer solution.

> For 64-bit values, use a colon rather than an asterisk, to avoid confusion
> with multiplication. It's probably best to use little-endian notation, so
> r2:r1 -> dn would take r2 as the high word.
iirc, one of the fixes between the first two versions of this patch was me getting that backwards in half of its uses.

> That's only necessary if the immediate needs to be patchable, I think. The
> 'm_isPointer' name is a bit misleading. I'm guessing that the immediate
> doesn't need to be patchable as you don't return any labels. (The existing
> code does it this way, so I may be overlooking something.)
I was going to ask you about those...
I guess this will be an instance of "try to fix it, and see if anything breaks"

> 
> Also, since this is store16, you can always generate the immediate using
> MOVW (on ARMv7), and even on older architectures you can always do it in two
> arithmetics. 'move' should handle that, if the isPointer stuff can be
> disposed of. (Logically it cannot be a pointer if it's just 16 bits.)
>
I had not even thought of how silly that was before.

> [...] load into PC? I'm not sure that's true. 'branch32' does a comparison,
> and I don't see how this can be optimized.
> 
> The 'right' address load could be optimized, but the instruction sequence
> needs to be consistent for some of these branches, so fiddling with them can
> have unexpected results.
> 
I'll just kill the comment

> You made an inline function for this earlier.
the inline function was an afterthought.

> This kind of type aliasing is explicitly allowed in GCC, but is not strictly
> correct in C(++). It's better to use memcpy to re-interpret the types like
> this. GCC knows that the memcpy is really just a no-op (so it's free), and
> it has the benefit of being semantically correct C(++).
Should the x86 variant of that be fixed as well? that is most of why I put the comment there.
 

> > +/*"   blx  " SYMBOL_STRING_VMFRAME(PopActiveVMFrame) "\n"*/
> 
> Delete this, rather than leaving a comment.
OOPS. that was not supposed to go in.  both variants expand to the same thing, but that was very much me changing things randomly to track down a bug.  Turns out, the bug was in GNU as.
Comment 12 Marty Rosenberg [:mjrosenb] 2011-08-05 18:11:32 PDT
Created attachment 551208 [details] [diff] [review]
bundles all previous patches
Comment 13 Marco Bonardo [::mak] 2011-08-06 03:02:21 PDT
http://hg.mozilla.org/mozilla-central/rev/56b60d340857

Note You need to log in before you can comment on or make changes to this bug.