Closed
Bug 690053
Opened 13 years ago
Closed 13 years ago
Land initial arm port of IM
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: mjrosenb, Assigned: mjrosenb)
Details
Attachments
(2 files, 4 obsolete files)
178.47 KB,
patch
|
jbramley
:
review+
|
Details | Diff | Splinter Review |
179.85 KB,
patch
|
Details | Diff | Splinter Review |
IM now works on ARM with the function: function foo() {return 5;}
This patch needs to be rebased, cleaned up, and a bit of cruft needs to be obliterated. Also tested for expressions that aren't '5'.
Assignee | ||
Comment 1•13 years ago
|
||
Sweet! 3 volunteers for an r? already!
Assignee | ||
Comment 2•13 years ago
|
||
This isn't horribly close to bringing ARM up to speed with x86, and I'm pretty sure I have some pretty glaring bugs, most of which I'm fairly ok with. for example, I know that the identity function doesn't work.
There are some things that might have crept in that I'd really prefer didn't go into this commit:
functions with the wrong naming convention/order of arguments
comments that are gibberish, misleading or wrong
functions that duplicate behavior/have been superseded
Other than that, I'd like some feedback (from anyone) about:
I'd like to have consistency between all of the different methods of encoding fields in instructions, but ended up leaving some fields as enums that don't have trailing zeros, and are assigned into a bitfield that lives in the correct place, and leaving some fields as enums that have some absurd number of trailing zeros so they can simply be or'ed into the final constant
Attachment #563147 -
Attachment is obsolete: true
Attachment #563692 -
Flags: review?(Jacob.Bramley)
Comment on attachment 563692 [details] [diff] [review]
rebased patch, minor fixes
Review of attachment 563692 [details] [diff] [review]:
-----------------------------------------------------------------
Great to see this! (And thanks for the detailed comments.)
A few drive-by nits:
::: js/src/ion/IonFrames.h
@@ +47,5 @@
>
> +#if defined(JS_CPU_X86) || defined (JS_CPU_X64)
> +# include "ion/shared/IonFrames-x86-shared.h"
> +#elif defined (JS_CPU_ARM)
> +# include "ion/arm/IonFrames-arm.h"
Alternately, could put this in Architecture-$(ARCH).h, if we don't want to create a whole file for one struct. (btw, is IonFrames-x86 missing?)
::: js/src/ion/arm/Architecture-arm.h
@@ +153,2 @@
> };
> +bool hasARMv7();
Newlines to give this breathing room.
::: js/src/ion/arm/Assembler-arm.cpp
@@ +42,5 @@
> #include "Assembler-arm.h"
> #include "jsgcmark.h"
>
> using namespace js;
> using namespace js::ion;
Newline here
@@ +275,5 @@
> + }
> +}
> +
> +ALUOp js::ion::ALUNeg(ALUOp op, Imm32 &imm)
> +{
House style for out-of-header functions is:
ALUOp
ion::ALUNeg(ALUOp op, Imm32 &imm)
For outparams, pointers are preferred. Also the js namespace is used so you can omit the js::
@@ +283,5 @@
> + imm = Imm32(~imm.value);
> + return op_mvn;
> + case op_mvn:
> + imm = Imm32(~imm.value);
> + return op_mov;
House style for switch is:
switch (op) {
case x:
imm = ...
::: js/src/ion/arm/Assembler-arm.h
@@ +109,5 @@
> +int RN(Register r);
> +
> +class VFPRegister
> +{
> +public:
Two-space indent on public/protected/private
@@ +593,5 @@
> +public:
> + EDtrOffReg(Register rm) : EDtrOff(rm) {}
> +};
> +
> +class EDtrAddr {
{ on newline for classes
@@ +752,5 @@
> +
> + uint32 * editSrc (bufferOffset bo) {
> + return (uint32*)(((char*)m_buffer.data()) + bo.getOffset());
> + }
> + class bufferOffset {
Better to upper-case ("BufferOffset") to make it clear it's not a variable.
::: js/src/ion/arm/Lowering-arm.cpp
@@ +167,5 @@
> LSnapshot *snapshot = LSnapshot::New(gen, lastResumePoint_);
> if (!snapshot)
> return false;
>
> + for (size_t i = 0; i < lastResumePoint_->numOperands(); i++) {
Rogue indent
Assignee | ||
Comment 4•13 years ago
|
||
(In reply to David Anderson [:dvander] from comment #3)
>
> ::: js/src/ion/IonFrames.h
> @@ +47,5 @@
> >
> > +#if defined(JS_CPU_X86) || defined (JS_CPU_X64)
> > +# include "ion/shared/IonFrames-x86-shared.h"
> > +#elif defined (JS_CPU_ARM)
> > +# include "ion/arm/IonFrames-arm.h"
>
> Alternately, could put this in Architecture-$(ARCH).h, if we don't want to
> create a whole file for one struct. (btw, is IonFrames-x86 missing?)
Yeah. I'm bad with hg add :(
I'd love to put them in an appropriate pre-existing header. Architecture-arm.h works for arm,
but I'd initially made the x86 one as a shared header, and currently there isn't an Architecture-x86-shared.h, so I can put a copy of the header into both Architecture-x86 and in Architecture-x64.
Also, I just tried moving it to Architecture-arm.h, and ran into some issues with include dependencies, namely:
In file included from ../../src/ion/shared/Assembler-shared.h:46,
from ../../src/ion/arm/Architecture-arm.h:46,
from /home/mrosenberg/src/ionmonkey/ionmonkey-build/js/src/ion/IonFrames.h:52,
from /home/mrosenberg/src/ionmonkey/ionmonkey-build/js/src/ion/IonCompartment.h:49,
from /home/mrosenberg/src/ionmonkey/ionmonkey-build/js/src/jscompartment.cpp:62:
../../src/ion/IonRegisters.h:59: error: ‘Registers’ does not name a type
If you want' I can actually figure out how to put this structure into Architecture-*.h, but I am inclined to leave that for later, since I have something that works (at the cost of two extra header files)
> House style for switch is:
>
> switch (op) {
> case x:
> imm = ...
>
> ::: js/src/ion/arm/Assembler-arm.h
> @@ +109,5 @@
> > +int RN(Register r);
> > +
> > +class VFPRegister
> > +{
> > +public:
>
> Two-space indent on public/protected/private
>
> @@ +593,5 @@
> > +public:
> > + EDtrOffReg(Register rm) : EDtrOff(rm) {}
> > +};
> > +
> > +class EDtrAddr {
>
Yeah, I'd just been using whatever emacs did by default after I removed the brain-deadness
of gnu-style
Assignee | ||
Comment 5•13 years ago
|
||
does not address the location of the class IonFrameData.
should address everything else.
Attachment #563692 -
Attachment is obsolete: true
Attachment #563692 -
Flags: review?(Jacob.Bramley)
Attachment #564320 -
Flags: review?(Jacob.Bramley)
Comment 6•13 years ago
|
||
Comment on attachment 563692 [details] [diff] [review]
rebased patch, minor fixes
Review of attachment 563692 [details] [diff] [review]:
-----------------------------------------------------------------
Firstly, for patches this size, I'm _really_ glad we have Splinter reviews!
> I'd like to have consistency between all of the different methods of
> encoding fields in instructions, but ended up leaving some fields as enums
> that don't have trailing zeros, and are assigned into a bitfield that lives
> in the correct place, and leaving some fields as enums that have some absurd
> number of trailing zeros so they can simply be or'ed into the final constant
I'm not sure of the boundaries between ARMAssembler and Assembler-arm. There's a lot of low-level assembly going on outside of ARMAssembler. What's going on there? Are we replacing Nitro?
I very much like the enums with all combinations, like "WriteBack" and "NoWriteBack" even if one is 0. This makes code much clearer.
You seem to use toInt() to return some encoding that can be combined into an instruction using |. I'd like to suggest using "encode()" or something similar, to avoid confusion with immediate integers and suchlike.
Just like Jaeger Monkey, there is a lot of code in header files. This makes for a very painful build process, particularly as these headers tend to end up getting #included quite heavily. Is it possible to move more code into CPP files? This was a source of great pain in Jaeger Monkey.
Since it looks like we now have an ARM-specific macro assembler interface, is it feasible to give it a destination-first argument order? This would match the assembler as well as ARM assembly conventions.
I appreciate all the comments, but they could do with capital letters and suchlike. They would then be much easier to read.
Finally, I've looked at the ARM code, but I'm not very familiar with the way things are done in IM so it was hard to see if some parts were correct or not.
::: js/src/ion/IonFrames.h
@@ +61,5 @@
> // The layout of an Ion frame on the C stack:
> // argN _
> // ... \ - These are jsvals
> // arg0 /
> // this _/
ARM has padding here, too. It's worth mentioning here to avoid confusion.
::: js/src/ion/arm/Architecture-arm.h
@@ +117,5 @@
> + static const uint32 JSCallClobberMask =
> + (1 << JSC::ARMRegisters::r0) |
> + (1 << JSC::ARMRegisters::r1) |
> + (1 << JSC::ARMRegisters::r2) |
> + (1 << JSC::ARMRegisters::r3);
Does it matter that IP and LR are also caller-saved?
@@ +147,5 @@
> static const uint32 NonAllocatableMask =
> // the scratch float register for ARM.
> (1 << JSC::ARMRegisters::SD0);
> + static const uint32 AllocatableMask = AllMask & ~NonAllocatableMask;
> + static const uint32 JSCallClobberMask = AllocatableMask;
That looks wrong, but probably only in that we're preserving more registers than necessary. D0-D7 are caller-saved (like r0-r3) but D8-D15 are callee-saved (like r4-r11) and thus don't belong in JSCallClobberMask.
D16-D31 are callee-saved on processors that have them.
@@ +153,2 @@
> };
> +bool hasARMv7();
Two points:
* I can't find an implementation of this. Am I missing something obvious?
* I doubt that this mechanism the resolution that you want. I'd prefer something like hasMOVWT(), for example. That allows elegant extension for optional ARMv7 features, like UDIV/SDIV (hardware integer divide) or VFPv3-D32 (extra VFP registers).
::: js/src/ion/arm/Assembler-arm.cpp
@@ +60,3 @@
> for (size_t i = 0; i < jumps_.length(); i++) {
> RelativePatch &rp = jumps_[i];
> + JS_NOT_REACHED("too lazy");
Flag these with TODO or something similar, so they don't get left behind. In six months' time, some "too lazy" assertion won't make a lot of sense when the rest of the engine is complete.
@@ +135,5 @@
> {
> + ASSERT(m_buffer.sizeOfConstantPool() == 0);
> + memcpy(buffer, m_buffer.data(), m_buffer.size());
> + // whoops, can't call this; it is ARMAssembler-specific.
> + //fixUpOffsets(buffer);
It might be ARMAssembler-specific, but it's really important. It's used for patching relative jumps. Nasty things will happen if you don't call it.
@@ +158,5 @@
> }
> }
>
> Assembler::Condition
> Assembler::inverseCondition(Condition cond)
Tip: ARM condition codes are encoded in four bits. The least significant of those bits can be inverted to give you the opposite condition. This is true for all conditions except 'AL' (always), which you should assert for anyway.
I'm not sure if you can make use of that here, since the Condition type is generic, but it does make things simpler.
There's a complete table of ARM condition codes here: blogs.arm.com/software-enablement/206-condition-codes-1-condition-flags-and-codes/
@@ +272,5 @@
> + imm2 = ((imm >> (32 - imm2shift)) | (imm << imm2shift)) & 0xff;
> + return TwoImm8mData(datastore::Imm8mData(imm1,imm1shift),
> + datastore::Imm8mData(imm2, imm2shift));
> + }
> +}
I've not reviewed this function thoroughly, though it looks roughly sane. If you haven't done it already, it needs independent testing, at least once to verify correctness. There are many corner-cases and assessing correctness by review is hard and error-prone.
@@ +312,5 @@
> +
> +bool js::ion::can_dbl(ALUOp op)
> +{
> + // some instructions can't be processed as two separate instructions
> + // such as and, and possibly add (when we're setting ccodes).
Ooh, fancy! I'm looking forward to seeing that.
@@ +319,5 @@
> +
> +bool js::ion::condsAreSafe(ALUOp op) {
> + // even when we are setting condition codes, sametimes the
> + // codes only depend on the value of the result (N,Z)
> + // other times, (such as add)
This comment doesn't make the purpose of the function clear. Is it something to do with logical operations not updating the carry (C) and overflow (V) flags? Some condition codes are still safe to use after those.
::: js/src/ion/arm/Assembler-arm.h
@@ +99,5 @@
> +static const FloatRegister d11 = {JSC::ARMRegisters::d11};
> +static const FloatRegister d12 = {JSC::ARMRegisters::d12};
> +static const FloatRegister d13 = {JSC::ARMRegisters::d13};
> +static const FloatRegister d14 = {JSC::ARMRegisters::d14};
> +static const FloatRegister d15 = {JSC::ARMRegisters::d15};
Consider including D16-D31. (Not necessarily for the first port, though.)
@@ +115,5 @@
> + enum RegType {
> + Double,
> + Single,
> + UInt,
> + Int
Since UInt and Int are indistinguishable from the public interface, why separate them now?
@@ +116,5 @@
> + Double,
> + Single,
> + UInt,
> + Int
> + };
Since this enum goes into a bitfield, give each element an explicit value and add a comment to make sure it doesn't get extended past the bitfield size.
@@ +121,5 @@
> +private:
> + RegType kind:2;
> + // ARM doesn't have more than 32 registers...
> + // don't take more bits than we'll need.
> + int _code:5;
This isn't enough to address D16-D31. That doesn't matter if you don't support them, but you mention the lack of S registers for D16-D31 in singleOverlay.
@@ +146,5 @@
> + VFPRegister singleOverlay() {
> + if (kind == Double) {
> + // There are no corresponding float registers for d16-d31
> + ASSERT(_code < 16);
> + return VFPRegister(_code << 1, Double);
Does it matter that a double actually aliases two single-precision registers?
@@ +180,5 @@
> +enum Index {
> + Offset = 0 << 21 | 1<<24,
> + PreIndex = 1<<21 | 1 << 24,
> + PostIndex = 0 << 21 | 0 << 24
> + // the docs were rather unclear on this. it sounds like 1<<21 | 0 << 24 encodes dtrt
Yep. LDRT isn't useful outside the kernel, ignore it.
Index = bit24, Wback = !bit24 || bit21
In practice, bit21 sets wback except for post-index, where wback is implicit and bit21 has special meaning.
@@ +185,3 @@
> };
>
> +// seriously, wtf arm
:-)
These could do with some simple comments with example instructions, like "LDR", "LDRD" and "ADD". It's not clear what 'EDTR' means here, for example.
@@ +195,3 @@
> };
>
> +enum IsImm_ {
IsOp2Imm or something similar might be nice. These are all instructions with the ALU "Operand 2" flexible operand.
@@ +203,5 @@
> + LSL = 0, // << 5
> + LSR = 1, // << 5
> + ASR = 2, // << 5
> + ROR = 3, // << 5
> + RRX = ROR // RRX is just ROR with a 0 offset.
RRX is _encoded as_ ROR [...]
The comment makes it sound like it's equivalent.
@@ +212,5 @@
> +struct ConditionCodes {
> + bool Zero : 1;
> + bool Overflow : 1;
> + bool Carry : 1;
> + bool Minus : 1;
I'd make this a single field with some inline getters, with the encoding matching that in the APSR.
@@ +268,5 @@
> + op_cmp = 0xa << 21,
> + op_teq = 0x9 << 21,
> + op_tst = 0x8 << 21,
> + op_movw,
> + op_movt,
Why don't MOVW and MOVT have encodings?
@@ +269,5 @@
> + op_teq = 0x9 << 21,
> + op_tst = 0x8 << 21,
> + op_movw,
> + op_movt,
> + op_invalid
I'd make op_invalid something like 0xff000000 to make sure it's an invalid instruction.
@@ +327,5 @@
> +// structures. The reason this is so horribly round-about is I wanted
> +// to have Imm8 and RegisterShiftedRegister inherit directly from Operand2
> +// but have all of them take up only a single word of storage.
> +// I also wanted to avoid passing around raw integers at all
> +// since they are error prone.
Oh, excellent. I like the system and the comment is really helpful.
@@ +332,5 @@
> +namespace datastore {
> + struct Reg {
> + // the "second register"
> + uint32 RM : 4;
> + // do we get another register for shifting
Yes, you get another register for shifting (Rs). It's another 4-bit field [11:8]. I'd make an entirely new struct for register-shifted-registers, and make RSR=0 for this one.
@@ +339,5 @@
> + ShiftType Type : 2;
> + // I'd like this to be a more sensible encoding, but that would
> + // need to be a struct and that would not pack :(
> + uint32 ShiftAmount : 5;
> + uint32 pad : 20;
Is the padding field necessary?
@@ +361,5 @@
> + uint32 buff : 19;
> + public:
> + uint32 invalid : 1;
> + uint32 toInt() {
> + return data | rot << 8;
Should you ASSERT(!invalid) at this point?
@@ +382,5 @@
> + Imm8Data(uint32 imm) : imm4L(imm&0xf), imm4H(imm>>4){
> + JS_ASSERT(imm <= 0xff);
> + }
> + };
> + // LDR/STR take an 8 bit offset, which is implicitly left shifted
VLDR/VSTR
@@ +392,5 @@
> + uint32 toInt() {
> + return data;
> + };
> + Imm8VFPOffData(uint32 imm) : data (imm) {
> + JS_ASSERT(imm <= (0xff << 2));
0xff<<2 won't fit into unsigned char. I'd just make 'data' a uint32 field. and ASSERT(!(data & ~(0xff<<2))).
@@ +409,5 @@
> + return imm4L | (imm4H << 16);
> + };
> + Imm8VFPImmData(uint32 imm) : imm4L(imm&0xf), imm4H(imm>>4){
> + JS_ASSERT(imm <= 0xff);
> + }
VFP immediates are only available to VFPv3, which means ARMv7. It might be worth asserting that in here somewhere.
@@ +418,5 @@
> + Imm12Data(uint32 imm) : data(imm) { JS_ASSERT(data == imm); }
> + uint32 toInt() { return data; }
> + };
> +
> + struct RSI {
What's this for? What does RSI mean? If it's "register-shifted immediate", then why does it take an immediate argument?
@@ +429,5 @@
> + struct RSR {
> + uint32 MustZero : 1;
> + // the register that holds the shift amount
> + uint32 RS : 4;
> + RSR(uint32 rs) : RS(rs) {}
I think you need an ASSERT(rs == RS) here.
@@ +458,5 @@
> +
> +class Imm8 : public Operand2 {
> +public:
> + static datastore::Imm8mData encodeImm(uint32 imm) {
> + // on arm, clz is undefined if you call it with 0.
The ARMv5T CLZ instruction returns 32 if you call it with 0. I suggest using that, rather than __builtin_clz (which is undefined for 0), but that requires a change in jsbit.h. (Note that __builtin_clz probably uses CLZ anyway.)
@@ +463,5 @@
> + if (imm == 0)
> + return datastore::Imm8mData(0, 0);
> + int left = js_bitscan_clz32(imm) & 30;
> + // if the first bit set
> + if (left >= 24)
"left >= 24" doesn't check the first bit. That would be "left >= 16", surely.
@@ +476,5 @@
> + return datastore::Imm8mData();
> + unsigned int mask = imm << (8 - right) | imm >> (24 + right);
> + if (mask <= 0xff)
> + return datastore::Imm8mData(mask, (8-right) >> 1);
> + return datastore::Imm8mData();
In the interest of getting this patch reviewed quickly, I've not studied how this bit works, but it has many conditional paths which all need testing as these routines can easily hide bugs in rarely-hit paths.
@@ +496,5 @@
> + : Operand2(datastore::Reg(rm.code(), type, 0, shiftImm.toInt())) {}
> + Op2Reg(Register rm, ShiftType type, datastore::RSR shiftReg)
> + : Operand2(datastore::Reg(rm.code(), type, 1, shiftReg.toInt())) {}
> +};
> +class O2RegShiftImm : public Op2Reg {
Regsiter-shifted immediate doesn't make sense. Is this an immediate-shifted register?
@@ +518,5 @@
> +// an offset from a register to be used for ldr/str. This should include
> +// the sign bit, since ARM has "signed-magnitude" offsets. That is it encodes
> +// an unsigned offset, then the instruction specifies if the offset is positive
> +// or negative. The +/- bit is necessary if the instruction set wants to be
> +// able to have a negative register offset e.g. ldr pc, [r1,-r2];
Good comment.
@@ +536,5 @@
> + { JS_ASSERT((imm < 4096) && (imm > -4096)); }
> +};
> +
> +class DtrOffReg : public DtrOff {
> + // don't call these unless you know what you're doing
Why not? What do they do that's unexpected?
@@ +631,5 @@
> + }
> + BOffImm(int offset) : data (offset >> 2 & 0x00ffffff) {
> + JS_ASSERT ((offset & 0x3) == 0);
> + JS_ASSERT (((offset & 0xff000000) == 0) ||
> + ((offset & 0xff000000) == 0xff000000));
The range here is wrong, in that it needs to be scaled. Actually I usually find it clearer to express the limits in unscaled decimal form (as in the ARM ARM), to save confusion: -33554432 to 33554428, inclusive.
@@ +639,5 @@
> + uint32 lower : 12;
> + uint32 pad : 4;
> + uint32 upper : 4;
> +public:
> + Imm16(unsigned int imm)
s/unsigned int/uint32/
(based on previous usage)
@@ +701,5 @@
> + GreaterThanOrEqual = JSC::ARMAssembler::GE,
> + LessThan = JSC::ARMAssembler::LT,
> + LessThanOrEqual = JSC::ARMAssembler::LE,
> + Overflow = JSC::ARMAssembler::VS,
> + Signed = JSC::ARMAssembler::MI,
I dislike the name "Signed". It really means "negative". A signed value can still be positive.
@@ +704,5 @@
> + Overflow = JSC::ARMAssembler::VS,
> + Signed = JSC::ARMAssembler::MI,
> + Zero = JSC::ARMAssembler::EQ,
> + NonZero = JSC::ARMAssembler::NE,
> + Always = JSC::ARMAssembler::AL,
There are inconsistencies here. GT,GE,LT and LE are signed comparisons, whilst HI,LS,CS and CC are unsigned. In our list, for some reasons, 'Below', 'Above' and 'AboveOrEqual' are unsigned whilst the rest are signed. Firstly, that's a strange difference in names that otherwise look equivalent ('Below' vs 'LessThan'), but more importantly 'Below' and 'BelowOrEqual' differ on this.
@@ +711,5 @@
> + GreaterEqual_Unordered = AboveOrEqual,
> + Unordered = Overflow,
> + NotUnordered = JSC::ARMAssembler::VC,
> + Greater_Unordered = Above
> + // there are more.
It would be cleaner to specify these independently of the integer comparisons.
@@ +828,5 @@
> + // write a blob of binary into the instruction stream
> + void writeBlob(uint32 x)
> + {
> + m_buffer.putInt(x);
> + }
writeWord or writeData (or indeed putInt) would be better names than writeBlob. writeBlob sounds like it ought to take a pointer and a length parameter.
@@ +833,5 @@
>
> public:
> void align(int alignment) {
> + while (!m_buffer.isAligned(alignment))
> + as_mov(r0, O2Reg(r0));
A 'nop' function would be preferable.
@@ +914,5 @@
> + // not quite ALU worthy, but useful none the less:
> + // these also have the isue of these being formatted
> + // completly differently from the standard ALU operations.
> + void as_movw(Register dest, Imm16 imm, Condition c = Always) {
> + writeBlob(0x03000000 | c | imm.toInt() | RD(dest));
ASSERT(hasMOVWT, or ARMv7 (or ARMv6T2 if we're getting technical))
@@ +921,5 @@
> + writeBlob(0x03400000 | c | imm.toInt() | RD(dest));
> + }
> + // data transfer instructions: ldr, str, ldrb, strb.
> + // using an int to differentiate between 8 bits and 32 bits is
> + // overkill, but meh
Probably not overkill. This will likely be inlined, so the int will disappear. Packing and unpacking code would probably not get optimized out. However it would be cleaner to have two separate functions: as_dtr32 and as_dtr8u, perhaps.
@@ +931,5 @@
> + RT(rt) | addr.toInt());
> + return;
> + }
> + // handles all of the other integral data transferring functions:
> + // ldrsb, ldrsh, ldrd, etc.
Comment that 'size' is in bits.
@@ +933,5 @@
> + }
> + // handles all of the other integral data transferring functions:
> + // ldrsb, ldrsh, ldrd, etc.
> + void as_extdtr(LoadStore ls, int size, bool IsSigned, Index mode,
> + Register rt, EDtrAddr addr, Condition c = Always)
LDRD and STRD have two source/destination registers, so I'd add as_dtr64 to handle those specially. The second register isn't encoded (in ARM code) but specifying it explicitly allows a handy assertion and makes it clear to calling code that two registers will be read or clobbered.
Also, again, calling code could be more readable if this were split into as_dtr8s, as_dtr16u, as_dtr16s and as_dtr64.
@@ +936,5 @@
> + void as_extdtr(LoadStore ls, int size, bool IsSigned, Index mode,
> + Register rt, EDtrAddr addr, Condition c = Always)
> + {
> + int op2;
> + int op1;
I'd call these 'base' and 'offset' to avoid confusion with 'Operand 2'.
@@ +937,5 @@
> + Register rt, EDtrAddr addr, Condition c = Always)
> + {
> + int op2;
> + int op1;
> + switch(size / 8) {
Why not just have "case 8", "case 16" and "case 64", rather than divide here?
@@ +939,5 @@
> + int op2;
> + int op1;
> + switch(size / 8) {
> + case 1:
> + JS_ASSERT(IsSigned);
Also ASSERT(ls != IsStore) as there is no STRSB. (It would be identical to STRB.)
@@ +944,4 @@
> break;
> + case 2:
> + //case 4:
> + // doesn't need to be signed.
It'd be clearer to have the opposite comment on LDRSB.
@@ +954,3 @@
> break;
> + case 8:
> + JS_ASSERT(IsSigned);
LDRD/STRD are neither signed nor unsigned as no extension is performed. IsSigned doesn't carry useful information in this case, so why not allow a semantically-correct LDRD of a uint64_t, for example?
@@ +981,5 @@
> + void as_bx(Register r, Condition c = Always)
> + {
> + writeBlob(((int) c) | op_bx | r.code());
> + }
> + // branch can branch to an immediate *or* to a register
B <imm> is relative, B <reg> is aboslute. This difference is obviously very important but may not be obvious to calling code.
@@ +996,4 @@
> } else {
> + // ugh.
> + int32 old = l->use(next.getOffset());
> + if (old == -1) {
If you're trying to check for an as-yet unused Label, compare with LabelBase::INVALID_OFFSET, not an explicit -1. That way, it's obvious what you're doing.
@@ +999,5 @@
> + if (old == -1) {
> + old = -4;
> + }
> + // this will fail if we couldn't actually
> + // encode the offset of the branch.
What happens then? Do we need a slow fall-back? Does this need a TODO?
@@ +1031,5 @@
> + void as_bl()
> + {
> + JS_NOT_REACHED("Feature NYI");
> + }
> + // bl can have a condition code, blx cannot.
BLX <reg> can be conditional, but BL <reg> cannot.
@@ +1056,5 @@
> +
> + // VFP instructions!
> + // Float only variants
> + void as_vfp_float(VFPRegister vd, VFPRegister vn, VFPRegister vm,
> + VFPOp op, Condition c = Always)
What's this function for?
@@ +1090,5 @@
> + void as_vnmls(VFPRegister vd, VFPRegister vn, VFPRegister vm,
> + Condition c = Always)
> + {
> + JS_NOT_REACHED("Feature NYI");
> + }
Why VNMUL,VNMLA and VNMLS, but not VMLA and VMLS? Is it just an incomplete list?
@@ +1120,5 @@
> + * and vfp registers are passed in based on their type, and src/dest is
> + * determined by the float2core.
> + */
> + void as_vxfer(Register vt1, Register vt2, VFPRegister vn, FloatToCore_ f2c,
> + Condition c = Always)
Why not overload as_vmov to keep the naming consistent with the assembly mnemonics?
@@ +1147,5 @@
> + /*also has update conditions*/Condition c = Always)
> + {
> + JS_NOT_REACHED("Feature NYI");
> + }
> + void as_vimm(VFPRegister vd /*, fimm8 imm*/)
This is another vmov variant.
@@ +1227,4 @@
> }
>
> + void as_bkpt() {
> + writeBlob(0xe1200070);
Be sure that you actually want a real breakpoint. They don't break in GDB in any useful fashion. I find the following more useful for debugging: "LDR ip, [ip, -ip]"
@@ +1431,5 @@
> + case 3:
> + *out = r3;
> + return true;
> + case 4:
> + *out = r4;
No, argument registers are r0-r3. (If this isn't for the EABI calling convention, add a comment to make that clear.)
::: js/src/ion/arm/CodeGenerator-arm.cpp
@@ +113,5 @@
> LBlock *ifTrue = test->ifTrue()->lir();
> LBlock *ifFalse = test->ifFalse()->lir();
>
> // Test the operand
> + masm.ma_tst(ToRegister(opd), ToRegister(opd));
ma_cmp would be preferable.
@@ +337,5 @@
>
> bool
> +CodeGeneratorARM::visitDivI(LDivI *ins)
> +{
> + JS_NOT_REACHED("codegen for DIVI NYI");
On cores without UDIV/SDIV, integer division is best handled by a call to a helper function,
@@ +558,2 @@
> default:
> JS_NOT_REACHED("unexpected opcode");
Do we really only support ADD and MUL?
@@ +586,5 @@
> + // then it was clamped to INT_MIN/INT_MAX, and we can test it.
> + // NOTE: if the value really was supposed to be INT_MAX / INT_MIN
> + // then it will be wrong.
> + // 2) convert the floating point value to an integer, if it did not fit,
> + // then it set one or two bits in the fpcsr. Check those.
Option 1 is probably the fastest here. Option 2 catches INT_MAX / INT_MIN but I don't think we care too much about those. Option 3 is probably the slowest because it involves several VFP operations with dependencies.
@@ +596,1 @@
> // "x86-only"
That's probably in the wrong place.
::: js/src/ion/arm/IonFrames-arm.h
@@ +52,5 @@
> + protected:
> + void *returnAddress_;
> + uintptr_t sizeDescriptor_;
> + void *calleeToken_;
> + void *padding;
I assume the padding is there to ensure 8-byte alignment. Can you add a comment to explain that please?
::: js/src/ion/arm/Lowering-arm.cpp
@@ +167,5 @@
> LSnapshot *snapshot = LSnapshot::New(gen, lastResumePoint_);
> if (!snapshot)
> return false;
>
> + for (size_t i = 0; i < lastResumePoint_->numOperands(); i++) {
Incorrect indentation.
@@ +268,5 @@
> +{
> + ins->setOperand(0, useRegister(lhs));
> +
> + // shift operator should be constant or in register ecx
> + // x86 can't shift a non-ecx register
x86 comments in an ARM file.
::: js/src/ion/arm/MacroAssembler-arm.h
@@ +47,5 @@
> namespace js {
> namespace ion {
>
> +static Register CallReg = ip;
> +// this is grabbing the arm assembler
What is? This comment isn't clear.
@@ +237,5 @@
> + ma_alu(InvalidReg, imm, dest, op_mov, sc, c);
> + }
> + void ma_mov(const ImmGCPtr &ptr, Register dest) {
> + ma_mov(Imm32(ptr.value), dest);
> + JS_NOT_REACHED("todo:make gc more sane.");
What does this have to do with GC?
@@ +293,5 @@
> + ma_alu(src1, imm, dest, op_and, sc, c);
> + }
> +
> +
> + // bit clear (dest <- src1 & ~src2)
Your comment doesn't match the prototype. You have two sources but the function uses dest as a source.
@@ +622,2 @@
> }
> void checkCallAlignment() {
Make this #ifdef DEBUG to ensure that we don't emit these checks for release code.
@@ +709,2 @@
> JS_ASSERT(dest != ScratchFloatReg);
> + // Implicitly assume that we have access to VFPv2
The comment isn't worthwhile.
::: js/src/ion/arm/MoveEmitter-arm.cpp
@@ +95,5 @@
> MoveEmitterARM::doubleSpillSlot() const
> {
> + int offset = masm.framePushed() - pushedAtCycle_;
> + JS_ASSERT(offset < 4096 && offset > -4096);
> + // NOTE: this isn't correct. Since we're spilling a double
Put in an ASSERT_NOT_REACHED or something like that. Otherwise, there's a risk that the broken code will get left behind, particularly as some offsets will still work anyway.
@@ +136,1 @@
> // random, and if it ends up being bad, we can use actual heuristics later.
I think that might end up being bad. It'll do for now, though.
@@ +157,5 @@
> if (pushedAtDoubleSpill_ == -1) {
> masm.reserveStack(sizeof(double));
> pushedAtDoubleSpill_ = masm.framePushed();
> }
> + //masm.ma_vstr(spilledFloatReg_, doubleSpillSlot().EDTRAddr());
Why is this commented out? Isn't that broken?
@@ +319,2 @@
> if (pushedAtSpill_ != -1 && spilledReg_ != InvalidReg)
> + masm.ma_ldr(spillSlot().toDTRAddr(), spilledReg_);
Curly braces.
::: js/src/ion/arm/Trampoline-arm.cpp
@@ +74,3 @@
> * =r0 =r1 =r2 =r3
> * ...using standard cdecl calling convention
> * ...cdecl has no business existing on ARM
CDECL isn't a term that's used in ARM (as far as I've seen).
Note that 'this' takes the first argument in C++ member functions. I assume that's not important here.
@@ +127,2 @@
>
> // loop over arguments, copying them over
Why do we do this? Can't we access the arguments in-place, above the preserved registers?
@@ +194,2 @@
> for (uint32 i = 0; i < Registers::Total; i++)
> + masm.transferReg(Register::FromCode(i));
From the PUSH documentation:
The SP and PC can be in the list in ARM instructions, but not in Thumb instructions. However:
* ARM deprecates the use of ARM instructions that include the SP or the PC in the list
* if the SP is in the list, and it is not the lowest-numbered register in the list, the instruction stores an unknown value for the SP.
@@ +225,5 @@
> + masm.as_sub(sp, sp, Imm8(4));
> +
> + // set the old (4-byte aligned) value of the sp as the first argument
> + masm.setABIArg(0, r0);
> +
Why not just push a dummy register along with the frameClass value? The following sequence will suffice, if I've understood correctly:
PUSH {r4,ip} // r4 (frameClass) goes into the _lower_ address here.
MOV r0, sp
Attachment #563692 -
Attachment is obsolete: false
Assignee | ||
Comment 7•13 years ago
|
||
(In reply to Jacob Bramley [:jbramley] from comment #6)
> > I'd like to have consistency between all of the different methods of
> > encoding fields in instructions, but ended up leaving some fields as enums
> > that don't have trailing zeros, and are assigned into a bitfield that lives
> > in the correct place, and leaving some fields as enums that have some absurd
> > number of trailing zeros so they can simply be or'ed into the final constant
>
> I'm not sure of the boundaries between ARMAssembler and Assembler-arm.
> There's a lot of low-level assembly going on outside of ARMAssembler. What's
> going on there? Are we replacing Nitro?
>
That is my plan. If I've done this correctly, There shouldn't be any references to
Nitro other than the register definitions.
> I very much like the enums with all combinations, like "WriteBack" and
> "NoWriteBack" even if one is 0. This makes code much clearer.
>
> You seem to use toInt() to return some encoding that can be combined into an
> instruction using |. I'd like to suggest using "encode()" or something
> similar, to avoid confusion with immediate integers and suchlike.
>
> Just like Jaeger Monkey, there is a lot of code in header files. This makes
> for a very painful build process, particularly as these headers tend to end
> up getting #included quite heavily. Is it possible to move more code into
> CPP files? This was a source of great pain in Jaeger Monkey.
>
I'll do that after I get this landed. I'm curious about how much gets inlined, and
how much that impacts performance (and start time/executable size)
> Since it looks like we now have an ARM-specific macro assembler interface,
> is it feasible to give it a destination-first argument order? This would
> match the assembler as well as ARM assembly conventions.
>
The arm-specific macro assembler was quite an afterthought. I'll swizzle the arguments
around when I move function bodies to .cpp files.
> I appreciate all the comments, but they could do with capital letters and
> suchlike. They would then be much easier to read.
>
I'll try. I may miss some.
> Finally, I've looked at the ARM code, but I'm not very familiar with the way
> things are done in IM so it was hard to see if some parts were correct or
> not.
>
> ::: js/src/ion/IonFrames.h
> @@ +61,5 @@
> > // The layout of an Ion frame on the C stack:
> > // argN _
> > // ... \ - These are jsvals
> > // arg0 /
> > // this _/
>
> ARM has padding here, too. It's worth mentioning here to avoid confusion.
>
> ::: js/src/ion/arm/Architecture-arm.h
> @@ +117,5 @@
> > + static const uint32 JSCallClobberMask =
> > + (1 << JSC::ARMRegisters::r0) |
> > + (1 << JSC::ARMRegisters::r1) |
> > + (1 << JSC::ARMRegisters::r2) |
> > + (1 << JSC::ARMRegisters::r3);
>
> Does it matter that IP and LR are also caller-saved?
>
tbh, I'm not sure what these masks are used for.
> @@ +153,2 @@
> > };
> > +bool hasARMv7();
>
> Two points:
> * I can't find an implementation of this. Am I missing something obvious?
Whoops. My bad. I'd moved that into a new file that was not included due to a lack of hg add.
fyi, its implementation is "return true;"
> * I doubt that this mechanism the resolution that you want. I'd prefer
> something like hasMOVWT(), for example. That allows elegant extension for
> optional ARMv7 features, like UDIV/SDIV (hardware integer divide) or
> VFPv3-D32 (extra VFP registers).
>
This sounds like a good idea.
> ::: js/src/ion/arm/Assembler-arm.cpp
> @@ +60,3 @@
> > for (size_t i = 0; i < jumps_.length(); i++) {
> > RelativePatch &rp = jumps_[i];
> > + JS_NOT_REACHED("too lazy");
>
> Flag these with TODO or something similar, so they don't get left behind. In
> six months' time, some "too lazy" assertion won't make a lot of sense when
> the rest of the engine is complete.
>
yeah. "too lazy" was actually my response to "what message should I use to indicate this hasn't been implemented?"
"eh, I'm too lazy to think of one"
thankfully, sed exists.
> @@ +135,5 @@
> > {
> > + ASSERT(m_buffer.sizeOfConstantPool() == 0);
> > + memcpy(buffer, m_buffer.data(), m_buffer.size());
> > + // whoops, can't call this; it is ARMAssembler-specific.
> > + //fixUpOffsets(buffer);
>
> It might be ARMAssembler-specific, but it's really important. It's used for
> patching relative jumps. Nasty things will happen if you don't call it.
>
I agree. Right now those aren't actually generated, so it is a moot point.
I will add a comment reminding me to re-implement/copy that function when things
inevitably break.
> @@ +158,5 @@
> > }
> > }
> >
> > Assembler::Condition
> > Assembler::inverseCondition(Condition cond)
>
> Tip: ARM condition codes are encoded in four bits. The least significant of
> those bits can be inverted to give you the opposite condition. This is true
> for all conditions except 'AL' (always), which you should assert for anyway.
>
> I'm not sure if you can make use of that here, since the Condition type is
> generic, but it does make things simpler.
>
> There's a complete table of ARM condition codes here:
> blogs.arm.com/software-enablement/206-condition-codes-1-condition-flags-and-
> codes/
>
yeah. I'm actually curious if gcc could do that optimization based on the switch.
> @@ +272,5 @@
> > + imm2 = ((imm >> (32 - imm2shift)) | (imm << imm2shift)) & 0xff;
> > + return TwoImm8mData(datastore::Imm8mData(imm1,imm1shift),
> > + datastore::Imm8mData(imm2, imm2shift));
> > + }
> > +}
>
> I've not reviewed this function thoroughly, though it looks roughly sane. If
> you haven't done it already, it needs independent testing, at least once to
> verify correctness. There are many corner-cases and assessing correctness by
> review is hard and error-prone.
>
Yup. Tested against your function in just over 4 billion cases.
I should probably integrate this with make test
> @@ +319,5 @@
> > +
> > +bool js::ion::condsAreSafe(ALUOp op) {
> > + // even when we are setting condition codes, sametimes the
> > + // codes only depend on the value of the result (N,Z)
> > + // other times, (such as add)
>
> This comment doesn't make the purpose of the function clear. Is it something
> to do with logical operations not updating the carry (C) and overflow (V)
> flags? Some condition codes are still safe to use after those.
>
will fix. The idea is if we logically want to have adds r0, r1, 0xffff,
and the condition being checked is V, we can't break it into adds r0, r1, 0xff00; adds r0, r0, 0xff
since they set condition codes differently; the value of the condition code does not depend solely on the
final value stored into r0, it depends on the whole operation.
> @@ +115,5 @@
> > + enum RegType {
> > + Double,
> > + Single,
> > + UInt,
> > + Int
>
> Since UInt and Int are indistinguishable from the public interface, why
> separate them now?
>
That is purely for vcvt, where it can do a signed or unsigned conversion.
granted, we will likely never generate an unsigned conversion, but if we can think
of some way of using it, this will make our lives easier. I can add a comment to this effect
if you'd like.
> @@ +116,5 @@
> > + Double,
> > + Single,
> > + UInt,
> > + Int
> > + };
>
> Since this enum goes into a bitfield, give each element an explicit value
> and add a comment to make sure it doesn't get extended past the bitfield
> size.
>
iirc, gcc warns if the bitfield is not large enough to hold the enum, but
explicitly naming them is probably a good idea.
> @@ +121,5 @@
> > +private:
> > + RegType kind:2;
> > + // ARM doesn't have more than 32 registers...
> > + // don't take more bits than we'll need.
> > + int _code:5;
>
> This isn't enough to address D16-D31. That doesn't matter if you don't
> support them, but you mention the lack of S registers for D16-D31 in
> singleOverlay.
So it isn't.
>
> @@ +146,5 @@
> > + VFPRegister singleOverlay() {
> > + if (kind == Double) {
> > + // There are no corresponding float registers for d16-d31
> > + ASSERT(_code < 16);
> > + return VFPRegister(_code << 1, Double);
>
> Does it matter that a double actually aliases two single-precision registers?
There are probably some circumstances that it does. Their counterparts in nitro's assembler
are used for double -> single and single -> double conversions. to convert r1 to d4, we
generate vmov s8, r1; vcvt.f64.i32 d4, s8, and to get s8 from d4, singleOverlay() is called.
> @@ +268,5 @@
> > + op_cmp = 0xa << 21,
> > + op_teq = 0x9 << 21,
> > + op_tst = 0x8 << 21,
> > + op_movw,
> > + op_movt,
>
> Why don't MOVW and MOVT have encodings?
They shouldn't be in that enum, since they don't fit
the encoding of all of the other ALU operations. I'd initially included them
by scanning down the list in the nitro assembler.
> @@ +332,5 @@
> > +namespace datastore {
> > + struct Reg {
> > + // the "second register"
> > + uint32 RM : 4;
> > + // do we get another register for shifting
>
> Yes, you get another register for shifting (Rs). It's another 4-bit field
that was more of "this bit is answering said question"
> [11:8]. I'd make an entirely new struct for register-shifted-registers, and
> make RSR=0 for this one.
>
I suspect the "must be 0" part, no matter how emphatic, has been deprecated.
I will look into it and remove it from existence if necessary.
> @@ +339,5 @@
> > + ShiftType Type : 2;
> > + // I'd like this to be a more sensible encoding, but that would
> > + // need to be a struct and that would not pack :(
> > + uint32 ShiftAmount : 5;
> > + uint32 pad : 20;
>
> Is the padding field necessary?
>
I'd like the operation to copy them out to be "transfer 32 bits out of this structure"
(previously done with memcpy), and those bits won't be 0 if I don't explicitly specify
a field for them and zero them.
> @@ +361,5 @@
> > + uint32 buff : 19;
> > + public:
> > + uint32 invalid : 1;
> > + uint32 toInt() {
> > + return data | rot << 8;
>
> Should you ASSERT(!invalid) at this point?
>
Yes!
> @@ +418,5 @@
> > + Imm12Data(uint32 imm) : data(imm) { JS_ASSERT(data == imm); }
> > + uint32 toInt() { return data; }
> > + };
> > +
> > + struct RSI {
>
> What's this for? What does RSI mean? If it's "register-shifted immediate",
> then why does it take an immediate argument?
>
It is the immediate in a register shifted immediate. I couldn't think of a better name
specifically for the immediate.
> @@ +458,5 @@
> > +
> > +class Imm8 : public Operand2 {
> > +public:
> > + static datastore::Imm8mData encodeImm(uint32 imm) {
> > + // on arm, clz is undefined if you call it with 0.
>
> The ARMv5T CLZ instruction returns 32 if you call it with 0. I suggest using
> that, rather than __builtin_clz (which is undefined for 0), but that
> requires a change in jsbit.h. (Note that __builtin_clz probably uses CLZ
> anyway.)
>
Yes I'm hoping that it always compiles to a single CLZ on recent enough arms.
The only reason this should be faster is that CLZ should be a single cycle (or maybe 2).
> @@ +463,5 @@
> > + if (imm == 0)
> > + return datastore::Imm8mData(0, 0);
> > + int left = js_bitscan_clz32(imm) & 30;
> > + // if the first bit set
> > + if (left >= 24)
>
> "left >= 24" doesn't check the first bit. That would be "left >= 16", surely.
not too sure what that comment was supposed to mean, but it is checking to see if it is
a "simple" case that can be encoded with a rotate of 0.
> @@ +476,5 @@
> > + return datastore::Imm8mData();
> > + unsigned int mask = imm << (8 - right) | imm >> (24 + right);
> > + if (mask <= 0xff)
> > + return datastore::Imm8mData(mask, (8-right) >> 1);
> > + return datastore::Imm8mData();
>
> In the interest of getting this patch reviewed quickly, I've not studied how
> this bit works, but it has many conditional paths which all need testing as
> these routines can easily hide bugs in rarely-hit paths.
>
This function was tested at the same time that I tested encodeTwoImms (all 2^32 integers)
> @@ +496,5 @@
> > + : Operand2(datastore::Reg(rm.code(), type, 0, shiftImm.toInt())) {}
> > + Op2Reg(Register rm, ShiftType type, datastore::RSR shiftReg)
> > + : Operand2(datastore::Reg(rm.code(), type, 1, shiftReg.toInt())) {}
> > +};
> > +class O2RegShiftImm : public Op2Reg {
>
> Regsiter-shifted immediate doesn't make sense. Is this an immediate-shifted
> register?
>
Yes. This was me attempting to manually namespace by suffixing the more general case
with any relevant specifics. I read it as "Operand 2, Register, Shifted by an Immediate".
I'm open to suggestions for a better naming scheme.
> @@ +536,5 @@
> > + { JS_ASSERT((imm < 4096) && (imm > -4096)); }
> > +};
> > +
> > +class DtrOffReg : public DtrOff {
> > + // don't call these unless you know what you're doing
>
> Why not? What do they do that's unexpected?
>
Nothing really, Just they take arguments that aren't easy to be generated, and the intention is that code
that isn't in a subclasses' constructor shouldn't construct these. I think I wrote that while writing all of
these classes, and calling the wrong (similar looking) constructors.
> @@ +701,5 @@
> > + GreaterThanOrEqual = JSC::ARMAssembler::GE,
> > + LessThan = JSC::ARMAssembler::LT,
> > + LessThanOrEqual = JSC::ARMAssembler::LE,
> > + Overflow = JSC::ARMAssembler::VS,
> > + Signed = JSC::ARMAssembler::MI,
>
> I dislike the name "Signed". It really means "negative". A signed value can
> still be positive.
>
Those names are for compatibility with x86/indep code. (names and order copied from Assembler-x86-shared.h)
> @@ +933,5 @@
> > + }
> > + // handles all of the other integral data transferring functions:
> > + // ldrsb, ldrsh, ldrd, etc.
> > + void as_extdtr(LoadStore ls, int size, bool IsSigned, Index mode,
> > + Register rt, EDtrAddr addr, Condition c = Always)
>
> LDRD and STRD have two source/destination registers, so I'd add as_dtr64 to
> handle those specially. The second register isn't encoded (in ARM code) but
> specifying it explicitly allows a handy assertion and makes it clear to
> calling code that two registers will be read or clobbered.
>
> Also, again, calling code could be more readable if this were split into
> as_dtr8s, as_dtr16u, as_dtr16s and as_dtr64.
>
I'll write those as wrappers. This was my intention from the beginning.
> @@ +1056,5 @@
> > +
> > + // VFP instructions!
> > + // Float only variants
> > + void as_vfp_float(VFPRegister vd, VFPRegister vn, VFPRegister vm,
> > + VFPOp op, Condition c = Always)
>
> What's this function for?
>
All of the 3-register vfp operations like vadd, vsub, etc. It is meant to be analogous to as_alu.
Any name with "alu" in it didn't really sound right.
> @@ +1120,5 @@
> > + * and vfp registers are passed in based on their type, and src/dest is
> > + * determined by the float2core.
> > + */
> > + void as_vxfer(Register vt1, Register vt2, VFPRegister vn, FloatToCore_ f2c,
> > + Condition c = Always)
>
> Why not overload as_vmov to keep the naming consistent with the assembly
> mnemonics?
>
The massive overloading of vmov is one of the things that I like the least about UAL. When actually writing
assembly, It is somewhat easy to determine what the instruction is doing based on the "sigils", (r)0, (s)2, (d)3,
etc. However, when writing c++ code, it is not always as obvious. Also because (as far as I can tell), the encodings for the variants of vmov are not encoded in similar ways.
> @@ +1227,4 @@
> > }
> >
> > + void as_bkpt() {
> > + writeBlob(0xe1200070);
>
> Be sure that you actually want a real breakpoint. They don't break in GDB in
> any useful fashion. I find the following more useful for debugging: "LDR ip,
> [ip, -ip]"
>
Is that what gdb uses for breakpoints? I don't think that was used in nitro. In fact, I'm pretty sure I just
copied the constant from nitro.
> @@ +1431,5 @@
> > + case 3:
> > + *out = r3;
> > + return true;
> > + case 4:
> > + *out = r4;
>
> No, argument registers are r0-r3. (If this isn't for the EABI calling
> convention, add a comment to make that clear.)
Can you tell that this code is untested?
there are a bunch of other quite embarrassing bugs that I've found through testing.
>
> ::: js/src/ion/arm/CodeGenerator-arm.cpp
> @@ +113,5 @@
> > LBlock *ifTrue = test->ifTrue()->lir();
> > LBlock *ifFalse = test->ifFalse()->lir();
> >
> > // Test the operand
> > + masm.ma_tst(ToRegister(opd), ToRegister(opd));
>
> ma_cmp would be preferable.
>
> @@ +337,5 @@
> >
> > bool
> > +CodeGeneratorARM::visitDivI(LDivI *ins)
> > +{
> > + JS_NOT_REACHED("codegen for DIVI NYI");
>
> On cores without UDIV/SDIV, integer division is best handled by a call to a
> helper function,
>
I was planning on generating it inline, and falling back to generating a function if there are enough occurances
of it in the function.
> @@ +558,2 @@
> > default:
> > JS_NOT_REACHED("unexpected opcode");
>
> Do we really only support ADD and MUL?
>
we support MUL?
> @@ +586,5 @@
> > + // then it was clamped to INT_MIN/INT_MAX, and we can test it.
> > + // NOTE: if the value really was supposed to be INT_MAX / INT_MIN
> > + // then it will be wrong.
> > + // 2) convert the floating point value to an integer, if it did not fit,
> > + // then it set one or two bits in the fpcsr. Check those.
>
> Option 1 is probably the fastest here. Option 2 catches INT_MAX / INT_MIN
> but I don't think we care too much about those. Option 3 is probably the
> slowest because it involves several VFP operations with dependencies.
>
I think at least one of those comments is actually describing a different function.
> @@ +596,1 @@
> > // "x86-only"
>
> That's probably in the wrong place.
Sadly, no. I initially generated all of this by copying x86-shared and x86 into the arm directory.
that is the boundary between the shared and non-shared sections. I've been attempting to use these markers
to keep track of what file the code originated from. I'll remove them eventually, but for now they are still useful.
>
> ::: js/src/ion/arm/Lowering-arm.cpp
> @@ +167,5 @@
> > LSnapshot *snapshot = LSnapshot::New(gen, lastResumePoint_);
> > if (!snapshot)
> > return false;
> >
> > + for (size_t i = 0; i < lastResumePoint_->numOperands(); i++) {
>
> Incorrect indentation.
>
dvander already caught that one :)
>
> ::: js/src/ion/arm/MacroAssembler-arm.h
> @@ +47,5 @@
> > namespace js {
> > namespace ion {
> >
> > +static Register CallReg = ip;
> > +// this is grabbing the arm assembler
>
> What is? This comment isn't clear.
>
It is not. That was supposed to mean the following class MacroAssemblerARM : Assembler is
subclassing from Assembler defined in Assembler-arm, rather than anyplace else
> @@ +237,5 @@
> > + ma_alu(InvalidReg, imm, dest, op_mov, sc, c);
> > + }
> > + void ma_mov(const ImmGCPtr &ptr, Register dest) {
> > + ma_mov(Imm32(ptr.value), dest);
> > + JS_NOT_REACHED("todo:make gc more sane.");
>
> What does this have to do with GC?
ImmGCPtr is a pointer that the gc needs to know about through some means. I don't know what
that is going to be for ARM yet.
>
> @@ +622,2 @@
> > }
> > void checkCallAlignment() {
>
> Make this #ifdef DEBUG to ensure that we don't emit these checks for release
> code.
I think this is only called from within #ifdef DEBUG.
>
> @@ +136,1 @@
> > // random, and if it ends up being bad, we can use actual heuristics later.
>
> I think that might end up being bad. It'll do for now, though.
I think it is actually completely wrong. r12 is being grabbed as a scratch register.
we shouldn't be pushing to the stack at all. We should just use the scratch register
as a scratch register to break cycles.
>
> @@ +157,5 @@
> > if (pushedAtDoubleSpill_ == -1) {
> > masm.reserveStack(sizeof(double));
> > pushedAtDoubleSpill_ = masm.framePushed();
> > }
> > + //masm.ma_vstr(spilledFloatReg_, doubleSpillSlot().EDTRAddr());
>
> Why is this commented out? Isn't that broken?
>
It is broken, but ma_vstr isn't implemented yet either.
the issue was that it says EDTRAddr, which is for the extended dtr core instructions, not for the
vfp load/store instructions. When I'd realized that those were completely unhandled, I decided to fix it later
JS_NOT_REACHED will be inserted.
> @@ +319,2 @@
> > if (pushedAtSpill_ != -1 && spilledReg_ != InvalidReg)
> > + masm.ma_ldr(spillSlot().toDTRAddr(), spilledReg_);
>
> Curly braces.
>
> ::: js/src/ion/arm/Trampoline-arm.cpp
> @@ +74,3 @@
> > * =r0 =r1 =r2 =r3
> > * ...using standard cdecl calling convention
> > * ...cdecl has no business existing on ARM
>
> CDECL isn't a term that's used in ARM (as far as I've seen).
>
> Note that 'this' takes the first argument in C++ member functions. I assume
> that's not important here.
>
Oh, what is the name of the ARM calling convention. Is it just "the EABI calling convention"?
> @@ +127,2 @@
> >
> > // loop over arguments, copying them over
>
> Why do we do this? Can't we access the arguments in-place, above the
> preserved registers?
>
this is looping over the arguments in the argv vector, copying them into the stack frame of the Ion function
that is being called (as per the ion abi). I'll update the comment.
> @@ +194,2 @@
> > for (uint32 i = 0; i < Registers::Total; i++)
> > + masm.transferReg(Register::FromCode(i));
>
> From the PUSH documentation:
>
> The SP and PC can be in the list in ARM instructions, but not in Thumb
> instructions. However:
> * ARM deprecates the use of ARM instructions that include the SP or the
> PC in the list
> * if the SP is in the list, and it is not the lowest-numbered register in
> the list, the instruction stores an unknown value for the SP.
>
I believe the stack is pushed so that we have a one to one mapping between locations on the stack and
register numbers (if we need to reference a value stored in r14, it should be located at regs[14], not regs[13])
pc was then pushed so I didn't have to worry about fixing alignment later. As long as demons won't run out my nose, I'm okay with this.
> @@ +225,5 @@
> > + masm.as_sub(sp, sp, Imm8(4));
> > +
> > + // set the old (4-byte aligned) value of the sp as the first argument
> > + masm.setABIArg(0, r0);
> > +
>
> Why not just push a dummy register along with the frameClass value? The
> following sequence will suffice, if I've understood correctly:
>
> PUSH {r4,ip} // r4 (frameClass) goes into the _lower_ address here.
> MOV r0, sp
this would have to be ADD r0, sp, 4. For the most part, I was concerned with getting this written.
I'll experiment with a bunch of variants later. Also, r4 would need to go into the higher address, since
that is where it is going currently. That being said, I haven't tested the bailout code yet, so I'll wait until I know it has a chance of being correct.
Assignee | ||
Comment 8•13 years ago
|
||
I suspect that there are few nits about comments that haven't been addressed (capitalization, punctuation, etc.), but all other things should have been fixed, or addressed in my response to your comment. I'm mostly flagging you for review to ensure that everything got fixed up, and possibly to push. Evidently, I should be able to push with level 1 commit access, but if that doesn't actually work, then a push should be useful.
Attachment #565095 -
Flags: review?(Jacob.Bramley)
Comment 9•13 years ago
|
||
Sorry for the length of these replies. Long patches demand
proportionally long responses! Most changes are good but I think one or
two are missing. (Maybe a missing qrefresh or something.)
(In reply to Marty Rosenberg [:Marty] from comment #7)
> (In reply to Jacob Bramley [:jbramley] from comment #6)
>
> yeah. "too lazy" was actually my response to "what message should I use to
> indicate this hasn't been implemented?"
> "eh, I'm too lazy to think of one"
> thankfully, sed exists.
I think you missed a few.
> > @@ +135,5 @@
> > > {
> > > + ASSERT(m_buffer.sizeOfConstantPool() == 0);
> > > + memcpy(buffer, m_buffer.data(), m_buffer.size());
> > > + // whoops, can't call this; it is ARMAssembler-specific.
> > > + //fixUpOffsets(buffer);
> >
> > It might be ARMAssembler-specific, but it's really important. It's used for
> > patching relative jumps. Nasty things will happen if you don't call it.
> >
> I agree. Right now those aren't actually generated, so it is a moot point.
> I will add a comment reminding me to re-implement/copy that function when
> things
> inevitably break.
I don't see a comment.
> > Tip: ARM condition codes are encoded in four bits. The least significant of
> > those bits can be inverted to give you the opposite condition. This is true
> > for all conditions except 'AL' (always), which you should assert for anyway.
> >
> yeah. I'm actually curious if gcc could do that optimization based on the
> switch.
Probably not, but it's worth trying it to find out. The 'default' case
might throw it off course, and it'd be hard to convey to GCC that you
don't need a special case for AL (other than an assertion).
> > @@ +319,5 @@
> > > +
> > > +bool js::ion::condsAreSafe(ALUOp op) {
> > > + // even when we are setting condition codes, sametimes the
> > > + // codes only depend on the value of the result (N,Z)
> > > + // other times, (such as add)
> >
> > This comment doesn't make the purpose of the function clear. Is it something
> > to do with logical operations not updating the carry (C) and overflow (V)
> > flags? Some condition codes are still safe to use after those.
> >
> will fix.
The comment hasn't changed. It just looks like the end of it is missing.
> > @@ +115,5 @@
> > > + enum RegType {
> > > + Double,
> > > + Single,
> > > + UInt,
> > > + Int
> >
> > Since UInt and Int are indistinguishable from the public interface, why
> > separate them now?
> >
> That is purely for vcvt, where it can do a signed or unsigned conversion.
> granted, we will likely never generate an unsigned conversion, but if we can
> think
> of some way of using it, this will make our lives easier. I can add a
> comment to this effect
> if you'd like.
Oh, I see. In that case, perhaps the following scheme would be clearer:
enum { ... UInt, SInt }
bool isInt() {return UInt | SInt;}
bool isSInt() {return SInt;}
bool isUInt() {return UInt;}
That way, we can use isInt if we don't care and an explicit signed or
unsigned variant if we do.
> > @@ +121,5 @@
> > > +private:
> > > + RegType kind:2;
> > > + // ARM doesn't have more than 32 registers...
> > > + // don't take more bits than we'll need.
> > > + int _code:5;
> >
> > This isn't enough to address D16-D31. That doesn't matter if you don't
> > support them, but you mention the lack of S registers for D16-D31 in
> > singleOverlay.
> So it isn't.
It hasn't changed. I don't think it really matters too much for now
anyway, but did you mean to change it?
> > Yes, you get another register for shifting (Rs). It's another 4-bit field
> that was more of "this bit is answering said question"
Oh, sorry, I misread it I think.
> > @@ +418,5 @@
> > > + Imm12Data(uint32 imm) : data(imm) { JS_ASSERT(data == imm); }
> > > + uint32 toInt() { return data; }
> > > + };
> > > +
> > > + struct RSI {
> >
> > What's this for? What does RSI mean? If it's "register-shifted immediate",
> > then why does it take an immediate argument?
> >
> It is the immediate in a register shifted immediate. I couldn't think of a
> better name
> specifically for the immediate.
How about RIS? (Register, Immediate-Shifted.) This also implies RRS.
Normally the difference would be trivial, but the ARM documentation
talks a lot about register-shifted-immediates and
register-shifted-registers.
> > The ARMv5T CLZ instruction returns 32 if you call it with 0. I suggest using
> > that, rather than __builtin_clz (which is undefined for 0), but that
> > requires a change in jsbit.h. (Note that __builtin_clz probably uses CLZ
> > anyway.)
> >
> Yes I'm hoping that it always compiles to a single CLZ on recent enough arms.
> The only reason this should be faster is that CLZ should be a single cycle
> (or maybe 2).
It probably does. What I'm saying is that asm("clz %out %in\n") is more
certain about this. In any case, it doesn't matter too much.
> > @@ +463,5 @@
> > > + if (imm == 0)
> > > + return datastore::Imm8mData(0, 0);
> > > + int left = js_bitscan_clz32(imm) & 30;
> > > + // if the first bit set
> > > + if (left >= 24)
> >
> > "left >= 24" doesn't check the first bit. That would be "left >= 16", surely.
> not too sure what that comment was supposed to mean, but it is checking to
> see if it is
> a "simple" case that can be encoded with a rotate of 0.
The comment is still there. What you wrote here in the bug would be ideal.
> > @@ +496,5 @@
> > > + : Operand2(datastore::Reg(rm.code(), type, 0, shiftImm.toInt())) {}
> > > + Op2Reg(Register rm, ShiftType type, datastore::RSR shiftReg)
> > > + : Operand2(datastore::Reg(rm.code(), type, 1, shiftReg.toInt())) {}
> > > +};
> > > +class O2RegShiftImm : public Op2Reg {
> >
> > Regsiter-shifted immediate doesn't make sense. Is this an immediate-shifted
> > register?
> >
> Yes. This was me attempting to manually namespace by suffixing the more
> general case
> with any relevant specifics. I read it as "Operand 2, Register, Shifted by
> an Immediate".
> I'm open to suggestions for a better naming scheme.
How about "Op2, Register, Immediate Shifted" (Op2RegImmShift)? (Just as I
suggested for RSI/RIS above.) It's not a better scheme by itself, but it
avoids confusion with ARM documentation.
You can infer Op2RegRegShift from that too.
> > @@ +701,5 @@
> > > + GreaterThanOrEqual = JSC::ARMAssembler::GE,
> > > + LessThan = JSC::ARMAssembler::LT,
> > > + LessThanOrEqual = JSC::ARMAssembler::LE,
> > > + Overflow = JSC::ARMAssembler::VS,
> > > + Signed = JSC::ARMAssembler::MI,
> >
> > I dislike the name "Signed". It really means "negative". A signed value can
> > still be positive.
> >
> Those names are for compatibility with x86/indep code. (names and order
> copied from Assembler-x86-shared.h)
That's very unfortunate, but perhaps not practical to fix here.
> > Also, again, calling code could be more readable if this were split into
> > as_dtr8s, as_dtr16u, as_dtr16s and as_dtr64.
> >
> I'll write those as wrappers. This was my intention from the beginning.
Ok. Will that go into a separate patch?
> > @@ +1056,5 @@
> > > +
> > > + // VFP instructions!
> > > + // Float only variants
> > > + void as_vfp_float(VFPRegister vd, VFPRegister vn, VFPRegister vm,
> > > + VFPOp op, Condition c = Always)
> >
> > What's this function for?
> >
> All of the 3-register vfp operations like vadd, vsub, etc. It is meant to
> be analogous to as_alu.
> Any name with "alu" in it didn't really sound right.
Ok, that makes sense, but what are as_vadd and so on for? Will they just
be wrappers?
Also, 'Float only variants' sounds like it's single-precision.
> The massive overloading of vmov is one of the things that I like the least
> about UAL.
Yes, me too, to be honest.
> When actually writing assembly, It is somewhat easy to determine what
> the instruction is doing based on the "sigils", (r)0, (s)2, (d)3, etc.
> However, when writing c++ code, it is not always as obvious. Also
> because (as far as I can tell), the encodings for the variants of vmov
> are not encoded in similar ways.
Ok. I'd still prefer to have the function name match the mnemonics, but
I understand your reasoning. Just make sure the disassembly spew prints
UAL :-)
> Is that what gdb uses for breakpoints? I don't think that was used in nitro.
> In fact, I'm pretty sure I just
> copied the constant from nitro.
Yes, GDB uses it for breakpoints (I think), but it gets confused if it
sees a BKPT that it didn't put there. I thought I'd changed the
instruction in Nitro, but perhaps not. It's not critical for this patch
anyway.
> >
> > ::: js/src/ion/arm/CodeGenerator-arm.cpp
> > @@ +113,5 @@
> > > LBlock *ifTrue = test->ifTrue()->lir();
> > > LBlock *ifFalse = test->ifFalse()->lir();
> > >
> > > // Test the operand
> > > + masm.ma_tst(ToRegister(opd), ToRegister(opd));
> >
> > ma_cmp would be preferable.
Actually this occurs elsewhere too, but I didn't pick them up in my
review. In general, CMP is better than TST if you just want to check
equality (or non-equality). It's not a big problem though so don't worry
about it too much for now.
> > @@ +337,5 @@
> > >
> > > bool
> > > +CodeGeneratorARM::visitDivI(LDivI *ins)
> > > +{
> > > + JS_NOT_REACHED("codegen for DIVI NYI");
> >
> > On cores without UDIV/SDIV, integer division is best handled by a call to a
> > helper function,
> >
> I was planning on generating it inline, and falling back to generating a
> function if there are enough occurances
> of it in the function.
I wouldn't bother. The implementation in glibc (?) is probably well
optimized and writing your own probably isn't time-efficient (in that
you'd get bigger gains by optimizing elsewhere). Having said that, most
divisions will result in a double value anyway, and I'm not sure exactly
how we'll use integer division in IM. Integer division by a constant,
for example, is easy to optimize, and that would surely benefit from
being inline.
> > @@ +622,2 @@
> > > }
> > > void checkCallAlignment() {
> >
> > Make this #ifdef DEBUG to ensure that we don't emit these checks for release
> > code.
> I think this is only called from within #ifdef DEBUG.
Yes, I know, and that is why it's still valid to wrap the whole function
in #ifdef DEBUG. That way, if it's accidentally called outside of a
DEBUG-only region, the build will fail and someone will have to make a
conscious decision about whether they want the alignment check in their
release code or not.
> > ::: js/src/ion/arm/Trampoline-arm.cpp
> > @@ +74,3 @@
> > > * =r0 =r1 =r2 =r3
> > > * ...using standard cdecl calling convention
> > > * ...cdecl has no business existing on ARM
> >
> > CDECL isn't a term that's used in ARM (as far as I've seen).
> >
> > Note that 'this' takes the first argument in C++ member functions. I assume
> > that's not important here.
> >
> Oh, what is the name of the ARM calling convention. Is it just "the EABI
> calling convention"?
Yes, exactly. I'd like to say that there is only one, but here's the
appoximate state of things (in approximate chronological order):
* 'Legacy ABI', 'WinCE ABI': Not actually the same, but similar
enough for the purposes of JIT-compiling function calls. Since
Fennec stopped supporting Windows Mobile, we just ignore this
variant.
* 'EABI', 'AAPCS, softfp': What we're using today.
* 'EABI', 'AAPCS-VFP, hard(fp)': A variant of EABI that puts FP
arguments in d0-d7. It doesn't work without VFP hardware (which
wasn't always present on older platforms), but it offers better
performance than softfp. The new Debian ARMv7 target will use
hardfp and it's likely to get wider usage elsewhere too.
Simple, eh?
> > @@ +194,2 @@
> > > for (uint32 i = 0; i < Registers::Total; i++)
> > > + masm.transferReg(Register::FromCode(i));
> >
> > From the PUSH documentation:
> >
> > The SP and PC can be in the list in ARM instructions, but not in Thumb
> > instructions. However:
> > * ARM deprecates the use of ARM instructions that include the SP or the
> > PC in the list
> > * if the SP is in the list, and it is not the lowest-numbered register in
> > the list, the instruction stores an unknown value for the SP.
> >
> I believe the stack is pushed so that we have a one to one mapping between
> locations on the stack and
> register numbers (if we need to reference a value stored in r14, it should
> be located at regs[14], not regs[13])
> pc was then pushed so I didn't have to worry about fixing alignment later.
> As long as demons won't run out my nose, I'm okay with this.
Ok, well as long as it's clear that regs[13] cannot be used, that's Ok.
For safety, we could put 0xdeadbeef in there or something like that (in
debug builds only). Also, we need to be sure not to pop into SP.
On the other hand, this will cause a problem for FP registers. It's one
thing pushing all the integer registers for convenience, but there are
potentially 32 D registers, which will require 256 bytes of stack space.
That really hurts! Only D7-D15 actually need to be preserved. (I realize
that the FP push is disabled for now.)
> > @@ +225,5 @@
> > > + masm.as_sub(sp, sp, Imm8(4));
> > > +
> > > + // set the old (4-byte aligned) value of the sp as the first argument
> > > + masm.setABIArg(0, r0);
> > > +
> >
> > Why not just push a dummy register along with the frameClass value? The
> > following sequence will suffice, if I've understood correctly:
> >
> > PUSH {r4,ip} // r4 (frameClass) goes into the _lower_ address here.
> > MOV r0, sp
>
> this would have to be ADD r0, sp, 4.
That could work too. My example was just using IP by convention, I
didn't realize that the actual stack position was important.
> For the most part, I was concerned with getting this written. I'll
> experiment with a bunch of variants later.
Fair enough.
Assignee | ||
Updated•13 years ago
|
Attachment #564320 -
Flags: review?(Jacob.Bramley)
Assignee | ||
Updated•13 years ago
|
Attachment #565095 -
Flags: review?(Jacob.Bramley)
Assignee | ||
Comment 10•13 years ago
|
||
Most of those were me simply not saving changes in a file, then not noticing due to the length of the comments
Attachment #563692 -
Attachment is obsolete: true
Attachment #564320 -
Attachment is obsolete: true
Attachment #565095 -
Attachment is obsolete: true
Assignee | ||
Updated•13 years ago
|
Attachment #565363 -
Flags: review?(Jacob.Bramley)
Comment 11•13 years ago
|
||
Comment on attachment 565363 [details] [diff] [review]
fix second round of complaints
Review of attachment 565363 [details] [diff] [review]:
-----------------------------------------------------------------
There's one minor nit left to pick (below), but otherwise it looks good!
::: js/src/ion/arm/Assembler-arm.h
@@ +484,5 @@
> + return datastore::Imm8mData(0, 0);
> + int left = js_bitscan_clz32(imm) & 30;
> + // See if imm is a simple value that can be encoded with a rotate of 0.
> + // This is effectively imm <= 0xff, but I assume this can be optimized
> + // more
For the record, I doubt it makes any difference. Both variables are live and both constants are encodable in a single Operand 2.
::: js/src/ion/arm/MacroAssembler-arm.h
@@ +631,1 @@
> }
I meant put the #ifdef around the whole function, so the compile would fail if someone tried to use it in release code:
#ifdef DEBUG
void checkCallAlignment() {
...
}
#endif
Attachment #565363 -
Flags: review?(Jacob.Bramley) → review+
Assignee | ||
Comment 12•13 years ago
|
||
Comment 13•13 years ago
|
||
Assignee: general → mrosenberg
Status: NEW → ASSIGNED
Assignee | ||
Updated•13 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•