527083 - Nanojit should have 8-bit and 16-bit store operations

Assignee

Description

•

16 years ago

Nanojit currently offers 8-bit and 16-bit load operations (ldcb and ldcs), but no equivalent store operations. Code that wants to manipulate bytes and shorts (eg code generated by the Adobe Alchemy tool) needs these operations to run efficiently.

Steven Johnson

Assignee

Updated

•

16 years ago

Blocks: 458279

Steven Johnson

Assignee

Comment 1

•

16 years ago

Attached patch Patch #1 (obsolete) — Details — Splinter Review

Initial proof-of-concept patch, with i386 backend implementation. Also includes features requested in bug 527085 and bug 527246, as it seemed simpler to implement them all simultaneously. I'm guessing this design may need some iteration; it's tested and functional, but I'm not sure if all the design decisions are the best ones. Notes: -- defines new instructions for load byte (sign extend or zero extend), short (sign extend or zero extend), store byte, store short, load 32-bit float, store 32-bit float. -- the stores assume that there aren't alignment issues on architectures that require it (ie, if the backend is requested to load/store a short or word at an inappropriate address, an access fault will result) -- a backend declares itself capable of these new ops by defining NJ_EXPANDED_LOADSTORE_SUPPORTED to 1. (the ops are defined on all platforms, but may generate incorrect code on non-supporting platforms; it is up to the caller to avoid using them incorrectly.) It is anticipated that all backends will eventually support this, at which time this define will be removed. -- the signature of LirWriter::insStorei() was changed, to include an LOpcode; this means that existing embedders need to update any overrides appropriately. (I'm not happy about this but it seemed the cleanest long-term solution, especially considering that insLoad() already did this.) LirWriter::ins_store() was added as a (non-overridable) helper method to provide the legacy behavior of infer-op-from-arg-size. I don't (yet) have any standalone testcases for the new operations; the testing I've done so far is in the context of Tamarin (I'm not sure what the unit-test approach for nanojit-central will be, opinions welcome).

Assignee: nobody → stejohns

Attachment #412085 - Flags: review?(edwsmith)

Steven Johnson

Assignee

Updated

•

16 years ago

Attachment #412085 - Flags: review?(rreitmai)

Steven Johnson

Assignee

Updated

•

16 years ago

Attachment #412085 - Flags: review?(graydon)

Steven Johnson

Assignee

Updated

•

16 years ago

Attachment #412085 - Flags: review?(nnethercote)

Nicholas Nethercote [inactive]

Comment 2

•

16 years ago

Initial comments after a first read: - It's unclear to me the cases where these new instructions are useful are unclear to me. For this bug you mention Adobe Alchemy, for the other two bugs you just say "it would be useful". Now that you have a patch, can you report any results from uses within Tamarin, eg. "program X runs faster" or "code generation is much easier in this case"? - For loads, not all combinations of size, CSEability and extension-kind are present. This isn't good.

Steven Johnson

Assignee

Comment 3

•

16 years ago

(In reply to comment #2) > - It's unclear to me the cases where these new instructions are useful are > unclear to me. For this bug you mention Adobe Alchemy, for the other two bugs > you just say "it would be useful". Now that you have a patch, can you report > any results from uses within Tamarin, eg. "program X runs faster" or "code > generation is much easier in this case"? Good point, I'll get some concrete numbers later today and post them. Short version is that it's a major speedup for code generators (like Alchemy) that do frequent operations of this kind, since the alternative is to call out to a function that does a single operation (load or store)... > - For loads, not all combinations of size, CSEability and extension-kind are > present. This isn't good. This is true; I only added the ones needed for my immediate purposes. In my defense, though, that was already the case (we had LIR_ldcb and LIR_ldsb but no other load/store ops for byte or short)....

Edwin Smith

Comment 4

•

16 years ago

(In reply to comment #2) > Initial comments after a first read: > > - It's unclear to me the cases where these new instructions are useful are > unclear to me. For this bug you mention Adobe Alchemy, for the other two bugs > you just say "it would be useful". Now that you have a patch, can you report > any results from uses within Tamarin, eg. "program X runs faster" or "code > generation is much easier in this case"? The motivation should have been more clear. Tamarin supports a series of opcodes in its bytecode instruction set for loading and storing little-endian machine values (int8, int16, int32, float32, float64) to its ByteArray class. They are 0-extending by definition. We also have three sign-extension bytecodes (1->32, 8->32, 16->32). collectively these opcodes are called MOPS (memory opcodes) and they're used heavily by a tool called Alchemy, which translates C++ to ABC. (using an LLVM backend). Naturally we want to optimize the snot out of these operations, and currently the int8, int16, and float32 operations are implemented with handlers because we can't express them in LIR. so this bug aims to add the extra opcodes necessary to express them in LIR. A secondary reason to add them is to access 8-bit and 16-bit fields in VM structures. Currently we only have a handful of use cases, and we artificially make the field 32-bit, if it needs to be accessed from LIR. > > - For loads, not all combinations of size, CSEability and extension-kind are > present. This isn't good. for our needs at the moment, we don't need the cse-ability. we certianly can add those opcodes for symmetry, however. Another open question is how nanojit should deal with sign extension in LIR. options: 1. loads are sized (8/16/32), its an error to mismatch size, and we add explicit zero-extension and sign-extension opcodes plus a float32->double opcode. precedent: 32bit loads plus u2q/i2q for zero/sign extension). 2. small loads are 32-bit, zero-extended by definition, add new sign-extension opcodes at least for the 8->32 and 16->32 bit use cases. load-float32 implicitly extends to double64. precedent: ldc8/16. 3. zero and sign extending loads for 8/16. (and cse variants) To me (2) is tightest, but (3) is closer to what cpu's really support, and the pattern matching required for instruction selction in (2), to take advantage of zero/sign-extending cpu ops, isn't obviously easy because the sign-extension forms the root of the expression tree and is seen before the load. (pattern matching would have to guard against moving the load past any stores).

Nicholas Nethercote [inactive]

Comment 5

•

16 years ago

I think separate sign/zero-extend opcodes is a bad idea. So I guess I'm favouring (3). I'm imagining: ldz8, lds8 ldz16, lds16 ld32 ld64 ld32f (which extends the double32 giving a double64 result) (or whatever names we end up choosing). And I guess CSEable variants of each.

Edwin Smith

Comment 6

•

16 years ago

I checked ARM and x86, both support both zero-extending and sign-extending 8bit and 16bit loads. So, I agree, (3) maps best to the most important ISA's. If we keep with the naming conventions proposed here: https://developer.mozilla.org/en/NanojitMerge# which are b/s/i/q for 8/16/32/64 then we get: non-cse with-cse 8 ldzb,ldsb ldczb,ldcsb # ld-[cse]-[zero|signed]-byte 16 ldzs,ldss ldczs,ldcss # ld-[cse]-[zero|signed]-short 32 ldi ldci # ld-[cse]-int 64 ldq ldcq # ld-[cse]-quad for symmetry, we could have zero and sign-extending 32->64 loads as well, but a) scope creep, and b) it might not mean we should drop u2q and i2q, because at least one way we use i2q/u2q is to sanitize values passed into functions in registers or returned from calls in registers (there is no implied load). the implication for the tamarin MOPS opcodes is we will want to have something that can fold OP_li8 + OP_sxi8 into LIR_ldsb. The frontend must deal with pattern matching while avoiding moving loads-past-stores, which is an okay tradeoff to make LIR match popular ISA's.

Steven Johnson

Assignee

Comment 7

•

16 years ago

(In reply to comment #6) > So, I agree, (3) maps best to the most important ISA's. +1 here (unsurprisingly since that's how I've implemented it so far), so it seems we have a consensus. > If we keep with the naming conventions proposed here: > https://developer.mozilla.org/en/NanojitMerge# > > which are b/s/i/q for 8/16/32/64 then we get: > > non-cse with-cse > 8 ldzb,ldsb ldczb,ldcsb # ld-[cse]-[zero|signed]-byte > 16 ldzs,ldss ldczs,ldcss # ld-[cse]-[zero|signed]-short > 32 ldi ldci # ld-[cse]-int > 64 ldq ldcq # ld-[cse]-quad Sounds good, I'll revise the patch accordingly and add the missing opcodes. Is there a precedent for the load-float-to-double and store-double-to-float opcodes, in terms of naming convention? I used "pf" to indicate "packed float" in this patch; not totally happy with that but I'm open for better suggestions. > for symmetry, we could have zero and sign-extending 32->64 loads as well Yeah, scope creep IMHO. Unless there's philosophical objection let's postpone those. > the implication for the tamarin MOPS opcodes is we will want to have something > that can fold OP_li8 + OP_sxi8 into LIR_ldsb. Yep, completely acceptable because our pattern tends to be highly predictable -- virtually always is those two in immediate sequence. (I already have such an optimization in place in my local branch.)

Edwin Smith

Comment 8

•

16 years ago

review comments (ahead of updated patch) to ease migration, how about: insStorei() doesn't change signature, but becomes a convenience function on LirWriter that calls a new function: insStore(op, ...) which takes the opcode. All the subclasses of LirWriter that override insStorei will need to change, to override insStore() instead. rationale: Its time to deprecate insStorei (the name), and the fact that it sniffs its operands to determine the store opcode (and the sniffing doesn't happen until the instruction is written to LirBuffer... madness!). this pattern has already been all-but-expunged from nanojit. ordinarily i'd say this is scope creep, but i can't see how to add these new store opcodes without facing up to the issue. please add the proposed new opcode names and opcodes to the table here: https://developer.mozilla.org/en/NanojitMerge#LIR_opcode_assignments for 8/16/float32 stores I suggest stb/sts/st32f (arg, maybe we should use 'd' for doubles everywhere, and 'f' for floats, but again, out of scope). the actual code changes for x86 seem mostly fine. are movsx16 and movzx16 intel assembler mnemonics? we try to follow the cpu mfgr notational conventions.

Edwin Smith

Updated

•

16 years ago

Attachment #412085 - Flags: review?(edwsmith) → review-

Steven Johnson

Assignee

Comment 9

•

16 years ago

(In reply to comment #8) > are movsx16 and movzx16 intel assembler mnemonics? we try to follow the cpu > mfgr notational conventions. No, IIRC it uses movzx/movsx, with size indicated by the operand. I'll see if I can fit something closer to the proper syntax into our existing infrastructure. (If it's painful to do so, however, it may be worth cheating a bit.)

Edwin Smith

Comment 10

•

16 years ago

sure, cheat if you have to. i think the 16bit and 8bit intel notation would be something like "mov WORD/BYTE PTR disp(off), src" which is ugly.

Steven Johnson

Assignee

Comment 11

•

16 years ago

(In reply to comment #8) > please add the proposed new opcode names and opcodes to the table here: > https://developer.mozilla.org/en/NanojitMerge#LIR_opcode_assignments Seems like I should wait 'till the changes are reviewed (and landed?) before updating the wiki. Also, I think the changes you outline are good in that they regularize the set, it does change existing ones (eg LIR_ld -> LIR_ldi) so existing code will see a bit of churn if we do this. (IMHO it's worth the short-term pain). We also need a cse version of the load32float op ... I vote for ldc32f. Re: cheating on opnames, in this case, it's not clear to me that the "official" intel forms are better; they are both longer and clunkier. If exact assembler compliance is desired I'll work something out, but for purposes of debugging verbose output, I think the current clunky hack is adequate (probably with adding a comment that we know it's wrong...)

Nicholas Nethercote [inactive]

Comment 12

•

16 years ago

See also bug 504506, aka. "The Great Opcode Renaming". The table on the wiki is totally out of date and several columns can be removed now that TM and TR have merged. I'll fix it up later today.

Steven Johnson

Assignee

Comment 13

•

16 years ago

Attached patch Patch #2 (obsolete) — Details — Splinter Review

Revised version of the first patch, incorporating suggestions from Edwin and Nicholas.

Attachment #412085 - Attachment is obsolete: true

Attachment #412999 - Flags: review?(edwsmith)

Attachment #412085 - Flags: review?(rreitmai)

Attachment #412085 - Flags: review?(nnethercote)

Attachment #412085 - Flags: review?(graydon)

Steven Johnson

Assignee

Updated

•

16 years ago

Attachment #412999 - Flags: review?(rreitmai)

Steven Johnson

Assignee

Updated

•

16 years ago

Attachment #412999 - Flags: review?(graydon)

Steven Johnson

Assignee

Updated

•

16 years ago

Attachment #412999 - Flags: review?(nnethercote)

Nicholas Nethercote [inactive]

Comment 14

•

16 years ago

(In reply to comment #12) > > The table on the wiki is totally out of date and several columns can be removed > now that TM and TR have merged. I'll fix it up later today. Done. It's now 2 columns instead of 4 which makes it much easier to read.

Steven Johnson

Assignee

Comment 15

•

16 years ago

(In reply to comment #12) > See also bug 504506, aka. "The Great Opcode Renaming". That bug doesn't seem to have much action since July -- is it still an active concern?

Graydon Hoare :graydon

Comment 16

•

16 years ago

Comment on attachment 412999 [details] [diff] [review] Patch #2 Needs splitting into NJ-specific and TR/TM-specific pieces. Also needs modifications to lirasm.

Nicholas Nethercote [inactive]

Comment 17

•

16 years ago

(In reply to comment #15) > (In reply to comment #12) > > See also bug 504506, aka. "The Great Opcode Renaming". > > That bug doesn't seem to have much action since July -- is it still an active > concern? Yes. We decided to leave it until after the merge, which took almost four months.

Steven Johnson

Assignee

Comment 18

•

16 years ago

(In reply to comment #16) > (From update of attachment 412999 [details] [diff] [review]) > Needs splitting into NJ-specific and TR/TM-specific pieces. Oops. My bad. Is this patch reviewable as-is or do I need to resubmit? > Also needs modifications to lirasm. Hmm... it appears that lirasm doesn't get integrated into TR just yet -- Rick/Ed, what's the story, is it going to get moved in or do I need to pull directly from nj-central?

Graydon Hoare :graydon

Comment 19

•

16 years ago

I can review this portion as-is, but landing should go via nj-central, and for that must carry a lirasm part. If you commit it as-is the nj-central tree[1] will turn red. [1] http://tinderbox.mozilla.org/showbuilds.cgi?tree=Nanojit

Rick Reitmaier

Comment 20

•

16 years ago

(In reply to comment #18) > (In reply to comment #16) > > Hmm... it appears that lirasm doesn't get integrated into TR just yet -- > Rick/Ed, what's the story, is it going to get moved in or do I need to pull > directly from nj-central? We talked about it a bit last week and decided not to integrate it into TR. Instead we can just use the one in NC.

Steven Johnson

Assignee

Comment 21

•

16 years ago

(In reply to comment #19) > I can review this portion as-is, but landing should go via nj-central, and for > that must carry a lirasm part. If you commit it as-is the nj-central tree[1] > will turn red. OK, good to know -- I haven't had to do any NJ changes since the merge + nj-central, so I have to get up to speed on that. Is it necessary for me to revise TM as well, or can I rely on mozillans to make the necessary changes (assuming review+ eventually comes from your side)?

Graydon Hoare :graydon

Comment 22

•

16 years ago

A TM patch is appreciated but not required if we r+ it. Likewise we'll wait on your r+ for anything that implies a necessary TR patch (whether or not we compose that patch). Whether or not such an NC change gets an r+ without an accompanying TR/TM patch depends on level of confidence that it can be made on-the-fly and in-a-hurry: once you land something on NC that requires a TM or TR patch, that TM or TR patch becomes a roadblock to further integration of other NC landings that follow. So it's generally good to have all the patches lined up before doing the NC landing.

Steven Johnson

Assignee

Comment 23

•

16 years ago

(In reply to comment #22) > So it's generally good to have all the patches lined up before doing > the NC landing. Hmm... the name changes in this patch are pretty mechanical, so I don't think it's an onerous burden on the integrator. That said, I could always avoid renaming the existing instructions (see comments above) on the assumption that a great renaming would happen in a later pass.

Edwin Smith

Comment 24

•

16 years ago

I would recommend the new opcodes be named according to the Great Migration, but not to rename anything that already exists as part of this work. too much churn.

Nicholas Nethercote [inactive]

Comment 25

•

16 years ago

https://developer.mozilla.org/en/NanojitMerge has the full instructions on landing patches in the post-merge world, BTW. Worth reading, there are a number of non-obvious aspects.

Steven Johnson

Assignee

Comment 26

•

16 years ago

(In reply to comment #24) > I would recommend the new opcodes be named according to the Great Migration, > but not to rename anything that already exists as part of this work. too much > churn. Fair enough -- I'll resubmit a new patch that doesn't change the existing names (and also is split into chunks per Graydon's suggestion). (In reply to comment #25) > https://developer.mozilla.org/en/NanojitMerge has the full instructions on > landing patches in the post-merge world, BTW. Worth reading, there are a > number of non-obvious aspects. Thanks, will do.

Steven Johnson

Assignee

Comment 27

•

16 years ago

Attached patch Patch #3 (obsolete) — Details — Splinter Review

Principle differences from previous patch: -- patch is now against current nj-central (vs tamarin-redux) -- no renaming of existing opcodes at all (just adding new ones) -- attempting to use new opcodes on a backend that doesn't support them will NanoAssert(0) -- added new opcodes to the lirasm --random list (and i386 version passes with them). Note that I deliberately didn't add any specific lirasm testcases for the new opcodes yet, as those tests would fail on targets that don't support them yet (ie everything but i386).

Attachment #412999 - Attachment is obsolete: true

Attachment #413221 - Flags: review?(edwsmith)

Attachment #412999 - Flags: review?(rreitmai)

Attachment #412999 - Flags: review?(nnethercote)

Attachment #412999 - Flags: review?(graydon)

Attachment #412999 - Flags: review?(edwsmith)

Steven Johnson

Assignee

Updated

•

16 years ago

Attachment #413221 - Flags: review?(rreitmai)

Steven Johnson

Assignee

Updated

•

16 years ago

Attachment #413221 - Flags: review?(graydon)

Steven Johnson

Assignee

Updated

•

16 years ago

Attachment #413221 - Flags: review?(nnethercote)

Edwin Smith

Comment 28

•

16 years ago

Comment on attachment 413221 [details] [diff] [review] Patch #3 Nativei386.cpp:543 - yes, XORPD is still needed to zero the upper 64bits; however, XORPS is actually preferred (same effect, one less byte). (this should be fixed everywhere, out of scope for this bug) I havent reviewd the encodings of the new instructions. a good method is to put an assert after each new kind of instruction, run a test, let the isntruction get generated, then disassemble starting at _nIns (eg in gdb: disass _nIns _nIns+10) and convince yourself you're getting the proper instruction (check the registers, etc). once it looks good, remove the assert. hopefully all the new instructions get exersized by the test suite. as for lirasm, maybe --random should be ifdefed as well so we only use the new opcodes on cpu's that support them?

Attachment #413221 - Flags: review?(edwsmith) → review+

Nicholas Nethercote [inactive]

Comment 29

•

16 years ago

Comment on attachment 413221 [details] [diff] [review] Patch #3 >+ case LIR_ldzb: >+ case LIR_ldzs: >+ case LIR_ldsb: >+ case LIR_ldss: >+ case LIR_ldcsb: >+ case LIR_ldcss: >+#if !NJ_EXPANDED_LOADSTORE_SUPPORTED >+ // shouldn't issue these instructions for backends that >+ // don't support them yet >+ NanoAssert(0); >+#endif Better would be to avoid the #if here and make each back-end check for the new opcodes in asm_ld() and abort themselves if they can't handle them, rather than Assembler doing it on their behalf. Ie. we've already got arch-specific code in each backend, so let's avoid polluting non-arch-specific code with this kind of stuff where possible. >@@ -1047,6 +1058,14 @@ > asm_ld(ins); > break; > } >+ >+ case LIR_ld32f: >+ case LIR_ldc32f: >+#if !NJ_EXPANDED_LOADSTORE_SUPPORTED >+ // shouldn't issue these instructions for backends that >+ // don't support them yet >+ NanoAssert(0); >+#endif Same here. >@@ -1134,19 +1153,36 @@ > asm_promote(ins); > break; > } >+ case LIR_stb: >+ case LIR_sts: >+#if !NJ_EXPANDED_LOADSTORE_SUPPORTED >+ // shouldn't issue these instructions for backends that >+ // don't support them yet >+ NanoAssert(0); >+#endif And here. > case LIR_sti: > { > countlir_st(); >+#if NJ_EXPANDED_LOADSTORE_SUPPORTED >+ asm_store32(ins->oprnd1(), ins->disp(), ins->oprnd2(), op); >+#else > asm_store32(ins->oprnd1(), ins->disp(), ins->oprnd2()); >+#endif > break; > } Just add the parameter to all backends now, don't have two versions of asm_store32(). >+ case LIR_st32f: >+#if !NJ_EXPANDED_LOADSTORE_SUPPORTED >+ // shouldn't issue these instructions for backends that >+ // don't support them yet >+ NanoAssert(0); >+#endif As above. >@@ -1154,7 +1190,11 @@ > } > else > { >+#if NJ_EXPANDED_LOADSTORE_SUPPORTED >+ asm_store64(value, dr, base, op); >+#else > asm_store64(value, dr, base); >+#endif Again, just one version please. >diff -r 018f529d8030 nanojit/Assembler.h >--- a/nanojit/Assembler.h Wed Nov 18 16:07:09 2009 -0500 >+++ b/nanojit/Assembler.h Wed Nov 18 16:38:01 2009 -0800 >@@ -315,8 +315,13 @@ > NIns* asm_exit(LInsp guard); > NIns* asm_leave_trace(LInsp guard); > void asm_qjoin(LIns *ins); >+ #if NJ_EXPANDED_LOADSTORE_SUPPORTED >+ void asm_store32(LIns *val, int d, LIns *base, LOpcode op = LIR_sti); >+ void asm_store64(LIns *val, int d, LIns *base, LOpcode op = LIR_stqi); Why the default parameter? IMO it makes things less clear, and there are few callers of asm_store32/64 so it doesn't save much typing. >diff -r 018f529d8030 nanojit/LIR.cpp >--- a/nanojit/LIR.cpp Wed Nov 18 16:07:09 2009 -0500 >+++ b/nanojit/LIR.cpp Wed Nov 18 16:38:01 2009 -0800 >@@ -245,9 +245,8 @@ > return startOfRoom; > } > >- LInsp LirBufWriter::insStorei(LInsp val, LInsp base, int32_t d) >+ LInsp LirBufWriter::insStore(LOpcode op, LInsp val, LInsp base, int32_t d) Removing the implicit opcode sniffing is a good idea, but it makes the TM-specific patch tricky -- it's not obvious to me what the type of every insStore() is, indeed in some cases it may depend on the calling context. So I think the patch needs an additional function that does explicit type-sniffing. Something like this: LOpcode storeInsForIns(LInsp ins); You'd call it like this: insStore(storeInsForIns(ins), ins, base, disp); If 'ins' is a LIR_add the result would be LIR_sti, if 'ins' is a LIR_qadd the result would be LIR_stqi, etc. This will gain further use for bug 520714 which will add LIR_ldf and LIR_ldcf. The obvious way is to implement storeInsForIns as a big switch. If there's a performance hit a table-based approach would be better. Eventually I envision that LIRopcode.tbl has a field so that each instruction is annotated with its return type (eg. I32, I64, F64, Void). This could be used by storeInsForIns(), and it could also be used by the LIR type-checker (bug 463137). >@@ -884,6 +883,12 @@ > #endif > } > >+ LIns* LirWriter::insStorei(LIns* value, LIns* base, int32_t d) >+ { >+ LOpcode op = value->isQuad() ? LIR_stqi : LIR_sti; >+ return insStore(op, value, base, d); >+ } With the explicit type-sniffing implemented, insStorei() can be removed. LIns::isLoad() is getting very branchy. Perhaps it could just call isLInsLd(), which is equivalent but uses a table lookup. > /* non-pure operations */ >+OPDEF(ldsb, 3, 1, Ld) // 8-bit load, sign-extend to 32-bit >+OPDEF(ldss, 4, 1, Ld) // 16-bit load, sign-extend to 32-bit >+OPDEF(ldzb, 5, 1, Ld) // 8-bit load, zero extend to 32-bit >+OPDEF(ldzs, 6, 1, Ld) // 16-bit load, zero extend to 32-bit > OPDEF(iaddp, 7, 2, Op2) // integer addition for temporary pointer calculations (32bit only) Can you add change "load" to "integer load"? Some of the existing ones (eg. LIR_ld) could be changed likewise. >+OPDEF(stb, 9, 2, Sti) // 8-bit store >+OPDEF(ld, 10, 1, Ld) // 32-bit load ("ldi") Don't need the '("ldi")', that'll be covered by The Great Opcode Renaming. > OPDEF(ialloc, 11, 0, I) // alloc some stack space (value is 32bit address) > OPDEF(sti, 12, 2, Sti) // 32-bit store > OPDEF(ret, 13, 1, Op1) // return a word-sized value > OPDEF(live, 14, 1, Op1) // extend live range of reference > OPDEF(flive, 15, 1, Op1) // extend live range of a floating point value reference > OPDEF(icall, 16, 0, C) // subroutine call returning a 32-bit value >-OPDEF(unused17, 17, 0, None) >+OPDEF(sts, 17, 2, Sti) // 16-bit store "store" -> "integer store" >+OPDEF64(st32f, 14, 2, Sti) // store double as a 32-bit float (dropping precision) >+OPDEF64(ld32f, 15, 1, Ld) // load 32-bit float and widen to double Change "double" to "64-bit float"?

Steven Johnson

Assignee

Comment 30

•

16 years ago

(In reply to comment #29) > Better would be to avoid the #if here and make each back-end check for the new > opcodes in asm_ld() and abort themselves if they can't handle them, rather than > Assembler doing it on their behalf. Ie. we've already got arch-specific code > in each backend, so let's avoid polluting non-arch-specific code with this kind > of stuff where possible. Fair enough. This scheme (and the subsequent similar ones, and the default args, etc) were all intended as a temporary scheme to allow other back-ends to work as-is with no modification, but if we want to make the relevant API change to all back-ends at once (before they support the actual ops) that is easy enough to do. > Removing the implicit opcode sniffing is a good idea, but it makes the > TM-specific patch tricky -- it's not obvious to me what the type of every > insStore() is, indeed in some cases it may depend on the calling context. So I > think the patch needs an additional function that does explicit type-sniffing. > Something like this: > > LOpcode storeInsForIns(LInsp ins); Scope creep alert :-) This may indeed be useful, but isn't necessary to maintain existing functionality... the insStorei() wrapper I provided should exactly mimic the existing behavior, shouldn't it? (I'm not at all opposed to such a change, but it doesn't seem *necessary* for this patch) > LIns::isLoad() is getting very branchy. Perhaps it could just call isLInsLd(), > which is equivalent but uses a table lookup. A good thought. I hadn't noticed that it existed. Alternately, perhaps we could rearrange the Load instructions to be in a single range... I briefly considered doing this but there are already enough subtle dependencies on magic ranges in LIR that I didn't want to bite it off in this patch. > Can you add change "load" to "integer load"? Some of the existing ones (eg. > LIR_ld) could be changed likewise. sure > >+OPDEF(stb, 9, 2, Sti) // 8-bit store > >+OPDEF(ld, 10, 1, Ld) // 32-bit load ("ldi") ok > "store" -> "integer store" > ok > Change "double" to "64-bit float"? sure

Nicholas Nethercote [inactive]

Comment 31

•

16 years ago

(In reply to comment #30) > > think the patch needs an additional function that does explicit type-sniffing. > > Something like this: > > > > LOpcode storeInsForIns(LInsp ins); > > Scope creep alert :-) This may indeed be useful, but isn't necessary to > maintain existing functionality... the insStorei() wrapper I provided should > exactly mimic the existing behavior, shouldn't it? (I'm not at all opposed to > such a change, but it doesn't seem *necessary* for this patch) True. A later patch, then. > Alternately, perhaps we could rearrange the Load instructions to be in a single > range... I briefly considered doing this but there are already enough subtle > dependencies on magic ranges in LIR that I didn't want to bite it off in this > patch. Please please no... I want to eventually get rid of all magic opcode meanings so that opcode numbers are truly arbitrary :)

Steven Johnson

Assignee

Comment 32

•

16 years ago

(In reply to comment #31) > Please please no... I want to eventually get rid of all magic opcode meanings > so that opcode numbers are truly arbitrary :) A good plan in any event.

Rick Reitmaier

Comment 33

•

16 years ago

Comment on attachment 413221 [details] [diff] [review] Patch #3 (1) Agree with Nick; please remove the preprocessor gunk from Assembler.cpp. Its a bit more work up front, but clears up Assembler and facilitates future incremental support work in the back-ends. (2) Also a bit of a nit but I find it confusing that st32f takes the asm_store64() path. (3) Agree with Nick, the default arguments ('op=LIR_sti/stqi') are unnecessary. (4) Outside of this patch but it would be nice to agree on *unique* characters for our naming. Right now we have s=signed and s=short and 's' also appears in *st*ore; all very confusing. Something like: 'c' = cse-able (non-vollatile) 'x' for sign-extension 'z'= zero 'b' = byte (8bit) 'w' = word (16bit) 'i' = integer (32bit) 'q' = quad (64bit) 'f' = 32bit floating point value (float) 'd' = 64bit floating point value (double) means stores look like : stb, stw, sti, stq, stf loads: ldzb, ldxb, ldzw, ldxw, ldf and the non-volitile versions ldczb, ldcxb, ldczw, ldcxw, ldcf Just an observation ... we are almost getting to the point where a type field encoded in the opcode might make sense. (5) In line Nativei386.cpp:571 you removed a 'return' ; Was this intentional? (6) ST8i() is the underrunProtect value large enough?

Attachment #413221 - Flags: review?(rreitmai) → review+

Nicholas Nethercote [inactive]

Comment 34

•

16 years ago

(In reply to comment #33) > (4) Outside of this patch but it would be nice to agree on *unique* characters > for our naming. Right now we have s=signed and s=short and 's' also appears > in *st*ore; all very confusing. > > Something like: > 'c' = cse-able (non-vollatile) > 'x' for sign-extension > 'z'= zero > 'b' = byte (8bit) > 'w' = word (16bit) > 'i' = integer (32bit) > 'q' = quad (64bit) > 'f' = 32bit floating point value (float) > 'd' = 64bit floating point value (double) Feel free to add to the discussion at the bottom of https://developer.mozilla.org/en/NanojitMerge. I think using size numbers (eg. 8/16/32/64/32f/64f) is looking good...

Jeff Walden [:Waldo]

Comment 35

•

16 years ago

Comment 34 anticipates me!

Steven Johnson

Assignee

Comment 36

•

16 years ago

(In reply to comment #33) > (2) Also a bit of a nit but I find it confusing that st32f takes the > asm_store64() path. why? it's storing a 64-bit value, so that's the proper place for it.

Rick Reitmaier

Comment 37

•

16 years ago

(In reply to comment #36) > (In reply to comment #33) > > (2) Also a bit of a nit but I find it confusing that st32f takes the > > asm_store64() path. > > why? it's storing a 64-bit value, so that's the proper place for it. Hmm, the comment for st32f states store 'x' as 'y'. I guess I tripped over the 'as'. Maybe 'stores the quad result of converting a 64bit to 32bit floating point value.'

Steven Johnson

Assignee

Comment 38

•

16 years ago

Perhaps unrelated, but the PPC backend doesn't appear to be emitted the existing ldcb and ldcs ops correctly... it does #if !PEDANTIC if (isS16(d)) { if (ins->isop(LIR_ldcb)) { LBZ(rr, d, ra); } else { LWZ(rr, d, ra); } return; } #endif // general case underrunProtect(12); LWZX(rr, ra, R0); // rr = [ra+R0] asm_li(R0,d); which only emits ldcb is isS16(d) is true, and *never* properly emits ldcs...

Steven Johnson

Assignee

Comment 39

•

16 years ago

(In reply to comment #37) > Maybe 'stores the quad result of converting a 64bit to 32bit floating point > value.' It doesn't store a quad *result*... it stores a quad-sized *value* into a word-sized result. Yeah, it is confusing... you could make a case for it in either location. When I implemented it, it seemed to fit better there.

Steven Johnson

Assignee

Comment 40

•

16 years ago

(In reply to comment #33) > (5) In line Nativei386.cpp:571 you removed a 'return' ; Was this intentional? nope, good catch! back it goes. > (6) ST8i() is the underrunProtect value large enough? I'll doublecheck.

Steven Johnson

Assignee

Comment 41

•

16 years ago

(In reply to comment #33) > (6) ST8i() is the underrunProtect value large enough? actually, it appears to be too large... MODRMm and MODRMs both appear to have a worst-case of 6, so ST8i, ST16i, and STi are all too large. I'll adjust.

Steven Johnson

Assignee

Comment 42

•

16 years ago

Attached patch Patch #4 — Details — Splinter Review

Revised patch incorporating suggestions from Rick and Nicholas.

Attachment #413221 - Attachment is obsolete: true

Attachment #415012 - Flags: review?(rreitmai)

Attachment #413221 - Flags: review?(nnethercote)

Attachment #413221 - Flags: review?(graydon)

Steven Johnson

Assignee

Updated

•

16 years ago

Attachment #415012 - Flags: superreview?

Attachment #415012 - Flags: review?(nnethercote)

Steven Johnson

Assignee

Comment 43

•

16 years ago

Attached patch Patch #4a (Tamarin component) — Details — Splinter Review

Patch to Tamarin to make necessary changes to use Patch #4 above.

Attachment #415013 - Flags: review?(rreitmai)

Rick Reitmaier

Updated

•

16 years ago

Attachment #415013 - Flags: review?(rreitmai) → review+

Rick Reitmaier

Updated

•

16 years ago

Attachment #415012 - Flags: review?(rreitmai) → review+

Nicholas Nethercote [inactive]

Comment 44

•

16 years ago

Comment on attachment 415012 [details] [diff] [review] Patch #4 >diff -r 1c05dd6490d9 nanojit/LIRopcode.tbl >--- a/nanojit/LIRopcode.tbl Sat Nov 28 09:34:48 2009 -0800 >+++ b/nanojit/LIRopcode.tbl Sun Nov 29 12:20:18 2009 -0800 >@@ -74,23 +74,23 @@ > OPDEF(start, 0, 0, Op0) // start of a fragment It looks like you haven't updated recently. Unfortunately with bug 505662 I removed a field from OPDEF/OPD64, thus changing every single code line in this file. You'll have to redo it, sorry. If it's any consolation, this patch will force me to redo some pieces of the patch I'm working on for bug 516347 and bug 520714 :) >+#ifndef NJ_EXPANDED_LOADSTORE_SUPPORTED >+# define NJ_EXPANDED_LOADSTORE_SUPPORTED 0 >+#endif Is this even needed any more? It looks like there aren't any functional uses of it, it just gets mentioned in NanoAssertMsg() messages. > void >-Assembler::asm_store32(LIns *value, int dr, LIns *base) >+Assembler::asm_store32(LOpcode op, LIns *value, int dr, LIns *base) > { >+ switch (op) { >+ case LIR_sti: >+ // handled by mainline code below for now >+ break; >+ case LIR_stb: >+ case LIR_sts: >+ NanoAssertMsg(0, "NJ_EXPANDED_LOADSTORE_SUPPORTED not yet supported for this architecture"); >+ return; >+ default: >+ NanoAssertMsg(0, "asm_store32 should never receive this LIR opcode"); >+ return; >+ } I don't like this pattern much -- in optimised builds, if you get an unhandled or unexpected opcode it'll silently succeed but produce bogus code. I don't expect you to change it for this patch because it's used in lots of places in Nanojit, but I wonder if there should be some kind of assert/abort that is enabled for optimised builds. >- void Assembler::asm_store64(LInsp value, int dr, LInsp base) >+ void Assembler::asm_store64(LOpcode op, LInsp value, int dr, LInsp base) > { >+ if (op == LIR_st32f) >+ { >+ Register rb; >+ if (base->isop(LIR_alloc)) { >+ rb = FP; >+ dr += findMemFor(base); >+ } else { >+ rb = findRegFor(base, GpRegs); >+ } This if-then-else can be replaced with a call to getBaseReg(). There may be other cases like that in the patch but this was the only one I saw. r=me with these nits considered. A couple of other things: - Can you add the new opcodes to the opcode table at the bottom of https://developer.mozilla.org/en/NanojitMerge once you've committed. - Have you tested with Tracemonkey yet? I'd like to see a Tracemonkey patch before the the NJ part is committed (assuming one is necessary) so that when someone (possibly me) updates Tracemonkey with this patch their job is as easy as possible.

Attachment #415012 - Flags: review?(nnethercote) → review+

Steven Johnson

Assignee

Comment 45

•

16 years ago

(In reply to comment #44) > It looks like you haven't updated recently. Unfortunately with bug 505662 I > removed a field from OPDEF/OPD64, thus changing every single code line in this > file. No worries, I've already updated locally -- if you want to re-review the changes let me know, but it's pretty mechanical. > >+#ifndef NJ_EXPANDED_LOADSTORE_SUPPORTED > >+# define NJ_EXPANDED_LOADSTORE_SUPPORTED 0 > >+#endif > > Is this even needed any more? It looks like there aren't any functional uses > of it, it just gets mentioned in NanoAssertMsg() messages. Yes, there's a big use of it: downstream clients examine it to know if it's safe to issue the new instructions or not. > I don't like this pattern much -- in optimised builds, if you get an unhandled > or unexpected opcode it'll silently succeed but produce bogus code. I don't > expect you to change it for this patch because it's used in lots of places in > Nanojit, but I wonder if there should be some kind of assert/abort that is > enabled for optimised builds. Not that I'm aware of. IIRC there are other places where similar things happen (or at least, could have happened) before. I'm open for better suggestions, but in general, only Assembler will call these (from a giant switch) so the likelihood of incorrect arguments isn't really dependent on what clients might do. > This if-then-else can be replaced with a call to getBaseReg(). There may be > other cases like that in the patch but this was the only one I saw. Ah, is that new? I was mimicking existing code when I wrote it. > - Can you add the new opcodes to the opcode table at the bottom of > https://developer.mozilla.org/en/NanojitMerge once you've committed. Yep, was waiting to commit to update. > - Have you tested with Tracemonkey yet? I'd like to see a Tracemonkey patch > before the the NJ part is committed (assuming one is necessary) so that when > someone (possibly me) updates Tracemonkey with this patch their job is as easy > as possible. Nope, not at all... Graydon had indicated to me that a TM patch was not necessary for these things (though obviously it would be a nice thing to do) so I hadn't done it. Is there a minimal-TM-harness I can use for such testing, or do I need to pull and build all of mozilla? (Guess I better go read the nanojit-central wiki page again...)

Steven Johnson

Assignee

Comment 46

•

16 years ago

OK, I have pulled TM and done some minimal testing with the patch... there is *no* change necessary to TM, thus no patch will be forthcoming. (The only source change necessary is for embedders which implement LirWriters and override insStorei(), which became insStore()... TM does not appear to do this.) If all the relevant perf tests run acceptably then I'll be landing this today.

Steven Johnson

Assignee

Comment 47

•

16 years ago

pushed to nanojit-central as changeset: 1096:7c5395e67078

Steven Johnson

Assignee

Updated

•

16 years ago

Attachment #415012 - Flags: superreview? → superreview+

Steven Johnson

Assignee

Comment 48

•

16 years ago

wiki updated with new opcodes. pushed to tamarin-redux as of changeset: 3242:8b42130cb465. I'm closing this bug as resolved/fixed and opening new bugs to implement the new opcodes for the remaining backends: x64: https://bugzilla.mozilla.org/show_bug.cgi?id=532240 ARM: https://bugzilla.mozilla.org/show_bug.cgi?id=532241 PPC: https://bugzilla.mozilla.org/show_bug.cgi?id=532242 Sparc: https://bugzilla.mozilla.org/show_bug.cgi?id=532243 MIPS: https://bugzilla.mozilla.org/show_bug.cgi?id=532251

Status: NEW → RESOLVED

Closed: 16 years ago

Resolution: --- → FIXED

Nicholas Nethercote [inactive]

Comment 49

•

16 years ago

BTW, the procedure is that we don't mark bugs as RESOLVED FIXED at least until they've been merged to both TM and TR. Actually, on the Mozilla side we don't mark them as RESOLVED FIXED until they're merged with mozilla-central, but I'm not sure what the policy is in general for nanojit-central.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Whiteboard: fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey

Nicholas Nethercote [inactive]

Comment 50

•

16 years ago

http://hg.mozilla.org/tracemonkey/rev/adead3f70760 http://hg.mozilla.org/tracemonkey/rev/e7ae58dcc58c

Steven Johnson

Assignee

Comment 51

•

16 years ago

(In reply to comment #49) > BTW, the procedure is that we don't mark bugs as RESOLVED FIXED at least until > they've been merged to both TM and TR. Oops. Wasn't aware. Will keep this in mind in the future.

Steven Johnson

Assignee

Comment 52

•

16 years ago

Pretty sure that this has been pushed to all relevant repos, shall we close it?

Trevor Baker

Updated

•

16 years ago

Component: JIT Compiler (NanoJIT) → Nanojit

Product: Tamarin → Core

QA Contact: nanojit → nanojit

Target Milestone: --- → Future

Ryan VanderMeulen [:RyanVM]

Updated

•

14 years ago

Status: REOPENED → RESOLVED

Closed: 16 years ago → 14 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: Core → Core Graveyard

Patch #1 16 years ago Steven Johnson 28.80 KB, patch	edwsmith : review-	Details \| Diff \| Splinter Review
Patch #2 16 years ago Steven Johnson 44.84 KB, patch		Details \| Diff \| Splinter Review
Patch #3 16 years ago Steven Johnson 33.63 KB, patch	edwsmith : review+ rreitmai : review+	Details \| Diff \| Splinter Review
Patch #4 16 years ago Steven Johnson 50.55 KB, patch	rreitmai : review+ n.nethercote : review+ stejohns : superreview+	Details \| Diff \| Splinter Review
Patch #4a (Tamarin component) 16 years ago Steven Johnson 1.58 KB, patch	rreitmai : review+	Details \| Diff \| Splinter Review