Last Comment Bug 527083 - Nanojit should have 8-bit and 16-bit store operations
: Nanojit should have 8-bit and 16-bit store operations
Status: RESOLVED FIXED
fixed-in-nanojit, fixed-in-tamarin, f...
:
Product: Core Graveyard
Classification: Graveyard
Component: Nanojit (show other bugs)
: unspecified
: All All
: -- normal (vote)
: Future
Assigned To: Steven Johnson
:
:
Mentors:
: 527085 (view as bug list)
Depends on:
Blocks: 458279
  Show dependency treegraph
 
Reported: 2009-11-06 11:53 PST by Steven Johnson
Modified: 2014-03-17 08:00 PDT (History)
6 users (show)
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Patch #1 (28.80 KB, patch)
2009-11-12 15:03 PST, Steven Johnson
edwsmith: review-
Details | Diff | Splinter Review
Patch #2 (44.84 KB, patch)
2009-11-17 19:02 PST, Steven Johnson
no flags Details | Diff | Splinter Review
Patch #3 (33.63 KB, patch)
2009-11-18 16:41 PST, Steven Johnson
edwsmith: review+
rreitmai: review+
Details | Diff | Splinter Review
Patch #4 (50.55 KB, patch)
2009-11-29 12:20 PST, Steven Johnson
rreitmai: review+
n.nethercote: review+
stejohns: superreview+
Details | Diff | Splinter Review
Patch #4a (Tamarin component) (1.58 KB, patch)
2009-11-29 12:22 PST, Steven Johnson
rreitmai: review+
Details | Diff | Splinter Review

Description Steven Johnson 2009-11-06 11:53:25 PST
Nanojit currently offers 8-bit and 16-bit load operations (ldcb and ldcs), but no equivalent store operations. Code that wants to manipulate bytes and shorts (eg code generated by the Adobe Alchemy tool) needs these operations to run efficiently.
Comment 1 Steven Johnson 2009-11-12 15:03:08 PST
Created attachment 412085 [details] [diff] [review]
Patch #1

Initial proof-of-concept patch, with i386 backend implementation. Also includes features requested in bug 527085 and bug 527246, as it seemed simpler to implement them all simultaneously.

I'm guessing this design may need some iteration; it's tested and functional, but I'm not sure if all the design decisions are the best ones. Notes:

-- defines new instructions for load byte (sign extend or zero extend), short (sign extend or zero extend), store byte, store short, load 32-bit float, store 32-bit float. 

-- the stores assume that there aren't alignment issues on architectures that require it (ie, if the backend is requested to load/store a short or word at an inappropriate address, an access fault will result)

-- a backend declares itself capable of these new ops by defining NJ_EXPANDED_LOADSTORE_SUPPORTED to 1. (the ops are defined on all platforms, but may generate incorrect code on non-supporting platforms; it is up to the caller to avoid using them incorrectly.) It is anticipated that all backends will eventually support this, at which time this define will be removed.

-- the signature of LirWriter::insStorei() was changed, to include an LOpcode; this means that existing embedders need to update any overrides appropriately. (I'm not happy about this but it seemed the cleanest long-term solution, especially considering that insLoad() already did this.) LirWriter::ins_store() was added as a (non-overridable) helper method to provide the legacy behavior of infer-op-from-arg-size.

I don't (yet) have any standalone testcases for the new operations; the testing I've done so far is in the context of Tamarin (I'm not sure what the unit-test approach for nanojit-central will be, opinions welcome).
Comment 2 Nicholas Nethercote [:njn] 2009-11-12 23:27:05 PST
Initial comments after a first read:

- It's unclear to me the cases where these new instructions are useful are unclear to me.  For this bug you mention Adobe Alchemy, for the other two bugs you just say "it would be useful".  Now that you have a patch, can you report any results from uses within Tamarin, eg. "program X runs faster" or "code generation is much easier in this case"?

- For loads, not all combinations of size, CSEability and extension-kind are present.  This isn't good.
Comment 3 Steven Johnson 2009-11-13 09:28:07 PST
(In reply to comment #2)
> - It's unclear to me the cases where these new instructions are useful are
> unclear to me.  For this bug you mention Adobe Alchemy, for the other two bugs
> you just say "it would be useful".  Now that you have a patch, can you report
> any results from uses within Tamarin, eg. "program X runs faster" or "code
> generation is much easier in this case"?

Good point, I'll get some concrete numbers later today and post them. Short version is that it's a major speedup for code generators (like Alchemy) that do frequent operations of this kind, since the alternative is to call out to a function that does a single operation (load or store)...
 
> - For loads, not all combinations of size, CSEability and extension-kind are
> present.  This isn't good.

This is true; I only added the ones needed for my immediate purposes. In my defense, though, that was already the case (we had LIR_ldcb and LIR_ldsb but no other load/store ops for byte or short)....
Comment 4 Edwin Smith 2009-11-16 11:02:24 PST
(In reply to comment #2)
> Initial comments after a first read:
> 
> - It's unclear to me the cases where these new instructions are useful are
> unclear to me.  For this bug you mention Adobe Alchemy, for the other two bugs
> you just say "it would be useful".  Now that you have a patch, can you report
> any results from uses within Tamarin, eg. "program X runs faster" or "code
> generation is much easier in this case"?

The motivation should have been more clear.  Tamarin supports a series of opcodes in its bytecode instruction set for loading and storing little-endian machine values (int8, int16, int32, float32, float64) to its ByteArray class.  They are 0-extending by definition.  We also have three sign-extension bytecodes (1->32, 8->32, 16->32).

collectively these opcodes are called MOPS (memory opcodes) and they're used heavily by a tool called Alchemy, which translates C++ to ABC.  (using an LLVM backend).  

Naturally we want to optimize the snot out of these operations, and currently the int8, int16, and float32 operations are implemented with handlers because we can't express them in LIR.  so this bug aims to add the extra opcodes necessary to express them in LIR.

A secondary reason to add them is to access 8-bit and 16-bit fields in VM structures.  Currently we only have a handful of use cases, and we artificially make the field 32-bit, if it needs to be accessed from LIR.

> 
> - For loads, not all combinations of size, CSEability and extension-kind are
> present.  This isn't good.

for our needs at the moment, we don't need the cse-ability.  we certianly can add those opcodes for symmetry, however.

Another open question is how nanojit should deal with sign extension in LIR.  

options:

1. loads are sized (8/16/32), its an error to mismatch size, and we add explicit zero-extension and sign-extension opcodes plus a float32->double opcode.  precedent: 32bit loads plus u2q/i2q for zero/sign extension).  

2. small loads are 32-bit, zero-extended by definition, add new sign-extension opcodes at least for the 8->32 and 16->32 bit use cases.  load-float32 implicitly extends to double64.  precedent: ldc8/16.  

3. zero and sign extending loads for 8/16.  (and cse variants)

To me (2) is tightest, but (3) is closer to what cpu's really support, and the pattern matching required for instruction selction in (2), to take advantage of zero/sign-extending cpu ops, isn't obviously easy because the sign-extension forms the root of the expression tree and is seen before the load. (pattern matching would have to guard against moving the load past any stores).
Comment 5 Nicholas Nethercote [:njn] 2009-11-16 22:50:45 PST
I think separate sign/zero-extend opcodes is a bad idea.  So I guess I'm favouring (3).  I'm imagining:

  ldz8, lds8
  ldz16, lds16
  ld32
  ld64
  ld32f (which extends the double32 giving a double64 result)

(or whatever names we end up choosing).  And I guess CSEable variants of each.
Comment 6 Edwin Smith 2009-11-17 07:59:56 PST
I checked ARM and x86, both support both zero-extending and sign-extending 8bit and 16bit loads.  So, I agree, (3) maps best to the most important ISA's.

If we keep with the naming conventions proposed here:
https://developer.mozilla.org/en/NanojitMerge#

which are b/s/i/q for 8/16/32/64 then we get:

     non-cse       with-cse
8    ldzb,ldsb     ldczb,ldcsb  # ld-[cse]-[zero|signed]-byte
16   ldzs,ldss     ldczs,ldcss  # ld-[cse]-[zero|signed]-short
32   ldi           ldci         # ld-[cse]-int
64   ldq           ldcq         # ld-[cse]-quad

for symmetry, we could have zero and sign-extending 32->64 loads as well, but a) scope creep, and b) it might not mean we should drop u2q and i2q, because at least one way we use i2q/u2q is to sanitize values passed into functions in registers or returned from calls in registers (there is no implied load).

the implication for the tamarin MOPS opcodes is we will want to have something that can fold OP_li8 + OP_sxi8 into LIR_ldsb.  The frontend must deal with pattern matching while avoiding moving loads-past-stores, which is an okay tradeoff to make LIR match popular ISA's.
Comment 7 Steven Johnson 2009-11-17 09:18:10 PST
(In reply to comment #6)
> So, I agree, (3) maps best to the most important ISA's.

+1 here (unsurprisingly since that's how I've implemented it so far), so it seems we have a consensus.

> If we keep with the naming conventions proposed here:
> https://developer.mozilla.org/en/NanojitMerge#
> 
> which are b/s/i/q for 8/16/32/64 then we get:
> 
>      non-cse       with-cse
> 8    ldzb,ldsb     ldczb,ldcsb  # ld-[cse]-[zero|signed]-byte
> 16   ldzs,ldss     ldczs,ldcss  # ld-[cse]-[zero|signed]-short
> 32   ldi           ldci         # ld-[cse]-int
> 64   ldq           ldcq         # ld-[cse]-quad

Sounds good, I'll revise the patch accordingly and add the missing opcodes.

Is there a precedent for the load-float-to-double and store-double-to-float opcodes, in terms of naming convention? I used "pf" to indicate "packed float" in this patch; not totally happy with that but I'm open for better suggestions.
 
> for symmetry, we could have zero and sign-extending 32->64 loads as well

Yeah, scope creep IMHO. Unless there's philosophical objection let's postpone those.

> the implication for the tamarin MOPS opcodes is we will want to have something
> that can fold OP_li8 + OP_sxi8 into LIR_ldsb.  

Yep, completely acceptable because our pattern tends to be highly predictable -- virtually always is those two in immediate sequence. (I already have such an optimization in place in my local branch.)
Comment 8 Edwin Smith 2009-11-17 13:17:07 PST
review comments (ahead of updated patch)

to ease migration, how about:  insStorei() doesn't change signature, but becomes a convenience function on LirWriter that calls a new function: insStore(op, ...) which takes the opcode.  All the subclasses of LirWriter that override insStorei will need to change, to override insStore() instead.

rationale: Its time to deprecate insStorei (the name), and the fact that it sniffs its operands to determine the store opcode (and the sniffing doesn't happen until the instruction is written to LirBuffer... madness!).  this pattern has already been all-but-expunged from nanojit.  ordinarily i'd say this is scope creep, but i can't see how to add these new store opcodes without facing up to the issue.

please add the proposed new opcode names and opcodes to the table here:
https://developer.mozilla.org/en/NanojitMerge#LIR_opcode_assignments

for 8/16/float32 stores I suggest stb/sts/st32f

(arg, maybe we should use 'd' for doubles everywhere, and 'f' for floats, but again, out of scope).

the actual code changes for x86 seem mostly fine.

are movsx16 and movzx16 intel assembler mnemonics?  we try to follow the cpu mfgr notational conventions.
Comment 9 Steven Johnson 2009-11-17 13:22:12 PST
(In reply to comment #8)

> are movsx16 and movzx16 intel assembler mnemonics?  we try to follow the cpu
> mfgr notational conventions.

No, IIRC it uses movzx/movsx, with size indicated by the operand. I'll see if I can fit something closer to the proper syntax into our existing infrastructure. (If it's painful to do so, however, it may be worth cheating a bit.)
Comment 10 Edwin Smith 2009-11-17 13:38:02 PST
sure, cheat if you have to.  i think the 16bit and 8bit intel notation would be something like  "mov WORD/BYTE PTR disp(off), src" which is ugly.
Comment 11 Steven Johnson 2009-11-17 18:43:21 PST
(In reply to comment #8)

> please add the proposed new opcode names and opcodes to the table here:
> https://developer.mozilla.org/en/NanojitMerge#LIR_opcode_assignments

Seems like I should wait 'till the changes are reviewed (and landed?) before updating the wiki.

Also, I think the changes you outline are good in that they regularize the set, it does change existing ones (eg LIR_ld -> LIR_ldi) so existing code will see a bit of churn if we do this. (IMHO it's worth the short-term pain).

We also need a cse version of the load32float op ... I vote for ldc32f.

Re: cheating on opnames, in this case, it's not clear to me that the "official" intel forms are better; they are both longer and clunkier. If exact assembler compliance is desired I'll work something out, but for purposes of debugging verbose output, I think the current clunky hack is adequate (probably with adding a comment that we know it's wrong...)
Comment 12 Nicholas Nethercote [:njn] 2009-11-17 18:50:40 PST
See also bug 504506, aka. "The Great Opcode Renaming".

The table on the wiki is totally out of date and several columns can be removed now that TM and TR have merged.  I'll fix it up later today.
Comment 13 Steven Johnson 2009-11-17 19:02:02 PST
Created attachment 412999 [details] [diff] [review]
Patch #2

Revised version of the first patch, incorporating suggestions from Edwin and Nicholas.
Comment 14 Nicholas Nethercote [:njn] 2009-11-17 21:19:23 PST
(In reply to comment #12)
> 
> The table on the wiki is totally out of date and several columns can be removed
> now that TM and TR have merged.  I'll fix it up later today.

Done.  It's now 2 columns instead of 4 which makes it much easier to read.
Comment 15 Steven Johnson 2009-11-18 10:18:58 PST
(In reply to comment #12)
> See also bug 504506, aka. "The Great Opcode Renaming".

That bug doesn't seem to have much action since July -- is it still an active concern?
Comment 16 Graydon Hoare :graydon 2009-11-18 11:36:21 PST
Comment on attachment 412999 [details] [diff] [review]
Patch #2

Needs splitting into NJ-specific and TR/TM-specific pieces. Also needs modifications to lirasm.
Comment 17 Nicholas Nethercote [:njn] 2009-11-18 11:40:53 PST
(In reply to comment #15)
> (In reply to comment #12)
> > See also bug 504506, aka. "The Great Opcode Renaming".
> 
> That bug doesn't seem to have much action since July -- is it still an active
> concern?

Yes.  We decided to leave it until after the merge, which took almost four months.
Comment 18 Steven Johnson 2009-11-18 11:45:16 PST
(In reply to comment #16)
> (From update of attachment 412999 [details] [diff] [review])
> Needs splitting into NJ-specific and TR/TM-specific pieces. 

Oops. My bad. Is this patch reviewable as-is or do I need to resubmit?

> Also needs modifications to lirasm.

Hmm... it appears that lirasm doesn't get integrated into TR just yet -- Rick/Ed, what's the story, is it going to get moved in or do I need to pull directly from nj-central?
Comment 19 Graydon Hoare :graydon 2009-11-18 11:49:25 PST
I can review this portion as-is, but landing should go via nj-central, and for that must carry a lirasm part. If you commit it as-is the nj-central tree[1] will turn red.

[1] http://tinderbox.mozilla.org/showbuilds.cgi?tree=Nanojit
Comment 20 Rick Reitmaier 2009-11-18 11:51:53 PST
(In reply to comment #18)
> (In reply to comment #16)
> 
> Hmm... it appears that lirasm doesn't get integrated into TR just yet --
> Rick/Ed, what's the story, is it going to get moved in or do I need to pull
> directly from nj-central?

We talked about it a bit last week and decided not to integrate it into TR.  Instead we can just use the one in NC.
Comment 21 Steven Johnson 2009-11-18 11:57:05 PST
(In reply to comment #19)
> I can review this portion as-is, but landing should go via nj-central, and for
> that must carry a lirasm part. If you commit it as-is the nj-central tree[1]
> will turn red.

OK, good to know -- I haven't had to do any NJ changes since the merge + nj-central, so I have to get up to speed on that.

Is it necessary for me to revise TM as well, or can I rely on mozillans to make the necessary changes (assuming review+ eventually comes from your side)?
Comment 22 Graydon Hoare :graydon 2009-11-18 12:11:44 PST
A TM patch is appreciated but not required if we r+ it. Likewise we'll wait on your r+ for anything that implies a necessary TR patch (whether or not we compose that patch).

Whether or not such an NC change gets an r+ without an accompanying TR/TM patch depends on level of confidence that it can be made on-the-fly and in-a-hurry: once you land something on NC that requires a TM or TR patch, that TM or TR patch becomes a roadblock to further integration of other NC landings that follow. So it's generally good to have all the patches lined up before doing the NC landing.
Comment 23 Steven Johnson 2009-11-18 12:20:52 PST
(In reply to comment #22)
> So it's generally good to have all the patches lined up before doing
> the NC landing.

Hmm... the name changes in this patch are pretty mechanical, so I don't think it's an onerous burden on the integrator. That said, I could always avoid renaming the existing instructions (see comments above) on the assumption that a great renaming would happen in a later pass.
Comment 24 Edwin Smith 2009-11-18 12:38:17 PST
I would recommend the new opcodes be named according to the Great Migration, but not to rename anything that already exists as part of this work.  too much churn.
Comment 25 Nicholas Nethercote [:njn] 2009-11-18 12:42:33 PST
https://developer.mozilla.org/en/NanojitMerge has the full instructions on landing patches in the post-merge world, BTW.  Worth reading, there are a number of non-obvious aspects.
Comment 26 Steven Johnson 2009-11-18 13:23:11 PST
(In reply to comment #24)
> I would recommend the new opcodes be named according to the Great Migration,
> but not to rename anything that already exists as part of this work.  too much
> churn.

Fair enough -- I'll resubmit a new patch that doesn't change the existing names (and also is split into chunks per Graydon's suggestion).

(In reply to comment #25)
> https://developer.mozilla.org/en/NanojitMerge has the full instructions on
> landing patches in the post-merge world, BTW.  Worth reading, there are a
> number of non-obvious aspects.

Thanks, will do.
Comment 27 Steven Johnson 2009-11-18 16:41:11 PST
Created attachment 413221 [details] [diff] [review]
Patch #3

Principle differences from previous patch:

-- patch is now against current nj-central (vs tamarin-redux)
-- no renaming of existing opcodes at all (just adding new ones)
-- attempting to use new opcodes on a backend that doesn't support them will NanoAssert(0)
-- added new opcodes to the lirasm --random list (and i386 version passes with them).

Note that I deliberately didn't add any specific lirasm testcases for the new opcodes yet, as those tests would fail on targets that don't support them yet (ie everything but i386).
Comment 28 Edwin Smith 2009-11-19 13:41:07 PST
Comment on attachment 413221 [details] [diff] [review]
Patch #3

Nativei386.cpp:543 - yes, XORPD is still needed to zero the upper 64bits; however, XORPS is actually preferred (same effect, one less byte).  (this should be fixed everywhere, out of scope for this bug)

I havent reviewd the encodings of the new instructions.  a good method is to put an assert after each new kind of instruction, run a test, let the isntruction get generated, then disassemble starting at _nIns (eg in gdb:  disass _nIns _nIns+10) and convince yourself you're getting the proper instruction (check the registers, etc).  once it looks good, remove the assert.  hopefully all the new instructions get exersized by the test suite.

as for lirasm, maybe --random should be ifdefed as well so we only use the new opcodes on cpu's that support them?
Comment 29 Nicholas Nethercote [:njn] 2009-11-22 23:04:13 PST
Comment on attachment 413221 [details] [diff] [review]
Patch #3

>+                case LIR_ldzb:
>+                case LIR_ldzs:
>+                case LIR_ldsb:
>+                case LIR_ldss:
>+                case LIR_ldcsb:
>+                case LIR_ldcss:
>+#if !NJ_EXPANDED_LOADSTORE_SUPPORTED 
>+                    // shouldn't issue these instructions for backends that
>+                    // don't support them yet
>+                    NanoAssert(0);
>+#endif

Better would be to avoid the #if here and make each back-end check for the new opcodes in asm_ld() and abort themselves if they can't handle them, rather than Assembler doing it on their behalf.  Ie. we've already got arch-specific code in each backend, so let's avoid polluting non-arch-specific code with this kind of stuff where possible.


>@@ -1047,6 +1058,14 @@
>                     asm_ld(ins);
>                     break;
>                 }
>+
>+                case LIR_ld32f:
>+                case LIR_ldc32f:
>+#if !NJ_EXPANDED_LOADSTORE_SUPPORTED 
>+                    // shouldn't issue these instructions for backends that
>+                    // don't support them yet
>+                    NanoAssert(0);
>+#endif

Same here.


>@@ -1134,19 +1153,36 @@
>                     asm_promote(ins);
>                     break;
>                 }
>+                case LIR_stb:
>+                case LIR_sts:
>+#if !NJ_EXPANDED_LOADSTORE_SUPPORTED 
>+                    // shouldn't issue these instructions for backends that
>+                    // don't support them yet
>+                    NanoAssert(0);
>+#endif

And here.


>                 case LIR_sti:
>                 {
>                     countlir_st();
>+#if NJ_EXPANDED_LOADSTORE_SUPPORTED
>+                    asm_store32(ins->oprnd1(), ins->disp(), ins->oprnd2(), op);
>+#else
>                     asm_store32(ins->oprnd1(), ins->disp(), ins->oprnd2());
>+#endif
>                     break;
>                 }

Just add the parameter to all backends now, don't have two versions of asm_store32().


>+                case LIR_st32f:
>+#if !NJ_EXPANDED_LOADSTORE_SUPPORTED 
>+                    // shouldn't issue these instructions for backends that
>+                    // don't support them yet
>+                    NanoAssert(0);
>+#endif

As above.


>@@ -1154,7 +1190,11 @@
>                     }
>                     else
>                     {
>+#if NJ_EXPANDED_LOADSTORE_SUPPORTED
>+                        asm_store64(value, dr, base, op);
>+#else
>                         asm_store64(value, dr, base);
>+#endif

Again, just one version please.


>diff -r 018f529d8030 nanojit/Assembler.h
>--- a/nanojit/Assembler.h	Wed Nov 18 16:07:09 2009 -0500
>+++ b/nanojit/Assembler.h	Wed Nov 18 16:38:01 2009 -0800
>@@ -315,8 +315,13 @@
>             NIns*       asm_exit(LInsp guard);
>             NIns*       asm_leave_trace(LInsp guard);
>             void        asm_qjoin(LIns *ins);
>+        #if NJ_EXPANDED_LOADSTORE_SUPPORTED
>+            void        asm_store32(LIns *val, int d, LIns *base, LOpcode op = LIR_sti);
>+            void        asm_store64(LIns *val, int d, LIns *base, LOpcode op = LIR_stqi);

Why the default parameter?  IMO it makes things less clear, and there are few callers of asm_store32/64 so it doesn't save much typing.


>diff -r 018f529d8030 nanojit/LIR.cpp
>--- a/nanojit/LIR.cpp	Wed Nov 18 16:07:09 2009 -0500
>+++ b/nanojit/LIR.cpp	Wed Nov 18 16:38:01 2009 -0800
>@@ -245,9 +245,8 @@
>         return startOfRoom;
>     }
> 
>-    LInsp LirBufWriter::insStorei(LInsp val, LInsp base, int32_t d)
>+    LInsp LirBufWriter::insStore(LOpcode op, LInsp val, LInsp base, int32_t d)

Removing the implicit opcode sniffing is a good idea, but it makes the TM-specific patch tricky -- it's not obvious to me what the type of every insStore() is, indeed in some cases it may depend on the calling context.  So I think the patch needs an additional function that does explicit type-sniffing.  Something like this:

  LOpcode storeInsForIns(LInsp ins);

You'd call it like this:

  insStore(storeInsForIns(ins), ins, base, disp);

If 'ins' is a LIR_add the result would be LIR_sti, if 'ins' is a LIR_qadd the result would be LIR_stqi, etc.  This will gain further use for bug 520714 which will add LIR_ldf and LIR_ldcf.

The obvious way is to implement storeInsForIns as a big switch.  If there's a performance hit a table-based approach would be better.  Eventually I envision that LIRopcode.tbl has a field so that each instruction is annotated with its return type (eg. I32, I64, F64, Void).  This could be used by storeInsForIns(), and it could also be used by the LIR type-checker (bug 463137).


>@@ -884,6 +883,12 @@
> #endif
>     }
> 
>+    LIns* LirWriter::insStorei(LIns* value, LIns* base, int32_t d)
>+    {
>+        LOpcode op = value->isQuad() ? LIR_stqi : LIR_sti;
>+        return insStore(op, value, base, d);
>+    }

With the explicit type-sniffing implemented, insStorei() can be removed.


LIns::isLoad() is getting very branchy.  Perhaps it could just call isLInsLd(), which is equivalent but uses a table lookup.


> /* non-pure operations */
>+OPDEF(ldsb,      3, 1, Ld)      // 8-bit load, sign-extend to 32-bit
>+OPDEF(ldss,      4, 1, Ld)      // 16-bit load, sign-extend to 32-bit
>+OPDEF(ldzb,      5, 1, Ld)      // 8-bit load, zero extend to 32-bit
>+OPDEF(ldzs,      6, 1, Ld)      // 16-bit load, zero extend to 32-bit
> OPDEF(iaddp,     7, 2, Op2)     // integer addition for temporary pointer calculations (32bit only)

Can you add change "load" to "integer load"?  Some of the existing ones (eg. LIR_ld) could be changed likewise.


>+OPDEF(stb,       9, 2, Sti)     // 8-bit store
>+OPDEF(ld,       10, 1, Ld)      // 32-bit load ("ldi")

Don't need the '("ldi")', that'll be covered by The Great Opcode Renaming.


> OPDEF(ialloc,   11, 0, I)       // alloc some stack space (value is 32bit address)
> OPDEF(sti,      12, 2, Sti)     // 32-bit store
> OPDEF(ret,      13, 1, Op1)     // return a word-sized value
> OPDEF(live,     14, 1, Op1)     // extend live range of reference
> OPDEF(flive,    15, 1, Op1)     // extend live range of a floating point value reference
> OPDEF(icall,    16, 0, C)       // subroutine call returning a 32-bit value
>-OPDEF(unused17, 17, 0, None)
>+OPDEF(sts,      17, 2, Sti)     // 16-bit store

"store" -> "integer store"


>+OPDEF64(st32f,  14, 2, Sti)        // store double as a 32-bit float (dropping precision)
>+OPDEF64(ld32f,  15, 1, Ld)         // load 32-bit float and widen to double

Change "double" to "64-bit float"?
Comment 30 Steven Johnson 2009-11-23 10:55:36 PST
(In reply to comment #29)
> Better would be to avoid the #if here and make each back-end check for the new
> opcodes in asm_ld() and abort themselves if they can't handle them, rather than
> Assembler doing it on their behalf.  Ie. we've already got arch-specific code
> in each backend, so let's avoid polluting non-arch-specific code with this kind
> of stuff where possible.

Fair enough. This scheme (and the subsequent similar ones, and the default args, etc) were all intended as a temporary scheme to allow other back-ends to work as-is with no modification, but if we want to make the relevant API change to all back-ends at once (before they support the actual ops) that is easy enough to do.

> Removing the implicit opcode sniffing is a good idea, but it makes the
> TM-specific patch tricky -- it's not obvious to me what the type of every
> insStore() is, indeed in some cases it may depend on the calling context.  So I
> think the patch needs an additional function that does explicit type-sniffing. 
> Something like this:
> 
>   LOpcode storeInsForIns(LInsp ins);

Scope creep alert :-) This may indeed be useful, but isn't necessary to maintain existing functionality... the insStorei() wrapper I provided should exactly mimic the existing behavior, shouldn't it? (I'm not at all opposed to such a change, but it doesn't seem *necessary* for this patch)


> LIns::isLoad() is getting very branchy.  Perhaps it could just call isLInsLd(),
> which is equivalent but uses a table lookup.

A good thought. I hadn't noticed that it existed.

Alternately, perhaps we could rearrange the Load instructions to be in a single range... I briefly considered doing this but there are already enough subtle dependencies on magic ranges in LIR that I didn't want to bite it off in this patch.

> Can you add change "load" to "integer load"?  Some of the existing ones (eg.
> LIR_ld) could be changed likewise.

sure

> >+OPDEF(stb,       9, 2, Sti)     // 8-bit store
> >+OPDEF(ld,       10, 1, Ld)      // 32-bit load ("ldi")

ok

> "store" -> "integer store"
> 

ok

> Change "double" to "64-bit float"?

sure
Comment 31 Nicholas Nethercote [:njn] 2009-11-23 12:19:54 PST
(In reply to comment #30)
> > think the patch needs an additional function that does explicit type-sniffing. 
> > Something like this:
> > 
> >   LOpcode storeInsForIns(LInsp ins);
> 
> Scope creep alert :-) This may indeed be useful, but isn't necessary to
> maintain existing functionality... the insStorei() wrapper I provided should
> exactly mimic the existing behavior, shouldn't it? (I'm not at all opposed to
> such a change, but it doesn't seem *necessary* for this patch)

True.  A later patch, then.


> Alternately, perhaps we could rearrange the Load instructions to be in a single
> range... I briefly considered doing this but there are already enough subtle
> dependencies on magic ranges in LIR that I didn't want to bite it off in this
> patch.

Please please no... I want to eventually get rid of all magic opcode meanings so that opcode numbers are truly arbitrary :)
Comment 32 Steven Johnson 2009-11-23 15:02:02 PST
(In reply to comment #31)
 
> Please please no... I want to eventually get rid of all magic opcode meanings
> so that opcode numbers are truly arbitrary :)

A good plan in any event.
Comment 33 Rick Reitmaier 2009-11-24 15:44:11 PST
Comment on attachment 413221 [details] [diff] [review]
Patch #3


(1) Agree with Nick; please remove the preprocessor gunk from Assembler.cpp.  Its a bit more work up front, but clears up Assembler and facilitates future incremental support work in the back-ends.

(2)  Also a bit of a nit but I find it confusing that st32f takes the asm_store64() path.

(3) Agree with Nick, the default arguments ('op=LIR_sti/stqi') are unnecessary.

(4) Outside of this patch but it would be nice to agree on *unique* characters for our naming.  Right now we have s=signed and s=short and 's'  also appears in *st*ore; all very confusing.   

Something like:
   'c' = cse-able (non-vollatile)
   'x' for sign-extension
   'z'= zero
   'b' = byte (8bit)
   'w' = word (16bit)
   'i' = integer (32bit)
   'q' = quad (64bit)
   'f' =  32bit floating point value (float)
   'd' =  64bit floating point value (double)

means stores look like :
   stb, stw, sti, stq, stf

loads:
  ldzb, ldxb, ldzw, ldxw, ldf and the non-volitile versions  ldczb, ldcxb, ldczw, ldcxw, ldcf

Just an observation ... we are almost getting to the point where a type field encoded in the opcode might make sense.

(5) In line Nativei386.cpp:571 you removed a 'return' ;  Was this intentional?

(6) ST8i() is the underrunProtect value large enough?
Comment 34 Nicholas Nethercote [:njn] 2009-11-24 15:57:11 PST
(In reply to comment #33)
> (4) Outside of this patch but it would be nice to agree on *unique* characters
> for our naming.  Right now we have s=signed and s=short and 's'  also appears
> in *st*ore; all very confusing.   
> 
> Something like:
>    'c' = cse-able (non-vollatile)
>    'x' for sign-extension
>    'z'= zero
>    'b' = byte (8bit)
>    'w' = word (16bit)
>    'i' = integer (32bit)
>    'q' = quad (64bit)
>    'f' =  32bit floating point value (float)
>    'd' =  64bit floating point value (double)

Feel free to add to the discussion at the bottom of https://developer.mozilla.org/en/NanojitMerge.  I think using size numbers (eg. 8/16/32/64/32f/64f) is looking good...
Comment 35 Jeff Walden [:Waldo] (remove +bmo to email) 2009-11-24 16:00:25 PST
Comment 34 anticipates me!
Comment 36 Steven Johnson 2009-11-24 16:16:39 PST
(In reply to comment #33)
> (2)  Also a bit of a nit but I find it confusing that st32f takes the
> asm_store64() path.

why? it's storing a 64-bit value, so that's the proper place for it.
Comment 37 Rick Reitmaier 2009-11-25 12:47:23 PST
(In reply to comment #36)
> (In reply to comment #33)
> > (2)  Also a bit of a nit but I find it confusing that st32f takes the
> > asm_store64() path.
> 
> why? it's storing a 64-bit value, so that's the proper place for it.

Hmm, the comment for st32f states store 'x' as 'y'.  I guess I tripped over the 'as'.  

Maybe 'stores the quad result of converting a 64bit to 32bit floating point value.'
Comment 38 Steven Johnson 2009-11-29 10:21:48 PST
Perhaps unrelated, but the PPC backend doesn't appear to be emitted the existing ldcb and ldcs ops correctly... it does

        #if !PEDANTIC
        if (isS16(d)) {
            if (ins->isop(LIR_ldcb)) {
                LBZ(rr, d, ra);
            } else {
                LWZ(rr, d, ra);
            }
            return;
        }
        #endif

        // general case
        underrunProtect(12);
        LWZX(rr, ra, R0); // rr = [ra+R0]
        asm_li(R0,d);

which only emits ldcb is isS16(d) is true, and *never* properly emits ldcs...
Comment 39 Steven Johnson 2009-11-29 10:54:43 PST
(In reply to comment #37)

> Maybe 'stores the quad result of converting a 64bit to 32bit floating point
> value.'

It doesn't store a quad *result*... it stores a quad-sized *value* into a word-sized result. Yeah, it is confusing... you could make a case for it in either location. When I implemented it, it seemed to fit better there.
Comment 40 Steven Johnson 2009-11-29 11:02:52 PST
(In reply to comment #33)
> (5) In line Nativei386.cpp:571 you removed a 'return' ;  Was this intentional?

nope, good catch! back it goes.
 
> (6) ST8i() is the underrunProtect value large enough?

I'll doublecheck.
Comment 41 Steven Johnson 2009-11-29 11:08:12 PST
(In reply to comment #33)

> (6) ST8i() is the underrunProtect value large enough?

actually, it appears to be too large... MODRMm and MODRMs both appear to have a worst-case of 6, so ST8i, ST16i, and STi are all too large. I'll adjust.
Comment 42 Steven Johnson 2009-11-29 12:20:50 PST
Created attachment 415012 [details] [diff] [review]
Patch #4

Revised patch incorporating suggestions from Rick and Nicholas.
Comment 43 Steven Johnson 2009-11-29 12:22:22 PST
Created attachment 415013 [details] [diff] [review]
Patch #4a (Tamarin component)

Patch to Tamarin to make necessary changes to use Patch #4 above.
Comment 44 Nicholas Nethercote [:njn] 2009-11-30 18:19:46 PST
Comment on attachment 415012 [details] [diff] [review]
Patch #4

>diff -r 1c05dd6490d9 nanojit/LIRopcode.tbl
>--- a/nanojit/LIRopcode.tbl	Sat Nov 28 09:34:48 2009 -0800
>+++ b/nanojit/LIRopcode.tbl	Sun Nov 29 12:20:18 2009 -0800
>@@ -74,23 +74,23 @@
> OPDEF(start,     0, 0, Op0)     // start of a fragment

It looks like you haven't updated recently.  Unfortunately with bug 505662 I removed a field from OPDEF/OPD64, thus changing every single code line in this file.  You'll have to redo it, sorry.  If it's any consolation, this patch will force me to redo some pieces of the patch I'm working on for bug 516347 and bug 520714 :)


>+#ifndef NJ_EXPANDED_LOADSTORE_SUPPORTED
>+#  define NJ_EXPANDED_LOADSTORE_SUPPORTED 0
>+#endif

Is this even needed any more?  It looks like there aren't any functional uses of it, it just gets mentioned in NanoAssertMsg() messages.


> void
>-Assembler::asm_store32(LIns *value, int dr, LIns *base)
>+Assembler::asm_store32(LOpcode op, LIns *value, int dr, LIns *base)
> {
>+    switch (op) {
>+        case LIR_sti:
>+            // handled by mainline code below for now
>+            break;
>+        case LIR_stb:
>+        case LIR_sts:
>+            NanoAssertMsg(0, "NJ_EXPANDED_LOADSTORE_SUPPORTED not yet supported for this architecture");
>+            return;
>+        default:
>+            NanoAssertMsg(0, "asm_store32 should never receive this LIR opcode");
>+            return;
>+    }

I don't like this pattern much -- in optimised builds, if you get an unhandled or unexpected opcode it'll silently succeed but produce bogus code.  I don't expect you to change it for this patch because it's used in lots of places in Nanojit, but I wonder if there should be some kind of assert/abort that is enabled for optimised builds.


>-    void Assembler::asm_store64(LInsp value, int dr, LInsp base)
>+    void Assembler::asm_store64(LOpcode op, LInsp value, int dr, LInsp base)
>     {
>+        if (op == LIR_st32f)
>+        {
>+            Register rb;
>+            if (base->isop(LIR_alloc)) {
>+                rb = FP;
>+                dr += findMemFor(base);
>+            } else {
>+                rb = findRegFor(base, GpRegs);
>+            }

This if-then-else can be replaced with a call to getBaseReg().  There may be other cases like that in the patch but this was the only one I saw.

r=me with these nits considered.

A couple of other things:

- Can you add the new opcodes to the opcode table at the bottom of https://developer.mozilla.org/en/NanojitMerge once you've committed.

- Have you tested with Tracemonkey yet?  I'd like to see a Tracemonkey patch before the the NJ part is committed (assuming one is necessary) so that when someone (possibly me) updates Tracemonkey with this patch their job is as easy as possible.
Comment 45 Steven Johnson 2009-11-30 18:29:06 PST
(In reply to comment #44)

> It looks like you haven't updated recently.  Unfortunately with bug 505662 I
> removed a field from OPDEF/OPD64, thus changing every single code line in this
> file. 
 
No worries, I've already updated locally -- if you want to re-review the changes let me know, but it's pretty mechanical.


> >+#ifndef NJ_EXPANDED_LOADSTORE_SUPPORTED
> >+#  define NJ_EXPANDED_LOADSTORE_SUPPORTED 0
> >+#endif
> 
> Is this even needed any more?  It looks like there aren't any functional uses
> of it, it just gets mentioned in NanoAssertMsg() messages.

Yes, there's a big use of it: downstream clients examine it to know if it's safe to issue the new instructions or not.


> I don't like this pattern much -- in optimised builds, if you get an unhandled
> or unexpected opcode it'll silently succeed but produce bogus code.  I don't
> expect you to change it for this patch because it's used in lots of places in
> Nanojit, but I wonder if there should be some kind of assert/abort that is
> enabled for optimised builds.

Not that I'm aware of. IIRC there are other places where similar things happen (or at least, could have happened) before. I'm open for better suggestions, but in general, only Assembler will call these (from a giant switch) so the likelihood of incorrect arguments isn't really dependent on what clients might do.
 

> This if-then-else can be replaced with a call to getBaseReg().  There may be
> other cases like that in the patch but this was the only one I saw.

Ah, is that new? I was mimicking existing code when I wrote it. 
 
> - Can you add the new opcodes to the opcode table at the bottom of
> https://developer.mozilla.org/en/NanojitMerge once you've committed.

Yep, was waiting to commit to update.
 
> - Have you tested with Tracemonkey yet?  I'd like to see a Tracemonkey patch
> before the the NJ part is committed (assuming one is necessary) so that when
> someone (possibly me) updates Tracemonkey with this patch their job is as easy
> as possible.

Nope, not at all... Graydon had indicated to me that a TM patch was not necessary for these things (though obviously it would be a nice thing to do) so I hadn't done it.

Is there a minimal-TM-harness I can use for such testing, or do I need to pull and build all of mozilla? (Guess I better go read the nanojit-central wiki page again...)
Comment 46 Steven Johnson 2009-12-01 12:40:08 PST
OK, I have pulled TM and done some minimal testing with the patch... there is *no* change necessary to TM, thus no patch will be forthcoming. (The only source change necessary is for embedders which implement LirWriters and override insStorei(), which became insStore()... TM does not appear to do this.)

If all the relevant perf tests run acceptably then I'll be landing this today.
Comment 47 Steven Johnson 2009-12-01 16:56:14 PST
pushed to nanojit-central as changeset:   1096:7c5395e67078
Comment 48 Steven Johnson 2009-12-01 17:16:42 PST
wiki updated with new opcodes.

pushed to tamarin-redux as of changeset:   3242:8b42130cb465.

I'm closing this bug as resolved/fixed and opening new bugs to implement the new opcodes for the remaining backends:

x64: https://bugzilla.mozilla.org/show_bug.cgi?id=532240
ARM: https://bugzilla.mozilla.org/show_bug.cgi?id=532241
PPC: https://bugzilla.mozilla.org/show_bug.cgi?id=532242
Sparc: https://bugzilla.mozilla.org/show_bug.cgi?id=532243
MIPS: https://bugzilla.mozilla.org/show_bug.cgi?id=532251
Comment 49 Nicholas Nethercote [:njn] 2009-12-06 16:10:01 PST
BTW, the procedure is that we don't mark bugs as RESOLVED FIXED at least until they've been merged to both TM and TR.  Actually, on the Mozilla side we don't mark them as RESOLVED FIXED until they're merged with mozilla-central, but I'm not sure what the policy is in general for nanojit-central.
Comment 51 Steven Johnson 2009-12-06 16:48:09 PST
(In reply to comment #49)
> BTW, the procedure is that we don't mark bugs as RESOLVED FIXED at least until
> they've been merged to both TM and TR. 

Oops. Wasn't aware. Will keep this in mind in the future.
Comment 52 Steven Johnson 2009-12-23 09:57:27 PST
Pretty sure that this has been pushed to all relevant repos, shall we close it?
Comment 53 Chris Peyer 2010-01-18 16:12:49 PST
*** Bug 527085 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.