Closed Bug 504506 Opened 15 years ago Closed 14 years ago

nanojit: The Great Opcode Renaming

Categories

(Core :: JavaScript Engine, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Attachments

(2 files, 4 obsolete files)

Attached patch patch (obsolete) — Splinter Review
Discussion between myself, edwsmith, rreitmai and sewardj led to the proposal for opcode renaming in the table at the bottom of this link:

https://developer.mozilla.org/en/NanojitMerge

This patch implements this renaming for TM.  The changes to LIRopcode.tbl were done by hand (I took the opportunity to clean up its formatting, esp. using __N instead of unusedN for unused opcodes because it's shorter), the rest were done with a script that I'll attach shortly.  Because the patch is very large, apart from LIRopcode.tbl (which should be checked carefully) a review should focus on the script more than the diff, as that'll be easier to check.
Attachment #388870 - Flags: review?(graydon)
Attached file script to do the renaming (obsolete) —
This is the script.  It's run from within js/src/.  Note that '\b' in a Perl regexp is equivalent to '\<' or '\>' in a normal regexp, ie. it marks a word boundary.  This is important so that "\bLIR_call\b" does not match "LIR_callh" for example.

trace-test.js passes.  I'll run unit tests and try server tests shortly.
Blocks: 503990
I continue to believe the suggestion of 8/16/32/64/f suffixes proposed in bug 498521 is the right one.  I think those are clearer than letters-corresponding-vaguely-to-C-types, most importantly because the C types don't actually have exact sizes.
Blocks: 504507
See also bug 527083, which wants to introduce a full set of 8/16 bit loads/stores; as part of that it would regularize some existing names. If there's consensus on something other than the existing letters-corresponding-vaguely-to-C-types approach, now would be a good time for people to speak up. (personally I find the existing approach acceptable but I'm not married to it; the numeric suffixes proposed in Comment 2 also strike me as OK.)
The current names on the wiki page are what several of us iterated to in an IRC discussion a few months ago.  They're certainly more consistent and better than what we currently have, but not necessary ideal.  (I'm sympathetic to sized names myself, eg. add32, add64, add64f.)  So (probably the largest) part of the work for this bug is agreeing on the names.
Just to point out how they would look:

means stores look like :
   st8, st16, st32, st64, st64f

loads:
  ldz8, ldx8, ldz16, ldx32, ld64f and the non-volitile versions  ldcz8, ldcx16,
ldcz32, ldcx32, ldc64f

These may look fine when seen in isolation but I'm afraid when combing through verbose output the #s will cause difficulty in quickly parsing the meaning of the op.

While using a single character for sizes does have a disadvantage in that its somewhat cryptic, it does make the verbose output look somewhat cleaner in my eyes.

An idea extracted from the bug comment Steve referred to earlier:

...it would be nice to agree on *unique* characters for our naming.  Right now we have s=signed and s=short and 's'  also appears in *st*ore; all very confusing.   

Something like:
   'c' = cse-able (non-vollatile)
   'x' = for sign-extension
   'z' = zero
   'b' = byte (8bit)
   'w' = word (16bit)
   'i' = integer (32bit)
   'q' = quad (64bit)
   'f' = 32bit floating point value (float)
   'd' = 64bit floating point value (double)

means stores look like :
   stb, stw, sti, stq, stf

loads:
  ldzb, ldxb, ldzw, ldxw, ldf and the non-volitile versions  ldczb, ldcxb,
ldczw, ldcxw, ldcf


Which ever we go with, I'd recommend posting a patch and then each of us have a look at -Dverbose output for readability.
No longer blocks: 503990
This is just an idea, but the great opcode renaming might be easier to do sooner, if we introduce a different prefix than LIR_.  perhaps even, "lir_" which is easier on the fingers.  (lir_ld, etc).  at any time we can introduce the new canonical names and make LIR_xxx alias to them, then in a series of patches over time, rip out all the aliases (except maybe pointer aliases using the new naming conventions).
We could do that, but I think with the script the mechanics of the change won't be too difficult.  The hard part will be deciding exactly what the new names are!
(In reply to comment #7)
> We could do that, but I think with the script the mechanics of the change won't
> be too difficult.  The hard part will be deciding exactly what the new names
> are!

Yeah, I tend to agree with Nick here... sticking with the constants-are-initial-cap convention is IMHO much better, and the renaming isn't likely to be the hard part.
As part of this, we might as well reorder the opcodes to be in a more sensible order, eg. put all the loads together (and many more like it).

This will be possible once bug 542932 is finished, which will get rid of almost all of the remaining constraints on opcode ordering (and the remaining ones are easy to work with, eg. int32 comparisons must all be together).
Depends on: 542932
Attached file proposal
It's time to move this along.  I want to avoid bike-shedding this bug to death, so I've attached a detailed proposal which includes rationales for all the choices.  Ed's seen it and is happy with it.

Note that the proposal suggests doing the change in steps, which will mandate filing a bunch of smaller bugs that block this one.  If no-one complains about the proposal over the next few days I'll start doing just that.
Overall: good.  Small suggestion:

> Recommendation:  Use the following type indicators:
> - 'b': 8-bit integer.
> - 's': 16-bit integer.
> - 'i': 32-bit integer.
> - 'q': 64-bit integer.
> - 'd': 64-bit float (double).
> - 'f': 32-bit float.
> - 'u': prefix for unsigned integers, when it matters.

Rather than have to remember yet another set of mappings from
letters to types, how about we simply use the Intel scheme?

  b/w/l/q    8/16/32/64-bit integral
  s/d        32/64-bit floating
(In reply to comment #12)
> 
> Rather than have to remember yet another set of mappings from
> letters to types, how about we simply use the Intel scheme?
> 
>   b/w/l/q    8/16/32/64-bit integral
>   s/d        32/64-bit floating

Presumably we'd have to keep using 'u' as the prefix for unsigned, eg. 'ul'.

This suggestion definitely has merit.  I particularly like the single/double distinction.  My main dislike is Intel's use of "word" for a 16-bit value.  I realise "word" has had many meanings, but in my mind it always means "pointer-sized value".  We currently use it that way for insImmWord().  We'd have to avoid that usage, and just for two measly LIR opcodes (the 16-bit loads).  Then again, we mostly use pointer/'p' for pointer-sized things, so we could remove insImmWord() easily.  We also use "word" in lots of comments in LIR.h for a pointer-sized value;  this might be confusing.

But maybe we could write in a comment "when talking about type-indicators 'w' means 16-bits, but when used in more general contexts it means 'pointer-sized value'".  In which case I'd be happy with the Intel suffixes.  Ed, what do you think?
+1, sticking to a defacto standard is compelling.  except for 'q', the current letters match Java/C# type names, but i'm happier with 'w' for int16 than i would be with 'l' for int64, and happier following the intel defacto standard 100% than a mutant variant of jvm/clr type suffixes. 

We end up using C99 types a lot in code, so "intptr_t" and "uintptr_t" are common, so I would be okay with relegating "word" to mean "16bits" in our assembly type nomenclature, and stick to 'ptr' when we mean machine-sized.  Maybe an overloaded insImmPtr() helper for convenience (void* and uintptr_t).
Depends on: 555633
Blocks: 557880
No longer blocks: 504507
Depends on: 504507
Depends on: 557887
Attached file (v2) script to do the renaming (obsolete) —
Updated to the final names, added some missing names, and removed #xx opcode comments, since the opcodes were renumbered prior to this.
Attachment #388871 - Attachment is obsolete: true
Thanks for updating the script.  The plan's still to do the renaming in stages -- a few (or few dozen) opcodes at a time -- right?
I was debating with myself whether to do it a few opcodes at a time or a few files at a time, but agree, stages is the goal.
(In reply to comment #17)
> I was debating with myself whether to do it a few opcodes at a time or a few
> files at a time, but agree, stages is the goal.

Files is probably better because then we don't need multiple patches (NJ, TM, TR) per change.  Good idea!
in a comment on bug 557880, Rick wrote:
> Seems like we need an updated proposal prior to landing these patches, no? 
> There still seems to be some room for interpretation of the rules outlined in
> the comments.  
>
> For example, the load operators mostly look like conversions in this patch,
> i.e. 'x2y', whereas the store versions do not.
> 
> loading a byte is ldb2l where store is stb?  shouldn't it be stl2b ?

It would have been good to get more feedback earlier, but it might not be too late.  in fact there is std2s (store double to single).  

Youre proposing:
stb => stl2b
stw => stl2w

and that's all?  I'm somewhat ambivalent because I'm eager to get on with removing the legacy aliases and having this behind us, but it is slightly more consistent.

Let this serve as last call for name changes!
(In reply to comment #19)
> 
> Youre proposing:
> stb => stl2b
> stw => stl2w

Sounds good to me -- good catch, Rick!
Attached file (v3) script to do the renaming (obsolete) —
Updated sts & stb
Attachment #437644 - Attachment is obsolete: true
Depends on: 559968
Depends on: 559969
Depends on: 559971
Depends on: 559972
Depends on: 559973
Depends on: 559974
Depends on: 559975
Depends on: 559977
Depends on: 560160
Attachment #388870 - Attachment is obsolete: true
Attachment #388870 - Flags: review?(graydon)
Gal dislikes 'L' for int32 and I'm coming around to that view as well.  I feel like we ended up with bad suffixes for the common cases (int32 most of all) in order to get easily explainable names for the uncommon cases.  (Taking cues from Intel w.r.t. instruction set design seems like a bad idea in hindsight, too!)

I realise this is late to be saying this :(
(In reply to comment #22)
> Gal dislikes 'L' for int32 and I'm coming around to that view as well.

To make that more constructive:  if we changed 'L' to 'I' and left everything else alone I think that would be an improvement.  (The 'W' for 16-bit values still bugs me, given the usual modern meaning of "word", but I can't think of a better suffix.)
Another possibility would be more (but not entirely) C-ish:  b/s/i/q for integers, f/d for FP.  That fixes the 'w' problem.
Its not too late to contain this, better to get agreement.  I have to commute but i'll post again.  what seems to be at issue is whether we use an existing convention as-is, or tweaking one.

existing conventions:  (i need to do more research to fill all in)
C99: c,s,i,l,f,d  or int8,int16,int32,int64,float32,float64
JVM: b,s,i,l,f,d
CLR:
LLVM: i8,i16,i32,i64,f32,f64
Intel: b,w,l,q,s,d
ARM:
PPC:
others?
I still advocate a single letter suffix.  As the attached proposal says:  it follows Huffman encoding (common names should be short), and allows the x2y form for conversions.  (Otherwise we'd need i32tof32 instead of i2d, for example.)

As for which single letter suffix... the Intel ones are looking archaic.  C99 or JVM would be fine with me.  We'll probably have a boolean/condition type too, that'll be either 'b' (boolean) or 'c' (condition).  With that in mind I lean towards the C99 ones, because boolean (if added) will be used a lot more than int8, and 'b' is more obvious for boolean than 'c'.

Nb:  the int32, int64, float64 ones are by far the most frequent and thus important (boolean, too, if it's added).  int8, int16 and float32 only occur in memory.  Using the Intel suffixes was easy to explain, but it hurt those common cases (most specifically int32).
(In reply to comment #26)
> (Otherwise we'd need i32tof32 instead of i2d, for example.)

That should be "i32tof64".
Please lets use anything BUT Intel. Intel's naming schemes are and always have been insane (sorry Moh). I like JVM. C99 sounds good to me too. LLVM is a bit verbose, but I could live with it.
Ed and I just discussed this on IRC.  We now both think that C-ish names are best:

  'c' == char  == int8_t
  's' == short == int16_t
  'i' == int   == int32_t
  'l' == long long == int64_t  (not quite, but close enough)

  'f' == float == float32
  'd' == double == float64

Also:

  'p' == "pointer" == synonym for 'i' or 'l', depending on word-size
  'ui' == unsigned int == uint32_t
  'ul' == unsigned long long == uint64_t

Possible in the future (see bug 559265)

  'b' == bool

Any last objections before we redo this?

(BTW, longer names ("i32") aren't going to happen, sorry Waldo.)

(And apologies for the intel-based scheme, it seemed like a good idea at the time.)
Sounds good. Makes sense to use nanojit's implementation language's naming convention. r=me
I would rather "q" instead of "l" (since there is only one "l", and "long" is ambiguous - also the character looks like a numeral 1). Other than that, I like it.
q? You have been hanging around with x86 hack3rs too much :)
Maybe :) but "l" isn't C for "long long".
l is for Java long to put the "Java" in "JS" :-P.

Glad to see this new consensus (I hope). I knew gal was gonna say something so I held fire. Thanks to njn for bringing the topic up.

/be
I heartily endorse Nicolas' and Ed's proposal in comment #29.
I actually do kind of think "q" is better (more readable and less ambiguous with "1", if lacking in much mnemonic value) than "l".  (Pardon me whilst I retch at saying that.)  But this can work regardless.
I too have grown accustomed to 'q' and so would be happy with 'q' or 'l'.  Not that that helps us make a final decision...
l as in long. Stick to the plan =)
I echo Jeff's vote: I prefer "q" but could live with "l"...
After IRCing with Edwin, Brendan and Andreas, we decided on 'q':  it's less coherent (ie. strays further from JVM/C99) but for day-to-day use it is nicer, because it's easier to read than 'l'.
This version also fixes the intel names
Attachment #439541 - Attachment is obsolete: true
Blocks: 526914
Depends on: 563590
Blocks: 537926
Depends on: 564941
Depends on: 565257
AFAICT TraceMonkey is now ready for the old opcodes to be removed -- all the relevant TM-only files have been updated, and all the NJ changes have been merged into TM.
Depends on: 566759
All blocking bugs have been fixed!
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: