Closed
Bug 504506
Opened 16 years ago
Closed 15 years ago
nanojit: The Great Opcode Renaming
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
Attachments
(2 files, 4 obsolete files)
Discussion between myself, edwsmith, rreitmai and sewardj led to the proposal for opcode renaming in the table at the bottom of this link:
https://developer.mozilla.org/en/NanojitMerge
This patch implements this renaming for TM. The changes to LIRopcode.tbl were done by hand (I took the opportunity to clean up its formatting, esp. using __N instead of unusedN for unused opcodes because it's shorter), the rest were done with a script that I'll attach shortly. Because the patch is very large, apart from LIRopcode.tbl (which should be checked carefully) a review should focus on the script more than the diff, as that'll be easier to check.
Attachment #388870 -
Flags: review?(graydon)
![]() |
Assignee | |
Comment 1•16 years ago
|
||
This is the script. It's run from within js/src/. Note that '\b' in a Perl regexp is equivalent to '\<' or '\>' in a normal regexp, ie. it marks a word boundary. This is important so that "\bLIR_call\b" does not match "LIR_callh" for example.
trace-test.js passes. I'll run unit tests and try server tests shortly.
Comment 2•16 years ago
|
||
I continue to believe the suggestion of 8/16/32/64/f suffixes proposed in bug 498521 is the right one. I think those are clearer than letters-corresponding-vaguely-to-C-types, most importantly because the C types don't actually have exact sizes.
Comment 3•15 years ago
|
||
See also bug 527083, which wants to introduce a full set of 8/16 bit loads/stores; as part of that it would regularize some existing names. If there's consensus on something other than the existing letters-corresponding-vaguely-to-C-types approach, now would be a good time for people to speak up. (personally I find the existing approach acceptable but I'm not married to it; the numeric suffixes proposed in Comment 2 also strike me as OK.)
![]() |
Assignee | |
Comment 4•15 years ago
|
||
The current names on the wiki page are what several of us iterated to in an IRC discussion a few months ago. They're certainly more consistent and better than what we currently have, but not necessary ideal. (I'm sympathetic to sized names myself, eg. add32, add64, add64f.) So (probably the largest) part of the work for this bug is agreeing on the names.
Comment 5•15 years ago
|
||
Just to point out how they would look:
means stores look like :
st8, st16, st32, st64, st64f
loads:
ldz8, ldx8, ldz16, ldx32, ld64f and the non-volitile versions ldcz8, ldcx16,
ldcz32, ldcx32, ldc64f
These may look fine when seen in isolation but I'm afraid when combing through verbose output the #s will cause difficulty in quickly parsing the meaning of the op.
While using a single character for sizes does have a disadvantage in that its somewhat cryptic, it does make the verbose output look somewhat cleaner in my eyes.
An idea extracted from the bug comment Steve referred to earlier:
...it would be nice to agree on *unique* characters for our naming. Right now we have s=signed and s=short and 's' also appears in *st*ore; all very confusing.
Something like:
'c' = cse-able (non-vollatile)
'x' = for sign-extension
'z' = zero
'b' = byte (8bit)
'w' = word (16bit)
'i' = integer (32bit)
'q' = quad (64bit)
'f' = 32bit floating point value (float)
'd' = 64bit floating point value (double)
means stores look like :
stb, stw, sti, stq, stf
loads:
ldzb, ldxb, ldzw, ldxw, ldf and the non-volitile versions ldczb, ldcxb,
ldczw, ldcxw, ldcf
Which ever we go with, I'd recommend posting a patch and then each of us have a look at -Dverbose output for readability.
Comment 6•15 years ago
|
||
This is just an idea, but the great opcode renaming might be easier to do sooner, if we introduce a different prefix than LIR_. perhaps even, "lir_" which is easier on the fingers. (lir_ld, etc). at any time we can introduce the new canonical names and make LIR_xxx alias to them, then in a series of patches over time, rip out all the aliases (except maybe pointer aliases using the new naming conventions).
![]() |
Assignee | |
Comment 7•15 years ago
|
||
We could do that, but I think with the script the mechanics of the change won't be too difficult. The hard part will be deciding exactly what the new names are!
Comment 8•15 years ago
|
||
(In reply to comment #7)
> We could do that, but I think with the script the mechanics of the change won't
> be too difficult. The hard part will be deciding exactly what the new names
> are!
Yeah, I tend to agree with Nick here... sticking with the constants-are-initial-cap convention is IMHO much better, and the renaming isn't likely to be the hard part.
![]() |
Assignee | |
Comment 9•15 years ago
|
||
As part of this, we might as well reorder the opcodes to be in a more sensible order, eg. put all the loads together (and many more like it).
This will be possible once bug 542932 is finished, which will get rid of almost all of the remaining constraints on opcode ordering (and the remaining ones are easy to work with, eg. int32 comparisons must all be together).
Depends on: 542932
![]() |
Assignee | |
Comment 11•15 years ago
|
||
It's time to move this along. I want to avoid bike-shedding this bug to death, so I've attached a detailed proposal which includes rationales for all the choices. Ed's seen it and is happy with it.
Note that the proposal suggests doing the change in steps, which will mandate filing a bunch of smaller bugs that block this one. If no-one complains about the proposal over the next few days I'll start doing just that.
Comment 12•15 years ago
|
||
Overall: good. Small suggestion:
> Recommendation: Use the following type indicators:
> - 'b': 8-bit integer.
> - 's': 16-bit integer.
> - 'i': 32-bit integer.
> - 'q': 64-bit integer.
> - 'd': 64-bit float (double).
> - 'f': 32-bit float.
> - 'u': prefix for unsigned integers, when it matters.
Rather than have to remember yet another set of mappings from
letters to types, how about we simply use the Intel scheme?
b/w/l/q 8/16/32/64-bit integral
s/d 32/64-bit floating
![]() |
Assignee | |
Comment 13•15 years ago
|
||
(In reply to comment #12)
>
> Rather than have to remember yet another set of mappings from
> letters to types, how about we simply use the Intel scheme?
>
> b/w/l/q 8/16/32/64-bit integral
> s/d 32/64-bit floating
Presumably we'd have to keep using 'u' as the prefix for unsigned, eg. 'ul'.
This suggestion definitely has merit. I particularly like the single/double distinction. My main dislike is Intel's use of "word" for a 16-bit value. I realise "word" has had many meanings, but in my mind it always means "pointer-sized value". We currently use it that way for insImmWord(). We'd have to avoid that usage, and just for two measly LIR opcodes (the 16-bit loads). Then again, we mostly use pointer/'p' for pointer-sized things, so we could remove insImmWord() easily. We also use "word" in lots of comments in LIR.h for a pointer-sized value; this might be confusing.
But maybe we could write in a comment "when talking about type-indicators 'w' means 16-bits, but when used in more general contexts it means 'pointer-sized value'". In which case I'd be happy with the Intel suffixes. Ed, what do you think?
Comment 14•15 years ago
|
||
+1, sticking to a defacto standard is compelling. except for 'q', the current letters match Java/C# type names, but i'm happier with 'w' for int16 than i would be with 'l' for int64, and happier following the intel defacto standard 100% than a mutant variant of jvm/clr type suffixes.
We end up using C99 types a lot in code, so "intptr_t" and "uintptr_t" are common, so I would be okay with relegating "word" to mean "16bits" in our assembly type nomenclature, and stick to 'ptr' when we mean machine-sized. Maybe an overloaded insImmPtr() helper for convenience (void* and uintptr_t).
Updated•15 years ago
|
Comment 15•15 years ago
|
||
Updated to the final names, added some missing names, and removed #xx opcode comments, since the opcodes were renumbered prior to this.
Attachment #388871 -
Attachment is obsolete: true
![]() |
Assignee | |
Comment 16•15 years ago
|
||
Thanks for updating the script. The plan's still to do the renaming in stages -- a few (or few dozen) opcodes at a time -- right?
Comment 17•15 years ago
|
||
I was debating with myself whether to do it a few opcodes at a time or a few files at a time, but agree, stages is the goal.
![]() |
Assignee | |
Comment 18•15 years ago
|
||
(In reply to comment #17)
> I was debating with myself whether to do it a few opcodes at a time or a few
> files at a time, but agree, stages is the goal.
Files is probably better because then we don't need multiple patches (NJ, TM, TR) per change. Good idea!
Comment 19•15 years ago
|
||
in a comment on bug 557880, Rick wrote:
> Seems like we need an updated proposal prior to landing these patches, no?
> There still seems to be some room for interpretation of the rules outlined in
> the comments.
>
> For example, the load operators mostly look like conversions in this patch,
> i.e. 'x2y', whereas the store versions do not.
>
> loading a byte is ldb2l where store is stb? shouldn't it be stl2b ?
It would have been good to get more feedback earlier, but it might not be too late. in fact there is std2s (store double to single).
Youre proposing:
stb => stl2b
stw => stl2w
and that's all? I'm somewhat ambivalent because I'm eager to get on with removing the legacy aliases and having this behind us, but it is slightly more consistent.
Let this serve as last call for name changes!
![]() |
Assignee | |
Comment 20•15 years ago
|
||
(In reply to comment #19)
>
> Youre proposing:
> stb => stl2b
> stw => stl2w
Sounds good to me -- good catch, Rick!
![]() |
Assignee | |
Updated•15 years ago
|
Attachment #388870 -
Attachment is obsolete: true
Attachment #388870 -
Flags: review?(graydon)
![]() |
Assignee | |
Comment 22•15 years ago
|
||
Gal dislikes 'L' for int32 and I'm coming around to that view as well. I feel like we ended up with bad suffixes for the common cases (int32 most of all) in order to get easily explainable names for the uncommon cases. (Taking cues from Intel w.r.t. instruction set design seems like a bad idea in hindsight, too!)
I realise this is late to be saying this :(
![]() |
Assignee | |
Comment 23•15 years ago
|
||
(In reply to comment #22)
> Gal dislikes 'L' for int32 and I'm coming around to that view as well.
To make that more constructive: if we changed 'L' to 'I' and left everything else alone I think that would be an improvement. (The 'W' for 16-bit values still bugs me, given the usual modern meaning of "word", but I can't think of a better suffix.)
![]() |
Assignee | |
Comment 24•15 years ago
|
||
Another possibility would be more (but not entirely) C-ish: b/s/i/q for integers, f/d for FP. That fixes the 'w' problem.
Comment 25•15 years ago
|
||
Its not too late to contain this, better to get agreement. I have to commute but i'll post again. what seems to be at issue is whether we use an existing convention as-is, or tweaking one.
existing conventions: (i need to do more research to fill all in)
C99: c,s,i,l,f,d or int8,int16,int32,int64,float32,float64
JVM: b,s,i,l,f,d
CLR:
LLVM: i8,i16,i32,i64,f32,f64
Intel: b,w,l,q,s,d
ARM:
PPC:
others?
![]() |
Assignee | |
Comment 26•15 years ago
|
||
I still advocate a single letter suffix. As the attached proposal says: it follows Huffman encoding (common names should be short), and allows the x2y form for conversions. (Otherwise we'd need i32tof32 instead of i2d, for example.)
As for which single letter suffix... the Intel ones are looking archaic. C99 or JVM would be fine with me. We'll probably have a boolean/condition type too, that'll be either 'b' (boolean) or 'c' (condition). With that in mind I lean towards the C99 ones, because boolean (if added) will be used a lot more than int8, and 'b' is more obvious for boolean than 'c'.
Nb: the int32, int64, float64 ones are by far the most frequent and thus important (boolean, too, if it's added). int8, int16 and float32 only occur in memory. Using the Intel suffixes was easy to explain, but it hurt those common cases (most specifically int32).
![]() |
Assignee | |
Comment 27•15 years ago
|
||
(In reply to comment #26)
> (Otherwise we'd need i32tof32 instead of i2d, for example.)
That should be "i32tof64".
Comment 28•15 years ago
|
||
Please lets use anything BUT Intel. Intel's naming schemes are and always have been insane (sorry Moh). I like JVM. C99 sounds good to me too. LLVM is a bit verbose, but I could live with it.
![]() |
Assignee | |
Comment 29•15 years ago
|
||
Ed and I just discussed this on IRC. We now both think that C-ish names are best:
'c' == char == int8_t
's' == short == int16_t
'i' == int == int32_t
'l' == long long == int64_t (not quite, but close enough)
'f' == float == float32
'd' == double == float64
Also:
'p' == "pointer" == synonym for 'i' or 'l', depending on word-size
'ui' == unsigned int == uint32_t
'ul' == unsigned long long == uint64_t
Possible in the future (see bug 559265)
'b' == bool
Any last objections before we redo this?
(BTW, longer names ("i32") aren't going to happen, sorry Waldo.)
(And apologies for the intel-based scheme, it seemed like a good idea at the time.)
Comment 30•15 years ago
|
||
Sounds good. Makes sense to use nanojit's implementation language's naming convention. r=me
I would rather "q" instead of "l" (since there is only one "l", and "long" is ambiguous - also the character looks like a numeral 1). Other than that, I like it.
Comment 32•15 years ago
|
||
q? You have been hanging around with x86 hack3rs too much :)
Maybe :) but "l" isn't C for "long long".
Comment 34•15 years ago
|
||
l is for Java long to put the "Java" in "JS" :-P.
Glad to see this new consensus (I hope). I knew gal was gonna say something so I held fire. Thanks to njn for bringing the topic up.
/be
Comment 35•15 years ago
|
||
I heartily endorse Nicolas' and Ed's proposal in comment #29.
Comment 36•15 years ago
|
||
I actually do kind of think "q" is better (more readable and less ambiguous with "1", if lacking in much mnemonic value) than "l". (Pardon me whilst I retch at saying that.) But this can work regardless.
![]() |
Assignee | |
Comment 37•15 years ago
|
||
I too have grown accustomed to 'q' and so would be happy with 'q' or 'l'. Not that that helps us make a final decision...
Comment 38•15 years ago
|
||
l as in long. Stick to the plan =)
Comment 39•15 years ago
|
||
I echo Jeff's vote: I prefer "q" but could live with "l"...
![]() |
Assignee | |
Comment 40•15 years ago
|
||
After IRCing with Edwin, Brendan and Andreas, we decided on 'q': it's less coherent (ie. strays further from JVM/C99) but for day-to-day use it is nicer, because it's easier to read than 'l'.
Comment 41•15 years ago
|
||
This version also fixes the intel names
Attachment #439541 -
Attachment is obsolete: true
![]() |
Assignee | |
Comment 42•15 years ago
|
||
AFAICT TraceMonkey is now ready for the old opcodes to be removed -- all the relevant TM-only files have been updated, and all the NJ changes have been merged into TM.
![]() |
Assignee | |
Comment 43•15 years ago
|
||
All blocking bugs have been fixed!
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•