Closed Bug 504506 Opened 16 years ago Closed 15 years ago

nanojit: The Great Opcode Renaming

Categories

(Core :: JavaScript Engine, defect)

Product:

Component:

Platform:

x86

Linux

Type:

defect

Priority:

Not set

Severity:

normal

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Attachments

(2 files, 4 obsolete files)

patch 16 years ago Nicholas Nethercote [inactive] 243.80 KB, patch		Details \| Diff \| Splinter Review
script to do the renaming 16 years ago Nicholas Nethercote [inactive] 2.51 KB, text/plain		Details
proposal 15 years ago Nicholas Nethercote [inactive] 7.75 KB, text/plain		Details
(v2) script to do the renaming 15 years ago Edwin Smith 3.05 KB, text/x-perl-script		Details
(v3) script to do the renaming 15 years ago Edwin Smith 3.07 KB, text/x-perl-script		Details
(v4) script to do the renaming 15 years ago Edwin Smith 4.14 KB, text/x-perl-script		Details

Nicholas Nethercote [inactive]

Assignee

Description

•

16 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Discussion between myself, edwsmith, rreitmai and sewardj led to the proposal for opcode renaming in the table at the bottom of this link: https://developer.mozilla.org/en/NanojitMerge This patch implements this renaming for TM. The changes to LIRopcode.tbl were done by hand (I took the opportunity to clean up its formatting, esp. using __N instead of unusedN for unused opcodes because it's shorter), the rest were done with a script that I'll attach shortly. Because the patch is very large, apart from LIRopcode.tbl (which should be checked carefully) a review should focus on the script more than the diff, as that'll be easier to check.

Attachment #388870 - Flags: review?(graydon)

Nicholas Nethercote [inactive]

Assignee

Comment 1

•

16 years ago

Attached file script to do the renaming (obsolete) — Details

This is the script. It's run from within js/src/. Note that '\b' in a Perl regexp is equivalent to '\<' or '\>' in a normal regexp, ie. it marks a word boundary. This is important so that "\bLIR_call\b" does not match "LIR_callh" for example. trace-test.js passes. I'll run unit tests and try server tests shortly.

Nicholas Nethercote [inactive]

Assignee

Updated

•

16 years ago

Blocks: 503990

Jeff Walden [:Waldo]

Comment 2

•

16 years ago

I continue to believe the suggestion of 8/16/32/64/f suffixes proposed in bug 498521 is the right one. I think those are clearer than letters-corresponding-vaguely-to-C-types, most importantly because the C types don't actually have exact sizes.

Nicholas Nethercote [inactive]

Assignee

Updated

•

16 years ago

Blocks: 504507

Comment 3

•

15 years ago

See also bug 527083, which wants to introduce a full set of 8/16 bit loads/stores; as part of that it would regularize some existing names. If there's consensus on something other than the existing letters-corresponding-vaguely-to-C-types approach, now would be a good time for people to speak up. (personally I find the existing approach acceptable but I'm not married to it; the numeric suffixes proposed in Comment 2 also strike me as OK.)

Nicholas Nethercote [inactive]

Assignee

Comment 4

•

15 years ago

The current names on the wiki page are what several of us iterated to in an IRC discussion a few months ago. They're certainly more consistent and better than what we currently have, but not necessary ideal. (I'm sympathetic to sized names myself, eg. add32, add64, add64f.) So (probably the largest) part of the work for this bug is agreeing on the names.

Comment 5

•

15 years ago

Just to point out how they would look: means stores look like : st8, st16, st32, st64, st64f loads: ldz8, ldx8, ldz16, ldx32, ld64f and the non-volitile versions ldcz8, ldcx16, ldcz32, ldcx32, ldc64f These may look fine when seen in isolation but I'm afraid when combing through verbose output the #s will cause difficulty in quickly parsing the meaning of the op. While using a single character for sizes does have a disadvantage in that its somewhat cryptic, it does make the verbose output look somewhat cleaner in my eyes. An idea extracted from the bug comment Steve referred to earlier: ...it would be nice to agree on *unique* characters for our naming. Right now we have s=signed and s=short and 's' also appears in *st*ore; all very confusing. Something like: 'c' = cse-able (non-vollatile) 'x' = for sign-extension 'z' = zero 'b' = byte (8bit) 'w' = word (16bit) 'i' = integer (32bit) 'q' = quad (64bit) 'f' = 32bit floating point value (float) 'd' = 64bit floating point value (double) means stores look like : stb, stw, sti, stq, stf loads: ldzb, ldxb, ldzw, ldxw, ldf and the non-volitile versions ldczb, ldcxb, ldczw, ldcxw, ldcf Which ever we go with, I'd recommend posting a patch and then each of us have a look at -Dverbose output for readability.

Nicholas Nethercote [inactive]

Assignee

Updated

•

15 years ago

No longer blocks: 503990

Comment 6

•

15 years ago

This is just an idea, but the great opcode renaming might be easier to do sooner, if we introduce a different prefix than LIR_. perhaps even, "lir_" which is easier on the fingers. (lir_ld, etc). at any time we can introduce the new canonical names and make LIR_xxx alias to them, then in a series of patches over time, rip out all the aliases (except maybe pointer aliases using the new naming conventions).

Nicholas Nethercote [inactive]

Assignee

Comment 7

•

15 years ago

We could do that, but I think with the script the mechanics of the change won't be too difficult. The hard part will be deciding exactly what the new names are!

Comment 8

•

15 years ago

(In reply to comment #7) > We could do that, but I think with the script the mechanics of the change won't > be too difficult. The hard part will be deciding exactly what the new names > are! Yeah, I tend to agree with Nick here... sticking with the constants-are-initial-cap convention is IMHO much better, and the renaming isn't likely to be the hard part.

Nicholas Nethercote [inactive]

Assignee

Comment 9

•

15 years ago

As part of this, we might as well reorder the opcodes to be in a more sensible order, eg. put all the loads together (and many more like it). This will be possible once bug 542932 is finished, which will get rid of almost all of the remaining constraints on opcode ordering (and the remaining ones are easy to work with, eg. int32 comparisons must all be together).

Depends on: 542932

Nicholas Nethercote [inactive]

Assignee

Comment 11

•

15 years ago

Attached file proposal — Details

It's time to move this along. I want to avoid bike-shedding this bug to death, so I've attached a detailed proposal which includes rationales for all the choices. Ed's seen it and is happy with it. Note that the proposal suggests doing the change in steps, which will mandate filing a bunch of smaller bugs that block this one. If no-one complains about the proposal over the next few days I'll start doing just that.

Julian Seward [:jseward]

Comment 12

•

15 years ago

Overall: good. Small suggestion: > Recommendation: Use the following type indicators: > - 'b': 8-bit integer. > - 's': 16-bit integer. > - 'i': 32-bit integer. > - 'q': 64-bit integer. > - 'd': 64-bit float (double). > - 'f': 32-bit float. > - 'u': prefix for unsigned integers, when it matters. Rather than have to remember yet another set of mappings from letters to types, how about we simply use the Intel scheme? b/w/l/q 8/16/32/64-bit integral s/d 32/64-bit floating

Nicholas Nethercote [inactive]

Assignee

Comment 13

•

15 years ago

(In reply to comment #12) > > Rather than have to remember yet another set of mappings from > letters to types, how about we simply use the Intel scheme? > > b/w/l/q 8/16/32/64-bit integral > s/d 32/64-bit floating Presumably we'd have to keep using 'u' as the prefix for unsigned, eg. 'ul'. This suggestion definitely has merit. I particularly like the single/double distinction. My main dislike is Intel's use of "word" for a 16-bit value. I realise "word" has had many meanings, but in my mind it always means "pointer-sized value". We currently use it that way for insImmWord(). We'd have to avoid that usage, and just for two measly LIR opcodes (the 16-bit loads). Then again, we mostly use pointer/'p' for pointer-sized things, so we could remove insImmWord() easily. We also use "word" in lots of comments in LIR.h for a pointer-sized value; this might be confusing. But maybe we could write in a comment "when talking about type-indicators 'w' means 16-bits, but when used in more general contexts it means 'pointer-sized value'". In which case I'd be happy with the Intel suffixes. Ed, what do you think?

Comment 14

•

15 years ago

+1, sticking to a defacto standard is compelling. except for 'q', the current letters match Java/C# type names, but i'm happier with 'w' for int16 than i would be with 'l' for int64, and happier following the intel defacto standard 100% than a mutant variant of jvm/clr type suffixes. We end up using C99 types a lot in code, so "intptr_t" and "uintptr_t" are common, so I would be okay with relegating "word" to mean "16bits" in our assembly type nomenclature, and stick to 'ptr' when we mean machine-sized. Maybe an overloaded insImmPtr() helper for convenience (void* and uintptr_t).

Updated

•

15 years ago

Depends on: 555633

Updated

•

15 years ago

Blocks: 557880

Updated

•

15 years ago

No longer blocks: 504507

Depends on: 504507

Updated

•

15 years ago

Depends on: 557887

Comment 15

•

15 years ago

Attached file (v2) script to do the renaming (obsolete) — Details

Updated to the final names, added some missing names, and removed #xx opcode comments, since the opcodes were renumbered prior to this.

Attachment #388871 - Attachment is obsolete: true

Nicholas Nethercote [inactive]

Assignee

Comment 16

•

15 years ago

Thanks for updating the script. The plan's still to do the renaming in stages -- a few (or few dozen) opcodes at a time -- right?

Comment 17

•

15 years ago

I was debating with myself whether to do it a few opcodes at a time or a few files at a time, but agree, stages is the goal.

Nicholas Nethercote [inactive]

Assignee

Comment 18

•

15 years ago

(In reply to comment #17) > I was debating with myself whether to do it a few opcodes at a time or a few > files at a time, but agree, stages is the goal. Files is probably better because then we don't need multiple patches (NJ, TM, TR) per change. Good idea!

Comment 19

•

15 years ago

in a comment on bug 557880, Rick wrote: > Seems like we need an updated proposal prior to landing these patches, no? > There still seems to be some room for interpretation of the rules outlined in > the comments. > > For example, the load operators mostly look like conversions in this patch, > i.e. 'x2y', whereas the store versions do not. > > loading a byte is ldb2l where store is stb? shouldn't it be stl2b ? It would have been good to get more feedback earlier, but it might not be too late. in fact there is std2s (store double to single). Youre proposing: stb => stl2b stw => stl2w and that's all? I'm somewhat ambivalent because I'm eager to get on with removing the legacy aliases and having this behind us, but it is slightly more consistent. Let this serve as last call for name changes!

Nicholas Nethercote [inactive]

Assignee

Comment 20

•

15 years ago

(In reply to comment #19) > > Youre proposing: > stb => stl2b > stw => stl2w Sounds good to me -- good catch, Rick!

Comment 21

•

15 years ago

Attached file (v3) script to do the renaming (obsolete) — Details

Updated sts & stb

Attachment #437644 - Attachment is obsolete: true

Updated

•

15 years ago

Depends on: 559968

Updated

•

15 years ago

Depends on: 559969

Updated

•

15 years ago

Depends on: 559971

Updated

•

15 years ago

Depends on: 559972

Updated

•

15 years ago

Depends on: 559973

Updated

•

15 years ago

Depends on: 559974

Updated

•

15 years ago

Depends on: 559975

Updated

•

15 years ago

Depends on: 559977

Nicholas Nethercote [inactive]

Assignee

Updated

•

15 years ago

Depends on: 560160

Nicholas Nethercote [inactive]

Assignee

Updated

•

15 years ago

Attachment #388870 - Attachment is obsolete: true

Attachment #388870 - Flags: review?(graydon)

Nicholas Nethercote [inactive]

Assignee

Comment 22

•

15 years ago

Gal dislikes 'L' for int32 and I'm coming around to that view as well. I feel like we ended up with bad suffixes for the common cases (int32 most of all) in order to get easily explainable names for the uncommon cases. (Taking cues from Intel w.r.t. instruction set design seems like a bad idea in hindsight, too!) I realise this is late to be saying this :(

Nicholas Nethercote [inactive]

Assignee

Comment 23

•

15 years ago

(In reply to comment #22) > Gal dislikes 'L' for int32 and I'm coming around to that view as well. To make that more constructive: if we changed 'L' to 'I' and left everything else alone I think that would be an improvement. (The 'W' for 16-bit values still bugs me, given the usual modern meaning of "word", but I can't think of a better suffix.)

Nicholas Nethercote [inactive]

Assignee

Comment 24

•

15 years ago

Another possibility would be more (but not entirely) C-ish: b/s/i/q for integers, f/d for FP. That fixes the 'w' problem.

Comment 25

•

15 years ago

Its not too late to contain this, better to get agreement. I have to commute but i'll post again. what seems to be at issue is whether we use an existing convention as-is, or tweaking one. existing conventions: (i need to do more research to fill all in) C99: c,s,i,l,f,d or int8,int16,int32,int64,float32,float64 JVM: b,s,i,l,f,d CLR: LLVM: i8,i16,i32,i64,f32,f64 Intel: b,w,l,q,s,d ARM: PPC: others?

Nicholas Nethercote [inactive]

Assignee

Comment 26

•

15 years ago

I still advocate a single letter suffix. As the attached proposal says: it follows Huffman encoding (common names should be short), and allows the x2y form for conversions. (Otherwise we'd need i32tof32 instead of i2d, for example.) As for which single letter suffix... the Intel ones are looking archaic. C99 or JVM would be fine with me. We'll probably have a boolean/condition type too, that'll be either 'b' (boolean) or 'c' (condition). With that in mind I lean towards the C99 ones, because boolean (if added) will be used a lot more than int8, and 'b' is more obvious for boolean than 'c'. Nb: the int32, int64, float64 ones are by far the most frequent and thus important (boolean, too, if it's added). int8, int16 and float32 only occur in memory. Using the Intel suffixes was easy to explain, but it hurt those common cases (most specifically int32).

Nicholas Nethercote [inactive]

Assignee

Comment 27

•

15 years ago

(In reply to comment #26) > (Otherwise we'd need i32tof32 instead of i2d, for example.) That should be "i32tof64".

Andreas Gal :gal

Comment 28

•

15 years ago

Please lets use anything BUT Intel. Intel's naming schemes are and always have been insane (sorry Moh). I like JVM. C99 sounds good to me too. LLVM is a bit verbose, but I could live with it.

Nicholas Nethercote [inactive]

Assignee

Comment 29

•

15 years ago

Ed and I just discussed this on IRC. We now both think that C-ish names are best: 'c' == char == int8_t 's' == short == int16_t 'i' == int == int32_t 'l' == long long == int64_t (not quite, but close enough) 'f' == float == float32 'd' == double == float64 Also: 'p' == "pointer" == synonym for 'i' or 'l', depending on word-size 'ui' == unsigned int == uint32_t 'ul' == unsigned long long == uint64_t Possible in the future (see bug 559265) 'b' == bool Any last objections before we redo this? (BTW, longer names ("i32") aren't going to happen, sorry Waldo.) (And apologies for the intel-based scheme, it seemed like a good idea at the time.)

Andreas Gal :gal

Comment 30

•

15 years ago

Sounds good. Makes sense to use nanojit's implementation language's naming convention. r=me

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 31

•

15 years ago

I would rather "q" instead of "l" (since there is only one "l", and "long" is ambiguous - also the character looks like a numeral 1). Other than that, I like it.

Andreas Gal :gal

Comment 32

•

15 years ago

q? You have been hanging around with x86 hack3rs too much :)

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 33

•

15 years ago

Maybe :) but "l" isn't C for "long long".

Brendan Eich [:brendan]

Comment 34

•

15 years ago

l is for Java long to put the "Java" in "JS" :-P. Glad to see this new consensus (I hope). I knew gal was gonna say something so I held fire. Thanks to njn for bringing the topic up. /be

Comment 35

•

15 years ago

I heartily endorse Nicolas' and Ed's proposal in comment #29.

Jeff Walden [:Waldo]

Comment 36

•

15 years ago

I actually do kind of think "q" is better (more readable and less ambiguous with "1", if lacking in much mnemonic value) than "l". (Pardon me whilst I retch at saying that.) But this can work regardless.

Nicholas Nethercote [inactive]

Assignee

Comment 37

•

15 years ago

I too have grown accustomed to 'q' and so would be happy with 'q' or 'l'. Not that that helps us make a final decision...

Andreas Gal :gal

Comment 38

•

15 years ago

l as in long. Stick to the plan =)

Comment 39

•

15 years ago

I echo Jeff's vote: I prefer "q" but could live with "l"...

Nicholas Nethercote [inactive]

Assignee

Comment 40

•

15 years ago

After IRCing with Edwin, Brendan and Andreas, we decided on 'q': it's less coherent (ie. strays further from JVM/C99) but for day-to-day use it is nicer, because it's easier to read than 'l'.

Comment 41

•

15 years ago

Attached file (v4) script to do the renaming — Details

This version also fixes the intel names

Attachment #439541 - Attachment is obsolete: true

Updated

•

15 years ago

Blocks: 526914

Nicholas Nethercote [inactive]

Assignee

Updated

•

15 years ago

Depends on: 563590

Updated

•

15 years ago

Blocks: 537926

Updated

•

15 years ago

Depends on: 564941

Nicholas Nethercote [inactive]

Assignee

Updated

•

15 years ago

Depends on: 565257

Nicholas Nethercote [inactive]

Assignee

Comment 42

•

15 years ago

AFAICT TraceMonkey is now ready for the old opcodes to be removed -- all the relevant TM-only files have been updated, and all the NJ changes have been merged into TM.

Updated

•

15 years ago

Depends on: 566759

Nicholas Nethercote [inactive]

Assignee

Comment 43

•

15 years ago

All blocking bugs have been fixed!

Status: ASSIGNED → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.