Closed Bug 565571 Opened 10 years ago Closed 9 years ago

nanojit: improve X64 codegen for stores of integer immediates

Categories

(Core Graveyard :: Nanojit, defect)

x86_64
All
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
Future

People

(Reporter: njn, Unassigned)

Details

(Whiteboard: PACMAN, fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)

Attachments

(1 file, 1 obsolete file)

Storing a 32-bit immediate on i386:

      immi4 = immi 1
      sti.s sp[40] = immi4/*1*/
  ......   mov 40(esi),1

Doing likewise on X64:

      immi4 = immi 1
  ...... movl edi, 1
      sti.s sp[40] = immi4/*1*/
  ...... movl 40(r13), edi

It's easy to avoid putting the immediate in a register, just compare
asm_store32() in Nativei386.cpp and NativeX64.cpp.  The only hard part is
(as usual) working out the X64 encoding for the immediate forms.

Ed or Rick -- do you want to take this one?
Summary: nanojit: specialize X64 codegen for stores of 32-bit immediates → nanojit: improve X64 codegen for stores of 32-bit immediates
The same thing can be done for stores of 64-bit integer immediates:

    immq1 = immq 1
  ...... movl edi, 1
      stq.r rp[0] = immq1/*1*/
  ...... movq 0(r15), rdi
Summary: nanojit: improve X64 codegen for stores of 32-bit immediates → nanojit: improve X64 codegen for stores of integer immediates
This is a mass change. Every comment has "assigned-to-new" in it.

I didn't look through the bugs, so I'm sorry if I change a bug which shouldn't be changed. But I guess these bugs are just bugs that were once assigned and people forgot to change the Status back when unassigning.
Status: ASSIGNED → NEW
Whiteboard: PACMAN
Target Milestone: --- → Future
OS: Mac OS X → All
Hardware: x86 → x86_64
I set the PACMAN whiteboard flag and the cpu flags, making this more likely to be poached.  i agree its low hanging fruit.
Assignee: nobody → wsharp
Patched 64-bit tamarin passes all sanity tests with these changes.  Looking for feedback on name of functions STi/ST8i/ST16i (matches 32-bit) and emitrm_imm, etc.
Attachment #453494 - Flags: feedback?(nnethercote)
Attachment #453494 - Flags: feedback?(edwsmith)
Comment on attachment 453494 [details] [diff] [review]
64-bit support for store immediate for 8/16/32-bit immediate value

>+        void STi(Register base, int disp, int32_t imm); \
>+        void ST16i(Register base, int disp, int32_t imm16); \
>+        void ST8i(Register base, int disp, int32_t imm8); \

The X64 function naming convention is different to the i386 one.  I'd suggest MOVBMI/MOVSMI/MOVLMI.  You'll need to change the X64_st* constants as well.

Van you do asm_store64() as well, as comment 1 suggested?  It should be similar.  The LIR_stq case is most important, the others would be nice as well if they can be done without jumping through hoops.

Do you know how to run the Mozilla tests?  (Ask me on IRC or via email if you don't.)  I tried running them and got a failure.  It was on a test that involves a 20,000+ instruction fragment, so on a hunch I tried adding an extra 8 to each of the underrunProtect() calls in the three emit_* functions you added.  That fixed the problem.  I'll let you work out what the correct values should be :)

I'll give you an f+ because it's just a feedback request, but I'd like to do a review of the updated patch when it's ready.  Thanks!
Attachment #453494 - Flags: feedback?(nnethercote) → feedback+
Based upon feedback, renamed a few things and added LIR_stq support.
Attachment #453494 - Attachment is obsolete: true
Attachment #454101 - Flags: superreview?(edwsmith)
Attachment #454101 - Flags: review?(nnethercote)
Attachment #453494 - Flags: feedback?(edwsmith)
Comment on attachment 454101 [details] [diff] [review]
latest patch with LIR_stq support

asm_store64(): 

* (nit) please keep the local brace style: { on the same line with control statements.

if the immq value doesn't fit in 32bits, the patch emits two 32-bit stores.  is it better to just use the old code in this case?  It does a 64-bit load-immediate into a register, followed by a register store.  (avoids splitting the 64bit value in half)?  I worry about store-forwarding hazards.

if we keep the two-store code as-is, they should probably be flipped in order so the executed sequence is to store at [d+0] followed by [d+4], just to behave nicer to any prefetcher (store low-then-high address).

A counter argument would be that the two 32bit stores are better if using the extra register causes a spill.  That actually can be detected (check whether there are 2 free registers first) -- but this seriously complicates the patch and should be driven by evidence of spilling being a problem.

FYI: The way I test new instructions is insert a TODO or assert in the appropriate place to stop the debugger, step over the code emitted, then dissassemble the generated code and hand-inspect that the debugger's disassembler agrees with what we want.  you can hand-modify the assigned registers when doing this to check corner cases, too.  

the underrunProtect parameters look correct now, compared to the previous patch.
Attachment #454101 - Flags: superreview?(edwsmith) → superreview+
cut & paste bug: the verbose output for the new instructions always prints "movs", it should be "movq/l/s/b"
the comment:


                    // MOVQMI takes a 32-bit integer that gets signed extended to a 64-bit memory location

should read "64-bit value" at the end.  the immediate is a stored value, not a memory location.
(In reply to comment #7)
> 
> if the immq value doesn't fit in 32bits, the patch emits two 32-bit stores.  is
> it better to just use the old code in this case?  It does a 64-bit
> load-immediate into a register, followed by a register store.  (avoids
> splitting the 64bit value in half)?  I worry about store-forwarding hazards.

Yeah, that seems reasonable -- it merges two cases, and the fits-in-32-bit case is the most important.  (If this contradicts anything I said earlier, I apologize!)

r=me with Ed's nits addressed (no need to post a new patch).
Attachment #454101 - Flags: review?(nnethercote) → review+
Per team mtg, removing my name from bugs I'm not actively working on.
Assignee: wsharp → nobody
I landed this with some minor tweaks:
- used the old code for storing immediates that don't fit in 64-bits, like Ed suggested
- fixed some nits from the reviews
- tweaked the formatting in a couple of places

http://hg.mozilla.org/projects/nanojit-central/rev/e60ea6ffca83
http://hg.mozilla.org/tracemonkey/rev/1b8e50da7f24
Whiteboard: PACMAN → PACMAN, fixed-in-nanojit, fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/1b8e50da7f24
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
http://hg.mozilla.org/tamarin-redux/rev/5da233c90e00
Whiteboard: PACMAN, fixed-in-nanojit, fixed-in-tracemonkey → PACMAN, fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.