nanojit: don't utilize all XMM regs on X64?

RESOLVED WONTFIX

Status

()

RESOLVED WONTFIX
9 years ago
9 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 1 obsolete attachment)

(Assignee)

Description

9 years ago
Created attachment 408781 [details] [diff] [review]
patch

We have various bits of code in the compiler that iterate over all the registers and do something (see bug 514274).  And the XMM registers get allocated from XMM0 upwards, which means that the higher ones hardly ever get used.  The attached patch cause the compiler to ignore XMM8--XMM15 (the lower ones are all necessary because they might be function arguments).  This gives a ~1% speedup on SS.
Attachment #408781 - Flags: review?(gal)
(Assignee)

Comment 1

9 years ago
Created attachment 408782 [details]
SS results

At this point I expect someone to say "what we should really do is avoid the places where we scan over all the registers" which is reasonable but I don't see how to avoid them.

Comment 2

9 years ago
We need some feedback from Adobe here. I can imagine 16 registers is too much, maybe we should take the patch. Ed?
(Assignee)

Comment 3

9 years ago
Created attachment 408785 [details] [diff] [review]
patch, v2

fix some assertion failures
Attachment #408781 - Attachment is obsolete: true
Attachment #408785 - Flags: review?(gal)
Attachment #408781 - Flags: review?(gal)
(Assignee)

Comment 4

9 years ago
Furthermore, we have this code in nRegisterResetAll():

#ifdef _WIN64
        a.free = 0x001fffcf; // rax-rbx, rsi, rdi, r8-r15, xmm0-xmm5
#else       
        a.free = 0x00ffffff & ~(1<<RSP | 1<<RBP);
#endif      


So on Windows we already don't use XMM8..XMM15, but we pay the price of scanning through them anyway.  So the patch brings windows and non-windows closer together.
Shouldn't we prefer to use XMM8...15 to avoid spills?

Comment 6

9 years ago
iirc, on msc, xmm6-15 are callee-saved, and at the time i wrote the backend we didn't have the necessary bits to handle saving/restoring them (or we did and i didn't "get it" at the time).  so i left them out.

on gcc, xmm6-15 are scratch so i left them in use.  the best fix IMO is to fix the _WIN64 case (should that be MSC_VER?) to allow using the full set of XMM's, otherwise we're just artificially increasing register pressure.  

due to the way registers are chosen, i think it would take a method that does lots of manipulation of doubles to demonstrate the register pressure problem.
(Assignee)

Comment 7

9 years ago
(In reply to comment #5)
> Shouldn't we prefer to use XMM8...15 to avoid spills?

(In reply to comment #6)
> 
> the best fix IMO is to fix
> the _WIN64 case (should that be MSC_VER?) to allow using the full set of XMM's,
> otherwise we're just artificially increasing register pressure.  
> 
> due to the way registers are chosen, i think it would take a method that does
> lots of manipulation of doubles to demonstrate the register pressure problem.

Maybe I wasn't clear in comment 0... my contention is that register pressure is rarely high enough for XMM8..XMM15 to be used.  And there's a compile-time cost to making them available.  So we should pretend they don't exist;  we will lose in rare cases w.r.t. run-time, but we will win frequently w.r.t. compile-time.

I'm not totally certain it's a good idea but it appears to be a sizeable win on SunSpider right now.

Comment 8

9 years ago
Comment on attachment 408785 [details] [diff] [review]
patch, v2

r+, but I feel weird about this (still).
Attachment #408785 - Flags: review?(gal) → review+
(Assignee)

Comment 9

9 years ago
Bug 535681 has some ideas for making iterating over registers faster in all cases.  That would be a better fix than this bug.  I'll only land this patch if I deem bug 535681 to be undoable.
Depends on: 535681
(Assignee)

Comment 10

9 years ago
I've been unable to replicate the improvement I saw earlier with this patch.  I've tried with various revisions over the past couple of weeks.  It's possible that the improvement to intersectRegisterState() in bug 536098 was enough to reduce the difference to negligible.  Marking as WONTFIX.
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.