Closed Bug 514102 Opened 10 years ago Closed 6 years ago

Improve register state merging

Categories

(Core Graveyard :: Nanojit, defect, P3)

All
Other
defect

Tracking

(Not tracked)

RESOLVED WONTFIX
flash10.2

People

(Reporter: gal, Assigned: gal)

References

Details

(Whiteboard: PACMAN, has-patch)

Attachments

(1 file, 10 obsolete files)

Register states are currently merged by spilling and re-loading any mismatched. That often causes suboptimal code. Here we could do mov ebx, eax or even xchg ebx, eax.


  00002e4e88   mov 4(esp),ecx         ecx(state)  <= spill state
  00002e4e8c   mov eax,ecx            ecx(state)
      label1:
  00002e4e8e   [label1]               eax(state)
  ## merging registers (intersect) with existing edge
  00002e4e8e   mov ebx,4(esp)          <= restore state
      sp = ld state[0]
  00002e4e92   mov edx,0(ebx)         ebx(state)
Nick, if you are looking for small improvements to reg alloc this probably would result in visible perf improvement for some test cases.
Blocks: 514195
Here is a simple test case:

var a = 1, b = 2, c = 3, d = 4;
for (var i = 0; i < 10; ++i) {
    ++a;
    ++b;
    ++c;
    ++d;
}

  00002e1f30   mov eax,ecx            ecx(state)
      label1:
  00002e1f32   [label1]               eax(state)
  ## merging registers (intersect) with existing edge
  00002e1f32   mov ecx,-4(ebp)         <= restore state
      sp = ld state[0]
  00002e1f35   mov eax,0(ecx)         ecx(state)
      cx = ld state[8]
Assignee: general → gal
Attached patch patch (obsolete) — Splinter Review
Attached patch patch (obsolete) — Splinter Review
Attachment #398239 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Attachment #398249 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Attachment #398250 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Attachment #398251 - Attachment is obsolete: true
Comment on attachment 398256 [details] [diff] [review]
patch

intersectRegisterState and unionRegisterState are mostly copy/paste
Attachment #398256 - Flags: review-
#8 I didn't write that code as separate functions. It came like that. Its hairy complex code. We can try to merge them separately but not as this patch. Its high risk enough as it is. r- for the r-
there is copy/paste of fresh comments and code. It's ok not to fix existing duplication, but don't add to it.
Attached patch patch (obsolete) — Splinter Review
Attachment #398256 - Attachment is obsolete: true
Yeah I moved the redundant code into a helper. I will try to merge the two functions into a template function in a separate patch.
Attachment #398269 - Flags: review?(rreitmai)
Attachment #398269 - Flags: review?(edwsmith)
does steal() conflict with registerAllocTmp() from bug 513863?  seems like a possible refactoring collision.
(In reply to comment #13)
> does steal() conflict with registerAllocTmp() from bug 513863?  seems like a
> possible refactoring collision.

It doesn't conflict.  And if it did, well, whoever landed second would just have to deal with it :)
Yeah, whoever lands second will rebase a bit. Are we otherwise good to go? :)
Blocks: 515875
Attached patch patch (obsolete) — Splinter Review
Attachment #398269 - Attachment is obsolete: true
Attachment #398269 - Flags: review?(rreitmai)
Attachment #398269 - Flags: review?(edwsmith)
Attached patch patch (obsolete) — Splinter Review
Attachment #400172 - Attachment is obsolete: true
status and/or preliminary results?
Whiteboard: PACMAN
Target Milestone: --- → Future
Assignee: gal → nobody
Component: JavaScript Engine → Nanojit
Without any proof, I think this bug has the potential to substantially improve register shuffling around branches in TR, and might become more important as fast-paths inlined.
OS: Mac OS X → Other
Priority: -- → P3
Hardware: x86 → All
Whiteboard: PACMAN → PACMAN, has-patch
Target Milestone: Future → flash10.2
Assignee: nobody → edwsmith
Status: NEW → ASSIGNED
Blocks: 513616
Summary: TM: improve register state merging [nanojit] → Improve register state merging
Assignee: edwsmith → gal
Attached patch patch (obsolete) — Splinter Review
Attachment #400173 - Attachment is obsolete: true
Rebased the patch.
Any numbers? I like registers.
I am crashing on an empty loop. Once thats fixed I get some numbers. Ed is working on phi nodes in parallel.
Attached patch patch (obsolete) — Splinter Review
Less crashy.
Attachment #456532 - Attachment is obsolete: true
Attached patch patchSplinter Review
Working patch. Minimal speedup. This will become more relevant with phi nodes hopefully.
Attachment #456536 - Attachment is obsolete: true
Attachment #456532 - Flags: review?(nnethercote)
Comment on attachment 456537 [details] [diff] [review]
patch


>+    void Assembler::shuffleRegs(Move* begin, Move* end, RegisterMask src, RegisterMask dst)

I'd like to see a high level comment describing what this function is doing,
rather than having to piece it together from a few cryptic comments
describing each step.  This function is at the heart of this patch and I
don't understand it well enough yet to give my r+.


>+            if (curins != savedins) {
>+                if (savedins)
>+                    planReload(&reloadp, r, savedins);
>                 if (curins) {
>-                    //_nvprof("intersect-evict",1);
>-                    verbose_only( shouldMention=true; )
>-                    NanoAssert(curins->getReg() == r);
>-                    evict(curins);
>+                    // If curins is live in saved, we can copy its value from the register
>+                    // its held in there.

This comment is hard to understand, esp. the "its held in there".  Do you mean
"we can copy its value from the register in 'saved' that holds it"?


>         LIns* getActive(Register r) const {
>             NanoAssert(r != deprecated_UnknownReg);
>             return active[r];
>         }
> 
>+        // Scan through the list of active registers and and see if one matches. This

"and and"


>+        // should only be used if this is an inactive copy of the register allocator
>+        // and hence i->resv() is not up to date.

i->resv() no longer exits, just say "i's reservation is not up to date".


>+        bool scanFor(LIns* i, Register* rp) {
>+            for (Register r = FirstReg; r <= LastReg; r = nextreg(r)) {
>+                if (active[r] == i) {
>+                    *rp = r;
>+                    return true;
>+                }
>+            }
>+            return false;
>+        }

'scanFor' is too vague, I'll have to look at the function body every time to
remember what it does.  'scanForRegHoldingIns' is longer but more
descriptive.
I did some measurements.  This patch reduces the number of static restores for all of SS from 60,570 to 58,814, which isn't bad.  The restores (ie. loads) get replaced with register moves.

However, it's a net loss for SS based on instruction counts due to compile-time increases.  3d-raytrace has a 1% instruction count increase, a few of the others have 0.3--0.6%.  Combined with the increase in complexity, I don't think this patch is worth it at the moment.  I'd be happy to re-evaluate when phi nodes are added.

(FWIW, I've found tinkering at this level in NJ to be pretty fruitless, it tends to result in 1-case-in-100-is-marginally-better kind of improvements which are swallowed by the noise.  That's why I've been focusing on the quality of LIR generation.)
agreed.  My hypothesis is that adding phi nodes will facilitate higher quality LIR generation, but data needs to support it.
Comment on attachment 456532 [details] [diff] [review]
patch

Clearing an r? on an obsolete patch to clear out my review queue.
Attachment #456532 - Flags: review?(nnethercote)
I tested this patch ("shuffle"), which slows down jit-speed, combined with the patch from bug 584935 ("ffs"), which speeds up the jit by using BSF/BSR instructions.  Combined the result is still faster than where we started.  The input is 700MB of randomly crawled AS3-containing swfs.

              x86-64  x86-32 
baseline      24.8    25.2
ffs only      23.9    24.5
shuffle       24.9    25.4
shuffle-ffs   24.3    25.1

I don't yet understand shuffleRegs() well enough to try tuning it using FFS-style iteration.
nanojit is long gone.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.