nanojit: avoid premature regalloc state updates in the i386 backend

RESOLVED FIXED

Status

()

Core
JavaScript Engine
RESOLVED FIXED
9 years ago
9 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

Trunk
x86
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

9 years ago
Created attachment 400448 [details] [diff] [review]
patch

This patch is a first step towards fixing bug 513615, and lays the
groundwork for some future bugs.  It's deliberately very small, so the
general approach can be evaluated carefully.

- prepResultReg() and freeRsrcOf() are two prematurely-updating functions
  that need to be removed eventually.  For now, I've added
  prepareResultReg() and freeResourcesOf() which are safer versions.  (I'm
  not particularly attached to the new names, but I don't like freeRsrcOf()
  because it's cryptic and I find it hard to type.)

- asm_output2 has been added, which is the same as asm_output but takes an
  additional parameter which is 'PPost' if the current register state is
  the post-state, 'PPre' if it's the pre-state, and 'PNone' if it's neither
  (useful for the middle of sequences of generated instructions, when we 
  don't bother to keep the state consistent).

  asm_output(...) is now equivalent to asm_output2(PPre, ...).  Eventually all
  occurrences will print the post-state and all asm_output2 occurrences can be
  replaced with asm_output occurrences.  The output now indicates which state
  is being printed, eg:

      rp_offset = int 8
               mov eax,8              post: eax(rp_offset) ecx(ld1527) esi(cx) edi(sp)
      ld1526 = ld add156[0]
               mov   ecx,0(ecx+eax*1) pre:  eax(rp_offset) ecx(ld1527) esi(cx) edi(sp)

  and (an example involving PNone):

      not46 = not ld35
      mov eax,edx
      not eax                         post: eax(not46) ecx(ld36) edx(ld35) ebx(an d724) esi(ld33) edi(sp) xmm0(js_UnboxDouble65)

- asm_int() and asm_neg_not() have been modified to use the new state-update
  functions.  This required adding 'pp' parameters to several of the codegen
  macros;  these are passed on to asm_output2.

The plan after this patch is to continue modifying parts of the i386
back-end similarly to the way that asm_int() and asm_neg_not() were 
modified.  At some point, Assembler.cpp will need to be modified and then
parts of all the back-ends will have to be modified in sync as the
generic/arch-specific interface will change.
Attachment #400448 - Flags: review?(rreitmai)
Attachment #400448 - Flags: review?(edwsmith)
(Assignee)

Updated

9 years ago
Blocks: 513615
(Assignee)

Comment 1

9 years ago
Comment on attachment 400448 [details] [diff] [review]
patch

This one needs some more work before it's ready to be reviewed.
Attachment #400448 - Flags: review?(rreitmai)
Attachment #400448 - Flags: review?(edwsmith)
(Assignee)

Updated

9 years ago
Depends on: 516909
(Assignee)

Comment 2

9 years ago
Renaming this to just be about the i386 backend.
Summary: nanojit: start avoiding premature register allocation state updates → nanojit: avoid premature regalloc state updates in the i386 backend
(Assignee)

Updated

9 years ago
Depends on: 531347
(Assignee)

Comment 3

9 years ago
Created attachment 417434 [details] [diff] [review]
patch v2

This patch overhauls Assembler.cpp and most of Nativei386.cpp so that regstate
updates are handled in a more obvious and less error-prone manner.

The starting point is the introduction of prepareResultReg() and
freeResourcesOf(), which are similar to the old prepResultReg() and
freeRsrcOf(), except that the new ones can be used in a 

    prepareResultReg(ins, allow);
    GENERATE_CODE();
    freeResourcesOf(ins);

instead of:

    prepResultReg(ins, allow);
    GENERATE_CODE();

where prepResultReg() called freeRsrcOf(ins) and so you had to be very careful
not to accidentally reuse the register holding 'ins' when generating code.
asm_int() and asm_neg_not() are simple examples that are worth looking at first
to get a sense of the change.

The patch doesn't fully convert the i386 backend to the new scheme.  I only
changed those functions where it was easy for the new version to generate
identical code to the old version, and easy for me to test in TM.  I did
this so I could test it all carefully, by diffing the old and new code
generated for all of SunSpider.  I'll do the remainder in a follow-up bug.
This should help avoid regressions, and make any introduced regressions
easier to find.

The patch also adds a number of comments explaining non-obvious aspects of the
code generation, and does some minor formatting and naming clean-ups (e.g.
using 'ins' consistently for LIns variables, instead of 'i' or 'l' or 'p').
And it adds a few more assertions.
Attachment #400448 - Attachment is obsolete: true
Attachment #417434 - Flags: review?(edwsmith)
(Assignee)

Updated

9 years ago
Attachment #417434 - Flags: review?(dvander)

Comment 4

9 years ago
Comment on attachment 417434 [details] [diff] [review]
patch v2

>             r = nRegisterAllocFromSet(set);
>             _allocator.addActive(r, ins);
>             ins->setReg(r);
>+
>         } else {

That seems unnecessary.

>     void Assembler::findRegFor2(RegisterMask allow, LIns* ia, Register& ra, LIns* ib, Register& rb)
>     {
>         if (ia == ib) {
>             ra = rb = findRegFor(ia, allow);
>+
>         } else {

That too.

>+
>+                // 'ins' is in 'allow', in register r (different to the old r);
>+                //  s is the old r.
>                 if ((rmask(s) & GpRegs) && (rmask(r) & GpRegs)) {
> #ifdef NANOJIT_ARM
>-                    MOV(s, r);  // ie. move 'ins' from its pre-state reg to its post-state reg
>+                    MOV(s, r);  // move 'ins' from its pre-state reg (r) to its post-state reg (s)
> #else
>                     MR(s, r);
> #endif

Since you are in Rome, maybe you want to fix this and put a define into the arm backend.
> > #ifdef NANOJIT_ARM
> >-                    MOV(s, r);  // ie. move 'ins' from its pre-state reg to its post-state reg
> >+                    MOV(s, r);  // move 'ins' from its pre-state reg (r) to its post-state reg (s)
> > #else
> >                     MR(s, r);
> > #endif
> 
> Since you are in Rome, maybe you want to fix this and put a define into the arm
> backend.

Actually, it's already there: http://hg.mozilla.org/tracemonkey/file/21d3b1b65758/js/src/nanojit/NativeARM.h#l630
Attachment #417434 - Flags: review?(dvander) → review+
(Assignee)

Comment 6

9 years ago
(In reply to comment #4)
> (From update of attachment 417434 [details] [diff] [review])
> >             r = nRegisterAllocFromSet(set);
> >             _allocator.addActive(r, ins);
> >             ins->setReg(r);
> >+
> >         } else {
> 
> That seems unnecessary.

You mean the blank line?  I can remove it, I initially added a lot more comments about the register state but then I removed most of them, I think this was a leftover related to that.

> > #ifdef NANOJIT_ARM
> >-                    MOV(s, r);  // ie. move 'ins' from its pre-state reg to its post-state reg
> >+                    MOV(s, r);  // move 'ins' from its pre-state reg (r) to its post-state reg (s)
> > #else
> >                     MR(s, r);
> > #endif
> 
> Since you are in Rome, maybe you want to fix this and put a define into the arm
> backend.

Sure, I'll do that too.

Comment 7

9 years ago
Okay, I read all this code carefully and I'm convinced it's good.  I ran it on tamarin, though, and this assertion fires:

Assertion failed: !ins->getArIndex() (/Users/edwsmith/vm/njn/nanojit/Nativei386.cpp:438)

it's in asm_restore for the LIR_param case where we attempt to load stack args directly from the stack instead of spilling them:

>            int d = (arg - abi_regcount) * sizeof(intptr_t) + 8;
>            LD(r, d, FP);
>            NanoAssert(!ins->getArIndex());     // rematerializable; shouldn't be in a spill slot
>            ins->markAsClear();

I can help diagnose this if you like.  It is possible its due to me applying the patch to the wrong point (I just sync'd tamarin-redux backwards until the patch applied cleanly).

Anyway, I'm marking this R+ since you know how to run the regressions and use the sandbox.

fwiw, here's the details on what I tested:

macos 10.5.8
configure.py --enable-shell --enable-debug --enable-debugger
runtests.py -E (that binary) --vmargs="-Ojit -Dnodebugger"

Updated

9 years ago
Attachment #417434 - Flags: review?(edwsmith) → review+
(Assignee)

Comment 8

9 years ago
I thought that rematerializable things like 32-bit constants could end up in spill slots.  Turns out they can due to the findMemFor() call in handleLoopCarriedExprs(), but this doesn't manifest in TM.  So I just reverted those assertions back to tests, as was in the original code.  After that, TR acceptance tests all passed fine.

Pushed to NJ-central:

http://hg.mozilla.org/projects/nanojit-central/rev/e7b2c0d805eb
Whiteboard: fixed-in-nanojit
(Assignee)

Comment 9

9 years ago
http://hg.mozilla.org/tracemonkey/rev/8f1f371590cb
Whiteboard: fixed-in-nanojit → fixed-in-nanojit, fixed-in-tracemonkey

Comment 11

9 years ago
http://hg.mozilla.org/tamarin-redux/rev/d1bb609bb3fe
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
(Assignee)

Comment 12

9 years ago
Merge to TM the Sparc bustage fix:

http://hg.mozilla.org/tracemonkey/rev/c6fbac9adb8b

Comment 13

9 years ago
http://hg.mozilla.org/mozilla-central/rev/8f1f371590cb
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.