Closed Bug 473117 Opened 16 years ago Closed 15 years ago

TM: TraceRecorder::guardCallee's "not forgeable" value check actually is forgeable

Categories

(Core :: JavaScript Engine, defect, P2)

defect

Tracking

()

VERIFIED FIXED

People

(Reporter: Waldo, Assigned: Waldo)

Details

(Keywords: verified1.9.1, Whiteboard: [sg:critical?] fixed-in-tracemonkey)

Attachments

(3 files, 6 obsolete files)

/*
     * NB: The following guard guards at runtime that the callee is a
     * function. Even if the given value is an object that doesn't have
     * a private slot, the value we're matching against is not forgeable.
     */
    guard(true,
          lir->ins2(LIR_eq, 
                    lir->ins2(LIR_piand, 
                              stobj_get_fslot(callee_ins, JSSLOT_PRIVATE),
                              INS_CONSTPTR((void*)(~JSVAL_INT))),
                    INS_CONSTPTR(OBJ_GET_PRIVATE(cx, callee_obj))),
          exit);

In dense arrays, JSSLOT_PRIVATE == JSSLOT_ARRAY_LENGTH, and that slot stores the length of the dense array, which isn't particularly difficult to control.  We might get saved somewhere down the line after the "function identity" check, but that's not immediately obvious, if it's even correct.  I think we have to guard on the class of the object here, at least if we don't want to change the current private tagging and dense-length storage systems.

The attachment here implements this guard, but it has one problem: it's not using the existing side exit in guardCallee.  For some strange reason, reusing that results in ~1800 more trace aborts running testComparisons, while generating a new one, as happens in this patch, does not; I haven't investigated reasons yet.  Anyone know why this might happen?
Flags: blocking1.9.1?
Are you using the same exit type? MISMATCH vs BRANCH exit?
Mhm. Strange. This needs more investigation. Lets change the patch to use the same side exit and file a dependent bug on the strange effect and look into that.
Using the same side exit is a very, very noticeable perf regression -- testComparisons runs slower with -j than without -- and we have a policy of not accepting perf regressions.

I'd be fine with landing with the different side exit and fixing to use the same side exit afterward (preferably blocking, since this is completely non-obvious), if that's what you meant to say.  :-)
Lets not land fragile patches that have workarounds for unexplainable bugs. We have to investigate the underlying issue. If you could attach a clean patch we can debug the sideexit issue. Since its so noticeable we should be able to find it quickly.
Flags: blocking1.9.1? → blocking1.9.1+
Assignee: general → jwalden+bmo
Status: NEW → ASSIGNED
Priority: -- → P1
Priority: P1 → P2
Whiteboard: [sg:critical?]
Waldo is not available for a while. Throwing this bug to David to investigate why the number of side exits skyrocket if we share the side exit block vs having 2 separate exits.
Assignee: jwalden+bmo → danderson
Stealing back, "not available" was a matter of three days, not some long stretch of time.  :-)
Assignee: danderson → jwalden+bmo
All yours :) Lets try to reproduce the wrid side exit number increase first (if it is still reproducible).
...
(gdb) si 50
Cannot access memory at address 0x268a9708
0x02ffc42e in ?? ()
1: x/i $pc  0x2ffc42e:  mov    -44(%ebp),%ecx
(gdb) disas $pc $pc+100
Dump of assembler code from 0x2ffc42e to 0x2ffc492:
0x02ffc42e:     mov    -44(%ebp),%ecx
0x02ffc431:     mov    4(%eax),%eax
0x02ffc434:     and    $0xfffffffc,%eax
0x02ffc437:     cmp    $0x1b87c0,%eax
0x02ffc43c:     mov    -40(%ebp),%eax
0x02ffc43f:     jne    0xb1a025
0x02ffc445:     mov    16(%eax),%eax
0x02ffc448:     and    $0xfffffffe,%eax
0x02ffc44b:     cmp    $0xb12e38,%eax
0x02ffc450:     mov    -40(%ebp),%eax
0x02ffc453:     jne    0xb1a034
0x02ffc459:     mov    12(%eax),%eax
0x02ffc45c:     cmp    $0x8faa60,%eax
0x02ffc461:     mov    -36(%ebp),%eax
0x02ffc464:     jne    0xb1a043
...
End of assembler dump.
(gdb) si
0x02ffc431 in ?? ()
1: x/i $pc  0x2ffc431:  mov    4(%eax),%eax
(gdb) p (JSObject*)$eax
$1 = (JSObject *) 0xb1e0e0
(gdb) p (JSFunction*)($1->fslots[2]&~7)
$2 = (JSFunction *) 0xb12e70
(gdb) p (JSOp)$2->u.i.script->code[0]              
$3 = JSOP_GETARG
(gdb) p (JSOp)$2->u.i.script->code[3]
$4 = JSOP_GETARG
(gdb) p (JSOp)$2->u.i.script->code[6]
$5 = JSOP_STRICTNE
(gdb) p (JSOp)$2->u.i.script->code[7]
$6 = JSOP_RETURN
(gdb) p (JSOp)$2->u.i.script->code[8]
$7 = JSOP_STOP
(gdb) p (JSFunction*)0xb12e38
$8 = (JSFunction *) 0xb12e38
(gdb) p (JSOp)$8->u.i.script->code[0]
$9 = JSOP_GETARG
(gdb) p (JSOp)$8->u.i.script->code[3]
$10 = JSOP_GETARG
(gdb) p (JSOp)$8->u.i.script->code[6]
$11 = JSOP_STRICTEQ
(gdb) p (JSOp)$8->u.i.script->code[7]
$12 = JSOP_RETURN
(gdb) p (JSOp)$8->u.i.script->code[8]
$13 = JSOP_STOP

The class-check seems to be doing fine here; it's the JSFunction* equality check that fails.  Somehow, every time we try to call the op function inside the loops in testComparisons, the function we're trying to call isn't the one seen at trace time -- instead of the === comparison function it's the !== comparison function.
(gdb) disas 0x8e0dc7 0x8e0f40
Dump of assembler code from 0x8e0dc7 to 0x8e0f40:
0x008e0dc7:     push   %esi
0x008e0dc8:     push   %ebx
0x008e0dc9:     push   %ebp
0x008e0dca:     push   %ebp
0x008e0dcb:     mov    %esp,%ebp
0x008e0dcd:     sub    $0x38,%esp
0x008e0dd0:     mov    %ecx,-4(%ebp)
0x008e0dd3:     mov    %ecx,%eax
0x008e0dd5:     mov    (%eax),%esi
0x008e0dd7:     mov    12(%eax),%eax
0x008e0dda:     mov    -232(%esi),%ebx
0x008e0de0:     mov    %ebx,-28(%ebp)
0x008e0de3:     mov    -224(%esi),%ebx
0x008e0de9:     mov    %ebx,-24(%ebp)
0x008e0dec:     mov    -216(%esi),%edi
0x008e0df2:     mov    -208(%esi),%ebx
0x008e0df8:     mov    -200(%esi),%ecx
0x008e0dfe:     mov    %ecx,-8(%ebp)
0x008e0e01:     mov    -192(%esi),%ecx
0x008e0e07:     mov    %ecx,-20(%ebp)
0x008e0e0a:     mov    -176(%esi),%ecx
0x008e0e10:     mov    %ecx,-16(%ebp)
0x008e0e13:     mov    -168(%esi),%ecx
0x008e0e19:     mov    %ecx,-12(%ebp)
0x008e0e1c:     mov    -56(%esi),%ecx
0x008e0e1f:     mov    %ecx,-48(%ebp)
0x008e0e22:     mov    -48(%esi),%edx
0x008e0e25:     mov    -32(%esi),%ecx
0x008e0e28:     mov    %ecx,-44(%ebp)
0x008e0e2b:     mov    -24(%esi),%ecx
0x008e0e2e:     mov    %ecx,-36(%ebp)
0x008e0e31:     mov    -16(%esi),%ecx
0x008e0e34:     mov    %ecx,-40(%ebp)
0x008e0e37:     mov    -8(%esi),%ecx
0x008e0e3a:     mov    %ecx,-32(%ebp)
0x008e0e3d:     mov    -48(%ebp),%ecx
0x008e0e40:     mov    (%eax),%eax
0x008e0e42:     test   %eax,%eax
0x008e0e44:     mov    -44(%ebp),%eax
0x008e0e47:     jne    0xb45f61
0x008e0e4d:     mov    %eax,48(%esi)
0x008e0e50:     movl   $0x0,56(%esi)
0x008e0e57:     mov    %ecx,64(%esi)
0x008e0e5a:     mov    %edx,72(%esi)
0x008e0e5d:     mov    4(%eax),%eax
0x008e0e60:     and    $0xfffffffc,%eax
0x008e0e63:     cmp    $0x1ba980,%eax
0x008e0e68:     mov    -44(%ebp),%eax
0x008e0e6b:     jne    0xb45f70
0x008e0e71:     mov    16(%eax),%eax
0x008e0e74:     and    $0xfffffffe,%eax
0x008e0e77:     cmp    $0xb41bd0,%eax
0x008e0e7c:     mov    -44(%ebp),%eax
0x008e0e7f:     jne    0xb45f7f
0x008e0e85:     mov    12(%eax),%eax
0x008e0e88:     cmp    $0xb3bc20,%eax
0x008e0e8d:     mov    -40(%ebp),%eax
0x008e0e90:     jne    0xb45f8e
0x008e0e96:     cmp    %edx,%ecx
0x008e0e98:     mov    -36(%ebp),%edx
0x008e0e9b:     sete   %cl
0x008e0e9e:     movzbl %cl,%ecx
0x008e0ea1:     mov    %ecx,-96(%esi)
0x008e0ea4:     mov    %eax,48(%esi)
0x008e0ea7:     cmp    $0x1,%eax
0x008e0eaa:     mov    -32(%ebp),%eax
0x008e0ead:     je     0xb45f9d
0x008e0eb3:     cmp    %edx,%ecx
0x008e0eb5:     mov    -28(%ebp),%edx
0x008e0eb8:     sete   %cl
0x008e0ebb:     movzbl %cl,%ecx
0x008e0ebe:     test   %ecx,%ecx
0x008e0ec0:     sete   %cl
0x008e0ec3:     movzbl %cl,%ecx
0x008e0ec6:     mov    %ecx,-16(%esi)
0x008e0ec9:     mov    -8(%ebp),%ecx
0x008e0ecc:     add    $0x1,%eax
0x008e0ecf:     mov    %eax,-8(%ebp)
0x008e0ed2:     jo     0xb45fac
0x008e0ed8:     mov    %eax,-8(%esi)
0x008e0edb:     mov    %eax,48(%esi)
0x008e0ede:     movl   $0x64,56(%esi)
0x008e0ee5:     cmp    $0x64,%eax
0x008e0ee8:     mov    -24(%ebp),%eax
0x008e0eeb:     jge    0xb45fbb
0x008e0ef1:     mov    %edx,-232(%esi)
0x008e0ef7:     mov    -20(%ebp),%edx
0x008e0efa:     mov    %eax,-224(%esi)
0x008e0f00:     mov    -16(%ebp),%eax
0x008e0f03:     mov    %edi,-216(%esi)
0x008e0f09:     mov    -12(%ebp),%edi
0x008e0f0c:     mov    %ebx,-208(%esi)
0x008e0f12:     mov    -8(%ebp),%ebx
0x008e0f15:     mov    %ecx,-200(%esi)
0x008e0f1b:     mov    -4(%ebp),%ecx
0x008e0f1e:     mov    %edx,-192(%esi)
0x008e0f24:     mov    %eax,-176(%esi)
0x008e0f2a:     mov    %edi,-168(%esi)
0x008e0f30:     mov    %ebx,-8(%esi)
0x008e0f33:     jmp    0x8e0dd0
0x008e0f38:     mov    %ebp,%esp
0x008e0f3a:     pop    %ebp
0x008e0f3b:     pop    %ebp
0x008e0f3c:     pop    %ebx
0x008e0f3d:     pop    %esi
0x008e0f3e:     pop    %edi
0x008e0f3f:     ret    
End of assembler dump.
(gdb) p/x 11779136
$107 = 0xb3bc40
(gdb) si
0x008e0e57 in ?? ()
1: x/i $pc  0x8e0e57:   mov    %ecx,64(%esi)
(gdb) c
Continuing.

Breakpoint 13, 0x008e0e6b in ?? ()
1: x/i $pc  0x8e0e6b:   jne    0xb45f70
(gdb) p/x $eax
$108 = 0xb3bc40
(gdb) p/x (uintptr_t *) (3221211712 + 48)
$109 = 0xbfffca70
(gdb) p/x 0x1ba980
$110 = 0x1ba980
(gdb) p ((JSClass*)0x1ba980)->name
$111 = 0x19465c "Function"
(gdb) p ((JSClass*)(((JSObject*)0xb3bc40)->classword&~7))->name
$112 = 0x19465c "Function"

More lovely, at least sometimes we're exiting the trace because a comparison between a JSObject* and a JSClass* fails.  :-)  Moving closer to figuring out what's wrong...
I wonder if these latest issues might also be the cause of bug 471214...can't say yet, but I won't be surprised if that happens.
Never mind comment 9, I was forgetting about an intermediate write there that meant the class/object comparison wasn't actually happening.
Any updates here?
Doesn't seem to be as bad as in the past, but after appropriate cache-warming it's still 0.4s slower or so on the one test that was getting most hammered:

h-118:~/moz/shell-js/js/src jwalden$ time ./js -j trace-test.js testComparisons
TEST-PASS | trace-test.js | testComparisons

passed: testComparisons

FAILED: 0

real    0m1.576s
user    0m1.447s
sys     0m0.092s

h-118:~/moz/shell-js-mod/js/src jwalden$ time ./js -j trace-test.js testComparisons
TEST-PASS | trace-test.js | testComparisons

passed: testComparisons

FAILED: 0

real    0m1.910s
user    0m1.746s
sys     0m0.112s
Comment 11 is a reuse-existing-exit patch; if I create a new snapshot, times generally drop by a couple tenths of a second or so:

h-118:~/moz/shell-js-mod/js/src jwalden$ time ./js -j trace-test.js testComparisons
TEST-PASS | trace-test.js | testComparisons

passed: testComparisons

FAILED: 0

real    0m1.682s
user    0m1.510s
sys     0m0.109s
Could you post the trace stats with and without the patch?
(In reply to comment #15)
> Could you post the trace stats with and without the patch?

ping...
Attached patch Updated to latest TM (obsolete) — Splinter Review
Oh hey, since I last tested this the slowdown's gone away!  So something since March 12 made a difference here, might be worthwhile figuring out exactly what -- but in any case looks like this is all but fixed now.
Attachment #356468 - Attachment is obsolete: true
Attachment #368951 - Flags: review?(gal)
Attachment #368951 - Flags: review?(gal)
Comment on attachment 368951 [details] [diff] [review]
Updated to latest TM

Nooooo, I tested the wrong patch.  Problem still exists...

Without a corrected patch to use the same side exit:

TRACEMONKEY=stats ./js -j trace-test.js
recorder: started(1146), aborted(203), completed(1365), different header(0), trees trashed(3), slot promoted(0), unstable loop variable(153), breaks(2), returns(22), unstableInnerCalls(52)
monitor: triggered(4872), exits(4872), type mismatch(0), global mismatch(27)

With it:

TRACEMONKEY=stats ./js -j trace-test.js
recorder: started(1146), aborted(203), completed(1436), different header(0), trees trashed(3), slot promoted(0), unstable loop variable(153), breaks(2), returns(22), unstableInnerCalls(52)
monitor: triggered(6052), exits(6052), type mismatch(0), global mismatch(27)
Comment on attachment 368951 [details] [diff] [review]
Updated to latest TM

You are using a separate snapshot now? Did we identify the bug that causes the merged snapshot to be slower?
No, pulling out the shark now to see if that illuminates anything about why a same-side-exit patch causes more side exits and aborts.
I know the bug is critical and needs fixing but i don't want to make the effect go away until we understand it. The patch looks good. As soon we know whats up I will + it, ok?
Attached file Minimal testcase
find-waldo-now:~/moz/shell-js/js/src jwalden$ TRACEMONKEY=stats ./js -j ~/moz/small.js 
recorder: started(2), aborted(1), completed(2), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(0)
monitor: triggered(3), exits(3), type mismatch(0), global mismatch(0)

h-118:~/moz/shell-js-mod/js/src jwalden$ TRACEMONKEY=stats ./js -j ~/moz/small.js
recorder: started(2), aborted(1), completed(5), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(0)
monitor: triggered(1001), exits(1001), type mismatch(0), global mismatch(0)
Attachment #368951 - Attachment is obsolete: true
...
    and2 = and ld3, (void*)(~JSSLOT_CLASS_MASK_BITS)
    eq6 = eq and2, &js_FunctionClass
    xf4: xf eq6 -> pc=0xc0bfcc imacpc=0x0 sp+32 rp+0
      mov 24(esi),ebx    eax(GetProperty_tn1) ecx(state) ebx($global0) esi(sp)
      mov 16(esi),eax    eax(GetProperty_tn1) ecx(state) esi(sp)
      mov ebx,4(eax)     eax(GetProperty_tn1) ecx(state) esi(sp)
      and ebx,-4         eax(GetProperty_tn1) ecx(state) ebx(ld3) esi(sp)
      cmp ebx,1849600    eax(GetProperty_tn1) ecx(state) ebx(and2) esi(sp)
      jne 0x9acfc4       eax(GetProperty_tn1) ecx(state) esi(sp)
--------------------------------------- exit block (LIR_xt|LIR_xf)
        0x9acfc4:
              mov eax,9654928                
              mov esp,ebp                    
        0x9acfcb:
              jmp 0x936ff8                   
--------------------------------------- end exit block 0x9352f8
    ld4 = ld GetProperty_tn1[16]
    and3 = and ld4, (void*)(~JSVAL_INT)
    eq7 = eq and3, OBJ_GET_PRIVATE(cx, callee_obj)
    xf5: xf eq7 -> pc=0xc0bfcc imacpc=0x0 sp+32 rp+0
      mov ebx,16(eax)    eax(GetProperty_tn1) ecx(state) esi(sp)
      and ebx,-2         eax(GetProperty_tn1) ecx(state) ebx(ld4) esi(sp)
      cmp ebx,9996368    eax(GetProperty_tn1) ecx(state) ebx(and3) esi(sp)
      jne 0x9acfd0       eax(GetProperty_tn1) ecx(state) esi(sp)
--------------------------------------- exit block (LIR_xt|LIR_xf)
        0x9acfd0:
              mov eax,9654928                
              mov esp,ebp                    
        0x9acfd7:
              jmp 0x936ff8
...
Breakpoint 1, nanojit::Assembler::patch (this=0x2002400, exit=0x93529c) at ./nanojit/Assembler.cpp:682
682             GuardRecord *rec = exit->guards;
(gdb) n
683             AvmAssert(rec);
(gdb) n
684             while (rec) {
(gdb) n
685                 patch(rec);
(gdb) n
patching jump at 0x9acfcb to target 0x936e73 (was 0x936ff8)

686                 rec = rec->next;
(gdb) n
684             while (rec) {
(gdb) n
688         }


You can't share one GuardRecord across multiple LIR_x.? instructions, because each exit's prep-to-jump-to-epilogue code is compiled into different code so register intersection, etc. compile correctly.  Thus only the first exit's jump is patched, so every iteration hits the original trace and falls off and then immediately back on, because the class-check isn't what will fail, and the private-check will every time.

Arguably, then, it would be sufficient to use a new snapshot every time, but perhaps it's possible to use the cloning infrastructure to avoid that.  It's not immediately obvious to me how much overhead is associated with snapshot ("looks" like proportional to O(locals+globals), which may be a decent size), which would be the concern here.
You can't share GuardRecords, but you can share SideExits. GuardRecords are linked in the SideExit. When we patch we should always patch all guards associated with the side exit. Where do we not do that?
TraceRecorder::guard is basically a dumb write-out-a-few-LIR-instructions method with no special effort in it to note exit instructions which happen to be used in multiple places; if we get a reused side exit instruction, we'll reuse everything in it (but still compile different exit code for each).  The only places we exit->addGuard are in TraceRecorder::clone (used only for shared loop exits, and only if two snapshots happen to be able to share state) and in snapshot itself for the single call that happens most places.
So, really, when you see:

  LIns* exit = snapshot(...);
  guard(..., exit);
  guard(..., exit);

There is one VMSideExit, and there is one GuardRecord, but each exit point gets its own generated code, and only the first one will be patched when that exit is taken and subsequent code is traced and compiled.
Both should be patched of course. So we should link them in guard().
Attached patch Manually clone reused exits (obsolete) — Splinter Review
There are no two things to link right now.  There is only one GuardRecord so long as we use the return value of snapshot twice.  If we want to link them, which pretty inarguably we do, we need two GuardRecord structures which both refer to the same VMSideExit.  Doing so in guard() requires adding a bit more smarts to it to remember exits it's already seen; doing it this way requires a little more diligence and a bit less bookkeeping.

In the absence of some obvious place to store this information that doesn't bloat VMSideExit (the only place I see that makes sense to store it), I'd rather rely on us to find these problems if they crop up (no other instances do, checked this yesterday).  This patch fixes up all current instances of reused snapshots; note that you can't cloneExit a cloneExit(snapshot(...)), which makes the two loopy modifications a bit weird to read, but it was necessary to avoid crashes in various parts of trace-test.js doing things which would precisely trigger that behavior.

A last note on that: since we store GuardRecords in the trace itself as I understand it, and GuardRecord.next threads to allow sharing a SideExit, what happens if a GuardRecord is dead-code-eliminated?  This might or might not be a phantom concern; we thread now for LOOP_EXIT in certain special cases and may just have gotten lucky, but I can't tell for sure.  Followup bug if this sounds like a legitimate concern, I think -- this bug's gone beyond its original purpose far enough already (maybe good for obfuscatory purposes, but otherwise a negative).
Attachment #372150 - Flags: review?(graydon)
Comment on attachment 372150 [details] [diff] [review]
Manually clone reused exits

This is not the right approach. We want in these cases to have only 1 trace continuation. With your patch you massively increase tail duplication and all those traces are identical. We must link the guards in the side exits. 1 side exit only.
Attachment #372150 - Flags: review-
There's still only one side exit, and I don't see how you're going to avoid this when each side exit has to intersect registers and so on with different points in the trace.
Attachment #372150 - Flags: review-
Comment on attachment 372150 [details] [diff] [review]
Manually clone reused exits

Withdrawing my opposition. Got confused by the naming scheme. Waldo agrees its not good and will make up something cuter.
Comment on attachment 372150 [details] [diff] [review]
Manually clone reused exits

I think I *mostly* understand the evolution of this bug and the associated patch. Will r+ when a new variant with more pleasant naming arrives.
Attachment #372150 - Flags: review?(graydon)
Sorry this got a little big, but it's definitely better than the previous patch -- we want to use side exits in the APIs, and guard records should pretty much be an internal detail.
Attachment #372150 - Attachment is obsolete: true
Attachment #372755 - Flags: review?(graydon)
Attachment #372755 - Attachment is obsolete: true
Attachment #372757 - Flags: review?(graydon)
Attachment #372755 - Flags: review?(graydon)
Instruction seems like deadwood, it's long, and overloaded (machine, LIR, abstract high level?). Just createGuard and createExit would do. Contrast with TraceRecorder::new{Array,String}, which emit code for those kinds of in-JS new expressions (so create is cool, or an even contrastier verb).

/be
Attached patch Newest and bestest naming (obsolete) — Splinter Review
As discussed on IRC.
Attachment #372757 - Attachment is obsolete: true
Attachment #372768 - Flags: review?(graydon)
Attachment #372757 - Flags: review?(graydon)
Comment on attachment 372768 [details] [diff] [review]
Newest and bestest naming

I believe the guard in ::guardCallee ought to actually *not* reuse the BRANCH_EXIT previously established, but rather a second exit with identical state but different code: MISMATCH_EXIT. 

Exiting when someone gives you a callee different from the one you expected is cause for an attempted extension. That's BRANCH_EXIT. Exiting because someone gave you a non-function is *not* cause for an attempted extension. That should be MISMATCH_EXIT.

(It turns out this isn't strictly necessary since any attempted extension will abort when it notices it's not got a function, blacklisting the extension-point, but I think that's a bad thing to rely on)

I wonder if changing that obviates the point of the rest of the patch? I'm sorry if so. I quite like the cleanup in the patch otherwise.

I also can't see in here which hunk in particular is responsible for recovering the slowdown that you initially observed. Can you point it out?
(In reply to comment #38)
> (From update of attachment 372768 [details] [diff] [review])
> I believe the guard in ::guardCallee ought to actually *not* reuse the
> BRANCH_EXIT previously established, but rather a second exit with identical
> state but different code: MISMATCH_EXIT. 

Ah, good point, should have thought about that myself.

> I wonder if changing that obviates the point of the rest of the patch? I'm
> sorry if so. I quite like the cleanup in the patch otherwise.

Well, if you had a loop where a called function always had the same private but potentially a different parent depending on mutations to it, you could observe the slowdown.  That's not particularly likely to happen (I'm not entirely sure how it could happen, isn't that going to abort calling across globals?  an experiment with clone was hitting that abort), so it kind of obviates, but I think this suspect this would have bitten us down the line eventually anyway, so it's not unnecessary.

> I also can't see in here which hunk in particular is responsible for
> recovering the slowdown that you initially observed. Can you point it out?

It's spread around a bit.  The key is pre-patch that exit->addGuard(rec) is called only once per VMSideExit, when really it should be called once per guard that uses a snapshot.  The result is that when a snapshot is reused N times pre-patch, SideExit.guards is rec, and rec->next is always NULL.  If we were calling addGuard for every guard, SideExit.guards would be the last guard's record, SideExit.guards->next would be the second-to-last guard's record, and so on.

>diff --git a/js/src/jstracer.cpp b/js/src/jstracer.cpp

>-    /* We couldn't find a matching side exit, so create our own side exit structure. */
>-    LIns* data = lir->skip(sizeof(GuardRecord) +
>-                           sizeof(VMSideExit) +
>-                           (stackSlots + ngslots) * sizeof(uint8));
>-    GuardRecord* rec = (GuardRecord*)data->payload();
>-    VMSideExit* exit = (VMSideExit*)(rec + 1);
>-
>-    /* Setup guard record structure. */
>-    memset(rec, 0, sizeof(GuardRecord));
>-    rec->exit = exit;
> 
>     /* Setup side exit structure. */
>     memset(exit, 0, sizeof(VMSideExit));
>     exit->from = fragment;
>     exit->calldepth = callDepth;
>     exit->numGlobalSlots = ngslots;
>     exit->numStackSlots = stackSlots;
>     exit->numStackSlotsBelowCurrentFrame = cx->fp->callee
>         ? nativeStackOffset(&cx->fp->argv[-2])/sizeof(double)
>         : 0;
>     exit->exitType = exitType;
>-    exit->addGuard(rec);
>     exit->block = fp->blockChain;
>     exit->pc = pc;
>     exit->imacpc = fp->imacpc;
>     exit->sp_adj = (stackSlots * sizeof(double)) - treeInfo->nativeStackBase;
>     exit->rp_adj = exit->calldepth * sizeof(FrameInfo*);
>     memcpy(getFullTypeMap(exit), typemap, typemap_size);

Pre-patch, reusing a snapshot would result in rec/exit above both existing only once in sibling trace data -- but each guard would be compiled into a separate series of instructions, and each would have its own series of instructions for code in the corresponding exit.  With only one GuardRecord in sight, nothing was keeping track of multiple guards all using the same side exit, so trying to patch any reused snapshot would patch only its first use and not subsequent ones -- so if the conditions that caused a guard for a secondary or later snapshot to fail repeated, the patching would have no effect.
Attachment #372768 - Attachment is obsolete: true
Attachment #372995 - Flags: review?(graydon)
Attachment #372768 - Flags: review?(graydon)
Comment on attachment 372995 [details] [diff] [review]
Mismatch-exit if object isn't js_FunctionClass

Looks good, thanks for all the explaining.
Attachment #372995 - Flags: review?(graydon) → review+
http://hg.mozilla.org/tracemonkey/rev/e8588a4a1153

with bug 488781 as a followup to reclaim some performance which this patch might or might not have lost (I was seeing slowdowns, but there was considerable variance between the two SunSpider runs [of 100x each] that I did, so it's hard to say how much slowdown there actually was):

http://hg.mozilla.org/tracemonkey/rev/c59fd9494490
http://hg.mozilla.org/tracemonkey/rev/6ab042310fb4
Whiteboard: [sg:critical?] → [sg:critical?] fixed-in-tracemonkey
This changeset seems to be what's sending 1.9.1's Mac box red:

  ld: duplicate symbol _STOBJ_GET_CLASS in jsj_JavaArray.o and jsj_JSObject.o
  collect2: ld returned 1 exit status
And Linux as well.
JS_ALWAYS_INLINE JSClass*
STOBJ_GET_CLASS(const JSObject* obj)
{
    return (JSClass *) (obj->classword & ~JSSLOT_CLASS_MASK_BITS);
}       

Waldo, need a static before JS_ALWAYS_INLINE?

/be
(In reply to comment #45)
> JS_ALWAYS_INLINE JSClass*
> STOBJ_GET_CLASS(const JSObject* obj)
> {
>     return (JSClass *) (obj->classword & ~JSSLOT_CLASS_MASK_BITS);
> }       
> 
> Waldo, need a static before JS_ALWAYS_INLINE?

I pushed 

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/e4a20466cccc

since that worked for me.
Blah, sorry about that, should have remembered that originally.  Oh, LiveConnect, how I hate thee...
http://hg.mozilla.org/mozilla-central/rev/c59fd9494490 plus all changes are in.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
js1_8_1/trace/trace-test.js	
http://hg.mozilla.org/tracemonkey/rev/61892f57b46a
Flags: in-testsuite+
v 1.9.1, 1.9.2
Status: RESOLVED → VERIFIED
cvsroot/mozilla/js/tests/js1_8_1/trace/trace-test.js,v  <--  trace-test.js
new revision: 1.14; previous revision: 1.13

/cvsroot/mozilla/js/tests/shell.js,v  <--  shell.js
Group: core-security
Flags: wanted1.9.0.x-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: