Closed Bug 626398 Opened 13 years ago Closed 13 years ago

A decision to abort in the tracer gets lost

Categories

(Core :: JavaScript Engine, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: h4writer, Assigned: n.nethercote)

References

Details

(Whiteboard: see comment 21, fixed-in-tracemonkey)

Attachments

(1 file, 4 obsolete files)

User-Agent:       Mozilla/5.0 (X11; Linux i686; rv:2.0b10pre) Gecko/20110114 Firefox/4.0b10pre
Build Identifier: 

In bug #625494 I stumbled on a bug. I got a "unexpected constantly false guard detected" on a guard. Now first I thought it was problem with my patch, but after examination the error is reproducible without my patch in a lookupswitch:

Testcase:
function g(n) {
  switch (n) {
    case 0:
    case "1": 
	print("case "+n+":  "+tracemonkey.onTrace)
	break;
    default:
	print("default: "+tracemonkey.onTrace)
	break;
  }
}

for (let i = 0; i != 30; i++) {
  g(i%4/2);
}

Output when run with -j:
case 0:  false
default: false
default: false
default: false
case 0:  false
default: false
default: false
default: false
Assertion failure: unexpected constantly false guard detected, at jstracer.cpp:4343
Aborted

The problem arise on the following guard:

guard(true, w.name(w.eqd(v_ins, w.immd(d)), "guard(switch on numeric)"), BRANCH_EXIT);

So only when the trace get's recorded and the input argument is 0 (non constant) and the type of the argument in de tracerecord is double.

I think there is a wrong check in the guard that makes the "eqd" a constant, but it really isn't.


Reproducible: Always
Attached patch Patch (obsolete) — Splinter Review
Ok I have found the problem. The problem is in the tracerecorder of \, so in the alu on LIR_divd. 

There is this guard:

/* Don't lose a -0. */
guard(false, w.eqi0(result), exit);

Now this also guards when result is a normal integer 0. This is wrong. The right check is:

/* Don't lose a -0. */
guard(false, w.ltiN(result, 0), exit);
Attachment #504707 - Flags: review?
Attached patch Patch (obsolete) — Splinter Review
Sorry was a bit to fast. The previous patch also guarded all negative integers. But it should only guard negative 0. So this is the updated patch.
Attachment #504707 - Attachment is obsolete: true
Attachment #504711 - Flags: review?
Attachment #504707 - Flags: review?
Attachment #504711 - Flags: review? → review?(nnethercote)
Attached patch Patch using MaybeBranch (obsolete) — Splinter Review
Well in "modd" there is also a check on negative zero. There a MaybeBranch is used. So I copied that. Not sure what is better? So I provide here the MaybeBranch method.
Attachment #504773 - Flags: review?
haytjes, you want to request review from someone specific.  I'd recomment ":njn".
Status: UNCONFIRMED → NEW
Ever confirmed: true
It seems like we are emitting several guards here (in this LIR_divd code) that could be constantly false. The easiest thing to do would be: before emitting such a guard, check that the current values would pass the guard. If not, give up recording.
(In reply to comment #5)
> It seems like we are emitting several guards here (in this LIR_divd code) that
> could be constantly false. The easiest thing to do would be: before emitting
> such a guard, check that the current values would pass the guard. If not, give
> up recording.

I suspect that's not enough.  Nanojit has become smarter about detecting tests that always succeed/fail by looking at earlier guards.  Eg. if you have:

  xt c

then you know that afterwards 'c' must be false.
@bz: k, I'll do that next time, ty for info.

@njn: I went through the code again of LIR_divd and seems like the code don't emit several guards that could be constantly false. (Like reported in comment 5) For the guard on line 8366 there is a check on the current values on line 8327 (jstracer.cpp). So the only possible constantly false check is the "0" guard. That should be a -0 check. So my patch is still valid.
Also your comment about not being enough. It will be enough. The problem arose because the guard made sure v_ins couldn't/shouldn't be (+/-)0. But the recorder was recording when v_ins was 0. In the switchop a guard is set for the case it is currently recording. So in this case 0. The guard made sure v_ins should be 0. There the contradiction came ("v_ins not 0" and "v_ins must be 0").
Attachment #504711 - Attachment is obsolete: true
Attachment #504711 - Flags: review?(nnethercote)
Attachment #504773 - Flags: review? → review?(nnethercote)
(In reply to comment #6)
> >                      [...] The easiest thing to do would be: before emitting
> > such a guard, check that the current values would pass the guard. If not, give
> > up recording.
> 
> I suspect that's not enough.  Nanojit has become smarter about detecting tests
> that always succeed/fail by looking at earlier guards.

But a condition that is true at record time can't be statically false, no matter how smart Nanojit is.

(What the TraceRecorder records is the interpreter's actions and decision points. So the static assumptions baked into the trace must be true of the record time state at every point, or we have a tracer bug.

This rule is baked into the tracing jit very deeply. For example, suppose we trace `x = ""`. After that we'll emit code that assumes x is a string, but we don't actually track this static assumption anywhere in the TraceRecorder. Rather each time we need to emit code involving x, we look at the record-time value of x to see what type it is. The static assumptions are "stored" in the interpreter state.)
I just saw this assertion with http://www.instalki.pl/programy/download/Windows/odczyt_edycja_pdf/Adobe_Reader.html on 64bit Linux but I haven't been able to reproduce it locally. I assume the appearance of this assert in crash testing is due to yesterday's merge.
Summary: Tracing a double 0 gives "unexpected constantly false guard detected" → Assertion failure: unexpected constantly false guard detected Tracing a double 0
(In reply to comment #9)
> I just saw this assertion with
> http://www.instalki.pl/programy/download/Windows/odczyt_edycja_pdf/Adobe_Reader.html
> on 64bit Linux but I haven't been able to reproduce it locally. I assume the
> appearance of this assert in crash testing is due to yesterday's merge.

I can't reproduce it on that site. And it is very well possible that the problem is somewhere else then in this bug. If you find a (reproducible) testcase, it may help.
We should take this after nick reviewed it.
I tried to reproduce but could not on the original url. I could reproduce it regularly when spidering the site from the original url but it kept happening at different urls. I assume it is ad related, but I haven't been able to get a reproducible chunk of page to reduce yet. Sorry.
bc, this assertion can come up in a range of circumstances, it's probably a different case that you're seeing to the one haytjes is seeing.
I can't reproduce the problem in comment 0 -- my shell complains about there being no such object called 'tracemonkey'.

As for this guard:

> /* Don't lose a -0. */
> guard(false, w.eqi0(result), exit);

I can see that it's more pessimistic than necessary -- it'll side-exit if the result is 0 or -0.  But I don't think that's a correctness problem.  Eg. this program runs fine:

function f(i) {
    var x = 0 / -i;
    if (x === 0 && (1/x) === -Infinity)
        print("negzero");
    else
        print("other");
}

for (var i = -20; i < 10; i++) {
    f(i);
}
(In reply to comment #14)
> my shell complains about there being no such object called 'tracemonkey'.

Run with -j to get that object.  No -j, no tracemonkey.
(In reply to comment #15)
> 
> Run with -j to get that object.  No -j, no tracemonkey.

Ah, yes.  Thanks.  But now I can't reproduce the assertion failure, on 32-bit or 64-bit.
(In reply to comment #16)
> Ah, yes.  Thanks.  But now I can't reproduce the assertion failure, on 32-bit
> or 64-bit.

Yeah, the testcase doesn't work anymore. This is because of the fix in bug #631788. In this fix the tableswitch doesn't assert anymore when the guard value==0 is done twice. (because of abortIfAlwaysExits).

So the symptom of this bug is gone. Not the real issue. I tried to find another example but didn't succeed yet...

(In reply to comment #14)

> I can't reproduce the problem in comment 0 -- my shell complains about there
> being no such object called 'tracemonkey'.
> 
> As for this guard:
> 
> > /* Don't lose a -0. */
> > guard(false, w.eqi0(result), exit);
> 
> I can see that it's more pessimistic than necessary -- it'll side-exit if the
> result is 0 or -0.  But I don't think that's a correctness problem.  Eg. this
> program runs fine:
> 
> function f(i) {
>     var x = 0 / -i;
>     if (x === 0 && (1/x) === -Infinity)
>         print("negzero");
>     else
>         print("other");
> }
> 
> for (var i = -20; i < 10; i++) {
>     f(i);
> }

The bug isn't a correctness bug. It will always give the right result. The issue is the 'speed' (minor):

Currently:
1) a trace get's recorded that contains: i/3 or i/4 or ...
2) a guard will get added that guards if result is '0' or '-0'
3) the trace get's run a few times

Result:
If a 0 is encountered the code will guard and go to the interpreter till it has run enough times and record a new trace.

Expected:
2) guard only when result is '-0'

Expected result:
If a 0 is encountered the code just run further in the trace. No new trace should get recorded.
Comment on attachment 504773 [details] [diff] [review]
Patch using MaybeBranch

Seems fine, esp. since it mirrors the code for LIR_modi.  I'll do some profiling with Sunspider and V8 before landing.  Thanks, and sorry I took so long to process this bug.
Attachment #504773 - Flags: review?(nnethercote) → review+
Summary: Assertion failure: unexpected constantly false guard detected Tracing a double 0 → Trace code for integer div side-exits on 0 or -0 result, but doesn't need to on 0
Whiteboard: see comment 14 and 17
Let me adjust my statement. the symptom of this bug is not gone, the symptom is just something else. So the testcase is still valid. I give a better testcase here:

function g(n) {
  switch (n) {
    case 0:
    print("case   : "+n);
    break;
    default:
    print("default: "+n)
    break;
  }
}

for (let i = 0; i != 30; i+=2) {
  g(i%4/2);
}


Ran with -j:
case   : 0
default: 1
case   : 0
default: 1
case   : 0
default: 1
case   : 0
default: 1
case   : 0
case   : 1
case   : 0
case   : 1
case   : 0
case   : 1
case   : 0


Expected result:
case   : 0
default: 1
case   : 0
default: 1
case   : 0
default: 1
case   : 0
default: 1
case   : 0
default: 1
case   : 0
default: 1
case   : 0
default: 1
case   : 0


Explanation:

The trace starts after 'x' iterations. Variable i is 0 at that moment.
1) LIR_divd gets recorded. So a guard on i==0 is added. This happens when the actual value of i equals 0 !!!
2) switchop gets recorded. So a guard on i==0 is added. Because of (1) this check is discarded. 

The trace then runs with i equals 1
1) The first guard does nothing. i equals 1
2) Here there should have been a guard, but there isn't one. So the code for i is 0 is ran instead of the code when i equals 1.

@njn: I noticed you updated the code of alu to tryToDemote. Is it because of that this patch isn't committed yet? I'll reapply the patch to the newest trunk.
Attached patch Updated patch using MaybeBranch (obsolete) — Splinter Review
Attachment #504773 - Attachment is obsolete: true
Attachment #520583 - Flags: review?(nnethercote)
Attached patch patchSplinter Review
haytjes, your explanation isn't quite correct but it led me to the correct explanation.

This one's my fault, I screwed up in bug 631788 -- I forgot to check the result of a call to guard().  So what was happening was that the tracer decided to abort -- because in the DIV we exited if the div result was zero, and in the SWITCH we exited if the div result was non-zero, so we clearly always exit -- but the decision to abort was lost due to the missing check, so we kept compiling.
Assignee: general → nnethercote
Attachment #520583 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #520591 - Flags: review?
Attachment #520583 - Flags: review?(nnethercote)
Blocks: 631788
Summary: Trace code for integer div side-exits on 0 or -0 result, but doesn't need to on 0 → A decision to abort in the tracer gets lost
Whiteboard: see comment 14 and 17 → see comment 21
Attachment #520591 - Flags: review? → review?(dmandelin)
Owh, my fault. In that case I switch back to my first statement. The symptom is more or less gone. But the patch is actually still valid for this bug.

When guarding in a trace, the same check has to be done on the current value it is tracing on. 

So in this case there is a check for -0:

        if (jsint(r) != r || JSDOUBLE_IS_NEGZERO(r))
            goto undemotable;

And the corresponding guard should be:

        /* Don't lose a -0. */
        if (MaybeBranch mbr = w.jf(w.eqi0(result))) {
            guard(false, w.ltiN(d0, 0), exit);
            w.label(mbr);
        }

Currently the guard is

        guard(false, w.eqi0(result), exit);

But that not only takes -0, but also 0 in account. And there is no check if the value is '0'. So potentially we create a trace on '0', but that would guard when running. Therefor the guard should get specialized to only take -0 in account.
To clarify:  are you just reiterating what comment 14 said (the guard is more pessimistic than necessary, but is correct in its current form), or something else?  If it's the former, I can merge my correctness patch with your patch in order to generate slightly better code.
(In reply to comment #23)
> To clarify:  are you just reiterating what comment 14 said (the guard is more
> pessimistic than necessary, but is correct in its current form), or something
> else?  If it's the former, I can merge my correctness patch with your patch in
> order to generate slightly better code.

Yeah, I was just reiterating comment 14.
Attachment #520591 - Flags: review?(dmandelin) → review+
Attachment #504773 - Flags: review+ → review-
haytjes, this code isn't valid for div:

        /* Don't lose a -0. */
        guard(false, w.eqi0(result), exit);
        if (MaybeBranch mbr = w.jf(w.eqi0(result))) {
            guard(false, w.ltiN(d0, 0), exit);
            w.label(mbr);
        }

It will do the wrong thing for 0 / -1, the result of which is -0 and so requires a side-exit.

The reason that code is valid for mod is that the sign of a mod operation in JavaScript depends solely on the LHS.  This isn't true for div.
Sry, my code is indeed wrong. I should have checked the RHS instead of the LHS. For certainty I took the ECMA262 with me to find when a -0 can occur.

1) LHS is 0 and the RHS has the opposite sign.
So in this case RHS < 0.

2) LHS is -0 and the RHS has the opposite sign.
LHS shouldn't be -0, because that is not an integer.

3) inf/inf
LHS nor RHS should be integer, so can't be inf.

So the patch should be:
        /* Don't lose a -0. */
        guard(false, w.eqi0(result), exit);
        if (MaybeBranch mbr = w.jf(w.eqi0(result))) {
            guard(false, w.ltiN(d1, 0), exit);
            w.label(mbr);
        }

At the moment I'm not an my normal computer, so I can't build it to test and provide a patch, sry.
haytjes, I just pushed the missed-abort fix, but I didn't optimize the guard.  I did this because this bug has been so messy, with various incorrect diagnoses, incorrect patches, and an early inappropriate r+.  (Not that I'm complaining;  these things happen sometimes and I'm as much to blame as anyone :)  And fixing the correctness bug is more important IMHO than a slight optimization.

So I'm being really conservative and just fix the correctness issue right now.
If you still want to optimize the guard, can you file a follow-up bug and CC me?

http://hg.mozilla.org/tracemonkey/rev/7081b38c04b7

Thanks for your persistence on this bug, and sorry for all the delays on my end.
Whiteboard: see comment 21 → see comment 21, fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/7081b38c04b7
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: