Closed Bug 607856 Opened 9 years ago Closed 9 years ago

long hang loading page

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: tnikkel, Assigned: njn)

References

()

Details

(Whiteboard: fixed-in-nanojit,fixed-in-tracemonkey,fixed-in-tamarin)

Attachments

(1 file)

blocking2.0: --- → ?
With build: Mozilla/5.0 (Windows NT 5.1; rv:2.0b8pre) Gecko/20101026 Firefox/4.0b8pre ID:20101027142247 I cannot reproduce, in fact it loads almost instantly.  Could this be due to a slower connection or possibly an add-on?

~B
Still happens after disabling all addons, on two different computers, Windows and Linux. Connection can't be the issue because I used the same connection to bisect the issue.
Bisected to rev 5feb11557dae bug 600127.
Blocks: 600127
(In reply to comment #2)
> Still happens after disabling all addons, on two different computers, Windows
> and Linux. Connection can't be the issue because I used the same connection to
> bisect the issue.

I can't reproduce either. What build are you using?
(In reply to comment #4)
> 
> I can't reproduce either.

Me neither.
I reproduced using the 2010-10-27 Windows and Linux nightlies. And I bisected using my own Linux debug builds.
It doesn't seem to happen in a fresh profile, but it happens in at least three different profiles on two different machines. Only one of the profiles is the one I use regularly, the others are just testing, not too much changed.
So I had javascript.options.methodjit.content set to false, and that seems to be causing it.

It also happens on machines where the method JIT is disabled due to not having SSE2.
(In reply to comment #8)
> So I had javascript.options.methodjit.content set to false, and that seems to
> be causing it.
> 
> It also happens on machines where the method JIT is disabled due to not having
> SSE2.

That sounds plausible -- the change you bisected to was a change in the trace JIT, and if the method JIT is disabled the trace JIT will be used more.  I'll try to reproduce that way on Monday.  Thanks for the additional info.
I can reproduce this on Linux64, and can confirm that backing out 600127's patch makes it go away.
Assignee: general → nnethercote
Status: NEW → ASSIGNED
I diffed the LIR dumps from the two runs, this looks like the culprit.  Without
the patch from bug 600127 we have this code:

  ------------------------------ # JSOP_IFEQ
      mLengthAndFlags3 = ldq.str andq1[0]
      length3 = rshuq mLengthAndFlags3, vplen3/*4*/
      eqq1 = eqq length3, NULL2/*0LL*/
      xf1: xf eqq1 -> pc=0x7f2e350737ba imacpc=(nil) sp+8 rp+0 (GuardID=093)
  ------------------------------ # JSOP_GETARG
      stq.sp sp[0] = andq1
  ------------------------------ # JSOP_AND
  ------------------------------ # JSOP_IFNE
      xt1: xt eqq1 -> pc=0x7f2e350737fb imacpc=(nil) sp+8 rp+0 (GuardID=095)
      x1: x  -> pc=0x7f2e35073273 imacpc=(nil) sp+0 rp+0 (GuardID=096)


When the patch is added, it changes to this:

 ------------------------------ # JSOP_IFEQ
      mLengthAndFlags3 = ldq.str andq1[0]
      length3 = rshuq mLengthAndFlags3, vplen3/*4*/
      eqq1 = eqq length3, NULL2/*0LL*/
      xf1: xf eqq1 -> pc=0x7f2657e487ba imacpc=(nil) sp+8 rp+0 (GuardID=092)
  ------------------------------ # JSOP_GETARG
  ------------------------------ # JSOP_AND
  ------------------------------ # JSOP_IFNE
      x1: x  -> pc=0x7f2657e48273 imacpc=(nil) sp+0 rp+0 (GuardID=093)

After 'xf1' Nanojit concludes that 'eqq1' must be true, otherwise we would
have exited.  So when it sees 'xt eqq1' it decides (correctly) that the
guard cannot ever exit.

The notable thing is that 'xt1' is marked with sp+8, which means it
keeps stores to sp[0] alive.  Therefore it keeps the 'stq' alive.  So when
it's removed, the 'stq' becomes dead.  That's why it doesn't appear in the
second version of the code.

Now, the question is this:  in general, if a guard is never taken, is it
safe to remove it?  Currently we assume it is -- and we don't have a choice,
because ExprFilter::insGuard() returns NULL for never-taken guards.  But it's
clear in this case that doing so wasn't safe.  And note that bug 582766 was
another case where I removed a seemingly dead guard, which caused problems
because it was keeping stores alive.

It's worth pointing out that this code is a bit odd -- there's some special
treatment of loop termination code that I don't entirely understand.  See
bug 573912 comment 2 and onwards.  We do weird stuff with guards at the end
of loops and I've now twice paid the price for it.

In other words, bug 600127 is correct, it just exposed a latent problem by
improving the optimization of guard conditions.

gal, dvander -- any thoughts on this?  We need a clear answer on the
removability of guards.  Ie. either we should add LIR_xn or ensure that
guards are always added in a manner that is safe to remove.  I suspect
the former will be easier and less error-prone.

(Nb: the code fragment is for
http://hgbook.red-bean.com/support/jquery-min.js:23, which is unfortunately
a minified nightmare so I don't have a standalone test case yet.)
I was planning to go ahead with adding LIR_xn, but now I think it's a bad idea.  It's the "guards are sharp and mysterious so we better be careful with them" approach.  If a guard is never taken, it should be removable;  something's wrong if that's not true.

Looking again at the example in comment 11, we need 'xt1' to be present in order to keep the 'stq' alive.  But that's totally backwards -- in general, the presence of guards implies the need for stack stores, not the other way around.

So I think there's something bogus about the if-fusing done at the loop's end.
Perhaps the bogosity is that it assumes that conditional guards are never optimized away.  Looking at this fragment:

 ------------------------------ # JSOP_IFNE
      xt1: xt eqq1 -> pc=0x7f2e350737fb imacpc=(nil) sp+8 rp+0 (GuardID=095)
      x1: x  -> pc=0x7f2e35073273 imacpc=(nil) sp+0 rp+0 (GuardID=096)

It doesn't make sense to me to have two consecutive guards with different sp+n values.  If x1's value was sp+8 that would fix this particular case.
Hmm, everyone can ignore most of comment 10 and comment 11.  I think I've
worked out the real problem.  Here's the code for emitIf():

JS_REQUIRES_STACK void
TraceRecorder::emitIf(jsbytecode* pc, bool cond, LIns* x)
{
    ExitType exitType;
x   if (IsLoopEdge(pc, (jsbytecode*)tree->ip)) {
x       exitType = LOOP_EXIT;

        /*
         * If we are about to walk out of the loop, generate code for the
         * inverse loop condition, pretending we recorded the case that stays
         * on trace.
         */
x       if ((*pc == JSOP_IFEQ || *pc == JSOP_IFEQX) == cond) {
x           JS_ASSERT(*pc == JSOP_IFNE || *pc == JSOP_IFNEX || *pc == JSOP_IFEQ || *pc == JSOP_IFEQX);
x           debug_only_print0(LC_TMTracer,
x                             "Walking out of the loop, terminating it anyway.\n
");
x           cond = !cond;
x       }

        /*
         * Conditional guards do not have to be emitted if the condition is
         * constant. We make a note whether the loop condition is true or false
         * here, so we later know whether to emit a loop edge or a loop end.
         */
x       if (x->isImmI()) {
            pendingLoop = (x->immI() == int32(cond));
            return;
x       }
    } else {
        exitType = BRANCH_EXIT;
    }
    /*
     * Put 'x' in a form suitable for a guard/branch condition if it isn't
     * already.  This lets us detect if the comparison is optimized to 0 or 1,
     * in which case we avoid the guard() call below.
     */
x   ensureCond(&x, &cond);
x   if (!x->isImmI())
        guard(cond, x, exitType);
}

The lines marked 'x' are the ones executed in the problematic case.  In
particular, the |x->isImmI()| fails and then the |!x->isImmI()| succeeds.  This is because ensureCond() changes 'x' into an equivalent form
that happens to be have been previously encountered in a manner that allows
it to be optimized into an immediate.

So I think bug 604297 is ultimately at fault, because it inserted that
ensureCond().  But the code that existed before that was pretty fragile.  If
you move the ensureCond() call inside the |!x->isImmi()| branch I get
assertion failures caused by always-taken guards.

So emitIf() needs to be tweaked in such a manner that the always-taken
guards don't occur, but in a way that doesn't affect it's current fragile
behaviour.
I'm a little rusty on my tracerology, but this comment in emitIf() looks sketchy:

        /*
         * Conditional guards do not have to be emitted if the condition is
         * constant.

Sometimes they do have to be emitted, otherwise it could kill stores that would otherwise be sunk by an always-taken guard. This seems plausible:

 (1) JSOP_AND calls emitIf().
 (2) The condition is converted to a constant guard.
 (3) The constant guard is not emitted.
 (4) The interpreter pops the stack, and moves onto JSOP_IFNE.
 (5) The next opcode, JSOP_IFNE, performs the end-of-trace guard. By this time the store which should have been sunk is no longer live.

But it *is* okay to remove this guard. It's not really a guard at all, it's just a random condition.
Here is a minimal test case, reconstructed from the bytecode in comment #11. It will iloop.

// vim: set ts=4 sw=4 tw=99 et:
function f(a) {
    var k = "asdf";
    while (k && a--) {
        if (a == 5) {
            k = "";
            if (k) { }
            continue;
        }
        print('hi');
        if (k) { }
    }
}
f(4);
f(4);
f(4);
f(5);
f(25);
f(25);
f(6);
checkTraceEnd() forcefully resets cx->regs->sp to the original |sp|. That's super scary, since it's assuming that everything above the entry sp is dead (which is not necessarily true, like here).

I vote for figuring out why checkTraceEnd() is doing shenanigans.. but we could also change emitIf() to not remove guards if the op is JSOP_IFNE?
Great detective work, dvander, thanks.

> checkTraceEnd() forcefully resets cx->regs->sp to the original |sp|. That's
> super scary, since it's assuming that everything above the entry sp is dead
> (which is not necessarily true, like here).

> I vote for figuring out why checkTraceEnd() is doing shenanigans.. but we could
> also change emitIf() to not remove guards if the op is JSOP_IFNE?

Is IFNE different to IFEQ in this respect?
emitIf() is way too fragile for my tastes, BTW, I'm not surprised it's dodgy.
Yeah, IFEQ can never result in a loop, so it should just keep recording. IFNE either ends the trace (closing or aborting) or compiles a tree call, and if it ends the trace it should not kill random stack slots on the way out.

emitIf is pretty scary but it's not clear whether it or checkTraceEnd is at fault. Andreas, any thoughts?
blocking2.0: ? → betaN+
This is a workaround patch;  it fixes the hang but doesn't address the deeper emitIf() problems.  (I'll file a new bug for them.)

- Moves the ensureCond() call in emitIf() in order to restore exactly the behaviour from before the bug 604297 patch.

- Disables the condition optimizations from bug 600127.
Attachment #487705 - Flags: review?(dvander)
Attachment #487705 - Flags: review?(dvander) → review+
I filed bug 609129 as a follow-up to investigate this and fix it properly.
nnethercote http://hg.mozilla.org/tamarin-redux/rev/70e923d12442
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit,fixed-in-tracemonkey,fixed-in-tamarin
Comment on attachment 487705 [details] [diff] [review]
patch (against TM 56615:2098bd53381e)

Does the LIR.cpp change really address the root cause?
(In reply to comment #24)
> 
> Does the LIR.cpp change really address the root cause?

No, it was just a stop-gap.  Bug 609129 fixed it properly and re-enabled the implicit guard optimization.
Duplicate of this bug: 609682
Duplicate of this bug: 610386
http://hg.mozilla.org/mozilla-central/rev/1ce507b5289d
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Duplicate of this bug: 552406
You need to log in before you can comment on or make changes to this bug.