Closed Bug 620757 Opened 14 years ago Closed 13 years ago

TM: don't trace JSOP_TABLESWITCH

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- .x+

People

(Reporter: chris, Unassigned)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(5 files)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13

Open the attached html file. It runs a purely mathematical calculation 1000 times and prints the unique results. There should only be one unique result, but on FF when the jit is enabled there will usually be two or three.

Reloading the page will sometimes give different results.

Disabling javascript.options.jit.content will fix the problem.

Uncommenting the empty eval('') in the code will fix the problem.

Reproducible: Always

Steps to Reproduce:
1. Load the attached file
2. Compare with another browser or with jit disabled
3. Reload to see that the results can vary from run to run
Actual Results:  
Example buggy output:
50000 120000 46498 11625 46498 23249 258074 120000 55798 166490 11625 34874 27899 194389
50000 120000 46498 0 46498 23249 258074 120000 55798 166490 23249 23249 55797 222288
50000 120000 46498 0 46498 23249 258074 120000 55798 166490 73246 23249 817397796 817564287

Expected Results:  
50000 120000 46498 11625 46498 23249 258074 50000 23249 96751 11625 34874 11625 108375
Status: UNCONFIRMED → NEW
Ever confirmed: true
This WFM in 4.0b8-ish tracemonkey optimized self-build.

Chris, if you can confirm this is fixed in Firefox 4 betas, then we can bisect through the build archive to identify what fixed the bug.

/be
Yes, WFM in OS X 4.0b7
Attached file Shell test case
This still fails with -j, but works with -j -m.
This works with trunk for me, but that might be due to the rebalancing of method JIT and trace JIT, so we need confirmation 2.0 isn't affected.
Thanks Jan, that confirms #5.
Setting blocking flag to get it on the radar.
blocking2.0: --- → ?
I reduced Jan's test as much as I could.  The result is this:

function calc() {
  res = [];
  var fs = [[0, 0], [0, 0], [0, 1]];

  for (var i = 0; i < fs.length; i++) {
    var f = fs[i];
    var gf0 = g(f[0]);
    var gf1 = g(f[1]);
    res[i] = snd(gf0, gf1)
  } 
  return res
}       

function g(n) {
  switch (n) {
    case 0: return 88;
    case 1: return 1;
  } 
}   
        
var snd = function(x, y) { return y };
    
for (i = 0; i < 5; i++) {
  print(calc());
}


With -m I get this output:

88,88,1
88,88,1
88,88,1
88,88,1
88,88,1

With -j I get this output:

88,88,1
88,88,1
88,88,1
88,88,88
88,88,88

There are three trace fragments created.  The first two are for line 15 (the switch) and are identical except for differences in addresses, which is odd;  I guess it's because g() is called twice?  The last fragment is for line 5.  I suspect this is a trace tree construction issue.  I tried turning off all optimizations within Nanojit but that didn't change anything.
Attached file diff
If I change the |f[0]| in comment 8 to |f[1]| the bug goes away.  I did a TMFLAGS=recorder,tracer run of the two variants, stripped out addresses, and have attached the diff of the results.

It gets interesting around line 114, where in the non-buggy run we see "synthesized shallow frame for a.js:15@0".  That doesn't occur in the buggy run.  It looks like the calldepth value is 0 in the buggy run and 1 in the non-buggy run.
Attachment #499171 - Attachment is patch: false
I should add:  in the non-buggy run we end up attaching another branch to the trace tree.
Since this is (a) not a regression, and (b) the test case from comment 0 now works because the methodjit executes it instead of the tracejit, I suggest it should be wanted2.0 but not blocking2.0.
status2.0: --- → ?
Minusing per comment 11.
blocking2.0: ? → -
Blocking 2.x per njn and bz on IRC.
blocking2.0: - → .x
status2.0: wanted → ---
Here's a smaller reproducer, courtesy of h4writer on IRC:

function calc() {
  var b = 0;
  var fs = [0, 0, 1];
  var ret;

  for (var i = 0; i < fs.length; i++) {
    g(b);
    ret = g(fs[i]);
  } 

  return ret;
}       

function g(n) {
  switch (n) {
    case 0: return 88;
  } 
  return 1;
}

for (i = 0; i < 4; i++) {
  print(calc());
}


-m output:

1
1
1
1

-j output:

1
1
1
88
I could even reduce it even more, so it doesn't use the array fs:

function calc() {
  var blaat = 0;
  var ret;

  for (var i = -1; i != 2; i++) {
    g(blaat);
    ret = g(i);
  }
  return ret;
}       

function g(n) {
  switch (n) {
    case 0: return 88;
  } 
  return 1;
}

for (i = 0; i < 4; i++) {
  print(calc());
}

It gives the same output as listed in previous comment.
The problem arise when i becomes 1 in the 'for' loop, then g(1) is set in the return value and it get's matched with 0 due some strange error...
I adjusted to code even further and it gives some extra information.
So I adjusted the values so the loop goes from 0 to 3 now. (Not starting at -1 anymore).

function calc() {
  var b = 2;

  for (var i = 0; i != 3; i++) {
    g(b);
    g(i);
  }
  print("---");
}       

function g(n) {
  switch (n) {
    case 2: break;
    default: print(n);
  }
}

for (i = 0; i < 4; i++) {
  calc();
}

The output should be (-m):
0
1
---
0
1
---
0
1
---
0
1
---

the output with tracemonkey (-t):
0
1
---
0
1
---
0
1
---
0
1
2
---

The switch should make sure that only values get printed that ain't 2. But in this case the '2' get printed the fourth time. So the switch-case doesn't match the '2' properly the fourth time.
(Cautious, I'm a n00b at everything I'm doing here)

So I worked myself through the code of tracing. Some bits were comprehensive, other (most) things totally not. But I noticed that this bug only occurs when a switch gets translated in a tableswitch. In jstracer there is a tracerecorder for the more specific tableswitch and also a tracerecorder for the general switchop. The split for a tableswitch (as the comments say) is to prepare a jump table if needed. From there the mostly the magic begins. I can say when the code does, but don't get the big picture clear (so why what happens).

To find this diff. I just looked to the code in switchop and compared it to the code in the tableswitch. I noticed that switchop (in the numberic variation) used the value itself to guard on. So I adjusted the tableswitch so it also used the value itself. Didn't help. I had to adjust it a bit lower in the code. And the problem disappeared. 

Now I have no idea if I'm just lucky, or if the tracing gets recorded, but never matched or if there are still some other parts in the code that must get adjusted. I'm just dropping this clue here for more skilled persons.
Attachment #502831 - Flags: feedback?(nnethercote)
Comment on attachment 502831 [details] [diff] [review]
Incompete patch solves mismatching?

I don't understand the switch code, I think dmandelin wrote it.
Attachment #502831 - Flags: feedback?(nnethercote) → review?(dmandelin)
(In reply to comment #18)
> Comment on attachment 502831 [details] [diff] [review]
> Incompete patch solves mismatching?
> 
> I don't understand the switch code, I think dmandelin wrote it.

Well, I'm not to sure about the code either and this patch just let it work, but I'm not sure if there are other consequences. Also in bug #625494 I rewrote everything of the switchcase. So this patch is actually obsolete?
Ah I finally found the real problem in the code.

So in the code a tableswitch will add a guard.
This guard will always get taken. So we will always exit here.

On the guard a new trace is recorded (according to the actual case).
So the tableswitch gets traced again (intended) and in the code there is a corresponding if that just let the recording go further (so this time the tableswitch doesn't add code to the trace)

This is the corresponding code:
    jsbytecode* pc = cx->regs->pc;
    /* Starting a new trace after exiting a trace via switch. */
    if (anchor &&
        (anchor->exitType == CASE_EXIT || anchor->exitType == DEFAULT_EXIT) &&
        fragment->ip == pc) {
        return ARECORD_CONTINUE;
    }

Now condition isn't specific enough. It should also check if the tableswitch instruction is the first instruction in the traced code.


I will now explain how this testcase could trigger this bug (just as extra information)

1) V the for-loop in the calc function eventually starts tracing
2) V the function g(b) gets called
3) V the tableswitch get's parsed and the necessary guard is added
4) V the tableswitch forces the current trace to stop
5) V the following code just get run in the interpreter
6) V the tracecode of the for-loop in the calc function runs
7) V guards on 2 (g(b) and b equals 2)
8) V a new trace starts (on the tableswitch's guard)
9) V the tableswitch just returns ARECORD_CONTINUE without adding code to the trace
10) V g(i) gets called
11) X the tableswitch just returns ARECORD_CONTINUE
12) ...

11 -> it returns ARECORD_CONTINUE, because:
- anchor exists (from previous switch: g(b))
- anchor->exitType equals CASE_EXIT (from previous switch: g(b))
- ip equals pc, because it is the same code so on the same ip.

Expected result:
11 -> writes the guard for the tableswitch and stops the recorder
In other words, tableswitch() is assuming that we won't trace the same JSOP_TABLESWITCH again in the new trace. But in this case, we do, because

    function g(n) {
        switch (n) {
          case 1: break;
          default: assertEq(n, 0);
        }
    }

    var b = 1;
    for (j = 0; j < 20; j++) {
        g(b);
        g(j % 2);
    }

there are two calls to g in the loop. We start at the JSOP_TABLESWTICH in g. We trace out of the first call and into the second. When we hit the same JSOP_TABLESWITCH a second time, we erroneously fail to emit any code.
There are 2 solutions:

1) Easy fix:

Add a number to TraceRecorder counting how many instructions are already recorded in that trace.

Then in the tableswitch it get's checked if the side-exit is CASE_EXIT. If there are no instructions recorded yet then just RECORD_CONTINUE, else write the lir instructions.

2) Harder fix:

When a new trace is started on the side-exit CASE_EXIT, somehow don't start the trace on the switch but on the next instruction.(Advance pc somehow)
I'm willing to solve this bug,
but I have no idea what the best way would be to fix it.
So can one of the TM-guys have a look and tell his findings?
3) Really easy fix:  

Get rid of tableswitch().  If I do that this problem is fixed, as is the one in bug 635657 (which is very likely a dupe of this bug).

Reason to get rid of it:
- Is tableswitch() necessary now that we have JM?  
- It's only implemented on i386.
- This isn't the first bug in it.
- Tracing switch statements feels like a losing proposition.  
- Getting rid of it didn't hurt Sunspider with either '-j -m -p' or just '-j'.

Reasons to keep it:
- Bug 475115, the original implementation, mentions that it helps the JSSpeccy and jsMSX demos, whatever they are.
I'm all for removing switch-support from the tracer. As you said, it only works on one platform. Tableswitch is inlined in JM thanks to Jan de Mooij, and bug 475115 was measured back when the interpreter dominated.

Dave should weigh in too.
(In reply to comment #24)
> 3) Really easy fix:  
> 
> Get rid of tableswitch().  If I do that this problem is fixed, as is the one in
> bug 635657 (which is very likely a dupe of this bug).
> 
> Reason to get rid of it:
> - Is tableswitch() necessary now that we have JM? 

Maybe not; see below.

> - It's only implemented on i386.
> - This isn't the first bug in it.
> - Tracing switch statements feels like a losing proposition.  

I think it's a reasonable thing to do. The bug here seems to relate to the difficulty of communicating information about exactly what state we're in and what we have to compile. Unfortunately, it may be hard to make that code simple without substantial refactoring.

> - Getting rid of it didn't hurt Sunspider with either '-j -m -p' or just '-j'.
> 
> Reasons to keep it:
> - Bug 475115, the original implementation, mentions that it helps the JSSpeccy
> and jsMSX demos, whatever they are.

http://matt.west.co.tt/spectrum/jsspeccy/
http://jsmsx.sourceforge.net/

If those don't regress without tracing tableswitch, I'd be down with removing it.
Here I add some data to make a better decision. I wouldn't make a decision on this data only, but it's a start.

I've run JSNES (is inspired by jsspeccy)

I ran the benchmark code in the JSNES suite in a browser build.
(Build myself from mozilla-central today (--enable-optimize and -j4))

With Tableswitch (clean profile):
Average of 100 runs: 73.97ms, sd: 63.1691
Average of 100 runs: 74.25ms, sd: 63.0075
Average of 100 runs: 73.56ms, sd: 69.18639999999999
Average of 100 runs: 74.40ms, sd: 96.42000000000007

Without Tableswitch (clean profile)::
Average of 100 runs: 74.02ms, sd: 75.81959999999994
Average of 100 runs: 74.13ms, sd: 73.13309999999997

With Tableswitch (clean profile, disabled methodjit):
Run 0, 383ms
Run 1, 390ms
Run 2, 383ms
Run 3, 382ms
Run 4, 386ms
Run 5, 385ms
Run 6, 364ms
Run 7, 319ms
Run 8, 854ms
Run 9, 840ms

Without Tableswitch (clean profile, disabled methodjit):
Run 0, 409ms
Run 1, 409ms
Run 2, 408ms
Run 3, 407ms
Run 4, 404ms
Run 5, 409ms
Run 6, 403ms
Run 7, 410ms
Run 8, 872ms
Run 9, 850ms

I also ran the testcase in bug 475115 in a shell build
(Build myself from tracemonkey that is 1 a 2 weeks old (--disable-optimize and --enable-debug) and adjusted MAX_BRANCHES to 64)

With Tableswitch: -j
47 ms
47 ms
48 ms
47 ms

Without Tableswitch: -j
107 ms
128 ms
126 ms
134 ms

With Tableswitch: -j -m -p
15 ms
15 ms
14 ms
16 ms

Without Tableswitch: -j -m -p
12 ms
11 ms
12 ms
12 ms


So in both cases there is indeed a difference when tableswitch is used or not. At least when only tracemonkey is running. When methodjit is enabled the differences are negligible. In the second case it is even faster. No idea why???
(In reply to comment #27)
> When methodjit is enabled the
> differences are negligible. In the second case it is even faster. No idea
> why???

Hey, thanks for doing those experiments!

My first guess as to why we would be faster w/o tableswitch with -m -j -p is that having tableswitch allows us to trace something that runs faster in the methodjit (which may or may not be the switch itself). That seems like evidence for Nick's claim that "Tracing switch statements feels like a losing proposition" (which I previously questioned).
dmandelin, I looked briefly at jsMSX and JSSpeccy, both were non-obvious how to run... if you're think haytjes' data is sufficient to remove tableswitch I am too! :)
Ok, I also tried JSSpeccy in the browser build I have.
I measured the time it took to run the first 100 frames of wotef.sna

With tableswitch:
Total time of 100 frames: 1656.00ms
Total time of 100 frames: 1685.00ms
Total time of 100 frames: 1682.00ms
Total time of 100 frames: 1672.00ms

With tableswitch - disable methodJIT
Total time of 100 frames: 1673.00ms
Total time of 100 frames: 1668.00ms
Total time of 100 frames: 1634.00ms
Total time of 100 frames: 1642.00ms

Without tableswitch:
Total time of 100 frames: 1622.00ms
Total time of 100 frames: 1626.00ms
Total time of 100 frames: 1592.00ms
Total time of 100 frames: 1631.00ms

Without tableswitch - disable methodJIT
Total time of 100 frames: 1633.00ms
Total time of 100 frames: 1633.00ms
Total time of 100 frames: 1638.00ms
Total time of 100 frames: 1619.00ms

So here I saw the same trend. When I removed the tableswitch JSSpeccy was faster then with the tableswitch. But I do wonder why disabling methodJIT didn't give a speed regression?
(In reply to comment #29)
> dmandelin, I looked briefly at jsMSX and JSSpeccy, both were non-obvious how to
> run... if you're think haytjes' data is sufficient to remove tableswitch I am
> too! :)

We might end up having to reverse that decision after getting some future bug report, but, yes, I think haytjes' data is sufficient to remove it and see.
This patch excises tableswitch() and related pieces.
Attachment #514958 - Flags: review?
Attachment #502831 - Flags: review?(dmandelin)
Attachment #514958 - Flags: review? → review?(dmandelin)
The patch looks good but I'm wondering if we want to abort when we hit a tableswitch.
(In reply to comment #33)
> The patch looks good but I'm wondering if we want to abort when we hit a
> tableswitch.

You mean JSOP_TABLESWITCH, ie. get rid of TraceRecorder::switchop()?  Yeah, I was wondering about that too.  That would also affect JSOP_LOOKUPSWITCH and JSOP_LOOKUPSWITCHX.

Do you think any of this is suitable for 4.0?
(In reply to comment #34)
> Do you think any of this is suitable for 4.0?

The reward/risk payoff doesn't seem extremely high: the perf gains are low, and the actual correctness issue goes back to, probably, 3.5 so it's not a recent regression. I'm only saying this because touching either JIT gives you a 10-yard risk penalty :)
(In reply to comment #34)
> (In reply to comment #33)
> > The patch looks good but I'm wondering if we want to abort when we hit a
> > tableswitch.
> 
> You mean JSOP_TABLESWITCH, ie. get rid of TraceRecorder::switchop()?  Yeah, I
> was wondering about that too.  That would also affect JSOP_LOOKUPSWITCH and
> JSOP_LOOKUPSWITCHX.

No, I just meant aborting on tableswitch only. Anyway, I tried JSNES Super Mario with and without this patch, and it doesn't regress, so maybe that's good enough for now. We should still wait until after 4 to land.
Attachment #514958 - Flags: review?(dmandelin) → review+
Summary: Incorrect calculations when Tracemonkey JIT is enabled → TM: don't trace JSOP_TABLESWITCH
http://hg.mozilla.org/mozilla-central/rev/53a3258235a1
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: