Speed up branch exits into trace fragments

Status: RESOLVED WONTFIX
Type: defect
Opened: 10 years ago
Last modified: 8 years ago

People

(Reporter: bzbarsky, Unassigned)

Tracking

(Blocks 1 bug)

Version: Trunk
Platform: x86 macOS

Description
The testcase in bug 508849 shows us doing lots of branch exiting to other trace fragments, due to the branchy JS.  If I understand right, right now each branch exit involves a spill of everything to the interpreter stack.  It would be really nice to not have to do that (and ideally to have such a branch exit be patched to just a single jump instruction).  jimb thinks we'd need to keep more state about what the LIR looked like at the exit point in the branch exit, but it might be worth it.

Of course the other thing that would be worth it is somehow determining how much time the branch exit spills are actually taking in this case....
Updated (Reporter, 10 years ago)

Blocks: 508849

Comment 1
I think you could get a reasonable estimate by counting the number of instructions per trace switch and the number of trace switches, and comparing to the total number of dynamic instructions executed.

But I don't think we restore the interpreter state for a trace->trace control transfer. Mainly, we do the equivalent of returning from a function and calling into a new one, except that there is a jump in place of a return/call.
Comment 2

Ah, OK. I'd been told that the patched branch exit thing was slow due to state restoration. If that's not the case, then we need to look elsewhere for the overhead in bug 508849.
Comment 3

This effect is very noticeable on http://jsperf.com/test-trace-branching-overhead, for what it's worth...

Comment 4

9 years ago
This has since been fixed in jsPerf, so I thought I’d post some more details here for future reference.

Because of this issue, this sort of function will run slightly slower for each new value of `x`:

    function f(x) {
      while (/* condition */) {
        x();
      }
    }

This is because it compiles a trace for the `x` that's passed in the first time; then for a second value it has to check against the first thing that got passed in and jump to a new trace, and so on. It's a pretty small effect, _unless_ `x()` runs very fast; in that case, the cost of the check can completely dominate the actual runtime of `x()`.

The upshot is that for very short-running testcases, the scores with TraceMonkey will depend on the order of the testcases. For example, you could have a test case where you'd benchmark the exact same code snippet three times, e.g. http://jsperf.com/test-trace-branching-overhead. The first test will appear to be super fast; the second one will be slower; and the third one will be the slowest, even though you're testing the exact same code! (To give you an idea: before we patched this in Benchmark.js, the aforementioned testcase would yield the following results: 2,000,000,000 / 227,000,000 / 125,000,000.)

One way to work around this issue is to just add a property with a new name to the `window` object in between tests.

    // Global scope
    var i = 0;
    
    // Before running each test
    window['dirty-hack' + i++] = 1;

This will make TraceMonkey throw away anything it’s compiled up to that point, so for each test it compiles anew.

It works, but it’s still kind of a dirty hack. Another way to work around this is to not use loops at all (loop unrolling).
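A minimal sketch of that unrolling workaround (a hypothetical snippet, not the actual Benchmark.js code): with no loop, the tracer never gets a hot loop to specialize, so there is no trace to guard and test order stops mattering.

```javascript
// Looped version (gives the tracer a hot loop to specialize):
//   for (var i = 0; i < 4; i++) { counter++; }
// Unrolled version: repeat the body inline instead.
var counter = 0;
counter++;
counter++;
counter++;
counter++;
console.log(counter); // 4
```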

Comment 5

8 years ago
Another test case where this effect is very noticeable: http://jsperf.com/empty-tests

Comment 6

8 years ago
Disregard my previous comment #5; Benchmark.js now works around the issue.
Comment 7

Obsolete with the demise of TM (TraceMonkey)?

Updated

8 years ago
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → WONTFIX