Closed Bug 572947 Opened 15 years ago Closed 14 years ago

TM: sub-optimal trace generation due to type instability caused by LIR and the recorder having differing views on integer array accesses

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: n.nethercote, Unassigned)

Details

Attachments

(1 file)

Consider this test case: var a = [1]; var j; for (var i = 0; i < 10; ++i) { j = a[0]; } The TMFLAGS=recorder output is attached. Two fragments get generated -- the first, which is type-unstable, and the second, which rectifies the problem. Specifically, on entry TM thinks that 'j' is an int, and on exit it thinks it's a double. Why is this? The recorder sees the actual value at runtime, which is the integer 1, and says "aha, it's an integer". But then in LIR we always read numbers from arrays as doubles, so we end up determining that it's a double. I think this causes a significant slowdown for fannkuch, possibly up to 20%. I say this because when specialising arrays-of-ints (which avoids this problem) I get a 1.4x speedups, but when specialising arrays-of-doubles (which hits this problem, but avoids unboxing) I only get about 1.2x. I think this problem will account for most of the difference. It looks like several other SS tests might have the same problem to some degree. Gal also said "to make matters worse, we don't seem to tell the oracle if we make a slot into a double". I think the way to fix this is to change CaptureTypesVisitor::visitStackSlots() to look at the tracker to see what LIns* is responsible, and if it's a double-producing instruction like "callf js_UnboxDouble" then it should mark it as TT_DOUBLE. AdjustCallerStackTypesVisitor might also be involved.
The same problem probably affects globals, too.
Nb: I hit this same kind of problem (recorder and LIR having different views of the type of something) in a completely different context yesterday while working on a patch for bug 566767. Having hit it twice in two days has spooked me, I wonder how many other places are affected.
I've been looking at this case: function f() { var a = [1]; var j; for (var i = 0; i < 10; ++i) { j = a[0]; } } f(); which is like the one in comment 0 but the variables are locals instead of globals; the case with globals seems to be handled differently internally, in a manner I found harder to understand. AFAICT, CaptureTypesVisitor::visitStackSlots() is the culprit. It calls getCoercedType() on each stack slot. getCoercedType() just looks at the jsval to determine the TraceType. It seems like determineSlotType() would be more appropriate, as it looks at both the jsval and the LIns. (Indeed, getCoercedType() and determineSlotType() differ only in their handling of numbers.) However, using determineSlotType() isn't possible as the relevant call to captureTypes() occurs within TreeFragment::initialize(), which is itself called within RecordTree() shortly before the TraceRecorder is initialized. And determineSlotType() needs TraceRecorder to be initialized because it accesses the Tracker within TraceRecorder. I tried forcing CaptureTypesVisitor::visitStackSlots() to return TT_DOUBLE in preference to TT_INT and the example above compiled in a single fragment, ie. the problem was fixed (albeit in a manner not appropriate for general use). So my diagnosis seems to be correct.
You might want to look at this stuff in the fatval branch too, as we have changed a bit the way tags are handled and values are typed and unboxed. In that branch, array values are still always read out as doubles, though, because doing so made it faster.
"because doing so made it faster" => for some test cases (3d-raytrace, in particular). It also made crypto stuff slower. But the raytrace win outweighs the smaller wins.
Attached file generated LIR
Here's an idea: when type instability occurs, throw the crappy fragment we just generated away and start again with the correct types. This is in contrast with keeping the crappy fragment and then starting again with the correct types. I see several advantages: (a) it'll decrease compile-time, because we won't have to generate native code for the discarded fragment, (b) we can deallocate the crappy fragment, (c) we'll never run the crappy fragments, and (d) we'll never transition between the crappy and good fragments. Seems too easy. What am I missing?
Good idea, definitely worth a shot.
(In reply to comment #7) > Good idea, definitely worth a shot. I just had a look at closeLoop(), but I have don't have the first idea about how the relevant code works. Any suggestions would be welcome.
Type inference (bug 557407) seems like it would help this bug a lot!
Current js shell results for the testcase in comment #0: Interp: 0.867 ms -j: 0.830 ms -m: 0.856 ms -m -n: 0.681 ms Current js shell results for the testcase in comment #3: Interp: 0.679 ms -j: 0.691 ms -m: 0.661 ms -m -n: 0.915 ms Not sure what to make of these results.
> Not sure what to make of these results. The test is far too short-running for the measurements to be meaningful :) I tried increasing the loop counter in comment 0 to 10,000,000: interp: 17.4s -j: 17.3s -m: 0.09s -m -n: 0.03s Looks good to me, and TM is dead anyway :)
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: