Closed Bug 596375 Opened 15 years ago Closed 14 years ago

Possible nondeterminism in JIT speed

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect)

x86
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX
Future

People

(Reporter: stejohns, Unassigned)

Details

While working on unrelated code (and running the perf test suite), I noticed an odd anomaly: test/performance/asmicro/closedvar-write-1.as was showing large performance changes from minor code changes. Specifically, I found that making minor changes to the AS3 code's epilogue (e.g., adding an extra print() statement at the end) could change the performance by as much as 25%. The change is MacTel32 only (MacTel64 seems unaffected), Release only (didn't test other builds), JIT only. Running with -Dverbose=jit seems to eradicate the difference. My suspicion is that there's some sort of nondeterminism going on in the JIT (code or data alignment? minor cache effects?) that probably bears further investigation.
I've seen pathological code placement effects in the microbenchmarks previously. See this comment and the ones immediately preceding/following: https://bugzilla.mozilla.org/show_bug.cgi?id=565489#c36 The following may also be a code placement artifact: https://bugzilla.mozilla.org/show_bug.cgi?id=576082
Further investigation showed that the "slow" cases all occurred when the inner loop of the benchmark began at an odd address; padding this to an even address (via NOP insertion) heals the cases I've found locally. I'm at a loss to explain this, as I'm not aware of this sort of alignment being an issue on modern x86-32 systems...
The modern x86 tuning guides recommend aligning call/jump targets on 16-byte boundaries, but there is nuance to it. I suggest reading the manual to decide whether what you're seeing matches one of the situations the tuning guides address. (The microarchitecture also matters: Atom vs. Core 2 vs. Nehalem, the last two being the latest mainstream revisions.)
Padding loop jump targets to 16-byte boundaries actually slowed it dramatically (so did 8-byte padding). 4-byte padding was basically the same as 2-byte padding. Something is odd here, just not sure what yet.
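For reference, the number of padding bytes each of the alignments tried above (2, 4, 8, 16) would insert can be worked out as in the sketch below. This is only an illustration, not Tamarin or nanojit code: `nop_padding` is a hypothetical helper, the loop-head address is made up, and one-byte NOPs are assumed.

```python
def nop_padding(addr, alignment):
    """Return how many one-byte NOPs are needed to move addr up to the
    next multiple of alignment (which must be a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # -addr modulo alignment, computed with a mask.
    return -addr & (alignment - 1)


# Hypothetical loop-head address that happens to be odd.
loop_head = 0x10000A3D

for alignment in (2, 4, 8, 16):
    pad = nop_padding(loop_head, alignment)
    print(f"align {alignment:2d}: {pad:2d} padding byte(s) -> {loop_head + pad:#x}")
```

Wider alignments can insert more padding and shift everything emitted after the loop head, so the placement of later code changes as well. That might be one reason the results do not improve monotonically with alignment, though nothing here confirms it.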
Adding to the weirdness: the state I was in that was 100% reproducible has apparently self-healed. Closing until/unless I find another reproducible case.
OS: Mac OS X → Windows 7
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
Re-opening: as crazy as it seems, the length of the command line seems to be a factor... running a relative performance test between two builds on my system, I found one test (asmicro/globalvar-write-1.as) varying by up to 50% (but the changes should not have affected this test)... renaming the two test directories to be identical in length erased the difference. On another occasion, adding "-Dnodebugger" to the command line made a 300% change in performance (on another microbenchmark)... in a *Release* build, which ignores this flag. Perhaps there's a sensitivity to the initial ESP value? Or a reliance on an uninitialized stack frame area?
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
(In reply to comment #6)
> Re-opening: as crazy as it seems, the length of the command-line seems to be a
> factor...

I have had the same experience and have also started to pay attention to it. ESP alignment is the most likely explanation, probably (alignment in a cache line? on a page?). I figure GC could matter since System.args returns the command line arguments so those are probably allocated in strings somewhere; heap dynamics are somewhat chaotic so different early allocations could have larger effects later. It's not a great explanation though.
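One way to picture the ESP hypothesis: on Unix-like systems such as Mac OS X, the argument strings are copied to the top of the initial stack, so a longer command line pushes the first stack frame to a lower address. The sketch below is a toy model under that assumption, not a measurement. The stack-top address, the 16-byte alignment, and the 64-byte cache line are assumed values, `initial_sp` is a hypothetical helper, and the command lines are illustrative. Environment strings, which sit in the same area, are ignored.

```python
STACK_TOP = 0xC0000000   # assumed top-of-stack address (illustrative only)
CACHE_LINE = 64          # assumed cache line size in bytes
STACK_ALIGN = 16         # assumed initial stack alignment


def initial_sp(argv):
    """Model the initial stack pointer after the NUL-terminated argument
    strings have been placed at the top of the stack."""
    strings_size = sum(len(arg.encode()) + 1 for arg in argv)
    # Round down to the assumed alignment boundary.
    return (STACK_TOP - strings_size) & ~(STACK_ALIGN - 1)


base = ["avmshell", "closedvar-write-1.abc"]  # illustrative command line

for extra in ([], ["-Dnodebugger"]):
    argv = base + extra
    sp = initial_sp(argv)
    print(f"{' '.join(argv)}: sp={sp:#x}, offset in cache line={sp % CACHE_LINE}")
```

In this model, a flag that the build ignores still moves every stack frame relative to cache-line and page boundaries. That would fit the observation that making the two test directory names the same length erased the difference.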
Marking as RESOLVED/WONTFIX as this appears as a FOL.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 14 years ago
Flags: flashplayer-qrb+
Flags: flashplayer-injection-
Flags: flashplayer-bug-
Resolution: --- → WONTFIX
Target Milestone: --- → Future