Closed Bug 638506 Opened 10 years ago Closed 8 years ago
JM: Use ebx as a general purpose register
On x86, JM can use 5 out of the 8 general purpose registers. esp homes the VMFrame, ebx homes the JSStackFrame, and ebp doesn't store anything (it may be used by Breakpad). Instead, it seems that ebp could store the JSStackFrame and ebx could be freed up for general use. Having another GPR would be tremendously beneficial for register allocation (x86 is under incredible register pressure).
We keep considering this and holding off since taking away the register does destroy gdb's/shark's ability to bt through mjit code. But we are in the perf business, so it sounds reasonable. Perhaps we could have a compiler flag that kept ebp as is, or would that add too much #ifdef'y madness? Alternatively, sfink has been talking about doing something to make gdb work on x86-64 and perhaps that would also help here. Steve?
Steve should stop talking about doing something and get around to actually doing it, because he's not entirely confident of what he's talking about yet. In the meantime, would ebp only be repurposed during the execution of mjitted code? Or worded differently: what would be stored in the stack when making a stub call? The JSStackFrame ebp or the original ebp? If we could arrange for stub calls to store the original ebp, then we'd only lose bt when crashing directly in mjitted code. It's callee-saved, so I think that just means restoring ebp before making a stub call. Would that be a reasonable halfway solution?
Restoring ebp before making a stub or native call should be fine.
I'd really appreciate it if stacks for profilers would work at least when --enable-profiling. But it sounds like restoring ebp before calling out of mjit-generated code would get us that.
Works on x86, still need to do x64. x64 could continue to use ebx as the JSStackFrame register, but code fox x64 parallels x86 so it seems simpler to keep the two consistent. This restores ebp before making a VM or native call. This is a register move/add beforehand since ebp is a fixed offset from esp (basically free), and a load afterwards which we have to do anyways for JM+TI (calling into the VM can expand inlined frames, so the restored callee-saved ebx may have been invalid anyways). gdb can unwind the entire stack if I break in a VM or native call made from the JIT code. It can't unwind if I break directly in JIT code, but I gather from comment 2 and comment 4 that this behavior is acceptable.
Assignee: general → bhackett1024
Hmm. It sounds like that behavior would mean that a profiler can't figure out that jitcode was called from JaegerShot, right?
Probably. JIT code is *only* called from JaegerShot though; what benefits do we get from knowing where it is called from? Since debuggers and profilers can't see into the JIT code at all, I figured time in JIT code would just go into a big 'unknown' bucket.
Patch landed on JM. Decided to do this x86-only, there's not too much that had to change and given our current lack of evidence of x64 register pressure I'd like to avoid the code churn on the five trampolines we have for calling into x64. Could be expanded to x64 in the future without much difficulty. http://hg.mozilla.org/projects/jaegermonkey/rev/ff1ae67e4986
Attachment #525698 - Attachment is obsolete: true
Brian, have you actually tested whether profiling still works with that fix on Linux or Windows (Mac is mostly not affected, since we only run JS in 64-bit processes there)?
> JIT code is *only* called from JaegerShot though; Yes, but there are often multiple _instances_ of JaegerShot in the profile tree. > what benefits do we get from knowing where it is called from? The ability to have readable profiles, mostly. > I figured time in JIT code would just go into a big 'unknown' bucket. Right now, I can charge that "unknown" bucket to caller in Shark, and then it looks like whatever C++ code was called from the jit is called directly from JaegerShot and the jit time is all collected under JaegerShot. It sounds like with the new setup there will be no way to get a profiler to coalesce the thousands of "unknown" fragments generated by the JIT into a single entity, so we won't have a good way of measuring how much time is spent in the jitted code... Maybe that's ok.
For fun I checked to see what the speedup was on AWFY and, for ff1ae67e4986, there doesn't appear to be any change on SS/V8/Kraken...
I've only tested that this doesn't crash things. We're starting to benchmark-tune the TI stuff and I'd like to have 6 registers available on x86 for tuning the regalloc. This is the fastest way to do that. If there are compatibility issues with repointing ebp, it would be good I think to know how important the sixth register is before deciding the best way to address those issues. The patch is simple and it would be easy to either backout or control using #ifdefs. I suspect after tuning the sixth register will turn out to make a pretty big difference, but if not then we can back it out and go work on other stuff, without having invested time in making sure it works with shark, breakpad and related tools.
If we can control using ifdefs, that's good enough; we can define it differently based on the omit-frame-pointer settings when compiling C++. As long as we preserve ebx when C++ is keeping frame pointers, things are fine.
I'm still surprised that adding an extra register to the pool of registers doesn't give any initial speedup. Is it being used?
Yeah, some benchmarks sped up while a few slowed down. That could be perf bustage in the regalloc, regardless we should be able to sort things with benchmark tuning. Though having perf microbenchmarks on awfy bug 649487 would be very helpful!
JM is gone and Ion regalloc can use all registers except esp.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.