Closed
Bug 638506
Opened 13 years ago
Closed 11 years ago
JM: Use ebx as a general purpose register
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: bhackett1024, Assigned: bhackett1024)
Details
Attachments
(1 file, 1 obsolete file)
11.73 KB,
patch
|
Details | Diff | Splinter Review |
On x86, JM can use 5 out of the 8 general purpose registers. esp homes the VMFrame, ebx homes the JSStackFrame, and ebp doesn't store anything (it may be used by Breakpad). Instead, it seems that ebp could store the JSStackFrame and ebx could be freed up for general use. Having another GPR would be tremendously beneficial for register allocation (x86 is under incredible register pressure).
Comment 1•13 years ago
|
||
We keep considering this and holding off since taking away the register does destroy gdb's/shark's ability to bt through mjit code. But we are in the perf business, so it sounds reasonable. Perhaps we could have a compiler flag that kept ebp as is, or would that add too much #ifdef'y madness? Alternatively, sfink has been talking about doing something to make gdb work on x86-64 and perhaps that would also help here. Steve?
Comment 2•13 years ago
|
||
Steve should stop talking about doing something and get around to actually doing it, because he's not entirely confident of what he's talking about yet. In the meantime, would ebp only be repurposed during the execution of mjitted code? Or worded differently: what would be stored in the stack when making a stub call? The JSStackFrame ebp or the original ebp? If we could arrange for stub calls to store the original ebp, then we'd only lose bt when crashing directly in mjitted code. It's callee-saved, so I think that just means restoring ebp before making a stub call. Would that be a reasonable halfway solution?
Assignee | ||
Comment 3•13 years ago
|
||
Restoring ebp before making a stub or native call should be fine.
Comment 4•13 years ago
|
||
I'd really appreciate it if stacks for profilers would work at least when --enable-profiling. But it sounds like restoring ebp before calling out of mjit-generated code would get us that.
Assignee | ||
Comment 5•13 years ago
|
||
Works on x86, still need to do x64. x64 could continue to use ebx as the JSStackFrame register, but code fox x64 parallels x86 so it seems simpler to keep the two consistent. This restores ebp before making a VM or native call. This is a register move/add beforehand since ebp is a fixed offset from esp (basically free), and a load afterwards which we have to do anyways for JM+TI (calling into the VM can expand inlined frames, so the restored callee-saved ebx may have been invalid anyways). gdb can unwind the entire stack if I break in a VM or native call made from the JIT code. It can't unwind if I break directly in JIT code, but I gather from comment 2 and comment 4 that this behavior is acceptable.
Assignee: general → bhackett1024
Comment 6•13 years ago
|
||
Hmm. It sounds like that behavior would mean that a profiler can't figure out that jitcode was called from JaegerShot, right?
Assignee | ||
Comment 7•13 years ago
|
||
Probably. JIT code is *only* called from JaegerShot though; what benefits do we get from knowing where it is called from? Since debuggers and profilers can't see into the JIT code at all, I figured time in JIT code would just go into a big 'unknown' bucket.
Assignee | ||
Comment 8•13 years ago
|
||
Patch landed on JM. Decided to do this x86-only, there's not too much that had to change and given our current lack of evidence of x64 register pressure I'd like to avoid the code churn on the five trampolines we have for calling into x64. Could be expanded to x64 in the future without much difficulty. http://hg.mozilla.org/projects/jaegermonkey/rev/ff1ae67e4986
Attachment #525698 -
Attachment is obsolete: true
Comment 9•13 years ago
|
||
Brian, have you actually tested whether profiling still works with that fix on Linux or Windows (Mac is mostly not affected, since we only run JS in 64-bit processes there)?
Comment 10•13 years ago
|
||
> JIT code is *only* called from JaegerShot though; Yes, but there are often multiple _instances_ of JaegerShot in the profile tree. > what benefits do we get from knowing where it is called from? The ability to have readable profiles, mostly. > I figured time in JIT code would just go into a big 'unknown' bucket. Right now, I can charge that "unknown" bucket to caller in Shark, and then it looks like whatever C++ code was called from the jit is called directly from JaegerShot and the jit time is all collected under JaegerShot. It sounds like with the new setup there will be no way to get a profiler to coalesce the thousands of "unknown" fragments generated by the JIT into a single entity, so we won't have a good way of measuring how much time is spent in the jitted code... Maybe that's ok.
Comment 11•13 years ago
|
||
For fun I checked to see what the speedup was on AWFY and, for ff1ae67e4986, there doesn't appear to be any change on SS/V8/Kraken...
Assignee | ||
Comment 12•13 years ago
|
||
I've only tested that this doesn't crash things. We're starting to benchmark-tune the TI stuff and I'd like to have 6 registers available on x86 for tuning the regalloc. This is the fastest way to do that. If there are compatibility issues with repointing ebp, it would be good I think to know how important the sixth register is before deciding the best way to address those issues. The patch is simple and it would be easy to either backout or control using #ifdefs. I suspect after tuning the sixth register will turn out to make a pretty big difference, but if not then we can back it out and go work on other stuff, without having invested time in making sure it works with shark, breakpad and related tools.
Comment 13•13 years ago
|
||
If we can control using ifdefs, that's good enough; we can define it differently based on the omit-frame-pointer settings when compiling C++. As long as we preserve ebx when C++ is keeping frame pointers, things are fine.
Comment 14•13 years ago
|
||
I'm still surprised that adding an extra register to the pool of registers doesn't give any initial speedup. Is it being used?
Assignee | ||
Comment 15•13 years ago
|
||
Yeah, some benchmarks sped up while a few slowed down. That could be perf bustage in the regalloc, regardless we should be able to sort things with benchmark tuning. Though having perf microbenchmarks on awfy bug 649487 would be very helpful!
Comment 16•11 years ago
|
||
JM is gone and Ion regalloc can use all registers except esp.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•