Closed Bug 638506 Opened 13 years ago Closed 11 years ago

JM: Use ebx as a general purpose register

Tracking

()

Status:

RESOLVED WONTFIX

People

(Reporter: bhackett1024, Assigned: bhackett1024)

Details

Attachments

(1 file, 1 obsolete file)

WIP 13 years ago Brian Hackett [Laid off!] 9.13 KB, patch		Details \| Diff \| Splinter Review
x86 patch 13 years ago Brian Hackett [Laid off!] 11.73 KB, patch		Details \| Diff \| Splinter Review

Brian Hackett [Laid off!]

Assignee

Description

•

13 years ago

On x86, JM can use 5 out of the 8 general purpose registers.  esp homes the VMFrame, ebx homes the JSStackFrame, and ebp doesn't store anything (it may be used by Breakpad).

Instead, it seems that ebp could store the JSStackFrame and ebx could be freed up for general use.  Having another GPR would be tremendously beneficial for register allocation (x86 is under incredible register pressure).

Luke Wagner [:luke]

Comment 1

•

13 years ago

We keep considering this and holding off since taking away the register does destroy gdb's/shark's ability to bt through mjit code.  But we are in the perf business, so it sounds reasonable.  Perhaps we could have a compiler flag that kept ebp as is, or would that add too much #ifdef'y madness?  Alternatively, sfink has been talking about doing something to make gdb work on x86-64 and perhaps that would also help here.  Steve?

Steve Fink [:sfink] [:s:]

Comment 2

•

13 years ago

Steve should stop talking about doing something and get around to actually doing it, because he's not entirely confident of what he's talking about yet.

In the meantime, would ebp only be repurposed during the execution of mjitted code? Or worded differently: what would be stored in the stack when making a stub call? The JSStackFrame ebp or the original ebp? If we could arrange for stub calls to store the original ebp, then we'd only lose bt when crashing directly in mjitted code. It's callee-saved, so I think that just means restoring ebp before making a stub call. Would that be a reasonable halfway solution?

Brian Hackett [Laid off!]

Assignee

Comment 3

•

13 years ago

Restoring ebp before making a stub or native call should be fine.

Boris Zbarsky [:bzbarsky]

Comment 4

•

13 years ago

I'd really appreciate it if stacks for profilers would work at least when --enable-profiling.  But it sounds like restoring ebp before calling out of mjit-generated code would get us that.

Brian Hackett [Laid off!]

Assignee

Comment 5

•

13 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

Works on x86, still need to do x64.  x64 could continue to use ebx as the JSStackFrame register, but code fox x64 parallels x86 so it seems simpler to keep the two consistent.

This restores ebp before making a VM or native call.  This is a register move/add beforehand since ebp is a fixed offset from esp (basically free), and a load afterwards which we have to do anyways for JM+TI (calling into the VM can expand inlined frames, so the restored callee-saved ebx may have been invalid anyways).

gdb can unwind the entire stack if I break in a VM or native call made from the JIT code.  It can't unwind if I break directly in JIT code, but I gather from comment 2 and comment 4 that this behavior is acceptable.

Assignee: general → bhackett1024

Boris Zbarsky [:bzbarsky]

Comment 6

•

13 years ago

Hmm.  It sounds like that behavior would mean that a profiler can't figure out that jitcode was called from JaegerShot, right?

Brian Hackett [Laid off!]

Assignee

Comment 7

•

13 years ago

Probably.  JIT code is *only* called from JaegerShot though; what benefits do we get from knowing where it is called from?  Since debuggers and profilers can't see into the JIT code at all, I figured time in JIT code would just go into a big 'unknown' bucket.

Brian Hackett [Laid off!]

Assignee

Comment 8

•

13 years ago

Attached patch x86 patch — Details — Splinter Review

Patch landed on JM.  Decided to do this x86-only, there's not too much that had to change and given our current lack of evidence of x64 register pressure I'd like to avoid the code churn on the five trampolines we have for calling into x64.  Could be expanded to x64 in the future without much difficulty.

http://hg.mozilla.org/projects/jaegermonkey/rev/ff1ae67e4986

Attachment #525698 - Attachment is obsolete: true

Boris Zbarsky [:bzbarsky]

Comment 9

•

13 years ago

Brian, have you actually tested whether profiling still works with that fix on Linux or Windows (Mac is mostly not affected, since we only run JS in 64-bit processes there)?

Boris Zbarsky [:bzbarsky]

Comment 10

•

13 years ago

> JIT code is *only* called from JaegerShot though;

Yes, but there are often multiple _instances_ of JaegerShot in the profile tree.

> what benefits do we get from knowing where it is called from?

The ability to have readable profiles, mostly.

> I figured time in JIT code would just go into a big 'unknown' bucket.

Right now, I can charge that "unknown" bucket to caller in Shark, and then it looks like whatever C++ code was called from the jit is called directly from JaegerShot and the jit time is all collected under JaegerShot.  It sounds like with the new setup there will be no way to get a profiler to coalesce the thousands of "unknown" fragments generated by the JIT into a single entity, so we won't have a good way of measuring how much time is spent in the jitted code...  Maybe that's ok.

Luke Wagner [:luke]

Comment 11

•

13 years ago

For fun I checked to see what the speedup was on AWFY and, for ff1ae67e4986, there doesn't appear to be any change on SS/V8/Kraken...

Brian Hackett [Laid off!]

Assignee

Comment 12

•

13 years ago

I've only tested that this doesn't crash things.  We're starting to benchmark-tune the TI stuff and I'd like to have 6 registers available on x86 for tuning the regalloc.  This is the fastest way to do that.

If there are compatibility issues with repointing ebp, it would be good I think to know how important the sixth register is before deciding the best way to address those issues.  The patch is simple and it would be easy to either backout or control using #ifdefs.

I suspect after tuning the sixth register will turn out to make a pretty big difference, but if not then we can back it out and go work on other stuff, without having invested time in making sure it works with shark, breakpad and related tools.

Boris Zbarsky [:bzbarsky]

Comment 13

•

13 years ago

If we can control using ifdefs, that's good enough; we can define it differently based on the omit-frame-pointer settings when compiling C++.  As long as we preserve ebx when C++ is keeping frame pointers, things are fine.

Luke Wagner [:luke]

Comment 14

•

13 years ago

I'm still surprised that adding an extra register to the pool of registers doesn't give any initial speedup.  Is it being used?

Brian Hackett [Laid off!]

Assignee

Comment 15

•

13 years ago

Yeah, some benchmarks sped up while a few slowed down.  That could be perf bustage in the regalloc, regardless we should be able to sort things with benchmark tuning.  Though having perf microbenchmarks on awfy bug 649487 would be very helpful!

Jan de Mooij [:jandem]

Comment 16

•

11 years ago

JM is gone and Ion regalloc can use all registers except esp.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → WONTFIX

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

JM: Use ebx as a general purpose register

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file, 1 obsolete file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Comment 16

Attachment

General

Description

File Name

Content Type