preserve frame pointer in all JIT code
Categories
(Core :: JavaScript Engine: JIT, defect, P3)
Tracking
()
People
(Reporter: luke, Assigned: anba)
References
Details
Attachments
(7 files)
13.12 KB,
patch
|
Details | Diff | Splinter Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
If we could preserve the frame pointer in all JIT code and use a standard prologue/epilogue, I think there would be a lot of benefits: - With preserved fp, it's pretty easy to unwind JIT frames because you don't have to know the sp->fp offset at every pc. Thus, it would be easy to: . register unwind info with Windows (RtlAddFunctionTable et al) so that various forms of native unwinding Just Worked . register DWARF info so gdb and other tools Just Worked - Having 'fp' simplifies codegen and stack iteration and can reduce metadata size of bailouts, etc - Maybe we could remove, or at least simplify, the JIT profiling mode. FWIW, wasm/asm.js used to not preserve fp and go through a bunch of contortions to maintain fp during profiling mode. Later, we needed the ability to walk the stack at any pc (to handle GC during async interrupt) so we switched wasm to always preserve fp so we could always find the current instance object (via fp). This was a major simplification and removed profiling mode altogether. The concern of course is register-starved x86; 6 vs 7 GPRs is a big difference. However, 64-bit is now the majority of FF users: https://hardware.metrics.mozilla.com/#detail-browser-share-32-64 and would likely be the browser of anyone doing serious benchmarking.
> The concern of course is register-starved x86; 6 vs 7 GPRs is a big difference. FWIW, this didn't stop us from preserving frame pointers for CPP in bug 1322735.
Comment 2•7 years ago
|
||
We should check how much this actually affects Ion perf on 32-bit and 64-bit. Both JS shell benchmarks like SunSpider/Kraken/Octane (especially Kraken and Octane-crypto might be affected) and Talos stuff like Speedometer/Dromaeo. We've had regalloc issues that only show up with profiling enabled (due to FP being reserved only in that case). That's not great and I'm in favor of simplifying the current situation. If the hit on 32-bit x86 is <= 5% or so I'd be willing to take it. If it's more we should be more careful.
Updated•7 years ago
|
Comment 3•7 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #0) > . register DWARF info so gdb and other tools Just Worked This change would make it far simpler to enable the existing unwinder by default. https://dxr.mozilla.org/mozilla-central/rev/1624b88874765bf57e9feba176d30149c748d9d2/js/src/gdb/mozilla/unwind.py#1 Currently it is off because it's difficult to unwind when stopped at an arbitrary spot in JIT code. DWARF emission wouldn't even have to happen for this to be useful.
![]() |
Reporter | |
Comment 4•7 years ago
|
||
Another random benefit of being able to unwind at an arbitrary point in JIT code that I saw today is that, when we call RedirectJitCodeToInterruptCheck(), we'd be able to walk the stack and only patch/redirect the JIT code for functions on the stack instead of all the JIT code in the JitRuntime.
Comment 5•6 years ago
|
||
I attempted to measure the performance impact of preserving %rbp by adding it to the NonAllocatableRegs mask. The register allocator does correctly respect that mask. Simple change to make, but building all the variants takes quite a while! Those builds produced the following results on Octane: 32-bit, with ebp. > Richards: 33650 > DeltaBlue: 50137 > Crypto: 25974 > RayTrace: 116992 > EarleyBoyer: 34948 > RegExp: 4555 > Splay: 18503 > SplayLatency: 4345 > NavierStokes: 30361 > PdfJS: 15532 > Mandreel: 22724 > MandreelLatency: 24802 > Gameboy: 63029 > CodeLoad: 18604 > Box2D: 47968 > zlib: 57163 > Typescript: 14681 > ---- > Score (version 9): 25515 32-bit, patched without ebp. > Richards: 33911 > DeltaBlue: 50474 > Crypto: 26092 > RayTrace: 116326 > EarleyBoyer: 34089 > Segmentation fault (core dumped) 64-bit, with rbp. > Richards: 29541 > DeltaBlue: 48722 > Crypto: 28312 > RayTrace: 110776 > EarleyBoyer: 28612 > RegExp: 4376 > Splay: 20014 > SplayLatency: 23000 > NavierStokes: 29680 > PdfJS: 17694 > Mandreel: 22790 > MandreelLatency: 29674 > Gameboy: 69180 > CodeLoad: 22275 > Box2D: 48063 > zlib: 52394 > Typescript: 14715 > ---- > Score (version 9): 28461 64-bit, patched without rbp. > Richards: 29343 > DeltaBlue: 52108 > Crypto: 29160 > RayTrace: 108038 > EarleyBoyer: 29225 > RegExp: 4359 > Splay: 18784 > SplayLatency: 21817 > NavierStokes: 30243 > PdfJS: 18677 > Mandreel: 22856 > MandreelLatency: 29153 > Gameboy: 66618 > CodeLoad: 22526 > Box2D: 48111 > zlib: 71744 > Typescript: 13220 > ---- > Score (version 9): 28808 The crash on x86 is from MacroAssembler::generateBailoutTail() assuming that ebp/rbp is in the AllocatableMask.
Comment 6•6 years ago
|
||
I have x86 mostly working at this point. The goal was to just get a patch to a point where we can evaluate performance to see whether this route is viable. Octane runs with the exception of PdfJS -- there are many CacheIR functions that assert that more registers are available than x86 would provide after this patchset. Also, although the RegExp code runs, it currently clobbers %ebp. Results from Octane runs without/with ebp as an allocatable register are below (minus PdfJS): > [sstangl@mazu octane]$ ~/dev/gecko-dev/js/src/opt32/js/src/js run.js > Richards: 29964 > DeltaBlue: 47975 > Crypto: 25462 > RayTrace: 112700 > EarleyBoyer: 31753 > RegExp: 4541 > Splay: 18499 > SplayLatency: 4506 > NavierStokes: 29532 > Mandreel: 20058 > MandreelLatency: 22973 > Gameboy: 39159 > CodeLoad: 19003 > Box2D: 44460 > zlib: 45256 > Typescript: 15206 > ---- > Score (version 9): 24359 > [sstangl@mazu octane]$ ~/dev/gecko-dev/js/src/opt32-master/js/src/js run.js > Richards: 33191 > DeltaBlue: 50792 > Crypto: 26462 > RayTrace: 114550 > EarleyBoyer: 34773 > RegExp: 4658 > Splay: 18384 > SplayLatency: 4308 > NavierStokes: 29917 > Mandreel: 23148 > MandreelLatency: 24437 > Gameboy: 51345 > CodeLoad: 19245 > Box2D: 46722 > zlib: 50268 > Typescript: 16109 > ---- > Score (version 9): 25919 Jan, what do you think?
Comment 7•6 years ago
|
||
Comment 8•6 years ago
|
||
(In reply to Sean Stangl [:sstangl] from comment #6) > Octane runs with the exception of PdfJS -- there are many CacheIR functions > that assert that more registers are available than x86 would provide after > this patchset. Is this in Baseline or Ion IC code? It's hard enough to make IC regalloc work on x86 so taking away ebp could be a problem... Is it okay to clobber ebp temporarily *before* we make JS/C++ calls? The CacheIR code should ensure registers are restored to their original values whenever we make such calls (because that's necessary for GC marking etc). I guess that would break unwinding if we want to make that fp-based. > Results from Octane runs without/with ebp as an allocatable register are > below (minus PdfJS): > > Jan, what do you think? So on 32-bit a 6% Octane regression, 10% for the asm.js zlib test, right? What do you see on Kraken (many tight loops)? On 64-bit zlib (asm.js) improves from 52394 to 71744 points - that's a really big difference, is it reproducible? It's a noticeable perf regression on 32-bit so I'm not thrilled, but OTOH most of our power users moved to 64-bit anyway.. It also depends on how invasive the changes are: if it really complicates manual regalloc that could be a more serious problem than the perf hit.
![]() |
Reporter | |
Comment 9•6 years ago
|
||
That's weird, assuming the asm.js is being compiled by OdinMonkey here (no --no-asmjs), it should already have FP reserved, so I wouldn't expect an Octane-zlib regression.
Assignee | ||
Comment 10•2 years ago
|
||
Handle the case when temp2_
couldn't be reserved.
CheckNotBackReferenceImpl()
:
- Use
backtrack_stack_pointer_
as an additional temp register by saving it
on the stack.
initFrameAndRegs()
:
- Use
backtrack_stack_pointer_
as an additional temp register. We don't have
to restore the register, becausebacktrack_stack_pointer_
hasn't yet been
initialised. - Use
temp0_
forinputStartMinusOneReg
.InputOutputData
is stored in
temp0_
, but we don't read from it after this point anymore.
successHandler()
:
- Use
backtrack_stack_pointer_
as an additional temp register. We don't have
to restore the register, becausebacktrack_stack_pointer_
won't be read
after running the success handler.
Updated•2 years ago
|
Assignee | ||
Comment 11•2 years ago
|
||
We can't reuse the trick we've been using for maybeTemp5
where we're reusing
lastIndex
as an additional temp register. (Also see bug 1480819.)
Instead just save regexp
and lastIndex
on the stack and then use these
register for temp4
resp. temp5
.
This change also means RegExpMatcherRaw
will now (again) always be called
with a valid lastIndex
argument.
Depends on D146704
Assignee | ||
Comment 12•2 years ago
|
||
Use AutoScratchRegisterMaybeOutputType
instead of AutoScratchRegister
, so
we need one less register.
Depends on D146705
Assignee | ||
Comment 13•2 years ago
|
||
LGetPropSuperCache
needs seven register, so it can't be used on x86.
Depends on D146706
Assignee | ||
Comment 14•2 years ago
|
||
Changes:
- Replace
regs.take(BaselineFrameReg)
withregs.takeUnchecked(...)
, because
BaselineFrameReg
is the frame pointer only on some platforms, which means it's
no longer in the register set returned fromGeneralRegisterSet::All()
. - Remove
TakeJitRegisters
because the frame pointer is now always preserved. - Update
GenerateDirectCallFromJit
as instructed in the code comments.
Drive-by change:
- Remove
RegisterID::S0
andRegisterID::S1
, because they are some old
JSC-specific definitions. Instead addRegisterID::fp
.
Depends on D146707
Assignee | ||
Comment 15•2 years ago
|
||
This code is no longer needed.
Depends on D146708
Assignee | ||
Comment 16•2 years ago
|
||
Tests are looking good: https://treeherder.mozilla.org/jobs?repo=try&revision=09d6e7f8d4835e73ecef374190c62a09ad9d3b50
Benchmarks don't seem to indicate any regressions:
- Base revision: https://treeherder.mozilla.org/jobs?repo=try&revision=531119cb1e0ecd159e245de28e683e096d2eed42
- Patch revision: https://treeherder.mozilla.org/jobs?repo=try&revision=e4fcbc91261e65de6f7ebd1eb2af677f4a88d4da
- JS benchmarks: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=531119cb1e0ecd159e245de28e683e096d2eed42&newProject=try&newRevision=e4fcbc91261e65de6f7ebd1eb2af677f4a88d4da&framework=11&page=1
- Speedometer: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=531119cb1e0ecd159e245de28e683e096d2eed42&newProject=try&newRevision=e4fcbc91261e65de6f7ebd1eb2af677f4a88d4da&framework=13&page=1
Updated•2 years ago
|
Updated•2 years ago
|
Comment 17•2 years ago
|
||
Pushed by andre.bargull@gmail.com: https://hg.mozilla.org/integration/autoland/rev/3a85ea2898b3 Part 1: Change SMRegExpMacroAssembler to work with one less temp register. r=iain https://hg.mozilla.org/integration/autoland/rev/bbf75553a6dc Part 2: Change JitRealm::generateRegExpMatcherStub to work with one less register on x86. r=jandem https://hg.mozilla.org/integration/autoland/rev/3a054262db65 Part 3: Change IonCacheIRCompiler to work with one less temp register. r=jandem https://hg.mozilla.org/integration/autoland/rev/42a0409c631b Part 4: Disable GetElemSuper IC inlining on x86. r=jandem https://hg.mozilla.org/integration/autoland/rev/d1b71b47f02a Part 5: Remove frame pointer from allocatable register set. r=jandem https://hg.mozilla.org/integration/autoland/rev/f7930df19aac Part 6: Remove no longer used framepointer-temp from LIonToWasmCallBase. r=jandem
Comment 18•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/3a85ea2898b3
https://hg.mozilla.org/mozilla-central/rev/bbf75553a6dc
https://hg.mozilla.org/mozilla-central/rev/3a054262db65
https://hg.mozilla.org/mozilla-central/rev/42a0409c631b
https://hg.mozilla.org/mozilla-central/rev/d1b71b47f02a
https://hg.mozilla.org/mozilla-central/rev/f7930df19aac
Updated•2 years ago
|
Description
•