Closed
Bug 1358599
Opened 8 years ago
Closed 8 years ago
Use runtime checks for GC pre-barriers instead of patchable jumps
Categories
(Core :: JavaScript Engine: JIT, enhancement)
Core
JavaScript Engine: JIT
Tracking
()
Tracking | Status | |
---|---|---|
firefox55 | --- | fixed |
People
(Reporter: djvj, Assigned: djvj)
References
(Blocks 1 open bug)
Details
Attachments
(1 file, 6 obsolete files)
61.67 KB,
patch
|
djvj
:
review+
|
Details | Diff | Splinter Review |
Currently, pre-barriers emitted in jitcode use patchable jumps around GC pre-barriers, that look something like this:
JUMP skip_barrier;
<BARRIER CODE>
skip_barrier:
...
At both the start and end of an incremental garbage collection, we toggle all the JUMPs to CMPs and fall through to the barrier code.
This behaviour was written prior to our jitcode being made w^x. Prior to this, this was a relatively cheap operation that saved runtime cost. After w^x however, this operation becomes a lot more expensive, necessitating multiple mprotect() calls both before and after the toggle/patch code.
Checking this at runtime would look something like
LOAD reg, *addressOfIncrementalGCFlag
JZ reg, skip_barrier
<BARRIER CODE>
skip_barrier:
...
It's an extra load, which should be relatively fast as it's always from the same address and should be in L1 most of the time. The JZ should be well-predicted since it'll always be true or false for extended periods of time. Runtime cost is probably not that high, and in exchange, we get to eliminate toggleBarriers() entirely.
Assignee | ||
Comment 1•8 years ago
|
||
I think it's particularly egregious that the mprotect() and patching code touches ALL jitcode, both hot and cold. Also, we would save some non-trivial amount of memory by not keeping the tables around to track the patchable jumps.
Comment 2•8 years ago
|
||
Whoa, nice catch.
Comment 3•8 years ago
|
||
If this does regress benchmarks, one option is to change this only for Baseline and IC code.
Assignee | ||
Comment 4•8 years ago
|
||
I'm not too concerned if this regresses Octane somewhat. If it regresses Speedometer, then yeah.
Comment 5•8 years ago
|
||
(In reply to Kannan Vijayan [:djvj] from comment #4)
> I'm not too concerned if this regresses Octane somewhat. If it regresses
> Speedometer, then yeah.
Speedometer is not the holy grail of benchmarks though. We can take a small hit on a typical |o1.foo = o2| SetProp micro-benchmark, anything more than that is unacceptable.
Assignee | ||
Comment 6•8 years ago
|
||
Of course, I'm not suggesting that Speedometer is a new gold standard. My statement was more about caring less about Octane regressions than we would have in a previous life. I'm going off of speedometer being a better reflection of web page behaviour, where we have more ground to cover than comp-heavy benchmarks than Octane, which we have already paid a lot of attention to.
Assignee | ||
Comment 7•8 years ago
|
||
Ok, it turns out that we need a few more instructions than I previously thought. Namely, since we need a register, and we can't be sure of where we're getting that register, we need a push/pop around the gated barrier code.
I think we can eliminate this in many circumstances where we know that there's a register free. In Ion, we can get the compiler to provide a scratch register, which may come for free. In baseline, we use register ad-hoc, and may be able to use BaselineScratchReg in most circumstances, or some other register that we know is free.
For now, I'm just always doing the push/pop and seeing how well it does.
Comment 8•8 years ago
|
||
There are branchTest32 and branch32 overloads that take AbsoluteAddress + Imm32. Does that help?
Assignee | ||
Comment 9•8 years ago
|
||
I looked around for these, didn't find them. When you say "AbsoluteAddress + Imm32", do you mean a "void (*)(AbsoluteAddress, Imm32)" signature? If so, that's exactly what I need. Where are they defined?
Assignee | ||
Comment 10•8 years ago
|
||
Ah, nm. I kept looking for cmp32, and test32, didn't look for the fused test-and-branch methods. It's been too long since I played with the MacroAssembler.
Assignee | ||
Comment 11•8 years ago
|
||
Initial patch for runtime pre-barriers. It compiles, no testing yet. Turns out there was already a pre-existing MacroAssembler method to do this stuff. It also turns out that the pre-existing method was doing a 'cmpl' while the underlying var was a 'bool', which seemed weird, so I switched the var over to a 'uint32_t'.
Assignee | ||
Comment 12•8 years ago
|
||
Rebased patch. Try shows jit+gc marking failures in Linux and Linux64:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6a4fccc4f41f3c5291715e89289e99bd1e421d30
Attachment #8862114 -
Attachment is obsolete: true
Assignee | ||
Comment 13•8 years ago
|
||
Updated patch fixing previous issues. First off, I wasn't actually removing the patchable jump. Secondly, I was using the inverted conditional in the branch instruction. This one passes the jit-tests on commandline. Running in try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f10eea49b062b5886279800f9d99bb3c3b4f2c8c
Attachment #8862454 -
Attachment is obsolete: true
Assignee | ||
Comment 14•8 years ago
|
||
I haven't had time to try this on speedometer yet, but since I was concerned about regressions on octane, I ran this in js-shell against our in-tree octane. 20 runs, lowest and highest scores removed. Reference on left, modified (no-toggleBarriers) on right.
Upshot this is a strict win on even octane:
Richards - 27893 +- 104 26097 +- 166
DeltaBlue - 41144 +- 453 42658 +- 531
Crypto - 26629 +- 121 27320 +- 172
RayTrace - 82233 +- 372 83586 +- 634
EarleyBoyer - 21113 +- 250 22099 +- 168
RegExp - 3135 +- 36 3225 +- 32
Splay - 10905 +- 151 14317 +- 114
SplayLatency - 7451 +- 132 17196 +- 116
NavierStokes - 27744 +- 157 28197 +- 161
PdfJS - 9907 +- 101 10346 +- 228
Mandreel - 21093 +- 230 21987 +- 239
MandreelLatency - 25493 +- 284 26278 +- 378
Gameboy - 33309 +- 853 35900 +- 900
CodeLoad - 15860 +- 124 16279 +- 117
Box2D - 31333 +- 538 31821 +- 669
zlib - 68672 +- 381 69875 +- 441
Typescript - 19585 +- 154 20097 +- 134
Score (version 9) - 21440 +- 82 23306 +- 127
Can someone tell me how to get reliable speedometer numbers?
Assignee | ||
Updated•8 years ago
|
Attachment #8862559 -
Flags: review?(jdemooij)
Attachment #8862559 -
Flags: review?(jcoppeard)
Assignee | ||
Comment 15•8 years ago
|
||
Props to :tcampbell for pointing this out as a potential perf issue.
Assignee | ||
Comment 16•8 years ago
|
||
Latest patch. Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=00e88c8966d3d746e0af7fe1e667e66b1aacb736
Attachment #8862559 -
Attachment is obsolete: true
Attachment #8862559 -
Flags: review?(jdemooij)
Attachment #8862559 -
Flags: review?(jcoppeard)
Attachment #8862609 -
Flags: review?(sphink)
Attachment #8862609 -
Flags: review?(jdemooij)
Updated•8 years ago
|
Attachment #8862609 -
Flags: review?(sphink) → review+
Updated•8 years ago
|
Whiteboard: [qf] → [qf:p1]
Assignee | ||
Comment 17•8 years ago
|
||
After :bz pointed me to other places where |toggleBarriers| is called, I went and cleaned all of those up too. New patch. Going to keep r+ from :sfink since the delta is all related to jitcode.
Attachment #8862609 -
Attachment is obsolete: true
Attachment #8862609 -
Flags: review?(jdemooij)
Attachment #8862718 -
Flags: review?(jdemooij)
Assignee | ||
Comment 18•8 years ago
|
||
Oh yeah, latest try run looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c9ddf2d26458b8962cced5365ede363480c09620
Comment 19•8 years ago
|
||
Comment on attachment 8862718 [details] [diff] [review]
runtime-pre-barriers.patch
Review of attachment 8862718 [details] [diff] [review]:
-----------------------------------------------------------------
Simpler and faster, nice! Some suggestions for additional cleanup/removal, leave-open + follow-up patch is fine.
::: js/src/jit/MacroAssembler.h
@@ +1664,4 @@
> }
>
> template <typename T>
> void patchableCallPreBarrier(const T& address, MIRType type) {
Maybe inline callPreBarrier into this one, and rename to just callPreBarrier?
@@ +1667,5 @@
> void patchableCallPreBarrier(const T& address, MIRType type) {
> Label done;
>
> // All barriers are off by default.
> // They are enabled if necessary at the end of CodeGenerator::generate().
Nit: can remove this comment too
@@ +1669,5 @@
>
> // All barriers are off by default.
> // They are enabled if necessary at the end of CodeGenerator::generate().
> +
> + branchTestNeedsIncrementalBarrier(Assembler::Equal, &done);
Nit: branchTestNeedsIncrementalBarrier asserts cond is Zero or NonZero, so Zero would be more consistent (it's the same value at least on x86/ARM).
Btw, branchTestNeedsIncrementalBarrier is also called in CodeGenerator::emitArrayPopShift, but there's no good reason for it. Mind filing a follow-up bug to remove that one?
@@ -1668,5 @@
>
> // All barriers are off by default.
> // They are enabled if necessary at the end of CodeGenerator::generate().
> - CodeOffset nopJump = toggledJump(&done);
> - writePrebarrierOffset(nopJump);
writePrebarrierOffset and the assembler's preBarriers_ can be removed too.
Attachment #8862718 -
Flags: review?(jdemooij) → review+
Assignee | ||
Comment 20•8 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #19)
> ::: js/src/jit/MacroAssembler.h
> @@ +1664,4 @@
> > }
> >
> > template <typename T>
> > void patchableCallPreBarrier(const T& address, MIRType type) {
>
> Maybe inline callPreBarrier into this one, and rename to just callPreBarrier?
Good call. Inlining actually cleans up the code a bit, especially in the case where we're barriering |MIRType::Value|. The |branchTestGCThing| on the value can jump directly to the final (cacheline-aligned) |done| label, instead of the double-hopping as it would currently be.
I'm renaming the whole thing |guardedCallPreBarrier|.
> writePrebarrierOffset and the assembler's preBarriers_ can be removed too.
Done. Also removed the preBarriers table on the JitCode structs too. This stuff is like spring cleaning. The more you remove, the more you find.
Other nits addressed. There are enough changes here that I'd be more comfortable if you took another look at it, so putting up the delta patch for r?.
Assignee | ||
Comment 21•8 years ago
|
||
Remove more pre barrier related code that is no longer needed. Delta from previous patch.
Attachment #8862852 -
Flags: review?(jdemooij)
Assignee | ||
Comment 22•8 years ago
|
||
Assignee | ||
Comment 23•8 years ago
|
||
> Btw, branchTestNeedsIncrementalBarrier is also called in CodeGenerator::emitArrayPopShift, but there's no good reason for it. Mind filing a follow-up bug to remove that one?
I'm not seeing how we can remove this. Don't we need a pre barrier on pop and shift? I'm not seeing any other code that calls into |patchableCallPreBarrier| in that codegen.
Comment 24•8 years ago
|
||
Comment on attachment 8862852 [details] [diff] [review]
remove-more-pre-barrier-code.patch
Review of attachment 8862852 [details] [diff] [review]:
-----------------------------------------------------------------
LGTM. Just a heads up, I reviewed a patch today that added a patchableCallPreBarrier to CodeGenerator.cpp so you might want to double check on tip.
::: js/src/jit/MacroAssembler.h
@@ +1663,4 @@
> jump(&done);
>
> haltingAlign(8);
> bind(&done);
Pre-existing but I don't know why this haltingAlign is here. We don't emit it elsewhere for similar code paths. If we removed it our code would be a bit smaller and we could rm the jump(&done) right before it. r=me either way.
Attachment #8862852 -
Flags: review?(jdemooij) → review+
Updated•8 years ago
|
Assignee: nobody → kvijayan
Assignee | ||
Comment 25•8 years ago
|
||
Yeah, I ran into a compile error over it. Fixed it in my patch.
This is just a guess, but maybe cache lines? The common case is the guarded code not getting executed. If we jump past it, and the target address lands at the beginning of a cache line, we get a full cache line of instructions to execute in one go.
Not sure how much it's worth though..
Assignee | ||
Comment 26•8 years ago
|
||
Latest try looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d58abab48b81a63a6952722d636ce1eb490530e8
Pushing after one last local build.
Assignee | ||
Comment 27•8 years ago
|
||
Final patch rebased and passing try. Forwarding r+.
Attachment #8862718 -
Attachment is obsolete: true
Attachment #8862852 -
Attachment is obsolete: true
Attachment #8863229 -
Flags: review+
Comment 28•8 years ago
|
||
Pushed by kvijayan@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0879ee58fcdc
Use runtime guards for jitcode pre-barriers instead of patchable jumps. r=jandem r=sfink
Comment 29•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment 30•8 years ago
|
||
The verdict on AWFY is a ~5% Octane-richards regression. I'll file a follow-up bug to get rid of that haltingAlign(8) because that still seems silly.
Updated•3 years ago
|
Performance Impact: --- → P1
Whiteboard: [qf:p1]
You need to log in
before you can comment on or make changes to this bug.
Description
•