Closed Bug 1419497 Opened 8 years ago Closed 8 years ago

Optimize pre-barriers in JIT code

Categories

(Core :: JavaScript Engine: JIT, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: jandem, Assigned: jandem)

Details

Attachments

(2 files)

Pre-barriers are currently pretty slow. The trampoline has to push/pop all volatile registers and that's especially expensive for the float registers. When the GC thing is in the nursery, when it's in a parent runtime, or when its black mark bit is set, the barrier is a no-op. I'll attach patches to check for these cases in the trampoline code before calling into C++. These checks cover the majority of pre-barrier calls on Octane and Speedometer. For the micro-benchmark below I get the following numbers on x64: no incremental GC: 20 ms incremental GC, before: 201 ms incremental GC, after: 67 ms So with these patches pre-barriers are no longer 10x slower but 3.4x slower. function f() { var o = {}; o.x = Math; startgc(1); var t = new Date; for (var i = 0; i < 10000000; i++) o.x = Math; print(new Date - t); } f();
This code is not Ion-specific but shared with Baseline, so we should rename Ion to Jit.
Attachment #8930573 - Flags: review?(jcoppeard)
Most of this is fairly straight-forward. Unfortunately compiling |1 << x| is a mess on x86/x64 - the RHS must be in ecx, so I added some #ifdefs for that.
Attachment #8930574 - Flags: review?(jcoppeard)
Attachment #8930573 - Flags: review?(jcoppeard) → review+
Comment on attachment 8930574 [details] [diff] [review] Part 2 - Inline pre-barrier checks Review of attachment 8930574 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/public/HeapAPI.h @@ +329,5 @@ > static MOZ_ALWAYS_INLINE uintptr_t* > GetGCThingMarkBitmap(const uintptr_t addr) > { > + // Note: the JIT pre-barrier trampolines inline this code. Update that > + // code too when making changes here! Thanks for adding these comments. ::: js/src/jit/MacroAssembler.cpp @@ +3235,5 @@ > +# error "Unknown architecture" > +#endif > + > + // No barrier is needed if the bit is set, |word & mask != 0|. > + branchTestPtr(Assembler::NonZero, temp2, temp1, noBarrier); Wow, this turns out as pretty gnarly machine code. Just a thought: rather than compute the mask and compare with the bitmap word, can we save an instruction by shifting the bitmap word right and checking the bottom bit?
Attachment #8930574 - Flags: review?(jcoppeard) → review+
(In reply to Jan de Mooij [:jandem] from comment #0) > no incremental GC: 20 ms > incremental GC, before: 201 ms > incremental GC, after: 67 ms I just realized we can fold the orq to add ChunkMarkBitmapOffset into the loadPtr (as BaseIndex offset). That improves this to 60-63 ms or so. (In reply to Jon Coppeard (:jonco) from comment #3) > Just a thought: rather than compute the mask and compare with the bitmap > word, can we save an instruction by shifting the bitmap word right and > checking the bottom bit? Interesting idea! I tried this but I didn't see a perf improvement and it might be harder to optimize for the CPU because it's a longer dependency chain - with the current patch the CPU could compute |mask = 1 << bit| while it loads the bitmap word from memory, but if we do |word >> bit| it will be blocked on the load.
Pushed by jandemooij@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/c11fb810b8b2 part 1 - Rename Ion to Jit in pre-barrier code. r=jonco https://hg.mozilla.org/integration/mozilla-inbound/rev/092d8e20762a part 2 - Optimize pre-barriers in jit code by handling more cases without calling into C++. r=jonco
(In reply to Jan de Mooij [:jandem] from comment #4) Ah right, that makes sense. Thanks for trying it out.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: