Closed Bug 686017 Opened 13 years ago Closed 13 years ago

Cell::arenaHeader() should be avoided on the fast paths

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla9

People

(Reporter: igor, Assigned: igor)

References

Details

Attachments

(1 file, 1 obsolete file)

60.59 KB, patch
bhackett1024
: review+
Details | Diff | Splinter Review
Currently in a few places outside the GC we use Cell::arenaHeader() to access the compartment for the GC thing or its allocation kind. I discovered when experimenting with alternative layout for GC things such access is not free as it involves some arithmetic operations and a read from the start of the arena which may not be in cache. 

In particular, changing JSObject::nativeSearch to use cx to get the JSRuntime instance, not Cell::compartment()->rt, and calculating in CopyInitializerObject the allocation kind based on the number of the fixed slots in the object, and not using Cell::allocKind, resulted in a 0.5% win in V8. the most of the improvement comes from the deltablue benchmark that is 2.6% faster with the patch. The patch does not affect SunSpider scores.
Attached patch v1 (obsolete) — Splinter Review
This patch implements the above mention elimination of Cell::compartments()->rt and Cell::allocKind() calls in couple of places. As the patch adds JSContext *cx parameter to  JSObject::nativeSearch that affected quite a number callers and their signatures.
Attachment #559581 - Flags: review?(bhackett1024)
Attached patch v1Splinter Review
The previous attachment contains unrelated changes.
Attachment #559581 - Attachment is obsolete: true
Attachment #559581 - Flags: review?(bhackett1024)
Attachment #559585 - Flags: review?(bhackett1024)
The patch here allows to minimize the regressions from the smaller chunk sizes and different chunk layouts in bug 671702.
Blocks: 671702
Blocks: 686144
Comment on attachment 559581 [details] [diff] [review]
v1

Makes sense, I've seen this cost for accessing things from the arena header before.  I'm surprised that changing CopyInitializerObject made a difference, though.  With TI disabled this function will get called a lot, but with TI enabled initializer objects should normally be constructed in jitcode.  Do you know how much CopyInitializerObject gets called in v8bench with TI enabled?
Attachment #559581 - Flags: review+
Attachment #559585 - Flags: review?(bhackett1024) → review+
(In reply to Brian Hackett from comment #4)
>I'm surprised that changing CopyInitializerObject made a
> difference, though.  With TI disabled this function will get called a lot,
> but with TI enabled initializer objects should normally be constructed in
> jitcode.  Do you know how much CopyInitializerObject gets called in v8bench
> with TI enabled?

With TI it is called just 7141 times during splay. Without TI the number goes to 509056. So maybe the speedup from that chamge is just a noise. On the other hand with 100 v8 runs on a system that just runs a VNC session and a shell to run the benchmark I did observe speedup around one 1ms from that extra change. That can be explained if those 7141 calls all resulted in cache misses. That would translate into 250*7141 or about 1.7e6 cycles and that is about 0.5ms. So the numbers roughly match. The question does the cache miss theory sounds plausible?
(In reply to Igor Bukanov from comment #5)
> With TI it is called just 7141 times during splay. Without TI the number
> goes to 509056. So maybe the speedup from that chamge is just a noise. On
> the other hand with 100 v8 runs on a system that just runs a VNC session and
> a shell to run the benchmark I did observe speedup around one 1ms from that
> extra change. That can be explained if those 7141 calls all resulted in
> cache misses. That would translate into 250*7141 or about 1.7e6 cycles and
> that is about 0.5ms. So the numbers roughly match. The question does the
> cache miss theory sounds plausible?

Yeah, that sounds reasonable.  Once we have a generational GC and aren't constantly taking cache misses during allocation these ancillary sources of misses will become more and more important to track down.
http://hg.mozilla.org/mozilla-central/rev/f3908eb90151
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla9
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: