Closed
Bug 686017
Opened 11 years ago
Closed 11 years ago
Cell::arenaHeader() should be avoided on the fast paths
Categories
(Core :: JavaScript Engine, defect)
Tracking
Status: RESOLVED FIXED
Target Milestone: mozilla9
People
(Reporter: igor, Assigned: igor)
Attachments
(1 file, 1 obsolete file)
60.59 KB, patch, bhackett1024: review+
Currently, in a few places outside the GC, we use Cell::arenaHeader() to access the compartment for a GC thing or its allocation kind. While experimenting with an alternative layout for GC things I discovered that such access is not free: it involves some arithmetic plus a read from the start of the arena, which may not be in cache. In particular, changing JSObject::nativeSearch to use cx to get the JSRuntime instance instead of Cell::compartment()->rt, and calculating the allocation kind in CopyInitializerObject from the number of fixed slots in the object rather than using Cell::allocKind, resulted in a 0.5% win in V8. Most of the improvement comes from the deltablue benchmark, which is 2.6% faster with the patch. The patch does not affect SunSpider scores.
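As a rough, self-contained sketch of the two access patterns (the types and field names below are simplified stand-ins, not the actual SpiderMonkey definitions):

#include <cstdint>

// Simplified stand-ins for the real structures.
struct JSRuntime {};
struct JSCompartment { JSRuntime *rt; };

static const uintptr_t ArenaSize = 4096;   // illustrative arena size
struct ArenaHeader { JSCompartment *compartment; unsigned allocKind; };

struct Cell {
    // Cheap arithmetic: mask the cell address down to the arena start...
    ArenaHeader *arenaHeader() const {
        return reinterpret_cast<ArenaHeader *>(uintptr_t(this) & ~(ArenaSize - 1));
    }
    // ...but every use implies a load from the first line of the arena,
    // which is frequently cold on the fast paths.
    JSCompartment *compartment() const { return arenaHeader()->compartment; }
};

struct JSContext { JSRuntime *runtime; };

// Pattern avoided by the patch: two dependent loads, the first often a miss.
JSRuntime *runtimeViaArena(const Cell *cell) { return cell->compartment()->rt; }

// Pattern the patch switches to: cx is already hot, so the runtime is one cheap load away.
JSRuntime *runtimeViaContext(JSContext *cx) { return cx->runtime; }

The arithmetic itself is cheap; the cost is the dependent load from the arena start, which on the fast paths is often the only reason that cache line gets touched.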
Assignee
Comment 1•11 years ago
This patch implements the above-mentioned elimination of the Cell::compartment()->rt and Cell::allocKind() calls in a couple of places. Since the patch adds a JSContext *cx parameter to JSObject::nativeSearch, it affects quite a number of callers and their signatures.
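For illustration only (toy types, not the real JSObject/Shape definitions), the shape of the change and why it ripples through callers:

struct JSContext;        // stand-in declaration
struct Shape {};
typedef int jsid;

struct Obj {
    // Before: const Shape *nativeSearch(jsid id);
    // After:
    const Shape *nativeSearch(JSContext *cx, jsid id) {
        (void)cx; (void)id;
        return nullptr;  // real lookup elided
    }
};

// Callers that previously had no cx in scope now need it threaded through.
const Shape *lookupProperty(JSContext *cx, Obj *obj, jsid id) {
    return obj->nativeSearch(cx, id);
}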
Attachment #559581 - Flags: review?(bhackett1024)
Assignee
Comment 2•11 years ago
The previous attachment contains unrelated changes.
Attachment #559581 - Attachment is obsolete: true
Attachment #559581 - Flags: review?(bhackett1024)
Attachment #559585 - Flags: review?(bhackett1024)
Assignee
Comment 3•11 years ago
The patch here helps to minimize the regressions from the smaller chunk sizes and different chunk layouts in bug 671702.
Blocks: 671702
Comment 4•11 years ago
Comment on attachment 559581 [details] [diff] [review]
v1

Makes sense, I've seen this cost for accessing things from the arena header before. I'm surprised that changing CopyInitializerObject made a difference, though. With TI disabled this function will get called a lot, but with TI enabled initializer objects should normally be constructed in jitcode. Do you know how much CopyInitializerObject gets called in v8bench with TI enabled?
Attachment #559581 - Flags: review+
Updated•11 years ago
Attachment #559585 - Flags: review?(bhackett1024) → review+
Assignee
Comment 5•11 years ago
(In reply to Brian Hackett from comment #4)
> I'm surprised that changing CopyInitializerObject made a difference,
> though. With TI disabled this function will get called a lot, but with TI
> enabled initializer objects should normally be constructed in jitcode. Do
> you know how much CopyInitializerObject gets called in v8bench with TI
> enabled?

With TI it is called just 7141 times during splay. Without TI the number goes up to 509056. So maybe the speedup from that change is just noise. On the other hand, with 100 V8 runs on a system running only a VNC session and a shell for the benchmark, I did observe a speedup of around 1 ms from that extra change. That can be explained if those 7141 calls all resulted in cache misses: that would translate into 250*7141 or about 1.8e6 cycles, which is about 0.5 ms. So the numbers roughly match. The question is: does the cache-miss theory sound plausible?
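Spelling the estimate out (the 250 cycles per miss and the clock rate are assumptions, not measurements):

    7141 calls * 250 cycles/miss ≈ 1.8e6 cycles
    1.8e6 cycles / ~3.5 GHz        ≈ 0.5 ms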
Comment 6•11 years ago
(In reply to Igor Bukanov from comment #5)
> With TI it is called just 7141 times during splay. Without TI the number
> goes up to 509056. So maybe the speedup from that change is just noise. On
> the other hand, with 100 V8 runs on a system running only a VNC session and
> a shell for the benchmark, I did observe a speedup of around 1 ms from that
> extra change. That can be explained if those 7141 calls all resulted in
> cache misses: that would translate into 250*7141 or about 1.8e6 cycles,
> which is about 0.5 ms. So the numbers roughly match. The question is: does
> the cache-miss theory sound plausible?

Yeah, that sounds reasonable. Once we have a generational GC and aren't constantly taking cache misses during allocation, these ancillary sources of misses will become more and more important to track down.
Assignee
Comment 7•11 years ago
http://hg.mozilla.org/integration/mozilla-inbound/rev/f3908eb90151
Comment 8•11 years ago
http://hg.mozilla.org/mozilla-central/rev/f3908eb90151
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla9