Closed
Bug 453918
Opened 15 years ago
Closed 15 years ago
TM: eliminate silly map_is_native guards on globalObj
Categories
(Core :: JavaScript Engine, defect, P1)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla1.9.1b1
People
(Reporter: vlad, Assigned: brendan)
Details
Attachments
(1 file)
2.55 KB,
patch
|
shaver
:
review+
|
Details | Diff | Splinter Review |
The global-shape guard is pretty expensive, due to the memory access. It generates code like: 0x252f3d: mov 0x98(%edx),%ebx 0x252f43: mov 0x38(%ebx),%ebx 0x252f46: mov (%ebx),%edi 0x252f48: mov 0x4(%edi),%edi 0x252f4b: mov (%edi),%edi 0x252f4d: cmp $0xa5a0e,%edi to read and then compare the current global shape. 18:02 < shaver> but yes, we could do some easy-enough analysis to detect a lot of cases where we're not touching the global 18:04 < vlad> shaver: but yeah, I think that analysis would give a huge boost to a lot of benchmarks, not the least (and most worthless) of which is bitwise-and 18:05 < shaver> vlad: want to file a bug for some discussion on analyses that can rip out the guards?
A couple of random thoughts, in roughly ascending order of performance effect: - we're planning to move the shape into the JSObject, so you'd save at least one load in there - for top-level (and maybe eval) code that's running with COMPILE_N_GO, the location of the global object's shape is trace-constant, so the guard could boil down to a fixed-address load and compare, I think. - it's not clear to me that we need to be checking global shape on trace at all, actually, because all interactions with the global object are converted to use the native stack and then sunk to the side exits, right? (Which makes me wonder how the side exits actually cope with a different shape on exit than on entry, now that I mention it -- might have to read that code again, or ask Andreas to explain it to me. :) ) - most shape mutations are *adding* properties, which I don't think invalidates any previously-cached/compiled slot offsets; it might be worth distinguishing those cases to get better cache behaviour in the interpreter and fewer discarded traces on trace. Want to disable the global shape guard and see what the maximum gain is?
Assignee | ||
Comment 2•15 years ago
|
||
We shouldn't be guarding the shape of the global object. Where is LIR to do that being emitted? Note the (aobj != globalObj) if conditions in test_property_cache. /be
Assignee | ||
Comment 3•15 years ago
|
||
(In reply to comment #1) > - we're planning to move the shape into the JSObject, so you'd save at least > one load in there We need a bigger overhaul than just moving shape, which will make three type-like pointers bloat JSObject (map, clasp, shape) instead of the current two (map and clasp). We need just one, let's call it "type" ;-). This may not make 3.1 but I will get to it. > (Which makes me > wonder how the side exits actually cope with a different shape on exit than on > entry, now that I mention it -- might have to read that code again, or ask > Andreas to explain it to me. :) ) I don't see any such guards on bitwise-and -- I do see bogus map_is_native tests but I am fixing those. > - most shape mutations are *adding* properties, The property cache tracks these by keying off old shape and storing new in vcap. See JSOP_SETPROP/SETNAME case in js_Interpret. > which I don't think invalidates > any previously-cached/compiled slot offsets; it might be worth distinguishing > those cases to get better cache behaviour in the interpreter We already do (you reviewed that patch :-P). > and fewer discarded traces on trace. That's bug 454039 -- fixing ASAP for v8/raytrace.js among others. /be
Assignee | ||
Comment 4•15 years ago
|
||
(In reply to comment #3) > We need a bigger overhaul than just moving shape, which will make three > type-like pointers bloat JSObject (map, clasp, shape) instead of the current > two (map and clasp). We need just one, let's call it "type" ;-). This may not > make 3.1 but I will get to it. Bug 408416. > I don't see any such guards on bitwise-and -- I do see bogus map_is_native > tests but I am fixing those. That's this bug. Taking, patch in a minute. /be
Assignee: general → brendan
Severity: normal → major
Priority: -- → P1
Hardware: PC → All
Target Milestone: --- → mozilla1.9.1b1
Assignee | ||
Comment 5•15 years ago
|
||
Testing welcome, I'm short on time atm. Good news is that for bitwise-and, with this patch I see no "guard" anywhere in the debug spew. /be
Attachment #337309 -
Flags: review?(shaver)
Assignee | ||
Updated•15 years ago
|
Summary: TM: reduce number of global-shape guards → TM: eliminate silly map_is_native guards on globalObj
(In reply to comment #3) > > - most shape mutations are *adding* properties, > > The property cache tracks these by keying off old shape and storing new in > vcap. See JSOP_SETPROP/SETNAME case in js_Interpret. > > > which I don't think invalidates > > any previously-cached/compiled slot offsets; it might be worth distinguishing > > those cases to get better cache behaviour in the interpreter > > We already do (you reviewed that patch :-P). Isn't that only for interpreter-added cases, not those added by API callers like the stub-property-defining resolver in XPConnect? Also, it only updates the cache entry for the JSOP_SETNAME/PROP in question, which means that other shape-predicated things (other cache entries, global shape tests at trace entry, shape guards on trace) are invalidated.
Comment on attachment 337309 [details] [diff] [review] proposed fix How is the nativeness of the global object protected at trace execution time? I see a shape test in js_ExecuteTree, but no native-object check there either.
Assignee | ||
Comment 8•15 years ago
|
||
(In reply to comment #6) > > > it might be worth distinguishing > > > those cases to get better cache behaviour in the interpreter ^^^^^^^^^^^ > > > > We already do (you reviewed that patch :-P). > > Isn't that only for interpreter-added cases, ^^^^^^^^^^^ > not those added by API callers > like the stub-property-defining resolver in XPConnect? Yes, only the interpreter fills the cache. You changed your argument. Need evidence that XPConnect should prime the cache -- could just pollute it. That is all small potatoes and not this bug, in any case. > Also, it only updates > the cache entry for the JSOP_SETNAME/PROP in question, That's right, it's a polymorphic "call" site cache. What's the problem? > which means that other > shape-predicated things (other cache entries, Other cache entries see the object in earlier or later shape at different pcs. Note that it's a different object for each earlier to later evolutionary phase, so you don't have the later object with the higher shape being seen as invalid from the earlier pc. Please turn on JS_PROPERTY_CACHE_METERING and have a look. It's possible to make bad cases for the cache, as for trace-trees. That's not what this bug is about, and I haven't actually seen any in the benchmarks yet. > global shape tests at trace > entry, Please read the patch in this bug: + + // We are adding to the global object, so update its saved shape. + JS_ASSERT(obj == globalObj); + traceMonitor->globalShape = OBJ_SHAPE(obj); > shape guards on trace No, see above -- these follow the same evolution the interpreter sees when filling the cache by pc. > ) are invalidated. I'm not sure you understand how this stuff works. I hope this comment helps. If not, mail me. If you know of a way to partially order shapes so that adding does not invalidate, I'm all ears. It would seem to require spelling shapes as numbers with a big enough radix that there's room to evolve (as if adding kids to a parent node in a tree) from shape X to multiple X', X'', etc. -- all with common high order bits. I know of no such way. The hidden classes in V8 correspond to shapes, but allow for making subclass is-a superclass judgments that shapes do not allow. This is the "type" idea I mentioned in comment 3, but it should wait -- especially since we have zero evidence (including this bug!) of any problem with the current shape system in the benchmarks we are working on. Evidence! /be
Assignee | ||
Comment 9•15 years ago
|
||
(In reply to comment #7) > (From update of attachment 337309 [details] [diff] [review]) > How is the nativeness of the global object protected at trace execution time? Global objects must be native in SpiderMonkey. Lots of stuff counts on this, including js_FindClassObject and of course existing jstracer.cpp code including this line: if (OBJ_SCOPE(JS_GetGlobalForObject(cx, cx->fp->scopeChain))->title.ownercx != cx) { This has been true forever. This is why this bug's patch has + JS_ASSERT(OBJ_IS_NATIVE(globalObj)); Is it dev-doc-needed? Attempting to make a foreign object be a global object will fail spectacularly, better feedback than any doc :-P. /be
I didn't intend to change my argument, but I guess I didn't express it clearly. By "interpreter performance" I was referring to the cache as used by the interpreter, not that all mutations happened by the interpreter. You're right to point out that we don't know that the DOM or other native-mutator cases are costing us. Andreas and I saw in at least one benchmark that we were getting shape mismatches due to "extra" properties; I believe that it was on 3d-raytrace or another similar raytracer, which annotated some of its Triangle objects with an extra property or two for material/colour/whatever. This caused us to fall off trace when the traced visitor function encountered an augmented Triangle, and I believe that the cache would miss similarly. That led me to originally speculate that we could improve cache performance with a "major and minor" shape scheme, wherein the minor number increased as properties were added, and major as properties were deleted. It may not be feasible to do so, as you point out, or even worthwhile if the number of shapes at a given location is small, and we can use multiple trees to cope in the traced case. I had forgotten that we required natives for globals, indeed. I wonder if the failure mode is clear enough for people to determine the solution without documentation, but I haven't seen it pop up otherwise on dev-tech-js-engine.
Assignee | ||
Comment 11•15 years ago
|
||
(In reply to comment #10) > Andreas and I saw in at least one benchmark that we were getting shape > mismatches due to "extra" properties; I believe that it was on 3d-raytrace or > another similar raytracer, which annotated some of its Triangle objects with an > extra property or two for material/colour/whatever. This was a bug, I forget which. It could have been bug 454266. > It may not be feasible to do so, as you > point out, or even worthwhile if the number of shapes at a given location is > small, and we can use multiple trees to cope in the traced case. It's not necessary for most cases, where a new object evolves predictably and each pc sees the same before and after shapes. There is no shape mismatch; you get 99.999% cache hit rates. Suspect cache misses, they are probably bugs in the cache and/or shape maintenance code. We shipped Firefox 3 with more than a few such bugs, but still won on perf at the time. More wins in 3.1. Review soon? My mq is full! /be
Comment on attachment 337309 [details] [diff] [review] proposed fix Haven't had a chance to test, but the patch itself looks fine. If it tests well, r=shaver!
Comment on attachment 337309 [details] [diff] [review] proposed fix ahem.
Attachment #337309 -
Flags: review?(shaver) → review+
Assignee | ||
Comment 14•15 years ago
|
||
Fixed in tracemonkey: http://hg.mozilla.org/tracemonkey/rev/4fe759fa6016 /be
Reporter | ||
Comment 15•15 years ago
|
||
After this patch, bitwise-and lost some loads, but it gained a call to BoxInt32 in the middle of the trace: import vp=0x806924 name=$global0 type=int flags=0 import vp=0x806920 name=$global1 type=int flags=0 state = param ecx param1 = param edx sp = ld state[0] rp = ld state[4] cx = ld state[12] gp = ld state[8] eos = ld state[JSVAL_ERROR_COOKIE] eor = ld state[20] ld1 = ld cx[152] ld2 = ld ld1[56] ld3 = ld ld2[0] sp[0] = ld2 ld4 = ld cx[152] ld5 = ld ld4[56] ld6 = ld ld5[0] ld7 = ld gp[616] $global0 = i2f ld7 sp[8] = ld7 ld8 = ld gp[608] $global1 = i2f ld8 sp[16] = ld8 and1 = and ld7, ld8 i2f1 = i2f and1 sp[8] = and1 ld9 = ld cx[152] ld10 = ld ld9[56] ld11 = ld ld10[0] gp[616] = and1 BoxInt321 = BoxInt32 ( cx and1 ) eq1 = eq BoxInt321, JSVAL_ERROR_COOKIE xt1: xt eq1 -> 0x30282f sp+16 rp+0 ld12 = ld ld2[28] ld12[288] = BoxInt321 add1 = add ld8, 1 ov1 = ov add1 xt2: xt ov1 -> 0x302833 sp+0 rp+0 i2f2 = i2f add1 sp[0] = ld8 gp[608] = add1 sp[0] = add1 sp[8] = #0x927c0 lt1 = lt add1, #0x927c0 xf1: xf lt1 -> 0x30283e sp+16 rp+0 sp[0] = lt1 gp[616] = and1 gp[608] = add1 loop
Reporter | ||
Comment 16•15 years ago
|
||
False alarm -- it's 454039 that did that.
Assignee | ||
Comment 17•15 years ago
|
||
Yeah, this one was a good speedup for a bunch of benchmarks that use globals. Sorry about the bug 454039 mess. New patch there tests well. /be
Assignee | ||
Comment 18•15 years ago
|
||
Fixed on m-c: http://hg.mozilla.org/mozilla-central/rev/4fe759fa6016 /be
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Updated•15 years ago
|
Flags: in-testsuite-
Flags: in-litmus-
You need to log in
before you can comment on or make changes to this bug.
Description
•