Closed Bug 453918 Opened 15 years ago Closed 15 years ago

TM: eliminate silly map_is_native guards on globalObj

Categories

(Core :: JavaScript Engine, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9.1b1

People

(Reporter: vlad, Assigned: brendan)

Details

Attachments

(1 file)

The global-shape guard is pretty expensive, due to the memory access.  It generates code like:  

0x252f3d:	mov    0x98(%edx),%ebx
0x252f43:	mov    0x38(%ebx),%ebx
0x252f46:	mov    (%ebx),%edi
0x252f48:	mov    0x4(%edi),%edi
0x252f4b:	mov    (%edi),%edi
0x252f4d:	cmp    $0xa5a0e,%edi

to read and then compare the current global shape.

18:02 < shaver> but yes, we could do some easy-enough analysis to detect a lot 
                of cases where we're not touching the global
18:04 < vlad> shaver: but yeah, I think that analysis would give a huge boost 
              to a lot of benchmarks, not the least (and most worthless) of 
              which is bitwise-and
18:05 < shaver> vlad: want to file a bug for some discussion on analyses that 
                can rip out the guards?
A couple of random thoughts, in roughly ascending order of performance effect:

- we're planning to move the shape into the JSObject, so you'd save at least one load in there

- for top-level (and maybe eval) code that's running with COMPILE_N_GO, the location of the global object's shape is trace-constant, so the guard could boil down to a fixed-address load and compare, I think.

- it's not clear to me that we need to be checking global shape on trace at all, actually, because all interactions with the global object are converted to use the native stack and then sunk to the side exits, right?  (Which makes me wonder how the side exits actually cope with a different shape on exit than on entry, now that I mention it -- might have to read that code again, or ask Andreas to explain it to me. :) )

- most shape mutations are *adding* properties, which I don't think invalidates any previously-cached/compiled slot offsets; it might be worth distinguishing those cases to get better cache behaviour in the interpreter and fewer discarded traces on trace.

Want to disable the global shape guard and see what the maximum gain is?
We shouldn't be guarding the shape of the global object. Where is LIR to do that being emitted? Note the (aobj != globalObj) if conditions in test_property_cache.

/be
(In reply to comment #1)
> - we're planning to move the shape into the JSObject, so you'd save at least
> one load in there

We need a bigger overhaul than just moving shape, which will make three type-like pointers bloat JSObject (map, clasp, shape) instead of the current two (map and clasp). We need just one, let's call it "type" ;-). This may not make 3.1 but I will get to it.

> (Which makes me
> wonder how the side exits actually cope with a different shape on exit than on
> entry, now that I mention it -- might have to read that code again, or ask
> Andreas to explain it to me. :) )

I don't see any such guards on bitwise-and -- I do see bogus map_is_native tests but I am fixing those.

> - most shape mutations are *adding* properties,

The property cache tracks these by keying off old shape and storing new in vcap. See JSOP_SETPROP/SETNAME case in js_Interpret.

> which I don't think invalidates
> any previously-cached/compiled slot offsets; it might be worth distinguishing
> those cases to get better cache behaviour in the interpreter

We already do (you reviewed that patch :-P).

> and fewer discarded traces on trace.

That's bug 454039 -- fixing ASAP for v8/raytrace.js among others.

/be
(In reply to comment #3)
> We need a bigger overhaul than just moving shape, which will make three
> type-like pointers bloat JSObject (map, clasp, shape) instead of the current
> two (map and clasp). We need just one, let's call it "type" ;-). This may not
> make 3.1 but I will get to it.

Bug 408416.

> I don't see any such guards on bitwise-and -- I do see bogus map_is_native
> tests but I am fixing those.

That's this bug. Taking, patch in a minute.

/be
Assignee: general → brendan
Severity: normal → major
Priority: -- → P1
Hardware: PC → All
Target Milestone: --- → mozilla1.9.1b1
Attached patch proposed fixSplinter Review
Testing welcome, I'm short on time atm. Good news is that for bitwise-and, with this patch I see no "guard" anywhere in the debug spew.

/be
Attachment #337309 - Flags: review?(shaver)
Summary: TM: reduce number of global-shape guards → TM: eliminate silly map_is_native guards on globalObj
(In reply to comment #3)
> > - most shape mutations are *adding* properties,
> 
> The property cache tracks these by keying off old shape and storing new in
> vcap. See JSOP_SETPROP/SETNAME case in js_Interpret.
> 
> > which I don't think invalidates
> > any previously-cached/compiled slot offsets; it might be worth distinguishing
> > those cases to get better cache behaviour in the interpreter
> 
> We already do (you reviewed that patch :-P).

Isn't that only for interpreter-added cases, not those added by API callers like the stub-property-defining resolver in XPConnect?  Also, it only updates the cache entry for the JSOP_SETNAME/PROP in question, which means that other shape-predicated things (other cache entries, global shape tests at trace entry, shape guards on trace) are invalidated.
Comment on attachment 337309 [details] [diff] [review]
proposed fix

How is the nativeness of the global object protected at trace execution time?  I see a shape test in js_ExecuteTree, but no native-object check there either.
(In reply to comment #6)
> > > it might be worth distinguishing
> > > those cases to get better cache behaviour in the interpreter
                                                       ^^^^^^^^^^^

> > 
> > We already do (you reviewed that patch :-P).
> 
> Isn't that only for interpreter-added cases,
                      ^^^^^^^^^^^

> not those added by API callers
> like the stub-property-defining resolver in XPConnect?

Yes, only the interpreter fills the cache. You changed your argument.

Need evidence that XPConnect should prime the cache -- could just pollute it. That is all small potatoes and not this bug, in any case.

> Also, it only updates
> the cache entry for the JSOP_SETNAME/PROP in question,

That's right, it's a polymorphic "call" site cache. What's the problem?

> which means that other
> shape-predicated things (other cache entries,

Other cache entries see the object in earlier or later shape at different pcs. Note that it's a different object for each earlier to later evolutionary phase, so you don't have the later object with the higher shape being seen as invalid from the earlier pc. Please turn on JS_PROPERTY_CACHE_METERING and have a look.

It's possible to make bad cases for the cache, as for trace-trees. That's not what this bug is about, and I haven't actually seen any in the benchmarks yet.

> global shape tests at trace
> entry,

Please read the patch in this bug:

+
+                // We are adding to the global object, so update its saved shape.
+                JS_ASSERT(obj == globalObj);
+                traceMonitor->globalShape = OBJ_SHAPE(obj);

> shape guards on trace

No, see above -- these follow the same evolution the interpreter sees when filling the cache by pc.

> ) are invalidated.

I'm not sure you understand how this stuff works. I hope this comment helps. If not, mail me.

If you know of a way to partially order shapes so that adding does not invalidate, I'm all ears. It would seem to require spelling shapes as numbers with a big enough radix that there's room to evolve (as if adding kids to a parent node in a tree) from shape X to multiple X', X'', etc. -- all with common high order bits. I know of no such way.

The hidden classes in V8 correspond to shapes, but allow for making subclass is-a superclass judgments that shapes do not allow. This is the "type" idea I mentioned in comment 3, but it should wait -- especially since we have zero evidence (including this bug!) of any problem with the current shape system in the benchmarks we are working on.

Evidence!

/be
(In reply to comment #7)
> (From update of attachment 337309 [details] [diff] [review])
> How is the nativeness of the global object protected at trace execution time?

Global objects must be native in SpiderMonkey. Lots of stuff counts on this, including js_FindClassObject and of course existing jstracer.cpp code including this line:

    if (OBJ_SCOPE(JS_GetGlobalForObject(cx, cx->fp->scopeChain))->title.ownercx != cx) {

This has been true forever.

This is why this bug's patch has

+    JS_ASSERT(OBJ_IS_NATIVE(globalObj));

Is it dev-doc-needed? Attempting to make a foreign object be a global object will fail spectacularly, better feedback than any doc :-P.

/be
I didn't intend to change my argument, but I guess I didn't express it clearly.  By "interpreter performance" I was referring to the cache as used by the interpreter, not that all mutations happened by the interpreter.  You're right to point out that we don't know that the DOM or other native-mutator cases are costing us.

Andreas and I saw in at least one benchmark that we were getting shape mismatches due to "extra" properties; I believe that it was on 3d-raytrace or another similar raytracer, which annotated some of its Triangle objects with an extra property or two for material/colour/whatever.  This caused us to fall off trace when the traced visitor function encountered an augmented Triangle, and I believe that the cache would miss similarly.  That led me to originally speculate that we could improve cache performance with a "major and minor" shape scheme, wherein the minor number increased as properties were added, and major as properties were deleted.  It may not be feasible to do so, as you point out, or even worthwhile if the number of shapes at a given location is small, and we can use multiple trees to cope in the traced case.

I had forgotten that we required natives for globals, indeed.  I wonder if the failure mode is clear enough for people to determine the solution without documentation, but I haven't seen it pop up otherwise on dev-tech-js-engine.
(In reply to comment #10)
> Andreas and I saw in at least one benchmark that we were getting shape
> mismatches due to "extra" properties; I believe that it was on 3d-raytrace or
> another similar raytracer, which annotated some of its Triangle objects with an
> extra property or two for material/colour/whatever.

This was a bug, I forget which. It could have been bug 454266.

> It may not be feasible to do so, as you
> point out, or even worthwhile if the number of shapes at a given location is
> small, and we can use multiple trees to cope in the traced case.

It's not necessary for most cases, where a new object evolves predictably and each pc sees the same before and after shapes. There is no shape mismatch; you get 99.999% cache hit rates.

Suspect cache misses, they are probably bugs in the cache and/or shape maintenance code. We shipped Firefox 3 with more than a few such bugs, but still won on perf at the time. More wins in 3.1.

Review soon? My mq is full!

/be
Comment on attachment 337309 [details] [diff] [review]
proposed fix

Haven't had a chance to test, but the patch itself looks fine.  If it tests well, r=shaver!
Comment on attachment 337309 [details] [diff] [review]
proposed fix

ahem.
Attachment #337309 - Flags: review?(shaver) → review+
After this patch, bitwise-and lost some loads, but it gained a call to BoxInt32 in the middle of the trace:

import vp=0x806924 name=$global0 type=int flags=0
import vp=0x806920 name=$global1 type=int flags=0
    state = param ecx
    param1 = param edx
    sp = ld state[0]
    rp = ld state[4]
    cx = ld state[12]
    gp = ld state[8]
    eos = ld state[JSVAL_ERROR_COOKIE]
    eor = ld state[20]
    ld1 = ld cx[152]
    ld2 = ld ld1[56]
    ld3 = ld ld2[0]
    sp[0] = ld2
    ld4 = ld cx[152]
    ld5 = ld ld4[56]
    ld6 = ld ld5[0]
    ld7 = ld gp[616]
    $global0 = i2f ld7
    sp[8] = ld7
    ld8 = ld gp[608]
    $global1 = i2f ld8
    sp[16] = ld8
    and1 = and ld7, ld8
    i2f1 = i2f and1
    sp[8] = and1
    ld9 = ld cx[152]
    ld10 = ld ld9[56]
    ld11 = ld ld10[0]
    gp[616] = and1
    BoxInt321 = BoxInt32 ( cx and1 )
    eq1 = eq BoxInt321, JSVAL_ERROR_COOKIE
    xt1: xt eq1 -> 0x30282f sp+16 rp+0

    ld12 = ld ld2[28]
    ld12[288] = BoxInt321
    add1 = add ld8, 1
    ov1 = ov add1
    xt2: xt ov1 -> 0x302833 sp+0 rp+0

    i2f2 = i2f add1
    sp[0] = ld8
    gp[608] = add1
    sp[0] = add1
    sp[8] = #0x927c0
    lt1 = lt add1, #0x927c0
    xf1: xf lt1 -> 0x30283e sp+16 rp+0

    sp[0] = lt1
    gp[616] = and1
    gp[608] = add1
    loop
False alarm -- it's 454039 that did that.
Yeah, this one was a good speedup for a bunch of benchmarks that use globals.

Sorry about the bug 454039 mess. New patch there tests well.

/be
Fixed on m-c:

http://hg.mozilla.org/mozilla-central/rev/4fe759fa6016

/be
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Flags: in-testsuite-
Flags: in-litmus-
You need to log in before you can comment on or make changes to this bug.