JM: Make tracing with fat values as fast as tracing with 32-bit values

RESOLVED FIXED

Status

()

RESOLVED FIXED
8 years ago
8 years ago

People

(Reporter: dmandelin, Unassigned)

Tracking

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 4 obsolete attachments)

(Reporter)

Description

8 years ago
We are about 6% slower as of this rev:

changeset:   43386:6f5c1e641cf1
tag:         qparent
user:        David Mandelin <dmandelin@mozilla.com>
date:        Fri Jun 11 11:29:04 2010 -0700
summary:     Make sure dense_grow gets inlined into the tracer functions that call it
(Reporter)

Updated

8 years ago
Depends on: 571625
(Reporter)

Updated

8 years ago
Depends on: 571974
(Reporter)

Updated

8 years ago
Depends on: 572029
(Reporter)

Comment 1

8 years ago
I tried a fatval+tracing browser against a TM browser (from the last TM rev merged into fatval) on the 4 big benchmarks:

                   FV        TM
SunSpider         544       671
V8 v5            1225      1445
Peacekeeper      3400      3351
Dromaeo           166       145

So, it seems to be good (higher is better on everything except SS). 

There is one thing I am still concerned with. On Peacekeeper, the FV build scored 20-30% slower on the canvas section of the test. As part of getting tracing to work, I took out the traceable natives for gfx stuff, because supposedly the standard quickstubs should work fine from trace. So we might have slowed down there either because that's not true, or because the traceable natives are much faster to call (the latter seems unlikely, because they mostly delegate to the standard version). I'll look into it.
(Reporter)

Comment 2

8 years ago
Luke caught a typo in the last one. I typed in the "splay" subscore for V8 in TM. Here are the correct results:

                   FV        TM
SunSpider         544       671
V8 v5            1225       812
Peacekeeper      3400      3351
Dromaeo           166       145
> because supposedly the standard quickstubs should work fine from trace.

They don't.  They give you the worst of all worlds: they trace and then deep bail on every invocation due to doing JS_THIS_OBJECT stuff.  We have bugs on this.

Comment 4

8 years ago
Alright. That explains it then. Have to fix the traceable native stubs.
(Reporter)

Comment 5

8 years ago
(In reply to comment #4)
> Alright. That explains it then. Have to fix the traceable native stubs.

Actually, removing the traceable natives from a TM build didn't affect the Peacekeeper score, so that change has some other cause.

Still have to revive the traceable natives of course, per bz.
David, is there a particular canvas subtest we get slower on?
(Reporter)

Updated

8 years ago
Depends on: 572593
(Reporter)

Comment 7

8 years ago
(In reply to comment #6)
> David, is there a particular canvas subtest we get slower on?

Is there a way to show that? When I run Peacekeeper, it just gives me a total score, which I can click on to get 6 subscores. But I don't see a way to get scores for the individual tests inside the subscores.

Do we have source code for the Peacekeeper tests?

Comment 9

8 years ago
peterv did the code originally.
Note that if the only TNs which got removed were webgl ones, then the lack of effect is expected; peacekeeper doesn't test webgl.
(In reply to comment #10)
> Note that if the only TNs which got removed were webgl ones, then the lack of
> effect is expected;

It looks like all traceable natives were disabled: http://hg.mozilla.org/users/lwagner_mozilla.com/fatval/rev/40d066081654

Comment 12

8 years ago
peter, dave disabled tracable natives because jsvals are now 64-bit and should be passed by reference, not by value.
Quickstubs use jsval for two things, arguments and return values.

For arguments we take jsval when we mean |JSObject* or null or undefined| (xpc_qsUnwrapObj translates undefined into null). On the fatval branch it looks like we now can only define a traceable native that takes a non-null JSObject*?

For return values we often return a JSVAL_VOID because we don't want to return anything. I don't see a way to define a traceable native that doesn't have a return value? (There's also strings, but I'll have to check with bent why those don't return JSString*)
(Reporter)

Comment 14

8 years ago
(In reply to comment #13)
> Quickstubs use jsval for two things, arguments and return values.
> 
> For arguments we take jsval when we mean |JSObject* or null or undefined|
> (xpc_qsUnwrapObj translates undefined into null). On the fatval branch it looks
> like we now can only define a traceable native that takes a non-null JSObject*?

In fatvals, the jsval-related argument types in builtins.h are VALUEPTR 
(Value *) and CVALUEPTR (const Value *). 'Value' is bit-for-bit identical with 'jsval'. We changed other builtins (e.g. in builtins.cpp) to use that new type.

The thing *I* don't understand about jsval parameters in the WebGL traceable natives is who calls these functions exactly and what parameter types are OK for the caller. In other words, what you have to change if you change the signature.

> For return values we often return a JSVAL_VOID because we don't want to return
> anything. 

Presumably UINT32 would work here.

> I don't see a way to define a traceable native that doesn't have a
> return value? (There's also strings, but I'll have to check with bent why those
> don't return JSString*)

I don't see anything other than JSVAL_VOID returned in CustomQS_WebGL.h. For some reason, I previously thought I had seen functions that return jsvals read out of arrays, but now I don't see that. So the return type problem may not even exist, which would be great.

Updated

8 years ago
Depends on: 572798
(Reporter)

Updated

8 years ago
Depends on: 572807
(In reply to comment #14)
> In fatvals, the jsval-related argument types in builtins.h are VALUEPTR 
> (Value *) and CVALUEPTR (const Value *). 'Value' is bit-for-bit identical with
> 'jsval'. We changed other builtins (e.g. in builtins.cpp) to use that new type.

But when I tried using VALUEPTR and CVALUEPTR with the quickstubs code I got errors, because pch and ach are --.

> I don't see anything other than JSVAL_VOID returned in CustomQS_WebGL.h.

CustomQS_WebGL.h are just a small part of the quickstubs. Most of them are autogenerated, see OBJDIR/js/src/xpconnect/src/dom_quickstubs.cpp

Comment 16

8 years ago
VALUEPTR can be used for arguments, but not for return values (currently). That should be enough for the quickstubs stuff though. Just use INT32 as return value for now and ignore the return value. Outparam for the return jsval if its not JSVAL_VOID.
Created attachment 452094 [details] [diff] [review]
WIP

This is how far I've gotten, doesn't compile yet. I think I managed to do what gal told me to for js::Value arguments, those should work now.
But I'm not entirely sure how to proceed for the js::Value out param though. It seems like the signature of the native should change to take an additional js::Value* argument which should be set to a js::Value allocated on the stack. I'm not sure what type the return value of the native should be though. I also can't figure out how to change the JS_DEFINE_CALLINFO_ macros to add the additional argument based on the type of the return value.
(Reporter)

Comment 18

8 years ago
(In reply to comment #17)
> Created an attachment (id=452094) [details]
> WIP
> 
> This is how far I've gotten, doesn't compile yet. I think I managed to do what
> gal told me to for js::Value arguments, those should work now.
> But I'm not entirely sure how to proceed for the js::Value out param though. It
> seems like the signature of the native should change to take an additional
> js::Value* argument which should be set to a js::Value allocated on the stack.
> I'm not sure what type the return value of the native should be though. I also
> can't figure out how to change the JS_DEFINE_CALLINFO_ macros to add the
> additional argument based on the type of the return value.

It looks to me like all the _tn functions are really returning one of: a JSObject*, a JSString*, or void. If so, then I think we could just change the return type, change how the return value is converted, and remove JSTN_UNBOX_AFTER from their flags. It seems that error reporting is done via a side channel, so the exact return value in the error case doesn't matter. For voids, uint32 is fine.

If some _tn function really does need to return a jsval, then I think we can proceed as follows:

  - change the return type to uint32 (void once we can support it), and always
    return 0.
  - add a Value* argument at the end of the signature.
  - add VALUEPTR to the end of the CALLINFO/TRCINFO argument list.
  - create a way for the recorder to know a traceable native has an outparam
    like this, perhaps a new flag.
  - add a new member, similar to native_rval_ins, to remember the LIns for
    the Value* outparam between recording the call and recording call
    completion.
  - In record_NativeCallComplete, if the flag is set, unbox from the outparam
    LIns instead of native_rval_ins.
Note that DOMStrings can be null, returning null in that case would be ok?
Created attachment 452276 [details] [diff] [review]
WIP

This crashes, still trying to figure out why.
Attachment #452094 - Attachment is obsolete: true
It consistently crashes at browser.js:4528@77:

entering trace at chrome://browser/content/browser.js:4528@77, native stack slots: 15 code: 0x177dfbd0

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x177dfd28 in ?? ()

It looks related to the getElementById quickstub. I can start up the browser, I just have to put a breakpoint in that quickstub and continue repeatedly.
Here's the crash:

undefined Looking for compat peer 4528@77, from 0x1b0e954 (ip: 0x19df981)
checking vm types 0x1b0e954 (ip: 0x19df981): args0=F/F args1=O/O args2=O/O arguments0=N/N scopeChain0=O/O var0=O/O var1=O/O var2=S/S var3=U/U stack0=O/O global0=O/O 
args0: function<0x15e6f0e0:nonBrowserWindowStartup> args1: object<0x16774620:ChromeWindow> args2: object<0x167a3fc0:Event> arguments0: null scopeChain0: object<0x16701e00:ChromeWindow> var0: object<0x167a7000:Array> var1: object<0x167a7ce8:XULElement> var2: string<0x2af5f0> var3: undefined stack0: object<0x167a7038:Iterator> global0: object<0x167829a0:XULDocument> 
entering trace at chrome://browser/content/browser.js:4528@77, native stack slots: 15 code: 0x177dfde5

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x177dff3d in ?? ()


The disassembly:

0x177dff15:	call   0x12b8d685 <_ZL32nsIDOMDocument_GetElementById_tnP9JSContextP8JSObjectS2_P8JSString>
0x177dff1a:	sub    $0x8,%esp
0x177dff1d:	mov    %eax,%edx
0x177dff1f:	mov    %edx,0x8(%edi)
0x177dff22:	movl   $0x0,0x1fc(%ebx)
0x177dff2c:	mov    0x4c(%esi),%eax
0x177dff2f:	test   %eax,%eax
0x177dff31:	jne    0x177eff33
0x177dff37:	mov    %edx,-0x18(%edi)
0x177dff3a:	mov    %edx,0x8(%edi)
0x177dff3d:	mov    (%edx),%ecx


I think this is the corresponding JIT spew:

                            RR edx(nsIDOMDocument_GetElementById_tn1) ebx(cx) esi(state) edi(sp)
                            AR 4(edi) 8(esi) 12(ebx) 16(state) 20(ldi3) 24($stack9)
      sti.s sp[-24] = nsIDOMDocument_GetElementById_tn1
  00177dff37   mov -24(edi),edx
                            RR edx(nsIDOMDocument_GetElementById_tn1) ebx(cx) esi(state) edi(sp)
                            AR 4(edi) 8(esi) 12(ebx) 16(state) 20(ldi3) 24($stack9)
      sti.s sp[8] = nsIDOMDocument_GetElementById_tn1
  00177dff3a   mov 8(edi),edx
                            RR edx(nsIDOMDocument_GetElementById_tn1) ebx(cx) esi(state) edi(sp)
                            AR 4(edi) 8(esi) 12(ebx) 16(state) 20(ldi3) 24($stack9)
      map3 = ldi.o nsIDOMDocument_GetElementById_tn1[0]
  00177dff3d   mov ecx,0(edx)
                            RR ecx(map3) edx(nsIDOMDocument_GetElementById_tn1) ebx(cx) esi(state) edi(sp)
                            AR 4(edi) 8(esi) 12(ebx) 16(state) 20(ldi3) 24($stack9)
      shape3 = ldi.o map3[4]
  00177dff3f   mov eax,4(ecx)
                            RR eax(shape3) edx(nsIDOMDocument_GetElementById_tn1) ebx(cx) esi(state) edi(sp)
                            AR 4(edi) 8(esi) 12(ebx) 16(state) 20(ldi3) 24($stack9)
      immi5 = immi 61318
                            RR eax(shape3) edx(nsIDOMDocument_GetElementById_tn1) ebx(cx) esi(state) edi(sp)
                            AR 4(edi) 8(esi) 12(ebx) 16(state) 20(ldi3) 24($stack9)
      guard_kshape2 = eqi shape3, immi5/*61318*/
                            RR eax(shape3) edx(nsIDOMDocument_GetElementById_tn1) ebx(cx) esi(state) edi(sp)
                            AR 4(edi) 8(esi) 12(ebx) 16(state) 20(ldi3) 24($stack9)
      xf7: xf guard_kshape2 -> pc=0x19df9a3 imacpc=0x0 sp+16 rp+0 (GuardID=013)
  00177dff42   cmp eax,61318
  00177dff47   jne   0x177eff4b


Both ecx and edx are null:

eax            0x0	0
ecx            0x0	0
edx            0x0	0
ebx            0x19aa200	26911232
esp            0xbfffba30	0xbfffba30
ebp            0xbfffba68	0xbfffba68
esi            0xbfffbae8	-1073759512
edi            0x1890a48	25758280
eip            0x177dff3d	0x177dff3d
eflags         0x10246	66118
cs             0x17	23
ss             0x1f	31
ds             0x1f	31
es             0x1f	31
fs             0x0	0
gs             0x37	55

dmandelin, gal: any clue what might be going on?
(Reporter)

Comment 23

8 years ago
(In reply to comment #22)
> The disassembly:
> 0x177dff15:    call   0x12b8d685
> <_ZL32nsIDOMDocument_GetElementById_tnP9JSContextP8JSObjectS2_P8JSString>
> 0x177dff1a:    sub    $0x8,%esp
> 0x177dff1d:    mov    %eax,%edx

Return value from nsIDOMDocument_GetElementById_tn is now in edx.

> 0x177dff1f:    mov    %edx,0x8(%edi)

Store return value to trace native stack.

> 0x177dff22:    movl   $0x0,0x1fc(%ebx)
> 0x177dff2c:    mov    0x4c(%esi),%eax
> 0x177dff2f:    test   %eax,%eax
> 0x177dff31:    jne    0x177eff33

Check for deep bail/failure?

> 0x177dff37:    mov    %edx,-0x18(%edi)
> 0x177dff3a:    mov    %edx,0x8(%edi)

Store return value to stack again (don't know why). Also store it to some local variable (probably the next action in the JS source).

> 0x177dff3d:    mov    (%edx),%ecx

Load obj->map; crash because obj == 0.

I think the problem is just a missing null check on the return value of the call. If the type during recording is an object (or string), then the tracer will assume it is always non-null (at trace run time). This is what it should do, but it needs a guard to verify that.
Note that the peacekeeper slowdown is not related to the custom tn's for WebGL; it only tests the 2D canvas, so it's probably just due to the autogenerated tn's.
(In reply to comment #23)
> I think the problem is just a missing null check on the return value of the
> call. If the type during recording is an object (or string), then the tracer
> will assume it is always non-null (at trace run time). This is what it should
> do, but it needs a guard to verify that.

I tried adding this to TraceRecorder::record_NativeCallComplete:

    } else if (v.isObjectOrNull() || v.isString()) {
        if (pendingSpecializedNative->builtin->returnType() == ARGTYPE_P) {
            guard(v.isNull(),
                  addName(lir->insEqP_0(v_ins), "guard(nullness)"),
                  BRANCH_EXIT);
        }

but then I crash in JS_OBJ_IS_FUN_IMPL:

#0  0x0018c3af in JS_OBJ_IS_FUN_IMPL (obj=0x0)
#1  0x00199d65 in js::Value::setNonFunObj (this=0x141560c0, arg=@0x0)
#2  0x002002bc in js::NativeToValue (cx=0x125f000, v=@0x141560c0, type=js::TT_OBJECT, slot=0x10bd050)
#3  0x002301fd in js::FlushNativeStackFrameVisitor::visitStackSlots (this=0xbfffb958, vp=0x141560c0, count=2, fp=0x14156038)
#4  0x00200a5b in js::VisitFrameSlots<js::FlushNativeStackFrameVisitor> (visitor=@0xbfffb958, cx=0x125f000, depth=0, i=@0xbfffb900, up=0x0)
#5  0x00200b32 in js::VisitStackSlots<js::FlushNativeStackFrameVisitor> (visitor=@0xbfffb958, cx=0x125f000, callDepth=0)
#6  0x00200ba1 in js::FlushNativeStackFrame (cx=0x125f000, callDepth=0, mp=0x1b0b5d8, np=0x10bd000, stopFrame=0x0, ignoreSlots=0)
#7  0x00201cbd in js::LeaveTree (tm=0x10afe88, state=@0xbfffbaf8, lr=0x1b0b584)

Comment 26

8 years ago
Do any of these quickstubs have a return type of JSObject* and return both function objects and non-function objects?  (That seems likely.)  If so, then (like the nullness guard) the tracer would need to guard that the returned object either was or was not a function object, rather like the nullness check you just added.

Another question, perhaps @ Dave or Andreas: is there any chance that, if we are adding these nullness/functionness checks for all specialized natives with return type ARGTYPE_P that we are going to be penalizing specialized natives which return ARGTYPE_P but don't need any check?  If so, is there a better way to only do these checks when necessary, perhaps introduce some new return type that is only used by these quickstubs?
No, I don't think any of the quickstubs return function objects.
Note that it looks like it's crashing because NativeToValue expects a JSObject (TT_OBJECT) but it gets null.

Comment 29

8 years ago
(In reply to comment #28)
Ah, I missed that, thanks.
(Reporter)

Comment 30

8 years ago
I think the latest crash is due to deep bails. The snapshot for the exit will say that the type is TT_OBJECT (for example, in the crash above) if that was the type at record time. But if there is a deep bail, then the return value may be NULL, which is not compatible with TT_OBJECT in ValueToNative. 

This problem existed before, where traceable natives could return a jsval: when they deep-bailed, they left the jsval on the stack. The type of that value was recorded as TT_JSVAL, which was then handled specially by LeaveTree.

So, I think we just need to arrange for these new values to be handled specially in the same way. The direct parallel would be to introduce new trace-types for string-or-null and nonfunobj-or-null, and then handle them in NativeToValue. It might also work to make the object and string cases accept null there, but that would lose us some assertion checking in the more normal case where nulls are not allowed.

Comment 31

8 years ago
I'm in favor of introducing new alternatives to TT_JSVAL and keeping the NativeToValue cases for the "normal" jsval types as regular as possible since this allows us to use faster boxing when the trace type falls in this range.
Can one of you guys fix that? I tried looking at how to do it (in TraceRecorder::snapshot?), but I'm a bit lost.
(Reporter)

Comment 33

8 years ago
(In reply to comment #32)
> Can one of you guys fix that? I tried looking at how to do it (in
> TraceRecorder::snapshot?), but I'm a bit lost.

Can you post your latest page to start from?
(Reporter)

Comment 34

8 years ago
(In reply to comment #33)
> (In reply to comment #32)
> > Can one of you guys fix that? I tried looking at how to do it (in
> > TraceRecorder::snapshot?), but I'm a bit lost.
> 
> Can you post your latest page to start from?

Oh, and also, for the crashes above, is there a specific test case, or is that just on startup?
> but that would lose us some assertion checking in the more normal case where
> nulls are not allowed

I'd think that null is allowed in all cases where a DOM method returns an object...
(Reporter)

Comment 36

8 years ago
(In reply to comment #35)
> > but that would lose us some assertion checking in the more normal case where
> > nulls are not allowed
> 
> I'd think that null is allowed in all cases where a DOM method returns an
> object...

I chose my wording hastily. By "normal case", I meant "operations other than calling traceable natives".
Created attachment 453119 [details] [diff] [review]
Allow null return value

Actually, this seems to work, I have no idea if this is the right approach (flag on the traceable native). Of course, it seems Luke just deleted the TraceType_ enum so this needs to be merged :-).
Created attachment 453120 [details] [diff] [review]
Fix quickstubs

With attachment 453119 [details] [diff] [review] and this one I can start up a browser without crashing. Haven't done any testing beyond that.
Attachment #452276 - Attachment is obsolete: true
(In reply to comment #34)
> Oh, and also, for the crashes above, is there a specific test case, or is that
> just on startup?

The crashes were just on startup.
(Reporter)

Comment 40

8 years ago
I'm going to try to merge this today, unless you tell me not to.
Go ahead.
(Reporter)

Comment 42

8 years ago
Created attachment 453273 [details] [diff] [review]
Rebased patch with a fix or two

This combines the previous two patches and rebases them. It crashes for me on the jQuery Mochitest (but passes the first 60000, at least).
Created attachment 453413 [details] [diff] [review]
Rebased patch with more fixes

Attachment 453273 [details] [diff] didn't work for me, but this one does. It inserts the guard for objects too (the previous patch tested |flags & (JSTN_RETURN_NULLABLE_STR | JSTN_RETURN_NULLABLE_STR)|) and makes quickstubs use STRING_OR_NULL as a return type. In my local build this passes the JQuery tests in dom/tests.
Attachment #453119 - Attachment is obsolete: true
Attachment #453120 - Attachment is obsolete: true
(Reporter)

Comment 44

8 years ago
(In reply to comment #43)
> Created an attachment (id=453413) [details]
> Rebased patch with more fixes
> 
> Attachment 453273 [details] [diff] didn't work for me, but this one does. It inserts the guard
> for objects too (the previous patch tested |flags & (JSTN_RETURN_NULLABLE_STR |
> JSTN_RETURN_NULLABLE_STR)|) and makes quickstubs use STRING_OR_NULL as a return
> type. In my local build this passes the JQuery tests in dom/tests.

Nice catches. This passes trace-tests, jstests, and mochitest-plain, so I think it's basically ready to land, but I am going to do an opt build and make sure it gets us our perf.
(Reporter)

Comment 45

8 years ago
OK. This brings our Peacekeeper canvas perf back up, and our overall score might even be up a bit too (the canvas tests are excluded from the total, but maybe some other ones benefit from traceable natives). 

Thanks for all your help, Peter. It was fun tag-teaming the patch over the timezones.

http://hg.mozilla.org/users/lwagner_mozilla.com/fatval/rev/fbc857fbd43c
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
(Reporter)

Updated

8 years ago
No longer depends on: 572798
You need to log in before you can comment on or make changes to this bug.