Closed Bug 463238 Opened 13 years ago Closed 13 years ago

TM: Support calling arbitrary JSFastNatives from trace

Categories

(Core :: JavaScript Engine, defect, P1)

Other Branch
defect

Tracking

()

RESOLVED FIXED
mozilla1.9.1b4

People

(Reporter: jorendorff, Assigned: gal)

References

Details

(Keywords: fixed1.9.1, Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 7 obsolete files)

Bug 462027 aims to make it safe to call arbitrary native code from trace.

Once it's safe, we should trace fast natives.  (And then getters and setters--separate bug--and then we can check to see if we're tracing the DOM already or what.)
I need someplace to build the vp vector.

My (vague) plan is to put it on the JS stack, exactly where it would be if we were in the interpreter.  As part of bug 462021, I'll pre-allocate all the JS stack space I could possibly need in js_ExecuteTree.
Depends on: 463153
Here's what's left, to finish this.

1. I need to emit code to calculate, when calling a JSFastNative, the address in the interpreter stack that corresponds to the location of the arguments on the native stack. (It doesn't sound hard to me, but a very similar task took me days to get right in bug 462027. So.)

2. When we deep-bail from a JSFastNative, we LeaveTree in pre-call state. But we need to *not* clobber the interpreter stack locations corresponding to vp with values from the native stack. The native may have changed those values.[*] My plan for now is to add another fake tag next to JSVAL_BOXED and JSVAL_TNULL. (My theory is that we are not actually limited to 8 typemap values, but I haven't looked closely.)

[*] It is in fact totally normal to do this; see e.g. https://developer.mozilla.org/En/SpiderMonkey/JSAPI_Reference/JS_ConvertArguments .
Task #2 was easy, but task #1 looks harder now.  I can emit a constant for the offset of the vp array in the *current tree's* stack, but since we might be in a nested tree, we have to calculate the total offset from the callstack--information that is only available at run time.  :-P
Do we want this feature?

We can instead modify qsgen.py to generate traceable fast natives.  That would run faster, too.
This is preferable for non-Firefox embedders, though, is it not?  Not that that consideration should trump expediency....
Andreas and I were talking about a generic traceable native dispatcher that could call any fast native, boxing as needed and assuming fallibility. Better coverage, seems doable. Any reason why not?

/be
Making qsgen.py emit real FASTCALL versions of the quick stubs would avoid a lot of unnecessary boxing and unboxing at run time.  It might be significantly faster, or not.  It's less general but probably general enough for our purposes, and it's closer to what we *really* want to do which is emit those FASTCALL functions using NanoJIT.

In comments 1-3 I was trying too hard to preserve the invariant that when we're not on trace, the interpreter state is exactly what it would be if you had never gone on trace.  The way you describe in comment 6 will, in the case of a deep bail, leave the vp vector (part of the interpreter stack) in the pre-call state (see comment 2 item #2).  And that's OK.
I agree it's ok (making sure I'm following here, correct me if I'm confused) to let a native canonicalize args, store conversions in local argv roots, maybe even compute a fortune cookie and overwrite an incoming arg with it. Caveat native, and I don't know of any bad fortune cookie scenarios. Canonicalization and local GC rooting of conversions should happen once, the effects should "stick".

Hope this is what you meant.

/be
I don't know that it's worth puzzling out what I meant, beyond "gee, I wish I had thought of that a month or two ago".

Anyway, if I can make getters and setters work, I'll go after this as Brendan suggests in comment 6.  Note that it is not exactly how Ben Turner, Andreas, and I decided to go about it; but it might be easier.
#6 was intended as the fallback if no fast path is available. It might make sense to implement it first though, and then generate the FASTCALL versions.
We should have a generic fallback, and then make type-specific, method-specific traceable natives as needed. Otherwise the coverage problem will dog us forever, by which I mean we'll find an indefinite supply of benchmarks where we fall off trace and underperform, compared to falling back on a generic native dispatcher.

/be
This bug will do 2 things

1) Generate some straight forward code on trace to call arbitrary JSFastNatives as a fallback if we don't find a matching traceable native.

2) Remove all *_tn implementations that are not clearly a performance win. For example, if a FASTCALL implementation merely wraps an existing fast native right now, its likely not a perf win and we can generate that wrapper code on trace instead.
Assignee: general → gal
Flags: blocking1.9.1?
Priority: -- → P1
Summary: TM: Call JSFastNatives from trace → TM: Support calling arbitrary JSFastNatives from trace
Attached patch v1 (obsolete) — Splinter Review
This patch reorganizes the code a bit so we can add the generic JSFastNative argument marshaling code.
Attached patch v2 (obsolete) — Splinter Review
x.js:

for (var i = 0; i < 10; ++i)
    print(5.5);

whale:src gal$ TRACEMONKEY=stats ./Darwin_DBG.OBJ/js -j x.js
5.5
5.5
5.5
5.5
5.5
5.5
5.5
5.5
5.5
5.5
recorder: started(1), aborted(0), completed(1), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(0)
monitor: triggered(1), exits(1), type mismatch(0), global mismatch(0)

Generated code:

              mov eax,152(ebx)                ebx(cx) esi(sp) edi(state)
              mov eax,60(eax)                 eax(ld2) ebx(cx) esi(sp) edi(state)
              mov 0(esi),2737528              eax(ld3) ebx(cx) esi(sp) edi(state)
              mov 8(esi),eax                  eax(ld3) ebx(cx) esi(sp) edi(state)
              mov -16(ebp),0                  eax(ld3) ebx(cx) esi(sp) edi(state)
              mov -12(ebp),1075183616         eax(ld3) ebx(cx) esi(sp) edi(state)
              movq xmm0,-16(ebp)              eax(ld3) ebx(cx) esi(sp) edi(state)
              mov 16(esi),0                   eax(ld3) ebx(cx) esi(sp) edi(state) xmm0(#0x40160000:0)
              mov 20(esi),1075183616          eax(ld3) ebx(cx) esi(sp) edi(state) xmm0(#0x40160000:0)
              mov -16(ebp),2737528            eax(ld3) ebx(cx) esi(sp) edi(state) xmm0(#0x40160000:0)
              mov -12(ebp),eax                eax(ld3) ebx(cx) esi(sp) edi(state) xmm0(#0x40160000:0)
              sub esp,8                       ebx(cx) esi(sp) edi(state) xmm0(#0x40160000:0)
              sub esp,8                       ebx(cx) esi(sp) edi(state) xmm0(#0x40160000:0)
              movq 0(esp),xmm0                ebx(cx) esi(sp) edi(state) xmm0(#0x40160000:0)
              mov ecx,ebx                     ebx(cx) esi(sp) edi(state)
              call js_BoxDouble               ebx(cx) esi(sp) edi(state)
              add esp,8                       ebx(cx) esi(sp) edi(state)
              mov ecx,eax                     eax(js_BoxDouble1) ebx(cx) esi(sp) edi(state)
              lea eax,-16(ebp)                ebx(cx) esi(sp) edi(state)
              cmp ecx,16                      eax(alloc1) ecx(js_BoxDouble1) ebx(cx) esi(sp) edi(state)
              je 0x2a7fcd                     eax(alloc1) ecx(js_BoxDouble1) ebx(cx) esi(sp) edi(state)
              mov -8(ebp),ecx                 eax(alloc1) ecx(js_BoxDouble1) ebx(cx) esi(sp) edi(state)
              mov 480(ebx),2490708            eax(alloc1) ebx(cx) esi(sp) edi(state)
              sub esp,4                       eax(alloc1) ebx(cx) esi(sp) edi(state)
              push eax                        eax(alloc1) ebx(cx) esi(sp) edi(state)
              push 1                          ebx(cx) esi(sp) edi(state)
              push ebx                        ebx(cx) esi(sp) edi(state)
              call JSFastNative               ebx(cx) esi(sp) edi(state)
              add esp,16                      ebx(cx) esi(sp) edi(state)
              mov ecx,edi                     ebx(cx) esi(sp) edi(state)
              mov 0(esi),eax                  eax(JSFastNative1) ecx(state) ebx(cx) esi(sp)
              mov 480(ebx),0                  ecx(state) ebx(cx) esi(sp)
              mov ebx,484(ebx)                ecx(state) ebx(cx) esi(sp)
              test ebx,ebx                    ecx(state) ebx(ld5) esi(sp)
              jne 0x2a7fdc                    ecx(state) esi(sp)

We crash somewhere half-way through trace-tests. The patch is by no means complete, but it definitively works as advertised and looks very promising.

We will have to compare performance between fastcall quickstubs and marshaling with this patch. It might still be worth it. On the other hand only doubles are really expensive to box, and that DOM always never needs.

This patch does not yet remove any existing _tn wrappers. I will do that in a separate patch once everything is done and landed, and only if I can prove that its not a perf loss (I suspect it might be a small perf win, actually).
Attachment #367964 - Attachment is obsolete: true
Note: I only intend to remove _tn wrappers that are not actually a fast path and were merely added "for the sake of staying on trace". That we can much better with this patch. The goal is to only code special cases for frequently executed relevant fast paths. Everything else we fall back on this.

Note 2: Everything except constructors and eval are fast natives, so this really traces almost everything we possibly care about. Constructors still need work.
Attached patch v3 (obsolete) — Splinter Review
v3 adds proper unboxing of the return value and the boolean return value of the fast native is propagated into builtinStatus.

This runs SS, and a couple other test cases but still dies in trace-tests.js.

Beyond that we also need some further work on:
- converting Object arguments into strings using an imacro where needed (some string functions need this, we can't do this from inside the native on trace)
- deal with the funny String object vs JSString business. This is simply untested but shouldn't be hard.

More work on this tomorrow.
Attachment #367974 - Attachment is obsolete: true
Flags: wanted1.9.1+
Flags: blocking1.9.1?
Flags: blocking1.9.1-
Sayrer is right. Not blocking 1.9.1. I meant to ask for wanted 1.9.1, which is what I got.
tracking-fennec: --- → ?
The patch dies on this code:

function testObjectOrderedCmp()
{
    var a = new Array(5);
    for(var i=0;i<5;++i) a[i] = ({} < {});
    return a.join(",");
}
testObjectOrderedCmp();

whale:src gal$ ./Darwin_DBG.OBJ/js -j x.js
x.js:4: TypeError: Function.prototype.toString called on incompatible object
whale:src gal$
Attached patch v4 (obsolete) — Splinter Review
Passes trace-tests.
Attachment #367980 - Attachment is obsolete: true
Comment on attachment 368166 [details] [diff] [review]
v4

So this breaks at least one page:

http://www.chromeexperiments.com/hosted/gravity/index.html

Mousing over the page triggers the gravity effect without this patch but there are lots of 'undefined function' errors in the console when this patch is applied.
For what it's worth, Ben Turner is working hard on tracing DOM access:

https://bugzilla.mozilla.org/show_bug.cgi?id=trace-into-dom

and is going through contortions to get the right #defines necessary to include jsbuiltins.h outside of js/src.  So a proper solution here would be appreciated and useful immediately.
Attached patch refreshed (obsolete) — Splinter Review
Attachment #368166 - Attachment is obsolete: true
Attached patch v5, fix return value processing (obsolete) — Splinter Review
Attachment #368343 - Attachment is obsolete: true
Duplicate of this bug: 478526
Comment on attachment 368422 [details] [diff] [review]
v5, fix return value processing

Passes all mochitests!
Passes mochi tests. The traceable quickstubs are already in, so we already have the same bug surface for the DOM side. This patch adds full support for all traceable natives. The upside is no more perf nightmares due to missing natives in the engine. The downside will be mostly limited to engine builtins going wrong as DOM going wrong the traceable quickstubs already bring into play. In other words I think we should take this for b4.

Outstanding issues:
- trace setters/getters (WIP, easier than this, shouldn't increase risk since the general risk is calling all this stuff in general, a bit more due to setters/getters shouldn't create more hazards)
- avoid side-exiting by doing object2string and defaultvalue for object arguments early (not a priority I think, I don't see many aborts due to this)
Flags: blocking1.9.1- → blocking1.9.1?
Target Milestone: --- → mozilla1.9.1b4
Comment on attachment 368422 [details] [diff] [review]
v5, fix return value processing

FYI in order to build browser this patch needs to remove JSTraceableNative from jsprvtd.h as it was added to jspubtd.h
Attachment #368422 - Attachment is obsolete: true
Attachment #368627 - Flags: review?(brendan)
Attachment #368455 - Attachment is obsolete: true
Attachment #368629 - Flags: review?(brendan)
Comment on attachment 368627 [details] [diff] [review]
refreshed fast native patch

> #define JS_FS(name,call,nargs,flags,extra)                                    \
>-    {name, call, nargs, flags, extra}
>+    {name, call, nargs, flags, extra }

Undo gratuitous change.

>+typedef struct JSCallInfo {
>+    JSWord callinfo[2];
>+#ifdef DEBUG
>+    JSWord debug;
>+#endif
>+} JSCallInfo;

s/JSWord/jsword/g -- don't use JSWord, JSInt32, JSUint64, etc.

Could try to use same as Nanojit here: uintptr_t?

>+    this->generatedTraceableNative = new JSTraceableNative();

Why does this need to be new'ed and delete'd? Why not a struct member?

>+      case FAIL_NEG:
>+      {
>+        res_ins = lir->ins1(LIR_i2f, res_ins);
>+        guard(false, lir->ins2(LIR_flt, res_ins, lir->insImmq(0)), OOM_EXIT);
>+        break;
>+      }

Now that this code moves and starts new blame history, please remove the unnecessary braces.

>+    jsval& fval = stackval(0 - (2 + argc));
>+    jsval& tval = stackval(0 - (argc + 1));

0 - (1 + argc) to match style above and previously.

>+TraceRecorder::callNative(JSFunction* fun, uintN argc, bool constructing)
>+{
>+    if (fun->flags & JSFUN_TRACEABLE)
>+        if (callTraceableNative(fun, argc, constructing))
>+            return true;

Brace multi-line then (outer then).

>+    alloca_ins = lir->insAlloc((2 + argc) * sizeof(jsval));

Nit: is alloca name appropriate here? args_ins? argv_ins? vp_ins? Maybe invokevp_ins.

>+    CallInfo* ci = (CallInfo*)lir->skip(sizeof(struct CallInfo))->payload();

Nit: space after cast for readability.

>+    jsval& tval = stackval(0 - (argc + 1));

0 - (1 + argc)

>+    JS_ASSERT(*cx->fp->regs->pc == JSOP_CALL ||
>+              *cx->fp->regs->pc == JSOP_APPLY);

Is js_GetOpcode required here? I guess so. Elsewhere to?

/be
Comment on attachment 368629 [details] [diff] [review]
remove some traceable natives from jsstr.cpp

Wunderbar!

/be
Attachment #368629 - Flags: review?(brendan) → review+
(In reply to comment #31)
> >+    this->generatedTraceableNative = new JSTraceableNative();
> 
> Why does this need to be new'ed and delete'd? Why not a struct member?

This comment was overlooked, waahhh.

/be
Was not overlooked. Include dependencies. Callinfo not visible can only use the pointer in .h
That's fine: JSCallInfo has only word members, but these can hold pointers into the TraceRecorder at the generatedTraceableNative member-struct.

My point is that there's no cause for dynamic allocation of store pointed at by the current pointer typed member of that same name.

/be
Comment on attachment 368627 [details] [diff] [review]
refreshed fast native patch

Thought I r+'ed this already.

/be
Attachment #368627 - Flags: review?(brendan) → review+
Depends on: 484773
http://hg.mozilla.org/mozilla-central/rev/1c6be1c210b9
Status: NEW → RESOLVED
Closed: 13 years ago
Flags: blocking1.9.1? → blocking1.9.1+
Resolution: --- → FIXED
I think this has a GC hazard.  The JSFastNative being called can modify vp in place, deep-bail, and then GC.
tracking-fennec: ? → ---
You need to log in before you can comment on or make changes to this bug.