Closed Bug 385393 (jsfastnative) Opened 14 years ago Closed 14 years ago

array.push(b) slower than a[a.length] = b;

Categories

(Core :: JavaScript Engine, defect, P3)

defect

Tracking

()

VERIFIED FIXED
mozilla1.9alpha8

People

(Reporter: moco, Assigned: brendan)

References

Details

(Keywords: perf)

Attachments

(6 files, 37 obsolete files)

812 bytes, patch
Details | Diff | Splinter Review
6.82 KB, text/plain
Details
449.50 KB, patch
brendan
: review+
Details | Diff | Splinter Review
16.81 KB, patch
Details | Diff | Splinter Review
3.07 KB, text/plain
Details
2.87 KB, text/plain
Details
a.push(b) slower than a[a.length] = b;

While working on a places performance bug (#385245), I noticed a measurable gain when in our treeView.js when I did:

-      aVisible.push(curChild);
+      aVisible[aVisible.length] = curChild;

The gain was noticed when doing this in a loop for about 10,000 history visits.  (Note, this issue is *not* the cause of the perf problem with the history sidebar.)

I was using "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a6pre) Gecko/20070620 Minefield/3.0a6pre"

I did some googling, and found this:

http://dev.opera.com/articles/view/efficient-javascript/?page=2#primitiveoperator

What is true for opera here is true for firefox.

Question for the javascript guys:  is there anything we could do to make a.push(b) be as faster for a[a.length]= b?

Question for john:  perhaps we could figure out parallel document to http://dev.opera.com/articles/view/efficient-javascript/ for firefox?)
Status: NEW → ASSIGNED
This bug should not have changed status to ASSIGNED from NEW without a real human taking assignment. I'll do that, rather than try to coax bugzilla into reverting the status.

/be
Assignee: general → brendan
Status: ASSIGNED → NEW
Status: NEW → RESOLVED
Closed: 14 years ago
Priority: -- → P3
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9beta1
Don't think this was intended to be resolved fixed?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Mmm, bugzilla fun! "Reassign to default assignee..." does a nice job of resetting many things.
Assignee: brendan → general
Status: REOPENED → NEW
I'll take ownership here and submit a performance testcase.  My initial reaction is that we have to follow the spec, which details exactly which steps to take for both methods of adding elements to Arrays (|array_push()| and |js_SetProperty()|).

Considering that Opera is telling developers to use the quicker method rather than improving their own implementation, I would expect that we also won't be able to stay within spec AND improve the performance of .push().
Status: NEW → ASSIGNED
(I think this is what Gabriel meant to do)
Assignee: general → gabrielsjoberg
Status: ASSIGNED → NEW
(In reply to comment #4)
> I'll take ownership here and submit a performance testcase.  My initial
> reaction is that we have to follow the spec, which details exactly which steps
> to take for both methods of adding elements to Arrays (|array_push()| and
> |js_SetProperty()|).

Thanks, the perf testcase(s) would be great. They don't require you to take the bug, however.

> Considering that Opera is telling developers to use the quicker method rather
> than improving their own implementation, I would expect that we also won't be
> able to stay within spec AND improve the performance of .push().

Not so. I did mean to take this, and I believe I can make push perform within epsilon of the length approach, while at the same time complying with E262-3.

/be
Assignee: gabrielsjoberg → brendan
Status: NEW → ASSIGNED
Here's a patch to benchmarks/array_bench.js

The existing benchmarks 5 and 6 show the performance of |array_addProperty()|.  This new benchmark shows that |push()| is more than an order of magnitude slower:

array_bench
array_1 : 201
array_2 : 201
array_3 : 193
array_3 : 194
array_5 : 36
array_6 : 36
array_7 : 453
This uncovered so old, bogusly general code in jsarray.c. In patching this bug I changed how the 'length' property, which appears to be a direct property of every Array instance, is implemented. It's now a JSPROP_SHARED | JSPROP_PERMANENT prototype property backed by the private slot's storage in each instance. This enables the JSOP_TRYARRAYPUSH code to make some fast moves.

Amazingly enough, no net size change to -Os build on Mac OS X (verified that .o sizes changed, but differences canceled out).

/be
Attachment #269930 - Flags: review?(igor)
My testing shows this patch results in about as fast if not faster performance for a.push(x) than a[a.length] = x.

/be
Attachment #269930 - Attachment is obsolete: true
Attachment #269960 - Flags: review?(igor)
Attachment #269930 - Flags: review?(igor)
js> function pow2(n){n = 1 << n; for (i=0; i<n; i++) yield i}
js> a = [i for (i in pow2(25))]                               

found this. It now fails with

typein:4: InternalError: array initialiser too large

/be
Attachment #269960 - Attachment is obsolete: true
Attachment #269963 - Flags: review?(igor)
Attachment #269960 - Flags: review?(igor)
Comment on attachment 269963 [details] [diff] [review]
patch v1b: must enforce length limit in JSOP_ARRAYPUSH

An alternative patch would limit array comprehension length at runtime to some greater power of two than 1<<24, say 1<<30 (what fits in a jsval). But is it worth the extra limit and error number/message? Powers of two, DOS abuses, ....

/be
Comment on attachment 269963 [details] [diff] [review]
patch v1b: must enforce length limit in JSOP_ARRAYPUSH

>+          BEGIN_CASE(JSOP_TRYARRAYPUSH)
>+            len = JSOP_TRYARRAYPUSH_LENGTH;
>+            JS_ASSERT(pc[len] == JSOP_CALL);
>+
>+            /*
>+             * The guts of this opcode are guarded by a series of tests for
>+             * the common case of a.push(x) where a is an Array instance and
>+             * a.push is indeed the native method originally available via
>+             * Array.prototype.push.
>+             */
>+            lval = FETCH_OPND(-3);
>+            if (VALUE_IS_FUNCTION(cx, lval)) {
>+                fun = (JSFunction *) JS_GetPrivate(cx, JSVAL_TO_OBJECT(lval));
>+                if (fun->u.n.native == js_array_push) {
>+                    obj = JSVAL_TO_OBJECT(FETCH_OPND(-2));
>+                    if (OBJ_GET_CLASS(cx, obj) == &js_ArrayClass) {

Is it really worth to have a separated bytecode for array.push? As  alternative one can move the check for js_ArrayClass and the optimization to js_array_push while making sure that for a specially marked native methods JSOP_CALL avoids the overhead of js_Invoke call with that argument allocation and frame construction. In this way not only array_push would benefit.
I've been thinking about this for a while, not only in connection with push, but also for document.getElementById, and in general for the Tamarin future where we hope to use trace-tree-based JITting, which inlines hot paths, even including polymorphic call site "dispatching" logic to make sure the right path is taken for the receiver type.

It could be that push is not worth a bytecode. By now there's a slight code size increase due to the error case added to JSOP_ARRAYPUSH (and I was measuring only on Mac OS X with gcc -Os). I will try to generalize the idea a bit.

The insight, which I know Igor has too, is that conditionally inlining a.push(x) as a[a.length] = x if it's correct to do so, is something that comes for free when you write array and most other methods in JS itself, and you run such self-hosted code on an optimizing/inlining VM of some sort.

Indeed I've heard from ECMA TG1 colleagues at Opera that their self-hosted methods actually ran faster compared to native counterparts. Their bytecoding is lower-level and more peephole-optimized than ours, from what I can tell.

Parting shot: optimizing native dispatch to run frameless, and optimizing array_push to have an early special case for obj an Array, still won't compete with a[a.length] = x instead of a.push(x). And it's scary to think about setting the "I'm leafy-green native, call me without a frame" flag on more natives. I was terrified to learn that even document.getElementById can flush buffered content through layout, firing synchronous mutation events, committing effects! Some array natives that can run a while now have conditional branch callback invocations.... We need to proceed with caution here.

More in a bit.

/be
In my last comment, I meant to cite bug 365851 and bug 353235.

/be
(In reply to comment #14)
> Parting shot: optimizing native dispatch to run frameless, and optimizing
> array_push to have an early special case for obj an Array, still won't compete
> with a[a.length] = x instead of a.push(x).

If optimizing a.push is really that important, then the whole a.push(i) should be replaced by single bytecode to perform the whole a.push(i) in one go. Still I  suspect that the main reason for slow a.push is the frame creation overhead so I will try to cook a patch just to measure the overhead.

> And it's scary to think about
> setting the "I'm leafy-green native, call me without a frame" flag on more
> natives.

It depends on the definition of the leafy-green. If it is indeed green meaning something-that-can-be-coded-in-js-but-not-done-for-performance-reasons, then why not? In particular, I do not see problems with Array.prototype methods being green.

> Some array natives that can run a while now have conditional branch
> callback invocations....

But why it is a problem?
I have a significant patch in progress, will attach later today. It's based on these three ideas:

1. The emitter can speculate based on method name with pretty good accuracy -- "a.push(x)" is likely a call to Array.prototype.push, and likely a is an Array.

2. JSStackFrame allocation and initialization costs a lot and should be avoided for many natives.

3. If a native is called with the expected type of |this| and the expected (minimum) number of arguments, it can both be dispatched (see 2 but there's more here than just frame elimination) and implemented in much faster form.

(In reply to comment #16)
> If optimizing a.push is really that important, then the whole a.push(i) should
> be replaced by single bytecode to perform the whole a.push(i) in one go.

That's what the patch does. You can't do better without knowing at compile time the type of the object on which push is being called, and being sure it can't change at runtime.

> Still
> I  suspect that the main reason for slow a.push is the frame creation overhead
> so I will try to cook a patch just to measure the overhead.

Indeed, frame creation is costly and it's a big cost, if not the dominant cost.

> > And it's scary to think about
> > setting the "I'm leafy-green native, call me without a frame" flag on more
> > natives.
> 
> It depends on the definition of the leafy-green. If it is indeed green meaning
> something-that-can-be-coded-in-js-but-not-done-for-performance-reasons, then
> why not? In particular, I do not see problems with Array.prototype methods
> being green.

I meant green as in a leaf in the control flow graph, not ever calling back into the interpreter (and possibly needing a frame for eval, debugging, or security reasons).

> > Some array natives that can run a while now have conditional branch
> > callback invocations....
> 
> But why it is a problem?

If the branch callback does something that walks the stack (a security check from a finalizer?) it might want to see a frame for the native.

/be
Working patch. Progress so far, I got tired around jsstr.c:

$ grep JS_FN *.c | wc -l
      55
$ grep JS_FS *.c | wc -l
     217

Not all native methods in SpiderMonkey can become JSFastNatives, of course, but there are quite a few that do not test argc or use cx->fp expecting to find their own JSStackFrame.

/be
Notes to myself on reviewing the patch:

Leak js_FastNativeTable if rt can't be allocated in JS_NewRuntime -- no, leak it always. Fix: JS_ShutDown frees and resets before_first_new_runtime.

A native can be JSFastNative and still query argc via a JS_GetFastNativeArgCount API. This must test JSFRAME_IN_FAST_CALL to use fp->sp - (vp + 2) to compute argc, else use fp->argc. The only reason to-do with argc that you have to write a slow native is if you require extra argv-based local GC roots. This says that matching fast native to slow via JSOP_TRYFASTNATIVE should also allow matching faster to fast (i.e., both the generic and the specialized entry points for a given method should be able to have JSFastNative signature).

jsapi.h comment on JSFastNativeSpec/Block should talk more about when to use JS_RegisterFastNatives -- viz, when the generic/extractable native must be slow because generic or variadic.

Use STOBJ_GET_SLOT instead of obj->fslots[JSSLOT_ARRAY_LENGTH] in anticipation of write barrier (with matching read barrier for consistency checking in DEBUG builds).

Benchmark array push faster vs. fast natives once JSOP_TRYFASTNATIVE can match JSFastNative, not just JSNative, to see whether it is prematurely optimizing. It still can win if generic this and non-int/overflowing arguments cost enough to allow a "faster than fast" specialization.

s/JSFunctionSpec pointer/JSFastNativeSpec pointer/ in the comment in jscntxt.c.

date_getTime etc. should be fast natives

fun_to{Source,String} s.b. fast natives. Can fun_{apply,call} be also? Look out for dependencies elsewhere on their currently having a frame.

Attach diff -w for jsinterp.c to avoid showing noise-changes due to indenting the inline_call: code.

num_parseInt, num_to{Source,String} s.b. fast natives

Change bool_ and num_ primitive-this (vp[1]) testing to use JSVAL_IS_OBJECT instead of !JSVAL_IS_{BOOLEAN,NUMBER} for tigher code.

js_obj_to{Source,String,LocaleString} and obj_propertyIsEnumerable should be fast natives

The regexp_methods entries should all be JS_FN.

Same for the JS_FS script_methods.

Same for most JS_FS uses in jsstr.c (beware replace, possibly others).

Bug: js_fast_str_slice should test begin <= end of course. Also, is it really the case that Firefox startup uses single-arg slice and never two-arg slice?

Same for most JS_FS uses in jsxml.c (beware those needing a frame -- are there any such?).
More note-to-self noise:

I'll probably name the new API JS_GetArgumentCount and make it insist on the top frame being native or fast-native (which is to say, an unrelated frame with the JSFRAME_IN_FAST_CALL flag set).

/be
About JS_GetArgumentCount: why not to pass the number of arguments to fast native directly? Given that many potentially fast calls accepts variable number of arguments it seems wrong to close a fast option for them.

Also currently the patch AFAICS does not reserve the extra slots for missing argumnets for a fast native call. Without the extra argc parameter this is a must to do as otherwise each and every native would end up calling JS_GetArgumentCount.

> More note-to-self noise:
> 
> I'll probably name the new API JS_GetArgumentCount and make it insist on the
> top frame being native or fast-native (which is to say, an unrelated frame with
> the JSFRAME_IN_FAST_CALL flag set).

(In reply to comment #21)
> About JS_GetArgumentCount: why not to pass the number of arguments to fast
> native directly? Given that many potentially fast calls accepts variable number
> of arguments it seems wrong to close a fast option for them.

The question is how many? Most JS methods have fixed arity (assertion based on memory over the years).

> Also currently the patch AFAICS does not reserve the extra slots for missing
> argumnets for a fast native call.

Oh, of course -- the minimum number of expected args must be passed. That call site (in JSOP_CALL) was added while I was sleepy :-). Thanks for pointing it out.

/be
The cases for fast natives are: 1) a fixed number m of arguments, 2) a variable number between m and n, and 3) variable from m without bound. Case 1 is common. Case 2 less so. Case 3 even less so. This is based on

$ grep JS_F[NS] *.c | wc -l
     272
$ grep 'if (argc [!=>]* [0-9]' *.c | wc -l
      78
$ grep 'i < argc' *.c | grep 'for (' | wc -l
      22

I can try to measure dynamic frequency, but I suspect static is good enough. Static at least can help assess code size cost of not passing argc to the new JSFastNative signature.

My design center for JSFastNative is the at most nearly 200 methods that fall in case 1 and 2. The case 3 methods need to call JS_GetArgumentCount. I'm also going to try to speed up XPConnect and DOM method dispatch, so I'll try to add up case 3 methods vs. 1 and 2 for those kinds of methods too.

/be
Sorry for unclear comments. To be precise, JSFastNative can call JS_GetArgumentCount and learn only how many actuals were passed if >= nargs, otherwise it will get nargs (from the JSFunctionSpec) and have to interpret the trailing undefined values passed by the interpreter to satisfy nargs.

/be
Getting closer to right. Still have to finish jsstr.c and jsxml.c conversion. Going through jsdate.c found some code bloat and redundant instanceof tests, which I fixed. Hope to finish tomorrow.

/be
Attachment #271005 - Attachment is obsolete: true
Attachment #271188 - Attachment is obsolete: true
Attached patch checkpoint (obsolete) — Splinter Review
9 tests failing (with -L lc2 lc3 spidermonkey-n.tests slow-n.test), probably due to the same underlying problem. More later tonight.

/be
Attachment #271314 - Attachment is obsolete: true
Attachment #271392 - Attachment is obsolete: true
Attached patch oops, correct patch (obsolete) — Splinter Review
Attachment #271489 - Attachment is obsolete: true
That regressed performance measured by this bug's testcase slightly. Now with this patch, a[a.length] = x and a.push(x) seem about the same speed again (noisy on my Macbook Pro, but best of three for a.push(x) was 101 just now, 109 for the length based alternative).

/be
Attachment #271491 - Attachment is obsolete: true
Depends on: 387501
Comment on attachment 271534 [details] [diff] [review]
remote js_ComputeThis from JSOP_TRYFASTNATIVE

This is ready for review.

/be
Attachment #271534 - Flags: review?(igor)
Attachment #271534 - Flags: review?(mrbkap)
In the patch the only significant difference between JSOP_TRYFASTNATIVE and ordinary call is that the former avoids js_ComputeThis call which is done by the latter. But why that optimization from TRYFASTNATIVE can not be applied to the generic call?

Also given that ComputThis is expensive, perhaps some ifs there can be avoided if it is called when "this" is pushed on the stack, not during call. AFAICS side effects during argument calculations can not affect the result. For example, in CALLPROP and CALLNAME it is known that "this" is non-null object.

Another thought is that JS_GetPrivate(cx, JSVAL_TO_OBJECT(lval)) means an extra call for what is essentially obj->fslot[PRIVATE]. Avoiding it may make things even faster.
Comment on attachment 271534 [details] [diff] [review]
remote js_ComputeThis from JSOP_TRYFASTNATIVE

Here is a couple of quick notes:

>Index: jsopcode.tbl
>===================================================================
> 
> /* More exception handling ops. */
> OPDEF(JSOP_EXCEPTION, 116,"exception",  NULL,         1,  0,  1,  0,  JOF_BYTE)
>-OPDEF(JSOP_UNUSED117, 117,"",           NULL,         0,  0,  0,  0,  0)
>+
>+/*
>+ * Optimize array.push while coping with an unrelated 'push' method called with
>+ * one argument on an arbitrary object. This is a prefix to JSOP_CALL, which if
>+ * it can optimize the push, skips the JSOP_CALL.
>+ */
>+OPDEF(JSOP_TRYFASTNATIVE,117,"tryfastnative",NULL,    3,  0,  0,  0,  JOF_UINT16)

The comments is wrong.
 
>Index: jsstr.c
>===================================================================
> void
> js_FinishRuntimeStringState(JSContext *cx)
> {
>     JSRuntime *rt = cx->runtime;
> 
>     js_UnlockGCThingRT(rt, rt->emptyString);
>     rt->emptyString = NULL;
>+
>+    if (rt->unitStrings) {
>+        jschar c;
>+
>+        for (c = 0; c < UNIT_STRING_LIMIT; c++) {
>+            if (rt->unitStrings[c])
>+                js_UnlockGCThingRT(rt, rt->unitStrings[c]);
>+        }
>+        free(rt->unitStrings);
>+        rt->unitStrings = NULL;
>+    }
> }

Here is an alternative for js_LockGCThingRT/js_UnlockGCThingRT that should also save memory. The idea is to allocate a single jschar array for bodies of one-char strings and patch GC to check for a pointer that fits it and avoid GC-ing the string if so.
Attachment #271534 - Attachment is obsolete: true
Attachment #271534 - Flags: review?(mrbkap)
Attachment #271534 - Flags: review?(igor)
Attachment #271706 - Flags: review?(igor)
Attachment #271706 - Flags: review?(mrbkap)
Here is another alternative for tryfastnative. 

The interpreter can use a cache indexed by few lower bits of pc to record mapping from jsval into JSFunction* for the last executed fast native function. Assuming that the call site executes mostly the same function object this should result in getting JSFunction with single if. The cache can also store the value of the last "this" to avoid ComputeThis call if the object remains the same.

When the cache works, the schema would be able to execute a fast native call after 2 above checks plus a check for number of arguments. This is better than the number of checks that tryfastnative from the patch does. 
Yeah, the callsites are mostly monomorphic for natives; see data gathered for bug 365851. This is the ticket. New patch in a bit, I'm having Macbook problems....

/be
(In reply to comment #35)
> The interpreter can use a cache indexed by few lower bits of pc to record
> mapping from jsval into JSFunction* for the last executed fast native function.

Need JSObject * to handle cloned function objects which have different scope chains (this also gets to the security point alluded to earlier in the bug: we have to be careful not to go "up" to the JSFunction, then "down" or "back" via fun->object to the wrong funobj).

> Assuming that the call site executes mostly the same function object this
> should result in getting JSFunction with single if. The cache can also store
> the value of the last "this" to avoid ComputeThis call if the object remains
> the same.

The data from bug 365851 showed ~46% native monomorphic, probably because of different XPConnect wrapper function objects for the same underlying native (I need to confirm). With the JSFastNativeSpec/JSFastNativeBlock API I'm trying to enable "faster than light" specializations that do not need to go through heavy XPConnect argument conversion, wrapper creation, result conversion, and security checking layes, e.g. document.getElementById.

But this is tricky, and in fact (to take this bug's summary topic and focus on one method) js_fast_array_push is better called via JSOP_CALL as *the* definition of the array push method, instead of via JSOP_TRYFASTNATIVE with fallback to a slow native js_array_push: timing best of three 95 vs. 101. (The JS_GetPrivate => obj->fslots change didn't help.)

So I will separate out JSFastNativeSpec/Block, JS_RegisterFastNatives, and JSOP_TRYFASTNATIVE into a later patch, and convert generic natives to fast natives where safe.

/be
With correct jsarray.c merging (botched in last patch, sorry if anyone wasted time on it).

/be
Attachment #271706 - Attachment is obsolete: true
Attachment #271706 - Flags: review?(mrbkap)
Attachment #271706 - Flags: review?(igor)
(In reply to comment #37)
> (In reply to comment #35)
> > The interpreter can use a cache indexed by few lower bits of pc to record
> > mapping from jsval into JSFunction* for the last executed fast native function.
> 
> Need JSObject * to handle cloned function objects which have different scope
> chains

But JSFunction* pointer stored in JSObject never changes. Thus using jsval as a cache key is safe. And if the call site meets a clone holding the same JSFunction, the interpreter just invalidates the cache as if meets different function. 
Attached patch latest patch (obsolete) — Splinter Review
Review welcome again. Still more JS_FS() to convert, but there are some issues with certain generic natives (ask me on IRC if you are around).

/be
Attachment #271757 - Flags: review?(igor)
Attachment #271724 - Attachment is obsolete: true
I'll convert the remaining generics shortly.

/be
Attachment #271757 - Attachment is obsolete: true
Attachment #271757 - Flags: review?(igor)
Attachment #271782 - Attachment is obsolete: true
Attachment #271793 - Attachment is obsolete: true
Comment on attachment 271800 [details] [diff] [review]
support extra arg slots (local roots) for fast natives too

>Index: jsapi.c

>+JS_PUBLIC_API(uintN)
>+JS_GetArgumentCount(JSContext *cx, jsval *vp)
>+{
>+    JSStackFrame *fp;
>+    uintN argc, extra;
>+    jsval fval;
>+    JSFunction *fun;
>+
>+    fp = cx->fp;
>+    JS_ASSERT(fp);
>+    if (!fp)
>+        return 0;
>+
>+    if (fp->flags & JSFRAME_IN_FAST_CALL) {
>+        JS_ASSERT(fp->sp >= vp + 2);
>+        argc = fp->sp - (vp + 2);
>+        fval = *vp;

This assumes that fast natives do not mess with vp[0]. But given that vp[0] is also used to store the result this can lead to a subtle bugs when one calls GetArgumentCount after assigning to to vp[0]. So either extra or function object must be stored somehow.

>Index: jsarray.c
> static JSBool
>+array_push(JSContext *cx, jsval *vp)
...
>+    v += 2;
>+    JS_ASSERT(STOBJ_GET_SLOT(obj, JSSLOT_ARRAY_LENGTH) == v);
>+    *vp = v;
>+    return JS_TRUE;

Nit: replacing v += 2 by v += INT_TO_JSVAL(1) makes the code much more clear.

The rest of comments is ignorable suggestions and can go to a follow up bugs and does not affect r+. Note also I have not spent much time with native rewrites trusting the test suite and the fuzzier.

>Index: jsdate.c
>===================================================================
....
> static JSBool
>-GetUTCTime(JSContext *cx, JSObject *obj, jsval *argv, jsdouble *rv)
>+GetUTCTime(JSContext *cx, JSObject *obj, jsval *vp, jsdouble *dp)
> {
>     jsval v;
>-    if (!JS_InstanceOf(cx, obj, &js_DateClass, argv))

Removing JSObject* argument here and in GetLocalTime simplifies all fast native callers while slow ones can just pass argv - 2 there.

>Index: jsinterp.c
>===================================================================
>RCS file: /cvsroot/mozilla/js/src/jsinterp.c,v
>retrieving revision 3.358
>diff -p -u -1 -r3.358 jsinterp.c
>--- jsinterp.c	11 Jul 2007 00:29:25 -0000	3.358
>+++ jsinterp.c	11 Jul 2007 06:58:24 -0000
>@@ -508,2 +508,66 @@ PutBlockObjects(JSContext *cx, JSStackFr
> JSBool
>+js_GetPrimitiveThis(JSContext *cx, jsval *vp, JSClass *clasp, jsval *thisvp)
>+{
>+    jsval v;
>+    JSObject *obj;
>+
>+    v = vp[1];
>+    if (JSVAL_IS_OBJECT(v)) {
>+        obj = JSVAL_TO_OBJECT(v);
>+        if (!JS_InstanceOf(cx, obj, clasp, vp + 2))
>+            return JS_FALSE;

A version of JS_InstanceOf without obj argument would be nice.

>+#define PUSH_GLOBAL_THIS(cx,sp)                                               \
>+    JS_BEGIN_MACRO                                                            \
>+        PUSH_OPND(JSVAL_NULL);                                                \
>+        SAVE_SP_AND_PC(fp);                                                   \
>+        ok = ComputeGlobalThis(cx, sp);                                       \
>+        if (!ok)                                                              \
>+            goto out;                                                         \
>+    JS_END_MACRO
>+

Something like JSOP_GLOBALTHIS to replace JSOP_NULL that the emitter pushes in couple of cases to accomplish JSOP_CALL removes the need for null checks in js_ComputeThis.

>===================================================================
> #define NON_LIST_XML_METHOD_PROLOG                                            \
>-    JS_BEGIN_MACRO                                                            \
>-        xml = StartNonListXMLMethod(cx, &obj, argv);                          \
>-        if (!xml)                                                             \
>-            return JS_FALSE;                                                  \
>-        JS_ASSERT(xml->xml_class != JSXML_CLASS_LIST);                        \
>-    JS_END_MACRO
>+    JSObject *obj = JS_THIS_OBJECT(cx, vp);                                   \
>+    JSXML *xml = StartNonListXMLMethod(cx, &obj, vp+2);                       \
>+    if (!xml)                                                                 \
>+        return JS_FALSE;                                                      \
>+    JS_ASSERT(xml->xml_class != JSXML_CLASS_LIST)

StartNonListXMLMethod can call JS_THIS_OBJECT(cx, vp) on its own.
(In reply to comment #46)
> This assumes that fast natives do not mess with vp[0]. But given that vp[0] is
> also used to store the result this can lead to a subtle bugs when one calls
> GetArgumentCount after assigning to to vp[0]. So either extra or function
> object must be stored somehow.

Or (I know, this is a straw man, it may burn up soon due to proper flaming) we say "you must be *this* tall to ride JSFastNative's roller coaster." It would be good to have DEBUG-only sanity checking if we take that approach.

> >Index: jsarray.c
> > static JSBool
> >+array_push(JSContext *cx, jsval *vp)
> ...
> >+    v += 2;
> >+    JS_ASSERT(STOBJ_GET_SLOT(obj, JSSLOT_ARRAY_LENGTH) == v);
> >+    *vp = v;
> >+    return JS_TRUE;
> 
> Nit: replacing v += 2 by v += INT_TO_JSVAL(1) makes the code much more clear.

But alas incorrect, since that adds the tag bit (bit 0) to itself, overflowing into the 31-bit signed int part.

> > static JSBool
> >-GetUTCTime(JSContext *cx, JSObject *obj, jsval *argv, jsdouble *rv)
> >+GetUTCTime(JSContext *cx, JSObject *obj, jsval *vp, jsdouble *dp)
> > {
> >     jsval v;
> >-    if (!JS_InstanceOf(cx, obj, &js_DateClass, argv))
> 
> Removing JSObject* argument here and in GetLocalTime simplifies all fast native
> callers while slow ones can just pass argv - 2 there.

See the JS_FRIEND_API entry points such as js_DateGetMsecSinceEpoch, which call GetUTCTime with null vp but non-null obj. These APIs take no argv param either. Could fake up a jsval [] but I think this is cleaner.

Note how SetUTCTime{,Ptr} has same interface, for same reason (no vp, or else NULL vp to suppress redundant instanceof check).

> >+        obj = JSVAL_TO_OBJECT(v);
> >+        if (!JS_InstanceOf(cx, obj, clasp, vp + 2))
> >+            return JS_FALSE;
> 
> A version of JS_InstanceOf without obj argument would be nice.

Yeah, true. Originally I wanted JSFastNative calls to be guaranteed of the right instance, but we've generalized away from that. So new vp-based APIs to match old argv-based ones could become popular. I will let them do so and file RFE bugs, or at least wait a bit after this bug's patch lands.

> >+#define PUSH_GLOBAL_THIS(cx,sp)                                               \
> >+    JS_BEGIN_MACRO                                                            \
> >+        PUSH_OPND(JSVAL_NULL);                                                \
> >+        SAVE_SP_AND_PC(fp);                                                   \
> >+        ok = ComputeGlobalThis(cx, sp);                                       \
> >+        if (!ok)                                                              \
> >+            goto out;                                                         \
> >+    JS_END_MACRO
> >+
> 
> Something like JSOP_GLOBALTHIS to replace JSOP_NULL that the emitter pushes in
> couple of cases to accomplish JSOP_CALL removes the need for null checks in
> js_ComputeThis.

Good idea, and good use for JSOP_UNUSED117. I'll do this here.

> >+    JSObject *obj = JS_THIS_OBJECT(cx, vp);                                   \
> >+    JSXML *xml = StartNonListXMLMethod(cx, &obj, vp+2);                       \
> >+    if (!xml)                                                                 \
> >+        return JS_FALSE;                                                      \
> >+    JS_ASSERT(xml->xml_class != JSXML_CLASS_LIST)
> 
> StartNonListXMLMethod can call JS_THIS_OBJECT(cx, vp) on its own.

Yeah, good point -- I'll do this now too.

Thanks for the comments. With bugzilla interdiff's help, I hope you can do one more round after I finish abolishing JS_FS usage outside of js.c and a few hard cases (maybe), and do a bunch of testing.

/be
(In reply to comment #47)
> (In reply to comment #46)
> > This assumes that fast natives do not mess with vp[0]. But given that vp[0] is
> > also used to store the result this can lead to a subtle bugs when one calls
> > GetArgumentCount after assigning to to vp[0]. So either extra or function
> > object must be stored somehow.
> 
> Or (I know, this is a straw man, it may burn up soon due to proper flaming) we
> say "you must be *this* tall to ride JSFastNative's roller coaster." It would
> be good to have DEBUG-only sanity checking if we take that approach.

Currently the patch contains 28 JS_ARGC calls. I suspect that adding uint32 field to context and stating that the fast native if it wants argc then it must read it asap would be much less code bloat then those extra 28 calls.

Alternatively instead of using vp[0] GetArgumentsCount can do GET_UINT16(pc) when the fast native is called from the interpreter. In this way no vp is necessary to recover argc.

> > Nit: replacing v += 2 by v += INT_TO_JSVAL(1) makes the code much more clear.
> 
> But alas incorrect, since that adds the tag bit (bit 0) to itself, overflowing
> into the 31-bit signed int part.

What I have been thinking when wrote that?... But some comments would be nice in any case.


> 
> > > static JSBool
> > >-GetUTCTime(JSContext *cx, JSObject *obj, jsval *argv, jsdouble *rv)
> > >+GetUTCTime(JSContext *cx, JSObject *obj, jsval *vp, jsdouble *dp)
> > > {
> > >     jsval v;
> > >-    if (!JS_InstanceOf(cx, obj, &js_DateClass, argv))
> > 
> > Removing JSObject* argument here and in GetLocalTime simplifies all fast native
> > callers while slow ones can just pass argv - 2 there.
> 
> See the JS_FRIEND_API entry points such as js_DateGetMsecSinceEpoch, which call
> GetUTCTime with null vp but non-null obj. These APIs take no argv param either.
> Could fake up a jsval [] but I think this is cleaner.
> 
> Note how SetUTCTime{,Ptr} has same interface, for same reason (no vp, or else
> NULL vp to suppress redundant instanceof check).
> 
> > >+        obj = JSVAL_TO_OBJECT(v);
> > >+        if (!JS_InstanceOf(cx, obj, clasp, vp + 2))
> > >+            return JS_FALSE;
> > 
> > A version of JS_InstanceOf without obj argument would be nice.
> 
> Yeah, true. Originally I wanted JSFastNative calls to be guaranteed of the
> right instance, but we've generalized away from that. So new vp-based APIs to
> match old argv-based ones could become popular. I will let them do so and file
> RFE bugs, or at least wait a bit after this bug's patch lands.
> 
> > >+#define PUSH_GLOBAL_THIS(cx,sp)                                               \
> > >+    JS_BEGIN_MACRO                                                            \
> > >+        PUSH_OPND(JSVAL_NULL);                                                \
> > >+        SAVE_SP_AND_PC(fp);                                                   \
> > >+        ok = ComputeGlobalThis(cx, sp);                                       \
> > >+        if (!ok)                                                              \
> > >+            goto out;                                                         \
> > >+    JS_END_MACRO
> > >+
> > 
> > Something like JSOP_GLOBALTHIS to replace JSOP_NULL that the emitter pushes in
> > couple of cases to accomplish JSOP_CALL removes the need for null checks in
> > js_ComputeThis.
> 
> Good idea, and good use for JSOP_UNUSED117. I'll do this here.
> 
> > >+    JSObject *obj = JS_THIS_OBJECT(cx, vp);                                   \
> > >+    JSXML *xml = StartNonListXMLMethod(cx, &obj, vp+2);                       \
> > >+    if (!xml)                                                                 \
> > >+        return JS_FALSE;                                                      \
> > >+    JS_ASSERT(xml->xml_class != JSXML_CLASS_LIST)
> > 
> > StartNonListXMLMethod can call JS_THIS_OBJECT(cx, vp) on its own.
> 
> Yeah, good point -- I'll do this now too.
> 
> Thanks for the comments. With bugzilla interdiff's help, I hope you can do one
> more round after I finish abolishing JS_FS usage outside of js.c and a few hard
> cases (maybe), and do a bunch of testing.
> 
> /be
> 

(In reply to comment #48)
> Currently the patch contains 28 JS_ARGC calls. I suspect that adding uint32
> field to context and stating that the fast native if it wants argc then it must
> read it asap would be much less code bloat then those extra 28 calls.

Problems:

1. This helps only the built-into-js/src fast natives. Not a problem so much as an observation that code bloat (minor, I will measure) in calling the new API under JS_ARGC is saved only in a bounded manner, not for all JS_ARGC users.

2. Writing this cx->argc is a problem. Fast natives may nest provided there is an intervening frame not flagged JSFRAME_IN_FAST_CALL, so calling a fast native means save = cx->argc; cx->argc = argc; make call; cx->argc = save. Starting to look like the wrong trade-off to me, especially for those hoped-for fixed arity natives that do not need argc and want to be as fast as they can.

> Alternatively instead of using vp[0] GetArgumentsCount can do GET_UINT16(pc)
> when the fast native is called from the interpreter. In this way no vp is
> necessary to recover argc.

I like this idea -- thanks!

/be
Attached patch latest patch (obsolete) — Splinter Review
I hope bugzilla interdiff works. Please pay extra attention to JSOP_THIS and why it changed. Ask questions; I'm too tired to answer now but I'll do so tomorrow. This passed the testsuite a while ago, but needs retesting; my Minefield build is working well and I've used it for dogfood without incident.

Jesse, Bob: could use help fuzzing and testsuiting this patch. Thanks,

/be
Attachment #272922 - Flags: review?(igor)
BTW, I added uintN argc to the JSFastNative signature. There's no point in trying to hide it in memory associated with the cx or fp, since it will go through memory on x86 and the like anyway. If a fast native implementation does not need argc, the cost in pushing it is the same or less as if it were manually pushed.

It's not possible to implement JS_GetArgumentCount based on bytecode inspection due to the generic natives and their dispatcher, which reduces argc by one (if non-zero).

/be
js> [].map(1 for (x in []))
Assertion failure: *pc == JSOP_NULL, at jsopcode.c:3862

js> (4).__lookupGetter__("w")  
Crash [@ obj_lookupGetter]
Attached patch fix bugs jesse found (obsolete) — Splinter Review
I lost a crucial js_ComputeThis call for "inline" calls (scripted function calls that do not nest js_Interpret). This undoes the wrong-headed change to JSOP_THIS. I also forgot about JSFUN_THISP_PRIMITIVE etc. checking for fast natives. All better now.

/be
Attachment #269963 - Attachment is obsolete: true
Attachment #271716 - Attachment is obsolete: true
Attachment #271726 - Attachment is obsolete: true
Attachment #271800 - Attachment is obsolete: true
Attachment #272922 - Attachment is obsolete: true
Attachment #272994 - Flags: review?(igor)
Attachment #269963 - Flags: review?(igor)
Attachment #272922 - Flags: review?(igor)
I plan to finish the review today.
js> f = function() { new (delete y) }
function () {
    new delete y;
}
js> eval(uneval(f))
typein:2: SyntaxError: syntax error:
typein:2: (function () {new delete y;})
typein:2: ..................^

This works correctly without the patch (the parens are kept).
js> 'a'.replace(/a/g, eval)
Assertion failure: !(down->flags & JSFRAME_IN_FAST_CALL), at jsinterp.c:1573

This script causes a crash [@ js_ValueToString].


function c(gen)
{
  Iterator;    

  "" + gen;

  for (var i in gen())
    ;
}

function gen()
{
  ({}).hasOwnProperty();
  yield;
}

c(gen);
Comment on attachment 272994 [details] [diff] [review]
fix bugs jesse found

>RCS file: /cvsroot/mozilla/js/src/jsinterp.c,v
...
>@@ -3843,195 +3922,236 @@ interrupt:
>           END_CASE(JSOP_ENUMELEM)
> 
>           BEGIN_CASE(JSOP_CALL)
>           BEGIN_CASE(JSOP_EVAL)
>             argc = GET_ARGC(pc);
>             vp = sp - (argc + 2);
>             lval = *vp;
>             SAVE_SP_AND_PC(fp);
...
>+                if (fun->flags & JSFUN_FAST_NATIVE) {
>+                    uintN nargs = JS_MAX(argc, fun->u.n.minargs);
>+                    
>+                    nargs += fun->u.n.extra;
>+                    if (argc < nargs) {
>+                        /*
>+                         * If we can't fit missing args and local roots in
>+                         * this frame's operand stack, take the slow path.
>+                         */
>+                        nargs -= argc;
>+                        if (sp + nargs > fp->spbase + depth)
>+                            goto do_invoke;

Any particular reason to check against depth and not against cx->stackPool.current->avail? The current check means that the second visit to the same PC would also skip the fast path despite the available stack space.

>+                        do {
>+                            PUSH(JSVAL_VOID);
>+                        } while (--nargs != 0);
>+                        SAVE_SP(fp);
>+                    }
>+
>+                    if (!PRIMITIVE_THISP_TEST(fun, vp[1])) {
>+                        ok = js_ComputeThis(cx, vp + 2);

This is no longer necessary, isn't it?

>+                        if (!ok)
>+                            goto out;
>+                    }
>+                    START_FAST_CALL(fp);
>+                    ok = ((JSFastNative) fun->u.n.native)(cx, argc, vp);
>+                    END_FAST_CALL(fp);

What roots the local roots for fast native? I do not see that TraceStackFrame takes any particular measure to trace beyond fp->sp which points AFAICS to vp + 2 + argc.

>+                if (fun->flags & JSFUN_INTERPRETED) {
>+                    uintN nframeslots, nvars, nslots, missing;
>+                    JSArena *a;
>+
>+                    /* Compute the 'this' parameter now that argv is set. */
>+                    if (JSFUN_BOUND_METHOD_TEST(fun->flags))
>+                        vp[1] = OBJECT_TO_JSVAL(parent);
>+                    if (!js_ComputeThis(cx, vp + 2))
>+                        goto bad_inline_call;

Again, "this" must be already computed here so js_ComputeThis(cx, vp + 2) is redundant. The same problem AFAICS exists in NoSuchMethod .
(In reply to comment #59)
> 
> Any particular reason to check against depth and not against
> cx->stackPool.current->avail? The current check means that the second visit to
> the same PC would also skip the fast path despite the available stack space.
...
> What roots the local roots for fast native? I do not see that TraceStackFrame
> takes any particular measure to trace beyond fp->sp which points AFAICS to vp +
> 2 + argc.

Please ignore these comments: PUSH(JSVAL_VOID) loop moves sp beyond local roots and the check for depth in fact allows to keep the current logic in js_TraceStackFrame. The optimization to check for space in arena can be done later.
(In reply to comment #59)
> >+                    if (!PRIMITIVE_THISP_TEST(fun, vp[1])) {
> >+                        ok = js_ComputeThis(cx, vp + 2);
> 
> This is no longer necessary, isn't it?

To remove the call to js_ComputeThis here JSOOP_CALLPROP should be patched to call ComputeThis.
(In reply to comment #61)
> To remove the call to js_ComputeThis here JSOOP_CALLPROP should be patched to
> call ComputeThis.

That's necessary (thanks for pointing out the missing ComputeThis) but not enough: there's a function at http://cosmiclog.msnbc.msn.com/blogs/sifr.js (line 7, but the lines are very long) that calls a scripted "normalize" method on a string primitive. Something between JSOP_CALLPROP and JSOP_CALL (which follows right away as there are zero arguments in this case) has to notice that the function being called is not native (in this case) or not flagged with the right JSFUN_THISP_* flags, and force a String wrapper to be created for |this|.

/be
Comment on attachment 272994 [details] [diff] [review]
fix bugs jesse found

>Index: jsopcode.tbl
>===================================================================

> /* More exception handling ops. */
> OPDEF(JSOP_EXCEPTION, 116,"exception",  NULL,         1,  0,  1,  0,  JOF_BYTE)
>-OPDEF(JSOP_UNUSED117, 117,"",           NULL,         0,  0,  0,  0,  0)
>+OPDEF(JSOP_GLOBALTHIS,117,"globalthis", js_null_str,  1,  0,  1,  0,  JOF_BYTE)

Nit: blank line before the JSOP_GLOBALTHIS as it is not exception handling op. 

>-/* ECMA-mandated parenthesization opcode, which nulls the reference base register, obj; see jsinterp.c. */
>+/*
>+ * ECMA-mandated parenthesization opcode, which nulls the reference base
>+ * register, obj; see jsinterp.c.
>+ */
> OPDEF(JSOP_GROUP,       131, "group",       NULL,     1,  0,  0, 19,  JOF_BYTE)

Nit: the comment about "nulls the reference base register, obj" has not been true since the interpreter started to use [callprop] and friends.
Attached patch fix known issues (obsolete) — Splinter Review
I'm not thrilled with the division of labor between JSOP_CALLPROP and the inline call case of JSOP_CALL, but moving the JSFUN_BOUND_METHOD and primitive this block from the latter, which has loaded fun, to the former, would require redundantly loading fun in the former. So JSOP_CALLPROP leaves a primitive this pushed at what will become argv[-1] in hope that argv[-2] is a native flagged JSFUN_THISP_*.

Other fixes should be clear from interdiff. JSOP_CALLPROP had some redundancies and a delayed push, which I moved up to protect nominal this from any GC that might nest in ComputeThis called from the non-primitive-this case.

/be
Attachment #272994 - Attachment is obsolete: true
Attachment #273094 - Flags: review?(igor)
Attachment #272994 - Flags: review?(igor)
Oops, missed the jsopcode.tbl nits. Here's an interdiff showing fixes for those:

diff -u jsopcode.tbl jsopcode.tbl
--- jsopcode.tbl        20 Jul 2007 06:57:22 -0000
+++ jsopcode.tbl        20 Jul 2007 07:04:51 -0000
@@ -262,6 +262,8 @@
 
 /* More exception handling ops. */
 OPDEF(JSOP_EXCEPTION, 116,"exception",  NULL,         1,  0,  1,  0,  JOF_BYTE)
+
+/* Push the global object as |this| for a non-reference-type callable. */
 OPDEF(JSOP_GLOBALTHIS,117,"globalthis", js_null_str,  1,  0,  1,  0,  JOF_BYTE)
 
 /*
@@ -310,10 +312,7 @@
  */
 OPDEF(JSOP_SETLOCALPOP, 130, "setlocalpop", NULL,     3,  1,  0,  3,  JOF_LOCAL|JOF_NAME|JOF_SET)
 
-/*
- * ECMA-mandated parenthesization opcode, which nulls the reference base
- * register, obj; see jsinterp.c.
- */
+/* Parenthesization opcode to help the decompiler. */
 OPDEF(JSOP_GROUP,       131, "group",       NULL,     1,  0,  0, 19,  JOF_BYTE)
 
 /*

/be
Comment on attachment 273094 [details] [diff] [review]
fix known issues

>Index: jsopcode.c
>===================================================================
>@@ -2109,16 +2109,17 @@ Decompile(SprintStack *ss, jsbytecode *p
>                 }
>                 break;
> 
>               case JSOP_GROUP:
>                 cs = &js_CodeSpec[lastop];
>                 if ((cs->prec != 0 &&
>                      cs->prec <= js_CodeSpec[NEXT_OP(pc)].prec) ||
>                     pc[JSOP_GROUP_LENGTH] == JSOP_NULL ||
>+                    pc[JSOP_GROUP_LENGTH] == JSOP_GLOBALTHIS ||
>                     pc[JSOP_GROUP_LENGTH] == JSOP_DUP ||
>                     pc[JSOP_GROUP_LENGTH] == JSOP_IFEQ ||
>                     pc[JSOP_GROUP_LENGTH] == JSOP_IFNE) {
>                     /*
>                      * Force parens if this JSOP_GROUP forced re-association
>                      * against precedence, or if this is a call or constructor
>                      * expression, or if it is destructured (JSOP_DUP), or if
>                      * it is an if or loop condition test.

[callsomething] always prefixes [globalthis] so the new bytecode can not follow [group].

>@@ -3854,17 +3855,17 @@ Decompile(SprintStack *ss, jsbytecode *p
>                     }
>                     jp->script = outer;
> 
>                     /*
>                      * Advance over this op and its null |this| push, and
>                      * arrange to advance over the call to this lambda.
>                      */
>                     pc += len;
>-                    LOCAL_ASSERT(*pc == JSOP_NULL);
>+                    LOCAL_ASSERT(*pc == JSOP_NULL || *pc == JSOP_GLOBALTHIS);
>                     pc += JSOP_NULL_LENGTH;
>                     LOCAL_ASSERT(*pc == JSOP_CALL);
>                     LOCAL_ASSERT(GET_ARGC(pc) == 0);
>                     len = JSOP_CALL_LENGTH;
> 

The assert should check just for [globalthis] as the patch makes sure that null is not used to push this.

r+ with these issues addressed. Note that I reviewed changes in native method signatures and argv[i] -> vp[i + 2] replacements rather shallowly trusting the test suite. I can do it more thoroughly if desired.
js> true.watch("x", function(){})
Assertion failure: !JSVAL_IS_PRIMITIVE(vp[1]) || PRIMITIVE_THIS_TEST(fun, vp[1]), at jsinterp.c:3941

js> (2).eval()
Assertion failure: !JSVAL_IS_PRIMITIVE(vp[1]), at jsinterp.c:620

(In reply to comment #64)
> I'm not thrilled with the division of labor between JSOP_CALLPROP and the
> inline call case of JSOP_CALL, but moving the JSFUN_BOUND_METHOD and primitive
> this block from the latter, which has loaded fun, to the former, would require
> redundantly loading fun in the former. So JSOP_CALLPROP leaves a primitive this
> pushed at what will become argv[-1] in hope that argv[-2] is a native flagged
> JSFUN_THISP_*.

(In reply to comment #68)
> js> (2).eval()
> Assertion failure: !JSVAL_IS_PRIMITIVE(vp[1]), at jsinterp.c:620

Loading of fun in JSOP_CALLPROP is only necessary when checking for the primitive value. Given that the code calls OBJ_GET_PROPERTY there calling OBJ_GET_PRIVATE is nothing. Which also suggests to state that JSFUN_THISP_NUMBER etc. can only be applied to fast native functions and avoid reinterpreting arbitrray jsval as JSObject* when calling slow_native(cx, (JSObject*)thisval, argc, argv, rval) with non-object this. 

(In reply to comment #69)
> Which also suggests to state that
> JSFUN_THISP_NUMBER etc. can only be applied to fast native functions and avoid
> reinterpreting arbitrray jsval as JSObject* when calling slow_native(cx,
> (JSObject*)thisval, argc, argv, rval) with non-object this. 

And if in addition JSStackFrame.(thisp|rval|argv) would also be replaced via single JSStackFrame.vp, it would be even better. 

(In reply to comment #66)
> [callsomething] always prefixes [globalthis] so the new bytecode can not follow
> [group].

Not so:

js> function f(x){(x+2)()}
js> dis(f)
main:
00000:  getarg 0
00003:  int8 2
00005:  add
00006:  group
00007:  globalthis
00008:  call 0
00011:  pop
00012:  stop

Source notes:
  0:     8 [   8] xdelta  
  1:     8 [   0] pcbase   offset 8

> >@@ -3854,17 +3855,17 @@ Decompile(SprintStack *ss, jsbytecode *p
> >                     }
> >                     jp->script = outer;
> > 
> >                     /*
> >                      * Advance over this op and its null |this| push, and
> >                      * arrange to advance over the call to this lambda.
> >                      */
> >                     pc += len;
> >-                    LOCAL_ASSERT(*pc == JSOP_NULL);
> >+                    LOCAL_ASSERT(*pc == JSOP_NULL || *pc == JSOP_GLOBALTHIS);
> >                     pc += JSOP_NULL_LENGTH;
> >                     LOCAL_ASSERT(*pc == JSOP_CALL);
> >                     LOCAL_ASSERT(GET_ARGC(pc) == 0);
> >                     len = JSOP_CALL_LENGTH;
> > 
> 
> The assert should check just for [globalthis] as the patch makes sure that null
> is not used to push this.

Thanks.

Let me fix the problems Jesse found and put up one more patch.

/be
(In reply to comment #70)
> (In reply to comment #69)
> > Which also suggests to state that
> > JSFUN_THISP_NUMBER etc. can only be applied to fast native functions and avoid
> > reinterpreting arbitrray jsval as JSObject* when calling slow_native(cx,
> > (JSObject*)thisval, argc, argv, rval) with non-object this. 
> 
> And if in addition JSStackFrame.(thisp|rval|argv) would also be replaced via
> single JSStackFrame.vp, it would be even better. 

Followup bug? I'm loath to try more in this bug, and the elimination of frame.rval makes a harsh anti-dependency (can't store *vp until you're sure no one is looking at the callee also stored there on entry).

/be
Attached patch updated patch (obsolete) — Splinter Review
On 20/07/07, Brendan Eich <brendan@meer.net> wrote:
> Hey, I'm behind a firewall and can't use cvs. Attached are two
> files modified since the last patch to bug  385393. I could use help
> getting a new patch in the bug. Could one or more of you apply the latest
> patch, replace jsinterp.c and jsopcode.c with these two attachments,
> extract a new diff -pu8 and upload it to the bug? Testing would be bonus!
> Thanks,
Attachment #273142 - Flags: review?(igor)
Attachment #273094 - Attachment is obsolete: true
Attachment #273094 - Flags: review?(igor)
Attachment #273142 - Flags: review?(igor) → review+
js> this.__proto__ = []; [1,2,3,4].map.call();
Assertion failure: OBJ_GET_CLASS(cx, obj) == &js_ArrayClass, at jsarray.c:389

The last patch produces the following regressions:

> js1_5/Function/regress-222029-002.js
1a3,5
> js1_5/Function/regress-222029-001.js
> js1_5/Date/regress-346363.js
> js1_5/Date/regress-346027.js
10a15
> js1_5/Regress/regress-330352.js
19a26
> js1_6/extensions/regress-312385-01.js
25a33
Is it intentional that the "undefined property" warning has been extended to arrays?  I noticed this because reftest hits the warning with "m[2]".

js> [][1]
typein:4: strict warning: reference to undefined property [][1]
(In reply to comment #76)
> Is it intentional that the "undefined property" warning has been extended to
> arrays?  I noticed this because reftest hits the warning with "m[2]".
> 
> js> [][1]
> typein:4: strict warning: reference to undefined property [][1]

I see this with an unpatched shell too.

/be
Oops, I guess I forgot the -s switch when I tested the unpatched shell.
Attached patch updated patch, ready for checkin (obsolete) — Splinter Review
One more review of the interdiff would be welcome.

The failure on js1_5/Regress/regress-330352.js is not caused by this patch. I see it in a clean tree too, and I believe it is on file as bug 379056 -- it seems like a bug that should be fixed soon (I'll comment in it in a minute).

Thanks to Jesse and Igor for testing and review.

/be
Attachment #273142 - Attachment is obsolete: true
Attachment #273248 - Flags: review?(igor)
Comment on attachment 273248 [details] [diff] [review]
updated patch, ready for checkin

(In reply to comment #79)
> The failure on js1_5/Regress/regress-330352.js is not caused by this patch. 

Indeed, I got the failure with CVS tip as well and in fact can not reproduce a successful run of the test...
Attachment #273248 - Flags: review?(igor) → review+
Patch checked in:

js/src/js.c 3.157
js/src/jsapi.c 3.339
js/src/jsapi.h 3.155
js/src/jsarray.c 3.110
js/src/jsarray.h 3.16
js/src/jsatom.h 3.72
js/src/jsbool.c 3.27
js/src/jscntxt.c 3.113
js/src/jscntxt.h 3.154
js/src/jsdate.c 3.85
js/src/jsdbgapi.c 3.97
js/src/jsdbgapi.h 3.35
js/src/jsemit.c 3.268
js/src/jsexn.c 3.86
js/src/jsfun.c 3.203
js/src/jsfun.h 3.38
js/src/jsgc.c 3.227
js/src/jsinterp.c 3.361
js/src/jsinterp.h 3.55
js/src/jsiter.c 3.66
js/src/jsnum.c 3.80
js/src/jsobj.c 3.365
js/src/jsobj.h 3.66
js/src/jsopcode.c 3.256
js/src/jsopcode.tbl 3.100
js/src/jsparse.c 3.294
js/src/jspubtd.h 3.85
js/src/jsregexp.c 3.156
js/src/jsregexp.h 3.27
js/src/jsscript.c 3.150
js/src/jsstr.c 3.150
js/src/jsstr.h 3.37
js/src/jsxml.c 3.161

Watching the tinderboxes....

/be
Status: ASSIGNED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
I backed out.

/be
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → ASSIGNED
This costs a local root. Makes me want to implement igor's suggestion of lifting the stack pointer above depth if possible, but that's tricky. For a followup bug. I could use testing buddies, but console^2 addon works now with this patch.

/be
Attachment #273248 - Attachment is obsolete: true
Attachment #273272 - Flags: review?(igor)
Attached file crash in array concat (obsolete) —
This is crashing in array_concat for me.
Forgot to screen for warnings.

/be
Attachment #273280 - Attachment is obsolete: true
Attachment #273282 - Flags: review?(igor)
Comment on attachment 273282 [details] [diff] [review]
fix blunder in last patch (missing & unary operator)

I should have seen the problem with local roots. The test suite can not be trusted even with that :(
Attachment #273282 - Flags: review?(igor) → review+
eval("this.__defineSetter__('x', gc); this.watch('x', [].slice); x = 1;");

Triggers one of the following assertions:

Assertion failure: thing, at jsgc.c:1722
Assertion failure: offsetInArena < GC_THINGS_SIZE, at jsgc.c:514
Assertion failure: GCTypeToTraceKindMap[*flagp & GCF_TYPEMASK] == kind, at jsgc.c:1768

Without the patch, it seems to work fine.
(In reply to comment #84)
> Created an attachment (id=273272) [details]
> fix array slice/splice/concat to set *vp late
> 
> This costs a local root.

The real problem is that js_TraceStackFrame does not trace frame->argv[-2] or vp[0] so vp[0] can not be used as a rooted location with fast natives when they are invoke slowly. 
Comment on attachment 273282 [details] [diff] [review]
fix blunder in last patch (missing & unary operator)

>Index: jsdbgapi.c
>===================================================================
...
>@@ -520,29 +520,29 @@ js_watch_set(JSContext *cx, JSObject *ob
>                 nslots = 2;
>                 if (fun) {
>-                    nslots += fun->nargs;
>+                    nslots += FUN_MINARGS(fun);
>                     if (FUN_NATIVE(fun))
>                         nslots += fun->u.n.extra;
>                 }

Changing FUN_NATIVE(fun) to !FUN_INTERPRETED(fun) fixes the bug reported in comment 88.
Attached patch latest and greatest (obsolete) — Splinter Review
Thanks, Igor -- fixed here, and I see no FUN_NATIVE tests left that could be similarly wrong.

Thanks also to jresig who has run his js-speed tests at

http://ejohn.org/apps/js-speed/results/

on a patched, optimized js shell. What bothers me are the tests where the patch causes a slowdown. I bet this is due to lack of sufficient operand stack depth to avoid trying the fast path, then bailing to the slow path, paying even more than without the patch. Would be good to prove this, though.

(Another thing bothering me about John's results: JavaScriptCore and its hacked PCRE regular expression implementation kick the ass of SpiderMonkey's regular expression implementation. Crowder, can you analyze further in a separate bug?)

We really need to simplify and streamline the way the JS interpreter's stack is scanned. Right now, various old places in SpiderMonkey will "lift the stack" to push arguments and js_Invoke something, and doing so can cause fp->sp to be far greater than depth jsvals from fp->spbase. If the GC finds this to be the case, it scans only to fp->spbase + depth and stops; any further jsvals must be found via cx->stackHeaders.

I will file a followup bug about this and note it in a comment here.

/be
Attachment #273282 - Attachment is obsolete: true
Attachment #273357 - Flags: review?(igor)
John noted in email that some test failures lead to the wrong engine color being used in certain bar charts -- you can probably spot these just by looking for too few bars in the chart. If the test involves eval or Function, Tamarin can't yet do it, for example.

/be
(In reply to comment #89)
> (In reply to comment #84)
> > Created an attachment (id=273272) [details] [details]
> > fix array slice/splice/concat to set *vp late
> > 
> > This costs a local root.
> 
> The real problem is that js_TraceStackFrame does not trace frame->argv[-2] or
> vp[0] so vp[0] can not be used as a rooted location with fast natives when they
> are invoke slowly.

The slot for the callee is still scanned, from the js_TraceStackFrame for the caller frame. But this gets at the redundant and restrictive scanning and stack layout bug I'm about to file: bug 389214.

/be
> 

> The slot for the callee is still scanned, from the js_TraceStackFrame for the
> caller frame. But this gets at the redundant and restrictive scanning and stack
> layout bug I'm about to file: bug 389214.

Right, and the reason for anti-dependency leading for extra roots in array_slot etc. is that js_TraceStackFrame does not trace frame->fun but rather assume vp[0] roots it. But this should be addressed in bug 389214. For now I will check one more tome that after fast native returns, frame->fun is not accessed. 
(In reply to comment #94)
> Right, and the reason for anti-dependency leading for extra roots in array_slot
> etc. is that js_TraceStackFrame does not trace frame->fun but rather assume
> vp[0] roots it. But this should be addressed in bug 389214. For now I will
> check one more tome that after fast native returns, frame->fun is not accessed. 

js_Invoke() runs callHook after a native call. If the hook first triggers GC and then access frame->fun, then a GC hazard happens with a fast native. When the fast method returns, argv[-2] is overwritten with the result and nothing roots funobj that wraps frame->fun.

A simple fix is to copy argv[-2] to rval before calling the fast native and swap it with frame->rval when the fast method returns. r+ with this addressed.
Attachment #273272 - Attachment is obsolete: true
Attachment #273272 - Flags: review?(igor)
(In reply to comment #94)
> etc. is that js_TraceStackFrame does not trace frame->fun but rather assume
> vp[0] roots it.

Just noting that frame->fun->object is not necessarily the same as the object stored as a jsval in vp[0] (cloning function objects for different scope chains means fun->object != funobj in general).

Thanks for the slow-path fastnative catch. Final patch shortly.

/be
Attached patch patch to commit (obsolete) — Splinter Review
Attachment #273357 - Attachment is obsolete: true
Attachment #273432 - Flags: review+
Attachment #273357 - Flags: review?(igor)
I'm thinking about a followup bug and patch that adds JSCLASS_FAST_NATIVE_OPS flag so the constructor argument to JS_InitClass, the JSObjectOps.{call,construct}, and JSClass.{call,construct} all are of JSFastNative type. Thoughts?

/be
Attached patch urgh -- thanks, Jesse (obsolete) — Splinter Review
This attempts to fix problems reported by Jesse in advance of this bug's patch landing on the trunk (see bug 389331 and bug 389333).

/be
Attachment #273432 - Attachment is obsolete: true
Attachment #273508 - Flags: review?(igor)
Depends on: 389331, 389333
Attachment #273508 - Flags: review?(igor) → review+
Attached patch final patch, or bust (obsolete) — Splinter Review
This includes jsmath.c changes to use fast natives. Also, I put the interpreted inline_call: code first again, to remove any penalty due to failing fast native function flag test.

Still digging into why real-nbody.js and some other tests are slower with this patch at scale, than without.

/be
Attachment #273508 - Attachment is obsolete: true
Attachment #273558 - Flags: review?(igor)
js> [1 for (x in [])].watch();      
Assertion failure: ss->opcodes[pos] == JSOP_NEWINIT, at jsopcode.c:2853

Attached patch bust (thanks again, Jesse) (obsolete) — Splinter Review
Attachment #273558 - Attachment is obsolete: true
Attachment #273666 - Flags: review?(igor)
Attachment #273558 - Flags: review?(igor)
Attachment #273666 - Flags: review?(igor) → review+
js> n = null; [1 for (x in [])](n.b());
Assertion failure: ss->opcodes[pos] == JSOP_NEWINIT, at jsopcode.c:2853

Attached patch bust again, fuzzer finds things (obsolete) — Splinter Review
This is obviously the right fix, but I'll bug Igor one more time in case he can spot anything like this bug elsewhere.

/be
Attachment #273666 - Attachment is obsolete: true
Attachment #273732 - Flags: review?(igor)
(In reply to comment #104)
> Created an attachment (id=273732) [details]
> bust again, fuzzer finds things
> 
> This is obviously the right fix,

But why JSDVG_SEARCH_STACK works without the patch? Given that I have not got it, my r+ for the last change would have no weight.
(In reply to comment #105)
> (In reply to comment #104)
> > Created an attachment (id=273732) [details] [details]
> > bust again, fuzzer finds things
> > 
> > This is obviously the right fix,
> 
> But why JSDVG_SEARCH_STACK works without the patch? Given that I have not got
> it, my r+ for the last change would have no weight.

See just above in that hunk, near the top of JSOP_CALLPROP: to use the stack to GC-protect possibly temporary results, the code pushes JSOP_NULL for the |this| placeholder. This makes a false-positive JSDVG_SEARCH_STACK null match in the testcase. But there's no need to search when we know the exact negative index to get lval.

/be
Looking at why the patch slows down real-nbody.js from jresig's suite (originally from the alioth.debian great language shootout):

$ ../src.old/Darwin_OPT.OBJ/js -f runner.js -f real-nbody.js gc.js
__start_report:real-nbody
N-Body:2:13:16.29:12:198:18.918375974035126
N-Body:4:24:29.08:22:201:26.119260280026264
N-Body:8:47:59.33:44:310:46.15609330841324
N-Body:16:93:118.52:91:362:64.00070706680127
__summarystats:223.21999999999997
__end_report
before 28905392, after 36928, break 01008000
gcNum 15
$ ./Darwin_OPT.OBJ/js -f runner.js -f real-nbody.js 
__start_report:real-nbody
N-Body:2:13:16.16:12:202:19.38640573372814
N-Body:4:23:28.91:22:209:27.959239812789814
N-Body:8:47:58.97:44:236:42.65090973250279
N-Body:16:93:117.03:89:279:57.20200868833015
__summarystats:221.07
__end_report
gcNum 14

So the patch results in one fewer GC runs (14, not 15). But shark's pc sampling shows a greater percentage of hits under js_GC. The src.old (baseline, no patch) results first (top-down):

	42.8%	98.1%	js	js_Interpret	
	7.2%	25.6%	js	 js_GetProperty	
	7.9%	14.0%	js	  js_LookupPropertyWithFlags	
	3.6%	3.7%	js	  js_NativeGet	
	0.0%	0.0%	js	   array_length_getter	
	0.7%	0.7%	js	  __i686.get_pc_thunk.bx	
	0.0%	0.0%	js	  js_SearchScope	
	4.7%	17.3%	js	 js_NewDoubleValue	
	7.2%	12.7%	js	  js_NewGCThing	
	5.2%	5.3%	js	   js_GC	

With the patch:

	42.3%	98.5%	js	js_Interpret	
	8.2%	25.1%	js	 js_GetProperty	
	7.1%	12.9%	js	  js_LookupPropertyWithFlags	
	3.5%	3.5%	js	  js_NativeGet	
	0.5%	0.5%	js	  __i686.get_pc_thunk.bx	
	0.0%	0.0%	js	  array_length_getter	
	5.9%	20.4%	js	 js_NewDoubleValue	
	8.5%	14.4%	js	  js_NewGCThing	
	5.7%	5.8%	js	   js_GC	

The patch results in more wall-time and user-time as measured by time(1). More time is spent under js_GetProperty and js_NewDoubleValue, in relative and absolute terms. The array_length_getter tree widget entry in Shark's profile UI oddly changes from a kid of js_NativeGet to a peer. Maybe that's a shark bug. In no case does js_GetProperty call a getter directly.

Since we are running fewer GCs, but spending more time in GC, we must be marking excessively with the patch. But that's odd too, since the patch eliminates fast native JSStackFrames, reusing operand stack space scanned as part of the caller JSStackFrame. So we should be scanning less.

Any insights?

/be
FYI: with brute force counter instrumentation, I verified that both with and without the patch, real-nbody.js run from the shell (using jresig's rigging, Darwin_OPT.OBJ/js -f runner.js -f real-nbody.js) allocates the same number of double GC-things.

/be
Also, adding a gc.js to the end of the command line, which contains

gc()

I saw the same after brk (end of data segment) and high watermark numbers. Without patch:

before 28905392, after 36928, break 01008000

And with patch:

before 28905392, after 36928, break 01008000

Mysterious why the runtime under js_GC is proportionally and absolutely greater.

/be
Comment on attachment 273732 [details] [diff] [review]
bust again, fuzzer finds things

Ok, I got it.
Attachment #273732 - Flags: review?(igor) → review+
One of my other fuzzers is hitting this assertion in JS_GetFrameFunctionObject a lot:

Assertion failure: OBJ_GET_CLASS(cx, obj) == &js_FunctionClass, at mozilla/js/src/jsdbgapi.c:1063

I can't reproduce easily.  Can you figure out what's going wrong based on this stack trace?  (I'm just guessing that it's due to having this patch in my tree.)
Comment on attachment 273732 [details] [diff] [review]
bust again, fuzzer finds things

>@@ -1046,17 +1053,23 @@ JS_PUBLIC_API(JSFunction *)
> JS_GetFrameFunction(JSContext *cx, JSStackFrame *fp)
> {
>     return fp->fun;
> }
> 
> JS_PUBLIC_API(JSObject *)
> JS_GetFrameFunctionObject(JSContext *cx, JSStackFrame *fp)
> {
>-    return fp->argv && fp->fun ? JSVAL_TO_OBJECT(fp->argv[-2]) : NULL;
>+    if (fp->argv && fp->fun) {
>+        JSObject *obj = JSVAL_TO_OBJECT(fp->argv[-2]);
>+        JS_ASSERT(OBJ_GET_CLASS(cx, obj) == &js_FunctionClass);
>+        JS_ASSERT(OBJ_GET_PRIVATE(cx, obj) == fp->fun);
>+        return obj;
>+    }
>+    return NULL;
> }

This is the reason for the assert from the previous comment. fp->argv[-2] is rval here.
(In reply to comment #107)
> Since we are running fewer GCs, but spending more time in GC, we must be
> marking excessively with the patch.

Here is another possibility. With fewer GC triggered through malloc pressure but the same total amount of garbage the garbage per GC run may no longer fit the CPU cache. Similarly more garbage accumulated before each GC mean that live things interleaved with dead things can occupy more space again consuming more CPU cache. I.e. few GC runs can increase heap fragmentation.

If this is true, then the run time for the test should be affected via changing GC parameters or via frequently calling gc() from the test js code and comparing timing with and without the patch. Also stats assembled with JS_GCMETER can put more light on the issue, especially the fragmentation stats. 
I'm hitting the assertion in comment 111 even without this patch.
(In reply to comment #114)
> I'm hitting the assertion in comment 111 even without this patch.

Hmm, that's odd. I can see why you're hitting it with the patch: array_join_sub clobbers *rval before looping over array elements, converting each to string.

New patch in a few.

/be
I hope this is the last one!

I toyed with the idea of adding 1 to the stack burden of all calls, for the return value (argv[-3]) on the stack. Such a chance could then do away with frame.rval. Should I go for it here? This could wait for that followup bug on my list about stack efficiency.

/be
Attachment #273732 - Attachment is obsolete: true
Attachment #274097 - Flags: review?(igor)
(In reply to comment #115)
> Hmm, that's odd. I can see why you're hitting it with the patch: array_join_sub
> clobbers *rval before looping over array elements, converting each to string.

This can happen with this-optimized native calls where argv[-2] is primitive. What is is the stack trace with CVS tip crashes? 
(In reply to comment #116)
> Created an attachment (id=274097) [details]
> fix array_join_sub and callers/specs

A better idea is to add JSStackFrame.fval so fast natives can use argv[-2] as they want. Then follow up patches can remove JSStackFrame.rval etc. 
(In reply to comment #117)
> (In reply to comment #115)
> > Hmm, that's odd. I can see why you're hitting it with the patch: array_join_sub
> > clobbers *rval before looping over array elements, converting each to string.
> 
> This can happen with this-optimized native calls where argv[-2] is primitive.
> What is is the stack trace with CVS tip crashes? 

Please ignore this: this is argv[-1], not argv[-2].

Now that bug 389880 is fixed, I get conflicts in jsstr.c when I apply this patch.
I will update the patch in few minutes.
Attached patch updated patch (obsolete) — Splinter Review
Attachment #274097 - Attachment is obsolete: true
Attachment #274097 - Flags: review?(igor)
Attached patch patch with JSStackFrame.callee (obsolete) — Splinter Review
This is a patch that adds JSStackFrame.callee. Given the amount if ifs it removed it seems the right thing to do. Plus to avoid redundant GC-tracing the patch changes js_Invoke to set fp->sp to vp to delegate the scanning of callee, this and arguments to the new frame. In this way js_TraceStackFrame can unconditionally scan the whole vp array.

Note I have not done any testing of the patch yet.
A thought about local roots: what about removal of local root support for fast natives and using JSTempValueRoot instead? Surely it means more code bloat but it also means one less if to check and it significantly decreases the chance in the interpreter of taking a slow path. And removal of minargs would allow to always take the fast path when calling the fast functions.

But this can wait a follow up bug.
(In reply to comment #123)
> Created an attachment (id=274403) [details]
> patch with JSStackFrame.callee

The patch generated the following regressions:

*-* Testcase js1_6/Regress/regress-350417.js failed:
Bug Number 350417
STATUS: Do not crash decompiling "is not function" msg
Failure messages were:
 FAILED! expected: [reported from test()] Expected value 'TypeError: y.a = [2 for each (p in [])] is not a function', Actual value 'TypeError: (y.a = [2 for each (p in [])])() is not a function' 


*-* Testcase js1_7/block/regress-352609.js failed:
Bug Number 352609
STATUS: decompilation of |let| expression for |is not a function| error
Failure messages were:
 FAILED! expected: [reported from test()] Expected match to '/TypeError: (p.z = \[let \(x = 3, y = 4\) x\]|.*Array.*) is not a function/', Actual value 'TypeError: (p.z = [let (x = 3, y = 4) x])() is not a function' 

I am going to check why the patch caused the decompiler to wrap those expressions with extra ()().
Attached patch fix to last patch (obsolete) — Splinter Review
(In reply to comment #124)
> A thought about local roots: what about removal of local root support for fast
> natives and using JSTempValueRoot instead? Surely it means more code bloat but
> it also means one less if to check and it significantly decreases the chance in
> the interpreter of taking a slow path. And removal of minargs would allow to
> always take the fast path when calling the fast functions.

I thought about that, but it seemed like an unnecessary reduction in efficiency for the built-ins affected. I'd rather work on stack layout and get it right, with the benefit that running out of room in a stack arena for extra local root slots would be very rare.

How's this for a fix to your last patch (thanks for that -- I was offline too much last week)? The fp->sp = vp; you added in js_Invoke was to blame for the problems under js_DecompileValueGenerator -- need to see the caller's operands that became the callee's arguments, in order to find their generating pc's at -depth. This patch also removes some trailing whitespace that crept in.

I still would like to restore lost *vp rooting that was replaced with local roots, but I haven't done that yet. Should be safe now, thanks to fp->callee, right?

/be
Attachment #274403 - Attachment is obsolete: true
Attachment #274713 - Flags: review?(igor)
(In reply to comment #126)
> I still would like to restore lost *vp rooting that was replaced with local
> roots, but I haven't done that yet. Should be safe now, thanks to fp->callee,
> right?

Wrong, I was thinking only of the slow path. Oh well, best to get this in and get on to fixing bug 389214.

/be
With the patch I'm seeing these results with real-nbody.js:

__start_report:real-nbody
N-Body:2:11:11.49:11:13:0.5594911107696917
N-Body:4:20.5:22.15:20:97:10.642372903168775
N-Body:8:41:43.66:40:116:14.797774880506239
N-Body:16:81:86.42:80:157:19.232327582547683
__summarystats:163.72
__end_report

compared to without:

__start_report:real-nbody
N-Body:2:12:11.66:11:14:0.5359782899266796
N-Body:4:21:22.78:20:101:11.253354157671383
N-Body:8:42:45.16:41:121:15.413309216715955
N-Body:16:84:89.36:83:164:20.160727900920378
__summarystats:168.95999999999998
__end_report

So it seems the patch is not slowing things down for this case -- jresig, can you apply the latest patch, rerun, and update your http://ejohn.org/apps/js-speed/results/ page? Thanks,

/be
Attachment #274389 - Attachment is obsolete: true
Attachment #274713 - Flags: review?(igor) → review+
(In reply to comment #126)
> I still would like to restore lost *vp rooting that was replaced with local
> roots, but I haven't done that yet. Should be safe now, thanks to fp->callee,
> right?

That should be safe and, given the closed state of the tree, it does not matter if it would be done in a separated bug or here. 
And I eliminated the tvr in array_join_sub.

/be
Attachment #274713 - Attachment is obsolete: true
Attachment #274840 - Flags: review?(igor)
Comment on attachment 274840 [details] [diff] [review]
last patch, plus local root elimination via early *vp setting for array methods

> JS_PUBLIC_API(JSObject *)
> JS_GetFrameFunctionObject(JSContext *cx, JSStackFrame *fp)
> {
>-    return fp->argv && fp->fun ? JSVAL_TO_OBJECT(fp->argv[-2]) : NULL;
>+    if (fp->fun) {
>+        JSObject *obj = fp->callee;
>+
>+        JS_ASSERT(fp->argv);
>+        JS_ASSERT(OBJ_GET_CLASS(cx, obj) == &js_FunctionClass);
>+        JS_ASSERT(OBJ_GET_PRIVATE(cx, obj) == fp->fun);
>+        return obj;
>+    }
>+    return NULL;
> }

...

>@@ -774,16 +775,17 @@ FunctionBody(JSContext *cx, JSTokenStrea
>     uintN oldflags, firstLine;
>     JSParseNode *pn;
> 
>     fp = cx->fp;
>     funobj = fun->object;
>     if (!fp || fp->fun != fun || fp->varobj != funobj ||
>         fp->scopeChain != funobj) {
>         memset(&frame, 0, sizeof frame);
>+        frame.callee = funobj;
>         frame.fun = fun;
>         frame.varobj = frame.scopeChain = funobj;
>         frame.down = fp;
>         if (fp)
>             frame.flags = fp->flags & JSFRAME_COMPILE_N_GO;
>         cx->fp = &frame;
>     }
> 

When I added fp.callee, I kept that fp->argv check in JS_GetFrameFunctionObject not to change the semantics of the code so JS_GetFrameFunctionObject continues to return null when it does  without the patch. This can be wrong concern, but then JS_ASSERT(fp->argv) is not compatible with FunctionBody either. r+ with this addressed.
I don't think the compiler can reach a jsdbgapi entry point, but no reason to take a chance for this bug, or ever (very minor code bloat, and it makes this jsdbgapi entry point cope with all known frame state combinations). Commented.

/be
Attachment #274840 - Attachment is obsolete: true
Attachment #274851 - Flags: review+
Attachment #274840 - Flags: review?(igor)
Committed on trunk:

js/src/js.c 3.159
js/src/jsapi.c 3.342
js/src/jsapi.h 3.157
js/src/jsarray.c 3.112
js/src/jsarray.h 3.18
js/src/jsatom.h 3.74
js/src/jsbool.c 3.29
js/src/jscntxt.c 3.116
js/src/jscntxt.h 3.156
js/src/jsdate.c 3.87
js/src/jsdbgapi.c 3.99
js/src/jsdbgapi.h 3.37
js/src/jsemit.c 3.270
js/src/jsexn.c 3.88
js/src/jsfun.c 3.206
js/src/jsfun.h 3.40
js/src/jsgc.c 3.229
js/src/jsinterp.c 3.363
js/src/jsinterp.h 3.57
js/src/jsiter.c 3.68
js/src/jsmath.c 3.30
js/src/jsnum.c 3.82
js/src/jsobj.c 3.367
js/src/jsobj.h 3.68
js/src/jsopcode.c 3.258
js/src/jsopcode.tbl 3.102
js/src/jsparse.c 3.296
js/src/jspubtd.h 3.87
js/src/jsregexp.c 3.159
js/src/jsregexp.h 3.29
js/src/jsscript.c 3.152
js/src/jsstr.c 3.153
js/src/jsstr.h 3.40
js/src/jsxml.c 3.164

/be
Status: ASSIGNED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
Can't clobber *vp early, something is looking at fp->argv[-2] and I don't have time to find it right now:

js/src/jsarray.c 3.113

/be

/be
Attachment #274880 - Flags: review?(igor)
I'm pretty intent on eliminating the anti-dependency between *vp as callee and *vp as rval, in bug 389214. I will clean up any remaining excessive local roots then. But this turns out to be not a problem currently: the jsarray.c rev 3.113 followup patch did not help. I'm backing it out.

Thanks to waldo for debugging help. The problem is uninitialized v passed by &v to rt->checkObjectAccess (called via local checkAccess) in InitExnPrivate (jsexn.c). Igor take note, this was from one of your recent patches (which I'm grateful for; I should have seen this, it's a mozilla/caps/src/nsScriptSecurityManager.cpp thing): checkObjectAccess's implementation uses its final jsval *vp param as an in/out parameter(!). Fixing that now by initializing v = JSVAL_NULL:

js/src/jsexn.c 3.89

/be
(In reply to comment #135)
> The problem is uninitialized v passed by &v
> to rt->checkObjectAccess (called via local checkAccess) in InitExnPrivate
> (jsexn.c).

The code that was in jsexn.c before I did the changes was:

            v = (fp->fun && fp->argv) ? fp->argv[-2] : JSVAL_NULL;
            if (!JSVAL_IS_PRIMITIVE(v)) {
                ok = checkAccess(cx, JSVAL_TO_OBJECT(fp->argv[-2]), callerid,
                                 JSACC_READ, &v /* ignored */);

Those /* ignored */ comments spoiled me.
$ cvs ci -m"Fix bogus assertion in last patch (for 385393)." jsinterp.c
Checking in jsinterp.c;
/cvsroot/mozilla/js/src/jsinterp.c,v  <--  jsinterp.c
new revision: 3.364; previous revision: 3.363
done

/be
jsstr.c:945: error: conflicting types for 'js_BoyerMooreHorspool'
jsstr.h:423: error: previous declaration of 'js_BoyerMooreHorspool' was here

jsstr.c:
| int
| js_BoyerMooreHorspool(const jschar *text, jsint textlen,
|                       const jschar *pat, jsint patlen,
|                       jsint start)
jsstr.h:
| extern jsint
| js_BoyerMooreHorspool(const jschar *text, jsint textlen,
|                       const jschar *pat, jsint patlen,
|                       jsint start);

This int/jsint mismatch causes a compilation error on mingw
where jsint is a typdef for unsigned long.
(In reply to comment #138)
> jsstr.c:945: error: conflicting types for 'js_BoyerMooreHorspool'
> jsstr.h:423: error: previous declaration of 'js_BoyerMooreHorspool' was here

I fixed that:

Checking in jsstr.c;
/cvsroot/mozilla/js/src/jsstr.c,v  <--  jsstr.c
new revision: 3.154; previous revision: 3.153
done
Depends on: 390598
Could this patch caused https://bugzilla.mozilla.org/show_bug.cgi?id=390633 Chatzilla not opening, and reporting 'slow script' errors ? 
Depends on: 390684
Depends on: 390847
Depends on: 391280
Depends on: 391406
Attached file test check in log
These tests cover the problems found by Jesse but do not include the test for performance of push vs. append since the non deterministic variability of the results make it inappropriate for an automated test.
Flags: in-testsuite+
Attaching js/js1_5/Regress/regress-385393.js for posterity's sake. This does test the performance aspect of this bug.
verified fixed 1.9.0 linux/macppc/windows on performance and Jesse's regressions.
Status: RESOLVED → VERIFIED
Comment on attachment 274880 [details] [diff] [review]
should have played it even safer

The patch was made unnecessary due to other changes.
Attachment #274880 - Flags: review?(igor)
Might be nice to get some MDC docs related to the fast-native stuff.
(In reply to comment #145)
> Might be nice to get some MDC docs related to the fast-native stuff.

You mean update the JS API reference? What a concept ;-).

/be
Summary: a.push(b) slower than a[a.length] = b; → array.push(b) slower than a[a.length] = b;
Alias: jsfastnative
You need to log in before you can comment on or make changes to this bug.