Last Comment Bug 614138 - Remove the temp value rooting traceable quickstubs do
: Remove the temp value rooting traceable quickstubs do
Status: RESOLVED FIXED
fixed-in-tracemonkey
: perf
Product: Core
Classification: Components
Component: XPConnect (show other bugs)
: Trunk
: x86 Mac OS X
: P1 normal (vote)
: mozilla2.0b8
Assigned To: Boris Zbarsky [:bz] (still a bit busy)
:
: Andrew Overholt [:overholt]
Mentors:
Depends on:
Blocks: domperf 614145 618217
  Show dependency treegraph
 
Reported: 2010-11-22 17:42 PST by Boris Zbarsky [:bz] (still a bit busy)
Modified: 2010-12-22 19:36 PST (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
We don't need to temp-root our various JSObjects, since they're either reachable from our arguments, kept alive by strong refs to an nsXPCWrappedJS, or reachable via conservative stack scanning. (14.32 KB, patch)
2010-11-22 19:11 PST, Boris Zbarsky [:bz] (still a bit busy)
jorendorff: review-
Details | Diff | Splinter Review
Add an API for holding GC objects while we use values they own. (4.62 KB, patch)
2010-12-02 16:32 PST, Jim Blandy :jimb
jorendorff: feedback+
Details | Diff | Splinter Review
Difference in code generated for test case without/with js::Anchor instance (7.59 KB, patch)
2010-12-03 12:29 PST, Jim Blandy :jimb
no flags Details | Diff | Splinter Review
Add an API for holding GC objects while we use values they own. (5.54 KB, patch)
2010-12-03 13:13 PST, Jim Blandy :jimb
no flags Details | Diff | Splinter Review
Add an API for holding GC objects while we use values they own. (5.74 KB, patch)
2010-12-08 11:02 PST, Jim Blandy :jimb
no flags Details | Diff | Splinter Review
Add an API for holding GC objects while we use values they own. (7.24 KB, patch)
2010-12-08 12:36 PST, Jim Blandy :jimb
no flags Details | Diff | Splinter Review
Add an API for holding GC objects while we use values they own. (7.29 KB, patch)
2010-12-08 13:18 PST, Jim Blandy :jimb
jorendorff: review+
bzbarsky: approval2.0+
Details | Diff | Splinter Review
Merged on top of jimb's patch (12.23 KB, patch)
2010-12-08 13:48 PST, Boris Zbarsky [:bz] (still a bit busy)
jorendorff: review+
bzbarsky: approval2.0+
Details | Diff | Splinter Review

Description Boris Zbarsky [:bz] (still a bit busy) 2010-11-22 17:42:21 PST
I looked through the uses of xpc_qsArgValArray; the following places write to it:

A)  Conversion of the return value.  This is converting into a JSObject on the
    stack that we then return, so we can rely on stack scanning here.
B)  |this| unwrapping (FromCcx or not).  This calls getWrapper on the given
    JSObject (which hands back an obj2 which is on the parent or proto chain of
    the given object or something that object wraps) and then calls castNative,
    which just calls getNative, which just hands out the object it was given. 
    So the value we write into the rooted array is something that seems to be
    reachable from the original object we had, and hence is being kept alive by
    it, unless I'm missing something.
C)  Argument unboxing for traceable natives.  This works like so:

   1) Take incoming |v| and converts to |JSObject *src| (no object crations
      here)
   2) getWrapper gets an obj2 which is on the parent or proto chain of |src| or
      something |src| wraps.
  3a) Create a wrapped JS (this always sets the argNref.ptr too, so that strong
      ref will keep things alive).
    or
  3b) Call castNative on obj2.  This calls one of getNativeFromWrapper or
      getNative.  getNativeFromWrapper just calls getNative. getNative just
      hands out the obj that was passed in as *vp.

    So again, I think not storing to the array for this stuff is safe.

I'd really appreciate a sanity check on the above!

In fact, except for the return value none of these really care about the resulting jsval at all...  The non-traceable versions might, though (they stick it into the actual vp, etc).  But if not, can we just simplify some of these function signatures?

In any case, patch to nix the memset+rooting coming up, since it seems to be unnecessary.
Comment 1 Boris Zbarsky [:bz] (still a bit busy) 2010-11-22 19:11:44 PST
Created attachment 492557 [details] [diff] [review]
We don't need to temp-root our various JSObjects, since they're either reachable from our arguments, kept alive by strong refs to an nsXPCWrappedJS, or reachable via conservative stack scanning.
Comment 2 Boris Zbarsky [:bz] (still a bit busy) 2010-11-22 19:12:44 PST
Comment on attachment 492557 [details] [diff] [review]
We don't need to temp-root our various JSObjects, since they're either reachable from our arguments, kept alive by strong refs to an nsXPCWrappedJS, or reachable via conservative stack scanning.

Please review my assumptions here carefully!
Comment 3 Jason Orendorff [:jorendorff] 2010-12-02 12:36:06 PST
Comment on attachment 492557 [details] [diff] [review]
We don't need to temp-root our various JSObjects, since they're either reachable from our arguments, kept alive by strong refs to an nsXPCWrappedJS, or reachable via conservative stack scanning.

>diff --git a/content/canvas/src/CustomQS_WebGL.h b/content/canvas/src/CustomQS_WebGL.h
In content/canvas/src/CustomQS_WebGL.h:
> static inline void FASTCALL
> helper_nsIDOMWebGLRenderingContext_Uniform_x_iv_tn(JSContext *cx, JSObject *obj, JSObject *locationobj,
>                                                       JSObject *arg, int nElements)
> {
>     XPC_QS_ASSERT_CONTEXT_OK(cx);
> 
>     nsIDOMWebGLRenderingContext *self;
>     xpc_qsSelfRef selfref;
>-    xpc_qsArgValArray<3> vp(cx);
>-    if (!xpc_qsUnwrapThis(cx, obj, nsnull, &self, &selfref.ptr, &vp.array[0], nsnull)) {
>-        js_SetTraceableNativeFailed(cx);
>-        return;
>+    { /* scope for |dummy| */
>+        jsval dummy;
>+        if (!xpc_qsUnwrapThis(cx, obj, nsnull, &self, &selfref.ptr, &dummy,
>+                              nsnull)) {
>+            js_SetTraceableNativeFailed(cx);
>+            return;
>+        }
>     }

Hmm. Well, only obj is necessarily rooted, and obj might not be the XPCWrappedNative that keeps the COM object, self, alive. The XPCWrappedNative may be on obj's prototype chain. In the words of your comment 0, item B, this is "reachable from the original object we had"; but it won't necessarily stay that way the whole time we need self to stay alive. If we call back into JS code, obj.__proto__ could be assigned. The XPCWN could become garbage. GC could destroy self.

I don't know if that could actually happen or not in this code. If self is an XPCWrappedJS, it probably gets AddRef'd and stored in selfref, right? So the only remaining question is whether any of the Uniform*iv_array methods, or any of the other junk we're calling (js_CreateTypedArrayWithArray and so on) could call user code.

A quick look at traceableArgumentConversionTemplate in qsgen.py suggests that we can't reenter, and [traceable] is opt-in, so ... perhaps none of the methods that are [traceable] have any implementations that can possibly re-enter, even in unusual cases? I dunno. That's a pretty big assumption for me to swallow. How badly do we need to get rid of this rooting code?

This seems precarious. I'd really rather not, unless I'm missing something that makes it safer, or measurement indicates this is much faster. But if we ever manage to ban assigning to __proto__, this would be OK.

r- for now, and clearing r?peterv.
Comment 4 Andreas Gal :gal 2010-12-02 12:48:20 PST
If you guarantee that a stack variable points to the object thats sufficient for rooting and cheaper.
Comment 5 Boris Zbarsky [:bz] (still a bit busy) 2010-12-02 14:36:04 PST
> If self is an XPCWrappedJS, it probably gets AddRef'd and stored in selfref,
> right? 

Yes.

> If we call back into JS code

I don't think that's a problem for the GL stuff, but for general quickstubs it could be, in theory.  Think dispatchEvent, where the actual C++ method to be called can most definitely enter JS code.

And for general argument conversions, of course, we can call various toString methods and whatnot on later arguments after we've unwrapped the earlier arguments, right?  Though maybe we don't do that in the traceable natives.

> How badly do we need to get rid of this rooting code?

For some microbenchmarks (I was looking at dromaeo) it's 20% or more of total runtime.  At least once I get dromaeo back on trace.

I guess plan B might be to more carefully manage our rooting array so things are only rooted if really needed; for getElementById we create a rooted slot for the string, then don't use it, say.  That'll give a smaller speedup, but still noticeable.

> If you guarantee that a stack variable points to the object

We can't guarantee that, can we?  The whole point is that we have these JSObject* that aren't used directly (we're working with their private ptrs only), so the compiler could optimize them away...
Comment 6 Jim Blandy :jimb 2010-12-02 16:32:54 PST
Created attachment 494876 [details] [diff] [review]
Add an API for holding GC objects while we use values they own.

The comments should say it all.  This would need an MSVC port.
Comment 7 Jim Blandy :jimb 2010-12-02 18:39:29 PST
There may need to be a "memory" operand to the asm there, too. I'm not sure we've really prevented all the reorderings we need to. G++ could still move (what it believes to be) irrelevant code past the asm.
Comment 8 Jason Orendorff [:jorendorff] 2010-12-03 10:45:08 PST
(In reply to comment #5)
> And for general argument conversions, of course, we can call various toString
> methods and whatnot on later arguments after we've unwrapped the earlier
> arguments, right?  Though maybe we don't do that in the traceable natives.

We don't currently. But it's hard to be sure, and hard to vouch for future changes to qsgen.py.

jimb's approach is better, though the lack of certainty is mildly scary.
Comment 9 Jason Orendorff [:jorendorff] 2010-12-03 10:59:42 PST
Comment on attachment 494876 [details] [diff] [review]
Add an API for holding GC objects while we use values they own.

Yes, this sounds good. feedback+. Who can provide the MSVC code? Surely other C++ programs that use conservative GC have precisely this problem. Whose code can we look at?

FWIW, does the test fail without the Anchor?

Did you look at the effect this has on generated code?

Ultimately, we need to use Anchors in a bunch of places, I think -- or rather, Anchor-based RAII classes: StringChars, FunctionScript, that kind of thing.
Comment 10 Jim Blandy :jimb 2010-12-03 12:11:29 PST
(In reply to comment #9)
> Yes, this sounds good. feedback+. Who can provide the MSVC code? Surely other
> C++ programs that use conservative GC have precisely this problem. Whose code
> can we look at?

I'll check Guile, although I don't know that it's been ported to MSVC.

> FWIW, does the test fail without the Anchor?

Indeed it does.

> Did you look at the effect this has on generated code?

No, I have not.

> Ultimately, we need to use Anchors in a bunch of places, I think -- or rather,
> Anchor-based RAII classes: StringChars, FunctionScript, that kind of thing.

Should there be 'typedef Anchor<JSString *> StringAnchor;' and things like that? I was thinking there should be some dummy 'traits' template to ensure it's only used at the types it makes sense for.
Comment 11 Jim Blandy :jimb 2010-12-03 12:18:35 PST
(In reply to comment #9)
> Yes, this sounds good. feedback+. Who can provide the MSVC code? Surely other
> C++ programs that use conservative GC have precisely this problem. Whose code
> can we look at?

Guile simply passes the value to a no-op function on non-GCC platforms. We need to do better than that.
Comment 12 Jim Blandy :jimb 2010-12-03 12:29:54 PST
Created attachment 495080 [details] [diff] [review]
Difference in code generated for test case without/with js::Anchor instance

Here's the effect on the machine code. Looking through it now...
Comment 13 Jim Blandy :jimb 2010-12-03 12:47:33 PST
So, it affected register allocation (as it should), which affected some prologue and epilogue code (more registers to save/restore). This is all expected.

It also rearranged some of the cold branches (calls to JS_Assert). I didn't see any changes in their actual behavior.

I didn't see any other effects on the code in this particular case. It certainly didn't deoptimize the world.
Comment 14 Jim Blandy :jimb 2010-12-03 13:13:06 PST
Created attachment 495090 [details] [diff] [review]
Add an API for holding GC objects while we use values they own.

This version has the memory-clobbering notation, and it also prevents instantiating the template at types that the conservative GC wouldn't actually notice.
Comment 15 Jason Orendorff [:jorendorff] 2010-12-07 11:17:15 PST
Well, then. Very much let's get some MSVC code for this and land it.
Comment 16 Jim Blandy :jimb 2010-12-08 08:49:09 PST
Who can we shanghai into doing the MSVC port?  Let's name names.
Comment 17 Boris Zbarsky [:bz] (still a bit busy) 2010-12-08 09:50:21 PST
Would it make sense to be able to ask for the &hold of the Anchor so it can be passed to a function?  Otherwise I end up needing to use this pattern:

  jsval temp;
  GetFoo(&temp);
  js::Anchor temp_anchor(temp);
Comment 18 Robert Sayre 2010-12-08 09:57:34 PST
Comment on attachment 495090 [details] [diff] [review]
Add an API for holding GC objects while we use values they own.

>Bug 614138: Add an API for holding GC objects while we use values they own.
>
>+         *
>+         * The "memory" clobber operand ensures that G++ will not move prior memory
>+         * accesses after the asm --- it's a barrier. Unfortunately, it also means that
>+         * G++ will assume that all memory has changed after the asm, as it would for a
>+         * call to an unknown function. I don't know of a way to
>+         */
>+        asm volatile("":: "g" (hold) : "memory");

This comment trails off. Also, I'm naively wondering why the "hold" field can't be marked volatile with plain old C++? Does that not actually work?
Comment 19 Steve Fink [:sfink] [:s:] 2010-12-08 10:29:11 PST
Perhaps a dumb question, but why isn't it enough to use an RAII class that uses the value at destruction time?

class Anchor<T> {
  ~Anchor() {
    if (!hold)
      *(int*)hold = 17; // Or ++static_nonsense_variable, or whatever.
  }
}

Is that too easy to reorder? Or am I missing something else?

If reordering is a problem, maybe additionally making 'hold' volatile would help? (Probably not.) Or perhaps

static void* global_hold;
class Anchor<T> {
  T* hold;
  Anchor(T* v) : hold(v) { global_hold = (void*) v; }
  ~Anchor() { if (!hold) global_hold = NULL; }
}

though that obviously adds a small amount of useless work. But nothing reads global_hold, so the write will never stall the pipeline. gcc will think that the GC-triggering thing in the body might read global_hold, so it can't move the read of 'hold' up, and moving the store to 'hold' at the top wouldn't matter.

Or I'm just confusing myself. I'll be quiet now.
Comment 20 Steve Fink [:sfink] [:s:] 2010-12-08 10:32:00 PST
(In reply to comment #18)
> Also, I'm naively wondering why the "hold" field can't
> be marked volatile with plain old C++? Does that not actually work?

AIUI, volatile just means the value in memory can change without you writing to it, so you'd better re-fetch it every time you need it. But if you never use it, then that's irrelevant; you can still optimize it away entirely, because all zero uses are properly re-fetching it from memory.
Comment 21 Boris Zbarsky [:bz] (still a bit busy) 2010-12-08 10:34:10 PST
>      *(int*)hold = 17; // Or ++static_nonsense_variable, or whatever.

|hold| is not necessarily a pointer type.  In fact, in the case of jsval it most definitely is not.
Comment 22 Jim Blandy :jimb 2010-12-08 11:02:50 PST
Created attachment 496203 [details] [diff] [review]
Add an API for holding GC objects while we use values they own.

Revised; comment fixes, permit const variants.
Comment 23 Steve Fink [:sfink] [:s:] 2010-12-08 11:52:04 PST
(In reply to comment #21)
> >      *(int*)hold = 17; // Or ++static_nonsense_variable, or whatever.
> 
> |hold| is not necessarily a pointer type.  In fact, in the case of jsval it
> most definitely is not.

Ok. But that still seems like a relatively minor detail, if you can eliminate the compiler-specific construct.

It's already enumerating the exact set of types that it supports, so it can just specialize the non-pointer type to do something else with a jsval (the only non-pointer it accepts.) Is it good enough to have the address of a jsval on the stack? (If so, you don't even need to specialize for jsval. You can make the general case take the address of the anchored value, and specialize for pointers by just grabbing the pointer.)

FWIW, I modified the patch to just do

    ~Anchor() {
        if (!reinterpret_cast<int*>(hold)) {
            *reinterpret_cast<int*>(hold) = 17;
        }
    }
 
and it passed the test (and failed if I commented out the destructor body.)

I also made a version with a specialization for jsval. The test doesn't use an Anchor<jsval>, though, so I don't know if it really works. (I added a dummy declaration just to be sure it compiles, but I don't have a test for it.)
Comment 24 Boris Zbarsky [:bz] (still a bit busy) 2010-12-08 12:09:18 PST
> Is it good enough to have the address of a jsval on the stack? 

No.

Note that the idea here is to have zero impact, btw.  Doing stuff in the dtor sorta fails that test.  For my use case (the one this bug is about), the cost of the zero writes to the stack in the current setup is sufficient to cause a slowdown.  Note that I'm trying to shave off processor cycles (as in, probably less than 10) per DOM call here; the relevant stupid benchmarks are measuring operations that are taking order of 100 processor cycles on my hardware.
Comment 25 Jim Blandy :jimb 2010-12-08 12:36:17 PST
Created attachment 496235 [details] [diff] [review]
Add an API for holding GC objects while we use values they own.

Adds a portable alternative to the __GNUC__-only implementation. The tests can detect whether the 'volatile' qualifier is present in the destructor, so I think we'll at least have some warning if this fails.
Comment 26 Jim Blandy :jimb 2010-12-08 12:42:51 PST
(In reply to comment #19)
> Perhaps a dumb question, but why isn't it enough to use an RAII class that uses
> the value at destruction time?
> 
> class Anchor<T> {
>   ~Anchor() {
>     if (!hold)
>       *(int*)hold = 17; // Or ++static_nonsense_variable, or whatever.
>   }
> }
> 
> Is that too easy to reorder? Or am I missing something else?

... the code you wrote above writes a 17 into the JSObject, doesn't it?

But in general, it's too easy to reorder, or to throw away entirely. Imagine the compiler simply substituting the destructor's body into the function at the appropriate point(s), and see if you can throw it away.
Comment 27 Jim Blandy :jimb 2010-12-08 13:18:15 PST
Created attachment 496250 [details] [diff] [review]
Add an API for holding GC objects while we use values they own.

Add a T &get() member and a no-argument constructor, as requested in comment #17.
Comment 28 Boris Zbarsky [:bz] (still a bit busy) 2010-12-08 13:48:08 PST
Created attachment 496267 [details] [diff] [review]
Merged on top of jimb's patch
Comment 29 Jason Orendorff [:jorendorff] 2010-12-08 17:46:13 PST
Comment on attachment 496250 [details] [diff] [review]
Add an API for holding GC objects while we use values they own.

I still say all this conservative GC stuff is a big mistake.

Soon someone will want Anchor<js::Value>, but we can burn that bridge when we come to it.
Comment 30 Jason Orendorff [:jorendorff] 2010-12-08 18:02:10 PST
Comment on attachment 496267 [details] [diff] [review]
Merged on top of jimb's patch

Phew. It's such a relief not to have to care about the kind of details I was worrying about in comment 3.

>@@ -1366,19 +1369,21 @@ def writeTraceableQuickStub(f, customMet
>         # Call the method.
>         comName = header.methodNativeName(member)
>-        if getBuiltinOrNativeTypeName(member.realtype) == '[jsval]':
>-            argNames.append("&vp.array[0]")
>-        elif not isVoidType(member.realtype):
>+        # If we start allowing [jsval] return types here, we need to
>+        # tack the return value onto the arguments list... and handle
>+        # properly returning it too.  See bug 604198.
>+        assert getBuiltinOrNativeTypeName(member.realtype) is not '[jsval]'
>+        if not isVoidType(member.realtype):
>             argNames.append(outParamForm(resultname, member.realtype))

To me, it seems like that comment and assertion belong in outParamForm, not here.

r=me either way.
Comment 31 Boris Zbarsky [:bz] (still a bit busy) 2010-12-08 18:56:09 PST
> To me, it seems like that comment and assertion belong in outParamForm, 

Bah!  There's jsval code there too!  OK, I'll move the comment + assert there.
Comment 32 Boris Zbarsky [:bz] (still a bit busy) 2010-12-08 22:34:24 PST
OK, pushed:

  http://hg.mozilla.org/tracemonkey/rev/929ed9d5d81b
  http://hg.mozilla.org/tracemonkey/rev/00524af0568d

and then several attempts to make the anchor stuff compile on Windows and on both debug and opt and for all types:

  http://hg.mozilla.org/tracemonkey/rev/bac4a8d70492
  http://hg.mozilla.org/tracemonkey/rev/ae03fdb249ad
  http://hg.mozilla.org/tracemonkey/rev/4d97e9955bfb

jimb, I'd really appreciate you looking over |hg diff -r 00524af0568d -r 4d97e9955bfb| to make sure you're ok with those changes.
Comment 34 Jim Blandy :jimb 2010-12-09 18:51:35 PST
(In reply to comment #32)
> jimb, I'd really appreciate you looking over |hg diff -r 00524af0568d -r
> 4d97e9955bfb| to make sure you're ok with those changes.

That looks correct to me. I wonder if there isn't a simpler way to do it, but that could be a separate patch.
Comment 35 Jim Blandy :jimb 2010-12-09 19:31:34 PST
Filed neaten-up bug 618217.

Note You need to log in before you can comment on or make changes to this bug.