Closed Bug 513190 Opened 15 years ago Closed 15 years ago

avoiding jsint tagging of the private slot data

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
status1.9.2 --- beta1-fixed

People

(Reporter: igor, Assigned: igor)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 1 obsolete file)

v2
86.86 KB, patch
jorendorff
: review+
Details | Diff | Splinter Review
With the bug 493457 and 511425 fixed it is no longer necessary to tag the data stored in the private slot as jsint. Instead the value can be stored there directly with direct cast to jsval avoiding setting and clearing the int bit. The GC can just check for that when marking object's slots.


+++ This bug was initially created as a clone of Bug # +++

Currently SpiderMonkey sources use the following ways to access the private data pointer stored in the private slot:

Reading a slot that can be not-yet-initialized:

    JS_GetPrivate(cx, obj);

    jsval v = STOBJ_GET_SLOT(obj, JSSLOT_PRIVATE); 
    if (!JSVAL_IS_VOID(v))
        p = JSVAL_TO_PRIVATE(v);

Reading a previously initialized slot:
   
    OBJ_GET_PRIVATE(cx, obj);

    STOBJ_GET_PRIVATE(cx, obj);

    JSVAL_TO_PRIVATE(OBJ_GET_SLOT(cx, obj, JSSLOT_PRIVATE))

    JSVAL_TO_PRIVATE(obj->fslots[JSSLOT_PRIVATE])

Writing:

    STOBJ_SET_SLOT(obj, JSSLOT_PRIVATE, JSVAL_TO_PRIVATE(data)); 

    obj->fslots[JSSLOT_PRIVATE] = JSVAL_TO_PRIVATE(data);

    JS_SetPrivate(cx, obj, data);

This suggests to consolidate all of these cases using few inline functions covering all the intended usage. 

It would not only improve the readability of the code but also reduce useless code bloat resulting from using function calls like JS_(Get|Set)Property for what is effectively a field access. Another source of the bloat is using OBJ_GET_SLOT(cx, obj, JSSLOT_PRIVATE) that adds a locking/unlocking code for accessing this fixed slot.
Closely related to that, XP connect, DOM and layout all implement their own lock-free variant of JS_SetPrivate. We should use a public API instead. JS_SetPrivateUnlocked or Immutable or something like that.
JS_InitPrivate to imply up-front one time setting? Don't want too long a name or it'll see less use than bad old JS_SetPrivate.

/be
Init is not the problem. GetPrivate is the really issue. It requires a JSContext* that it never uses and DOM/XPConnect/content don't seem to have handy when calling it.
(In reply to comment #3)
> Init is not the problem. GetPrivate is the really issue. It requires a
> JSContext* that it never uses and DOM/XPConnect/content don't seem to have
> handy when calling it.

Right, I was bouncing off your comment 1. How about JS_Private(obj)?

/be
Love #5. I take 3 of those. Brendan?
Comment 5 is already in jsobj.h:JSObject. Do try to keep up :-P.

Problem is that jsobj.h is not public API. If we are just adding cx-free versions of JS_[GS]etPrivate to jsapi.h, then comment 1 and comment 3 suggest some names in keeping with existing API.

If we want a C++ API, I contend we nevertheless do not want a fragile base class or a bunch of private stuff in jsobj.h to be visited upon all and sundry. Cc'ing Luke and Jason.

/be
Just do a context-less overload, #ifdef CPLUSPLUS?
(In reply to comment #8)
> Just do a context-less overload, #ifdef CPLUSPLUS?

BRILLIANT! Also possibly confusing. I'm in favor.

/be
We wanted to hide JS-engine internals anyway, so why not expose a different version of JSObject via JSAPI? I really would like to hide all the engine internals with hard #error guards against including anything but public headers.

namespace JSPrivate {
    class JSObject;
}

namespace JSPublic {
    class JSObject {
        operator JSPrivate::JSObject() {
            union {
                JSPrivate::JSObject* priv;
                JSPublic::JSObject* pub;
            } u;
            u.priv = this;
            return u.pub;
        }

        JSObject* getPrivate() {
            return JS_GetPrivateUnlocked(this);
        }
    }
}

Something like this should work and JSPrivate::JSObject remains completely opaque. Its 2009. Lets use some C++ magic =)
(In reply to comment #9)
> Also possibly confusing. I'm in favor.

Classic!

(This might also point the way towards JS_GetStringBytes(cx, str) overload to solve a problem that I dimly recall Andreas having a bit recently.)
Make it really confusing. Ifdef C++ then use a default parameter with cx = NULL. I think that works.
#error if you don't define the secret password? C'mon, this is all advisory and you can't stop people from hacking right into the most intimate source file. That does not mean we support all such "embedders".

If we want a *real* C++ API, it shouldn't be jammed into jsapi.h.

Shortest path says some additions to jsapi.h could win, but too many hops and we have made a bigger mess. Just one more hop? Heard that before...

/be
Comment 10 is painfully overwrought (at least use an anonymous union :-P).

The problem with embedders is not holding a "unsupported private API usage, use a public API" line, it's having decent public APIs. We exported all .h files in the ancient days to avoid starving those embedders who might be best situated to build better mousetraps without first trying to specify them for us so we could provide the blessed public API. That kind of astronaut architecture waterfall process never works.

OTOH we didn't go out and harvest the real-world "embedding-side APIs" that were built from private and public parts. Not much, anyway (some of the ById and Iterator APIs, and a few others, come to mind).

If we want a *real* C++ API, it shouldn't be jammed into jsapi.h.

/be
Could see

1. Some small-effort/short-path evolution of jsapi.h
1a. C only, because embedders haven't had to swallow C++ yet for this header
1b. C++ with overloads as shaver says, to fix old parameterization bugs.
2. Big effort/win C++ API -- more to say later.
3. Both, with 1 fading away as 2 comes online.

I think 1a is the better short-term course, on reflection. We should definitely newsgroup and wiki up proposals for 2.

/be
I agree, 1a then 2.

Design for 2 should follow GC design decisions.

Applause for the actual change proposed in the description, too.
A simple hack to export inlined version of JSObject::getPrivate in jsapi.cpp without exposing jsobj.h would be something like this:

#define JS_PRIVATE_OFFSET (4 * JS_BYTES_PER_WORD)

static JS_INLINE JS_GetPrivate(JSObject *obj)
{
    asserts;
    return (void *) ((uint8 *) obj + PRIVATE_OFFSET);
}

with a static assert in jsobj.h that validates JS_PRIVATE_OFFSET. This assumes that this bug is fixed.
Assignee: general → igor
Attached patch v1 (obsolete) — Splinter Review
The patch changes the private slot to store the private pointer directly and changes code accordingly. That required to add an extra check during object construction for the JSCLASS_HAS_PRIVATE flag to set JSSLOT_PRIVATE to either JSVAL_VOID or NULL. But at least with GCC 4.2 that initialization code is branch-free.

In the patch I also added JSSLOT_PRIMITIVE_THIS as an alias for JSSLOT_PRIVATE to emphasize that Boolean, Number and String uses that slot to store the corresponding jsval, not the private value.
Attachment #397538 - Flags: review?(jorendorff)
What is the advantage of this patch? Why should we make object creation slower (even if its just a little bit slower)? What is the problem with tagging that field?
(In reply to comment #19)
> What is the advantage of this patch? Why should we make object creation slower
> (even if its just a little bit slower)? What is the problem with tagging that
> field?

The problem is that the initialization of that field and its each read for HAS_PRIVATE classes right now requires a conditional check and bit manipulations. This means that reads like JSObject->JSFunction or JSObject->xpconnect_private or JSObject->ISupport is slower than necessary. 

On the other hand the cost of 3 extra branchless insructions can be avoided for all hot paths like object creation in the same way as it is avoided now in couple of places with explicit passing of the value for JSSLOT_PRIVATE. I will fail that in a separated bug together with a suggestion to skip initializing unused object's slots to void until their scope is grown.
Why does initialization of that field require a conditional check?

The bit manipulation to set and get the field is completely hidden by the API overhead at the moment. Also, masking off that bit is pretty inexpensive. Do we have some profiles showing that getPrivate() is a performance problem at the moment?

I am strongly opposed to adding a single cycle to the object creation path. Its already several times slower than where we want to be.
This shouldn't go in while Andreas is strongly opposed. Apart from that, this patch would get my r+.

Andreas, the place I want to get to is:

  - we can subclass JSObject
  - objects can be different sizes and the GC can cope
  - object fields don't have to be jsvals and the GC can cope
  - we should never go groveling about for a parent or proto except
    in jsapi.cpp--the proto cache in bug 512195 works but isn't the
    final answer, imho
  - whatever we need to cache in Function objects etc. to
    make that happen, let's do it
  - each class we care about should come with an initialization
    function slim enough to call on trace, instead of js_NewObject
    having to be all things to all classes

It seems to me igor's patch gets us a little closer to that eventual goal, and unless it regresses benchmarks we should take it.
Also, object creation at high rate, only to do nothing with the object, is stupid benchmarketing and we should not over-optimize for it.

We certainly can and should separate (and inline where appropriate) code to init an object without private data, compared to one with.

/be
Originally (1996 fall) objects were a pair {map, slots} and the GC blindly scanned all slots. Even then I could have made the GC check JSCLASS_HAS_PRIVATE and not interpret that slot. That would have been a good idea from the get-go, since it saves cycles in the GC and (most importantly) on every DOM get-private. That's the real workload to tune for, not a fake-o bazillion unused new objects benchmark.

/be
Comment on attachment 397538 [details] [diff] [review]
v1

The hacks in DOM and Caps make me queasy.  Is caps really so fast that we can't afford a DSO call there? Is that code even correct anymore, given Block objects?

I guess fixing this API leakage is another bug.

r?gal for further performance review.
Attachment #397538 - Flags: review?(jorendorff)
Attachment #397538 - Flags: review?(gal)
Attachment #397538 - Flags: review+
My only objection is that we are assuming that we are making something faster here, but there is no proof for that. Nobody has actually measured the speedup on every DOM get-private Brendan is referring to. Please measure before you optimize, otherwise we just add complexity and chase ghosts which has no benefit and costs complexity (which we already have more than enough of).

whale:tmp gal$ cat x.cc
unsigned priv = 12;

struct Object {
  unsigned priv;

  unsigned* getPrivate() {
    return (unsigned*)priv;
  }
};

Object o;

int main() {
  o.priv = unsigned(&priv);
  unsigned x = 0;
  for (long l = 0; l < 1000000000; ++l)
    x += *o.getPrivate();
  return 0;
}
whale:tmp gal$ cat x2.cc
unsigned priv = 12;

struct Object {
  unsigned priv;

  unsigned* getPrivate() {
    return (unsigned*)(priv & 0xfffffffe);
  }
};

Object o;

int main() {
  o.priv = unsigned(&priv) | 1;
  unsigned x = 0;
  for (long l = 0; l < 1000000000; ++l)
    x += *o.getPrivate();
  return 0;
}

whale:tmp gal$ time ./x

real	0m4.070s
user	0m4.024s
sys	0m0.013s
whale:tmp gal$ time ./x2

real	0m4.066s
user	0m4.026s
sys	0m0.013s

This benchmark is synthetic. If you don't believe it, I invite you to do better, but the bit masking is clearly completely hidden by the pointer fetch, even in case of a cache hit. Please note that on top of this the GetPrivate() API calls a non-inlineable function from the DOM side, which hides the pointer shuffling even further. That effect isn't even measured here.

I don't think my review is needed for this patch, jorendorff is the expert for that code. I am just pointing out that we are optimizing a non-existing problem with added complexity and added object-creation cost.
Attachment #397538 - Flags: review?(gal)
Heh, my original design was performance-neutral, at least. Andreas is right that by requiring a tag, it avoided a special case for JSCLASS_HAS_PRIVATE. Genius! :-)

But Jason's on the right track. We want to move away from untyped jsval fslots if we can, for cleaner code. If this does not win perf, at least we get cleaner code.

/be
(In reply to comment #26)
> Nobody has actually measured the speedup
> on every DOM get-private Brendan is referring to.

In past quite a few cases of JS_GetPrivate were replaced by a custom inlined version of it precisely because JS_GetPrivate noticeably affected performance. 

>   unsigned* getPrivate() {
>     return (unsigned*)(priv & 0xfffffffe);
>   }

The bit manipulation is not the problem. The issue is an extra check to return NULL when the stored value is JSVAL_VOID. With modified test cases I have with GCC 4.3:

~/s> cat x.cc
volatile unsigned priv = 12;

struct Object {
    long priv;
    Object(volatile unsigned *ptr) : priv(long(ptr)) { }

    volatile unsigned* getPrivate() volatile {
        return (volatile unsigned *) priv;
    }
};

volatile Object o(&priv);

int main() {
    unsigned x = 0;
    for (long l = 0; l < 1000000000; ++l)
        x += *o.getPrivate();
    return x;
}
~/s> cat x2.cc
volatile unsigned priv = 12;

struct Object {
    long priv;
    Object(volatile unsigned *ptr) : priv(long(ptr) | 1L) { }

    volatile unsigned* getPrivate() volatile {
        // JSVAL_VOID is 22
        return priv != 22 ? (volatile unsigned*)(priv & ~1L) : 0;
    }
};

volatile Object o(&priv);

int main() {
    unsigned x = 0;
    for (long l = 0; l < 1000000000; ++l)
        x += *o.getPrivate();
    return x;
}
~/s> time ./x

real	0m2.814s
user	0m2.778s
sys	0m0.002s
~/s> time ./x2

real	0m4.626s
user	0m4.603s
sys	0m0.002s

That is, the if check and bit manipulations slow dowm the test case by 70%.
Attached patch v2Splinter Review
The new version avoids JSVAL_VOID/JSVAL_NULL check for Object construcor called from the trace and unifies JSObject initialization behind the single init method.
Attachment #397538 - Attachment is obsolete: true
Attachment #398424 - Flags: review?(jorendorff)
Attachment #398424 - Attachment mime type: application/octet-stream → text/plain
Attachment #398424 - Attachment is patch: true
Comment on attachment 398424 [details] [diff] [review]
v2

Even better.
Attachment #398424 - Flags: review?(jorendorff) → review+
http://hg.mozilla.org/tracemonkey/rev/45772700955a
Whiteboard: fixed-in-tracemonkey
Flags: wanted1.9.2?
http://hg.mozilla.org/mozilla-central/rev/45772700955a
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.