Self-host %TypedArray%.prototype.set

ASSIGNED
Assigned to

Status

()

Core
JavaScript: Standard Library
ASSIGNED
3 years ago
7 months ago

People

(Reporter: Waldo, Assigned: Waldo)

Tracking

({leave-open})

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 2 obsolete attachments)

I'm about 70% done with this, at least as far as writing the code goes.  I haven't yet done any work to keep it performant through different code paths for different target/source element type combinations, tho -- that could be trouble, and somewhat uncertain cost in time to fix.
Created attachment 8575340 [details] [diff] [review]
Partial rough-hewn progress
Assignee: nobody → jwalden+bmo
Status: NEW → ASSIGNED
Created attachment 8576259 [details] [diff] [review]
Patch

This is not entirely perf-tested, so it may or may not be 100% ready to go.  Not sure.  In any event, if perf-testing suggests it's not ready yet, there is a one-line change to revert to using the current C++ implementation, so it seems to make a lot of sense to land what we have now, so that tuning patches are simpler to review.  (Not to mention, this eases schedule pressure for this by giving more time to finish it.)

I haven't try-servered this, but all the jit-tests/jstests have passed on it at various times, so I'm reasonably confident in its not being horribly wrong.  ^-^
Attachment #8576259 - Flags: review?(till)
Attachment #8575340 - Attachment is obsolete: true
Comment on attachment 8576259 [details] [diff] [review]
Patch

Review of attachment 8576259 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/SelfHosting.cpp
@@ +824,5 @@
> +        //
> +        // Yeah, yeah, it's pretty unlikely.  Are you willing to stake a
> +        // sec-critical bug on that assessment, now and forever, against
> +        // all changes those pesky GC and JIT people might make?
> +        JS_ReportError(cx, "dude u just got hueyfixed lol");

Yeah, yeah, I probably should change this before landing.  :-)
Comment on attachment 8576259 [details] [diff] [review]
Patch

Review of attachment 8576259 [details] [diff] [review]:
-----------------------------------------------------------------

My sincerest apologies for the delay :(

This all mostly looks good, if terrible for intrinsic (no pun intended) reasons.

I am, however, not convinced that self-hosting parts of SetFromTypedArray is the right decision. ISTM that the JS parts are really the ones that are easiest to reason about in C++, too. That alone is not an argument against self-hosting them, of course. But AFAICS, if we didn't self-host them, we'd not need the inlining of SetDisjointTypedElements at all, so the amount of complexity we save inside SetFromTypedArray we more than spend in the JIT, which seems altogether more hairy and harder to reason about. I'm probably overlooking something substantial here, though.

In any case, if you still think that this tradeoff is worth it (or there are reasons for making it that I overlooked), please ask a JIT peer for review of the inlining parts. I think I understand them well enough to review them, but I'm not sure I trust my judgment on that question.

::: js/src/builtin/TypedArray.js
@@ +663,5 @@
> +    if (buffer === null)
> +        return false;
> +
> +    assert(IsArrayBuffer(buffer),
> +           "non-ArrayBuffer passed to IsDetachedBuffer");

Nit: no \n needed.

@@ +666,5 @@
> +    assert(IsArrayBuffer(buffer),
> +           "non-ArrayBuffer passed to IsDetachedBuffer");
> +
> +    var flags =
> +        UnsafeGetInt32FromReservedSlot(buffer, JS_ARRAYBUFFER_FLAGS_SLOT);

Same here.

@@ +673,5 @@
> +
> +// ES6 draft 20150220 22.2.3.22.1 %TypedArray%.prototype.set(array [, offset])
> +function SetFromNonTypedArray(target, array, targetOffset, targetLength, targetBuffer) {
> +    assert(!ValueIsTypedArray(array),
> +           "typed arrays must be passed to SetFromTypedArray");

And here. (Though if you want to keep all asserts formatted like this, that's fine.)

@@ +725,5 @@
> +           "only typed arrays may be passed to this method");
> +
> +    // Steps 12-24.
> +    var res =
> +        SetFromTypedArrayApproach(target, typedArray, targetOffset,

Nit: only wrap the arguments.

::: js/src/vm/SelfHosting.cpp
@@ +682,4 @@
>  }
>  
>  bool
> +js::intrinsic_ValueIsTypedArray(JSContext *cx, unsigned argc, Value *vp)

Theses two names should have the same naming scheme. I.e, either rename this to `intrinsic_IsTypedArray`, or the above to `intrinsic_ValueIsArrayBuffer`.

@@ +814,5 @@
> +    // try to maintain discipline.
> +
> +    // An unwrapped pointer to an object on the other side of a compartment
> +    // boundary!  Isn't this such fun?
> +    JSObject *unwrapped = CheckedUnwrap(obj);

Hey, at least you're using the checked version :)

@@ +824,5 @@
> +        //
> +        // Yeah, yeah, it's pretty unlikely.  Are you willing to stake a
> +        // sec-critical bug on that assessment, now and forever, against
> +        // all changes those pesky GC and JIT people might make?
> +        JS_ReportError(cx, "dude u just got hueyfixed lol");

Probably :) Plus perhaps explain why a GC could cause this check to fail. It's not entirely obvious if you don't know about CCWs and all that.

@@ +831,5 @@
> +
> +    // Be super-duper careful using this, as we've just punched through
> +    // the compartment boundary, and things like buffer() on this aren't
> +    // same-compartment with anything else in the calling method.
> +    return &unwrapped->as<TypedArrayObject>();

Perhaps have bholley take a look at this? If only so he doesn't freak out for a moment when he sees it later on.

@@ +835,5 @@
> +    return &unwrapped->as<TypedArrayObject>();
> +}
> +
> +bool
> +js::intrinsic_SetFromTypedArrayApproach(JSContext *cx, unsigned argc, Value *vp)

Add a comment referring to the section and version of the spec this is implementing parts of.

@@ +846,5 @@
> +               "something should have defended against a neutered target");
> +
> +    // In case you missed it, we're doing something TOTALLY UNSAFE here.  Sigil
> +    // the variable name, and all variables derived from it, all to pieces to
> +    // try to maintain discipline.

While I'm fond of the comment, perhaps instead of repeating it here verbatim add a pointer to the instance in DangerouslyUnwrapTypedArray?

@@ +857,5 @@
> +    MOZ_ASSERT(doubleTargetOffset >= 0, "caller failed to ensure |targetOffset >= 0|");
> +
> +    uint32_t targetLength = uint32_t(args[3].toInt32());
> +
> +    // Handle all checks preliminary to performing the actual element-setting.

uber-nit: s/preliminary/prior/, perhaps?

@@ +1068,5 @@
> +// object not necessarily in the current compartment, or in the same
> +// compartment as |target|.  WE ARE DOING SOMETHING TOTALLY UNSAFE HERE.
> +// We sigil the argument name, and all variables derived from it, all to pieces
> +// to try to maintain discipline.  If discipline ever fails to be maintained,
> +// this method's gonna have a bad time.

Again, I think it'd be best to have a careful explanation of the dangers in all this attached to DangerouslyUnwrapTypedArray, with all places that (indirectly) use it pointing to that explanation.

@@ +1116,5 @@
> +    MOZ_ASSERT(args.length() == 3);
> +
> +    Rooted<TypedArrayObject*> target(cx, &args[0].toObject().as<TypedArrayObject>());
> +    MOZ_ASSERT(!target->hasBuffer() || !target->buffer()->isNeutered(),
> +               "shouldn't be setting elements if neutered");

s/shouldn't/mustn't/

@@ +1122,5 @@
> +    uint32_t targetOffset = uint32_t(args[1].toInt32());
> +
> +    // In case you missed it, we're doing something TOTALLY UNSAFE here.  Sigil
> +    // the variable name, and all variables derived from it, all to pieces to
> +    // try to maintain discipline.

Same comment as earlier.

@@ +1132,5 @@
> +    // Smarter algorithms exist to perform overlapping transfers of the sort
> +    // this method performs (for example, v8's self-hosted implementation).
> +    // But it seems likely deliberate overlapping transfers are rare enough
> +    // that it's not worth the trouble to implement one (and worry about its
> +    // safety/correctness!).  Make a copy and do a disjoint set from that.

My hunch is that we'll need to optimize for cases like this in time. But premature optimization and all that, so I agree we shouldn't do it now.
Attachment #8576259 - Flags: review?(till)
(In reply to Till Schneidereit [:till] from comment #4)
> I am, however, not convinced that self-hosting parts of SetFromTypedArray is
> the right decision. ISTM that the JS parts are really the ones that are
> easiest to reason about in C++, too.

It's reasonably fair.  Part of the reason I did it this way is that I really would like to reach a point where we can self-host other nested parts within this.  The disjoint-copy code *should* be self-hostable.  The only reason it's not, is that doing so without the ability to have custom-JITted versions for every From/To type pair isn't practical.  With things laid out this way, it's trivial to self-host that part as a platform for JIT improvements to inline polymorphic loops and similar.

> AFAICS, if we didn't self-host them, we'd
> not need the inlining of SetDisjointTypedElements at all, so the amount of
> complexity we save inside SetFromTypedArray we more than spend in the JIT,
> which seems altogether more hairy and harder to reason about.

The JIT bits for SetDisjointTypedElements, at least, are pretty minimal, as it's just making a slightly-more-optimal function call.  That extra work, I'm not worried about at all, as the pattern for that is used in other places, and offers little room for getting things wrong without doing so catastrophically.

> > +    assert(IsArrayBuffer(buffer),
> > +           "non-ArrayBuffer passed to IsDetachedBuffer");
> 
> Nit: no \n needed.

I was somewhat inconsistent about these because I view assertion messages as akin to comments -- and subject to our reduced column limit for comments.  Not to mention, having the message on a separate line distinguishes the condition better, making the whole thing slightly more readable.

> > +    var flags =
> > +        UnsafeGetInt32FromReservedSlot(buffer, JS_ARRAYBUFFER_FLAGS_SLOT);
> 
> Same here.

This bumps to 80ch exactly, but I guess it's code, so that doesn't apply.  Changed.

> ::: js/src/vm/SelfHosting.cpp
> >  bool
> > +js::intrinsic_ValueIsTypedArray(JSContext *cx, unsigned argc, Value *vp)
> 
> Theses two names should have the same naming scheme. I.e, either rename this
> to `intrinsic_IsTypedArray`, or the above to `intrinsic_ValueIsArrayBuffer`.

There are unfortunately subtly different semantics for all of these.  There's already an IsTypedArray, which does (value is object and obj->is<TypedArrayObject>()), i.e. excluding wrapped typed arrays.  This method is necessary because the set method forks behavior on the provided source object -- and wrapped typed arrays and typed arrays must be treated identically.  There's no way to bootstrap ValueIsTypedArray from IsTypedArray.  So two methods must exist to do almost the same thing, named differently.

IsArrayBuffer excludes wrapped typed arrays, so it should be named the way IsTypedArray is.  But, it's only used for an assertion, so no need to inline it or anything -- removed that code from MCallOptimize.cpp.

> @@ +824,5 @@
> > +        //
> > +        // Yeah, yeah, it's pretty unlikely.  Are you willing to stake a
> > +        // sec-critical bug on that assessment, now and forever, against
> > +        // all changes those pesky GC and JIT people might make?
> > +        JS_ReportError(cx, "dude u just got hueyfixed lol");
> 
> Probably :) Plus perhaps explain why a GC could cause this check to fail.
> It's not entirely obvious if you don't know about CCWs and all that.

Updated:

// By *appearances* this can't happen, as self-hosted TypedArraySet
// checked this.  But.  Who's to say a GC couldn't happen between
// the check that this value was a typed array, and this extraction
// occurring?  A GC might turn a cross-compartment wrapper |obj| into a
// dead object no longer connected its typed array.
//
// Yeah, yeah, it's pretty unlikely.  Are you willing to stake a
// sec-critical bug on that assessment, now and forever, against
// all changes those pesky GC and JIT people might make?

> @@ +846,5 @@
> > +               "something should have defended against a neutered target");
> > +
> > +    // In case you missed it, we're doing something TOTALLY UNSAFE here.  Sigil
> > +    // the variable name, and all variables derived from it, all to pieces to
> > +    // try to maintain discipline.
> 
> While I'm fond of the comment, perhaps instead of repeating it here verbatim
> add a pointer to the instance in DangerouslyUnwrapTypedArray?

Fair enough.  Moved the sigil/warnings bits next to DangerouslyUnwrapTypedArray, changed this comment to note sigiling as recommended by that method's comments.

> > +    // Handle all checks preliminary to performing the actual element-setting.
> 
> uber-nit: s/preliminary/prior/, perhaps?

A good idea, but I went one more and did s/preliminary to performing/preceding/.  :-)

> @@ +1116,5 @@
> > +    MOZ_ASSERT(args.length() == 3);
> > +
> > +    Rooted<TypedArrayObject*> target(cx, &args[0].toObject().as<TypedArrayObject>());
> > +    MOZ_ASSERT(!target->hasBuffer() || !target->buffer()->isNeutered(),
> > +               "shouldn't be setting elements if neutered");
> 
> s/shouldn't/mustn't/

Yeah, I get where you're coming from there.  But, "mustn't be" is slightly stilted English, or overly formal, or something.  Went with "a neutered typed array has no elements to set, so it's nonsensical to be setting them".

> @@ +1132,5 @@
> > +    // Smarter algorithms exist to perform overlapping transfers of the sort
> > +    // this method performs (for example, v8's self-hosted implementation).
> > +    // But it seems likely deliberate overlapping transfers are rare enough
> > +    // that it's not worth the trouble to implement one (and worry about its
> > +    // safety/correctness!).  Make a copy and do a disjoint set from that.
> 
> My hunch is that we'll need to optimize for cases like this in time. But
> premature optimization and all that, so I agree we shouldn't do it now.

Worth noting: this is what we currently do (and for *any* setting from/into a single ArrayBuffer, even including the same-element-type case).  This just preserves status quo; it's not a regression.
Created attachment 8592393 [details] [diff] [review]
Land code to self-host %TypedArray%.prototype.set, but don't enable it yet, pending perf-testing

Take two, with particular emphasis on JIT changes.  Also with the actual self-hosting disabled for now: best to land this fully, then enable it when I've done perf-testing I haven't completed yet.

A try run with self-hosting enabled, to hopefully demonstrate this code, while not used in the patch because of disabling, is nonetheless correct:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b48ffd8f0cf
Attachment #8592393 - Flags: review?(jdemooij)
Attachment #8576259 - Attachment is obsolete: true
And a retry with a spurious "template" removed, since it seems like everything gccish would otherwise fail:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=20b54ee62e9a
Blocks: 1154480
Comment on attachment 8592393 [details] [diff] [review]
Land code to self-host %TypedArray%.prototype.set, but don't enable it yet, pending perf-testing

Review of attachment 8592393 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, sorry for the delay.

::: js/src/builtin/TypedArray.js
@@ +769,5 @@
> +
> +    // Steps 6-8, either algorithm.
> +    var targetOffset = ToInteger(offset);
> +    if (targetOffset < 0)
> +        ThrowError(JSMSG_TYPED_ARRAY_BAD_INDEX, "2");

(As mentioned on IRC, ThrowError is gone.)

::: js/src/builtin/TypedObjectConstants.h
@@ +20,5 @@
> +// Slots for objects using the typed array layout
> +
> +#define JS_TYPEDARRAYLAYOUT_BUFFER_SLOT      0
> +#define JS_TYPEDARRAYLAYOUT_LENGTH_SLOT      1
> +#define JS_TYPEDARRAYLAYOUT_BYTEOFFSET_SLOT  2

To be pedantic, any reason you aligned the numbers here but not the JS_SETTYPEDARRAY_* values above?

::: js/src/jit/CodeGenerator.cpp
@@ +4999,5 @@
> +    Register source = ToRegister(lir->source());
> +
> +    Register temp = ToRegister(lir->temp());
> +
> +    saveVolatile();

If your LIR instruction always makes a call, it's more efficient to mark it as a call instruction. This lets the register allocator do the spilling; it can do a better job because it knows which registers are live (on x86/x64 saveVolatile/restoreVolatile will move a ton of xmm registers).

To do that, make LSetDisjointTypedElements inherit from LCallInstructionHelper instead of LInstructionHelper and remove the saveVolatile/restoreVolatile.

(LMathFunctionD and LRandom are similar callWithABI instructions.)

::: js/src/jit/IonBuilder.h
@@ +803,5 @@
>                                                 MIRType knownValueType);
>  
>      // TypedArray intrinsics.
>      InliningStatus inlineIsTypedArray(CallInfo& callInfo);
> +    InliningStatus inlineValueIsTypedArray(CallInfo& callInfo);

This function is not used/defined anywhere.

::: js/src/jit/MCallOptimize.cpp
@@ +2303,5 @@
> +    // necessarily, because of wrappers.)
> +
> +    MDefinition* arrays[] = { target, sourceTypedArray };
> +
> +    for (MDefinition* def : arrays) {

\o/

@@ +2317,5 @@
> +    }
> +
> +    auto sets = MSetDisjointTypedElements::New(alloc(), target, targetOffset, sourceTypedArray);
> +    current->add(sets);
> +    current->push(sets);

It's a bit strange to push a MIRType_None value on the stack. Although it probably doesn't matter here because we always pop it immediately, I'd prefer a pushConstant(UndefinedValue());

@@ +2319,5 @@
> +    auto sets = MSetDisjointTypedElements::New(alloc(), target, targetOffset, sourceTypedArray);
> +    current->add(sets);
> +    current->push(sets);
> +
> +    callInfo.setImplicitlyUsedUnchecked();

It's an effectful instruction, so to make sure we resume after it when we bailout later:

if (!resumeAfter(sets))
    return InliningStatus_Error;

I thought we had assertions for this... Maybe LIRGenerator::visitInstruction can do:

MOZ_ASSERT_IF(ins->isEffectful(), ins->resumePoint());

But there are probably some exceptions :( Maybe we can whitelist those; it seems better than nothing. Follow-up bug?

::: js/src/vm/TypedArrayObject.cpp
@@ +783,5 @@
>  
>  /* static */ const JSFunctionSpec
>  TypedArrayObject::protoFunctions[] = {
>      JS_SELF_HOSTED_FN("subarray", "TypedArraySubarray", 2, 0),
> +    JS_SELF_HOSTED_FN("set", "TypedArraySet", 2, 0),

Is this correct, considering you wanted to land it with the self-hosted implementation disabled?
Attachment #8592393 - Flags: review?(jdemooij) → review+

Comment 9

3 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/9d2b8afeb3b4
Super-preliminary testing suggests the self-hosted version is slower for non-overlapping, same-type cases.  Still investigating the other cases.
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/9d2b8afeb3b4
...oh wait.  I forgot to do poking as to one part of this patch:

(In reply to Till Schneidereit [:till] from comment #4)
> @@ +831,5 @@
> > +
> > +    // Be super-duper careful using this, as we've just punched through
> > +    // the compartment boundary, and things like buffer() on this aren't
> > +    // same-compartment with anything else in the calling method.
> > +    return &unwrapped->as<TypedArrayObject>();
> 
> Perhaps have bholley take a look at this? If only so he doesn't freak out
> for a moment when he sees it later on.

bholley, probably worth your time to take a brief look at js/src/vm/SelfHosting.cpp's DangerouslyUnwrapTypedArray code, so you're aware of what it's doing.

Basically, the typed array set() method accepts a typed array, then does exactly the same thing whether that typed array is same-compartment or not.  So we have to unwrap in the cross-compartment case.

But per spec, the is-typed-array check (#1) occurs before arbitrary code execution (#2), itself before the unwrapped typed array is used (#3).  #2 theoretically could turn what's (initially) a wrapped typed array into a dead object by #3, if the wrapped typed array isn't otherwise kept alive.  And we want #1 in self-hosted code, so we can't store the unwrapped pointer in a Rooted to ensure it stays alive through #2.

So we have this super-crazy method that we call when we *actually* need the underlying typed array, called DangerouslyUnwrapTypedArray.  (The JITs also have a case where, if TI knows that the argument is a non-wrapped typed array, it'll not go through that method.  Just to mention, for completeness.) The return value might or might not be from another compartment, and everything we derive from it we prefix with "unsafe" and suffix with "CrossCompartment" and use super-duper-carefully.

Anyway.  Be sad, be very sad, but I see no way around this.  This seems the safest possible way to do a very hairy thing.
Flags: needinfo?(bobbyholley)
I feel like I'm missing something here. The above seems to be explaining why we need to check a second time that the object unwraps to a typed array, even though JS theoretically checked it. That seems fine.

But why do we need to manipulate cross-compartment pointers? Can't we just enter the compartment of the thing we get from DangerouslyUnwrapTypedArray, making this equivalent to the very-common pattern of:

RootedObject unwrapped(cx, CheckedUnwrap(obj));
if (!unwrapped) {
  JS_ReportError(cx, JSMSG_UNWRAP_DENIED);
  return false;
}
JSAutoCompartment ac(cx, unwrapped);
Flags: needinfo?(bobbyholley) → needinfo?(jwalden+bmo)
Basically, thus far it sounds like a relatively common pattern, one that isn't the greatest thing in the world, but not as remarkable as this case seems to imply itself to be. So I'm concerned that I'm missing a key point.
Updated thoughts on the current code and compartment goo are in bug 1140152 comment 10.  To make a long story short, I think there are reasons to do things the way the current code does, that can't be expressed in terms of JSAutoCompartment.  Specifically, if memory serves, writing out the self-hosting intrinsic's behavior in a way that it can be reused in the normal code path, rather than being double-implemented with the correctness/safety scares that entails.  I don't believe that can be done with JSAutoCompartment, and so I think, pending some perf-testing, the existing self-hosting code can be enabled without special delay.

That said, I'm on vacation for awhile starting...er, actually yesterday, so I can't do the deed (or review it if someone else does it) myself, unless this waits (yet another :-\ ) four months.
Flags: needinfo?(jwalden+bmo)
You need to log in before you can comment on or make changes to this bug.