Closed Bug 730873 Opened 12 years ago Closed 12 years ago

(memmove) ArrayBufferView.prototype.move

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla16

People

(Reporter: dherman, Assigned: sfink)

References

Details

(Keywords: dev-doc-complete, Whiteboard: [DocArea=JS])

Attachments

(2 files, 2 obsolete files)

We are considering adding a copyRange method to ArrayBufferViews for doing efficient memmoves. We would like to see how much more efficient a built-in method would be than implementing it by hand in JS.

First cut at the intended behavior: calling

    a.move(start, end, dest)

should be equivalent to:

    start = start >>> 0; // ToUint32
    end = end >>> 0;     // ToUint32
    dest = dest >>> 0;   // ToUint32
    this.set(new this.constructor(this.buffer.slice(start * this.BYTES_PER_ELEMENT,
                                                    end * this.BYTES_PER_ELEMENT)),
             dest);

That is, copy the *initial values* of the elements from [start, end) to dest, even if the destination range overlaps with the source range. More efficiently, this might be done something like:

    start = start >>> 0; // ToUint32
    end = end >>> 0;     // ToUint32
    dest = dest >>> 0;   // ToUint32
    let length = end - start, destEnd = dest + length;
    if (dest >= start && dest < end) {
        for (let i = length - 1; i >= 0; i--) {
            this[start + i] = this[dest + i];
        }
    } else if (destEnd >= start && destEnd < end) {
        for (let i = 0; i < length; i++) {
            this[start + i] = this[dest + i];
        }
    } else {
        memcpy(this.buffer.[[InternalBuffer]] + dest * this.BYTES_PER_ELEMENT,
               this.buffer.[[InternalBuffer]] + start * this.BYTES_PER_ELEMENT,
               length * this.BYTES_PER_ELEMENT);
    }

IOW, if the destination overlaps the source to the right, copy right-to-left; if it overlaps to the left, copy left-to-right; and if there's no overlap just do a memcpy. (I'm not sure if there are ways to still do chunked copying even when there's overlap?)

Dave
Oops, I wrote `copyRange` and then `move`. Not sure what the best name for the method is.

Dave
> I'm not sure if there are ways to still do chunked copying even when there's overlap?

Like "memmove"?  Or are you looking for something else?
> Like "memmove"?  Or are you looking for something else?

Oh, I guess since there's guaranteed not to be any need for conversion, memmove should just work out of the box. So all that **** above is unnecessary.

Dave
(In reply to Dave Herman [:dherman] from comment #1)
> Oops, I wrote `copyRange` and then `move`. Not sure what the best name for
> the method is.

Well, in the version I started working on over the weekend, I called the internal function "moveRange" :)

The question that came up is when I can optimize it to memmove(). This will reveal my limited grasp of Javascript yet again, but in what situations is it invalid to simply move the memory around? eg if there's a getter for all indexed elements, or a setter, or element #3723 has been set non-writable, or ...?

Oh. In experimenting, Spidermonkey tells me that typedarrays are not extensible. So can I *always* use memmove?
(and where in the spec does it say that? I searched for "extensible" in http://www.khronos.org/registry/typedarray/specs/latest/ and didn't find it.)
(In reply to Steve Fink [:sfink] from comment #4)
> Oh. In experimenting, Spidermonkey tells me that typedarrays are not
> extensible. So can I *always* use memmove?
> (and where in the spec does it say that? I searched for "extensible" in
> http://www.khronos.org/registry/typedarray/specs/latest/ and didn't find it.)

Hm, this is a great question. SpiderMonkey and V8 currently disagree on

    Object.isExtensible(new Uint8Array(8))

The typed arrays spec does not, AFAIK, address this. WebIDL doesn't even seem to have much support for talking about object extensibility.

If it's important for optimization, it's acceptable to me for typed arrays to be non-extensible.

Dave
I would suggest calling the method 'copy' or 'copyRange', since logically we are copying here (the original data is left there, not moved somewhere else). 'move' only comes from the libc function memmove I think, which is only called that because memcpy was already a taken name.
I made typed arrays and array buffers non-extensible just because it kind of matched the semantics for them that we'd already implemented, and it seemed like less of a lie to be consistent.  Once those puppies can have named properties defined on them, it might be worth marking them as extensible.  But in any case, that choice wasn't a super-rationalized/reasoned decision.
(In reply to Jeff Walden (:Waldo) (remove +bmo to email) from comment #8)
> I made typed arrays and array buffers non-extensible just because it kind of
> matched the semantics for them that we'd already implemented, and it seemed
> like less of a lie to be consistent.  Once those puppies can have named
> properties defined on them, it might be worth marking them as extensible. 
> But in any case, that choice wasn't a super-rationalized/reasoned decision.

So it seems like it would be good for the typed array spec to disallow doing any problematic weird stuff. The semantics of copyRange or whatever this is to be called are pretty murky when a getter in the middle of the range mutates the array (or its ArrayBuffer, eg neutering it halfway through the operation). Allowing getters on indexed elements at all seems like it defeats much of the point of typed arrays.
(In reply to Steve Fink [:sfink] from comment #9)
> (In reply to Jeff Walden (:Waldo) (remove +bmo to email) from comment #8)
> > I made typed arrays and array buffers non-extensible just because it kind of
> > matched the semantics for them that we'd already implemented, and it seemed
> > like less of a lie to be consistent.  Once those puppies can have named
> > properties defined on them, it might be worth marking them as extensible. 
> > But in any case, that choice wasn't a super-rationalized/reasoned decision.
> 
> So it seems like it would be good for the typed array spec to disallow doing
> any problematic weird stuff. The semantics of copyRange or whatever this is
> to be called are pretty murky when a getter in the middle of the range
> mutates the array (or its ArrayBuffer, eg neutering it halfway through the
> operation). Allowing getters on indexed elements at all seems like it
> defeats much of the point of typed arrays.

Changes to the typed array spec would be welcome to avoid nasty corner cases like you describe. The goal of this bug and bug 730880 are to prototype a couple of additions to the spec to see if they provide significant speedups for use cases like Emscripten. Please assume for the time being that the proposed entry points can be implemented trivially; this is the intent.
Not ready for review yet, and applied on top of DataView and Transferable patches, but more or less functioning.
No reason to hold this up on the transferable stuff.
Attachment #630321 - Flags: review?(jwalden+bmo)
Attachment #601054 - Attachment is obsolete: true
Comment on attachment 630321 [details] [diff] [review]
Implement ArrayBufferView.prototype.move

Review of attachment 630321 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jstypedarray.cpp
@@ +768,5 @@
>  
>  bool
> +TypedArray::memmoveData(JSContext *cx, JSObject *tarray, uint32_t byteDest, uint32_t byteSrc, uint32_t byteSize)
> +{
> +    JS_ASSERT(tarray);

Make the argument a JSObject& to syntactically indicate non-nullness.

@@ +776,5 @@
> +
> +    JS_ASSERT(byteDest <= getByteLength(tarray));
> +    JS_ASSERT(byteSrc <= getByteLength(tarray));
> +    JS_ASSERT(byteDest + byteSize <= getByteLength(tarray));
> +    JS_ASSERT(byteSrc + byteSize <= getByteLength(tarray));

Also assert that |byteDest + byteSize >= byteDest| and similar for src, too, to detect overflows -- logically not possible because typed arrays are INT32_MAX long at most, but it requires some implementation knowledge to verify.  (See also below.)

@@ +781,5 @@
> +
> +    uint8_t *data = static_cast<uint8_t*>(getDataOffset(tarray));
> +    memmove(&data[byteDest], &data[byteSrc], byteSize);
> +
> +    return true;

This should be a void method, as it always succeeds.

@@ +1620,5 @@
> +            return ok;
> +
> +        JSObject *tarray = getTypedArray(obj);
> +        if (!tarray)
> +            return JS_TRUE;

How can this ever be the case?  You've got a fastClass() object by this point.

@@ +1624,5 @@
> +            return JS_TRUE;
> +
> +        if (args.length() < 3) {
> +            JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_TYPED_ARRAY_BAD_ARGS);
> +            return JS_FALSE;

false/true throughout.

@@ +1629,5 @@
> +        }
> +
> +        int32_t srcBegin;
> +        int32_t srcEnd;
> +        int32_t dest;

Make these all uint32_t, and change ToClampedIndex to take in/output uint32_t.  I don't see anything that makes it tricky to do this -- every user uses the output int32_t in a uint32_t context, and every user casts a uint32_t input length to int32_t to satisfy typing.

@@ +1631,5 @@
> +        int32_t srcBegin;
> +        int32_t srcEnd;
> +        int32_t dest;
> +
> +        int32_t length = int32_t(getLength(tarray));

...like this one too.  :-)

@@ +1635,5 @@
> +        int32_t length = int32_t(getLength(tarray));
> +        if (!ToClampedIndex(cx, args[0], length, &srcBegin)
> +            || !ToClampedIndex(cx, args[1], length, &srcEnd)
> +            || !ToClampedIndex(cx, args[2], length, &dest)
> +            || srcBegin > srcEnd)

|| at end of lines.

@@ +1643,5 @@
> +        }
> +
> +        int32_t nelts = srcEnd - srcBegin;
> +
> +        if (dest + nelts > length) {

Assert |dest + nelts >= dest|, because it takes a little thinking (and knowledge that typed array lengths in our implementation are never greater than INT32_MAX) to verify correctness.

@@ +1648,5 @@
> +            JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_TYPED_ARRAY_BAD_ARGS);
> +            return JS_FALSE;
> +        }
> +
> +        if (!memmoveData(cx, tarray, dest * sizeof(NativeType), srcBegin * sizeof(NativeType), nelts * sizeof(NativeType)))

So, memmoveData is a method at the level of typed arrays.  It's not a method at the level of array buffers.  For that reason, perhaps these parameters shouldn't be sizeof(NativeType)-multiplied, and that should happen in memmoveData?  Seems more consistent with all the other methods for indexes and such to be in element-sized values, not byte-sized values.

@@ +1650,5 @@
> +        }
> +
> +        if (!memmoveData(cx, tarray, dest * sizeof(NativeType), srcBegin * sizeof(NativeType), nelts * sizeof(NativeType)))
> +            return JS_FALSE;
> +        args.rval().setNull();

setUndefined() surely?
Attachment #630321 - Flags: review?(jwalden+bmo) → review+
I ended up touching a high enough percentage of the code that I'd like a re-review. (See, it doesn't matter if you give me r+ or r-. I'll pester you again anyway.)

The tests were hopelessly wrong and minimal besides, so I fixed them and beefed them up a bunch.

When I attempted to convert TypedArray::memmoveData or whatever it was to talk in terms of elements instead of bytes, the multiplications by sizeof(NativeType) got sufficiently verbose that I ended up manually inlining the darn thing into its caller.
Attachment #632125 - Flags: review?(jwalden+bmo)
Attachment #630321 - Attachment is obsolete: true
Comment on attachment 632125 [details] [diff] [review]
Implement ArrayBufferView.prototype.move

Review of attachment 632125 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jstypedarray.cpp
@@ +78,5 @@
>  
>  /*
>   * Convert |v| to an array index for an array of length |length| per
>   * the Typed Array Specification section 7.0, |subarray|. If successful,
>   * the output value is in the range [0, length).

Nuh-uh!  It's in the range [0, length].

@@ +81,5 @@
>   * the Typed Array Specification section 7.0, |subarray|. If successful,
>   * the output value is in the range [0, length).
>   */
>  static bool
> +ToClampedIndex(JSContext *cx, const Value &v, uint32_t length, uint32_t *out)

I noted on IRC that there's an issue with this algorithm, in implementations which don't limit typed array/arraybuffer/etc. lengths to less than 2**31 (note: we are not such an implementation): you can't directly refer to indexes which are above 2**31 - 1, only by negative indexes.  In other words, this produces a slice containing the second-to-last element:

  var ta = new Uint8Array(Math.pow(2, 31) + 1);
  ta.slice(-2, -1);

but this doesn't:

  var ta = new Uint8Array(Math.pow(2, 31) + 1);
  ta.slice(Math.pow(2, 31) - 1, Math.pow(2, 31));

(Modulo any off-by-one errors there.)

dherman says he's taking this to the khronos list -- just noting it here in passing, so others are aware of it.

@@ +1622,5 @@
> +            JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_TYPED_ARRAY_BAD_ARGS);
> +            return false;
> +        }
> +
> +        int32_t nelts = srcEnd - srcBegin;

Make this a uint32_t, please.

@@ +1628,5 @@
> +        if (dest + nelts > length) {
> +            JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_TYPED_ARRAY_BAD_ARGS);
> +            return false;
> +        }
> +        JS_ASSERT(dest + nelts >= dest);

I'd prefer the assert before the if, so that when you hit the if reading code you know the assert-condition's true (assuming no bugs).
Attachment #632125 - Flags: review?(jwalden+bmo) → review+
Second attempt:

https://hg.mozilla.org/integration/mozilla-inbound/rev/fb6b64407b31
Assignee: general → sphink
https://hg.mozilla.org/mozilla-central/rev/fb6b64407b31
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla16
This seems experimental, but still dev-doc-needed ;)
Keywords: dev-doc-needed
Great to see this progress. Two questions. First, are there any performance measurements of speedups of Emscripten compiled code using this new primitive? Second, is it wise to expose this method un-prefixed? I would have thought it would be exposed under "mozMove" or similar until it is actually in the specification.
This can produce nice speedups. Raw memcpy/memmove becomes 6-7x faster (measured on mozilla-central now), so how much it speeds up compiled code depends on how much that code depends on memcpy/memmove. Not a big issue in most codebases probably, but in some cases it would be significant.

Not sure about the prefixing policy, now that you mention it I'm curious about that too.
Alon, can you do some measurements on representative Emscripten compiled code (maybe ammo.js, box2d.js, others?) and see what the speedup is in practice? Per our last discussion, the reason this was added was to see whether it speeds up real world scenarios, and is therefore worth adding to the typed array spec. Thanks.
We've already seen results of 6x - 7x speedups in benchmarks, and it's a simple, well-understood API with just a single method. This is a clear win; I see no downside to adding this to the spec.

Dave
Where are these benchmarks? Can they be run publicly?

Have you thought through all of the consequences of adding this method to ArrayBufferView rather than for example ArrayBuffer? For example, because of Web IDL semantics, overloads taking arrays do not behave intuitively -- I am not sure whether overloads taking ArrayBufferView will. See https://www.khronos.org/webgl/public-mailing-list/archives/1204/msg00223.html and http://www.khronos.org/webgl/public-mailing-list/archives/1204/threads.html#00026 . Adding it to ArrayBufferView may end up being the correct decision, but it isn't clear this has been considered.

Please add the "moz" prefix to this method until this method has been added to the typed array specification.
Depends on: 786386
Attached file little benchmark
Here's an example little benchmark. The last three lines in the inner loop determine if it uses manual move or move() or set(). Comparing the two on IonMonkey, I get a 2.33x speedup with move() compared to manual move (set() is slightly slower still). This might be lower than previous tests because IonMonkey does better on the manual move than JM+TI.

This is an artificial benchmark, but it does show the raw operation is faster. I am not currently aware of a real-world codebase that stresses memcpy or memmove enough for this to be noticeable, looks like typically optimized code avoids copies in various ways. Still, I think it's very good to be able to be sure to not become slower when someone does actually compile such a project.
When it comes to justifying the performance gains from offering high-performance memset/memmove intrinsics, this paper may be useful:

http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40679.pdf

Google ran an experiment where they instrumented uses of memcpy/memmove (and i think strcpy) across their codebase to identify cases where they could use a slightly-faster variant of the copy (basically, bypassing the CPU cache). Their instrumentation showed that many of their real applications spent 3-5% of CPU time in copy operations, and some spent as much as 30%. They built on this to show that even a comparatively small improvement (from better use of cache) translated into real performance gains for those applications (in one case, an enormous improvement).

Given this, it's reasonable to assume that a 2.33x speedup to all copy/set operations would be felt in real applications built in JS if they make any serious use of typed arrays; if the speedup can actually be as much as 6-7x as previously claimed, it's a no-brainer.
Whiteboard: [DocArea=JS]
You need to log in before you can comment on or make changes to this bug.