730873 - (memmove) ArrayBufferView.prototype.move

Reporter

Description

•

13 years ago

We are considering adding a copyRange method to ArrayBufferViews for doing efficient memmoves. We would like to see how much more efficient a built-in method would be than implementing it by hand in JS. First cut at the intended behavior: calling a.move(start, end, dest) should be equivalent to: start = start >>> 0; // ToUint32 end = end >>> 0; // ToUint32 dest = dest >>> 0; // ToUint32 this.set(new this.constructor(this.buffer.slice(start * this.BYTES_PER_ELEMENT, end * this.BYTES_PER_ELEMENT)), dest); That is, copy the *initial values* of the elements from [start, end) to dest, even if the destination range overlaps with the source range. More efficiently, this might be done something like: start = start >>> 0; // ToUint32 end = end >>> 0; // ToUint32 dest = dest >>> 0; // ToUint32 let length = end - start, destEnd = dest + length; if (dest >= start && dest < end) { for (let i = length - 1; i >= 0; i--) { this[start + i] = this[dest + i]; } } else if (destEnd >= start && destEnd < end) { for (let i = 0; i < length; i++) { this[start + i] = this[dest + i]; } } else { memcpy(this.buffer.[[InternalBuffer]] + dest * this.BYTES_PER_ELEMENT, this.buffer.[[InternalBuffer]] + start * this.BYTES_PER_ELEMENT, length * this.BYTES_PER_ELEMENT); } IOW, if the destination overlaps the source to the right, copy right-to-left; if it overlaps to the left, copy left-to-right; and if there's no overlap just do a memcpy. (I'm not sure if there are ways to still do chunked copying even when there's overlap?) Dave

Dave Herman [:dherman]

Reporter

Comment 1

•

13 years ago

Oops, I wrote `copyRange` and then `move`. Not sure what the best name for the method is. Dave

Boris Zbarsky [:bzbarsky]

Comment 2

•

13 years ago

> I'm not sure if there are ways to still do chunked copying even when there's overlap? Like "memmove"? Or are you looking for something else?

Dave Herman [:dherman]

Reporter

Comment 3

•

13 years ago

> Like "memmove"? Or are you looking for something else? Oh, I guess since there's guaranteed not to be any need for conversion, memmove should just work out of the box. So all that **** above is unnecessary. Dave

Steve Fink [:sfink] [:s:]

Assignee

Comment 4

•

13 years ago

(In reply to Dave Herman [:dherman] from comment #1) > Oops, I wrote `copyRange` and then `move`. Not sure what the best name for > the method is. Well, in the version I started working on over the weekend, I called the internal function "moveRange" :) The question that came up is when I can optimize it to memmove(). This will reveal my limited grasp of Javascript yet again, but in what situations is it invalid to simply move the memory around? eg if there's a getter for all indexed elements, or a setter, or element #3723 has been set non-writable, or ...? Oh. In experimenting, Spidermonkey tells me that typedarrays are not extensible. So can I *always* use memmove?

Steve Fink [:sfink] [:s:]

Assignee

Comment 5

•

13 years ago

(and where in the spec does it say that? I searched for "extensible" in http://www.khronos.org/registry/typedarray/specs/latest/ and didn't find it.)

Dave Herman [:dherman]

Reporter

Comment 6

•

13 years ago

(In reply to Steve Fink [:sfink] from comment #4) > Oh. In experimenting, Spidermonkey tells me that typedarrays are not > extensible. So can I *always* use memmove? > (and where in the spec does it say that? I searched for "extensible" in > http://www.khronos.org/registry/typedarray/specs/latest/ and didn't find it.) Hm, this is a great question. SpiderMonkey and V8 currently disagree on Object.isExtensible(new Uint8Array(8)) The typed arrays spec does not, AFAIK, address this. WebIDL doesn't even seem to have much support for talking about object extensibility. If it's important for optimization, it's acceptable to me for typed arrays to be non-extensible. Dave

Alon Zakai (:azakai)

Comment 7

•

13 years ago

I would suggest calling the method 'copy' or 'copyRange', since logically we are copying here (the original data is left there, not moved somewhere else). 'move' only comes from the libc function memmove I think, which is only called that because memcpy was already a taken name.

Jeff Walden [:Waldo]

Comment 8

•

13 years ago

I made typed arrays and array buffers non-extensible just because it kind of matched the semantics for them that we'd already implemented, and it seemed like less of a lie to be consistent. Once those puppies can have named properties defined on them, it might be worth marking them as extensible. But in any case, that choice wasn't a super-rationalized/reasoned decision.

Steve Fink [:sfink] [:s:]

Assignee

Comment 9

•

13 years ago

(In reply to Jeff Walden (:Waldo) (remove +bmo to email) from comment #8) > I made typed arrays and array buffers non-extensible just because it kind of > matched the semantics for them that we'd already implemented, and it seemed > like less of a lie to be consistent. Once those puppies can have named > properties defined on them, it might be worth marking them as extensible. > But in any case, that choice wasn't a super-rationalized/reasoned decision. So it seems like it would be good for the typed array spec to disallow doing any problematic weird stuff. The semantics of copyRange or whatever this is to be called are pretty murky when a getter in the middle of the range mutates the array (or its ArrayBuffer, eg neutering it halfway through the operation). Allowing getters on indexed elements at all seems like it defeats much of the point of typed arrays.

Kenneth Russell

Comment 10

•

13 years ago

(In reply to Steve Fink [:sfink] from comment #9) > (In reply to Jeff Walden (:Waldo) (remove +bmo to email) from comment #8) > > I made typed arrays and array buffers non-extensible just because it kind of > > matched the semantics for them that we'd already implemented, and it seemed > > like less of a lie to be consistent. Once those puppies can have named > > properties defined on them, it might be worth marking them as extensible. > > But in any case, that choice wasn't a super-rationalized/reasoned decision. > > So it seems like it would be good for the typed array spec to disallow doing > any problematic weird stuff. The semantics of copyRange or whatever this is > to be called are pretty murky when a getter in the middle of the range > mutates the array (or its ArrayBuffer, eg neutering it halfway through the > operation). Allowing getters on indexed elements at all seems like it > defeats much of the point of typed arrays. Changes to the typed array spec would be welcome to avoid nasty corner cases like you describe. The goal of this bug and bug 730880 are to prototype a couple of additions to the spec to see if they provide significant speedups for use cases like Emscripten. Please assume for the time being that the proposed entry points can be implemented trivially; this is the intent.

Steve Fink [:sfink] [:s:]

Assignee

Comment 11

•

13 years ago

Attached patch Implement ArrayBufferView.prototype.move (obsolete) — Details — Splinter Review

Not ready for review yet, and applied on top of DataView and Transferable patches, but more or less functioning.

Steve Fink [:sfink] [:s:]

Assignee

Comment 12

•

13 years ago

Attached patch Implement ArrayBufferView.prototype.move (obsolete) — Details — Splinter Review

No reason to hold this up on the transferable stuff.

Attachment #630321 - Flags: review?(jwalden+bmo)

Steve Fink [:sfink] [:s:]

Assignee

Updated

•

13 years ago

Attachment #601054 - Attachment is obsolete: true

Jeff Walden [:Waldo]

Comment 13

•

13 years ago

Comment on attachment 630321 [details] [diff] [review] Implement ArrayBufferView.prototype.move Review of attachment 630321 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jstypedarray.cpp @@ +768,5 @@ > > bool > +TypedArray::memmoveData(JSContext *cx, JSObject *tarray, uint32_t byteDest, uint32_t byteSrc, uint32_t byteSize) > +{ > + JS_ASSERT(tarray); Make the argument a JSObject& to syntactically indicate non-nullness. @@ +776,5 @@ > + > + JS_ASSERT(byteDest <= getByteLength(tarray)); > + JS_ASSERT(byteSrc <= getByteLength(tarray)); > + JS_ASSERT(byteDest + byteSize <= getByteLength(tarray)); > + JS_ASSERT(byteSrc + byteSize <= getByteLength(tarray)); Also assert that |byteDest + byteSize >= byteDest| and similar for src, too, to detect overflows -- logically not possible because typed arrays are INT32_MAX long at most, but it requires some implementation knowledge to verify. (See also below.) @@ +781,5 @@ > + > + uint8_t *data = static_cast<uint8_t*>(getDataOffset(tarray)); > + memmove(&data[byteDest], &data[byteSrc], byteSize); > + > + return true; This should be a void method, as it always succeeds. @@ +1620,5 @@ > + return ok; > + > + JSObject *tarray = getTypedArray(obj); > + if (!tarray) > + return JS_TRUE; How can this ever be the case? You've got a fastClass() object by this point. @@ +1624,5 @@ > + return JS_TRUE; > + > + if (args.length() < 3) { > + JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_TYPED_ARRAY_BAD_ARGS); > + return JS_FALSE; false/true throughout. @@ +1629,5 @@ > + } > + > + int32_t srcBegin; > + int32_t srcEnd; > + int32_t dest; Make these all uint32_t, and change ToClampedIndex to take in/output uint32_t. I don't see anything that makes it tricky to do this -- every user uses the output int32_t in a uint32_t context, and every user casts a uint32_t input length to int32_t to satisfy typing. @@ +1631,5 @@ > + int32_t srcBegin; > + int32_t srcEnd; > + int32_t dest; > + > + int32_t length = int32_t(getLength(tarray)); ...like this one too. :-) @@ +1635,5 @@ > + int32_t length = int32_t(getLength(tarray)); > + if (!ToClampedIndex(cx, args[0], length, &srcBegin) > + || !ToClampedIndex(cx, args[1], length, &srcEnd) > + || !ToClampedIndex(cx, args[2], length, &dest) > + || srcBegin > srcEnd) || at end of lines. @@ +1643,5 @@ > + } > + > + int32_t nelts = srcEnd - srcBegin; > + > + if (dest + nelts > length) { Assert |dest + nelts >= dest|, because it takes a little thinking (and knowledge that typed array lengths in our implementation are never greater than INT32_MAX) to verify correctness. @@ +1648,5 @@ > + JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_TYPED_ARRAY_BAD_ARGS); > + return JS_FALSE; > + } > + > + if (!memmoveData(cx, tarray, dest * sizeof(NativeType), srcBegin * sizeof(NativeType), nelts * sizeof(NativeType))) So, memmoveData is a method at the level of typed arrays. It's not a method at the level of array buffers. For that reason, perhaps these parameters shouldn't be sizeof(NativeType)-multiplied, and that should happen in memmoveData? Seems more consistent with all the other methods for indexes and such to be in element-sized values, not byte-sized values. @@ +1650,5 @@ > + } > + > + if (!memmoveData(cx, tarray, dest * sizeof(NativeType), srcBegin * sizeof(NativeType), nelts * sizeof(NativeType))) > + return JS_FALSE; > + args.rval().setNull(); setUndefined() surely?

Attachment #630321 - Flags: review?(jwalden+bmo) → review+

Steve Fink [:sfink] [:s:]

Assignee

Comment 14

•

13 years ago

Attached patch Implement ArrayBufferView.prototype.move — Details — Splinter Review

I ended up touching a high enough percentage of the code that I'd like a re-review. (See, it doesn't matter if you give me r+ or r-. I'll pester you again anyway.) The tests were hopelessly wrong and minimal besides, so I fixed them and beefed them up a bunch. When I attempted to convert TypedArray::memmoveData or whatever it was to talk in terms of elements instead of bytes, the multiplications by sizeof(NativeType) got sufficiently verbose that I ended up manually inlining the darn thing into its caller.

Attachment #632125 - Flags: review?(jwalden+bmo)

Steve Fink [:sfink] [:s:]

Assignee

Updated

•

13 years ago

Attachment #630321 - Attachment is obsolete: true

Jeff Walden [:Waldo]

Comment 15

•

13 years ago

Comment on attachment 632125 [details] [diff] [review] Implement ArrayBufferView.prototype.move Review of attachment 632125 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jstypedarray.cpp @@ +78,5 @@ > > /* > * Convert |v| to an array index for an array of length |length| per > * the Typed Array Specification section 7.0, |subarray|. If successful, > * the output value is in the range [0, length). Nuh-uh! It's in the range [0, length]. @@ +81,5 @@ > * the Typed Array Specification section 7.0, |subarray|. If successful, > * the output value is in the range [0, length). > */ > static bool > +ToClampedIndex(JSContext *cx, const Value &v, uint32_t length, uint32_t *out) I noted on IRC that there's an issue with this algorithm, in implementations which don't limit typed array/arraybuffer/etc. lengths to less than 2**31 (note: we are not such an implementation): you can't directly refer to indexes which are above 2**31 - 1, only by negative indexes. In other words, this produces a slice containing the second-to-last element: var ta = new Uint8Array(Math.pow(2, 31) + 1); ta.slice(-2, -1); but this doesn't: var ta = new Uint8Array(Math.pow(2, 31) + 1); ta.slice(Math.pow(2, 31) - 1, Math.pow(2, 31)); (Modulo any off-by-one errors there.) dherman says he's taking this to the khronos list -- just noting it here in passing, so others are aware of it. @@ +1622,5 @@ > + JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_TYPED_ARRAY_BAD_ARGS); > + return false; > + } > + > + int32_t nelts = srcEnd - srcBegin; Make this a uint32_t, please. @@ +1628,5 @@ > + if (dest + nelts > length) { > + JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, JSMSG_TYPED_ARRAY_BAD_ARGS); > + return false; > + } > + JS_ASSERT(dest + nelts >= dest); I'd prefer the assert before the if, so that when you hit the if reading code you know the assert-condition's true (assuming no bugs).

Attachment #632125 - Flags: review?(jwalden+bmo) → review+

Matt Brubeck (:mbrubeck)

Comment 16

•

13 years ago

Backed out on inbound because of build errors: https://hg.mozilla.org/integration/mozilla-inbound/rev/2963d4018347 https://tbpl.mozilla.org/php/getParsedLog.php?id=13052793&tree=Mozilla-Inbound

Steve Fink [:sfink] [:s:]

Assignee

Comment 17

•

13 years ago

Second attempt: https://hg.mozilla.org/integration/mozilla-inbound/rev/fb6b64407b31

Assignee: general → sphink

Ed Morley [:emorley]

Comment 18

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/fb6b64407b31

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla16

Tom S. (please needinfo tschuster)

Comment 19

•

13 years ago

This seems experimental, but still dev-doc-needed ;)

Keywords: dev-doc-needed

Kenneth Russell

Comment 20

•

13 years ago

Great to see this progress. Two questions. First, are there any performance measurements of speedups of Emscripten compiled code using this new primitive? Second, is it wise to expose this method un-prefixed? I would have thought it would be exposed under "mozMove" or similar until it is actually in the specification.

Alon Zakai (:azakai)

Comment 21

•

13 years ago

This can produce nice speedups. Raw memcpy/memmove becomes 6-7x faster (measured on mozilla-central now), so how much it speeds up compiled code depends on how much that code depends on memcpy/memmove. Not a big issue in most codebases probably, but in some cases it would be significant. Not sure about the prefixing policy, now that you mention it I'm curious about that too.

Kenneth Russell

Comment 22

•

13 years ago

Alon, can you do some measurements on representative Emscripten compiled code (maybe ammo.js, box2d.js, others?) and see what the speedup is in practice? Per our last discussion, the reason this was added was to see whether it speeds up real world scenarios, and is therefore worth adding to the typed array spec. Thanks.

Dave Herman [:dherman]

Reporter

Comment 23

•

13 years ago

We've already seen results of 6x - 7x speedups in benchmarks, and it's a simple, well-understood API with just a single method. This is a clear win; I see no downside to adding this to the spec. Dave

Kenneth Russell

Comment 24

•

13 years ago

Where are these benchmarks? Can they be run publicly? Have you thought through all of the consequences of adding this method to ArrayBufferView rather than for example ArrayBuffer? For example, because of Web IDL semantics, overloads taking arrays do not behave intuitively -- I am not sure whether overloads taking ArrayBufferView will. See https://www.khronos.org/webgl/public-mailing-list/archives/1204/msg00223.html and http://www.khronos.org/webgl/public-mailing-list/archives/1204/threads.html#00026 . Adding it to ArrayBufferView may end up being the correct decision, but it isn't clear this has been considered. Please add the "moz" prefix to this method until this method has been added to the typed array specification.

Steve Fink [:sfink] [:s:]

Assignee

Updated

•

13 years ago

Depends on: 786386

Alon Zakai (:azakai)

Comment 25

•

13 years ago

Attached file little benchmark — Details

Here's an example little benchmark. The last three lines in the inner loop determine if it uses manual move or move() or set(). Comparing the two on IonMonkey, I get a 2.33x speedup with move() compared to manual move (set() is slightly slower still). This might be lower than previous tests because IonMonkey does better on the manual move than JM+TI. This is an artificial benchmark, but it does show the raw operation is faster. I am not currently aware of a real-world codebase that stresses memcpy or memmove enough for this to be noticeable, looks like typically optimized code avoids copies in various ways. Still, I think it's very good to be able to be sure to not become slower when someone does actually compile such a project.

Katelyn Gadd (:kael)

Comment 26

•

12 years ago

When it comes to justifying the performance gains from offering high-performance memset/memmove intrinsics, this paper may be useful: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40679.pdf Google ran an experiment where they instrumented uses of memcpy/memmove (and i think strcpy) across their codebase to identify cases where they could use a slightly-faster variant of the copy (basically, bypassing the CPU cache). Their instrumentation showed that many of their real applications spent 3-5% of CPU time in copy operations, and some spent as much as 30%. They built on this to show that even a comparatively small improvement (from better use of cache) translated into real performance gains for those applications (in one case, an enormous improvement). Given this, it's reasonable to assume that a 2.33x speedup to all copy/set operations would be felt in real applications built in JS if they make any serious use of typed arrays; if the speedup can actually be as much as 6-7x as previously claimed, it's a no-brainer.

Florian Scholz (Open Web Docs)

Updated

•

12 years ago

Whiteboard: [DocArea=JS]

Florian Scholz (Open Web Docs)

Comment 27

•

11 years ago

This has been documented, but with bug 1021379 in mind (move is now copyWithin). https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/move

Keywords: dev-doc-needed → dev-doc-complete

Implement ArrayBufferView.prototype.move 13 years ago Steve Fink [:sfink] [:s:] 5.21 KB, patch		Details \| Diff \| Splinter Review
Implement ArrayBufferView.prototype.move 13 years ago Steve Fink [:sfink] [:s:] 6.25 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
Implement ArrayBufferView.prototype.move 13 years ago Steve Fink [:sfink] [:s:] 9.08 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
little benchmark 13 years ago Alon Zakai (:azakai) 589 bytes, text/plain		Details