Closed Bug 856850 Opened 9 years ago Closed 9 years ago

Add a PodOperations.h including all the Pod* methods from jsutil.h

Category: Core :: MFBT (enhancement)
Severity: normal
Status: RESOLVED FIXED
Target Milestone: mozilla23
People: (Reporter: Waldo, Assigned: Waldo)
Attachments: (2 files)

I'm seeing way too much memset and memcmp in Gecko patches I'm looking at, and it makes me sad.  A quick overview of the operations, for people who haven't seen them before:

template <class T> void PodZero(T *t):
memset(t, 0, sizeof(T))

template <class T> void PodZero(T *t, size_t nelem):
memset(t, 0, sizeof(T) * nelem)

// Should PodZero(array) zero the first element, or the entire thing?
// Given the ambiguity, force users to use PodArrayZero to clear the
// entire thing, and &array[0] to clear the first element
template <class T, size_t N> static void PodZero(T (&)[N]);          /* undefined */
template <class T, size_t N> static void PodZero(T (&)[N], size_t);  /* undefined */

template <class T, size_t N> void PodArrayZero(T (&t)[N]):
memset(t, 0, sizeof(T) * N)

template <class T> void PodAssign(T *dst, const T *src):
memcpy(dst, src, sizeof(T))

template <class T> void PodCopy(T *dst, const T *src, size_t nelem):
memcpy(dst, src, nelem * sizeof(T))

template <class T> bool PodEqual(const T *one, const T *two, size_t len):
true iff the len * sizeof(T) bytes at |one| are the same as the bytes at |two|

Mostly mechanical changes, nothing difficult modulo typos, and forgetting to hg add the new header in multiple try-pushes.  :-\

https://tbpl.mozilla.org/?tree=Try&rev=e96c1e5741aa
https://tbpl.mozilla.org/?tree=Try&rev=8060ab8797b4
Attachment #732435 - Flags: review?(Ms2ger)
Comment on attachment 732435 [details] [diff] [review]
Move the methods, convert the users

Review of attachment 732435 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm

::: js/src/ion/shared/Assembler-shared.h
@@ +148,4 @@
>      Address(Register base, int32_t offset) : base(base), offset(offset)
>      { }
>  
> +    Address() { mozilla::PodZero(this); }

I see there's no consistency at all in newlines for this pattern

::: js/src/jsiter.cpp
@@ +47,5 @@
>  using namespace js::gc;
>  
>  using mozilla::ArrayLength;
> +#ifdef JS_MORE_DETERMINISTIC
> +using mozilla::PodCopy;

Dunno if I'd ifdef this.

::: mfbt/PodOperations.h
@@ +72,5 @@
> +PodAssign(T* dst, const T* src)
> +{
> +  MOZ_ASSERT(dst != src);
> +  MOZ_ASSERT_IF(src < dst, PointerRangeSize(src, static_cast<const T*>(dst)) >= 1);
> +  MOZ_ASSERT_IF(dst < src, PointerRangeSize(static_cast<const T*>(dst), src) >= 1);

>0?

@@ +94,5 @@
> +     * Avoid using operator= in this loop, as it may have been
> +     * intentionally deleted by the POD type.
> +     */
> +    for (const T* srcend = src + nelem; src < srcend; src++, dst++)
> +      PodAssign(dst, src);

We don't do {}s? Boo.
Attachment #732435 - Flags: review?(Ms2ger) → review+
Comment on attachment 732435 [details] [diff] [review]
Move the methods, convert the users

Review of attachment 732435 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/PodOperations.h
@@ +24,5 @@
> +/** Set the contents of |t| to 0. */
> +template<typename T>
> +static void
> +PodZero(T* t)
> +{

If PodZero() for one element took a T& reference instead of a pointer, it might help identify mistakes where callers inadvertently call PodZero(buffer) instead of PodZero(buffer, bufferLength). Same suggestion for PodAssign().

@@ +32,5 @@
> +/** Set the contents of |nelem| elements starting at |t| to 0. */
> +template<typename T>
> +static void
> +PodZero(T* t, size_t nelem)
> +{

Do you want to handle the nelem == 0 case? Or can you assert nelem > 0? Same question for PodCopy() and PodEqual().
Attachment #732435 - Flags: review+ → review?(Ms2ger)
Ms2ger: sorry I accidentally erased your r+!
PodZero(T& t) would just work with T = int[5] and such, so I don't think it'd be much different from the array/length confusion already worked around here.

I'm pretty sure PodZero gets used for array contents of various sizes including 0, so it probably wants to stay as it is in that regard.  PodEqual gets used in similar places, I think for checking JS string characters' equality, so 0 is useful there too.  PodCopy also gets used in 0-taking places.  I'm not necessarily *opposed* to a method that would assert non-zero, but definitely the precedent here is for zero being a reasonable count to handle.
(In reply to :Ms2ger from comment #2)
> >  using mozilla::ArrayLength;
> > +#ifdef JS_MORE_DETERMINISTIC
> > +using mozilla::PodCopy;
> 
> Dunno if I'd ifdef this.

Vague paranoia on my part about compilers warning for unused |using| declarations.  Figured it couldn't hurt.

> >0?

No particular preference here, left as-is out of laziness.
So, PodOperations.h seems to mimic memcpy/memset, but not closely enough.  PodZero never seems to call memset for an array, and PodCopy doesn't use memcpy if n < 128.  Are these limitations expected?  I find it quite odd for this kind of abstraction to break memcpy/memset promises.
The observable effect is the same as with those functions: there's no breaking of memset/memcpy promises.  But the actual way that effect is implemented is based on observation of how those actual calls *do* perform when compiled, if we used them.  These aren't "limitations", they're performance optimizations based on how compilers behave when you write the naive memsets.

Calling directly into memset involves undesirable cross-library call overhead, which makes it not ideal for sufficiently small amounts of memory being set.

The memset function in libc also has to deal with any particular amount written -- 2, 4, 8, 16, but also 7, 33, 17, 42.  Giving the compiler enough information to know that the amount written will be a multiple of some size -- say, the size of a pointer or whatever -- allows it to emit architecture-specific instructions to write zeroes a short/int/long/long long at a time, rather than a byte at a time.

It's true memset can detect these cases, but it's extra overhead to do so -- to set leading unaligned bytes, then check for the largest size to write out each iteration, then handle trailing unaligned bytes.  (Compilers could recognize these cases where T* is being specified, and multiples of sizeof(T) bytes are being set, but it seems they don't do it now.)  Make the non-genericity evident in the code, through T* and memsetting by sizeof(T) each time through the loop, as in these Pod* operations, and the compiler can generate the optimal code without need for special/edge-case handling to satisfy spec requirements.

memcmp has similar issues regarding its being required to accept pointers of any alignment, and compare any number of bytes at each pointer.

To be fair, at least some of these optimization observations were made with older compilers -- I think some, at least, with OS X gcc.  It's possible some could be revisited wrt newer compilers.
https://hg.mozilla.org/mozilla-central/rev/de6afab8b383
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
(In reply to comment #9)
> To be fair, at least some of these optimization observations were made with
> older compilers -- I think some, at least, with OS X gcc.  It's possible some
> could be revisited wrt newer compilers.

That's sort of my point.  Until we have evidence that these functions will not be slower than their libc versions, I don't think that exposing them in MFBT is a good idea, since I don't want people to have to choose between fast code and more beautiful code.
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #9)
> It's true memset can detect these cases, but it's extra overhead to do so --
> to set leading unaligned bytes, then check for the largest size to write out
> each iteration, then handle trailing unaligned bytes.  (Compilers could
> recognize these cases where T* is being specified, and multiples of
> sizeof(T) bytes are being set, but it seems they don't do it now.)
>
> To be fair, at least some of these optimization observations were made with
> older compilers -- I think some, at least, with OS X gcc.  It's possible
> some could be revisited wrt newer compilers.

I was curious about this, so I wrote some tests.  I checked for optimizing away functions for up to 128 elements under different optimization levels.  Individual table cells are the number of times the function was optimized away for a particular datatype; a "perfect" compiler would have |128| in every table cell.

Results for clang 3.2 trunk, x86-64:

Os      char  short  int  long
memset  59    59     128  128
memcpy  27    27     128  128
memcmp  4     4      4    4
O2      char  short  int  long
memset  123   123    128  128
memcpy  59    59     128  128
memcmp  4     4      4    4
O3      char  short  int  long
memset  123   123    128  128
memcpy  59    59     128  128
memcmp  4     4      4    4

Results for ubuntu gcc 4.6.3, x86-64:

Os      char  short  int  long
memset  128   128    128  128
memcpy  128   128    128  128
memcmp  1     1      1    1
O2      char  short  int  long
memset  128   128    128  128
memcpy  128   128    128  128
memcmp  1     1      1    1
O3      char  short  int  long
memset  128   128    128  128
memcpy  128   128    128  128
memcmp  1     1      1    1

Results for debian gcc 4.4.5:

Os      char  short  int  long
memset  128   128    128  128
memcpy  128   128    128  128
memcmp  1     1      1    1
O2      char  short  int  long
memset  128   128    128  128
memcpy  128   128    128  128
memcmp  128   128    128  128
O3      char  short  int  long
memset  128   128    128  128
memcpy  128   128    128  128
memcmp  128   128    128  128

Results for android ndk6b gcc 4.4.3, arm:

Os      char  short  int  long
memset  2     4      7    7
memcpy  1     2      64   64
memcmp  1     1      1    1
O2      char  short  int  long
memset  14    28     55   55
memcpy  1     2      64   64
memcmp  1     1      1    1
O3      char  short  int  long
memset  14    28     55   55
memcpy  1     2      64   64
memcmp  1     1      1    1

Results for android ndk6b gcc 4.4.3, x86:

Os      char  short  int  long
memset  128   128    128  128
memcpy  128   128    128  128
memcmp  1     1      1    1
O2      char  short  int  long
memset  128   128    128  128
memcpy  128   128    128  128
memcmp  1     1      1    1
O3      char  short  int  long
memset  128   128    128  128
memcpy  128   128    128  128
memcmp  1     1      1    1

clang's numbers are a bit curious, especially because it "misses" some inlining; it will inline memsets of, say, 22 and 24 elements, but not 23 elements.  gcc 4.6 is very unwilling to optimize memcmp; x86-64 gcc 4.4 appears much more aggressive on that front.  Take the ARM numbers with a grain of salt; I didn't try tweaking the particular CPU and ARM CPUs can vary widely in their tuning in GCC.  So the ARM numbers might not be representative of what, say, our production compilers would actually see.

The benchmark is possibly too simple, what with only dealing with function parameters; things might change with local variables, etc.

Anyway, run the script with |bash check-memfoo.sh <compiler>|; I'd be interested to see what the old OS X gcc does.
The OS X gcc 4.2 isn't interesting, as it's explicitly not supported and can no longer build Firefox.  What's more interesting is MSVC.  :-)
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #11)

Methinks you doth protest too much.  The overhead being optimized away here is small, and it may not matter in many to most cases.  It *did* matter to the JS engine, and it motivated wrappers for memset/memcmp/memcpy (cleaner ones, too).  Worries about this being slower are not well-founded, I think, unless you can demonstrate specific cases where it matters.  But really, most bad-perf cases aren't going to be gated on mem* overheads, so even if these weren't fully optimal, it shouldn't be a game-changing difference the vast majority of the time.  And when it is, people should be benchmarking the particular algorithm in question to determine that.
OK, fine, I guess.
Attachment #732435 - Flags: review?(Ms2ger) → review+
Depends on: 935370