Closed
Bug 856850
Opened 13 years ago
Closed 13 years ago
Add a PodOperations.h including all the Pod* methods from jsutil.h
Categories
(Core :: MFBT, enhancement)
Core
MFBT
Tracking
()
RESOLVED
FIXED
mozilla23
People
(Reporter: Waldo, Assigned: Waldo)
References
Details
Attachments
(2 files)
|
39.28 KB,
patch
|
Ms2ger
:
review+
|
Details | Diff | Splinter Review |
|
1.36 KB,
text/plain
|
Details |
I'm seeing way too much memset and memcmp in Gecko patches I'm looking at, and it makes me sad. A quick overview of the operations, for people who haven't seen them before:
template <class T> void PodZero(T *t):
memset(t, 0, sizeof(T))
template <class T> void PodZero(T *t, size_t nelem):
memset(t, 0, sizeof(T) * nelem)
// Should PodZero(array) zero the first element, or the entire thing?
// Given the ambiguity, force users to use PodArrayZero to clear the
// entire thing, and &array[0] to clear the first element
template <class T, size_t N> static void PodZero(T (&)[N]); /* undefined */
template <class T, size_t N> static void PodZero(T (&)[N], size_t); /* undefined */
template <class T, size_t N> void PodArrayZero(T (&t)[N]):
memset(t, 0, sizeof(T) * N)
template <class T> void PodAssign(T *dst, const T *src):
memcpy(dst, src, sizeof(T))
template <class T> void PodCopy(T *dst, const T *src, size_t nelem):
memcpy(dst, src, nelem * sizeof(T))
template <class T> bool PodEqual(T *one, T *two, size_t len):
len * sizeof(T) bytes at |one| are the same as the bytes at |two|
| Assignee | ||
Comment 1•13 years ago
|
||
Mostly mechanical changes, nothing difficult modulo typos, and forgetting to hg add the new header in multiple try-pushes. :-\
https://tbpl.mozilla.org/?tree=Try&rev=e96c1e5741aa
https://tbpl.mozilla.org/?tree=Try&rev=8060ab8797b4
Attachment #732435 -
Flags: review?(Ms2ger)
Comment 2•13 years ago
|
||
Comment on attachment 732435 [details] [diff] [review]
Move the methods, convert the users
Review of attachment 732435 [details] [diff] [review]:
-----------------------------------------------------------------
lgtm
::: js/src/ion/shared/Assembler-shared.h
@@ +148,4 @@
> Address(Register base, int32_t offset) : base(base), offset(offset)
> { }
>
> + Address() { mozilla::PodZero(this); }
I see there's no consistency at all in newlines for this pattern
::: js/src/jsiter.cpp
@@ +47,5 @@
> using namespace js::gc;
>
> using mozilla::ArrayLength;
> +#ifdef JS_MORE_DETERMINISTIC
> +using mozilla::PodCopy;
Dunno if I'd ifdef this.
::: mfbt/PodOperations.h
@@ +72,5 @@
> +PodAssign(T* dst, const T* src)
> +{
> + MOZ_ASSERT(dst != src);
> + MOZ_ASSERT_IF(src < dst, PointerRangeSize(src, static_cast<const T*>(dst)) >= 1);
> + MOZ_ASSERT_IF(dst < src, PointerRangeSize(static_cast<const T*>(dst), src) >= 1);
>0?
@@ +94,5 @@
> + * Avoid using operator= in this loop, as it may have been
> + * intentionally deleted by the POD type.
> + */
> + for (const T* srcend = src + nelem; src < srcend; src++, dst++)
> + PodAssign(dst, src);
We don't do {}s? Boo.
Attachment #732435 -
Flags: review?(Ms2ger) → review+
Comment 3•13 years ago
|
||
Comment on attachment 732435 [details] [diff] [review]
Move the methods, convert the users
Review of attachment 732435 [details] [diff] [review]:
-----------------------------------------------------------------
::: mfbt/PodOperations.h
@@ +24,5 @@
> +/** Set the contents of |t| to 0. */
> +template<typename T>
> +static void
> +PodZero(T* t)
> +{
If PodZero() for one element took a T& reference instead of a pointer, it might help identify mistakes where callers inadvertently call PodZero(buffer) instead of PodZero(buffer, bufferLength). Same suggestion for PodAssign().
@@ +32,5 @@
> +/** Set the contents of |nelem| elements starting at |t| to 0. */
> +template<typename T>
> +static void
> +PodZero(T* t, size_t nelem)
> +{
Do you want to handle the nelem == 0 case? Or can you assert nelem > 0? Same question for PodCopy() and PodEqual().
Attachment #732435 -
Flags: review+ → review?(Ms2ger)
Comment 4•13 years ago
|
||
Ms2ger: sorry I accidentally erased your r+!
| Assignee | ||
Comment 5•13 years ago
|
||
PodZero(T& t) would just work with T = int[5] and such, so I don't think it'd be much different from the array/length confusion already worked around here.
I'm pretty sure PodZero gets used for array contents of various sizes including 0, so it probably wants to stay as it is in that regard. PodEqual gets used in similar places, I think for checking JS string characters' equality, so 0 is useful there too. PodCopy also gets used in 0-taking places. I'm not necessarily *opposed* to a method that would assert non-zero, but definitely the precedent here is for zero being a reasonable count to handle.
| Assignee | ||
Comment 6•13 years ago
|
||
Target Milestone: --- → mozilla23
| Assignee | ||
Comment 7•13 years ago
|
||
(In reply to :Ms2ger from comment #2)
> > using mozilla::ArrayLength;
> > +#ifdef JS_MORE_DETERMINISTIC
> > +using mozilla::PodCopy;
>
> Dunno if I'd ifdef this.
Vague paranoia on my part about compilers warning for unused |using| declarations. Figured it couldn't hurt.
> >0?
No particular preference here, left as-is out of laziness.
Comment 8•13 years ago
|
||
So, PodOperations.h seems to mimic memcpy/memset but not close enough. PodZero seems to never call memset for an array, and PodCopy doesn't use memcpy if n<128. Are these limitations expected? I find it quite odd for this kind of abstraction to break memcpy/memset promises.
| Assignee | ||
Comment 9•13 years ago
|
||
The observable effect is the same as those: there's no breaking of memset/memcpy promises. But the actual way that effect is implemented is based on observation of how those actual calls *do* perform when compiled, if we used them. These aren't "limitations", they're performance optimizations based on how compilers behave when you do the naive memsets.
Calling directly into memset involves undesirable cross-library call overhead, which makes it not ideal for sufficiently small amounts of memory being set.
The memset function in libc also has to deal with any particular amount written -- 2, 4, 8, 16, but also 7, 33, 17, 42. Giving the compiler enough information to know that the amount written will be a multiple of some size -- say, the size of a pointer or whatever -- allows it to emit architecture-specific instructions to write zeroes a short/int/long/long long at a time, rather than a byte at a time.
It's true memset can detect these cases, but it's extra overhead to do so -- to set leading unaligned bytes, then check for the largest size to write out each iteration, then handle trailing unaligned bytes. (Compilers could recognize these cases where T* is being specified, and multiples of sizeof(T) bytes are being set, but it seems they don't do it now.) Make the non-genericity evident in the code, through T* and memsetting by sizeof(T) each time through the loop, as in these Pod* operations, and the compiler can generate the optimal code without need for special/edge-case handling to satisfy spec requirements.
memcmp has similar issues regarding its being required to accept pointers of any alignment, and compare any number of bytes at each pointer.
To be fair, at least some of these optimization observations were made with older compilers -- I think some, at least, with OS X gcc. It's possible some could be revisited wrt newer compilers.
Comment 10•13 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 11•13 years ago
|
||
(In reply to comment #9)
> To be fair, at least some of these optimization observations were made with
> older compilers -- I think some, at least, with OS X gcc. It's possible some
> could be revisited wrt newer compilers.
That's sort of my point. Until we have evidence that these functions will not be slower than their libc versions, I don't think that exposing them in MFBT is a good idea, since I don't want people to have to make a choice between fast code or more beautiful code.
Comment 12•13 years ago
|
||
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #9)
> It's true memset can detect these cases, but it's extra overhead to do so --
> to set leading unaligned bytes, then check for the largest size to write out
> each iteration, then handle trailing unaligned bytes. (Compilers could
> recognize these cases where T* is being specified, and multiples of
> sizeof(T) bytes are being set, but it seems they don't do it now.)
>
> To be fair, at least some of these optimization observations were made with
> older compilers -- I think some, at least, with OS X gcc. It's possible
> some could be revisited wrt newer compilers.
I was curious about this, so I wrote some tests. I checked for optimizing away functions for up to 128 elements under different optimization levels. Individual table cells are the number of times the function was optimized away for a particular datatype; a "perfect" compiler would have |128| in every table cell.
Results for clang 3.2 trunk, x86-64:
Os char short int long
memset 59 59 128 128
memcpy 27 27 128 128
memcmp 4 4 4 4
O2 char short int long
memset 123 123 128 128
memcpy 59 59 128 128
memcmp 4 4 4 4
O3 char short int long
memset 123 123 128 128
memcpy 59 59 128 128
memcmp 4 4 4 4
Results for ubuntu gcc 4.6.3, x86-64:
Os char short int long
memset 128 128 128 128
memcpy 128 128 128 128
memcmp 1 1 1 1
O2 char short int long
memset 128 128 128 128
memcpy 128 128 128 128
memcmp 1 1 1 1
O3 char short int long
memset 128 128 128 128
memcpy 128 128 128 128
memcmp 1 1 1 1
Results for debian gcc 4.4.5:
Os char short int long
memset 128 128 128 128
memcpy 128 128 128 128
memcmp 1 1 1 1
O2 char short int long
memset 128 128 128 128
memcpy 128 128 128 128
memcmp 128 128 128 128
O3 char short int long
memset 128 128 128 128
memcpy 128 128 128 128
memcmp 128 128 128 128
Results for android ndk6b gcc 4.4.3, arm:
Os char short int long
memset 2 4 7 7
memcpy 1 2 64 64
memcmp 1 1 1 1
O2 char short int long
memset 14 28 55 55
memcpy 1 2 64 64
memcmp 1 1 1 1
O3 char short int long
memset 14 28 55 55
memcpy 1 2 64 64
memcmp 1 1 1 1
Results for android ndk6b gcc 4.4.3, x86:
Os char short int long
memset 128 128 128 128
memcpy 128 128 128 128
memcmp 1 1 1 1
O2 char short int long
memset 128 128 128 128
memcpy 128 128 128 128
memcmp 1 1 1 1
O3 char short int long
memset 128 128 128 128
memcpy 128 128 128 128
memcmp 1 1 1 1
clang's numbers are a bit curious, especially because it "misses" some inlining; it will inline memsets of, say, 22 and 24 elements, but not 23 elements. gcc 4.6 is very unwilling to optimize memcmp; x86-64 gcc 4.4 appears much more aggressive on that front. Take the ARM numbers with a grain of salt; I didn't try tweaking the particular CPU and ARM CPUs can vary widely in their tuning in GCC. So the ARM numbers might not be representative of what, say, our production compilers would actually see.
The benchmark is possibly too simple, what with only dealing with function parameters; things might change with local variables, etc.
Anyway, run the script with |bash check-memfoo.sh <compiler>|; I'd be interested to see what the old OS X gcc does.
Comment 13•13 years ago
|
||
The OSX gcc4.2 is not interesting as it's explicitly not supported and can no longer build Firefox. What's more interesting is MSVC. :-)
| Assignee | ||
Comment 14•13 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #11)
Methinks you doth protest too much. The overhead being optimized away here is small, and it may not matter in many to most cases. It *did* matter to the JS engine, and it motivated wrappers for memset/memcmp/memcpy (cleaner ones, too). Worries about this being slower are not well-founded, I think, unless you can demonstrate specific cases where it matters. But really, most bad-perf cases aren't going to be gated on mem* overheads, so even if these weren't fully optimal, it shouldn't be a game-changing difference the vast majority of the time. And when it is, people should be benchmarking the particular algorithm in question to determine that.
Comment 15•13 years ago
|
||
OK, fine, I guess.
Updated•13 years ago
|
Attachment #732435 -
Flags: review?(Ms2ger) → review+
Comment 16•12 years ago
|
||
Sorry, these changesets should have been marked with bug 933151:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9d6f42dd7e54
https://hg.mozilla.org/integration/mozilla-inbound/rev/14c14da01456
Comment 17•12 years ago
|
||
You need to log in
before you can comment on or make changes to this bug.
Description
•