Last Comment Bug 798172 - add mfbt/Endian.h
: add mfbt/Endian.h
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: MFBT (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla23
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
Depends on: 853646
Blocks: dieprtypesdie
  Show dependency treegraph
 
Reported: 2012-10-04 18:56 PDT by Nathan Froyd [:froydnj]
Modified: 2013-04-07 02:13 PDT (History)
7 users (show)
ryanvm: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
strawman Endian.h file (6.01 KB, text/plain)
2012-10-04 18:56 PDT, Nathan Froyd [:froydnj]
no flags Details
strawman Endian.h file (6.12 KB, text/plain)
2012-10-04 20:15 PDT, Nathan Froyd [:froydnj]
no flags Details
strawman Endian.h file (7.03 KB, text/plain)
2012-10-05 07:36 PDT, Nathan Froyd [:froydnj]
no flags Details
strawman Endian.h file, v2 (14.64 KB, text/plain)
2012-10-26 17:55 PDT, Nathan Froyd [:froydnj]
jwalden+bmo: feedback+
Details
part 1 - add mfbt/Endian.h (16.03 KB, patch)
2013-01-30 13:52 PST, Nathan Froyd [:froydnj]
no flags Details | Diff | Review
part 2 - add tests for mfbt/Endian.h (16.80 KB, patch)
2013-01-30 13:53 PST, Nathan Froyd [:froydnj]
no flags Details | Diff | Review
part 3 - convert SHA1.cpp to use Endian.h (1.55 KB, patch)
2013-01-30 13:53 PST, Nathan Froyd [:froydnj]
no flags Details | Diff | Review
part 4 - convert the jsclone bits to use Endian.h (7.04 KB, patch)
2013-01-30 13:54 PST, Nathan Froyd [:froydnj]
no flags Details | Diff | Review
part 5 - convert xdr bits to use Endian.h (5.42 KB, patch)
2013-01-30 13:56 PST, Nathan Froyd [:froydnj]
no flags Details | Diff | Review
part 1 - add mfbt/Endian.h (17.05 KB, patch)
2013-02-15 14:00 PST, Nathan Froyd [:froydnj]
jwalden+bmo: review-
Details | Diff | Review
part 2 - add tests for mfbt/Endian.h (16.39 KB, patch)
2013-02-15 14:01 PST, Nathan Froyd [:froydnj]
no flags Details | Diff | Review
part 1 - add mfbt/Endian.h (20.77 KB, patch)
2013-03-22 04:37 PDT, Nathan Froyd [:froydnj]
no flags Details | Diff | Review
part 2 - add tests for mfbt/Endian.h (16.53 KB, patch)
2013-03-22 04:38 PDT, Nathan Froyd [:froydnj]
jwalden+bmo: review+
Details | Diff | Review
Part 1, revised - add mfbt/Endian.h (20.41 KB, patch)
2013-03-26 16:59 PDT, Jeff Walden [:Waldo] (remove +bmo to email)
nfroyd: review+
Details | Diff | Review
part 3 - convert SHA1.cpp to use Endian.h (3.46 KB, patch)
2013-04-03 09:38 PDT, Nathan Froyd [:froydnj]
jwalden+bmo: review+
Details | Diff | Review
part 4 - convert the jsclone bits to use Endian.h (7.38 KB, patch)
2013-04-03 09:38 PDT, Nathan Froyd [:froydnj]
jwalden+bmo: review+
Details | Diff | Review
part 5 - convert xdr bits to use Endian.h (5.49 KB, patch)
2013-04-03 09:39 PDT, Nathan Froyd [:froydnj]
jwalden+bmo: review+
Details | Diff | Review
workaround-clang-__builtin_bswap16.patch (1.69 KB, patch)
2013-04-03 23:56 PDT, Chris Peterson [:cpeterson]
jwalden+bmo: review+
Details | Diff | Review

Description Nathan Froyd [:froydnj] 2012-10-04 18:56:09 PDT
Created attachment 668276 [details]
strawman Endian.h file

There's quite a bit of gnarly byte-swapping code lurking in the code base, most of it reliant on some form of IS_BIG_ENDIAN or IS_LITTLE_ENDIAN.  As those macros come from #include'ing prtypes.h (prtypes.h includes MDCPUCFG, which provides said macros), and we want prtypes.h to go away, we need to consolidate all of these definitions somehow, into something that doesn't rely on prtypes.h.

There's also offhand comments in mfbt/SHA1.h about how we should more carefully define MOZ_LITTLE_ENDIAN and/or MOZ_BIG_ENDIAN in an Endian.h header.  As such macros are still useful for defining structure layouts, some low-level gfx bits, and so forth, we should provide them.

I've attached a strawman Endian.h file; there may be bugs.  It needs a little more documentation and possibly some optimization of the actual read/write of different endiannesses (clang will optimize the Read and Write functions fine; gcc has trouble; I'm not sure what MSVC does with them).  The MOZ_{LITTLE,BIG}_ENDIAN bits I plan to borrow liberally from jscpucfg.h.

I'm not entirely happy with the API yet.  The idealistic part of me (http://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html) would like people to just use the Read/Write functions:

  m16Bit = LittleEndian::ReadUInt16(buffer);
  m32Bit = BigEndian::ReadUInt32(buffer + sizeof(m16Bit));

and so forth.  But for the sake of obviousness and not tediously rewriting a bunch of code, the Swap* functions may actually be more useful:

  m16Bit = NativeEndian::SwapFromLittleEndian(m16Bit);
  m32Bit = NativeEndian::SwapFromBigEndian(m32Bit);

or somesuch.  Suggestions welcome.
Comment 1 Joshua Cranmer [:jcranmer] 2012-10-04 19:30:15 PDT
Your Write function has a nasty overflow bug lying in wait:
(UINT8_MAX << (8 * i));

C99 appears to guarantee that UINT8_MAX would be only an int, which means the left-shift will produce undefined behavior for 32-bit (0xFF << 24 is actually undefined behavior!). Not to mention you forgot to right-shift the values back into 8 bits :-P.

Thinking harder, your Read function also has a subtle undefined behavior bug as well too (hint: left-shifting a 1 into the sign bit of a signed number is undefined behavior).

As for your mention of the blog post, I'll point out that some of his points are just plain wrong:
1. Hide the code in a standard function/macro (you should do this with any code which is mind-numbingly simple and also very easy to mess up), and there's no code-difference either way.
2. 95% of the use cases are already in situations where things are properly aligned (at least for 32-bit values).
3. uint32_t.
4. Modern optimizers can probably spot the byte-swap pattern anyways, and reduce it to explicit byte swap instructions on architectures that have them. Granted, they should probably also be able to optimize the native-endian-read pattern as well.
5. I do agree here, actually.
6. I don't do enough network/binary file programming to comment here.

I will point out, expounding on my point #4, it's probably better not to rely on the compiler to first unroll the loop and then do pattern-recognition for byte swapping (this may be why gcc doesn't pick up on optimization).
Comment 2 Nathan Froyd [:froydnj] 2012-10-04 19:51:27 PDT
(In reply to Joshua Cranmer [:jcranmer] from comment #1)
> Your Write function has a nasty overflow bug lying in wait:
> (UINT8_MAX << (8 * i));
> 
> C99 appears to guarantee that UINT8_MAX would be only an int, which means
> the left-shift will produce undefined behavior for 32-bit (0xFF << 24 is
> actually undefined behavior!). Not to mention you forgot to right-shift the
> values back into 8 bits :-P.

Doh!  Casting UINT8_MAX to the proper type should DTRT.  And shifting the byte back would certainly help. :)

> Thinking harder, your Read function also has a subtle undefined behavior bug
> as well too (hint: left-shifting a 1 into the sign bit of a signed number is
> undefined behavior).

The left shift of a signed value is always potentially undefined, then?  Ugh.  Well, that's easily taken care of.

> 1. Hide the code in a standard function/macro (you should do this with any
> code which is mind-numbingly simple and also very easy to mess up), and
> there's no code-difference either way.

There should be some helper functions lying somewhere around.  I am not really sure this is here or there, though.

> 2. 95% of the use cases are already in situations where things are properly
> aligned (at least for 32-bit values).

I am not convinced of this.  It may be properly aligned, but I wouldn't want to bet on it for everything that's not an x86 (and maybe some ARM phones nowadays).  I am unsure of what the standard requires here, but it may very well be that the compiler will do *(int *)voidptr with byte loads and shifts anyway, which kind of kills the idea.

> I will point out, expounding on my point #4, it's probably better not to
> rely on the compiler to first unroll the loop and then do
> pattern-recognition for byte swapping (this may be why gcc doesn't pick up
> on optimization).

gcc doesn't pick up on the optimization here because it doesn't understand the memory reads well enough: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42587  (Recent-ish versions DTRT if you're manually byte swapping integers.)  Anyway, __builtin_bswap* remains a live option when compiling with GCC, especially since we've dropped 4.2 support.
Comment 3 Joshua Cranmer [:jcranmer] 2012-10-04 19:57:25 PDT
(In reply to Nathan Froyd (:froydnj) from comment #2)
> > 2. 95% of the use cases are already in situations where things are properly
> > aligned (at least for 32-bit values).
> 
> I am not convinced of this.  It may be properly aligned, but I wouldn't want
> to bet on it for everything that's not an x86 (and maybe some ARM phones
> nowadays).  I am unsure of what the standard requires here, but it may very
> well be that the compiler will do *(int *)voidptr with byte loads and shifts
> anyway, which kind of kills the idea.

What I meant was that, in most modern binary protocols/files, a 32-bit value would be stored at an offset that is aligned to a 32-bit boundary. So the byte buffer we're reading from is usually at an address that makes the potential for misaligned access a non-issue.
Comment 4 Nathan Froyd [:froydnj] 2012-10-04 20:01:28 PDT
(In reply to Joshua Cranmer [:jcranmer] from comment #3)
> What I meant was that, in most modern binary protocols/files, a 32-bit value
> would be stored at an offset that is aligned to a 32-bit boundary. So the
> byte buffer we're reading from is usually at an address that makes the
> potential for misaligned access a non-issue.

Ah, that's fair.  Though for library code, I'm not sure we want to assume that.  (And I think having Aligned{Read,Write} might be asking for trouble.)  The perf difference on a Mozilla scale is probably not significant, either.
Comment 5 Nathan Froyd [:froydnj] 2012-10-04 20:15:42 PDT
Created attachment 668301 [details]
strawman Endian.h file

Updated version addressing Joshua's comments for Write and leaving a note for Read.
Comment 6 Nathan Froyd [:froydnj] 2012-10-04 20:28:30 PDT
Worth noting that the Read/Write functions would do a better job in translating things like js/src/vm/Xdr.h, possibly other places.  So maybe we really do need both sets of APIs.  Whether they need to be conjoined in a single class is open to debate.
Comment 7 Nathan Froyd [:froydnj] 2012-10-05 07:36:00 PDT
Created attachment 668437 [details]
strawman Endian.h file

New version with MOZ_{LITTLE,BIG}_ENDIAN and using mozilla::detail.  Asking Waldo for some feedback as MFBT maintainer; feel free to bounce it to someone else.
Comment 8 Jeff Walden [:Waldo] (remove +bmo to email) 2012-10-17 16:09:13 PDT
Comment on attachment 668437 [details]
strawman Endian.h file

This looks pretty decent.  Having read/write for everything, and not providing swap, might almost make sense, but I can imagine too many places where existing interfaces would hand off data in big blocks at a time, in a way only amenable to post-swapping.  So I think we probably do need both.

Looping versus reading an int at a time and whatever, due to misaligned cases, seems like something we should punt on, til some user can demonstrate (with a profile) that well-aligned fast paths are needed.  And in some places compilers should already know a pointer's aligned, or so I'd hope.

Methods in classes are camelCaps, so these should be Endian::readUint16 and so on.

It seems like it'd be nice to have, in addition to swap{To,From}{Big,Little}Endian, versions that use "network" nomenclature.  It'd read better in protocol implementation code to see "network" rather than "big", I think.

>    template<typename T>
>    static T Read(const void *p) {

Does this even work?  I didn't think you could have overloads distinguished only by return type.  Or at the very least, if you can, you'd have to explicitly parametrize calls to them.

>      uint8_t buffer[sizeof(T)];
>      memcpy(buffer, p, sizeof(T));
>
>      /* FIXME: instances of T here really need to be Unsigned<T>::type. */

C++11's <type_traits> has make_signed<T>::type and make_unsigned<T>::type, so we'd want to add mozilla::MakeSigned<T>::Type and mozilla::MakeUnsigned<T>::Type, assuming we can't work around this some other way.  I'm not seeing a way offhand, doesn't mean one doesn't exist.  It's unfortunate signed arithmetic invokes undefined behavior so readily.  :-(

>    template<typename T>
>    static void Write(void *p, T val) {

Calls to function templates in C++98 have to have <> in them, even if all template arguments are defaulted.  Hmm, I guess that applies to Read as well.

>typedef detail::Endian<false> LittleEndian;
>typedef detail::Endian<true> BigEndian;

It's a little confusing to have a BigEndian typedef and the bool BigEndian, even if they're in different namespaces.  enum detail::Endianness { Big, Little } should read nicer than not-obvious-at-use-sites bool anyway, I think.  (And would accord with mfbt style to use enums instead of bools whenever the meaning isn't completely obvious, see mfbt/STYLE.)

typedef doesn't create new types, so people are going to be seeing detail::Endian<false> in stuff like debuggers.  Make these, and the Endian typedef, into |class T : public Endian<...> { };| so you have a distinct type for purposes of mangling and such.

So the interface this would all expose (ignoring detail) would be:

namespace mozilla {
class LittleEndian;
class BigEndian;
class NativeEndian;
}

And each of these classes would include the exact same set of methods in it, wouldn't it?  Even in the cases where that doesn't make any sense, like LittleEndian::swapToLittleEndian(T*).  It might be a little bit of repetition, but I think we should avoid that.  If we have all three be separate classes, we can do this easily enough -- just make each *privately* inherit detail::Endian, then |using Endian<T>::swap| each desired method.  A little ugly, but I think in a tradeoff between implementation and interface simplicity, the balance should be toward interface simplicity.

This may be the idealist in me speaking.  But what do you envision as the use case for exposing MOZ_BIG_ENDIAN and MOZ_LITTLE_ENDIAN?  What purpose would these serve, that couldn't be served by the read/write/swap functions, more explicitly, without preprocessing #ifdefs to make the user code harder to understand?  Right now I'm kind of thinking these would be an attractive nuisance, and we shouldn't expose them.

It'll be good to see all this in place, definitely -- byte swapping is just horrible to read duplicated everywhere.
Comment 9 Nathan Froyd [:froydnj] 2012-10-17 19:54:58 PDT
Thanks for the feedback.  There's clearly still some distance to go.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #8)
> So the interface this would all expose (ignoring detail) would be:
> 
> namespace mozilla {
> class LittleEndian;
> class BigEndian;
> class NativeEndian;
> }
> 
> And each of these classes would include the exact same set of methods in it,
> wouldn't it?  Even in the cases where that doesn't make any sense, like
> LittleEndian::swapToLittleEndian(T*).  It might be a little bit of
> repetition, but I think we should avoid that.  If we have all three be
> separate classes, we can do this easily enough -- just make each *privately*
> inherit detail::Endian, then |using Endian<T>::swap| each desired method.  A
> little ugly, but I think in a tradeoff between implementation and interface
> simplicity, the balance should be toward interface simplicity.

I am confused.  This sounds like you're saying we should have, e.g. LittleEndian::swapFromBigEndian but not BigEndian::swapFromBigEndian?  Oh, maybe you're saying we should have (handwaving) Endian<LittleEndian>::swapFromBigEndian and then have LittleEndian and NativeEndian support that, but not BigEndian?  That seems reasonable, though I wasn't really thinking about having NativeEndian be a class of its own, but a typedef for one or the other, depending.  And in the typedef case, you'd need BigEndian to support it too.  Just like big-endian platforms still define ntohl and friends, even though they're nops on such platforms.

> This may be the idealist in me speaking.  But what do you envision as the
> use case for exposing MOZ_BIG_ENDIAN and MOZ_LITTLE_ENDIAN?  What purpose
> would these serve, that couldn't be served by the read/write/swap functions,
> more explicitly, without preprocessing #ifdefs to make the user code harder
> to understand?  Right now I'm kind of thinking these would be an attractive
> nuisance, and we shouldn't expose them.

Well, we need something like them to implement NativeEndian, yes?  Maybe you're just thinking of that for header-local use.  The canonical use I see for the macros is graphics code:

http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxUtils.cpp#101
http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxPlatform.cpp#1315
http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/CanvasRenderingContext2D.cpp#4272

Actually, I guess you could implement the CanvasRenderingContext2D code with some thought (NativeEndian::loadBytesN?, not sure of the best approach here).  It might be the lateness of the hour, but I don't think there's a good way to implement the thebes code; ABGR format is just wacky, apparently.  There's also gems like:

http://mxr.mozilla.org/mozilla-central/source/content/media/gstreamer/nsGStreamerReader.cpp#133
Comment 10 Nathan Froyd [:froydnj] 2012-10-26 17:55:07 PDT
Created attachment 675771 [details]
strawman Endian.h file, v2

Here's a revised version.  I think I have addressed your comments from the last version.  I've also eliminated the issues with signed types by rewriting things in terms of the primitive |swap|, possibly less elegantly, but also in a way that's easier to let the compiler figure out what's going on.

I left in the preprocessor macros for reasons previously discussed; if you look for IS_LITTLE_ENDIAN instances across the code base, I think you'll quickly find several instances where the preprocessor macros are necessary (JS, graphics code, image decoders, and intl/ at the very least).

The one thing I especially would like your feedback on is the "bulk swap" functions (swap{To,From}*InPlace and copyAndSwap{To,From}*).  On the one hand, they're not exactly part of a parsimonious API.  On the other, they're useful for clearly showing the compiler that particular operations can be nops and/or should compile as a lower-level primitive (e.g. memcpy).

I realize the swap functions probably some do something clever with the moral equivalent of std::is_integral; I'd ask to implement that in a followup bug.

I think once we've sorted out the bulk operation functions yea or nay and I've written some more tests, this will be ready for review.
Comment 11 Mike Hommey [:glandium] 2012-10-29 07:47:09 PDT
Comment on attachment 675771 [details]
strawman Endian.h file, v2

>    template<typename T>
>    static T read(const void *p) {
>      union {
>        T val;
>        uint8_t buffer[sizeof(T)];
>      } u;
>      memcpy(u.buffer, p, sizeof(T));

Wouldn't it be better to avoid memcpy if the address is aligned? Actually, it would probably be better to avoid memcpy if the address is unaligned and the architecture supports reading unaligned words.
Comment 12 Nathan Froyd [:froydnj] 2012-10-29 08:26:00 PDT
(In reply to Mike Hommey [:glandium] from comment #11)
> Comment on attachment 675771 [details]
> strawman Endian.h file, v2
> 
> >    template<typename T>
> >    static T read(const void *p) {
> >      union {
> >        T val;
> >        uint8_t buffer[sizeof(T)];
> >      } u;
> >      memcpy(u.buffer, p, sizeof(T));
> 
> Wouldn't it be better to avoid memcpy if the address is aligned? Actually,
> it would probably be better to avoid memcpy if the address is unaligned and
> the architecture supports reading unaligned words.

GCC, at least (x86{,-64}, recent-ish ARM, MIPS, PPC, etc.) is smart enough to use the appropriate instructions to load directly from |p| instead of calling memcpy when possible.  I would expect clang to understand this and I'd be disappointed in MSVC if it didn't understand this.  If that's the case, we only care about the older GCC on ARM case, which we currently have to deal with, but I'd hope that we're upgrading our ARM GCC soon anyway...
Comment 13 Jeff Walden [:Waldo] (remove +bmo to email) 2012-10-30 13:58:03 PDT
More comments on the second attachment forthcoming, just want to get this out there first...

(In reply to Nathan Froyd (:froydnj) from comment #9)
> I am confused.  This sounds like you're saying we should have, e.g.
> LittleEndian::swapFromBigEndian but not BigEndian::swapFromBigEndian?  Oh,
> maybe you're saying we should have (handwaving)
> Endian<LittleEndian>::swapFromBigEndian and then have LittleEndian and
> NativeEndian support that, but not BigEndian?

Yes, this is what I was thinking.

> That seems reasonable, though I wasn't really thinking about having
> NativeEndian be a class of its own, but a typedef for one or the other,
> depending.

Obviously.  :-)

> > This may be the idealist in me speaking.  But what do you envision as the
> > use case for exposing MOZ_BIG_ENDIAN and MOZ_LITTLE_ENDIAN?  What purpose
> > would these serve, that couldn't be served by the read/write/swap functions,
> > more explicitly, without preprocessing #ifdefs to make the user code harder
> > to understand?  Right now I'm kind of thinking these would be an attractive
> > nuisance, and we shouldn't expose them.
> 
> Well, we need something like them to implement NativeEndian, yes?  Maybe
> you're just thinking of that for header-local use.

Yes, that's what I was thinking of.

> The canonical use I see for the macros is graphics code:

The format of ARGB/BGRA or whatever seems somewhat related to endianness, but mostly it seems orthogonal to me.  Wouldn't low-level color consumption/generation be better served by color-specific APIs?  (Probably they'd depend on endianness info internally, tho.)

> http://mxr.mozilla.org/mozilla-central/source/content/media/gstreamer/
> nsGStreamerReader.cpp#133

Lovely.  Dunno what to do about that, exactly -- and it doesn't handle mixed-endian either, does it.  Maybe I'll have a bright idea after looking at the feedback request.  Or something.
Comment 14 Jeff Walden [:Waldo] (remove +bmo to email) 2012-11-02 20:22:40 PDT
Comment on attachment 675771 [details]
strawman Endian.h file, v2

Yeah, this looks like what I had in mind.  What are you thinking as far as next steps?  Convert the XDR code or something to use it, so there's a demonstrably-working user, then get a full review of both parts and land it?
Comment 15 Nathan Froyd [:froydnj] 2013-01-30 13:52:28 PST
Created attachment 708306 [details] [diff] [review]
part 1 - add mfbt/Endian.h

Here's a version of things which I believe is ready for review.  MFBT's
tests pass locally with this, including the tests which I'm about to
attach.  I'll also attach patches that tweak interesting parts of the
JS engine; I have not tried the JS tests yet.
Comment 16 Nathan Froyd [:froydnj] 2013-01-30 13:53:14 PST
Created attachment 708307 [details] [diff] [review]
part 2 - add tests for mfbt/Endian.h

The tests are verbose, but I think I more-or-less covered everything.  This
stuff is repetitive enough that it definitely needs a second pair of eyes.
Comment 17 Nathan Froyd [:froydnj] 2013-01-30 13:53:36 PST
Created attachment 708308 [details] [diff] [review]
part 3 - convert SHA1.cpp to use Endian.h

This is trivial.
Comment 18 Nathan Froyd [:froydnj] 2013-01-30 13:54:39 PST
Created attachment 708311 [details] [diff] [review]
part 4 - convert the jsclone bits to use Endian.h

This is the first interesting use of Endian.h; a lot of the other uses I
have patched in locally are simply s/IS_LITTLE_ENDIAN/MOZ_LITTLE_ENDIAN/,
which is pretty boring.
Comment 19 Nathan Froyd [:froydnj] 2013-01-30 13:56:19 PST
Created attachment 708313 [details] [diff] [review]
part 5 - convert xdr bits to use Endian.h

...and here's the other interesting part in the JS engine.

The typed array bits could be converted, but we'd need an API for just
swapping bytes, or some cleverness with MOZ_LITTLE_ENDIAN and friends.
Comment 20 Nathan Froyd [:froydnj] 2013-01-31 07:47:50 PST
Comment on attachment 708313 [details] [diff] [review]
part 5 - convert xdr bits to use Endian.h

Review of attachment 708313 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/Xdr.cpp
@@ +74,3 @@
>      } else {
>          const uint8_t *ptr = buf.read(nbytes);
> +        mozilla::NativeEndian::copyAndSwapFromLittleEndian(chars, ptr, nchars);

It turns out that this bit is not quite right: copyAndSwapFromLittleEndian is:

template<T>
void copyAndSwapFromLittleEndian(void*, const T*, uint32_t);

and so we wind up copying bytes when we really wanted to copy jschars.  The easy solution is to change the prototype:

template<T>
void copyAndSwapFromLittleEndian(T*, const void*, uint32_t);

along with several other changes.  But the ease of screwing this up gives me a little pause about the API...
Comment 21 Nathan Froyd [:froydnj] 2013-02-15 14:00:47 PST
Created attachment 714581 [details] [diff] [review]
part 1 - add mfbt/Endian.h

Fixed Endian.h with copyTo adjusted appropriately
Comment 22 Nathan Froyd [:froydnj] 2013-02-15 14:01:23 PST
Created attachment 714582 [details] [diff] [review]
part 2 - add tests for mfbt/Endian.h

...and fixed tests for copyTo/copyFrom differences
Comment 23 Jeff Walden [:Waldo] (remove +bmo to email) 2013-02-21 17:37:51 PST
Comment on attachment 714581 [details] [diff] [review]
part 1 - add mfbt/Endian.h

Review of attachment 714581 [details] [diff] [review]:
-----------------------------------------------------------------

I like a lot of this, mostly it's nits and slight tweaks to go.  But it's a bunch of them, and I doubtless have those more-important reviews backing up again after I cleared them out yesterday, so let's get this out so we can keep working independently til the next iteration's done.  Subsequent patches (excepting the test file -- that I'll look at once we're done with this patch) look basically reasonable, with a couple comments I'll spew out (figured I had to look at those patches to grasp the implications of this one).

Regarding the "a little pause" bit.  Fundamentally the problem is we're handing in a value of the wrong type.  With a cast, or a value of the right type from the start, we'd be good.  Maybe I'm missing something, but it seems like the only real workaround is to not use templates for the src/dest type, and to have N methods for everything.  Then it'd be explicit at the call site what you were using, independent of the type of the pointers being passed in.  That seems like it'd be a huge API, tho -- instead of the 18 methods we have now (and those just in the public interface!), having over a hundred?  No thanks.  :-\

In that particular case we'd be doing byte-swapping, so just not having specializations for uint8_t/int8_t would address the problem.  Given the serialization-to-bytes cases seem the likeliest to have possible mis-swapping failures, I think that makes a good case for removing byte-only-swapping by only having 16/32/64 versions of these methods.  Was SCOutput::writeArray the one use case you found for swapping on bytes?  If it's just that one case, I'd be inclined to have that one place have its own templatized Swap method with a |sizeof(T) == 1| specialization.  I think we need more than one user to justify no-op swapping.

This looks pretty great except for surface issues, and it'll be awesome to have to done, once, when it lands.

::: mfbt/Endian.h
@@ +7,5 @@
> +
> +#ifndef mozilla_Endian_h_
> +#define mozilla_Endian_h_
> +
> +#include "mozilla/Assertions.h"

Particularly given all the private/protected inheritance and usings and disconnects between the final implementation classes and the actual interface exposed, this file's going to need a hefty documentation comment laying out the overall interface and providing examples.  The comments throughout here are good for explaining stuff to someone reading and really understanding this all, but I'm pretty sure someone who isn't writing or reviewing the patch is going to be hopelessly lost.  :-(

I don't want to declare this fixed before it's truly usable, but docs wording also seems like a good way to hold up the entire thing here if we let it.  So perhaps a followup patch in this bug?

@@ +13,5 @@
> +#include "mozilla/StandardInteger.h"
> +
> +#include <string.h>
> +
> +#if _MSC_VER >= 1300

|defined(_MSC_VER) && _MSC_VER >= 1300| so as not to make people double-take.  (Technically it's all well-defined that this'll expand to 0 >= 1300, but it's not what the reader would expect if he hadn't read specs about it, or seen the pattern in action before.)

@@ +15,5 @@
> +#include <string.h>
> +
> +#if _MSC_VER >= 1300
> +#include <stdlib.h>
> +#pragma intrinsic(_byteswap_ushort)

#if _MSC_VER >= 1300
#  include <stdlib.h>
#  pragma intrinsic(_byteswap_ushort)
#  pragma ...
#  pragma ...
#endif

to make clearer the control(ish) flow.

@@ +34,5 @@
> +#    define MOZ_LITTLE_ENDIAN 1
> +#  elif __BIG_ENDIAN__
> +#    define MOZ_BIG_ENDIAN 1
> +#  endif
> +/* Some versions of GCC provide architecture independent macros for this.  */

architecture-independent

Please move this inside the #elif that it annotates.  Having it above like this is a little funky, because it's logically within the previous #elif.  Probably you want one comment for both this and the "Yes, there are...", //-style to save space.

@@ +35,5 @@
> +#  elif __BIG_ENDIAN__
> +#    define MOZ_BIG_ENDIAN 1
> +#  endif
> +/* Some versions of GCC provide architecture independent macros for this.  */
> +#elif defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)

Probably a good idea to add defined(__ORDER_BIG_ENDIAN__) to be safe.

@@ +41,5 @@
> +#  if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> +#    define MOZ_LITTLE_ENDIAN 1
> +#  elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> +#    define MOZ_BIG_ENDIAN 1
> +#  endif

An #else / #error "Can't handle mixed-endian systems right now." seems like it'd be slightly nicer for people who try compiling on those systems.

@@ +45,5 @@
> +#  endif
> +/*
> + * Can't include useful headers like <endian.h> or <sys/isa_defs.h> here
> + * because they're not present on all platforms.  Instead have a big
> + * conditional that hopefully catches all the interesting cases.

"We can't".

Also, *awesome*.  I love C++.

@@ +66,5 @@
> +#endif
> +
> +#if MOZ_BIG_ENDIAN
> +#  define MOZ_LITTLE_ENDIAN 0
> +#elif MOZ_LITTLE_ENDIAN

We're still exposing MOZ_BIG_ENDIAN and MOZ_LITTLE_ENDIAN?  I think I was slightly worried about that at one point, but you had reasons to expose it that weren't necessarily *wrong*, just using endianness to solve a related-but-different problem.  I'd like to have a followup to at least examine all the cases and determine if something else makes more sense in them (like a color-encoding header, for the argb/bgra issue I remember seeing here).

@@ +72,5 @@
> +#else
> +#  error "Cannot determine endianness"
> +#endif
> +
> +#ifdef __cplusplus

I think we can get rid of the __cplusplus guard now, as SpiderMonkey has a C++ API now.  Please do that?

@@ +81,5 @@
> +
> +enum Endianness {
> +  Little,
> +  Big
> +};

I'd one-liner this, myself.

@@ +86,5 @@
> +
> +#if MOZ_BIG_ENDIAN
> +#  define MOZ_NATIVE_ENDIANNESS mozilla::detail::Big
> +#else
> +#  define MOZ_NATIVE_ENDIANNESS mozilla::detail::Little

These can be detail::*, no need for the mozilla:: prefix, which helps a little with understanding the usable scope for the macro.

@@ +89,5 @@
> +#else
> +#  define MOZ_NATIVE_ENDIANNESS mozilla::detail::Little
> +#endif
> +
> +template<enum Endianness Endianness>

There were some issues in CheckedInt with a template parameter that had the same name as a type -- can't remember where, exactly.  Even if they were tier-3 or whatever, tho, this is just confusing.  :-)  I don't have a good suggestion for a name here -- maybe <Endianness ThisEndian>?

@@ +145,5 @@
> +    /*
> +     * Write a 16-bit unsigned integer to p in the endianness given by
> +     * Endianness.
> +     */
> +    static void writeUint16(void *p, uint16_t val) {

|void* p| here and a bunch of other places.

@@ +194,5 @@
> +     * native-endian format and you need it to appear in little-endian
> +     * format for transmission.
> +     */
> +    template<typename T>
> +    MOZ_WARN_UNUSED_RESULT static T swapToLittleEndian(T value) {

So, right now everything's purely T-generic, which seems a bit generous.  There's really no safe way to encode or decode anything that's not a fixed-size {u,}int{8,16,32,64}_t.  So I think we want to restrict these to taking *only* those types, and no others.  It's a little messy to write out, but it shouldn't be too bad.  How about we start mfbt/CustomTraits.h and seed it with this (FalseType/TrueType are in a patch awaiting review in bug 835542, hopefully it'll be landed by the next time I see this):

#include "mozilla/TypeTraits.h"

namespace mozilla {

/** TrueType if T is {u,}int{8,16,32,64}_t, FalseType otherwise. */
template<typename T>
struct IsFixedSizeType : FalseType {};

template<> struct IsFixedSizeType<uint8_t> : TrueType {};
template<> struct IsFixedSizeType<int8_t> : TrueType {};
template<> struct IsFixedSizeType<uint16_t> : TrueType {};
template<> struct IsFixedSizeType<int16_t> : TrueType {};
template<> struct IsFixedSizeType<uint32_t> : TrueType {};
template<> struct IsFixedSizeType<int32_t> : TrueType {};
template<> struct IsFixedSizeType<uint64_t> : TrueType {};
template<> struct IsFixedSizeType<int64_t> : TrueType {};

} // namespace mozilla

And then in this header, have

template<typename T>
struct EndianType
{
    typedef typename EnableIf<IsFixedSizeType<T>::value, T>::Type;
};

and instead of T in the methods, use |typename detail::EndianType<T>::Type|.  Maybe only in one place in the signature (return type perhaps), not all, to slightly aid readability.

We have the stdint tests encoded duplicatively in several places now, so having one common implementation in one place will benefit more than just this code.  CheckInt.h has some now, mozilla::Abs will have some when it lands, so it seems worth having one instance of it now.

@@ +198,5 @@
> +    MOZ_WARN_UNUSED_RESULT static T swapToLittleEndian(T value) {
> +      return swap(value, Little);
> +    }
> +    /*
> +     * Copies count values of type T starting at src to dst, converting

Let's use |dest| to read slightly better and be consistent with memcpy/memmove/etc. manpages and such.

@@ +200,5 @@
> +    }
> +    /*
> +     * Copies count values of type T starting at src to dst, converting
> +     * them to little-endian format if Endianness is Big.
> +     * As with memcpy, dst and src should not overlap.

must not overlap, for RFC-compliance.  :-)

@@ +204,5 @@
> +     * As with memcpy, dst and src should not overlap.
> +     */
> +    template<typename T>
> +    static void copyAndSwapToLittleEndian(void* dst, const T* src,
> +                                          unsigned int count = 1) {

size_t for count.  I think we should make it non-optional, too -- for clarity everywhere, and because this would look like it does the wrong thing for arrays (using up only one element, not all of them).

@@ +261,5 @@
> +    }
> +
> +    /*
> +     * Converts a value of type T (an integral type) from little-endian format.
> +     * Conversion is only done if Endianness is Big.

The "only done if" part is implicit in the API, seems to me -- I wouldn't mention it.

@@ +270,5 @@
> +    }
> +    /*
> +     * Copies count values of type T starting at src to dst, converting
> +     * them to little-endian format if Endianness is Big.
> +     * As with memcpy, dst and src should not overlap.

must not, again

@@ +342,5 @@
> +    /*
> +     * Read a value of type T from p in the endianness given by other.
> +     */
> +    template<typename T>
> +    static T readAndConvert(const void* p, enum Endianness other) {

Get rid of the |enum | prefix here, and everywhere else -- we're C++, don't need the qualifier.

I think it would be better if |Endianness other| were passed as a template parameter, not as an argument, to all the functions that currently take an Endianness argument.  It's a compile-time constant, and we really want the endian-specific code to be inlined at the call site.  Throwing an extra argument in, with branches and such that have to be dead-code-eliminated, makes enough extra work for compilers (and readers, I think) that we just shouldn't do it.

Also, "other" is not quite illuminating enough.  What about |Endianness DestEndian| for the endianness of the destination?

@@ +369,5 @@
> +      memcpy(p, &tmp, sizeof(T));
> +    }
> +
> +    template<typename T>
> +    static void swapInPlace(T* p, unsigned int count, enum Endianness other) {

I find myself wondering if these looping methods are going to ever want to be out-of-lined at some point.  I suppose let's stick with inline versions, then if we see bloat we can move the implementations into separate out-of-line functions called inline.

@@ +373,5 @@
> +    static void swapInPlace(T* p, unsigned int count, enum Endianness other) {
> +      if (Endianness == other || sizeof(T) == 1)
> +        return;
> +
> +      for (unsigned int i = 0; i < count; ++i)

size_t.  In fact it looks like almost every |unsigned int| in the file should be a |size_t| -- double-check 'em all.  You'll need a #include <stddef.h> for size_t, if memory serves.

@@ +379,5 @@
> +    }
> +
> +    template<typename T>
> +    static void copyAndSwapTo(void* dst, const T* src, unsigned int count,
> +                              enum Endianness other) {

Assert that dst and src don't overlap, or people will accidentally do it at some point.

@@ +385,5 @@
> +        memcpy(dst, src, count * sizeof(T));
> +        return;
> +      }
> +
> +      uint8_t* byteDstPtr = static_cast<uint8_t*>(dst);

Why not |T* destT = static_cast<T*>(dest);| so you can just do |destT++| in the method?  Less sizeof(T) seems better ideally.  Something about having misaligned pointers or so, that results in actual wrong code by spec maybe?

@@ +400,5 @@
> +        memcpy(dst, src, count * sizeof(T));
> +        return;
> +      }
> +
> +      const uint8_t* byteSrcPtr = static_cast<const uint8_t*>(src);

Why not use |src| here?  Obviously I'm missing something about this.  :-)

@@ +420,5 @@
> +      // It is sometimes useful to byteswap single-byte values.
> +      // To aid in such cases, we handle 1-byte sizes here.
> +      switch (sizeof(T)) {
> +      case 1: return value;
> +#if defined(__GNUC__)

This picks up both clang and gcc, but mfbt tries to handle the two compilers distinctly.  Could you copy this into an #if defined(__clang__), change the |case 2| to a |__builtin_bswap16(value)|, and then have this section as an #elif defined(__GNUC__)?

@@ +430,5 @@
> +      case 4: return _byteswap_ulong(value);
> +      case 8: return _byteswap_uint64(value);
> +#else
> +      case 2:
> +        return ((value & 0x00ff) << 8) | ((value & 0xff00) >> 8);

Smack a T() around the whole thing, please.  Same for the 4/8 cases as well.

@@ +435,5 @@
> +      case 4:
> +        return ((value & 0x000000ffU) << 24) |
> +               ((value & 0x0000ff00U) << 8) |
> +               ((value & 0x00ff0000U) >> 8) |
> +               ((value & 0xff000000U) >> 24);

All these bit-ands, shifts, bit-ors, etc. worry me as far as compiler warnings go.  It's uncompiled code, in theory, but it's also part of the function.  And we've had bugs because of uncompiled code in a compiled function triggering warnings, so this isn't simply theoretical.  I think we need this:

namespace detail {

template<typename T, size_t Size = sizeof(T)>
inline T
Swap(T value);

template<typename T>
inline T
Swap<T, 1>(T value)
{
  return value;
}

template<typename T>
inline T
Swap<T, 2>(T value)
{
#if defined(__clang__)
  return T(__builtin_bswap16(value));
#elif defined(_MSC_VER)
  return T(_byteswap_ushort(value));
#else
  return T(((value & 0x00ff) << 8) | ((value & 0xff00) >> 8));
#endif
}

template<typename T>
inline T
Swap<T, 4>(T value)
{
#if defined(__clang__) || defined(__GNUC__)
  return T(__builtin_bswap32(value));
#elif defined(_MSC_VER)
  return T(_byteswap_ulong(value));
#else
  return T(((value & 0x000000ffU) << 24) |
           ((value & 0x0000ff00U) << 8) |
           ((value & 0x00ff0000U) >> 8) |
           ((value & 0xff000000U) >> 24)));
#endif
}

template<typename T>
inline T
Swap<T, 8>(T value)
{
#if defined(__clang__) || defined(__GNUC__)
  return T(__builtin_bswap64(value));
#elif defined(_MSC_VER)
  return T(_byteswap_uint64(value));
#else
  return T(((value & 0x00000000000000ffULL) << 56) |
           ((value & 0x000000000000ff00ULL) << 40) |
           ((value & 0x0000000000ff0000ULL) << 24) |
           ((value & 0x00000000ff000000ULL) << 8) |
           ((value & 0x000000ff00000000ULL) >> 8) |
           ((value & 0x0000ff0000000000ULL) >> 24) |
           ((value & 0x00ff000000000000ULL) >> 40) |
           ((value & 0xff00000000000000ULL) >> 56));
#endif
}

} // namespace detail

Then you'd have:

template<Endianness DestEndian, typename T>
static T swap(T value) {
  if (ThisEndian == DestEndian)
    return t;
  return Swap(value);
}

and calls would be swap<Big>(value), swap<Little>(value), or swap(MOZ_NATIVE_ENDIANNESS>(value).

This also happens to eliminate bad-size cases currently under a not-reached.

@@ +459,5 @@
> +
> +template<enum Endianness Endianness>
> +class EndianReadWrite : public Endian<Endianness>
> +{
> +private:

template<Endianness ThisEndian>
class EndianReadWrite : public Endian<ThisEndian>
{
  private:

Basically same changes as for NativeEndian later.

@@ +481,5 @@
> +} /* namespace detail */
> +
> +class LittleEndian MOZ_FINAL : public detail::EndianReadWrite<detail::Little>
> +{
> +};

Feel free to put {} on the same line with LittleEndian and BigEndian.

@@ +483,5 @@
> +class LittleEndian MOZ_FINAL : public detail::EndianReadWrite<detail::Little>
> +{
> +};
> +
> +class BigEndian MOZ_FINAL : public detail::EndianReadWrite<detail::Big>

Probably we should have |typedef BigEndian NetworkEndian;| too, if we're having the synonyms among the class methods too.

@@ +488,5 @@
> +{
> +};
> +
> +class NativeEndian MOZ_FINAL : public detail::Endian<MOZ_NATIVE_ENDIANNESS>
> +{

The contents of this class should be indented another two spaces:

class NativeEndian
{
  private:
    typedef ...

  public:
    /* These functions are... */

@@ +496,5 @@
> +public:
> +  /*
> +   * These functions are intended for cases where you have data in your
> +   * native-endian format and you need the data to appear in the appropriate
> +   * endianness for transmission.

for transmission, serialization, etc.  (I wouldn't call writing to disk "transmission", just want to not exclude actual use cases here.  I'm not too picky about the exact wording so long as it's not missing cases -- "such as" language would be cool too.)
Comment 24 Jeff Walden [:Waldo] (remove +bmo to email) 2013-02-21 17:38:33 PST
Comment on attachment 708308 [details] [diff] [review]
part 3 - convert SHA1.cpp to use Endian.h

Review of attachment 708308 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/SHA1.cpp
@@ +19,5 @@
>  
>  static inline unsigned
>  SHA_HTONL(unsigned x)
>  {
> +  return mozilla::NativeEndian::swapToBigEndian(x);

Put a using mozilla::NativeEndian; at the top, and just inline NativeEndian::swapToBigEndian(x) calls at the call sites.
Comment 25 Jeff Walden [:Waldo] (remove +bmo to email) 2013-02-21 17:39:30 PST
Comment on attachment 708311 [details] [diff] [review]
part 4 - convert the jsclone bits to use Endian.h

Review of attachment 708311 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsclone.cpp
@@ +103,5 @@
>  {
>      const uint64_t *point = data;
>      const uint64_t *end = data + nbytes / 8;
>  
> +    uint64_t u = mozilla::LittleEndian::readUint64(point++);

Put a using mozilla::LittleEndian at the top of the file, please, and drop the mozilla:: prefix everywhere.

@@ +254,5 @@
>      size_t nwords = JS_HOWMANY(nelems, sizeof(uint64_t) / sizeof(T));
>      if (nelems + sizeof(uint64_t) / sizeof(T) - 1 < nelems || nwords > size_t(end - point))
>          return eof();
>  
> +    mozilla::NativeEndian::copyAndSwapFromLittleEndian(p, point, nelems);

using mozilla::NativeEndian and drop mozilla:: everywhere too.
Comment 26 Jeff Walden [:Waldo] (remove +bmo to email) 2013-02-21 17:49:18 PST
Comment on attachment 708313 [details] [diff] [review]
part 5 - convert xdr bits to use Endian.h

Review of attachment 708313 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/Xdr.cpp
@@ +74,3 @@
>      } else {
>          const uint8_t *ptr = buf.read(nbytes);
> +        mozilla::NativeEndian::copyAndSwapFromLittleEndian(chars, ptr, nchars);

It would also help if both pointers were the same type (ignoring constness of the pointee).  I guess the reason not to, for that, is that serialization is often going to be to unsigned char/uint8_t/char pointer.  But then again, as the general rule goes, explicit is better than implicit.  Maybe the current API is an exception, but unless C++'s memory model doesn't like having unaligned pointers (even if the underlying stuff is never accessed directly through them, only through uint8_t*).  Not sure.  Comments appreciated.

::: js/src/vm/Xdr.h
@@ -87,5 @@
>      uint8_t     *cursor;
>      uint8_t     *limit;
>  };
>  
> -/* We use little-endian byteorder for all encoded data */

We definitely want to preserve this comment somewhere -- probably in an overview comment for XDRState maybe like so:

/*
 * XDR serialization/deserialization state.  All data is encoded in little-endian format.
 */

@@ +129,3 @@
>          } else {
> +            const uint8_t *ptr = buf.read(sizeof *n);
> +            *n = mozilla::LittleEndian::readUint16(ptr);

I was almost going to say that using LittleEndian both places is wrong, and one should be NativeEndian, but because of read* not being on NativeEndian that would obviously have been wrong at compile time.  I count this as a suggestion we're on the right track with the big/little/native interface separation.  :-)
Comment 27 Nathan Froyd [:froydnj] 2013-02-22 13:03:25 PST
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #23)
> I like a lot of this, mostly it's nits and slight tweaks to go.  But it's a
> bunch of them, and I doubtless have those more-important reviews backing up
> again after I cleared them out yesterday, so let's get this out so we can
> keep working independently til the next iteration's done.

Thanks for the review.  I think the only things I really need to comment on are:
 
> In that particular case we'd be doing byte-swapping, so just not having
> specializations for uint8_t/int8_t would address the problem.  Given the
> serialization-to-bytes cases seem the likeliest to have possible
> mis-swapping failures, I think that makes a good case for removing
> byte-only-swapping by only having 16/32/64 versions of these methods.  Was
> SCOutput::writeArray the one use case you found for swapping on bytes?  If
> it's just that one case, I'd be inclined to have that one place have its own
> templatized Swap method with a |sizeof(T) == 1| specialization.  I think we
> need more than one user to justify no-op swapping.

I think it was.  I'd have to grovel through my patch queue to confirm.

> @@ +385,5 @@
> > +        memcpy(dst, src, count * sizeof(T));
> > +        return;
> > +      }
> > +
> > +      uint8_t* byteDstPtr = static_cast<uint8_t*>(dst);
> 
> Why not |T* destT = static_cast<T*>(dest);| so you can just do |destT++| in
> the method?  Less sizeof(T) seems better ideally.  Something about having
> misaligned pointers or so, that results in actual wrong code by spec maybe?

Yes, exactly.

> @@ +400,5 @@
> > +        memcpy(dst, src, count * sizeof(T));
> > +        return;
> > +      }
> > +
> > +      const uint8_t* byteSrcPtr = static_cast<const uint8_t*>(src);
> 
> Why not use |src| here?  Obviously I'm missing something about this.  :-)

Maybe you are missing that |src| is of type |void*| here?  So we have the same concerns as the |void* dst| case above.

> @@ +430,5 @@
> > +      case 4: return _byteswap_ulong(value);
> > +      case 8: return _byteswap_uint64(value);
> > +#else
> > +      case 2:
> > +        return ((value & 0x00ff) << 8) | ((value & 0xff00) >> 8);
> 
> Smack a T() around the whole thing, please.  Same for the 4/8 cases as well.

Is this just "explicit is better than implicit" (implicit being the conversion the compiler will perform to the return type) in action?  If so, why not T() for the compiler intrinsic cases earlier?

> @@ +435,5 @@
> > +      case 4:
> > +        return ((value & 0x000000ffU) << 24) |
> > +               ((value & 0x0000ff00U) << 8) |
> > +               ((value & 0x00ff0000U) >> 8) |
> > +               ((value & 0xff000000U) >> 24);
> 
> All these bit-ands, shifts, bit-ors, etc. worry me as far as compiler
> warnings go.  It's uncompiled code, in theory, but it's also part of the
> function.  And we've had bugs because of uncompiled code in a compiled
> function triggering warnings, so this isn't simply theoretical.  I think we
> need this:
> 
> namespace detail {
> 
> template<typename T, size_t Size = sizeof(T)>
> inline T
> Swap(T value);

I like the elimination of the not-reached warnings.  I guess we'll not raise any concerns about template instantiation slowing down compile-time yet. :)
Comment 28 Jeff Walden [:Waldo] (remove +bmo to email) 2013-02-22 13:13:45 PST
(In reply to Nathan Froyd (:froydnj) from comment #27)
> > Why not |T* destT = static_cast<T*>(dest);| so you can just do |destT++| in
> > the method?  Less sizeof(T) seems better ideally.  Something about having
> > misaligned pointers or so, that results in actual wrong code by spec maybe?
> 
> Yes, exactly.

So does using T* result in actually wrong code?

> > Smack a T() around the whole thing, please.  Same for the 4/8 cases as well.
> 
> Is this just "explicit is better than implicit" (implicit being the
> conversion the compiler will perform to the return type) in action?  If so,
> why not T() for the compiler intrinsic cases earlier?

The Swap thing I spewed out has T() everywhere.  Mostly it's about explicitness, but partly it's that it's unclear to me that the builtins return T, exactly.

>I guess we'll not raise any concerns about template instantiation slowing
> down compile-time yet. :)

There are much more complex template instantiations a lot of other places, trust me.  :-)
Comment 29 Nathan Froyd [:froydnj] 2013-02-22 13:54:57 PST
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #28)
> (In reply to Nathan Froyd (:froydnj) from comment #27)
> > > Why not |T* destT = static_cast<T*>(dest);| so you can just do |destT++| in
> > > the method?  Less sizeof(T) seems better ideally.  Something about having
> > > misaligned pointers or so, that results in actual wrong code by spec maybe?
> > 
> > Yes, exactly.
> 
> So does using T* result in actually wrong code?

I think so.  It's hard to tell on x86, because the compiler is smart enough to realize that the chip doesn't care about alignment.  ARM would be a better target, but I don't have ARM development bits and it's entirely possible that we (a) target chips that can fix up misalignments in hardware; and (b) the compiler understands that.  (I freely admit my ignorance here as to the exact targets and/or capabilities of the compiler we use.)  And I don't have a MIPS machine set up, either.

I realize you could code inspect all this, but I suspect a lot of the buffers involved in these operations are allocated by |malloc| or equivalent.  And |malloc| has to allocate buffers appropriately aligned for the largest alignment of the platform.  So you wind up with a bunch of pointers that are |unsigned char*| but at runtime happen to have alignments appropriate for |int*| or similar.  Code inspection might then turn up lots of problematic cases that *do* actually work at runtime, even on strict alignment platforms.

I think we should be safe and assume the worst about the alignment of the pointers involved.  We'll DTRT on all the x86oids, the ARM compilers can probably fix everything up on the mobile devices, assuming appropriate targeting, and we won't get complaints about Mozilla programmers being unable to code to the standard. ;)

> > > Smack a T() around the whole thing, please.  Same for the 4/8 cases as well.
> > 
> > Is this just "explicit is better than implicit" (implicit being the
> > conversion the compiler will perform to the return type) in action?  If so,
> > why not T() for the compiler intrinsic cases earlier?
> 
> The Swap thing I spewed out has T() everywhere.  Mostly it's about
> explicitness, but partly it's that it's unclear to me that the builtins
> return T, exactly.

Ah, I cut the Swap implementations from my reply and totally missed all the T()s there.  My mistake.
Comment 30 Nathan Froyd [:froydnj] 2013-03-22 04:37:50 PDT
Created attachment 728150 [details] [diff] [review]
part 1 - add mfbt/Endian.h

Here's a version of Endian.h that I believe addresses your previous review
comments.  It depends on IsIntegral being added; I went with what seems to
me a simpler way to enforce integral types.  Ideally it will provide better
error messages if somebody does something wrong.
Comment 31 Nathan Froyd [:froydnj] 2013-03-22 04:38:26 PDT
Created attachment 728151 [details] [diff] [review]
part 2 - add tests for mfbt/Endian.h

Previous versions of tests had some issues.  These tests at least pass.
Comment 32 Jeff Walden [:Waldo] (remove +bmo to email) 2013-03-26 16:54:31 PDT
Comment on attachment 728150 [details] [diff] [review]
part 1 - add mfbt/Endian.h

Looking at this, I got *really* confused about the meanings of the endianness parameters to readAndConvert, writeAndConvert, and the other internal utility functions.  I think the problem is that ThisEndian has different implicit meanings in different places, and that it would be better for every method taking an endianness to take *both* source and destination endiannesses.

I decided it'd be easier to just make the change than to attempt to explain myself (I'm doubtless unclear from just that description).  Plus I'd be able to actually read the code as I imagined it.  I've done that (and made other minor stylistic changes) and tested it against the tests here (passes little-endian at least, dunno about others), and I think it's definitely better.  I'll post in a sec to see what you have to say about it.
Comment 33 Jeff Walden [:Waldo] (remove +bmo to email) 2013-03-26 16:59:02 PDT
Created attachment 729881 [details] [diff] [review]
Part 1, revised - add mfbt/Endian.h

I think this is much more readable -- every thing that might swap takes both source and destination endianness, so it's obvious at each call site whether the passed endiannesses are correct.  All swapping stuff is in EndianUtils, so there's never a chance of confusion about how ThisEndian interacts with a DestEndian.  Plus it shaves off 25-30 lines or so.  What do you think?
Comment 34 Nathan Froyd [:froydnj] 2013-03-27 06:42:05 PDT
Comment on attachment 729881 [details] [diff] [review]
Part 1, revised - add mfbt/Endian.h

Review of attachment 729881 [details] [diff] [review]:
-----------------------------------------------------------------

You're right, that is more straightforward to keep track of.
Comment 35 Jeff Walden [:Waldo] (remove +bmo to email) 2013-04-02 13:35:10 PDT
Comment on attachment 728151 [details] [diff] [review]
part 2 - add tests for mfbt/Endian.h

Review of attachment 728151 [details] [diff] [review]:
-----------------------------------------------------------------

Remove the various bits of trailing whitespace I see here, please.

::: mfbt/tests/TestEndian.cpp
@@ +6,5 @@
> +#include "mozilla/Endian.h"
> +#include "mozilla/Scoped.h"
> +#include "mozilla/Util.h"
> +
> +using namespace mozilla;

Make this:

using mozilla::BigEndian;
using mozilla::LittleEndian;
using mozilla::NativeEndian;

@@ +40,5 @@
> +}
> +
> +// The standardese around explicit instantiation of templates is not
> +// clear with regards to how Endian.h functions are laid out.  Provide
> +// these wrappers to make things explicit.

Please add a link to the gcc bug report you mentioned on IRC.  (About which, <http://j.mp/XosS6S>.)

@@ +95,5 @@
> +  const size_t arraySize = 2 * count;
> +  const size_t bufferSize = arraySize * sizeof(T);
> +  ScopedDeleteArray<uint8_t> buffer(new uint8_t[bufferSize]);
> +  const uint8_t fillValue = 0xa5;
> +  ScopedDeleteArray<uint8_t> checkBuffer(new uint8_t[bufferSize]);

With count as a template parameter, these can just be |static uint8_t checkBuffer[bufferSize];| and no need for allocations and memory management.

@@ +97,5 @@
> +  ScopedDeleteArray<uint8_t> buffer(new uint8_t[bufferSize]);
> +  const uint8_t fillValue = 0xa5;
> +  ScopedDeleteArray<uint8_t> checkBuffer(new uint8_t[bufferSize]);
> +
> +  MOZ_ASSERT(bufferSize > 2*sizeof(T));

Spaces around *.

@@ +154,5 @@
> +			  nValues * sizeof(T)) == 0);
> +      }
> +      for (size_t i = 0; i < nValues; ++i) {
> +	MOZ_ASSERT(readerFunc(buffer + startPosition + i) == values[i]);
> +      }

This loop shouldn't be braced.

@@ +194,5 @@
> +			  nValues * sizeof(T)) == 0);
> +      }
> +      for (size_t i = 0; i < nValues; ++i) {
> +	MOZ_ASSERT(readerFunc(buffer + startPosition + i) == values[i]);
> +      }

Nor this one.

@@ +221,5 @@
> +SPECIALIZE_READER(int64_t, readInt64)
> +
> +template<typename T>
> +void
> +TestBulkSwap(const T* bytes, size_t count)

template<typename T, size_t Count>
void
TestBulkSwap(const T (&arr)[Count])

and keep propagating the array, as an array, and the count as a template parameter.

Same for the other TestBulk* methods, looks like.

@@ +285,5 @@
> +{
> +  static const uint8_t unsigned_bytes[16] = { 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8,
> +					      0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8 };
> +  static const uint8_t signed_bytes[16] = { 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8,
> +					    0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8 };

This should be int8_t.  And those initializers aren't going to fit in int8_t.  I'd probably s/f/-7/g so that you can avoid casts.  Which admittedly might make a bunch of the subsequent math a lot harder.  :-\  You could also do s/f/7/g instead, although this wouldn't test the negative-value case.  Which might be worth doing, but it's likely not worth the trouble to require it for an r+.

@@ +319,5 @@
> +	     
> +  LittleEndian::writeUint64(&buffer[0], 0x807060504030201ULL);
> +  MOZ_ASSERT(memcmp(&unsigned_bytes[0], &buffer[0], 8) == 0);
> +  BigEndian::writeUint64(&buffer[0], 0x102030405060708ULL);
> +  MOZ_ASSERT(memcmp(&unsigned_bytes[0], &buffer[0], 8) == 0);

Please make the last argument to all these memcmps a sizeof(relevant type), for readability.
Comment 36 Nathan Froyd [:froydnj] 2013-04-03 09:38:36 PDT
Created attachment 732885 [details] [diff] [review]
part 3 - convert SHA1.cpp to use Endian.h

Review comments applied.
Comment 37 Nathan Froyd [:froydnj] 2013-04-03 09:38:59 PDT
Created attachment 732886 [details] [diff] [review]
part 4 - convert the jsclone bits to use Endian.h

Review comments applied.
Comment 38 Nathan Froyd [:froydnj] 2013-04-03 09:39:25 PDT
Created attachment 732887 [details] [diff] [review]
part 5 - convert xdr bits to use Endian.h

Review comments applied.
Comment 39 Jeff Walden [:Waldo] (remove +bmo to email) 2013-04-03 11:33:40 PDT
Comment on attachment 732886 [details] [diff] [review]
part 4 - convert the jsclone bits to use Endian.h

Review of attachment 732886 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsclone.cpp
@@ +18,5 @@
>  #include "vm/StringObject-inl.h"
>  
>  using namespace js;
> +using mozilla::LittleEndian;
> +using mozilla::NativeEndian;

Blank line between the |using namespace js;| and the |using Foo;| here.

@@ -299,5 @@
>      size_t nwords = JS_HOWMANY(nelems, sizeof(uint64_t) / sizeof(T));
>      if (nelems + sizeof(uint64_t) / sizeof(T) - 1 < nelems || nwords > size_t(end - point))
>          return eof();
>  
> -    if (sizeof(T) == 1) {

Given the last Endian.h supported only T = {u,}int{16,32,64}_t, you probably need a little T-templated function to handle sizeof(T) == 1 via specialization.  Same for writeArray too.  Should be a simple enough addition, I don't need to see it.

@@ +364,5 @@
>  
>      buf.back() = 0;  /* zero-pad to an 8-byte boundary */
>  
>      T *q = (T *) &buf[start];
> +    NativeEndian::copyAndSwapToLittleEndian(q, p, nelems);

Incidentally, point/p/q as pointer names for this stuff is just awful.  :-(  Not that this should be changed here, and not that I want to think of better names for the stuff now.
Comment 40 Nathan Froyd [:froydnj] 2013-04-03 11:36:13 PDT
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #39)
> @@ -299,5 @@
> >      size_t nwords = JS_HOWMANY(nelems, sizeof(uint64_t) / sizeof(T));
> >      if (nelems + sizeof(uint64_t) / sizeof(T) - 1 < nelems || nwords > size_t(end - point))
> >          return eof();
> >  
> > -    if (sizeof(T) == 1) {
> 
> Given the last Endian.h supported only T = {u,}int{16,32,64}_t, you probably
> need a little T-templated function to handle sizeof(T) == 1 via
> specialization.  Same for writeArray too.  Should be a simple enough
> addition, I don't need to see it.

Argh, I haven't imported the new Endian.h into my tree.  Will do that...
Comment 42 Chris Peterson [:cpeterson] 2013-04-03 22:17:01 PDT
This code fails to compile for me on OSX with Apple's default clang 4.2:

In file included from mozilla/inbound/mfbt/tests/TestEndian.cpp:7:
../../dist/include/mozilla/Endian.h:163:14: error: use of undeclared identifier '__builtin_bswap16'
return T(__builtin_bswap16(value));
Comment 43 Boris Zbarsky [:bz] (Out June 25-July 6) 2013-04-03 22:19:50 PDT
At first google, clang didn't add __builtin_bswap16 until rev 165362.  That happens to be earlier than what our builders are using right now (170890), but is significantly later than what most people building on Mac are using and what we claim to support, no?
Comment 44 Chris Peterson [:cpeterson] 2013-04-03 23:56:24 PDT
Created attachment 733199 [details] [diff] [review]
workaround-clang-__builtin_bswap16.patch

This clang workaround has some tricky multilevel #ifdefs for MOZ_HAVE_BUILTIN_BYTESWAP16 because (AFAICT) the C preprocessor does not shortcut #if expressions. clang doesn't #define MOZ_GCC_VERSION_AT_LEAST() and gcc doesn't define __has_builtin(), so the MOZ_GCC_VERSION_AT_LEAST() and __has_builtin() checks must be on compiler-specific #if code paths.

The alternative was duplication of `return T(((value & 0x00ff) << 8) | ((value & 0xff00) >> 8))` for clang and gcc code paths within the Swapper::swap() function.
Comment 45 Chris Peterson [:cpeterson] 2013-04-03 23:56:43 PDT
https://tbpl.mozilla.org/?tree=Try&rev=a358daa1c077
Comment 46 Nathan Froyd [:froydnj] 2013-04-04 05:22:38 PDT
Thanks for fixing up after me, Chris!

https://hg.mozilla.org/integration/mozilla-inbound/rev/e02f86260dad
Comment 48 :Ms2ger 2013-04-07 02:13:35 PDT
Any reason js_memcpy turned into plain memcpy in <https://hg.mozilla.org/mozilla-central/rev/fa1191b1d320>?

Note You need to log in before you can comment on or make changes to this bug.