Last Comment Bug 689433 - Align nsAutoTArray<E>'s elements to their natural alignment
: Align nsAutoTArray<E>'s elements to their natural alignment
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: unspecified
: All All
: -- normal with 1 vote (vote)
: mozilla10
Assigned To: Justin Lebar (not reading bugmail)
:
:
Mentors:
Depends on:
Blocks: 681755
  Show dependency treegraph
 
Reported: 2011-09-26 20:45 PDT by Justin Lebar (not reading bugmail)
Modified: 2011-10-07 04:02 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch v1 (27.97 KB, patch)
2011-09-28 15:11 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Patch v2 (29.18 KB, patch)
2011-09-28 15:14 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Patch v3 (29.18 KB, patch)
2011-09-29 12:56 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Patch v4 (29.03 KB, patch)
2011-10-05 07:15 PDT, Justin Lebar (not reading bugmail)
justin.lebar+bug: review-
Details | Diff | Splinter Review
Patch v5 (29.07 KB, patch)
2011-10-05 12:43 PDT, Justin Lebar (not reading bugmail)
roc: review+
emorley: checkin+
Details | Diff | Splinter Review
Remove MOZ_STATIC_ASSERT (998 bytes, patch)
2011-10-06 07:37 PDT, Justin Lebar (not reading bugmail)
roc: review+
Details | Diff | Splinter Review

Description Justin Lebar (not reading bugmail) 2011-09-26 20:45:51 PDT
nsAutoTArray<E> aligns its elements to 8-byte boundaries.  Or at least, it tries to do this; it apparently aligns to 4-bytes on Mac (see bug 681755) and maybe on Linux.

This wastes 4 bytes of space on Windows 32 for each nsAutoTArray, which is annoying.  But it also means that nsAutoTArray<E> itself must be aligned to 8-byte boundaries, which increases the size of structs containing auto arrays.
Comment 1 Justin Lebar (not reading bugmail) 2011-09-28 15:11:26 PDT
Created attachment 563195 [details] [diff] [review]
Patch v1
Comment 2 Justin Lebar (not reading bugmail) 2011-09-28 15:14:56 PDT
Created attachment 563198 [details] [diff] [review]
Patch v2

Oops; v1 is an old version of the patch.
Comment 3 Justin Lebar (not reading bugmail) 2011-09-28 15:16:50 PDT
The change nsBaseAppShell.cpp is to prevent a build error.  ipdl includes its own set of types and suppresses their redeclaration.  If we don't make the ipdl header the first one, nsTArray pulls in mfbt/Util.h and we redeclare uint8 and friends.
Comment 4 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-09-28 15:59:42 PDT
As I understand it, there are really only two auto-array layouts we care about in 32-bit systems, and one layout for 64-bit.

On 64-bit we have
  Header* mHdr;   // 0-7
  Header  header; // 8-15
  T[]     data;   // 16-
Everything's 8-byte aligned and we don't support 16-byte alignment, so we're all good.

On 32-bit we want
  Header* mHdr;   // 0-3
  Header  header; // 4-11
  T[]     data;   // 12-
when alignof(T) is <= 4, otherwise
  Header* mHdr;   // 0-3
  Header  header; // 4-11
  char[4] padding;// 12-15
  T[]     data;   // 16-
when alignof(T) is 8.

So what if we passed the alignment as an extra template parameter to nsAutoArrayBase, statically assert that it's 1, 2, 4, or 8, and then for the "32-bit platform, align-8" case, have a specialized implementation that aligns mAutoBuf appropriately? And, add the alignment as a parameter to nsTArray_base as well and specialize GetAutoArrayBufferUnsafe() based on it, to add 4 bytes to the offset in the 32-bit platform, align-8 case?

It seems to me that would be a bit simpler, and less reliant on the compiler optimizing away lots of checks in GetAutoArrayBufferUnsafe().
Comment 5 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-09-28 16:01:23 PDT
Actually we don't need to add any new parameters to nsTArray_base. We should be able to write an expression using align_of on the element type.
Comment 6 Justin Lebar (not reading bugmail) 2011-09-28 16:21:06 PDT
> It seems to me that would be a bit simpler, and less reliant on the compiler optimizing away lots of 
> checks in GetAutoArrayBufferUnsafe().

Just to be clear, I don't think there's a speed issue.  We don't call GetAutoArrayBuffer except in expensive operations like ShrinkCapacity and SwapArrayElements.

> Actually we don't need to add any new parameters to nsTArray_base. We should be able to write an 
> expression using align_of on the element type.

I don't understand what you're suggesting here.
Comment 7 Justin Lebar (not reading bugmail) 2011-09-28 16:25:46 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #5)
> Actually we don't need to add any new parameters to nsTArray_base. We should
> be able to write an expression using align_of on the element type.

Do you mean just that we could get rid of the |elemAlign == 1| and |elemAlign == 2| checks in GetAutoArrayBuffer?  I think we could.  I left them in because I thought removing them was a premature optimization; I was trying to make nsTArray_base rely on as few implementation details of nsAutoArrayBase as possible, although of course I didn't get very far with that goal.  :)
Comment 8 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-09-28 19:46:17 PDT
(In reply to Justin Lebar [:jlebar] from comment #6)
> > It seems to me that would be a bit simpler, and less reliant on the compiler
> > optimizing away lots of checks in GetAutoArrayBufferUnsafe().
> 
> Just to be clear, I don't think there's a speed issue.  We don't call
> GetAutoArrayBuffer except in expensive operations like ShrinkCapacity and
> SwapArrayElements.

Er, yeah. Right.

> > Actually we don't need to add any new parameters to nsTArray_base. We should be
> > able to write an expression using align_of on the element type.
> 
> I don't understand what you're suggesting here.

Yeah, it made no sense.

I mean change AutoArray to be
  struct AutoArray {
    Header *mHdr;
    PRUint32 maybeAligned;
  };

and then
  Header* GetAutoArrayBufferUnsafe(PRUint32 aAlignment) {
    PRUint32* buf = &(reinterpret_cast<AutoArray*>(&mHdr))->maybeAligned;
#ifdef IS_32BIT
    // 4 bytes of padding needed for 8-byte-aligned data on 32-bit systems
    buf += aAlignment == 8;
#endif
    return reinterpret_cast<Header*>(buf));
  }
Comment 9 Justin Lebar (not reading bugmail) 2011-09-29 10:27:52 PDT
I'm happy to do it this way, but isn't this basically the same as dropping the unnecessary elemAlign checks?

+nsTArrayHeader* nsTArray_base<Alloc>::GetAutoArrayBufferUnsafe(size_t elemAlign) {
+# define GET_AUTOBUF(align) \
+    &(reinterpret_cast<nsAutoArrayBase<nsTArray<mozilla::AlignedElem<align> >, 1>*>(this)->mAutoBuf);
+
+  NS_ASSERTION(elemAlign <= 8, "Unsupported alignment.");
+  void* autoBuf = nsnull;
+  if (sizeof(void*) == 4 && elemAlign <= 4) {
+    autoBuf = GET_AUTOBUF(4);
+  }
+  else {
+    autoBuf = GET_AUTOBUF(8);
+  }
Comment 10 Justin Lebar (not reading bugmail) 2011-09-29 12:56:14 PDT
Created attachment 563529 [details] [diff] [review]
Patch v3
Comment 11 Justin Lebar (not reading bugmail) 2011-10-04 19:23:26 PDT
Roc, I'm not in a rush here, but I wanted to make sure this patch didn't fall through the cracks.
Comment 12 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-04 19:39:04 PDT
Comment on attachment 563529 [details] [diff] [review]
Patch v3

Review of attachment 563529 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/glue/nsTArray-inl.h
@@ +76,5 @@
> +
> +  if (MOZ_ALIGNOF(void*) == 4 && elemAlign <= 4) {
> +    return reinterpret_cast<Header*>(&THIS_AS_AUTOARRAY(4)->mAutoBuf);
> +  }
> +  return reinterpret_cast<Header*>(&THIS_AS_AUTOARRAY(8)->mAutoBuf);

I had a comment which I lost ...

Here, I think you can just make nsAutoArrayBase assume 4-byte element alignment and write
  
  ... get pointer to elements in nsAutoArrayBase ...
  if (sizeof(void*) == 4 && elemAlign == 8) {
    ... add 4 to pointer ...
  }

You don't actually need AlignedElem.
Comment 13 Justin Lebar (not reading bugmail) 2011-10-04 20:02:38 PDT
> You don't actually need AlignedElem.

I see.  This concerns me some because AIUI |alignof(void*) == 4 && elemAlign == 8| does not necessarily mean we'll have an extra 4 bytes of structure padding to skip over on all architectures.  I'm not sure we have many auto arrays with 8-byte-aligned elements, so code wouldn't necessarily break spectacularly.

We could add a test to the TArray tests, which is probably good enough, so long as they don't get turned off again...

Unfortunately we still need AlignedElem in nsAutoArrayBase because the argument to __attribute__((aligned(foo)) must be a parse-time constant (or something like that, anyway).

+  // Declare mAutoBuf aligned to the maximum of the header's alignment and
+  // elem_type's alignment.  We need to use a union rather than
+  // MOZ_ALIGNED_DECL because GCC is picky about what goes into
+  // __attribute__((aligned(foo))).
   union {
-    char mAutoBuf[sizeof(Header) + N * sizeof(elem_type)];
-    PRUint64 dummy;
+    char mAutoBuf[sizeof(nsTArrayHeader) + N * sizeof(elem_type)];
+    mozilla::AlignedElem<PR_MAX(MOZ_ALIGNOF(Header), MOZ_ALIGNOF(elem_type))> mAlign;
+  }
Comment 14 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-04 20:20:47 PDT
(In reply to Justin Lebar [:jlebar] from comment #13)
> > You don't actually need AlignedElem.
> 
> I see.  This concerns me some because AIUI |alignof(void*) == 4 && elemAlign
> == 8| does not necessarily mean we'll have an extra 4 bytes of structure
> padding to skip over on all architectures.

This is true in principle, but not in practice.

> I'm not sure we have many auto
> arrays with 8-byte-aligned elements, so code wouldn't necessarily break
> spectacularly.

You can statically assert that on a 32-bit architecture, sizeof(nsAutoTArray<double,1>) is exactly what we expect it to be.

> Unfortunately we still need AlignedElem in nsAutoArrayBase because the
> argument to __attribute__((aligned(foo)) must be a parse-time constant (or
> something like that, anyway).

I don't know why you need __attribute__((aligned(foo)).
Comment 15 Justin Lebar (not reading bugmail) 2011-10-05 05:52:19 PDT
> You can statically assert that on a 32-bit architecture, sizeof(nsAutoTArray<double,1>) 
> is exactly what we expect it to be.

Sold.

> I don't know why you need __attribute__((aligned(foo)).

When we're declaring storage for the auto buffer, we need to align the storage to 4 or 8 bytes.

We could do this directly, as

  __attribute__((aligned(PR_MAX(...))) char mAutoBuf[N * sizeof...]

except that what goes inside __attribute__((aligned)) has to be a parse-time constant, or something.

So instead, we need to use a union.  Which means, I think, we need to use template specialization to choose a type to union with which has alignment 4 or 8 bytes.  That is, we need to write

  union {
    char mAutoBuf[...];
    AlignedType<PR_MAX(...)> mAlign;
  };

So we don't need the general-purpose mozilla::AlignedElem specifically, but we do need something similar, which is defined for 4 and 8 byte alignment.
Comment 16 Justin Lebar (not reading bugmail) 2011-10-05 07:15:06 PDT
Created attachment 564837 [details] [diff] [review]
Patch v4
Comment 17 Justin Lebar (not reading bugmail) 2011-10-05 11:59:24 PDT
Comment on attachment 564837 [details] [diff] [review]
Patch v4

This is unexpectedly burning on mac...
Comment 18 Justin Lebar (not reading bugmail) 2011-10-05 12:39:12 PDT
...and it's burning because MOZ_ALIGNOF(double) == 4 on *nix x86-32.
Comment 19 Justin Lebar (not reading bugmail) 2011-10-05 12:43:24 PDT
Created attachment 564962 [details] [diff] [review]
Patch v5
Comment 20 Justin Lebar (not reading bugmail) 2011-10-05 12:44:42 PDT
https://tbpl.mozilla.org/?tree=Try&rev=15aa916a0a58
Comment 21 Justin Lebar (not reading bugmail) 2011-10-05 13:47:23 PDT
Comment on attachment 564962 [details] [diff] [review]
Patch v5

Latest push seems to be fine on try.  But I'm not sure if this is what you're looking for, Roc.
Comment 22 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-05 14:20:00 PDT
Comment on attachment 564962 [details] [diff] [review]
Patch v5

Review of attachment 564962 [details] [diff] [review]:
-----------------------------------------------------------------

This'll do!
Comment 23 Justin Lebar (not reading bugmail) 2011-10-05 16:48:31 PDT
Thanks!  Inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/9adcf4cbfc71
Comment 24 Joe Drew (not getting mail) 2011-10-05 19:32:42 PDT
You added a bunch of PR_STATIC_ASSERTs, but also a definition for MOZ_STATIC_ASSERT. Did you mean to?
Comment 25 Justin Lebar (not reading bugmail) 2011-10-06 07:15:54 PDT
I'd been using the MOZ_STATIC_ASSERT in mfbt/Util.h, but it later became unnecessary.  I did mean to use PR_STATIC_ASSERT in the TArray code.

I can get rid of MOZ_STATIC_ASSERT; yagni and all that.
Comment 26 Justin Lebar (not reading bugmail) 2011-10-06 07:37:22 PDT
Created attachment 565219 [details] [diff] [review]
Remove MOZ_STATIC_ASSERT
Comment 27 Ed Morley [:emorley] 2011-10-06 08:48:09 PDT
Comment on attachment 564962 [details] [diff] [review]
Patch v5

https://hg.mozilla.org/mozilla-central/rev/9adcf4cbfc71
Comment 28 Ed Morley [:emorley] 2011-10-06 08:48:49 PDT
Leaving open for remaining patch.
Comment 29 Ed Morley [:emorley] 2011-10-07 04:02:47 PDT
Followup: https://hg.mozilla.org/mozilla-central/rev/831fcf93a1fe

Note You need to log in before you can comment on or make changes to this bug.