Closed Bug 689433 Opened 8 years ago Closed 8 years ago

Align nsAutoTArray<E>'s elements to their natural alignment

Categories

(Core :: XPCOM, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla10

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

References

Details

Attachments

(2 files, 4 obsolete files)

nsAutoTArray<E> aligns its elements to 8-byte boundaries.  Or at least, it tries to do this; it apparently aligns to 4-bytes on Mac (see bug 681755) and maybe on Linux.

This wastes 4 bytes of space on Windows 32 for each nsAutoTArray, which is annoying.  But it also means that nsAutoTArray<E> itself must be aligned to 8-byte boundaries, which increases the size of structs containing auto arrays.
Assignee: nobody → justin.lebar+bug
Attached patch Patch v1 (obsolete) — Splinter Review
Attachment #563195 - Flags: review?(roc)
Attached patch Patch v2 (obsolete) — Splinter Review
Oops; v1 is an old version of the patch.
Attachment #563198 - Flags: review?(roc)
Attachment #563195 - Attachment is obsolete: true
Attachment #563195 - Flags: review?(roc)
The change nsBaseAppShell.cpp is to prevent a build error.  ipdl includes its own set of types and suppresses their redeclaration.  If we don't make the ipdl header the first one, nsTArray pulls in mfbt/Util.h and we redeclare uint8 and friends.
As I understand it, there are really only two auto-array layouts we care about in 32-bit systems, and one layout for 64-bit.

On 64-bit we have
  Header* mHdr;   // 0-7
  Header  header; // 8-15
  T[]     data;   // 16-
Everything's 8-byte aligned and we don't support 16-byte alignment, so we're all good.

On 32-bit we want
  Header* mHdr;   // 0-3
  Header  header; // 4-11
  T[]     data;   // 12-
when alignof(T) is <= 4, otherwise
  Header* mHdr;   // 0-3
  Header  header; // 4-11
  char[4] padding;// 12-15
  T[]     data;   // 16-
when alignof(T) is 8.

So what if we passed the alignment as an extra template parameter to nsAutoArrayBase, statically assert that it's 1, 2, 4, or 8, and then for the "32-bit platform, align-8" case, have a specialized implementation that aligns mAutoBuf appropriately? And, add the alignment as a parameter to nsTArray_base as well and specialize GetAutoArrayBufferUnsafe() based on it, to add 4 bytes to the offset in the 32-bit platform, align-8 case?

It seems to me that would be a bit simpler, and less reliant on the compiler optimizing away lots of checks in GetAutoArrayBufferUnsafe().
Actually we don't need to add any new parameters to nsTArray_base. We should be able to write an expression using align_of on the element type.
> It seems to me that would be a bit simpler, and less reliant on the compiler optimizing away lots of 
> checks in GetAutoArrayBufferUnsafe().

Just to be clear, I don't think there's a speed issue.  We don't call GetAutoArrayBuffer except in expensive operations like ShrinkCapacity and SwapArrayElements.

> Actually we don't need to add any new parameters to nsTArray_base. We should be able to write an 
> expression using align_of on the element type.

I don't understand what you're suggesting here.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #5)
> Actually we don't need to add any new parameters to nsTArray_base. We should
> be able to write an expression using align_of on the element type.

Do you mean just that we could get rid of the |elemAlign == 1| and |elemAlign == 2| checks in GetAutoArrayBuffer?  I think we could.  I left them in because I thought removing them was a premature optimization; I was trying to make nsTArray_base rely on as few implementation details of nsAutoArrayBase as possible, although of course I didn't get very far with that goal.  :)
(In reply to Justin Lebar [:jlebar] from comment #6)
> > It seems to me that would be a bit simpler, and less reliant on the compiler
> > optimizing away lots of checks in GetAutoArrayBufferUnsafe().
> 
> Just to be clear, I don't think there's a speed issue.  We don't call
> GetAutoArrayBuffer except in expensive operations like ShrinkCapacity and
> SwapArrayElements.

Er, yeah. Right.

> > Actually we don't need to add any new parameters to nsTArray_base. We should be
> > able to write an expression using align_of on the element type.
> 
> I don't understand what you're suggesting here.

Yeah, it made no sense.

I mean change AutoArray to be
  struct AutoArray {
    Header *mHdr;
    PRUint32 maybeAligned;
  };

and then
  Header* GetAutoArrayBufferUnsafe(PRUint32 aAlignment) {
    PRUint32* buf = &(reinterpret_cast<AutoArray*>(&mHdr))->maybeAligned;
#ifdef IS_32BIT
    // 4 bytes of padding needed for 8-byte-aligned data on 32-bit systems
    buf += aAlignment == 8;
#endif
    return reinterpret_cast<Header*>(buf));
  }
I'm happy to do it this way, but isn't this basically the same as dropping the unnecessary elemAlign checks?

+nsTArrayHeader* nsTArray_base<Alloc>::GetAutoArrayBufferUnsafe(size_t elemAlign) {
+# define GET_AUTOBUF(align) \
+    &(reinterpret_cast<nsAutoArrayBase<nsTArray<mozilla::AlignedElem<align> >, 1>*>(this)->mAutoBuf);
+
+  NS_ASSERTION(elemAlign <= 8, "Unsupported alignment.");
+  void* autoBuf = nsnull;
+  if (sizeof(void*) == 4 && elemAlign <= 4) {
+    autoBuf = GET_AUTOBUF(4);
+  }
+  else {
+    autoBuf = GET_AUTOBUF(8);
+  }
Attached patch Patch v3 (obsolete) — Splinter Review
Attachment #563529 - Flags: review?(roc)
Attachment #563198 - Attachment is obsolete: true
Attachment #563198 - Flags: review?(roc)
Roc, I'm not in a rush here, but I wanted to make sure this patch didn't fall through the cracks.
Comment on attachment 563529 [details] [diff] [review]
Patch v3

Review of attachment 563529 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/glue/nsTArray-inl.h
@@ +76,5 @@
> +
> +  if (MOZ_ALIGNOF(void*) == 4 && elemAlign <= 4) {
> +    return reinterpret_cast<Header*>(&THIS_AS_AUTOARRAY(4)->mAutoBuf);
> +  }
> +  return reinterpret_cast<Header*>(&THIS_AS_AUTOARRAY(8)->mAutoBuf);

I had a comment which I lost ...

Here, I think you can just make nsAutoArrayBase assume 4-byte element alignment and write
  
  ... get pointer to elements in nsAutoArrayBase ...
  if (sizeof(void*) == 4 && elemAlign == 8) {
    ... add 4 to pointer ...
  }

You don't actually need AlignedElem.
> You don't actually need AlignedElem.

I see.  This concerns me some because AIUI |alignof(void*) == 4 && elemAlign == 8| does not necessarily mean we'll have an extra 4 bytes of structure padding to skip over on all architectures.  I'm not sure we have many auto arrays with 8-byte-aligned elements, so code wouldn't necessarily break spectacularly.

We could add a test to the TArray tests, which is probably good enough, so long as they don't get turned off again...

Unfortunately we still need AlignedElem in nsAutoArrayBase because the argument to __attribute__((aligned(foo)) must be a parse-time constant (or something like that, anyway).

+  // Declare mAutoBuf aligned to the maximum of the header's alignment and
+  // elem_type's alignment.  We need to use a union rather than
+  // MOZ_ALIGNED_DECL because GCC is picky about what goes into
+  // __attribute__((aligned(foo))).
   union {
-    char mAutoBuf[sizeof(Header) + N * sizeof(elem_type)];
-    PRUint64 dummy;
+    char mAutoBuf[sizeof(nsTArrayHeader) + N * sizeof(elem_type)];
+    mozilla::AlignedElem<PR_MAX(MOZ_ALIGNOF(Header), MOZ_ALIGNOF(elem_type))> mAlign;
+  }
(In reply to Justin Lebar [:jlebar] from comment #13)
> > You don't actually need AlignedElem.
> 
> I see.  This concerns me some because AIUI |alignof(void*) == 4 && elemAlign
> == 8| does not necessarily mean we'll have an extra 4 bytes of structure
> padding to skip over on all architectures.

This is true in principle, but not in practice.

> I'm not sure we have many auto
> arrays with 8-byte-aligned elements, so code wouldn't necessarily break
> spectacularly.

You can statically assert that on a 32-bit architecture, sizeof(nsAutoTArray<double,1>) is exactly what we expect it to be.

> Unfortunately we still need AlignedElem in nsAutoArrayBase because the
> argument to __attribute__((aligned(foo)) must be a parse-time constant (or
> something like that, anyway).

I don't know why you need __attribute__((aligned(foo)).
> You can statically assert that on a 32-bit architecture, sizeof(nsAutoTArray<double,1>) 
> is exactly what we expect it to be.

Sold.

> I don't know why you need __attribute__((aligned(foo)).

When we're declaring storage for the auto buffer, we need to align the storage to 4 or 8 bytes.

We could do this directly, as

  __attribute__((aligned(PR_MAX(...))) char mAutoBuf[N * sizeof...]

except that what goes inside __attribute__((aligned)) has to be a parse-time constant, or something.

So instead, we need to use a union.  Which means, I think, we need to use template specialization to choose a type to union with which has alignment 4 or 8 bytes.  That is, we need to write

  union {
    char mAutoBuf[...];
    AlignedType<PR_MAX(...)> mAlign;
  };

So we don't need the general-purpose mozilla::AlignedElem specifically, but we do need something similar, which is defined for 4 and 8 byte alignment.
Attached patch Patch v4 (obsolete) — Splinter Review
Attachment #564837 - Flags: review?(roc)
Attachment #563529 - Attachment is obsolete: true
Attachment #563529 - Flags: review?(roc)
Comment on attachment 564837 [details] [diff] [review]
Patch v4

This is unexpectedly burning on mac...
Attachment #564837 - Flags: review?(roc) → review-
...and it's burning because MOZ_ALIGNOF(double) == 4 on *nix x86-32.
Attached patch Patch v5Splinter Review
Attachment #564837 - Attachment is obsolete: true
Comment on attachment 564962 [details] [diff] [review]
Patch v5

Latest push seems to be fine on try.  But I'm not sure if this is what you're looking for, Roc.
Attachment #564962 - Flags: review?(roc)
Comment on attachment 564962 [details] [diff] [review]
Patch v5

Review of attachment 564962 [details] [diff] [review]:
-----------------------------------------------------------------

This'll do!
Attachment #564962 - Flags: review?(roc) → review+
You added a bunch of PR_STATIC_ASSERTs, but also a definition for MOZ_STATIC_ASSERT. Did you mean to?
I'd been using the MOZ_STATIC_ASSERT in mfbt/Util.h, but it later became unnecessary.  I did mean to use PR_STATIC_ASSERT in the TArray code.

I can get rid of MOZ_STATIC_ASSERT; yagni and all that.
Attachment #565219 - Flags: review?(roc)
Leaving open for remaining patch.
Status: NEW → ASSIGNED
Whiteboard: [inbound]
Target Milestone: --- → mozilla10
Followup: https://hg.mozilla.org/mozilla-central/rev/831fcf93a1fe
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.