nsAutoTArray aligns buffer to 32 bits, causing SIGBUS when array entry type requires 64-bit alignment

RESOLVED FIXED in mozilla1.9.3a4

Status

()

Core
XPCOM
--
critical
RESOLVED FIXED
10 years ago
7 years ago

People

(Reporter: Raúl Porcel, Assigned: glandium)

Tracking

(Blocks: 1 bug, {crash})

Trunk
mozilla1.9.3a4
Sun
Linux
crash
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 5 obsolete attachments)

(Reporter)

Description

10 years ago
See debian bug:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=483949

I'll attach the backtrace using Gentoo and Ubuntu. Using that patch from Debian, it just delays the sigbus 45mins the first time, so it needs a proper fix.
(Reporter)

Comment 1

10 years ago
Created attachment 331850 [details]
gdb-firefox.txt

Updated

10 years ago
Severity: normal → critical
Keywords: crash
(Assignee)

Comment 2

10 years ago
It would be useful to have the backtrace with the patch applied. I doubt it to be the same.
(Reporter)

Comment 3

10 years ago
Created attachment 331976 [details]
gdb-firefox-withpatch.txt

There we go
(Assignee)

Comment 4

10 years ago
Confirmed, it is a different one
See bug 469276, it discusses probably the same issue.
(Reporter)

Updated

9 years ago
Duplicate of this bug: 469276
(Reporter)

Comment 7

9 years ago
On trunk,  it now fails due to unaligned access on sqlite, which have been handled upstream. See http://www.sqlite.org/cvstrac/tktview?tn=3777

Adding 3.6.13 bug to depend, one small step for sparc...
Depends on: 487664
(Reporter)

Comment 8

9 years ago
Created attachment 372899 [details] [diff] [review]
align.patch

Patch by Fridrich Oslage <bluebird at gentoo dot org>
Attachment #372899 - Flags: review?(timeless)
Attachment #372899 - Flags: review?(timeless) → review?(tony)
Attachment #372899 - Flags: review?(dcamp)

Comment 9

9 years ago
Comment on attachment 372899 [details] [diff] [review]
align.patch

Seems fine to me.
Attachment #372899 - Flags: review?(tony) → review+

Comment 10

9 years ago
Comment on attachment 372899 [details] [diff] [review]
align.patch

Actually, does this still compiled on MSVC?
Attachment #372899 - Flags: review?(dcamp)
Assignee: nobody → armin76
Keywords: checkin-needed
(Reporter)

Comment 11

9 years ago
(In reply to comment #10)
> (From update of attachment 372899 [details] [diff] [review])
> Actually, does this still compiled on MSVC?

I don't have a system to test that.

Updated

9 years ago
Depends on: 489442
(Reporter)

Comment 13

9 years ago
Well, i've been told that that is only a gcc function, so either it has an ifdef or you guys come up with a better solution.
(Reporter)

Comment 14

9 years ago
Created attachment 374897 [details] [diff] [review]
alignv2.patch

Using an ifdef GNUC
Attachment #372899 - Attachment is obsolete: true
Attachment #374897 - Flags: review?(tony)
(Assignee)

Comment 15

9 years ago
(In reply to comment #14)
> Created an attachment (id=374897) [details]
> alignv2.patch
> 
> Using an ifdef GNUC

If you're depending on gcc specific stuff, why not use __attribute__((aligned(some_value)), instead?

Comment 16

9 years ago
I have two (potentially stupid) questions: did anyone investigate
 (a) why mID is misaligned in the first place (misuse of packed somewhere or compiler bug?)
 (b) why does forcing this alignement make the patch suggested to solve #469276 not necessary any more?

IMHO (but of course I'm biased) that patch was better, and sufficient to solve the problem for me, at least. But then I do not understand the bigger picture here...
(Reporter)

Comment 17

9 years ago
(In reply to comment #15)
> If you're depending on gcc specific stuff, why not use
> __attribute__((aligned(some_value)), instead?
I'm sorry but i have no clue about C, i just did what i saw on other part of the code. Feel free to improve the patch, i can test it if needed.

(In reply to comment #16)
> I have two (potentially stupid) questions: did anyone investigate
>  (a) why mID is misaligned in the first place (misuse of packed somewhere or
> compiler bug?)
Same answer as avobe. The guy who did the patch said he does not know Mozilla's code enough to find where its being unaligned.

>  (b) why does forcing this alignement make the patch suggested to solve #469276
> not necessary any more?
Your patch, or some other almost the same if not the same, was already checked-in on trunk on other bug.
See http://hg.mozilla.org/mozilla-central/rev/4151690668c7 and bug 477727

Comment 18

9 years ago
Comment on attachment 374897 [details] [diff] [review]
alignv2.patch

I agree with Martin that we should find out why mId is getting mis-aligned before patching.
Attachment #374897 - Flags: review?(tony) → review-
(Reporter)

Comment 19

9 years ago
So who's going to do it?

Updated

9 years ago
Depends on: 493560
(Reporter)

Comment 20

8 years ago
Hi everyone,

the problem is still present on all the 1.9.1.x releases. I've found that Debian doesn't have that issue anymore, so i looked at why.

On gentoo we first did this to compile firefox:
export BUILD_OFFICIAL=1
export MOZILLA_OFFICIAL=1

Debian doesn't do that, so i removed that. Then i compared our version and debian's, and found that the thing that made firefox sigbus or not sigbus(apart from removing thise two variables) is the string on application.ini of firefox.

I have this on application.ini:
[App]
Vendor=Mozilla
Name=Firefox
Version=3.5.5
BuildID=20091109192036

Replacing Firefox with Firefoxa (or whatever not being Firefox, f.ex debian has Iceweasel), it doesn't sigbus anymore.

Have no clue why.
(Assignee)

Comment 21

8 years ago
(In reply to comment #16)
> I have two (potentially stupid) questions: did anyone investigate
>  (a) why mID is misaligned in the first place (misuse of packed somewhere or
> compiler bug?)

I got some time (and machine) to investigate this issue, and here is what I found:
mID is misaligned because of 2 nsAutoTArray<nsUrlClassifierEntry, 5> declarations, which allocate these nsAutoTArray on stack.
It happens that from the compiler perspective the nsAutoTArrays look like this struct:
struct array {
  Header *mHdr; // its value is &mAutoBuf
  char mAutoBuf[sizeof(Header) + 5 * sizeof(nsUrlClassifierEntry)];
}

The compiler can't thus know the alignment requirements, and aligns for the Header* pointer and nothing better.

Now, as I didn't want to try to find the combination of unions and modifications to GetAutoArrayBuffer and possibly deeper changes to nsTArray to have a proper alignment, I figured replacing these nsAutoTArray with simple nsTArray works. Patch coming.
(Assignee)

Comment 22

8 years ago
Created attachment 425467 [details] [diff] [review]
Patch

See comment 21.
Assignee: armin76 → mh+mozilla
Attachment #374897 - Attachment is obsolete: true
Attachment #425467 - Flags: review?(benjamin)

Comment 23

8 years ago
Comment on attachment 425467 [details] [diff] [review]
Patch

Working around the broken class is certainly expedient, but will probably cause problems reappearing frequently. dbaron, how should this actually be solved?
Attachment #425467 - Flags: review?(benjamin) → review?(dbaron)
I presume the underlying requirement in this case is that the entries require 8-byte alignment.  However, in theory, entries could require more than that.   (I think there may be such things if you're using SSE2 instructions or similar; I recall issues with stack frame alignment on the Mac at one point when we weren't 16-byte aligning stack frames in our xptcall assembly.)

Right now, nsTArray gives 8-byte alignment and nsAutoTArray gives 4-byte alignment.  They should really determine the alignment from the type being allocated.

Fixing nsTArray (which isn't the current problem, but could be a problem in theory) is relatively straightforward, since nsTArray is using the allocator to allocate Header + elements, and you can assume that malloc() returns memory aligned to the most strict alignment requirement on the system.  To do this, we could make the way the array blocks (Header + elements) are allocated allocate sizeof(S) + (N - 1) * sizeof(elem_type) given struct S { elem_type e; Header h; }, and start the first element at the array at sizeof(S) - sizeof(elem_type) rather than at sizeof(Header).  (sizeof() on a struct is guaranteed to round up to that struct's alignment requirements so that you can allocate the structs in an array.)

It's also relatively straightforward to make nsAutoTArray in a similar manner (especially if we're willing to take the hit of extra padding in two different places).  However, that would only fix the use of nsAutoTArray on the heap; I don't think it would force the compiler to correctly align an nsAutoTArray on the stack (though I could be wrong).  Not sure who else might have ideas about how to fix this, but cc:ing a few candidates.
Summary: nsUrlClassifierDBService has bad alignment, causes SIGBUS → nsUrlClassifierDBService has bad alignment due to nsAutoTArray<nsUrlClassifierEntry, N>, causes SIGBUS
(In reply to comment #24)
> Right now, nsTArray gives 8-byte alignment and nsAutoTArray gives 4-byte
> alignment.

Er, I should have said that nsAutoTArray gives alignment
  min(max(sizeof(void*), alignment requirement of PRUint32), 8 bytes),
which is 4 bytes on a system with 32-bit pointers.

Comment 26

8 years ago
Two members of a union start at the same address, so one way to get nsAutoTArray::mAutoBuf to be aligned for elem_type would be:

  union {
    char mAutoBuf[...];
    elem_type mAlignmentDummy;
  };

Unfortunately, this only works if elem_type is a POD.  I can't think of any way to build a POD with the same alignment as elem_type, so a more brute-force strategy would be to pick a POD type that has the greatest alignment on the platform (lets say it was long double) and do:

  struct FatPOD { long double _; };
  union {
    char mAutoBuf[...];
    FatPOD mAlignmentDummy;
  };

Perhaps that would work?
(Assignee)

Comment 27

8 years ago
(In reply to comment #26)
>   struct FatPOD { long double _; };
>   union {
>     char mAutoBuf[...];
>     FatPOD mAlignmentDummy;
>   };
> 
> Perhaps that would work?

I would have gone this way if that didn't mean modifying GetAutoArrayBuffer.
(Assignee)

Comment 28

8 years ago
(In reply to comment #27)
> I would have gone this way if that didn't mean modifying GetAutoArrayBuffer.

and Elements() and other similar functions.
For nsTArray itself, how about we change it so that the contents of the current nsTArray_base::Header are embedded directly into the nsTArray_base object, and then we have a pointer to the elements?  Like this

class nsTArray_base {
  // ...
protected:
  PRUint32 mLength;
  PRUint32 mCapacity : 31;
  PRUint32 mIsAutoArray : 1;
}; 

template <typename T>
class nsTArray : public nsTArray_base {
  // ...
protected:
  elem_type* mElements;
};

That way the compiler and runtime will DTRT.

For nsAutoTArray, I endorse Luke's suggestion, if we're willing to live with always having enough padding in there for the maximum required alignment, which should be fine (it's just another few bytes of wasted space on the stack...)  If we really, really need to squeeze out that space, well, have a look at boost's aligned_storage.hpp and imagine what it would take to import all the relevant compiler-specific hacks.  (Many of them have to do with MSVC, unfortunately.)
(Assignee)

Comment 30

8 years ago
There is the same problem in extensions/cookie/nsPermissionManager.h: nsHostEntry has a mPermissions member defined as nsAutoTArray<nsPermissionEntry, 1>, and nsPermissionEntry has a 64-bits member.

As for the fix, I do agree fixing nsTArray would be better, but I'd need directions, especially regarding GetAutoArrayBuffer. If we are to let the compiler decide the correct alignment, the assumptions in GetAutoArrayBuffer and other places are wrong, and especially in the case of GetAutoArrayBuffer, it is impossible to keep the function the way it is. The best I can think of for it would be to check if the buffer is at most a word size after the header.
(Assignee)

Comment 31

8 years ago
(In reply to comment #29)
> For nsTArray itself, how about we change it so that the contents of the current
> nsTArray_base::Header are embedded directly into the nsTArray_base object, and
> then we have a pointer to the elements?

I don't think this is possible, this could break binary compatibility with external components.
(Assignee)

Comment 32

8 years ago
After further consideration, it looks like it should be safe to break binary compatibility, as the nsTArray (and nsTPtrArray that derives from it) is in the xpcom glue. I can be wrong, but I don't think a component would share a pointer to a nsTArray with another component.

Other that that, I care more about fixing the current problem being the 8-bytes alignment requirements than fixing all possible alignment problems that hypothetically could happen. I really doubt there is currently any use of nsTArray that would need more than 8-bytes alignment.
I think we could fix nsAutoTArray by having a little extra buffer size for the cases where we *might* need it depending on the alignment of the stack allocation, and then (in those cases) dynamically changing the pointer to the beginning of the internal stack buffer.
(Assignee)

Comment 34

8 years ago
(In reply to comment #29)
> If we really, really need to squeeze out that space, well, have a look at
> boost's aligned_storage.hpp and imagine what it would take to import all the
> relevant compiler-specific hacks.  (Many of them have to do with MSVC,
> unfortunately.)

Note there is apparently a TR1 standard implementation of aligned_storage, that seems to be available at least under MS Visual Studio and gcc: std::tr1::aligned_storage. I don't know if that could be used in the mozilla code base.

(In reply to comment #33)
> I think we could fix nsAutoTArray by having a little extra buffer size for the
> cases where we *might* need it depending on the alignment of the stack
> allocation, and then (in those cases) dynamically changing the pointer to the
> beginning of the internal stack buffer.

Since that pretty much needs a change in GetAutoArrayBuffer anyways, I think there is actually no need to make the nsAutoTArray buffer bigger. I'm currently testing a patch that doesn't care about squeezing space and aligns for a 8-bytes word, more on that when it is validated to work.
(Assignee)

Comment 35

8 years ago
Created attachment 427953 [details] [diff] [review]
Patch for 8-byte alignment of nsAutoT{Ptr,}Array buffer

The idea here is to align at 8-byte boundaries in nsAutoTArray and 4 or 8-byte boundaries in nsAutoTPtrArray (depending on the pointer size ; this is necessary for sparc64).
Considering the alignment constraints, there is only going to be either 0 or 4 bytes between the end of the nsTArray_base contents and the Header in the mAutoBuf. The idea then is to only keep track if we need to adjust for alignment (add 4 bytes) or not, which is 1 bit of information. As the Capacity is already only 31 bits, I tought it is possible to take a bit off the Length to store that information.
In the worst case this implementation wastes 7 + 4 bytes: 4 bytes for alignment, and 7 bytes if the type of data stored in the array is char and the array is created with 1 element only.
Attachment #425467 - Attachment is obsolete: true
Attachment #427953 - Flags: review?
Attachment #425467 - Flags: review?(dbaron)
(Assignee)

Updated

8 years ago
Attachment #427953 - Flags: review? → review?(dbaron)
It seems like you're assuming that meeting the alignment requirements of PRUint64 will force an extra word of padding to cause alignment, but I don't think that's guaranteed.  And if it doesn't happen, I think the approach you're taking will cause writing and reading 32 bits past the end of the array when it does adjust alignment.  I think you could fix this and simplify the patch at the same time by just making GetAutoArrayBuffer static_cast this to an nsAutoTArray, and then just skip mAdjustAlignment.

(I eventually need to look a little more closely, though; this was just a relatively quick look.)

That said, I'd like to see the more correct approach that (a) uses the real alignment requirements of the type and (b) doesn't add the extra padding when it's not needed.
Er, never mind about the array bounds violation.  I was crossing what you were actually doing with what I was expecting to see (and what we'd need to do for the full fix).

However, you could still simplify the patch a good bit and avoid stealing the extra bit by using a static_cast.  (It might require moving the inline function's body down to the end of the file, with just a declaration as explicitly |inline| within the class definition.)
(Assignee)

Comment 38

8 years ago
(In reply to comment #37)
> However, you could still simplify the patch a good bit and avoid stealing the
> extra bit by using a static_cast.  (It might require moving the inline
> function's body down to the end of the file, with just a declaration as
> explicitly |inline| within the class definition.)

That can't work unless the function is moved in either nsTArray or nsTAutoArray. The latter would be painful because the function is used on nsTArrays in nsTArray.cpp, it would also mean having to either reimplement it in nsTPtrAutoArray or make nsTPtrAutoArray inherit from nsTAutoArray too.

As for real alignment requirements of the type, that can't work if the type is not a POD.
(Assignee)

Comment 39

8 years ago
(In reply to comment #38)
> That can't work unless the function is moved in either nsTArray or
> nsTAutoArray. The latter would be painful because the function is used on
> nsTArrays in nsTArray.cpp, it would also mean having to either reimplement it
> in nsTPtrAutoArray or make nsTPtrAutoArray inherit from nsTAutoArray too.

Actually, that can't work at all, because nsTArray.cpp implements nsTArray_base and uses both UsesAutoArrayBuffer and GetAutoArrayBuffer in various functions. Moving these functions in the nsTArray template would be silly.
Comment on attachment 427953 [details] [diff] [review]
Patch for 8-byte alignment of nsAutoT{Ptr,}Array buffer

I think this will be acceptable (although not the ideal solution) if you:
 * drop mAdjustAlignment
 * make both auto array classes use the same extra thing for alignment (probably better as void* than PRInt64 to have less effect on 32-bit machines)
 * make GetAutoArrayBuffer use a static_cast to either nsTAutoArray or (given what you say above) a struct nested inside nsTArray that looks just like it


Depending on the alignment requirements of the type *is* possible using the approach I described above, which is different from the approach you're taking in this patch.
Attachment #427953 - Flags: review?(dbaron) → review-
(In reply to comment #40)
> Depending on the alignment requirements of the type *is* possible using the
> approach I described above, which is different from the approach you're taking
> in this patch.

(By "above", I mean comment 24 plus comment 33.)
(Assignee)

Comment 42

8 years ago
(In reply to comment #40)
> (From update of attachment 427953 [details] [diff] [review])
> I think this will be acceptable (although not the ideal solution) if you:
>  * drop mAdjustAlignment
>  * make both auto array classes use the same extra thing for alignment
> (probably better as void* than PRInt64 to have less effect on 32-bit machines)

sparc has 32 bits pointers and needs 64 bits alignment for 64-bits ints, which is the original problem.

>  * make GetAutoArrayBuffer use a static_cast to either nsTAutoArray or (given
> what you say above) a struct nested inside nsTArray that looks just like it

The problem is that GetAutoArrayBuffer is in nsTArray_base, and nsTAutoArray is a template... nsTAutoArray<PRInt32> will have different alignment requirements than nsTAutoArray<PRInt64>. What do you want to static_cast to ? The only way this is going to work is to move GetAutoArrayBuffer into the templates, but then, functions in nsTArray.cpp won't be able to work properly.
(In reply to comment #42)
> (In reply to comment #40)
> > (From update of attachment 427953 [details] [diff] [review] [details])
> > I think this will be acceptable (although not the ideal solution) if you:
> >  * drop mAdjustAlignment
> >  * make both auto array classes use the same extra thing for alignment
> > (probably better as void* than PRInt64 to have less effect on 32-bit machines)
> 
> sparc has 32 bits pointers and needs 64 bits alignment for 64-bits ints, which
> is the original problem.

ok, stick to PRInt64

> >  * make GetAutoArrayBuffer use a static_cast to either nsTAutoArray or (given
> > what you say above) a struct nested inside nsTArray that looks just like it
> 
> The problem is that GetAutoArrayBuffer is in nsTArray_base, and nsTAutoArray is
> a template... nsTAutoArray<PRInt32> will have different alignment requirements
> than nsTAutoArray<PRInt64>. What do you want to static_cast to ? The only way
> this is going to work is to move GetAutoArrayBuffer into the templates, but
> then, functions in nsTArray.cpp won't be able to work properly.

Leave it in the base class; just nest the dummy struct in nsTArray_base.
(Assignee)

Comment 44

8 years ago
(In reply to comment #43)
> Leave it in the base class; just nest the dummy struct in nsTArray_base.

Oh I see, so you just want to always align for 64 bits data, not depending on the type.
Yes.  Your approach already does that for the size of the nsAutoTArray, so we may as well do it for the alignment as well.
(Assignee)

Comment 46

8 years ago
One of the reasons I did it the way it currently is was to allow better alignment later (it should be possible to align at 4 or 8 bytes depending on the type being put in the array). If you are fine with always aligning at 8 bytes boundary, we can just go this way.
(Assignee)

Comment 47

8 years ago
Created attachment 429307 [details] [diff] [review]
Patch v2

This should work as expected (currently building on sparc to really make sure).

2 comments:
- I didn't know where would be best to cut the long line in GetAutoArrayBuffer().
- I'm tempted to define a template for the AutoBuf, so that the trick doesn't have to be modified in two different files whenever you feel you want more compact arrays.
Attachment #427953 - Attachment is obsolete: true
(Assignee)

Comment 48

8 years ago
Another one:
- I'm also tempted to define the AutoArray struct using nsTArray_base itself, so that any change in the base class doesn't break everything.
(Assignee)

Comment 49

8 years ago
Comment on attachment 429307 [details] [diff] [review]
Patch v2

Confirmed to work on amd64 and sparc. Please see comment 47 and comment 48.
Attachment #429307 - Flags: review?(dbaron)
Comment on attachment 429307 [details] [diff] [review]
Patch v2

r=dbaron, if you add a comment above |AutoArray| that says it's a dummy struct just to simulate what the compiler does to the alignment of nsAutoTArray and nsAutoTPtrArray.

Sorry for the delay getting to this.
Attachment #429307 - Flags: review?(dbaron) → review+
(Assignee)

Comment 51

8 years ago
Created attachment 432298 [details] [diff] [review]
Patch v2.1

Addressing dbaron's comment 50.
Attachment #429307 - Attachment is obsolete: true
(Assignee)

Updated

8 years ago
Keywords: checkin-needed
Summary: nsUrlClassifierDBService has bad alignment due to nsAutoTArray<nsUrlClassifierEntry, N>, causes SIGBUS → nsAutoTArray aligns buffer to 32 bits, causing SIGBUS when array entry type requires 64-bit alignment
http://hg.mozilla.org/mozilla-central/rev/7f301b0d3385
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Component: Phishing Protection → XPCOM
Keywords: checkin-needed
Product: Firefox → Core
QA Contact: phishing.protection → xpcom
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a4
(Assignee)

Comment 54

8 years ago
I don't see why this would crash. Is there a way to get more information from the crash ?

Comment 55

8 years ago
dao, both of those builds look like known random-orange entirely unrelated to this push.
I didn't spend too much time on it, since I was on the run. What's the bug for the random orange?
(Assignee)

Comment 57

8 years ago
(In reply to comment #55)
> dao, both of those builds look like known random-orange entirely unrelated to
> this push.

To be fair, the second one crashes in nsTArray.h.

Updated

8 years ago
Keywords: checkin-needed
(In reply to comment #57)
> (In reply to comment #55)
> > dao, both of those builds look like known random-orange entirely unrelated to
> > this push.
> 
> To be fair, the second one crashes in nsTArray.h.

Yeah, I saw that and considered the case closed...
I'm told the random orange is bug 542700.
http://hg.mozilla.org/mozilla-central/rev/f72218a3da21
Status: REOPENED → RESOLVED
Last Resolved: 8 years ago8 years ago
Keywords: checkin-needed
Resolution: --- → FIXED

Updated

8 years ago
Duplicate of this bug: 561622
You need to log in before you can comment on or make changes to this bug.