Closed Bug 427602 Opened 15 years ago Closed 15 years ago

Eliminate TArray<nsAutoString> from gfx/thebes - libGMalloc causes app crash on startup [@nsCharTraits<unsigned short>::copy]

Categories

(Core :: Graphics, defect, P2)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: bent.mozilla, Assigned: jtd)

References

Details

Attachments

(2 files, 1 obsolete file)

Attached file Stack trace
libGMalloc is awesome at catching mem corruption bugs on mac, and I can't run firefox under it anymore. It dies here on startup:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xd91f0a44
0x007d2f5c in nsCharTraits<unsigned short>::copy (s1=0xbfffe5a0, s2=0xd91f0a44, n=10) at nsCharTraits.h:224

Stack trace attached.
Flags: blocking1.9?
Grabbing this for investigation.  Scream if you're already working on it.
Assignee: joshmoz → jdaggett
Priority: -- → P2
Wouldn't hold the release for this.  If this becomes an issue, please re-nom.
Flags: blocking1.9? → blocking1.9-
This appears to be a different variation of bug 427941 related to the use of nsTArray<nsAutoString>.  In this case, one of these is used in gfxQuartzFontCache::InitBadUnderlineList with an initial size of 10 elements.  The number of blacklisted fonts is 22, so resizing occurs when the 11th element is inserted and this is when libgmalloc squawks.  As a temporary work around, bump up the size of this array to 30 and everything works fine.  Don't know how anyone uses libgmalloc though, it takes ages for any operation to occur.
Note on usage, to enable libgmalloc use:

export DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib

Duplicate of this bug: 428447
(In reply to comment #3)
> related to the use of nsTArray<nsAutoString>

John meant nsAutoTArray<nsAutoString>
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 428465
failing to copy over dependencies and descriptions results in bad triage. :(
Blocks: 428465
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Summary: libGMalloc causes app crash on startup [@nsCharTraits<unsigned short>::copy] → Eliminate TArray<nsAutoString> from gfx/thebes - libGMalloc causes app crash on startup [@nsCharTraits<unsigned short>::copy]
Since this has the potential to cause crashes, we should probably block on this.
Flags: blocking1.9- → blocking1.9?
switch instances of nsTArray<nsAutoString> to nsTArray<nsString> within gfx code.  This includes changes for both Mac and Windows.
argh, remove debugging code...
Attachment #315483 - Attachment is obsolete: true
Status: REOPENED → ASSIGNED
Component: GFX: Mac → GFX: Thebes
OS: Mac OS X → All
QA Contact: mac → thebes
Hardware: PC → All
(In reply to comment #10)
> Created an attachment (id=315483) [details]
> patch, v.0.1, remove use of nsTArray<nsAutoString>
> 
> switch instances of nsTArray<nsAutoString> to nsTArray<nsString> within gfx
> code.  This includes changes for both Mac and Windows.
> 

Can you give a quick explanation of what this change does?  Is this a real issue exposed by libGMalloc or just an issue specific with libGMalloc? 
(In reply to comment #9)
> Since this has the potential to cause crashes, we should probably block on
> this.

But only for people running under libgMalloc, right? If so, then it still doesn't block. If I'm misinterpreting, please renom. Also, feel free to nom a patch for approval.
Flags: blocking1.9? → blocking1.9-
libGMalloc is a debug library designed to expose memory corruption and incorrect memory usage bugs. The crash is a "feature" that tells us we have a real problem.

This is a real bug that may or may not crash in release builds depending on how the code gets optimized. We should definitely fix this, especially since we know what's wrong and how to make it work correctly.
Flags: blocking1.9- → blocking1.9?
(In reply to comment #13)
> But only for people running under libgMalloc, right? If so, then it still
> doesn't block. If I'm misinterpreting, please renom. Also, feel free to nom a
> patch for approval.

Running with libgmalloc forces stricter memory checking than normally, so if there is any attempt by code to touch memory that is not already explicitly allocated will cause an exception.  It is essentially enforcing rules that should normally be obeyed.  As Jesse put in the description for bug 427941, the problem here is the use of nsAutoTArray<nsAutoString>: 

https://bugzilla.mozilla.org/show_bug.cgi?id=427941#c0

> * nsAutoTArray uses memcpy when it expands.
> * nsAutoString contains an internal pointer to its built-in buffer.
> * Moving using memcpy causes internal pointers to point to the wrong place.

See Jonas' comment from the same bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=427941#c3

> And also add a BIG comment saying that usually putting nsAutoStrings 
> inside an nsTArray is unsafe, but since we do the SetCapacity trick 
> it's ok in this instance. We wouldn't want people to repeat my 
> mistake here.

Frankly, seems to me this usage *should* work but given that the mechanics means that it doesn't, it's safer to change our code to use nsString rather than make changes to a string class at this late stage.

With bug 427941 this usage causes a crash.  Using libgmalloc helps find situations where the code is wrong but we're not necessarily crashing due to luck.

We need to fix this.

I don't know what you mean by "should work". You mean we should make nsTArray use copy construction instead of memcpy in general?

It's also unsafe to use nsTArray<nsAutoTArray<>>, and that burned me once.

I agree this should be fixed for 1.9. The fix is really safe.

We should probably make nsTArray safe by using copy construction by default (and optimizing to memcpy in certain cases).
Due to Comment 14, Comment 15, and Comment 16 (thanks folks - very helpful) moving this to the blocking list.
Flags: blocking1.9? → blocking1.9+
(In reply to comment #16)
> I don't know what you mean by "should work". You mean we should make nsTArray
> use copy construction instead of memcpy in general?

Ideally either it should work or it shouldn't compile.  Having hidden patterns of use in core code like this is deadly.  The usage within GFX/Thebes was taken from code that Stuart wrote originally and he's been working on the project for a long time, if it's something he can miss this than I'm sure others starting work on Gecko can do the same.
That discussion might be more on-topic in bug 428465.
Comment on attachment 315484 [details] [diff] [review]
patch, v.0.2, remove use of nsTArray<nsAutoString> (cleanup)

thanks for putting this together ben!  We should revist this later, perhaps when we can do copy constructors rather than memcpys, or we can get rid of the internal pointer of nsAutoStrings.
Attachment #315484 - Flags: review?(pavlov) → review+
(In reply to comment #20)
> thanks for putting this together ben!

Nah, this was all john :)
Comment on attachment 315484 [details] [diff] [review]
patch, v.0.2, remove use of nsTArray<nsAutoString> (cleanup)

Let's get this in..
Attachment #315484 - Flags: approval1.9+
Keywords: checkin-needed
checked in
Status: ASSIGNED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
I backed this out to try to fix test failures.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Was this bug's patch the culprit?
(In reply to comment #26)
> No. Please reland.

Will reland during passing moments of tree non-crispiness...

Whiteboard: [needs landing
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
Whiteboard: [needs landing
Flags: in-testsuite-
You need to log in before you can comment on or make changes to this bug.