Eliminate TArray<nsAutoString> from gfx/thebes - libGMalloc causes app crash on startup [@nsCharTraits<unsigned short>::copy]

RESOLVED FIXED

Status

()

P2
normal
RESOLVED FIXED
11 years ago
11 years ago

People

(Reporter: bent.mozilla, Assigned: jtd)

Tracking

Trunk
Points:
---
Bug Flags:
blocking1.9 +
in-testsuite -

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 1 obsolete attachment)

Created attachment 314137 [details]
Stack trace

libGMalloc is awesome at catching mem corruption bugs on mac, and I can't run firefox under it anymore. It dies here on startup:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xd91f0a44
0x007d2f5c in nsCharTraits<unsigned short>::copy (s1=0xbfffe5a0, s2=0xd91f0a44, n=10) at nsCharTraits.h:224

Stack trace attached.
Flags: blocking1.9?
(Assignee)

Comment 1

11 years ago
Grabbing this for investigation.  Scream if you're already working on it.
Assignee: joshmoz → jdaggett
(Assignee)

Updated

11 years ago
Priority: -- → P2
Wouldn't hold the release for this.  If this becomes an issue, please re-nom.
Flags: blocking1.9? → blocking1.9-
(Assignee)

Comment 3

11 years ago
This appears to be a different variation of bug 427941 related to the use of nsTArray<nsAutoString>.  In this case, one of these is used in gfxQuartzFontCache::InitBadUnderlineList with an initial size of 10 elements.  The number of blacklisted fonts is 22, so resizing occurs when the 11th element is inserted and this is when libgmalloc squawks.  As a temporary work around, bump up the size of this array to 30 and everything works fine.  Don't know how anyone uses libgmalloc though, it takes ages for any operation to occur.
(Assignee)

Comment 4

11 years ago
Note on usage, to enable libgmalloc use:

export DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib

Updated

11 years ago
Duplicate of this bug: 428447
(In reply to comment #3)
> related to the use of nsTArray<nsAutoString>

John meant nsAutoTArray<nsAutoString>

Updated

11 years ago
Status: NEW → RESOLVED
Last Resolved: 11 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 428465

Comment 8

11 years ago
failing to copy over dependencies and descriptions results in bad triage. :(
Blocks: 428465
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Summary: libGMalloc causes app crash on startup [@nsCharTraits<unsigned short>::copy] → Eliminate TArray<nsAutoString> from gfx/thebes - libGMalloc causes app crash on startup [@nsCharTraits<unsigned short>::copy]
(Assignee)

Comment 9

11 years ago
Since this has the potential to cause crashes, we should probably block on this.
Flags: blocking1.9- → blocking1.9?
(Assignee)

Comment 10

11 years ago
Created attachment 315483 [details] [diff] [review]
patch, v.0.1, remove use of nsTArray<nsAutoString>

switch instances of nsTArray<nsAutoString> to nsTArray<nsString> within gfx code.  This includes changes for both Mac and Windows.
(Assignee)

Comment 11

11 years ago
Created attachment 315484 [details] [diff] [review]
patch, v.0.2, remove use of nsTArray<nsAutoString> (cleanup)

argh, remove debugging code...
Attachment #315483 - Attachment is obsolete: true
Status: REOPENED → ASSIGNED
Component: GFX: Mac → GFX: Thebes
OS: Mac OS X → All
QA Contact: mac → thebes
Hardware: PC → All

Comment 12

11 years ago
(In reply to comment #10)
> Created an attachment (id=315483) [details]
> patch, v.0.1, remove use of nsTArray<nsAutoString>
> 
> switch instances of nsTArray<nsAutoString> to nsTArray<nsString> within gfx
> code.  This includes changes for both Mac and Windows.
> 

Can you give a quick explanation of what this change does?  Is this a real issue exposed by libGMalloc or just an issue specific with libGMalloc? 
(In reply to comment #9)
> Since this has the potential to cause crashes, we should probably block on
> this.

But only for people running under libgMalloc, right? If so, then it still doesn't block. If I'm misinterpreting, please renom. Also, feel free to nom a patch for approval.
Flags: blocking1.9? → blocking1.9-
libGMalloc is a debug library designed to expose memory corruption and incorrect memory usage bugs. The crash is a "feature" that tells us we have a real problem.

This is a real bug that may or may not crash in release builds depending on how the code gets optimized. We should definitely fix this, especially since we know what's wrong and how to make it work correctly.
Flags: blocking1.9- → blocking1.9?
(Assignee)

Comment 15

11 years ago
(In reply to comment #13)
> But only for people running under libgMalloc, right? If so, then it still
> doesn't block. If I'm misinterpreting, please renom. Also, feel free to nom a
> patch for approval.

Running with libgmalloc forces stricter memory checking than normally, so if there is any attempt by code to touch memory that is not already explicitly allocated will cause an exception.  It is essentially enforcing rules that should normally be obeyed.  As Jesse put in the description for bug 427941, the problem here is the use of nsAutoTArray<nsAutoString>: 

https://bugzilla.mozilla.org/show_bug.cgi?id=427941#c0

> * nsAutoTArray uses memcpy when it expands.
> * nsAutoString contains an internal pointer to its built-in buffer.
> * Moving using memcpy causes internal pointers to point to the wrong place.

See Jonas' comment from the same bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=427941#c3

> And also add a BIG comment saying that usually putting nsAutoStrings 
> inside an nsTArray is unsafe, but since we do the SetCapacity trick 
> it's ok in this instance. We wouldn't want people to repeat my 
> mistake here.

Frankly, seems to me this usage *should* work but given that the mechanics means that it doesn't, it's safer to change our code to use nsString rather than make changes to a string class at this late stage.

With bug 427941 this usage causes a crash.  Using libgmalloc helps find situations where the code is wrong but we're not necessarily crashing due to luck.

We need to fix this.

I don't know what you mean by "should work". You mean we should make nsTArray use copy construction instead of memcpy in general?

It's also unsafe to use nsTArray<nsAutoTArray<>>, and that burned me once.

I agree this should be fixed for 1.9. The fix is really safe.

We should probably make nsTArray safe by using copy construction by default (and optimizing to memcpy in certain cases).

Comment 17

11 years ago
Due to Comment 14, Comment 15, and Comment 16 (thanks folks - very helpful) moving this to the blocking list.
Flags: blocking1.9? → blocking1.9+
(Assignee)

Comment 18

11 years ago
(In reply to comment #16)
> I don't know what you mean by "should work". You mean we should make nsTArray
> use copy construction instead of memcpy in general?

Ideally either it should work or it shouldn't compile.  Having hidden patterns of use in core code like this is deadly.  The usage within GFX/Thebes was taken from code that Stuart wrote originally and he's been working on the project for a long time, if it's something he can miss this than I'm sure others starting work on Gecko can do the same.

Comment 19

11 years ago
That discussion might be more on-topic in bug 428465.
Attachment #315484 - Flags: review?(pavlov)

Comment 20

11 years ago
Comment on attachment 315484 [details] [diff] [review]
patch, v.0.2, remove use of nsTArray<nsAutoString> (cleanup)

thanks for putting this together ben!  We should revist this later, perhaps when we can do copy constructors rather than memcpys, or we can get rid of the internal pointer of nsAutoStrings.
Attachment #315484 - Flags: review?(pavlov) → review+
(In reply to comment #20)
> thanks for putting this together ben!

Nah, this was all john :)

Comment 22

11 years ago
Comment on attachment 315484 [details] [diff] [review]
patch, v.0.2, remove use of nsTArray<nsAutoString> (cleanup)

Let's get this in..
Attachment #315484 - Flags: approval1.9+

Updated

11 years ago
Keywords: checkin-needed
(Assignee)

Comment 23

11 years ago
checked in
Status: ASSIGNED → RESOLVED
Last Resolved: 11 years ago11 years ago
Resolution: --- → FIXED
I backed this out to try to fix test failures.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Was this bug's patch the culprit?
(Assignee)

Comment 27

11 years ago
(In reply to comment #26)
> No. Please reland.

Will reland during passing moments of tree non-crispiness...

Keywords: checkin-needed

Updated

11 years ago
Whiteboard: [needs landing
(Assignee)

Updated

11 years ago
Status: REOPENED → RESOLVED
Last Resolved: 11 years ago11 years ago
Resolution: --- → FIXED
Whiteboard: [needs landing

Updated

11 years ago
Flags: in-testsuite-
You need to log in before you can comment on or make changes to this bug.