Closed Bug 487063 Opened 16 years ago Closed 16 years ago

OS X: malloc: *** error for object 0x...: Non-aligned pointer being freed (2) during mochitest

Categories

(Core :: Graphics, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.2a1

People

(Reporter: ted, Assigned: jfkthame)

References

()

Details

(4 keywords, Whiteboard: [fixed1.9.1b4] )

Attachments

(2 files, 1 obsolete file)

A recent trunk unittest build on OS X spewed a ton of these errors during mochitest: firefox-bin(16789,0xa0873fa0) malloc: *** error for object 0xbfff6970: Non-aligned pointer being freed (2) *** set a breakpoint in malloc_error_break to debug the leak log got confused at the end and reported negative leaks, as well.
OS: Windows XP → Mac OS X
Whiteboard: [orange]
Josh says he's seen stack alignment stuff like this in the past, due to the 16-byte stack alignment. Don't know if this could be JS? bz says some alignment-related code landed in gfx lately as well. This same changeset cycled green on unittests previously, so it's clearly intermittent whatever it is.
I don't think this has anything to do with stack alignment; I think we're free()ing a bogus pointer. It would be nice to get a stack!
Also slightly more info: *** 43041 INFO Running /tests/layout/base/tests/test_bug441782-1.html... *** 43042 INFO TEST-PASS | /tests/layout/base/tests/test_bug441782-1.html | Rendering of reftest bug441782-1 is not different with bidi.numeral == 0 firefox-bin(16789,0xa0873fa0) malloc: *** error for object 0xbfff6970: Non-aligned pointer being freed (2) *** set a breakpoint in malloc_error_break to debug Looks like it started in a layout mochitest.
(In reply to comment #2) > I don't think this has anything to do with stack alignment; I think we're > free()ing a bogus pointer. It would be nice to get a stack! tada! http://pastebin.mozilla.org/640317
You know what, the unittest machines probably do build with the 10.5 SDK, since they're 10.5 machines and not building universal builds, so they don't explicitly specify an SDK: http://hg.mozilla.org/build/buildbot-configs/file/13c2451c492d/mozilla2/macosx/mozilla-central/unittest/mozconfig
The error is occurring in the destructor of the nsAutoTArray that is declared here: http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/src/gfxCoreTextFonts.cpp#787 When this goes out of scope at the end of the InitTextRun method, we get the valgrind "invalid free() / delete / delete[]" error with a call stack that includes free() PR_Free() NS_Free_P() nsTArray_base::~nsTArray_base() nsTArray<gfxTextRange>::~nsTArray() nsAutoTArray<gfxTextRange, 3u>::~nsAutoTArray() Simply changing nsAutoTArray to nsTArray at line 787 (which corresponds to the code in the ATSUI path) makes the error go away; however, this doesn't explain *why* the nsAutoTArray version is broken. If something is corrupting the array header, we need to know where that is happening, otherwise we're just masking the underlying problem and it will doubtless lead to trouble somewhere else, some day.
Component: General → GFX: Thebes
QA Contact: general → thebes
The damage occurred because of the line at the end of gfxFontGroup::ComputeRanges which is intended to set the ending offset of the last range -- but if the text was zero-length, and so no ranges were created, this will munge element [-1] of the array, or in other words it will stomp on the array header. Calling MakeTextRun for a zero-length string is somewhat anomalous, but can occur in the bidi-numeral case when all the characters of the string ended up getting converted to other-script numerals, and therefore had "transient" runs created for them. Simply checking for zero length at the start of ComputeRanges seems to resolve the problem, at least in initial testing. I wonder if this could also be related to bug 486470, as corrupting the array header here might not trigger an immediate failure but could easily result in some kind of latent heap damage when free() gets called for an invalid pointer value.
Assignee: nobody → jfkthame
Attachment #371453 - Flags: review?(roc)
Comment on attachment 371453 [details] [diff] [review] protect gfxFontGroup::ComputeRanges against zero-length text Let's land this ASAP, thanks a ton Jonathan!
Keywords: checkin-needed
Whiteboard: [orange] → [orange][needs landing]
This doesn't fix bug 486470 unfortunately. The fix looks good but it seems like we should also never get to ComputeRanges if the range is 0 characters long, we should be bailing much sooner.
(In reply to comment #10) > The fix looks good but it seems like we should also never get to ComputeRanges > if the range is 0 characters long, we should be bailing much sooner. If that's the case it might be worth sticking an assertion (or, for now, a warning) at the early return so we can catch when it happens.
The zero-length string arises in an edge case of gfxTextRunWordCache::MakeTextRun, so I guess the right thing to do there is check the string length and call MakeEmptyTextRun instead of MakeTextRun on the gfxFontGroup. This will prevent ComputeRanges being called for a zero-length run in the test case we have seen (caused by bidi numeral processing); however, I think we should still include the check within ComputeRanges itself so that it correctly returns an empty list of ranges for a zero-length input. Review again as this moves the primary fix to a completely different place.
Attachment #371453 - Attachment is obsolete: true
Attachment #371632 - Flags: review?(roc)
(In reply to comment #10) > This doesn't fix bug 486470 unfortunately. > > The fix looks good but it seems like we should also never get to ComputeRanges > if the range is 0 characters long, we should be bailing much sooner. I'm guessing that this new version of the patch may have a better chance of resolving bug 486470, as it will eliminate the "edge-case" textruns that were being created. (It would still be good to figure out where the actual corruption is occuring in that bug, however, and add robustness at the point of failure.)
(In reply to comment #12) > (In reply to comment #10) > > The fix looks good but it seems like we should also never get to ComputeRanges > > if the range is 0 characters long, we should be bailing much sooner. > > If that's the case it might be worth sticking an assertion (or, for now, a > warning) at the early return so we can catch when it happens. I don't think it should be the responsibility of ComputeRanges() to assert or warn in this case; there's nothing inherently wrong with calling it for a zero-length substring of a run, and the correct result (an empty list of ranges) is clear, it just wasn't implemented safely. The appropriate place to add an assertion, I think, would be gfxFontGroup::MakeTextRun (for all its concrete implementations), insisting the text length should be non-zero as we have an alternative method for the special case of an empty text run.
In this patch I've added "aLength > 0" assertions for all the various platform implementations of gfxFontGroup::MakeTextRun; obviously I haven't tested all of these locally. They look pretty safe, though.
Attachment #371636 - Flags: review?(roc)
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Whiteboard: [orange][needs landing] → [orange][needs 191 approval]
Comment on attachment 371632 [details] [diff] [review] in addition to making ComputeRanges robust, check for zero-length string in TextRunWordCache::MakeTextRun [Checkin: Comment 17 & 27] I think we might even want this on 1.9.0 too...
Attachment #371632 - Flags: approval1.9.0.10?
(In reply to comment #18) > (From update of attachment 371632 [details] [diff] [review]) > I think we might even want this on 1.9.0 too... This code was added post-1.9.0, based on code in gfxWindowsFonts.cpp. The problem code doesn't exist that code so just trunk/1.9.1 is sufficient.
Target Milestone: --- → mozilla1.9.2a1
Comment on attachment 371632 [details] [diff] [review] in addition to making ComputeRanges robust, check for zero-length string in TextRunWordCache::MakeTextRun [Checkin: Comment 17 & 27] Clearing 1.9.0.10 nomination per comment 19. Please re-nom if you still think we should take this.
Attachment #371632 - Flags: approval1.9.0.10?
Is it possible to reproduce this reliably with a more specific testcase? I guess this didn't actually cause a crash or anything, but it'd be nice to have a reliable testcase.
The mochitest from bug 441782 was reliably giving nsTArray assertions at the end of ComputeRanges, actually, so if we took assertions more seriously all the time, I guess we'd have dealt with this sooner. Are we prepared to move towards a world where assertion failures abort the program? :)
We have fatal assertions already, NS_ABORT_IF_FALSE. I'm all for out-of-bounds array accesses and other errors that are likely to corrupt memory being fatal. I'm not in favour of making assertions that just result in incorrect display or layout fatal, since that just makes it harder to diagnose and prioritize bugs.
The trouble is that we're overloading NS_ASSERTION, with parts of the code using this to flag basic errors that should always abort the program (e.g., index validation in nsTArray.h/.cpp, where continuing to execute means we'll be reading undefined values or corrupting memory; or functions that assert a pointer is non-null before proceeding to use it), while other parts of the are code using it to indicate some kind of inconsistent or unexpected situation that the algorithm probably won't handle properly, but execution will nevertheless continue safely. So should we be going through the codebase and replacing NS_ASSERTION with NS_ABORT_IF_FALSE wherever the intention is to check that we're not heading into undefined/memory-stomping territory? As it is, such critical assertions can easily get swamped by "harmless" assertions from layout, and so they get ignored - as happened with 441782, I think.
NS_ASSERTION always means "we have detected a bug in our code" (or possibly, the code of some library we depend on). If we have merely encountered an unusual or unexpected situation, such as insane input, then we should use NS_WARNING. > So should we be going through the codebase and replacing NS_ASSERTION with > NS_ABORT_IF_FALSE wherever the intention is to check that we're not heading > into undefined/memory-stomping territory? It's probably not worth doing an audit, but when we encounter assertions that indicate imminent memory corruption, we should probably turn them into aborts.
If we have a testcase that already reliably produces an assertion in this case, then I think we can consider it tested. We will hopefully get to a point where we run our unittests on debug builds in the near future, which would catch that.
Attachment #371632 - Flags: approval1.9.1? → approval1.9.1+
Attachment #371636 - Flags: approval1.9.1? → approval1.9.1+
Keywords: checkin-needed
Whiteboard: [orange][needs 191 approval] → [orange][needs 191 landing]
Comment on attachment 371632 [details] [diff] [review] in addition to making ComputeRanges robust, check for zero-length string in TextRunWordCache::MakeTextRun [Checkin: Comment 17 & 27] http://hg.mozilla.org/releases/mozilla-1.9.1/rev/fb4fb9271f41
Attachment #371632 - Attachment description: in addition to making ComputeRanges robust, check for zero-length string in TextRunWordCache::MakeTextRun → in addition to making ComputeRanges robust, check for zero-length string in TextRunWordCache::MakeTextRun [Checkin: Comment 17 & 27]
Depends on: 389074
Whiteboard: [orange][needs 191 landing] → [fixed1.9.1b4] [orange]
Comment on attachment 371636 [details] [diff] [review] add assertions to detect incorrect creation of zero-length text runs [Checkin: Comment 17 & 28] http://hg.mozilla.org/releases/mozilla-1.9.1/rev/96939938ef7a After fixing context for { patching file gfx/thebes/src/gfxAtsuiFonts.cpp Hunk #1 FAILED at 703 1 out of 2 hunks FAILED } And dropping { unable to find 'gfx/thebes/src/gfxCoreTextFonts.cpp' for patching 2 out of 2 hunks FAILED } which depends on bug 389074 .
Attachment #371636 - Attachment description: add assertions to detect incorrect creation of zero-length text runs → add assertions to detect incorrect creation of zero-length text runs [Checkin: Comment 17 & 28]
Whiteboard: [fixed1.9.1b4] [orange] → [fixed1.9.1b4]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: