Closed Bug 609691 Opened 9 years ago Closed 9 years ago

Investigate crash [@ gfxTextRun::IsClusterStart]

Categories

(Core :: Graphics, defect)

defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: bsterne, Assigned: jfkthame)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sg:dos])

Attachments

(1 file, 2 obsolete files)

The attached testcase causes crash downstream from gfxFontGroup::MakeTextRun due to that function returning a null object which is later dereferenced.  On Linux the stack looks like:

Program received signal SIGSEGV, Segmentation fault.
0x008c11df in gfxTextRun::IsClusterStart (this=0x0, aPos=0) at ../../dist/include/gfxFont.h:1331
1331            NS_ASSERTION(0 <= aPos && aPos < mCharacterCount, "aPos out of range");
(gdb) bt
#0  0x008c11df in gfxTextRun::IsClusterStart (this=0x0, aPos=0) at ../../dist/include/gfxFont.h:1331
#1  0x01c34189 in TextRunWordCache::FinishTextRun (this=0xb47f9d30, aTextRun=0xab61b9a0, aNewRun=0x0, 
    aParams=0xbfff9478, aDeferredWords=..., aSuccessful=0)
    at /build/m-c/mozilla/gfx/thebes/gfxTextRunWordCache.cpp:456
#2  0x01c35d11 in TextRunWordCache::MakeTextRun (this=0xb47f9d30, 
    aText=0xa1100008 'A' <repeats 200 times>..., aLength=134217727, aFontGroup=0xab7a5bc0, 
    aParams=0xbfff9478, aFlags=22282528) at
/build/m-c/mozilla/gfx/thebes/gfxTextRunWordCache.cpp:819
#3  0x01c36650 in gfxTextRunWordCache::MakeTextRun (aText=0xa1100008 'A' <repeats 200 times>..., 
    aLength=134217727, aFontGroup=0xab7a5bc0, aParams=0xbfff9478, aFlags=22282528)
    at /build/m-c/mozilla/gfx/thebes/gfxTextRunWordCache.cpp:1013
#4  0x008acdbf in MakeTextRun (aText=0xa1100008 'A' <repeats 200 times>..., aLength=134217727, 
    aFontGroup=0xab7a5bc0, aParams=0xbfff9478, aFlags=22282528)
    at /build/m-c/mozilla/layout/generic/nsTextFrameThebes.cpp:531
#5  0x008b07f1 in BuildTextRunsScanner::BuildTextRunForFrames (this=0xbfffb99c, aTextBuffer=0xa9100007)
    at /build/m-c/mozilla/layout/generic/nsTextFrameThebes.cpp:1880
Copied from bug 609358 comment 2:

In general, dumping huge strings into the document should eventually lead to an
out-of-memory abort. However, this crash shows a case where we allow an
allocation to fail (without aborting) but then don't handle the failure
properly, hence the null-deref.

When TextRunWordCache::MakeTextRun creates a new run, it's possible that
gfxFontGroup::MakeTextRun will return NULL (if it failed to allocate memory for
the glyphs and text). This will lead to aSuccessful==FALSE in FinishTextRun(),
and in this case we must avoid dereferencing the "source" run pointer within
the FinishTextRun loop, as it may be null.
Comment on attachment 488407 [details] [diff] [review]
patch, check for null return from gfxFontGroup::MakeTextRun

>-        // If the word starts inside a cluster we don't want this word
>-        // in the cache, so we'll remove the associated cache entry
>-        PRBool wordStartsInsideCluster =
>-            !source->IsClusterStart(word->mSourceOffset);
>-        PRBool wordStartsInsideLigature =
>-            !source->IsLigatureGroupStart(word->mSourceOffset);

>+            if (aSuccessful) {
>+                wordStartsInsideCluster =
>+                    !aNewRun->IsClusterStart(word->mSourceOffset);
>+                wordStartsInsideLigature = aSuccessful &&
>+                    !aNewRun->IsLigatureGroupStart(word->mSourceOffset);

>             PRBool stealData = source == aNewRun;
>+            PRBool wordStartsInsideCluster =
>+                !source->IsClusterStart(word->mSourceOffset);
>+            PRBool wordStartsInsideLigature =
>+                !source->IsLigatureGroupStart(word->mSourceOffset);

Is there a reason for splitting wordStartsInside* initialization?
It would be less code to leave the initialization where it is (but still make
it conditional).

>+            PRBool wordStartsInsideCluster = PR_FALSE;
>+            PRBool wordStartsInsideLigature = PR_FALSE;

I prefer not unnecessarily initializing variables with garbage, as it makes it
impossible for valgrind to check that it was initialized correctly.  I know
other people see things differently because compilers are typically not as
clever as valgrind; I don't see that as a valid reason to do this but there
are differing viewpoints, so I let you choose.

>+                rekeyWithFontGroup =
>+                    GetWordFontOrGroup(aNewRun,
>+                                       word->mSourceOffset,
>+                                       word->mLength) != font && !useFontGroup;
>+            }
>             if (!aSuccessful || rekeyWithFontGroup ||

Replacing rekeyWithFontGroup with

  PRBool removeFontKey == !aSuccessful || GetWordFontOrGroup(aNewRun...)

(or similar logic in your favourite style) would make the extra !aSuccessful
test unnecessary.

>-                    source = tmpTextRun;
>-                    sourceOffset = 0;
>-                    stealData = PR_TRUE;
>+                    // If we failed to create the temporary run, just proceed
>+                    // as if nothing happened; we may get the wrong
>+                    // glyphs/shaping but that's better than passing NULL
>+                    // to CopyGlyphDataFrom() below!
>+                    if (tmpTextRun) {
>+                        source = tmpTextRun;
>+                        sourceOffset = 0;
>+                        stealData = PR_TRUE;
>+                    }

I'm worried we might have iterators that assume that the first character of a
run will be a cluster start.  Can you be sure we are not making that
assumption?  Even if we are not yet, it would seem a nice invariant to have.

>@@ -633,21 +652,29 @@ TextRunWordCache::MakeTextRun(const PRUn
>                                                       mBidiNumeral);

I'm surprised to see IBMBIDI_NUMERAL_NOMINAL handling in gfxTextRunWordCache,
which is already very complex, but that's not this bug.
The changes here look correct.

I don't actually see this bug as a high priority.  I think we are fighting a
losing battle if we are trying to protect against OOM plain DoS bugs.

As is, with the first glyph of a run becoming a cluster continuation, there
seems more risk here than reward, though I think we can avoid that.
Updated based on comment #2. This version will discard the current word in the case where tmpTextRun is NULL, rather than potentially break assumptions about cluster boundaries. This is the same failure mode as we'd get if the original MakeTextRun had failed, so that aSuccessful was false. (In practice, it will only happen in extreme-memory-pressure situations where worse problems are likely to be happening anyway.)

I agree that trying to prevent basic DoS attacks from huge string creation is a no-win effort. If these kinds of testcases lead to a safe out-of-memory abort, fine; that's expected. But when a fallible allocation fails, as in the gfxFontGroup::MakeTextRun case, and we don't handle the failure cleanly but end up dereferencing a null pointer, etc., then we should fix those bugs.
Attachment #488407 - Attachment is obsolete: true
Attachment #492431 - Flags: review?(karlt)
Comment on attachment 492431 [details] [diff] [review]
patch, v2 - handle null return from gfxFontGroup::MakeTextRun

>+                (GetWordFontOrGroup(aNewRun,
>+                                    word->mSourceOffset,
>+                                    word->mLength) != font && !useFontGroup);

I lost !useFontGroup in GetWordFontOrGroup.  How about this layout?:

                (GetWordFontOrGroup(aNewRun,
                                    word->mSourceOffset, word->mLength) != font
                 && !useFontGroup);

>@@ -633,21 +651,33 @@ TextRunWordCache::MakeTextRun(const PRUn

>-                DeferredWord word = { numRun, 0, wordStart, length, hash };
>-                deferredWords.AppendElement(word);
>-                transientRuns.AppendElement(numRun);
>-                seenDigitToModify = PR_FALSE;
>-            } else {
>+                // If MakeTextRun failed, numRun will be null, which is bad...
>+                // we'll just pretend there wasn't a digit to process.
>+                // This means we won't have the correct numerals, but at least
>+                // we're not trying to copy glyph data from an invalid source.
>+                // In practice it's unlikely to happen unless we're very close
>+                // to crashing due to OOM.
>+                if (numRun) {
>+                    DeferredWord word = { numRun, 0, wordStart, length, hash };
>+                    deferredWords.AppendElement(word);
>+                    transientRuns.AppendElement(numRun);
>+                } else {
>+                    seenDigitToModify = PR_FALSE;
>+                }
>+            }
>+
>+            if (!seenDigitToModify) {
>+                // didn't need to modify digits (or failed to do so)

I'm guessing the last 3 hunks of this patch were accidentally dropped?
seenDigitToModify is not reset to PR_FALSE when numRun != NULL,
and the PRUint8 MakeTextRun modifications are not in this patch.
Attachment #492431 - Flags: review?(karlt)
> I'm guessing the last 3 hunks of this patch were accidentally dropped?
> seenDigitToModify is not reset to PR_FALSE when numRun != NULL,
> and the PRUint8 MakeTextRun modifications are not in this patch.

Arghhh... not sure what happened there! This version should be correct. And I rearranged that conditional - see if you like this better.
Attachment #492431 - Attachment is obsolete: true
Attachment #492939 - Flags: review?(karlt)
Attachment #492939 - Flags: review?(karlt) → review+
Comment on attachment 492939 [details] [diff] [review]
patch, v2.1 - handle null return from gfxFontGroup::MakeTextRun - corrected!

Requesting approval for mozilla-2.0; this fixes a few places where we don't correctly handle out-of-memory and so may end up dereferencing NULL, and therefore crashing (at least; it's not clear whether worse outcomes than DoS are possible here).

(Testcases that insert arbitrarily long strings into the document will still crash eventually, but they're supposed to abort safely in mozalloc rather that proceed to use invalid pointers and then segfault.)
Attachment #492939 - Flags: approval2.0?
Attachment #492939 - Flags: approval2.0? → approval2.0+
http://hg.mozilla.org/mozilla-central/rev/ee61ea93a2fa
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Backed this out as it made OSX crashtests go permanently orange.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This patch can probably re-land after the tree opens, it looks like it wasn't the cause of the crashtest failures after all.
Re-landed: http://hg.mozilla.org/mozilla-central/rev/8cec27abeda2
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Depends on: 620354
You need to log in before you can comment on or make changes to this bug.