Last Comment Bug 701637 - DOM related GC | Cycle Collector Crash triggered by harfbuzz buffer overrun
: DOM related GC | Cycle Collector Crash triggered by harfbuzz buffer overrun
Status: VERIFIED FIXED
[sg:critical][qa!]
: crash, verified-aurora, verified-beta
Product: Core
Classification: Components
Component: Layout: Text (show other bugs)
: Trunk
: x86 Windows XP
: -- critical (vote)
: mozilla9
Assigned To: Jonathan Kew (:jfkthame)
:
Mentors:
http://www.haveeru.com.mv/dhivehi/sou...
: 707095 (view as bug list)
Depends on: 714067
Blocks: 532972
  Show dependency treegraph
 
Reported: 2011-11-11 00:11 PST by Bob Clary [:bc:]
Modified: 2015-10-16 11:40 PDT (History)
10 users (show)
khuey: in‑testsuite-
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
wontfix
+
fixed
+
fixed
+
fixed
unaffected


Attachments
patch, avoid the buffer overrun error (2.18 KB, patch)
2011-11-18 04:18 PST, Jonathan Kew (:jfkthame)
no flags Details | Diff | Review
patch v2 - rewrite all the loops involving _hb_ot_layout_skip_mark (10.63 KB, patch)
2011-11-22 10:10 PST, Jonathan Kew (:jfkthame)
mozilla: review-
Details | Diff | Review
patch v2.1 - updated for review comments (9.20 KB, patch)
2011-11-22 12:00 PST, Jonathan Kew (:jfkthame)
no flags Details | Diff | Review
patch v2.2 - fixed indexing in match_input (9.23 KB, patch)
2011-11-23 01:43 PST, Jonathan Kew (:jfkthame)
mozilla: review+
christian: approval‑mozilla‑aurora+
christian: approval‑mozilla‑beta+
Details | Diff | Review

Description Bob Clary [:bc:] 2011-11-11 00:11:57 PST
1. http://www.haveeru.com.mv/dhivehi/south_asia/112652
   Windows XP only I think. May need to reload several times.

2. Crash [@ nsAttrValue::EnsureEmptyMiscContainer() ] 

Beta bp-dac1f532-e438-450c-b5d3-5e3c12111110

Aurora bp-67ee699e-b484-4106-bb16-0f7c72111111

Nightly bp-3267b454-6e6c-4a49-9da5-d4bfe2111111

Other urls:
http://www.haveeru.com.mv/dhivehi
http://www.haveeru.com.mv/dhivehi/atolls/112618

See also bug 681118 which is mobile ARM/Android only.

I originally saw this in the crash automation where debug Beta builds on Windows XP were firing crashdumps with _hb_ot_layout_get_glyph_property _hb_ot_layout_skip_mark PairPosFormat2::apply(hb_apply_context_t*) PairPos::apply(hb_apply_context_t*) PosLookupSubTable::apply(hb_apply_context_t*, unsigned int) stacks, but still exiting normally though Ted's exploitable tool rated these as low. Ted, I recently updated all my minidump_stackwalk's and your exploitable tool to tip. Any idea why it would generate dumps for the browser process while the browser would still exit normally?

Making sensitive since Ted's flagged it as low and it is GC/CC related.
Comment 1 Ted Mielczarek [:ted.mielczarek] 2011-11-11 04:37:42 PST
These look like two different stacks. I'm not sure why we'd exit with a normal status and still write a dump, that doesn't ring a bell.
Comment 2 Daniel Veditz [:dveditz] 2011-11-16 17:17:19 PST
Please capture the testcases in the bug in case they go away or change.
Comment 3 Johnny Stenback (:jst, jst@mozilla.com) 2011-11-16 17:18:26 PST
Kyle, can you dig in here?
Comment 4 Bob Clary [:bc:] 2011-11-17 04:17:36 PST
I haven't been able to get a saved test case that crashes, but if you go to the site and start opening links into new tabs you will crash in pretty short order with a variety of signatures.

bp-88b4b9ca-05a9-42e9-bf12-ceb882111117
[@ nsHttpChannel::GetLoadGroup(nsILoadGroup**) ] 

bp-5617f544-f9f9-420e-a14a-da88c2111117
[@ imgRequestProxy::~imgRequestProxy() ] 

bp-32b45b5d-4b7a-431c-bf46-076c72111117
[@ msvcr80.dll@0x173f2 ] 

bp-8fd9ee75-ecf9-4931-a790-57a502111117
[@ gfxTextRun::`vector deleting destructor''(unsigned int) ]
Comment 5 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2011-11-17 05:36:34 PST
Sure.  I was able to capture the crash in the record and replay setup and am investigating.
Comment 6 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2011-11-17 20:42:22 PST
This appears to be an off by one error in HarfBuzz.

At http://hg.mozilla.org/mozilla-central/annotate/b62e6ee5ba9b/gfx/harfbuzz/src/hb-ot-layout-gpos-private.hh#l621 we have the following:

624    unsigned int end = MIN (c->buffer->len, c->buffer->i + c->context_length);

<snip>

632     unsigned int j = c->buffer->i + 1;
633     while (_hb_ot_layout_skip_mark (c->layout->face, &c->buffer->info[j], c->lookup_props, NULL))
634     {
635       if (unlikely (j == end))
636 	    return false;
637       j++;
638     }

The problem is that c->buffer->len starts counting at 1, while array indices start counting at 0.  It's thus possible to run off the end of the array and into somebody else's memory.  _hb_ot_layout_skip_mark calls _hb_ot_layout_get_glyph_property which writes beyond the end of the loop.  This causes the memory corruption which later brings down the browser.
Comment 7 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2011-11-17 20:47:11 PST
I have this in a replay machine if that's needed to proceed.  Otherwise, somebody who knows HarfBuzz should finish this off.
Comment 8 Jonathan Kew (:jfkthame) 2011-11-18 04:18:41 PST
Created attachment 575421 [details] [diff] [review]
patch, avoid the buffer overrun error

It looks to me like the loop-exit test "j == end" needs to happen after j has been incremented in the loop, not before.

This idiom occurs several times in this file, and each occurrence requires the same fix.

Behdad, I realize this file has been renamed in the current HB trunk, but it looks like the same error is still present - please check and verify whether the fix is correct, and apply upstream as necessary. Thanks!
Comment 9 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2011-11-18 04:41:52 PST
FWIW, that 'end' still points past the end of the array feels pretty nasty to me, even though the patch should fix the bug.
Comment 10 Behdad Esfahbod 2011-11-18 13:10:38 PST
Lemme check.

Kyle, end pointing to to after the array is a very common idiom, and welldefined by the C spec.
Comment 11 Jonathan Kew (:jfkthame) 2011-11-21 00:25:58 PST
Right, it's equivalent to using the begin() and end() iterators on a container, for example; "end" points immediately after the last item. No problem with that; but the "end" iterator (or index) must not be dereferenced.

Behdad, please confirm the fix here is OK so that we can land it for the code as present in our tree, as well as fixing it upstream in harfbuzz for future versions.
Comment 12 Behdad Esfahbod 2011-11-21 06:42:15 PST
(09:39:12 AM) behdad: jfkthame: so
(09:39:27 AM) behdad: jfkthame: would it make more sense to rewrite the loops as do{}while?
(09:39:48 AM) behdad: we're assuming that at the beginning of the loop j can't be equal to end
(09:40:02 AM) behdad: and that's correct only because the if at the beginning of the functions takes care of it
(09:40:15 AM) behdad: that's too fragile / unclear
(09:42:07 AM) behdad: jfkthame: might even be worth putting that loop into a function
Comment 13 Jonathan Kew (:jfkthame) 2011-11-22 10:10:05 PST
Created attachment 576197 [details] [diff] [review]
patch v2 - rewrite all the loops involving _hb_ot_layout_skip_mark

Here's a more extensive patch - I've tried to check each of the loops involving to _hb_ot_layout_skip_mark and adjusted exit criteria to avoid potential out-of-bounds access. In a number of cases I've changed while() loops into do...while(), where this seemed to fit well with the control flow and conditions needed.

This seems to work OK with a few fonts I've tried, but has not been extensively tested yet.
Comment 14 Behdad Esfahbod 2011-11-22 11:09:33 PST
Comment on attachment 576197 [details] [diff] [review]
patch v2 - rewrite all the loops involving _hb_ot_layout_skip_mark

Review of attachment 576197 [details] [diff] [review]:
-----------------------------------------------------------------

Generally looks good.  I'll do them all as do{}while() upstream, but what you have is fine.  Except for one chunk which looks like optimization, but is wrong.

::: gfx/harfbuzz/src/hb-ot-layout-gpos-private.hh
@@ +622,5 @@
>    {
>      TRACE_APPLY ();
> +    unsigned int j = c->buffer->i;
> +    unsigned int end = MIN (c->buffer->len, j + c->context_length);
> +    if (unlikely (j >= end))

This is harmless, but not equivalent to what was there before.  The old code was accounting for the fact that we need at least two glyphs.  But, this version is simpler.  I like.

::: gfx/harfbuzz/src/hb-ot-layout-gsub-private.hh
@@ +347,3 @@
>      unsigned int count = component.len;
> +    unsigned int end = MIN (c->buffer->len, j + c->context_length);
> +    if (unlikely (j + count >= end))

This however, keeps the count in the new version.  I suggest we drop it and do a simple j >= end here too.  I'm thinking about adding a struct to replace this logic.

@@ +351,5 @@
>  
>      bool first_was_mark = (c->property & HB_OT_LAYOUT_GLYPH_CLASS_MARK);
>      bool found_non_mark = false;
>  
> +    for (i = 1; i < count; i++)

Humm, took me a while to confirm that the new loop correctly skips over the matched glyphs.  But it does, and I like how it reads.  Nice.

@@ +361,1 @@
>  	if (unlikely (j + count - i == end))

Same here.  We can just do j == end.

@@ +398,3 @@
>  	{
> +	  c->buffer->info[k].lig_comp() = i;
> +	  c->buffer->info[k].lig_id() = lig_id;

This is wrong.  'k' is static, whereas in the previous code, buffer->i was advancing.  Suggest backout the k thing.  That's irrelevant to the issue at hand, and wrong.

::: gfx/harfbuzz/src/hb-ot-layout-gsubgpos-private.hh
@@ +140,1 @@
>  	return false;

The LigatureSubst logic is exactly the same as this one and your do{}while version makes much more sense IMO.  In fact, maybe we should make LigatureSubst call this function.

@@ +164,2 @@
>      {
> +      if (unlikely (j < count - i))

Like my other comments, we can simplify this and just id "if (!j)".  What you have is fine though.

@@ +191,5 @@
>    {
>      while (_hb_ot_layout_skip_mark (c->layout->face, &c->buffer->info[j], c->lookup_props, NULL))
>      {
> +      j++;
> +      if (unlikely (j + count - i > end))

Same here.  I like the do{}while() version more.
Comment 15 Jonathan Kew (:jfkthame) 2011-11-22 12:00:48 PST
Created attachment 576223 [details] [diff] [review]
patch v2.1 - updated for review comments

Updated to take account of your comments above - thanks.
Comment 16 Behdad Esfahbod 2011-11-22 12:26:17 PST
Comment on attachment 576223 [details] [diff] [review]
patch v2.1 - updated for review comments

Review of attachment 576223 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, except for one minor issue.

::: gfx/harfbuzz/src/hb-ot-layout-gsubgpos-private.hh
@@ +133,3 @@
>      return false;
>  
> +  j = j + 1;

Do we need this?  Given that j is incremented in the do{}while loop?
Comment 17 Jonathan Kew (:jfkthame) 2011-11-22 14:17:49 PST
Yes, because in the case where count==1, the for-loop body (and hence the do-while loop) is not executed at all, and we'd then end up returning context_length_out as zero, which is wrong.
Comment 18 Behdad Esfahbod 2011-11-22 19:57:53 PST
(In reply to Jonathan Kew (:jfkthame) from comment #17)
> Yes, because in the case where count==1, the for-loop body (and hence the
> do-while loop) is not executed at all, and we'd then end up returning
> context_length_out as zero, which is wrong.

I'm not convinced.  You have a point, but I think the correct fix is to return j - buffer->i + 1 instead.

Two j++'s in a row, without any check on the glyph being skipped is wrong.
Comment 19 Jonathan Kew (:jfkthame) 2011-11-23 01:43:40 PST
Created attachment 576429 [details] [diff] [review]
patch v2.2 - fixed indexing in match_input

(In reply to Behdad Esfahbod from comment #18)
> (In reply to Jonathan Kew (:jfkthame) from comment #17)
> > Yes, because in the case where count==1, the for-loop body (and hence the
> > do-while loop) is not executed at all, and we'd then end up returning
> > context_length_out as zero, which is wrong.
> 
> I'm not convinced.  You have a point, but I think the correct fix is to
> return j - buffer->i + 1 instead.

You're right. I'd almost finished a detailed reply explaining why it _was_ correct as I wrote it, but I think that was just my jet-lag talking. Must be time to make coffee....
Comment 20 Behdad Esfahbod 2011-11-23 11:30:21 PST
Comment on attachment 576429 [details] [diff] [review]
patch v2.2 - fixed indexing in match_input

Review of attachment 576429 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM.
Comment 21 Behdad Esfahbod 2011-11-23 11:30:49 PST
Please ping me when it's ok to push this upstream.
Comment 22 Jonathan Kew (:jfkthame) 2011-11-24 00:00:09 PST
Pushed to mozilla-central:
https://hg.mozilla.org/mozilla-central/rev/de483d897af4
Comment 23 Jonathan Kew (:jfkthame) 2011-11-24 00:11:23 PST
Comment on attachment 576429 [details] [diff] [review]
patch v2.2 - fixed indexing in match_input

Nominating for aurora and beta as this is a potential vulnerability in our currently-shipping code. I'm not sure how feasible it would be for someone to exploit this for arbitrary code execution rather than "just" a crash/DoS (perhaps using a specially-crafted font, along with specific text strings that hit the erroneous code paths), but anything that can manifest as heap corruption seems dangerous.
Comment 24 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2011-11-28 06:46:09 PST
Is the failure mode here understood well enough to write an automated test case for it?
Comment 25 christian 2011-11-28 13:01:44 PST
Comment on attachment 576429 [details] [diff] [review]
patch v2.2 - fixed indexing in match_input

[triage comment]
This looks...scary. How well tested is this area of code? We'll approve it but we definitely have reservations.
Comment 27 Jonathan Kew (:jfkthame) 2011-12-02 01:14:14 PST
(In reply to Christian Legnitto [:LegNeato] from comment #25)
> [triage comment]
> This looks...scary. How well tested is this area of code? We'll approve it
> but we definitely have reservations.

Hmm, do you mean the crash looks scary, or the patch looks scary? Or both? :)

IMO, although the crash is bad, it would probably be pretty challenging to exploit in any predictable way - making use of the heap corruption caused by overshooting the glyph buffer by one position to achieve a controlled, malicious result seems like it'd be really hard.

As for the patch, I don't think it's quite as scary as it looks, really. The specific error that caused this crash was pinpointed and clearly understood (thanks, Kyle - comment 6), and could have been fixed with a very small and obviously correct patch (comment 8). However, on examination of similar code fragments we found that the equivalent problem - and hence the potential for corruption - was present in multiple places, and so we're fixing them all pre-emptively; and in discussion with Behdad, we decided to rewrite the loops in a form that makes the indexing and loop-termination logic easier to follow, as it was a bit confusing as originally written. All this makes for a larger and more complex-looking patch, but a better result.

I strongly suspect bug 707095 (newly filed from crash-stats) is a manifestation of this bug, and so those reports should die away once the fix here gets out to users.
Comment 28 Behdad Esfahbod 2011-12-02 11:00:16 PST
I agree with Jonathan.  We're both much more confident in the code being added here than just trying to fix the old code with minimal changes.
Comment 29 Matt Evans [:mevans] 2011-12-03 14:56:42 PST
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0a1) Gecko/20111203 Firefox/11.0a1
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0a2) Gecko/20111203 Firefox/10.0a2
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0) Gecko/20100101 Firefox/9.0

Loaded websites listed in the Desciption many times. Crash was easily reproducible with FF8. Validated on Win7 across all channels.
Comment 30 Daniel Veditz [:dveditz] 2011-12-07 14:44:04 PST
Harfbuzz didn't exist on the 1.9.2 branch so this bug should not affect it.
Comment 31 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2012-01-15 13:34:34 PST
*** Bug 707095 has been marked as a duplicate of this bug. ***
Comment 32 Behdad Esfahbod 2012-02-21 18:09:27 PST
This has been upstreamed already btw.

Note You need to log in before you can comment on or make changes to this bug.