Last Comment Bug 883514 - (CVE-2013-1732) Global buffer overflow (read 4) at nsFloatManager::GetFlowArea() with multicol, list, floats
(CVE-2013-1732)
: Global buffer overflow (read 4) at nsFloatManager::GetFlowArea() with multico...
Status: RESOLVED FIXED
[asan][adv-main24+][adv-esr1709+]
: csectype-bounds, sec-critical
Product: Core
Classification: Components
Component: Layout: Floats (show other bugs)
: 23 Branch
: x86_64 Linux
: -- critical (vote)
: mozilla26
Assigned To: Daniel Holbert [:dholbert]
:
Mentors:
Depends on:
Blocks: 913162
  Show dependency treegraph
 
Reported: 2013-06-15 03:04 PDT by Aki Helin
Modified: 2014-11-19 20:11 PST (History)
9 users (show)
dveditz: sec‑bounty+
dholbert: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
wontfix
wontfix
+
verified
+
verified
+
verified
24+
verified
24+
fixed
fixed
fixed


Attachments
original testcase (aborts debug builds when resized or when many copies are loaded) (15.86 KB, text/html)
2013-06-15 03:04 PDT, Aki Helin
no flags Details
stack (5.26 KB, text/plain)
2013-06-16 09:17 PDT, Mats Palmgren (:mats)
no flags Details
backtrace of problematic RemoveTrailingRegions() call, followed by the abort (35.09 KB, text/plain)
2013-07-19 17:21 PDT, Daniel Holbert [:dholbert]
no flags Details
backtrace of problematic RemoveTrailingRegions() call, followed by the abort (34.22 KB, text/plain)
2013-07-19 17:24 PDT, Daniel Holbert [:dholbert]
no flags Details
frametree_at_abort.txt (29.06 KB, text/plain)
2013-07-19 17:32 PDT, Daniel Holbert [:dholbert]
no flags Details
testcase 2 (reduced) (aborts debug builds when loaded) (358 bytes, text/html)
2013-07-22 13:33 PDT, Daniel Holbert [:dholbert]
no flags Details
testcase 3 (reduced) (aborts debug builds when loaded) (300 bytes, text/html)
2013-07-22 15:53 PDT, Daniel Holbert [:dholbert]
no flags Details
band-aid (probably not what we actually want to land) (1.03 KB, patch)
2013-07-22 17:10 PDT, Daniel Holbert [:dholbert]
no flags Details | Diff | Review
slightly better fix? (1.26 KB, patch)
2013-07-22 17:39 PDT, Daniel Holbert [:dholbert]
no flags Details | Diff | Review
fix v1 (1.74 KB, patch)
2013-07-23 14:00 PDT, Daniel Holbert [:dholbert]
dbaron: review+
dveditz: approval‑mozilla‑aurora+
dveditz: approval‑mozilla‑beta+
dveditz: approval‑mozilla‑esr17+
dveditz: sec‑approval+
Details | Diff | Review

Description Aki Helin 2013-06-15 03:04:31 PDT
Created attachment 763086 [details]
original testcase (aborts debug builds when resized or when many copies are loaded)

Opening several of the attached page at once causes ASan to spot a global buffer overflow in current aurora- and beta asan builds, most of the time. The read is aligned and pretty far away from adjecent objects, so filing as a potential security issue since it's not obvious we're always reading something fixed, like padding bytes, with fixed outcome.

The original even larger testcase needed just two of them to trigger this reliably. The attached somewhat reduced page still hits this most of the time. Let me know if you have trouble reproducing, because this could be related to machine speed etc.

$ opt/firefox-aurora-asan/firefox ff-gbof-getflowarea.html{,,,,,,,,,,,,,,} 2>&1 | symbolize | c++filt
=================================================================
==14198== ERROR: AddressSanitizer global-buffer-overflow on address 0x7ff38e6b69a0 at pc 0x7ff386ed8bb1 bp 0x7fff9afa6740 sp 0x7fff9afa6738
READ of size 4 at 0x7ff38e6b69a0 thread T0
    #0 0x7ff386ed8bb0 in nsFloatManager::GetFlowArea(int, nsFloatManager::BandInfoType, int, nsRect, nsFloatManager::SavedState*) const /home/aki/src/mozilla-aurora/layout/generic/nsFloatManager.cpp:140
0x7ff38e6b69a0 is located 32 bytes to the left of global variable 'mozilla::sse_private::sse3_enabled (/home/aki/src/mozilla-aurora/objdir-ff-asan/xpcom/build/SSE.cpp)' (0x7ff38e6b69c0) of size 1
0x7ff38e6b69a0 is located 24 bytes to the right of global variable 'nsTArrayHeader::sEmptyHdr (/home/aki/src/mozilla-aurora/objdir-ff-asan/xpcom/build/nsTArray.cpp)' (0x7ff38e6b6980) of size 8
  'nsTArrayHeader::sEmptyHdr (/home/aki/src/mozilla-aurora/objdir-ff-asan/xpcom/build/nsTArray.cpp)' is ascii string ''
Shadow byte and word:
  0x1ffe71cd6d34: f9
  0x1ffe71cd6d30: 00 f9 f9 f9 f9 f9 f9 f9
More shadow bytes:
  0x1ffe71cd6d10: 04 f9 f9 f9 f9 f9 f9 f9
  0x1ffe71cd6d18: 01 f9 f9 f9 f9 f9 f9 f9
  0x1ffe71cd6d20: 04 f9 f9 f9 f9 f9 f9 f9
  0x1ffe71cd6d28: 00 00 00 f9 f9 f9 f9 f9
=>0x1ffe71cd6d30: 00 f9 f9 f9 f9 f9 f9 f9
  0x1ffe71cd6d38: 01 f9 f9 f9 f9 f9 f9 f9
  0x1ffe71cd6d40: 01 f9 f9 f9 f9 f9 f9 f9
  0x1ffe71cd6d48: 01 f9 f9 f9 f9 f9 f9 f9
  0x1ffe71cd6d50: 01 f9 f9 f9 f9 f9 f9 f9
Stats: 150M malloced (212M for red zones) by 600810 calls
Stats: 12M realloced by 34962 calls
Stats: 102M freed by 265916 calls
Stats: 0M really freed by 0 calls
Stats: 400M (102447 full pages) mmaped in 100 calls
  mmaps   by size class: 8:524256; 9:57337; 10:24570; 11:16376; 12:4096; 13:4096; 14:1280; 15:512; 16:1152; 17:128; 18:32; 19:8; 20:4;
  mallocs by size class: 8:502183; 9:54311; 10:19963; 11:14517; 12:3446; 13:3628; 14:1128; 15:394; 16:1111; 17:100; 18:17; 19:8; 20:4;
  frees   by size class: 8:202047; 9:33734; 10:12283; 11:11046; 12:2220; 13:2726; 14:536; 15:272; 16:944; 17:90; 18:11; 19:4; 20:3;
  rfrees  by size class:
Stats: malloc large: 129 small slow: 2773
==14198== ABORTING
Comment 1 Mats Palmgren (:mats) 2013-06-16 09:17:48 PDT
Created attachment 763255 [details]
stack

In a local m-c debug build on Linux64 I get a fatal assertion:

###!!! ABORT: bad state: 'floatCount <= mFloats.Length()', file layout/generic/nsFloatManager.cpp, line 131

(gdb) list
126       // Determine the last float that we should consider.
127       uint32_t floatCount;
128       if (aState) {
129         // Use the provided state.
130         floatCount = aState->mFloatInfoCount;
131         NS_ABORT_IF_FALSE(floatCount <= mFloats.Length(), "bad state");
132       } else {
133         // Use our current state.
134         floatCount = mFloats.Length();
135       }
(gdb) p floatCount
$10 = 1
(gdb) p this
$11 = (const nsFloatManager *) 0x7fffb3adb1c0
(gdb) p this->mFloats.Length()
$12 = 0
Comment 2 Mats Palmgren (:mats) 2013-06-16 17:12:10 PDT
The assertion also occurs in FF21 and ESR17 debug builds on Linux64.
I wonder it's a regression from bug 25888?
Comment 3 Mats Palmgren (:mats) 2013-06-17 07:44:33 PDT
Jet, can you find an owner for this bug please?
Comment 4 Daniel Veditz [:dveditz] 2013-06-19 10:31:17 PDT
dbaron: do you agree w/Mats that this might be related to bug 25888?
Comment 5 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-06-19 14:48:40 PDT
So far I don't see the abort locally, though it's possible that's due to patches I have locally.
Comment 6 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-06-19 14:49:56 PDT
(In reply to Daniel Veditz [:dveditz] from comment #4)
> dbaron: do you agree w/Mats that this might be related to bug 25888?

Is it important to know that?  I'd rather spend the time trying to fix it than figuring that out.
Comment 7 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-06-19 17:54:42 PDT
I don't see the ABORT in a clean tree either.
Comment 8 Aki Helin 2013-06-21 23:34:36 PDT
dbaron, did you open the page in multiple tabs? This won't reproduce if you open it just once. 

This trace has come up multiple times over the last week on my test machine, so the bug should still be there. The repro might depend on on build and/or moon phase though.
Comment 9 Daniel Veditz [:dveditz] 2013-06-26 10:28:06 PDT
dbaron: if you'd like to try an ASAN build (often makes this type of problem more deterministic) you can find them at
https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-asan/

There are also debug asan builds for linux.
Comment 10 Aki Helin 2013-07-16 04:20:20 PDT
FWIW, this seems to still reproduce in aurora built today. A few more pages were needed to reproduce reliably on a slower Core 2 Duo laptop, so it could be useful to try this with thread sanitizer.
Comment 11 Daniel Veditz [:dveditz] 2013-07-17 18:00:26 PDT
> (In reply to Daniel Veditz [:dveditz] from comment #4)
> > dbaron: do you agree w/Mats that this might be related to bug 25888?
> 
> Is it important to know that?  I'd rather spend the time trying to fix it
> than figuring that out.

Sorry, reflexive question trying to figure out which releases might need fixing. But Mats already answered that ESR and Release channels are affected (assuming the assert represents the same bad state, which seems likely).
Comment 12 Daniel Veditz [:dveditz] 2013-07-18 13:40:29 PDT
Rather than an unowned sec-crit I'm going to interpret comment 6 as volunteering.
Comment 13 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-07-18 23:12:09 PDT
In turn, I'm going to give it to dholbert.
Comment 14 Daniel Holbert [:dholbert] 2013-07-19 14:03:45 PDT
In my ASAN debug+opt build, I'm seeing the abort that mats mentioned in comment 1 (about 1/3 of the time that I start Firefox as-directed in comment 0, with the "{,,,,,,,,,,,,,,}" to get many copies of the testcase).

I haven't yet seen the global-buffer-overflow. I wonder if that only happens if we continue executing after the (debug-only) abort...?
Comment 15 Daniel Holbert [:dholbert] 2013-07-19 14:06:56 PDT
(In reply to Daniel Holbert [:dholbert] from comment #14)
> I haven't yet seen the global-buffer-overflow. I wonder if that only happens
> if we continue executing after the (debug-only) abort...?

(Looks like that's correct -- the abort is checking that floatCount <= mFloats.Length(), and then the buffer-overflow is a few lines later, when we do mFloats[floatCount-1].  If floatCount is too large, it makes sense that mFloats[floatCount-1] would likely be an out-of-bounds read.)
Comment 16 Lukas Blakk [:lsblakk] use ?needinfo 2013-07-19 14:21:01 PDT
If low risk fix is ready within the next week we could consider it for Beta uplift.
Comment 17 Daniel Holbert [:dholbert] 2013-07-19 15:36:57 PDT
Per comment 1: when this fails, mFloatInfoCount == floatCount == 1, whereas mFloats.Length() is 0.

Here's approximately how we get in that situation:
 (a) Initially, nsFloatManager::mFloats has length 1. (It contains 1 frame.)

 (b) We call nsFloatManager::PushState, which (among other things) saves a snapshot of that float-count, in nsBlockReflowState::mFloatManagerStateBefore. So that state now has mFloatInfoCount = 1.

 (c) We call nsFloatManager::RemoveTrailingRegions, which *removes the frame* from nsFloatManager::mFloats.

 (d) Now, we call GetFloatAvailableSpaceWithState(), passing in mFloatManagerStateBefore. This assumes (per the ABORT_IF_FALSE) that our float manager still has at least as many floats as it had in part (b). **But it does not**, due to the RemoveTrailingRegions call.

So the RemoveTrailingRegions call is what's giving us trouble here, it would appear.
Comment 18 Daniel Holbert [:dholbert] 2013-07-19 15:39:52 PDT
Also: We get lots of calls to RemoveTrailingRegions that leave mFloats the same length (i.e. keep it at 0, or keep it at 1), likely for other tabs, before we hit the problematic call.

In my debugging printf's, I only see one call to RemoveTrailingRegions that shrinks mFloats from 1 to 0, and it's immediately before we explode.
Comment 19 Daniel Holbert [:dholbert] 2013-07-19 17:14:55 PDT
The RemoveTrailingRegions call is preceded by this comment:
> 6976     // [...] It is safe to do this here
> 6977     // because we know from here on the float manager will only be
> 6978     // used for its XMost and YMost, not to place new floats and
> 6979     // lines.

This comment is apparently incorrect; the float manager is *not* just  used for XMost and YMost. It's also used for the "GetFloatAvailableSpaceWithState" call, which touches the mFloats list. This means it is *not* safe.
Comment 20 Daniel Holbert [:dholbert] 2013-07-19 17:21:37 PDT
Created attachment 778739 [details]
backtrace of problematic RemoveTrailingRegions() call, followed by the abort

MXR link for that comment and the RemoveTrailingRegions invocation:
 http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsBlockFrame.cpp#6976

I'm attaching a backtrace from the faulty RemoveTrailingRegions() invocation (note that in this case, it was triggered by a window-resize).

I typed "fin" a few times to finish execution of the deepest stackframes, and I hit the abort while finishing up a block reflow.
Comment 21 Daniel Holbert [:dholbert] 2013-07-19 17:24:18 PDT
Created attachment 778742 [details]
backtrace of problematic RemoveTrailingRegions() call, followed by the abort

(cleaned up my gdb log slightly, to prune out a spammy warning that gdb prints along with every null pointer for some reason)
Comment 22 Daniel Holbert [:dholbert] 2013-07-19 17:32:53 PDT
Created attachment 778748 [details]
frametree_at_abort.txt

(For reference, here's the frame tree from that same debugging session, when we hit the abort.)
Comment 23 Daniel Holbert [:dholbert] 2013-07-22 12:22:50 PDT
Also: I can reliably trigger the abort by loading the testcase (just one copy) and then resizing the window horizontally.
Comment 24 Daniel Holbert [:dholbert] 2013-07-22 13:33:16 PDT
Created attachment 779375 [details]
testcase 2 (reduced) (aborts debug builds when loaded)

Here's a reduced testcase. This aborts my debug build when I load it.
Comment 25 Daniel Holbert [:dholbert] 2013-07-22 15:53:54 PDT
Created attachment 779463 [details]
testcase 3 (reduced) (aborts debug builds when loaded)

Here's a slightly-more-reduced testcase.
Comment 26 Daniel Holbert [:dholbert] 2013-07-22 17:10:09 PDT
Created attachment 779508 [details] [diff] [review]
band-aid (probably not what we actually want to land)

Here's a mostly-for-debugging purposes patch, which prevents the abort & out-of-bounds read. (It just changes the MOZ_ASSERT'ed condition into an "if" check, and averts disaster (and spams a NS_ERROR assertion) if we're about to explode.)
Comment 27 Daniel Holbert [:dholbert] 2013-07-22 17:39:15 PDT
Created attachment 779524 [details] [diff] [review]
slightly better fix?

This patch also fixes the issue. It's probably a little better than the "band-aid", but it still seems hacky...

This patch fixes the inconsistency between phase (c) and phase (d) in comment 17 -- that is, when we (potentially) shrink we shrink mFloats, **immediately** update mFloatManagerStateBefore to be sure it's not too large.

I don't like this, because it seems like it could result in incorrect (albeit non-crashing) behavior in GetFloatAvailableSpaceWithState, in some circumstances.
Comment 28 Daniel Holbert [:dholbert] 2013-07-23 12:44:22 PDT
Actually, even better: if we can just reverse the order of (c) and (d) from comment 17, then we'd be OK! (Then, the quote in comment 19 would actually be correct RE "It is safe")

In particular: if we delay nsBlockFrame::Reflow's CheckFloats() call (which makes the problematic  RemoveTrailingRegions call) to happen a bit later -- just before ComputeFinalSize(), *after* we reflow our bullets -- that would take care of this bug, I think.

(Note that the CheckFloats() call originally lived just before ComputeFinalSize() when we initially added it, in the patch on bug 282173.  At that point, there wasn't any bullet-reflow code there -- but the patch that later inserted that code (in bug 367332) seems to care more about being *before ComputeFinalSize* than about being *after CheckFloats*.  So I suspect the ordering of the bullet code vs. CheckFloats() was & is arbitrary.)

Try push (without any mention of this bug or any tempting security-related comments), to see if that change breaks anything:
 https://tbpl.mozilla.org/?tree=Try&rev=ed8a4d891a59
Comment 29 Daniel Holbert [:dholbert] 2013-07-23 14:00:29 PDT
Created attachment 779995 [details] [diff] [review]
fix v1

Here's that patch (now with a commit message included).

Optimistically requesting review. (If anything goes wrong with the try push, I'll cancel the review request.)
Comment 30 Daniel Holbert [:dholbert] 2013-07-23 16:17:34 PDT
Try run is green (all unit tests, linux & windows, debug + opt), modulo two jobs that hit intermittent failures.
Comment 31 Aki Helin 2013-07-24 11:24:25 PDT
Aurora + fix v1 had no issues with my other repros for this.
Comment 32 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-07-25 17:58:07 PDT
So what seems odd to me is that the RemoveTrailingRegions call is removing a float that was present in the float manager before we started reflowing the block.  (Or at least that's what it seems like.)  Is that what's happening, and do you know why?  Did some previous block fail to remove it?

I'd have expected this RemoveTrailingRegions call to be removing only floats that weren't placed prior to starting reflow of this block.

(So it seems like the layout here would be incorrect both before and after the patch.)
Comment 33 Daniel Holbert [:dholbert] 2013-07-25 18:07:12 PDT
(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #32)
> So what seems odd to me is that the RemoveTrailingRegions call is removing a
> float that was present in the float manager before we started reflowing the
> block.  (Or at least that's what it seems like.)  Is that what's happening

That is indeed what's happening.

> and do you know why?  Did some previous block fail to remove it?

I don't yet know why, but I'll look into it.
Comment 34 Daniel Holbert [:dholbert] 2013-07-26 12:35:51 PDT
A few frame dumps coming up, using "testcase 3 (reduced)".

In these dumps, the block frame that we're reflowing when everything falls apart is 0x7f1c5e745b88 (which I'll call "the b88 block"). The float that gets removed (and causes trouble) is the nsImageFrame.

When we enter the doomed nsBlockFrame::Reflow call, that nsImageFrame is the one float tracked by the reflow state's nsFloatManager::mFloats, and our frame tree looks like this:

> ColumnSet(ul)(1)@0x7f1c5e7292f0 {0,0,38400,2460} vis-overflow=0,-473,38400,2933 scr-overflow=0,-473,38400,2933 [state=0002024000400631] [content=0x7f1c579ae1a0] [sc=0x7f1c5e5bfb50]<
[...]
>   Block(ul)(1)@0x7f1c5e798738 next=0x7f1c5e798888 prev-in-flow=0x7f1c5e729220 next-in-flow=0x7f1c5e798888 {5243,0,1883,1140} [state=0002100000100004] [content=0x7f1c579ae1a0] [sc=0x7f1c5e7293f8:-moz-column-content]<
[...]
>     OverflowContainersList 0x7f1c5e798bc0 <
>       Block(li)(3)@0x7f1c5e798920 prev-in-flow=0x7f1c5e745b88 next-in-flow=0x7f1c5e798a90 {0,0,1883,0} vis-overflow=0,0,1883,1440 scr-overflow=0,0,1883,1440 [state=0000120000100284] [content=0x7f1c69568a10] [sc=0x7f1c5e729888]<
>       >
>     >
>   >
>   Block(ul)(1)@0x7f1c5e798888 prev-in-flow=0x7f1c5e798738 {8086,0,1883,1440} [state=0000100000100405] [content=0x7f1c579ae1a0] [sc=0x7f1c5e7293f8:-moz-column-content]<
>     line 0x7f1c5e7971c8: count=1 state=block,clean,prevmarginclean,impacted,not wrapped,before:nobr,after:nobr[0x18] {0,0,0,0} <
>       Block(li)(3)@0x7f1c5e745b88 next-in-flow=0x7f1c5e798920 {0,1140,1883,1140} vis-overflow=-840,0,2723,1140 scr-overflow=-840,0,2723,1140 [state=0000120042100601] [content=0x7f1c69568a10] [sc=0x7f1c5e729888]<
>         Overflow-lines 0x7f1c55517d00/0x7f1c55517d10 <
>           line 0x7f1c5e797078: count=1 state=block,dirty,prevmargindirty,not impacted,not wrapped,before:nobr,after:nobr[0x20b] {0,0,0,0} <
>             Block(div)(1)@0x7f1c5e745ec0 next=0x7f1c5e797020 {0,0,1883,0} [state=0000120000100600] [content=0x7f1c695c5470] [sc=0x7f1c5e744dc8]<
>             >
>           >
>           line 0x7f1c5e7970b8: count=1 state=inline,dirty,prevmargindirty,not impacted,not wrapped,before:nobr,after:nobr[0xd43] {0,0,0,0} vis-overflow=-840,427,360,473 scr-overflow=-840,427,360,473 <
>             Placeholder(img)(3)@0x7f1c5e797020 {0,900,0,0} [state=0000000000100000] [content=0x7f1c59ddbe40] [sc=0x7f1c5e729be8:-moz-non-element] outOfFlowFrame=ImageFrame(img)(3)@0x7f1c5e745f58
>           >
>         >
>         BulletList 0x7f1c5e797178 <
>           Bullet(li)(3)@0x7f1c5e7970f8 {-840,427,360,473} [state=0000004000000000] [content=0x7f1c69568a10] [sc=0x7f1c5e744948:-moz-list-bullet]
>         >
>       >
>     >
>     OverflowContainersList 0x7f1c5e798878 <
>       Block(li)(3)@0x7f1c5e798a90 next=0x7f1c5e7989f8 prev-in-flow=0x7f1c5e798920 {0,0,1883,0} vis-overflow=0,0,1883,1440 scr-overflow=0,0,1883,1440 [state=0000120000100284] [content=0x7f1c69568a10] [sc=0x7f1c5e729888]<
>         FloatList 0x7f1c5e798b18 <
>           ImageFrame(img)(3)@0x7f1c5e745f58 {0,0,1440,1440} [state=0000064100200300] [content=0x7f1c59ddbe40] [sc=0x7f1c5e745c68]
>         >
>       >

In particular, note that the ImageFrame is a child of the b88 block's next-in-flow's next-in-flow. (where both of those next-in-flows happen to be in OverflowContainersLists.)

When we hit CheckFloats() call in this reflow of b88, the frame tree has changed to look like this (starting with the last "Block(ul)" above, because nothing above that has changed):

>   Block(ul)(1)@0x7f1c5e798888 prev-in-flow=0x7f1c5e798738 {8086,0,1883,1440} [state=0000100000100405] [content=0x7f1c579ae1a0] [sc=0x7f1c5e7293f8:-moz-column-content]<
>     line 0x7f1c5e7971c8: count=1 state=block,clean,prevmarginclean,impacted,not wrapped,before:nobr,after:nobr[0x18] {0,0,0,0} <
>       Block(li)(3)@0x7f1c5e745b88 next-in-flow=0x7f1c5e798920 {0,1140,1883,1140} vis-overflow=-840,0,2723,1140 scr-overflow=-840,0,2723,1140 [state=0000120046300601] [content=0x7f1c69568a10] [sc=0x7f1c5e729888]<
>         line 0x7f1c5e797078: count=1 state=block,clean,prevmarginclean,impacted,not wrapped,before:nobr,after:nobr[0x318] {0,0,1883,0} <
>           Block(div)(1)@0x7f1c5e745ec0 {0,0,1883,0} [state=0000120000100200] [content=0x7f1c695c5470] [sc=0x7f1c5e744dc8]<
>           >
>         >
>         Overflow-lines 0x7f1c55517d00/0x7f1c55517d10 <
>           line 0x7f1c5e7970b8: count=1 state=inline,dirty,prevmargindirty,impacted,not wrapped,before:nobr,after:nobr[0xd53] {0,0,0,0} vis-overflow=600,427,360,473 scr-overflow=600,427,360,473 <
>             Placeholder(img)(3)@0x7f1c5e797020 {1440,900,0,0} [state=0000000000100000] [content=0x7f1c59ddbe40] [sc=0x7f1c5e729be8:-moz-non-element] outOfFlowFrame=ImageFrame(img)(3)@0x7f1c5e745f58
>           >
>         >
>         OverflowOutOfFlowList 0x7f1c5e798be0 <
>           ImageFrame(img)(3)@0x7f1c5e745f58 {0,0,1440,1440} [state=0000064100200300] [content=0x7f1c59ddbe40] [sc=0x7f1c5e745c68]
>         >
>         BulletList 0x7f1c5e797178 <
>           Bullet(li)(3)@0x7f1c5e7970f8 {600,427,360,473} [state=0000004000000000] [content=0x7f1c69568a10] [sc=0x7f1c5e744948:-moz-list-bullet]
>         >
>       >
>     >
>     OverflowContainersList 0x7f1c5e798878 <
>       Block(li)(3)@0x7f1c5e798a90 next=0x7f1c5e7989f8 prev-in-flow=0x7f1c5e798920 {0,0,1883,0} vis-overflow=0,0,1883,1440 scr-overflow=0,0,1883,1440 [state=0000120000100284] [content=0x7f1c69568a10] [sc=0x7f1c5e729888]<
>       >

Note that the nsImageFrame *is now a child* of our b88 block -- it's in its OverflowOutOfFlowList, which is why we end up removing it in CheckFloats()'s call to RemoveTrailingRegions().
Comment 35 Daniel Holbert [:dholbert] 2013-07-26 12:46:23 PDT
(Note that, while the b88 block doesn't initially have the floated ImageFrame as a child, it *does* have the *Placeholder* for the floated ImageFrame as a child. Then, it takes the float from the other block when it reflows that placeholder, and that's how we end up with that float in our OverflowOutOfFlowList.)
Comment 36 Daniel Holbert [:dholbert] 2013-07-26 14:42:09 PDT
So I'm fuzzy on the float-manager internals & what the invariants are, but one possible source of trouble here might be that we're reflowing a bit out of order here.  In particular, we reflow b88's two-hops-away next-continuation (a90) before we reflow b88 itself -- and that's troublemaking because b88 has a placeholder and a90 has the corresponding float.

IN MORE DETAIL: when we're reflowing b88's parent (the block "0x7f1c5e798888" in the frame dumps above):

 (a) we first call ReflowOverflowContainerChildren(), which reflows block a90, and its float-ImageFrame child.  (This is where the float gets added to the nsFloatManager's mFloats list.)

 (b) Then, we call ReflowDirtyLines() to reflow the "normal" child, block b88.  (When we reflow b88's PlaceholderFrame child, we reparent the float to be a child of b88, and it ends up in b88's OverflowOutOfFlowList and causes trouble when we hit CheckFloats.

So -- I'm calling that order "backwards" because we're reflowing b88 *after* a90, even though b88 is a *prev* continuation of a90. (two hops away, in the continuation chain)  And this also means we're reflowing the float before its placeholder.

I suspect that's bad.
Comment 37 Lukas Blakk [:lsblakk] use ?needinfo 2013-07-29 15:11:59 PDT
Didn't get this in time for FF23, wontfixing.
Comment 38 Al Billings [:abillings] 2013-08-15 13:15:17 PDT
Any updates here? This is pending review for almost a month now.
Comment 39 Daniel Holbert [:dholbert] 2013-08-15 18:33:28 PDT
[toggling needinfo=dbaron -- I seem to recall that he was probably OK with the patch, but wanted to understand the situation better]

Hopefully Comment 36 answers the review-question from 1st paragraph comment 32, at least to a first approximation.

I'll paraphrase that comment, to sum up & possibly make it more understandable: basically, the problem arises when a frame (like b88 in the dumps above) contains a placeholder whose corresponding floated-frame lives in a future continuation, and the continuation happens to be sitting in an overflow list on the original frame's parent.  (So, the frame and its continuation have the same parent.)  This means the continuation will get reflowed first (which gets the float tracked in the float manager). Then, when we get to actually reflowing the original frame (b88), we reparent the float (as part of reflowing the placeholder), and then we can stick the float on our (b88's) OverflowOutOfFlowList, which means the float will get removed in RemoveTrailingRegions -- even though it's been tracked by the float manager since we started reflowing our frame (b88). And then we're hosed when we hit GetFloatAvailableSpaceWithState(), since that assumes that our float manager hasn't removed anything since we (b88) entered reflow.

But if we delay the RemoveTrailingRegions call to happen *after* GetFloatAvailableSpaceWithState (as the attached patch does), then the float manager will still be intact for the duration of GetFloatAvailableSpaceWithState, and we won't be hosed.
Comment 40 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-08-19 19:59:39 PDT
Yeah, my problem is that I no longer remember what I thought about it and need to start over.  Hopefully later this week, but not tomorrow.
Comment 41 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-09-05 04:13:06 PDT
Comment on attachment 779995 [details] [diff] [review]
fix v1

I think this fix is good and we should go ahead and do it, but I think 
there's still another problem here that could lead to other sec-critical
bugs, and we should get a followup filed on fixing that.

r=dbaron, and sorry for taking so long
Comment 42 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-09-05 04:26:13 PDT
(In reply to Daniel Holbert [:dholbert] from comment #34)
> > ColumnSet(ul)(1)@0x7f1c5e7292f0 {0,0,38400,2460} vis-overflow=0,-473,38400,2933 scr-overflow=0,-473,38400,2933 [state=0002024000400631] [content=0x7f1c579ae1a0] [sc=0x7f1c5e5bfb50]<
> [...]
> >   Block(ul)(1)@0x7f1c5e798738 next=0x7f1c5e798888 prev-in-flow=0x7f1c5e729220 next-in-flow=0x7f1c5e798888 {5243,0,1883,1140} [state=0002100000100004] [content=0x7f1c579ae1a0] [sc=0x7f1c5e7293f8:-moz-column-content]<
> [...]
> >     OverflowContainersList 0x7f1c5e798bc0 <
> >       Block(li)(3)@0x7f1c5e798920 prev-in-flow=0x7f1c5e745b88 next-in-flow=0x7f1c5e798a90 {0,0,1883,0} vis-overflow=0,0,1883,1440 scr-overflow=0,0,1883,1440 [state=0000120000100284] [content=0x7f1c69568a10] [sc=0x7f1c5e729888]<
> >       >
> >     >
> >   >
> >   Block(ul)(1)@0x7f1c5e798888 prev-in-flow=0x7f1c5e798738 {8086,0,1883,1440} [state=0000100000100405] [content=0x7f1c579ae1a0] [sc=0x7f1c5e7293f8:-moz-column-content]<
> >     line 0x7f1c5e7971c8: count=1 state=block,clean,prevmarginclean,impacted,not wrapped,before:nobr,after:nobr[0x18] {0,0,0,0} <
> >       Block(li)(3)@0x7f1c5e745b88 next-in-flow=0x7f1c5e798920 {0,1140,1883,1140} vis-overflow=-840,0,2723,1140 scr-overflow=-840,0,2723,1140 [state=0000120042100601] [content=0x7f1c69568a10] [sc=0x7f1c5e729888]<

It also seems pretty broken here that we have a block whose next-in-flow is on the overflow containers list of it's parent's *prev-in-flow*.  (See the two "Block(li)".)

(In reply to Daniel Holbert [:dholbert] from comment #39)
> continuation have the same parent.)  This means the continuation will get
> reflowed first (which gets the float tracked in the float manager). Then,

Yeah, but we shouldn't be reflowing the next-continuation before the prev-continuation.  That seems to be where things go wrong.

I'd still like to see a second patch for this, maybe in a different bug.  (Perhaps hand off to Mats if he's willing and you have too many other things?  I suspect he has a reasonably good understanding of this code.)
Comment 43 Daniel Holbert [:dholbert] 2013-09-05 09:04:50 PDT
(In reply to David Baron [:dbaron] (needinfo? me) from comment #42)
> Yeah, but we shouldn't be reflowing the next-continuation before the
> prev-continuation.  That seems to be where things go wrong.

OK -- that's sort of what I suspected, at the beginning/end of comment 36. Makes sense.

> I'd still like to see a second patch for this, maybe in a different bug. 
> (Perhaps hand off to Mats if he's willing and you have too many other
> things?  I suspect he has a reasonably good understanding of this code.)

OK. I'll file a followup bug, also security-sensitive, because it'll be using the testcase from this bug and we still have users affected by this bug (for the time being).

Thanks!
Comment 44 Daniel Holbert [:dholbert] 2013-09-05 12:01:37 PDT
Landed on inbound: http://hg.mozilla.org/integration/mozilla-inbound/rev/11e825ffbe54

(Once its TBPL cycle completes successfully, I'll request backport approval.)
Comment 45 Daniel Holbert [:dholbert] 2013-09-05 12:16:17 PDT
Followup filed, per ends of comment 42 / comment 43: bug 913162.
Comment 46 Daniel Veditz [:dveditz] 2013-09-05 13:26:31 PDT
Please request sec-approval? before landing security patches -- we need to make sure we line up our ducks on branch landings. You should probably ask retroactively (along with branch landing approval) so that you can fill in the questions, which will help judge risk and necessity for the branches.
Comment 47 Daniel Veditz [:dveditz] 2013-09-05 13:27:24 PDT
Meant to include this link with previous comment
https://wiki.mozilla.org/Security/Bug_Approval_Process
Comment 48 Daniel Holbert [:dholbert] 2013-09-05 13:28:17 PDT
Oops! I totally knew that, and forgot. Apologies. I'll request retroactive approval to fill out the form.
Comment 49 Daniel Holbert [:dholbert] 2013-09-05 13:45:14 PDT
Comment on attachment 779995 [details] [diff] [review]
fix v1

[Security approval request comment]
> How easily could an exploit be constructed based on the patch?

Not easily. The patch reveals that lists and floats are involved, but not columns/pagination. (and the testcase that's required to trigger this is very fragile/specially-crafted)

> Do comments in the patch, the check-in comment, or tests included
> in the patch paint a bulls-eye on the security problem?

No.

> Which older supported branches are affected by this flaw?

All branches. (At least as far back as ver 17, per comment 2)

> If not all supported branches, which bug introduced the flaw?

N/A

> Do you have backports for the affected branches? If not, how different,
> hard to create, and risky will they be?

Yes; the attached patch applies cleanly to ESR17, so no custom backport-patches are required.

> How likely is this patch to cause regressions; how much testing does it need?

Unlikely to cause regressions; this is the sort of patch where, if it passes tests, it's almost certainly fine.  As noted in comment 28, the position of this line with respect to the code I'm moving it past appears to be arbitrary, so it shouldn't affect behavior. (And it only fixes the abort in this case because we already have a fragile/broken frame tree when we get to that point in the code, with this testcase; that's tracked in the followup bug 913162.)
Comment 50 Daniel Holbert [:dholbert] 2013-09-05 13:53:54 PDT
Comment on attachment 779995 [details] [diff] [review]
fix v1

Requesting approval for Aurora/Beta/ESR17.

[Approval Request Comment]
> If this is not a sec:{high,crit} bug, please state case for ESR consideration:
It's sec-critical.

> User impact if declined:
Security vulnerability

> Fix Landed on Version:
So far, just inbound, which is at version 26, though I'm requesting to be able to backport it to 25 and 24 (aurora and beta).

> Risk to taking this patch (and alternatives if risky):
Low, per last chunk of previous comment.

> String or UUID changes made by this patch:
None.
Comment 51 Daniel Veditz [:dveditz] 2013-09-05 16:27:39 PDT
Comment on attachment 779995 [details] [diff] [review]
fix v1

sec-approval+
a=dveditz for aurora/beta/esr-17
Comment 52 Daniel Holbert [:dholbert] 2013-09-05 17:00:15 PDT
Thanks! I'll land the backports tomorrow, if RyanVM doesn't beat me to it.
Comment 53 Ed Morley [:emorley] 2013-09-06 08:27:28 PDT
https://hg.mozilla.org/mozilla-central/rev/11e825ffbe54
Comment 55 Daniel Holbert [:dholbert] 2013-09-06 09:41:03 PDT
BTW: for sanity/verification's sake, I loaded "testcase 3" in aurora, beta, and esr17 debug builds with the patch & without the patch (current tip vs. tip-1).

They all behaved as I'd hope/expect. (They aborted without the patch, and they loaded with no issues with the patch.)
Comment 57 Matt Wobensmith [:mwobensmith][:matt:] 2013-09-11 19:52:19 PDT
Confirmed crash on ASan m-c from 2013-06-06.
Verified fixed on ASan builds of FF17esr, FF24, FF25 and FF26, 2013-09-11.
Comment 58 bhavana bajaj [:bajaj] 2013-10-15 14:55:36 PDT
NI on :dholbert to help with a backport on mozilla-b2g18 as the status is affected.
Comment 59 Daniel Holbert [:dholbert] 2013-10-15 16:25:49 PDT
Comment on attachment 779995 [details] [diff] [review]
fix v1

This trunk patch applies cleanly to b2g18. Requesting approval‑mozilla‑b2g18 so that it can land there.

[Approval Request Comment]
See comment 50 for approval request responses.
Comment 60 Ryan VanderMeulen [:RyanVM] 2013-10-16 05:56:39 PDT
Comment on attachment 779995 [details] [diff] [review]
fix v1

a=bajaj over email
Comment 61 Ryan VanderMeulen [:RyanVM] 2013-10-16 06:08:30 PDT
https://hg.mozilla.org/releases/mozilla-b2g18/rev/9afbbe05cc1b

Note You need to log in before you can comment on or make changes to this bug.