Closed Bug 942794 Opened 12 years ago Closed 3 years ago

global buffer overflow (read) at nsFloatManager::GetFlowArea, preceded by ###!!! ABORT: bad state: 'floatCount <= mFloats.Length()

Categories

(Core :: Layout: Floats, defect, P3)

x86_64
Linux
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: aki.helin, Unassigned)

References

(Blocks 1 open bug)

Details

(5 keywords, Whiteboard: [asan])

Attachments

(8 files)

Attached file gbof-getflowarea.html
Address sanitizer spots the following error when the attached page is opened in a Firefox. This happens at least in current mozilla-central build and an older build from last month on my 64-bit Linux machine. ==840==ERROR: AddressSanitizer: global-buffer-overflow on address 0x7fcde1d17640 at pc 0x7fcdda949ff6 bp 0x7fffba95a4e0 sp 0x7fffba95a4d8 READ of size 4 at 0x7fcde1d17640 thread T0 ==840==WARNING: Trying to symbolize code, but external symbolizer is not initialized! #0 0x7fcdda949ff5 in nsFloatManager::GetFlowArea(int, nsFloatManager::BandInfoType, int, nsRect, nsFloatManager::SavedState*) const /home/aki/src/mozilla-aurora/layout/generic/nsFloatManager.cpp:140 #1 0x7fcdda91b256 in nsBlockReflowState::GetFloatAvailableSpaceWithState(int, nsFloatManager::SavedState*) const /home/aki/src/mozilla-aurora/layout/generic/nsBlockReflowState.cpp:258 #2 0x7fcdda8fd3c2 in nsBlockFrame::ReflowBullet(nsIFrame*, nsBlockReflowState&, nsHTMLReflowMetrics&, int) /home/aki/src/mozilla-aurora/layout/generic/nsBlockFrame.cpp:6788 #3 0x7fcdda8f6f65 in nsBlockFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&) /home/aki/src/mozilla-aurora/layout/generic/nsBlockFrame.cpp:1076 [...] #96 0x48299c in _start ??:0 0x7fcde1d17640 is located 32 bytes to the left of global variable 'nsComponentManagerImpl::gComponentManager' from '/home/aki/src/mozilla-aurora/xpcom/build/nsXPComInit.cpp' (0x7fcde1d17660) of size 8 0x7fcde1d17640 is located 24 bytes to the right of global variable 'nsTArrayHeader::sEmptyHdr' from '/home/aki/src/mozilla-aurora/xpcom/glue/nsTArray.cpp' (0x7fcde1d17620) of size 8 SUMMARY: AddressSanitizer: global-buffer-overflow ??:0 ?? [...] Filing as a potential security bug based on error type.
In a debug build I get: ###!!! ABORT: bad state: 'floatCount <= mFloats.Length()', file layout/generic/nsFloatManager.cpp, line 131 139 if (floatCount == 0 || 140 (mFloats[floatCount-1].mLeftYMost <= top && 141 mFloats[floatCount-1].mRightYMost <= top)) { 142 return nsFlowAreaRect(aContentArea.x, aYOffset, aContentArea.width, 143 aHeight, false); 144 } (gdb) p floatCount $1 = <optimized out> (gdb) p mFloats.Length() $2 = 0 The above appears to be a (safe) out-of-bounds read, and later in the same function the for-loop also appears to be the same. When I comment out the NS_ABORT_IF_FALSE and change the above to "if (0)" and add "mFloats.Length() > 0" to the loop condition, to force it to continue executing with this error, then it doesn't crash. So it appears to be non-exploitable at first glance.
Priority: -- → P3
Whiteboard: [asan]
I would guess this is similar to bug 883514. Daniel, could you take a look?
Flags: needinfo?(dholbert)
I haven't dug into the code yet, but I partially reduced the testcase yesterday. (I actually *thought* I reduced it significantly further, but it turned out that at some point I switched from hitting this bug's fatal assertion to hitting an unrelated GFX fatal assertion. So, this testcase I'm attaching here is the smallest testcase that I'd gotten in that process that still hit this bug's assertion, and I filed bug 943978 on the smaller testcase with the GFX assertion.)
Flags: needinfo?(dholbert)
Flags: needinfo?(dholbert)
Flags: needinfo?(dholbert)
Summary: global buffer overflow (read) at nsFloatManager::GetFlowArea → global buffer overflow (read) at nsFloatManager::GetFlowArea, preceded by ###!!! ABORT: bad state: 'floatCount <= mFloats.Length()
Flags: needinfo?(dholbert)
Flags: needinfo?(dholbert)
Here's a patch that I used to investigate this a bit. This catches the error a little sooner, and it provides a handy place for a breakpoint to poke around and see what's going on. (We may want to land an actual assertion along these lines at some point.)
I'm attaching 2 frametrees, generated from testcase 4, with my debugging patch applied. This first one is from when we enter the fateful nsBlockFrame::Reflow() call, which is doomed to fail the NS_ERROR in my debugging patch. As shown in the gdb snippet at the beginning of this frametree: At this point (when we enter Reflow for this block frame), we're tracking 1 float.
...and here's the second frametree, taken from later in the same reflow, when we hit the NS_ERROR in the debugging patch. At this point, we've just reflowed a child, and it cleared our float manager's mFloats array, via a call to RemoveTrailingRegions. (And that's going to get us into trouble, because we're expecting that float to still be there.)
So the float in question is for the text "f1". In the first frametree, it lives in an OverflowContainersList > OverflowContainersList > FloatList, which is all chained off of the Block(ol), which is the parent of the Block(li) that we're reflowing. By the time we get to the NS_ERROR, we've moved the float to be in a OverflowOutOfFlowList on the child of that Block(li) (the child who we reflow right before hitting the NS_ERROR in my debugging patch). So the problem, at a high level, is that while a block is being reflowed, we're moving a float from its parent's OverflowContainersList-subtree to its child's OverflowOutOfFlowList list. And so the child thinks it's allowed to yank that float from the float manager without anyone noticing -- but it can't, because the float was actually in the float manager long before the child started reflowing.
The frametrees show that we're also suffering from the problem described in bug 913162. Quoting bug 913162 comment 1, the brokenness described in that bug is when "we have a block whose next-in-flow is on the overflow containers list of its parent's *prev-in-flow*". And frametree 1 shows that we have exactly that here -- the Block(li) 70e0's next-in-flow is on the overflow containers list of its parent's prev-in-flow. Selectively quoting frametree 1 to demonstrate: { Block(ol)(1)@0x7f5d5e2c2020 next=0x7f5d5e2c25d0 prev-in-flow=0x7f5d5e14c228 next-in-flow=0x7f5d5e2c25d0 [...] OverflowContainersList 0x7f5d5e2c20b8 < Block(li)(3)@0x7f5d5e2c24f8 prev-in-flow=0x7f5d5e1f70e0 next-in-flow=0x7f5d5e2c2b58 Block(ol)(1)@0x7f5d5e2c25d0 prev-in-flow=0x7f5d5e2c2020 [...] Block(li)(3)@0x7f5d5e1f70e0 next-in-flow=0x7f5d5e2c24f8 } So I think this is a version of bug 913162.
(And comment 2 is correct -- this is a bit like bug 883514. That bug's delay of CheckFloats doesn't help us here, though, because here, it's a *child frame's* CheckFloats call that ends up hosing us.)
(The best way forward here is probably to figure out what's behind bug 913162. Mats, could you poke around there when you get a chance?)
Group: core-security → layout-core-security
Blocks: 1209952
Severity: critical → S2

I'm not seeing any ASAN issues with the attached testcases anymore (with unprefixed -moz-column styling, so that part of the testcase still does something).

The first 3 testcases (reporter's testcase & my testcase 2 and 3) do hang Firefox for some substantial amount of time, but I didn't get any crashes or ASAN issues after letting them hang for a little while. Nor do I get any assertions or aborts in a debug build. So I think this is WORKSFORME.

I'll get a reduced testcase for the hang and spin that off as a new bug.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(dholbert)
Resolution: --- → WORKSFORME

(In reply to Daniel Holbert [:dholbert] from comment #13)

I'll get a reduced testcase for the hang and spin that off as a new bug.

Filed as bug 1795182.

Conveniently, testcase 4 loads pretty quickly, so I'll plan on landing that as a crashtest here. (The earlier ones could be crashtests but they'd be kind of expensive in terms of compute time, due to that aforementioned hang.)

And I think we can remove the security flag, given that this seems to be worksforme (no issues with current ASAN builds) and was probably non-exploitable when filed per comment 1.

Group: layout-core-security

BTW testcase 4 (modified to use both prefixed & unprefixed column styling) doesn't have any issues in mozregression-launched ASAN nigthly 2018-05-01:

mozregression --launch 2018-05-01 -B asan

(That seems to be roughly as far back as ASAN nightlies go. If I run with 04 (April) instead of 05 (May), or use any earlier month, mozregression tells me it's unable to find builds)

So it seems like this was fixed before then, but there may not be a quick & easy way to find out any more info about fix range than that.

Flags: needinfo?(dholbert)
Attachment #9302100 - Attachment description: WIP: Bug 942794: Add crashtest for this no-longer-reproducible bug. (no review, crashtest-only) → Bug 942794: Add crashtest for this no-longer-reproducible bug. (no review, crashtest-only)
Pushed by dholbert@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/bd0da821444e Add crashtest for this no-longer-reproducible bug. (no review, crashtest-only)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: