Closed Bug 1430589 Opened 2 years ago Closed 2 years ago

ASAN Stack-overflow on nsDisplayListBuilder::AllocateDisplayItemClipChain

Categories

(Core :: Web Painting, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox-esr52 --- unaffected
firefox57 --- unaffected
firefox58 --- unaffected
firefox59 - fixed

People

(Reporter: zhanghanming, Assigned: kats)

Details

(Keywords: csectype-uaf, regression, sec-high)

Attachments

(4 files, 2 obsolete files)

Attached file asan_output_without_symbol.txt (obsolete) —
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36

Steps to reproduce:

My Test Env:
    Linux ubuntu 4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
    Firefox local asan build () [build from 1/15 hg-mozilla-central source code]
ST to reproduce:
    1. download attached poc.html.
    2. launch a Firefox asan build.
    3. use open poc.html.
    4. wait a few second, asan should be crash.
    (when you open poc.html you should see a textarea contain "Ashitaka", you can drag to change it's size.This should lead a immediately crash).


Actual results:

Firefox crash.


Expected results:

No crash.
Attached file poc.html
PoC to trigger this issue.
Attached file addr2line_output.txt (obsolete) —
My local VM symbolized callstack.
If you need any more informations to reproduce this issue. Please Let me know.
Sorry for typo on comment 1, my Firefox version is 59.0a1 (2018-01-15) (64-bit)
Can't verify with a nightly asan-opt build, build ID 20180115095817.
In a debug build, I get warnings like these:
[8560, Main Thread] WARNING: The offset is out of bounds: '!mParent || mOffset.value() <= mParent->Length()', file /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/EditorDOMPoint.h, line 372
I have test it with two VM (both run within VMware 14.0 pro),it crashed.Have you try drag to resize the textarea? 
From callstack, the gcc version should be 5.4.0,please check this.
I will build my env on a clean VM to test poc again.
Or if you think perfs.js have relative to this,I can offer my local perfs.js.
I have build a few VM and I think following step should be clear to reproduce this issue.
   1. Set-up a VM on VMware (14.0.0 build-6661328) use ubuntu-16.043.3-desktop-amd64.iso 
      iso MD5:0D9FE8E1EA408A5895CBBE3431989295
      VM have 8 core\8G RAM\20G disk space.
   2. Download artifact builds from https://developer.mozilla.org/en-US/docs/Mozilla/Testing/Firefox_and_Address_Sanitizer.
      I'm using "mozilla-central optimized builds"
      Download link: https://public-artifacts.taskcluster.net/eMQpWicUSR-oFRzfQNde0A/0/public/build/target.tar.bz2
      artifact builds MD5:2A9EBBD4FAAA3B61C3CBF4834CA2506A
   3. unzip artifact builds on ubuntu-VM.
   4. run "python -m SimpleHTTPServer" to launch a local httpserver which can visit poc.html
   5. launch Firefox and visit "127.0.0.1:8000/poc.html"
   6. Trigger a asan crash.
The crash type can be Stack-use-after-scope, Stack-overflow, SEGV on Null.
I have do step.1 to step.6 two times,so I thick this issue is triggerable. 
If you don't drag textarea to resize it,It will take a few second to trigger a crash.
If you drag textarea to resize it,It should be immeduately trigger a crash.
If you still can not reproduce this issue,please let me know.
obsoletes old asan output.
upload official asan-opt build output.
And used build platform.ini content as follow:

[Build]
BuildID=20180115185939
Milestone=59.0a1
SourceStamp=8e33bdf820dcc2ecd6f326eb83c39fc5e4769dcf
Attachment #8942649 - Attachment is obsolete: true
Attachment #8942652 - Attachment is obsolete: true
It looks like you touched nsDisplayListBuilder::AllocateDisplayItemClipChain() in bug 1422057, which landed on 1-6, kats. Could that be related to this?
Group: firefox-core-security → layout-core-security
Component: Untriaged → Layout: Web Painting
Flags: needinfo?(bugmail)
Product: Firefox → Core
That does seem plausible. I'll investigate.
Flags: needinfo?(bugmail)
Assignee: nobody → bugmail
Matt stared at the code for a while and realized that at [1], ancestorSC might be a pointer to a stack-allocated DisplayItemClipChain. That could get passed down at [2] as the parentSC and get stashed as the parent pointer on a heap-allocated DisplayItemClipChain which goes into the clip deduplicator. That's bad, as after the stack unwinds the deduplicator will have a garbage pointer.

[1] https://searchfox.org/mozilla-central/rev/48cbb200aa027a0a379b6004b6196a167344b865/layout/painting/DisplayListClipState.cpp#63
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: sec-bounty?
So this is not a duplicate issue,right?
Priority: -- → P1
(In reply to zhanghanming from comment #12)
> So this is not a duplicate issue,right?

Not that I'm aware of. If comment 11 is correct (I'll verify it tomorrow or the day after) then it's a regression from bug 1422057.
[Tracking Requested - why for this release]: sec-critical
I verified that the problem is indeed as described in comment 11. I tried to implement Matt's suggestion of changing the types so that the stack-based DisplayItemClipChain item is of a different type than the heap-based one, and the compiler would catch errors. However it turned out to be quite hard to do because we do mix them in a way that's hard to represent via distinct types. So instead I just added a #ifdef DEBUG boolean into the struct which indicates if it's on the stack or the heap, and then we can use that to assert we never put a stack-based one into the clip deduplicator.

I also converted the reporter's PoC into a crashtest. I have a try push at [1] which includes the crashtest and the debug-boolean, and successfully causes the crashtest to fail - well at least it's failing on linux and mac, not sure why it passed on windows.

In terms of fixing this, I was guided by the documentation at [2] which states that aAncestor should already be in the builder's arena. That invariant is being violated in this case, so I put the fix at the call site that does the violation. At [3] I wrapped ancestorSC into a CopyWholeChain operation. That seems to fix it for me locally, and I have a try push going [4] with that fix to verify.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=f0a7a59f100a81bf5e412bec63d29c5a206cf84a&group_state=expanded
[2] https://searchfox.org/mozilla-central/rev/48cbb200aa027a0a379b6004b6196a167344b865/layout/painting/nsDisplayList.h#979
[3] https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/layout/painting/DisplayListClipState.cpp#63
[4] https://treeherder.mozilla.org/#/jobs?repo=try&revision=5b60aace3a721860b758ccbba03d4b6a3af25d9c
The try push I did was mostly green, except that on QR debug crashtests the new test is hitting bug 1421825 (or at least a similar assertion failure/crash). So I guess I should fix that first.
... or I can just make the crashtest skip-if(webrender) to avoid those QR failures. I expect bug 1421825 to go away when we turn on gfx.webrender.hit-test by default, so I don't know if it's worth trying to fix that explicitly right now.
Attached patch Part 1 - The fixSplinter Review
Attachment #8943782 - Flags: review?(mstange)
Attachment #8943782 - Flags: review?(matt.woodrow)
Attachment #8943783 - Flags: review?(matt.woodrow) → review+
Attachment #8943782 - Flags: review?(matt.woodrow) → review+
Attachment #8943782 - Flags: review?(mstange) → review+
Attachment #8943783 - Flags: review?(mstange) → review+
Comment on attachment 8943782 [details] [diff] [review]
Part 1 - The fix

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
- Relatively easily, considering I've added the PoC as a crashtest in the second patch

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
- Yeah, the checkin comment on this patch indicates it has to do with stack allocated things getting retained which is a pretty big clue.

Which older supported branches are affected by this flaw?
- None

If not all supported branches, which bug introduced the flaw?
- Regression in 59, so it's still on Nightly (but will be going to beta Real Soon Now)

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
- No need for backports if we land this before the upcoming merge. Otherwise the patch should apply to 59 beta too.

How likely is this patch to cause regressions; how much testing does it need?
- Fairly unlikely, I think. Functionally the extra call I'm adding should have no effect in cases where it's not needed. There might be a slight perf impact but this codepath is hit infrequently enough that it shouldn't be significant.
Attachment #8943782 - Flags: sec-approval?
Comment on attachment 8943783 [details] [diff] [review]
Part 2 - Diagnostic and crashtest

[Security approval request comment]
(see previous comment)
Attachment #8943783 - Flags: sec-approval?
Fixes for security problems that only affect Nightly don't need sec-approval. So you can go ahead and land it, as long as it is before Nightly is 60.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Flags: in-testsuite+
Group: layout-core-security → core-security-release
Flags: sec-bounty? → sec-bounty+
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.