Closed Bug 1242668 Opened 8 years ago Closed 8 years ago

Memory corruption in mozilla::layers::ClientLayerManager::DidComposite appears to be exploitable /use after free


(Core :: Graphics: Layers, defect)

43 Branch
Not set



Tracking Status
firefox45 --- wontfix
firefox46 + verified
firefox47 + verified
firefox48 + verified
firefox-esr38 --- wontfix
firefox-esr45 46+ fixed


(Reporter: codycrews00, Assigned: jrmuizel)



(Keywords: csectype-uaf, sec-high, Whiteboard: [post-critsmash-triage][adv-main46+][adv-esr45.1+])


(8 files, 4 obsolete files)

474 bytes, text/html
136.45 KB, image/png
250.57 KB, image/png
804 bytes, patch
: review+
Details | Diff | Splinter Review
16.04 KB, text/plain
16.45 KB, text/plain
17.91 KB, text/plain
2.04 KB, patch
: review+
: sec-approval+
Details | Diff | Splinter Review

Load the attached testcase and click anywhere in the content area.  This will open a new window while alert is called in the original window.  Close the new window, then proceed to close the alert box in the original window.

The currently release build of firefox crashes usually with an access violation while trying to read from memory address 0xE5E5E5E5.  This address originates from the css equivalent( like background: #E5E5E5 ). 

I'm currently still trying to find a way to automate the value that ends up there but here's a partial explanation as I understand it so far.  To begin with see

Note that some internal browser elements are styled using this, and as the screen renders itself for vertical sync it actually copies the value to memory many times over representing one rendering of that frame.

I'm certain that this is exploitable, I've ran through many iterations manually and twice I have already overwritten what would normally be 0xE5E5E5E5 with a pointer to an address that was accessible.  I have also just confirmed that the current release build also crashes in linux in this same situation.

This all originated from some click jacking work I've been doing, and I'm still in the process of that and I have to get to an audit on pocket soon(this took priority obviously once I found it.)

Once I've finished my current work I'll try to complete migrating my linux OS to a full VM setup so I can test and debug in both windows and linux at the same time which will hopefully allow me to get a more complete understanding of exactly what is happening here.
Are you sure the address containing 0xE5E5E5E5 comes from the color value, rather than being coincidental? 0xE5 is the fill pattern jemalloc uses now for freed memory (in older builds we used 0x5a). Adding a link or two from about:crashes is helpful if you have them -- saves QA/devs from having to generate them ourselves.
Group: core-security → gfx-core-security
I'm not positive at all.  As i said that this is as I understand it currently.  I do know that if you do other things that write to memory depending on the timing you will get completely different results.

As I said I've already made it past the normal crash point stepping through in asm twice just doing random things.  It crashes a little later, but that's just because I haven't figured out exactly what data is being placed there.  If I knew I would have a full exploit written that would run your calculator.

In the DidComposite function apparently it does not attempt to make the call to mTransactionIdAllocator->NotifyTransactionCompleted(aTransactionId) until after you have closed the alert box.  So before you close it you can attempt to overwrite the value in memory and as I said I've already successfully overwritten it with an accessible memory location twice which stops the read access violation and triggers DEP in windows when it even attempts to execute the code which should be NotifyTransactionCompleted.

That's all I have currently.
This has me worried, because it's not that hard to find and its in the current release build.  I haven't slept in pushing 48 hours now just to get to this point.  Maybe I can bring better info to the table later.

It took me some time just to narrow down the pieces of the click jacking testcase I was working on to get this far, sorry I don't have more currently.
I'll try to generate some crashes and post the links to them.  Honestly I have crash reporting kind of tweaked as in it's partially enabled(generates dumps but never even tries to submit them).

I don't like things automatically sending data ever.  Let me generate some reports and have it submit them so I can get the links from about:crashes.

After that I'm getting some rest and I'll see what you guys have come up with later.

Note if you view the raw info of the dumps, the value of EDI which would normally be 0xE5E5E5E5 isn't that in either of these dumps.  One even has a value that is near to being a valid pointer.  The only difference is I did some additional things in the webconsole to make writes to memory before closing the alert box.
Should be EDX above, time for that sleep lol.  I'll be more on my game when I'm back on ok guys.  This had me worried enough to push a little longer than I should have for sure.
The crash stats in comment 5 look like null derefs. Doesn't mean there isn't a UAF happening, though.
I'm working on a revised testcase right now.  It was passing the read access violation and actually getting to the 'call edx' instruction of NotifyTransactionCompleted(if you look at the disassembly).

So far it made it to this instruction 3 out of 5 times at which point DEP kicks in, that's scary and makes me fairly sure this is exploitable.  I'll be working on it.

I'm not the best with memory corruption to begin with, and firefox makes things like heap spray harder.  I'm still fairly sure this is exploitable and its definitely not just a null pointer deref.  If you go through many iterations you will see that the same values aren't even always null pointers.
Yeah I meant to ask, as you explained 0xE5E5E5E5 comes from jemalloc, does anything fill memory with the value 0xE4E4E4E4?  It's also common to run into here.  I'll look myself soon, I just feel that I'm very close to having something here.

I need a way to leak memory address from JS so I can just forget about bypassing DEP and call into internal functions to run an executable.  I probably wont have much luck with that though.
Attached image useAfterFree.png
I can't believe nobody is taking this as seriously as me.  Here's a screenshot showing that its obviously a use after free.  I've seen people just post an ASAN debug output log showing a use after free and people get to it.  

In this picture you can clearly see its trying to read a memory location, but the pointer to that location has been freed.

Also this is in the current release that everyone has, appears to be there in linux and windows, and isn't that hard to find.
Here's a screenshot of what I consider proof that this is exploitable.  There could be special circumstances in this case, but I think it can be accomplished.

All the relevant changes from the previous screenshot are boxed or added in blue.
Assignee: nobody → jmuizelaar
Jeff, can you take a look?
Flags: needinfo?(jmuizelaar)
(In reply to Cody Crews from comment #9)
> Yeah I meant to ask, as you explained 0xE5E5E5E5 comes from jemalloc, does
> anything fill memory with the value 0xE4E4E4E4? 

Yes, recent jemalloc uses 0xe4 when it allocates and 0xe5 when it frees, so 0xe4 indicates uninitialized memory.
(In reply to Daniel Veditz [:dveditz] from comment #13)
> (In reply to Cody Crews from comment #9)
> > Yeah I meant to ask, as you explained 0xE5E5E5E5 comes from jemalloc, does
> > anything fill memory with the value 0xE4E4E4E4? 
> Yes, recent jemalloc uses 0xe4 when it allocates and 0xe5 when it frees, so
> 0xe4 indicates uninitialized memory.

I had already looked into and confirmed this.  I also spent more time than I should have looking for any out of bounds read that I could use to try to get the base address of the xul library reliably.  I then looked over papers on VUPEN's old work from pwn2own and they have some very interesting techniques for manipulating the internal heaps in firefox.

I eventually had to just let it go as I spent far to much energy on this that could have been used much more efficiently.  When it comes to things security related right in front of me I tent to get tunnel vision.
Flags: needinfo?(jmuizelaar)
Keeping child alive makes sense because we're getting it from a global and there's no guarantee that it will stay alive for the duration of the caller. Fixing this moves the crash to a crash on a disappearing layer manger.

It's not clear to me who's responsibility it is to keep the layer manager alive, but doing so seems to fix the problem.

Overall, I made this fix without any real understanding of what was happening so I have no idea if it makes sense.
Attachment #8716577 - Flags: review?(sotaro.ikeda.g)
I also checked the crash on my machine. The crash seemed wired. The root of dangling pointer seems to exist in different place. I am going to looking into more to the crash.
I seems to find what triggers the crash. "gBrowser.selectedBrowser.focus()" in browser.js is a trigger of the crash.

In the STR, the focus() takes very long time. It was called just before showing alert and it ended just after alert close. Therefore, if we do not close alert, the focus() does not exit.
During the STR, the call stack becomes very deep.
When I set a break point to CompositorChild::RecvDidComposite(), I always see two CompositorChild::RecvDidComposite() in a call stack.
When ClientLayerManager destructor was called in the STR, ClientLayerManager::DidComposite() still existed in the call stack.
It seems better to avoid to use nsAutoScriptBlocker during handling gecko ipc if possible.
Attachment #8717802 - Attachment is patch: false
Comment on attachment 8717851 [details] [diff] [review]
patch - Do not call nsView::DidCompositeWindow() in ClientLayerManager::DidComposite()

Cancel the review. The patch has a problem need to address it.
Attachment #8717851 - Flags: review?(jmuizelaar)
(In reply to Sotaro Ikeda [:sotaro] from comment #24)
> attachment 8717851 [details] [diff] [review] caused test failure. It needs
> to be addressed.
> jobs?repo=try&revision=6b32d1d3a783&selectedJob=16585919

It seems that when DidComposite() is called in ClientLayerManager's destructor, widget could be also in destructor :-(
(In reply to Sotaro Ikeda [:sotaro] from comment #27)
> jobs?repo=try&revision=8086ab63adda&selectedJob=16609168

On e10s test, there were cases that DocShell was already detached from nsPresContext.
I locally checked layout/reftests/position-dynamic-changes/horizontal/leftN-widthA-rightA-2.html. Its rendering during animation seemed already broken without the patch.
(In reply to Sotaro Ikeda [:sotaro] from comment #32)
> I locally checked
> layout/reftests/position-dynamic-changes/horizontal/leftN-widthA-rightA-2.
> html. Its rendering during animation seemed already broken without the patch.

Sorry, please ignore it.
(In reply to Sotaro Ikeda [:sotaro] from comment #31)
> attachment 8718687 [details] [diff] [review] seemed to cause R-e10s failure
> :-(

Problem looks similar to Bug 1242237.
By the investigation, the problem seems to be caused by a problem of MozAfterPaint event.

At first, I tried to reproduce the reftest failure, but failed to reproduce it locally. Invalidation failure was caused by reftest's invalidation mechanism. Reftest renders documents to canvas to check tests result and invalidation area come from MozAfterPaint event. There is a case that the event does not have a invalidated area. It caused invalidation problem and resulted to reftest failures.
When reftest invalidate whole document area, the reftest failures were addressed.
Depends on: 1242237
In current gecko, MozAfterPaint could be sent when related update is not arrived to LayerManagerComposite. It happens on content process. And in chrome process, reftest tried to render the content to canvas. The rendering to canvas end up to composition in LayerManagerComposite.

There could be a case that the rendering to canvas happen before the content's layers update is delivered to LayerManagerComposite.
The patch prevent to send PaintEvent too early. But the patch causes many test failures :-(
I tried to make CompositorChild::RecvDidComposite() exits quickly. It is like attachment 8718687 [details] [diff] [review]. But it seems to cause tryserver failure. The patch changed the timing of the function calls. It seems to hit another problems. It seems not easy to address them.

From the above, in this bug, it seems better just to address the crash. It seems simpler.
(In reply to Sotaro Ikeda [:sotaro] from comment #39)
> I tried to make CompositorChild::RecvDidComposite() exits quickly.

I created Bug 1250025 for the above.
Attachment #8721245 - Attachment is obsolete: true
Attachment #8718687 - Attachment is obsolete: true
Comment on attachment 8716577 [details] [diff] [review]
Keep things alive longer

Review of attachment 8716577 [details] [diff] [review]:

Review+ if the comment is addressed.

It seems better also hold a reference of ClientLayerManager in TabChild::DidComposite(). And you might want to add a comment that the function might not exit quickly.
Attachment #8716577 - Flags: review?(sotaro.ikeda.g) → review+
No longer depends on: 1242237
[Security approval request comment]
How easily could an exploit be constructed based on the patch? Not too easily as someone needs to figure out how to make trigger the lifetime mistake.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? A little bit.

Which older supported branches are affected by this flaw? Probably many of them.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? It should be easy to create backports.

How likely is this patch to cause regressions; how much testing does it need? It shouldn't be too likely to cause regressions as it just keeping things alive a little longer, however it is possible to imagine regressions.
Attachment #8724126 - Flags: sec-approval?
Attachment #8724126 - Flags: review+
This is too late for the 45 release cycle. 

This can go in after March 15, when we're in the new cycle.

It would be good to know how far back this goes. "Probably many of them" isn't very definitive. :-)
Whiteboard: [checkin on 3/15]
Attachment #8724126 - Flags: sec-approval? → sec-approval+
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Group: gfx-core-security → core-security-release
Tracking for 46+ since this is sec-high.
Jeff, can you request uplift?
Flags: needinfo?(jmuizelaar)
Comment on attachment 8724126 [details] [diff] [review]
Hold more references

Approval Request Comment
[Feature/regressing bug #]: Unknown
[User impact if declined]: Remotely triggerable use after free
[Describe test coverage new/current, TreeHerder]: Has been on m-c for a while
[Risks and why]: The largest risk is that we'd hold something alive longer than we intended to.
Flags: needinfo?(jmuizelaar)
Attachment #8724126 - Flags: approval-mozilla-beta?
Attachment #8724126 - Flags: approval-mozilla-aurora?
Comment on attachment 8724126 [details] [diff] [review]
Hold more references

Improves security, taking this for beta 46 . 
It will probably not land until the beta 6 build.
Attachment #8724126 - Flags: approval-mozilla-beta?
Attachment #8724126 - Flags: approval-mozilla-beta+
Attachment #8724126 - Flags: approval-mozilla-aurora?
Attachment #8724126 - Flags: approval-mozilla-aurora+
This doesn't apply cleanly to beta.
Flags: needinfo?(jmuizelaar)
Flags: qe-verify+
Whiteboard: [post-critsmash-triage]
Comment on attachment 8724126 [details] [diff] [review]
Hold more references

Also needed in ESR45.
Attachment #8724126 - Flags: approval-mozilla-esr45+
The crash no longer reproduces on FX 46, 47.0a2 (2016-04-18), 48.0a1 (2016-04-18) Win 7.
Verified fixed.
What about ESR38.8?
Flags: needinfo?(sledru)
Comment on attachment 8724126 [details] [diff] [review]
Hold more references

Needed too
Flags: needinfo?(sledru)
Attachment #8724126 - Flags: approval-mozilla-esr38+
has problems uplifting to esr38

grafting 336847:52ac9ba8eb1d "Bug 1242668 - Hold more references. r=sotaro, a=lizzard"
merging dom/ipc/TabChild.cpp
merging gfx/layers/ipc/CompositorChild.cpp
warning: conflicts while merging dom/ipc/TabChild.cpp! (edit, then use 'hg resolve --mark')
warning: conflicts while merging gfx/layers/ipc/CompositorChild.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use hg resolve and hg graft --continue)
Flags: needinfo?(jmuizelaar)
This is missing ESR38.8 now and since it is EOL after this, I expect this is out of ESR38 forever.
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main46+][adv-esr45.1+]
Sorry about not rebasing the patch on time.
Flags: needinfo?(jmuizelaar)
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.