Closed
Bug 1125925
Opened 9 years ago
Closed 9 years ago
UAF in gfxContext::Clip
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
People
(Reporter: bwc, Unassigned)
References
Details
(Keywords: csectype-uaf, sec-critical, Whiteboard: gfx-noted)
Attachments
(5 files)
While trying to test a talky.io call on ASAN, I got the attached crash.
Reporter | ||
Comment 1•9 years ago
|
||
Updated•9 years ago
|
Comment 2•9 years ago
|
||
This looks to be because we're using the clip path pool on multiple threads. It all looks thread safe though, and I can't see how this would actually happen.
Updated•9 years ago
|
Group: gfx-core-security
Comment 3•9 years ago
|
||
Byron, are you able to reproduce this regularly or did this just happen one time?
Flags: needinfo?(docfaraday)
Keywords: csectype-uaf
Reporter | ||
Comment 4•9 years ago
|
||
I got two slightly different crashes, both attached here. I have not seen this again, although I don't think I've run talky.io under ASAN since then.
Flags: needinfo?(docfaraday)
(In reply to Matt Woodrow (:mattwoodrow) from comment #2) > This looks to be because we're using the clip path pool on multiple threads. > > It all looks thread safe though, and I can't see how this would actually > happen. First crash has the original allocate, free and uaf all on the same thread? Second crash does have the two threads involved. I could be reading it wrong.
(In reply to Matt Woodrow (:mattwoodrow) from comment #2) > This looks to be because we're using the clip path pool on multiple threads. According to Jeff, we're not actually supposed to be using the clip path pool?
Reporter | ||
Comment 7•9 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #5) > (In reply to Matt Woodrow (:mattwoodrow) from comment #2) > > This looks to be because we're using the clip path pool on multiple threads. > > > > It all looks thread safe though, and I can't see how this would actually > > happen. > > First crash has the original allocate, free and uaf all on the same thread? > Second crash does have the two threads involved. I could be reading it > wrong. That is correct.
Reporter | ||
Comment 8•9 years ago
|
||
One thing that jumps out at me is the use of pool->top (eg; here https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-freed-pool-private.h#80). There are no threadsafety protections on that integer, so it could literally take any value depending on the whims of the compiler, including values beyond the end of the pool array.
Reporter | ||
Comment 9•9 years ago
|
||
This looks flat-out wrong: https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-freed-pool.c#50 for (i = ARRAY_LENGTH (pool->pool); i--;) { ptr = _atomic_fetch (&pool->pool[i]); if (ptr != NULL) { pool->top = i; return ptr; } } Surely they meant to do something like for (i = ARRAY_LENGTH (pool->pool); i > 0; i--) { ptr = _atomic_fetch (&pool->pool[i]); if (ptr != NULL) { pool->top = i; return ptr; } }
Reporter | ||
Comment 10•9 years ago
|
||
I should also point out that if we mess up the refcounting somehow, we could end up placing a clip_path into the pool, and then destroying it again and freeing it this time because the pool is full, leading to the possibility that some code later will pull it out of the pool and UAF.
Reporter | ||
Comment 11•9 years ago
|
||
Given that this was an ASAN run, the problem in comment 8 is probably not the cause, since an array out of bounds error would have shown up. I also don't think the problem in comment 9 will get us; it is just a busywait or maybe a deadlock at worst. My money is on comment 10.
Reporter | ||
Comment 12•9 years ago
|
||
I've only looked a little bit, but I'm already seeing signs that cairo is not doing an awesome job of keeping its refcounts correct: https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-clip.c#270 Before this line, the object pointed to by clip->path (call it X) has a refcount of a, and the object pointed to by clip->path->prev (Y) has a refcount of b. After this line, the refcounts remain the same, but X should have a refcount of a - 1, and Y should have a refcount of b + 1 (X still points at Y, so now clip->path is a new reference). The best-case scenario here is for X to be leaked. However, if anything else is holding a ref to X, we're in real trouble and will UAF.
Reporter | ||
Comment 13•9 years ago
|
||
On closer inspection, in comment 12 |clip_path| is another ref to X, so the code is definitely setting up a UAF.
Reporter | ||
Comment 14•9 years ago
|
||
The same bad error handling in some other places: https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-clip.c#388 https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-clip.c#431 https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-clip.c#449
Reporter | ||
Comment 15•9 years ago
|
||
Searching cairo_clip_ on crash-stats finds some stuff, and searching for cairo_ finds more stuff, some of which is operating on some member of a clip path.
Reporter | ||
Comment 16•9 years ago
|
||
FWIW, the latest cairo source does not seem to have these problems.
(In reply to Byron Campen [:bwc] (PTO Feb 18-20) from comment #9) > This looks flat-out wrong: > > https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo- > freed-pool.c#50 > > for (i = ARRAY_LENGTH (pool->pool); i--;) { > ... > > Surely they meant to do something like > > for (i = ARRAY_LENGTH (pool->pool); i > 0; i--) { > ... Yeah, I got caught when I first read this as well; it isn't wrong, but it is annoyingly written. The two for loop statements are equivalent, so the bug is elsewhere.
Updated•9 years ago
|
Comment 18•9 years ago
|
||
Milan, can someone look into this so it can be moved forward?
Flags: needinfo?(milan)
Byron, with all the investigation you've done, do you have an idea of a patch?
Flags: needinfo?(milan) → needinfo?(docfaraday)
Reporter | ||
Comment 20•9 years ago
|
||
Well, I've seen how cairo fixed the bad error handling identified in comment 12 and comment 14, but who knows whether those are actually the cause. It would take a lot more investigation to figure out where the problem actually lies, since I only scratched the surface. We probably want to update our cairo version.
Flags: needinfo?(docfaraday)
Good point, updating the cairo version would be a good thing to do, though probably not the quick fix we're looking for. Al, let me see what we can do here, quickly.
Flags: needinfo?(milan)
Updated•9 years ago
|
Flags: needinfo?(milan)
Whiteboard: gfx-noted
Didn't mean to remove NI on me.
Flags: needinfo?(milan)
(In reply to Byron Campen [:bwc] from comment #16) > FWIW, the latest cairo source does not seem to have these problems. Is this the latest cairo with all of our patches applied, or as it is upstream?
Reporter | ||
Comment 24•9 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #23) > (In reply to Byron Campen [:bwc] from comment #16) > > FWIW, the latest cairo source does not seem to have these problems. > > Is this the latest cairo with all of our patches applied, or as it is > upstream? Upstream.
Comment 25•9 years ago
|
||
We do need to track this for 38 since we don't want to ship this sec-critical problem in 38. Milan can you help us find an owner on this bug? Thanks.
tracking-firefox38:
--- → +
This probably affects even older releases, don't think this is a recent regression.
Assignee: nobody → jmuizelaar
Flags: needinfo?(milan)
Updated•9 years ago
|
status-firefox40:
--- → affected
status-firefox-esr31:
--- → ?
tracking-firefox38:
+ → ---
tracking-firefox39:
+ → ---
Comment 28•9 years ago
|
||
Can we schedule to take the newer Cairo? This is sec-critical and it has been sitting around for more than a month.
Flags: needinfo?(jmuizelaar)
Lee, can you take a look?
Assignee: jmuizelaar → lsalzman
Flags: needinfo?(jmuizelaar)
Comment 30•9 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #8) > One thing that jumps out at me is the use of pool->top (eg; here > https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo- > freed-pool-private.h#80). There are no threadsafety protections on that > integer, so it could literally take any value depending on the whims of the > compiler, including values beyond the end of the pool array. At least in this case, the value of top seems to be advisory, functioning more like a fuzzy value for a search start, and thread safety is guaranteed by atomic exchanges (with NULL) used to actually fetch pointers from the pool's array, however unreliable the actual value of top may be. So when it tries to get something from the pool, it looks at the top value, checks for a non-NULL pointer there. If it doesn't find one, it searches the entire pool in any case. Upon analysis, I don't see how this particular piece of the code could be at fault. The freed pool does appear to be getting used in any case.
Comment 31•9 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #0) > Created attachment 8554646 [details] > uaf.txt > > While trying to test a talky.io call on ASAN, I got the attached crash. Are you still able to reproduce the bug, and if so, is there any more boiled down/reliable procedure that other people might be able to reproduce it by? I at least tried doing a few talky calls, but couldn't get anything suspicious to occur for me.
Flags: needinfo?(docfaraday)
Reporter | ||
Comment 32•9 years ago
|
||
This still happens pretty frequently for me. It seems to happen much more often when the window that is running a webrtc call is backgrounded (say, switching from the browser to a terminal window).
Flags: needinfo?(docfaraday)
Comment 33•9 years ago
|
||
Okay, I put up a hypothetical patch to handle some clearly erroneous behavior when it is trying to destroy paths in failure cases. The clip_path->prev link points to the old path stored in the clip info. But when it destroys clip_path, it never nulls out the prev link, so _clip_path_destroy tries to free it, even though the clip info got altered so that the clip info now points to the very same path (clip->path = clip->path->prev) that may get accidentally destroyed. This is a case of "but how did this ever work at all?" This seems to agree with the fact the first trace all happened on a single thread, so wasn't necessarily a result of a race condition. While this is definitely a bug in the code, I'm not sure it is THE bug we are experiencing here, so if you could try applying this small patch that handles properly NULLing out the prev link before freeing the new clip path to your tree and see if it makes the problem go away for you... If it does I'll clean up the patch and make it more official-like.
Flags: needinfo?(docfaraday)
Updated•9 years ago
|
status-firefox41:
--- → affected
Reporter | ||
Comment 34•9 years ago
|
||
Patch doesn't seem to have fixed the crashiness.
Flags: needinfo?(docfaraday)
Reporter | ||
Comment 35•9 years ago
|
||
Comment 36•9 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #12) > I've only looked a little bit, but I'm already seeing signs that cairo is > not doing an awesome job of keeping its refcounts correct: > > https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo- > clip.c#270 > > Before this line, the object pointed to by clip->path (call it X) has a > refcount of a, and the object pointed to by clip->path->prev (Y) has a > refcount of b. After this line, the refcounts remain the same, but X should > have a refcount of a - 1, and Y should have a refcount of b + 1 (X still > points at Y, so now clip->path is a new reference). The best-case scenario > here is for X to be leaked. However, if anything else is holding a ref to X, > we're in real trouble and will UAF. It seems like the intention of the reference counts is to keep track of how many clip structs are referring to clip_path structs, and that the primary means of decrementing the RC is through _cairo_clip_path_destroy. So, the bug here, as per the undo patch, seems to be rather that the discarded clip path didn't give up its own reference. But still, it doesn't seem like these error handling cases, though clearly buggy, are what is causing the bug, as they only seem to trigger in out of memory conditions. Whatever is going on here is somewhat less than obvious...
Comment 37•9 years ago
|
||
Byron, do you think you could try reproducing this in rr and investigating what's going on?
Flags: needinfo?(docfaraday)
Reporter | ||
Comment 38•9 years ago
|
||
I'm not going to have the cycles for this in the near term, I'm afraid.
Flags: needinfo?(docfaraday)
Comment 39•9 years ago
|
||
Would taking the newer Cairo have a chance of addressing this?
Reporter | ||
Comment 40•9 years ago
|
||
It certainly seems plausible, but there are no guarantees.
Comment 41•9 years ago
|
||
This sec-critical bug has been around for almost six months now. We need either to find someone who can work on it or we need to reassess its severity. Any ideas, Lee?
Flags: needinfo?(lsalzman)
Comment 42•9 years ago
|
||
(In reply to Matt Wobensmith from comment #41) > This sec-critical bug has been around for almost six months now. We need > either to find someone who can work on it or we need to reassess its > severity. Any ideas, Lee? Further work is stalled until Byron can collaborate on reproducing this.
Flags: needinfo?(lsalzman)
Comment 43•9 years ago
|
||
Byron, see comment 37. I know you are busy, but we are blocked on this critical bug.
Flags: needinfo?(docfaraday)
Updated•9 years ago
|
status-firefox42:
--- → affected
status-firefox-esr38:
--- → affected
tracking-firefox40:
--- → +
tracking-firefox41:
--- → +
tracking-firefox42:
--- → +
Reporter | ||
Comment 44•9 years ago
|
||
I'll only be able to help when it starts happening again, sadly.
Flags: needinfo?(docfaraday)
OK, let's reopen when it starts happening again, so that we don't spin on this bug too much.
Assignee: lsalzman → nobody
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Comment 46•9 years ago
|
||
We don't have a worksforme status option so I'm marking this as wontfix for all active branches. We should reset status if this bug is reopened.
Updated•9 years ago
|
Updated•9 years ago
|
Group: core-security → core-security-release
Updated•9 years ago
|
Group: gfx-core-security
Reporter | ||
Comment 47•9 years ago
|
||
This is happening again, and I've developed a (temporary?) knack for making it happen reliably. Disabling the memory pooling for cairo fixes the problem. Digging into this some more.
Reporter | ||
Comment 48•9 years ago
|
||
Oh now that's interesting. The atomics stuff we're using (HAVE_CXX11_ATOMIC_PRIMITIVES) was written by us. We upstreamed it, but I'm not sure anyone but us is using it. Maybe there's some flaw?
Reporter | ||
Comment 49•9 years ago
|
||
https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairoint.h?case=true&from=cairoint.h#114 ಠ_ಠ
Updated•7 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•