Closed Bug 1125925 Opened 9 years ago Closed 9 years ago

UAF in gfxContext::Clip

Categories

(Core :: Graphics: Layers, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox37 --- wontfix
firefox38 --- wontfix
firefox39 --- wontfix
firefox40 - wontfix
firefox41 + wontfix
firefox42 + wontfix
firefox-esr31 --- wontfix
firefox-esr38 --- wontfix

People

(Reporter: bwc, Unassigned)

References

Details

(Keywords: csectype-uaf, sec-critical, Whiteboard: gfx-noted)

Attachments

(5 files)

Attached file uaf.txt
While trying to test a talky.io call on ASAN, I got the attached crash.
This looks to be because we're using the clip path pool on multiple threads.

It all looks thread safe though, and I can't see how this would actually happen.
Group: gfx-core-security
Byron, are you able to reproduce this regularly or did this just happen one time?
Flags: needinfo?(docfaraday)
Keywords: csectype-uaf
I got two slightly different crashes, both attached here. I have not seen this again, although I don't think I've run talky.io under ASAN since then.
Flags: needinfo?(docfaraday)
(In reply to Matt Woodrow (:mattwoodrow) from comment #2)
> This looks to be because we're using the clip path pool on multiple threads.
> 
> It all looks thread safe though, and I can't see how this would actually
> happen.

First crash has the original allocate, free and uaf all on the same thread?  Second crash does have the two threads involved.  I could be reading it wrong.
(In reply to Matt Woodrow (:mattwoodrow) from comment #2)
> This looks to be because we're using the clip path pool on multiple threads.

According to Jeff, we're not actually supposed to be using the clip path pool?
(In reply to Milan Sreckovic [:milan] from comment #5)
> (In reply to Matt Woodrow (:mattwoodrow) from comment #2)
> > This looks to be because we're using the clip path pool on multiple threads.
> > 
> > It all looks thread safe though, and I can't see how this would actually
> > happen.
> 
> First crash has the original allocate, free and uaf all on the same thread? 
> Second crash does have the two threads involved.  I could be reading it
> wrong.

That is correct.
One thing that jumps out at me is the use of pool->top (eg; here https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-freed-pool-private.h#80). There are no threadsafety protections on that integer, so it could literally take any value depending on the whims of the compiler, including values beyond the end of the pool array.
This looks flat-out wrong:

https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-freed-pool.c#50

    for (i = ARRAY_LENGTH (pool->pool); i--;) {
	ptr = _atomic_fetch (&pool->pool[i]);
	if (ptr != NULL) {
	    pool->top = i;
	    return ptr;
	}
    }

Surely they meant to do something like

    for (i = ARRAY_LENGTH (pool->pool); i > 0; i--) {
	ptr = _atomic_fetch (&pool->pool[i]);
	if (ptr != NULL) {
	    pool->top = i;
	    return ptr;
	}
    }
I should also point out that if we mess up the refcounting somehow, we could end up placing a clip_path into the pool, and then destroying it again and freeing it this time because the pool is full, leading to the possibility that some code later will pull it out of the pool and UAF.
Given that this was an ASAN run, the problem in comment 8 is probably not the cause, since an array out of bounds error would have shown up. I also don't think the problem in comment 9 will get us; it is just a busywait or maybe a deadlock at worst. My money is on comment 10.
I've only looked a little bit, but I'm already seeing signs that cairo is not doing an awesome job of keeping its refcounts correct:

https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-clip.c#270

Before this line, the object pointed to by clip->path (call it X) has a refcount of a, and the object pointed to by clip->path->prev (Y) has a refcount of b. After this line, the refcounts remain the same, but X should have a refcount of a - 1, and Y should have a refcount of b + 1 (X still points at Y, so now clip->path is a new reference). The best-case scenario here is for X to be leaked. However, if anything else is holding a ref to X, we're in real trouble and will UAF.
On closer inspection, in comment 12 |clip_path| is another ref to X, so the code is definitely setting up a UAF.
Searching cairo_clip_ on crash-stats finds some stuff, and searching for cairo_ finds more stuff, some of which is operating on some member of a clip path.
FWIW, the latest cairo source does not seem to have these problems.
(In reply to Byron Campen [:bwc] (PTO Feb 18-20) from comment #9)
> This looks flat-out wrong:
> 
> https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-
> freed-pool.c#50
> 
>     for (i = ARRAY_LENGTH (pool->pool); i--;) {
> ...
> 
> Surely they meant to do something like
> 
>     for (i = ARRAY_LENGTH (pool->pool); i > 0; i--) {
> ...

Yeah, I got caught when I first read this as well; it isn't wrong, but it is annoyingly written.  The two for loop statements are equivalent, so the bug is elsewhere.
Milan, can someone look into this so it can be moved forward?
Flags: needinfo?(milan)
Byron, with all the investigation you've done, do you have an idea of a patch?
Flags: needinfo?(milan) → needinfo?(docfaraday)
Well, I've seen how cairo fixed the bad error handling identified in comment 12 and comment 14, but who knows whether those are actually the cause. It would take a lot more investigation to figure out where the problem actually lies, since I only scratched the surface. We probably want to update our cairo version.
Flags: needinfo?(docfaraday)
Good point, updating the cairo version would be a good thing to do, though probably not the quick fix we're looking for.  Al, let me see what we can do here, quickly.
Flags: needinfo?(milan)
Flags: needinfo?(milan)
Whiteboard: gfx-noted
Didn't mean to remove NI on me.
Flags: needinfo?(milan)
(In reply to Byron Campen [:bwc] from comment #16)
> FWIW, the latest cairo source does not seem to have these problems.

Is this the latest cairo with all of our patches applied, or as it is upstream?
(In reply to Milan Sreckovic [:milan] from comment #23)
> (In reply to Byron Campen [:bwc] from comment #16)
> > FWIW, the latest cairo source does not seem to have these problems.
> 
> Is this the latest cairo with all of our patches applied, or as it is
> upstream?

Upstream.
We do need to track this for 38 since we don't want to ship this sec-critical problem in 38. Milan can you help us find an owner on this bug?  Thanks.
Does this also affect 37?
This probably affects even older releases, don't think this is a recent regression.
Assignee: nobody → jmuizelaar
Flags: needinfo?(milan)
Can we schedule to take the newer Cairo? This is sec-critical and it has been sitting around for more than a month.
Flags: needinfo?(jmuizelaar)
Lee, can you take a look?
Assignee: jmuizelaar → lsalzman
Flags: needinfo?(jmuizelaar)
(In reply to Byron Campen [:bwc] from comment #8)
> One thing that jumps out at me is the use of pool->top (eg; here
> https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-
> freed-pool-private.h#80). There are no threadsafety protections on that
> integer, so it could literally take any value depending on the whims of the
> compiler, including values beyond the end of the pool array.

At least in this case, the value of top seems to be advisory, functioning more like a fuzzy value for a search start, and thread safety is guaranteed by atomic exchanges (with NULL) used to actually fetch pointers from the pool's array, however unreliable the actual value of top may be.

So when it tries to get something from the pool, it looks at the top value, checks for a non-NULL pointer there. If it doesn't find one, it searches the entire pool in any case. Upon analysis, I don't see how this particular piece of the code could be at fault.

The freed pool does appear to be getting used in any case.
(In reply to Byron Campen [:bwc] from comment #0)
> Created attachment 8554646 [details]
> uaf.txt
> 
> While trying to test a talky.io call on ASAN, I got the attached crash.

Are you still able to reproduce the bug, and if so, is there any more boiled down/reliable procedure that other people might be able to reproduce it by? I at least tried doing a few talky calls, but couldn't get anything suspicious to occur for me.
Flags: needinfo?(docfaraday)
This still happens pretty frequently for me. It seems to happen much more often when the window that is running a webrtc call is backgrounded (say, switching from the browser to a terminal window).
Flags: needinfo?(docfaraday)
Okay, I put up a hypothetical patch to handle some clearly erroneous behavior when it is trying to destroy paths in failure cases.

The clip_path->prev link points to the old path stored in the clip info. But when it destroys clip_path, it never nulls out the prev link, so _clip_path_destroy tries to free it, even though the clip info got altered so that the clip info now points to the very same path (clip->path = clip->path->prev) that may get accidentally destroyed. This is a case of "but how did this ever work at all?" 

This seems to agree with the fact the first trace all happened on a single thread, so wasn't necessarily a result of a race condition.

While this is definitely a bug in the code, I'm not sure it is THE bug we are experiencing here, so if you could try applying this small patch that handles properly NULLing out the prev link before freeing the new clip path to your tree and see if it makes the problem go away for you... If it does I'll clean up the patch and make it more official-like.
Flags: needinfo?(docfaraday)
Patch doesn't seem to have fixed the crashiness.
Flags: needinfo?(docfaraday)
(In reply to Byron Campen [:bwc] from comment #12)
> I've only looked a little bit, but I'm already seeing signs that cairo is
> not doing an awesome job of keeping its refcounts correct:
> 
> https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-
> clip.c#270
> 
> Before this line, the object pointed to by clip->path (call it X) has a
> refcount of a, and the object pointed to by clip->path->prev (Y) has a
> refcount of b. After this line, the refcounts remain the same, but X should
> have a refcount of a - 1, and Y should have a refcount of b + 1 (X still
> points at Y, so now clip->path is a new reference). The best-case scenario
> here is for X to be leaked. However, if anything else is holding a ref to X,
> we're in real trouble and will UAF.

It seems like the intention of the reference counts is to keep track of how many clip structs are referring to clip_path structs, and that the primary means of decrementing the RC is through _cairo_clip_path_destroy. So, the bug here, as per the undo patch, seems to be rather that the discarded clip path didn't give up its own reference.

But still, it doesn't seem like these error handling cases, though clearly buggy, are what is causing the bug, as they only seem to trigger in out of memory conditions. Whatever is going on here is somewhat less than obvious...
Byron, do you think you could try reproducing this in rr and investigating what's going on?
Flags: needinfo?(docfaraday)
I'm not going to have the cycles for this in the near term, I'm afraid.
Flags: needinfo?(docfaraday)
Would taking the newer Cairo have a chance of addressing this?
It certainly seems plausible, but there are no guarantees.
This sec-critical bug has been around for almost six months now. We need either to find someone who can work on it or we need to reassess its severity. Any ideas, Lee?
Flags: needinfo?(lsalzman)
(In reply to Matt Wobensmith from comment #41)
> This sec-critical bug has been around for almost six months now. We need
> either to find someone who can work on it or we need to reassess its
> severity. Any ideas, Lee?

Further work is stalled until Byron can collaborate on reproducing this.
Flags: needinfo?(lsalzman)
Byron, see comment 37. I know you are busy, but we are blocked on this critical bug.
Flags: needinfo?(docfaraday)
I'll only be able to help when it starts happening again, sadly.
Flags: needinfo?(docfaraday)
OK, let's reopen when it starts happening again, so that we don't spin on this bug too much.
Assignee: lsalzman → nobody
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
We don't have a worksforme status option so I'm marking this as wontfix for all active branches. We should reset status if this bug is reopened.
Group: core-security → core-security-release
Group: gfx-core-security
This is happening again, and I've developed a (temporary?) knack for making it happen reliably. Disabling the memory pooling for cairo fixes the problem. Digging into this some more.
Oh now that's interesting. The atomics stuff we're using (HAVE_CXX11_ATOMIC_PRIMITIVES) was written by us. We upstreamed it, but I'm not sure anyone but us is using it. Maybe there's some flaw?
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: