Closed Bug 1125925 Opened 9 years ago Closed 9 years ago

UAF in gfxContext::Clip

Tracking

()

Status:

RESOLVED WORKSFORME

Tracking Flags:

Tracking

Status

firefox37

---

wontfix

firefox38

---

wontfix

firefox39

---

wontfix

firefox40

wontfix

firefox41

wontfix

firefox42

wontfix

firefox-esr31

---

wontfix

firefox-esr38

---

wontfix

People

(Reporter: bwc, Unassigned)

References

Details

(Keywords: csectype-uaf, sec-critical, Whiteboard: gfx-noted)

Attachments

(5 files)

uaf.txt 9 years ago Byron Campen [:bwc] 21.42 KB, text/plain		Details
Another ASAN crash when closing the browser on a talky.io call 9 years ago Byron Campen [:bwc] 30.07 KB, text/plain		Details
Patch to properly handle undoing clip paths in error cases 9 years ago Lee Salzman [:lsalzman] 1.59 KB, patch		Details \| Diff \| Splinter Review
Crash with patch applied 9 years ago Byron Campen [:bwc] 23.26 KB, text/plain		Details
Mozconfig I'm using. Probably needs to be scrubbed of distcc/ccache stuff for it to work for you. 9 years ago Byron Campen [:bwc] 1.99 KB, text/plain		Details

Byron Campen [:bwc]

Reporter

Description

•

9 years ago

Attached file uaf.txt — Details

While trying to test a talky.io call on ASAN, I got the attached crash.

Byron Campen [:bwc]

Reporter

Comment 1

•

9 years ago

Attached file Another ASAN crash when closing the browser on a talky.io call — Details

Al Billings [:abillings - ex-MoCo]

Updated

•

9 years ago

status-firefox38: --- → affected

tracking-firefox38: --- → +

Keywords: sec-critical

Matt Woodrow (:mattwoodrow)

Comment 2

•

9 years ago

This looks to be because we're using the clip path pool on multiple threads.

It all looks thread safe though, and I can't see how this would actually happen.

Daniel Veditz [:dveditz]

Updated

•

9 years ago

Group: gfx-core-security

Andrew McCreight [:mccr8]

Comment 3

•

9 years ago

Byron, are you able to reproduce this regularly or did this just happen one time?

Flags: needinfo?(docfaraday)

Keywords: csectype-uaf

Byron Campen [:bwc]

Reporter

Comment 4

•

9 years ago

I got two slightly different crashes, both attached here. I have not seen this again, although I don't think I've run talky.io under ASAN since then.

Flags: needinfo?(docfaraday)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 5

•

9 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #2)
> This looks to be because we're using the clip path pool on multiple threads.
> 
> It all looks thread safe though, and I can't see how this would actually
> happen.

First crash has the original allocate, free and uaf all on the same thread?  Second crash does have the two threads involved.  I could be reading it wrong.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 6

•

9 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #2)
> This looks to be because we're using the clip path pool on multiple threads.

According to Jeff, we're not actually supposed to be using the clip path pool?

Byron Campen [:bwc]

Reporter

Comment 7

•

9 years ago

(In reply to Milan Sreckovic [:milan] from comment #5)
> (In reply to Matt Woodrow (:mattwoodrow) from comment #2)
> > This looks to be because we're using the clip path pool on multiple threads.
> > 
> > It all looks thread safe though, and I can't see how this would actually
> > happen.
> 
> First crash has the original allocate, free and uaf all on the same thread? 
> Second crash does have the two threads involved.  I could be reading it
> wrong.

That is correct.

Byron Campen [:bwc]

Reporter

Comment 8

•

9 years ago

One thing that jumps out at me is the use of pool->top (eg; here https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-freed-pool-private.h#80). There are no threadsafety protections on that integer, so it could literally take any value depending on the whims of the compiler, including values beyond the end of the pool array.

Byron Campen [:bwc]

Reporter

Comment 9

•

9 years ago

This looks flat-out wrong:

https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-freed-pool.c#50

    for (i = ARRAY_LENGTH (pool->pool); i--;) {
	ptr = _atomic_fetch (&pool->pool[i]);
	if (ptr != NULL) {
	    pool->top = i;
	    return ptr;
	}
    }

Surely they meant to do something like

    for (i = ARRAY_LENGTH (pool->pool); i > 0; i--) {
	ptr = _atomic_fetch (&pool->pool[i]);
	if (ptr != NULL) {
	    pool->top = i;
	    return ptr;
	}
    }

Byron Campen [:bwc]

Reporter

Comment 10

•

9 years ago

I should also point out that if we mess up the refcounting somehow, we could end up placing a clip_path into the pool, and then destroying it again and freeing it this time because the pool is full, leading to the possibility that some code later will pull it out of the pool and UAF.

Byron Campen [:bwc]

Reporter

Comment 11

•

9 years ago

Given that this was an ASAN run, the problem in comment 8 is probably not the cause, since an array out of bounds error would have shown up. I also don't think the problem in comment 9 will get us; it is just a busywait or maybe a deadlock at worst. My money is on comment 10.

Byron Campen [:bwc]

Reporter

Comment 12

•

9 years ago

I've only looked a little bit, but I'm already seeing signs that cairo is not doing an awesome job of keeping its refcounts correct:

https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-clip.c#270

Before this line, the object pointed to by clip->path (call it X) has a refcount of a, and the object pointed to by clip->path->prev (Y) has a refcount of b. After this line, the refcounts remain the same, but X should have a refcount of a - 1, and Y should have a refcount of b + 1 (X still points at Y, so now clip->path is a new reference). The best-case scenario here is for X to be leaked. However, if anything else is holding a ref to X, we're in real trouble and will UAF.

Byron Campen [:bwc]

Reporter

Comment 13

•

9 years ago

On closer inspection, in comment 12 |clip_path| is another ref to X, so the code is definitely setting up a UAF.

Byron Campen [:bwc]

Reporter

Comment 14

•

9 years ago

The same bad error handling in some other places:

https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-clip.c#388
https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-clip.c#431
https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-clip.c#449

Byron Campen [:bwc]

Reporter

Comment 15

•

9 years ago

Searching cairo_clip_ on crash-stats finds some stuff, and searching for cairo_ finds more stuff, some of which is operating on some member of a clip path.

Byron Campen [:bwc]

Reporter

Comment 16

•

9 years ago

FWIW, the latest cairo source does not seem to have these problems.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 17

•

9 years ago

(In reply to Byron Campen [:bwc] (PTO Feb 18-20) from comment #9)
> This looks flat-out wrong:
> 
> https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-
> freed-pool.c#50
> 
>     for (i = ARRAY_LENGTH (pool->pool); i--;) {
> ...
> 
> Surely they meant to do something like
> 
>     for (i = ARRAY_LENGTH (pool->pool); i > 0; i--) {
> ...

Yeah, I got caught when I first read this as well; it isn't wrong, but it is annoyingly written.  The two for loop statements are equivalent, so the bug is elsewhere.

Al Billings [:abillings - ex-MoCo]

Updated

•

9 years ago

status-firefox39: --- → affected

tracking-firefox38: + → ---

tracking-firefox39: --- → +

Al Billings [:abillings - ex-MoCo]

Comment 18

•

9 years ago

Milan, can someone look into this so it can be moved forward?

Flags: needinfo?(milan)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 19

•

9 years ago

Byron, with all the investigation you've done, do you have an idea of a patch?

Flags: needinfo?(milan) → needinfo?(docfaraday)

Byron Campen [:bwc]

Reporter

Comment 20

•

9 years ago

Well, I've seen how cairo fixed the bad error handling identified in comment 12 and comment 14, but who knows whether those are actually the cause. It would take a lot more investigation to figure out where the problem actually lies, since I only scratched the surface. We probably want to update our cairo version.

Flags: needinfo?(docfaraday)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 21

•

9 years ago

Good point, updating the cairo version would be a good thing to do, though probably not the quick fix we're looking for.  Al, let me see what we can do here, quickly.

Flags: needinfo?(milan)

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

9 years ago

Flags: needinfo?(milan)

Whiteboard: gfx-noted

Milan Sreckovic [:milan] (needinfo for best results)

Comment 22

•

9 years ago

Didn't mean to remove NI on me.

Flags: needinfo?(milan)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 23

•

9 years ago

(In reply to Byron Campen [:bwc] from comment #16)
> FWIW, the latest cairo source does not seem to have these problems.

Is this the latest cairo with all of our patches applied, or as it is upstream?

Byron Campen [:bwc]

Reporter

Comment 24

•

9 years ago

(In reply to Milan Sreckovic [:milan] from comment #23)
> (In reply to Byron Campen [:bwc] from comment #16)
> > FWIW, the latest cairo source does not seem to have these problems.
> 
> Is this the latest cairo with all of our patches applied, or as it is
> upstream?

Upstream.

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Blocks: 1095510

Liz Henry (:lizzard) (relman/hg->git project)

Comment 25

•

9 years ago

We do need to track this for 38 since we don't want to ship this sec-critical problem in 38. Milan can you help us find an owner on this bug?  Thanks.

tracking-firefox38: --- → +

Liz Henry (:lizzard) (relman/hg->git project)

Comment 26

•

9 years ago

Does this also affect 37?

status-firefox37: --- → ?

Milan Sreckovic [:milan] (needinfo for best results)

Comment 27

•

9 years ago

This probably affects even older releases, don't think this is a recent regression.

Assignee: nobody → jmuizelaar

Flags: needinfo?(milan)

Al Billings [:abillings - ex-MoCo]

Updated

•

9 years ago

status-firefox37: ? → wontfix

status-firefox40: --- → affected

status-firefox-esr31: --- → ?

tracking-firefox38: + → ---

tracking-firefox39: + → ---

Al Billings [:abillings - ex-MoCo]

Comment 28

•

9 years ago

Can we schedule to take the newer Cairo? This is sec-critical and it has been sitting around for more than a month.

Flags: needinfo?(jmuizelaar)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 29

•

9 years ago

Lee, can you take a look?

Assignee: jmuizelaar → lsalzman

Flags: needinfo?(jmuizelaar)

Lee Salzman [:lsalzman]

Comment 30

•

9 years ago

(In reply to Byron Campen [:bwc] from comment #8)
> One thing that jumps out at me is the use of pool->top (eg; here
> https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-
> freed-pool-private.h#80). There are no threadsafety protections on that
> integer, so it could literally take any value depending on the whims of the
> compiler, including values beyond the end of the pool array.

At least in this case, the value of top seems to be advisory, functioning more like a fuzzy value for a search start, and thread safety is guaranteed by atomic exchanges (with NULL) used to actually fetch pointers from the pool's array, however unreliable the actual value of top may be.

So when it tries to get something from the pool, it looks at the top value, checks for a non-NULL pointer there. If it doesn't find one, it searches the entire pool in any case. Upon analysis, I don't see how this particular piece of the code could be at fault.

The freed pool does appear to be getting used in any case.

Lee Salzman [:lsalzman]

Comment 31

•

9 years ago

(In reply to Byron Campen [:bwc] from comment #0)
> Created attachment 8554646 [details]
> uaf.txt
> 
> While trying to test a talky.io call on ASAN, I got the attached crash.

Are you still able to reproduce the bug, and if so, is there any more boiled down/reliable procedure that other people might be able to reproduce it by? I at least tried doing a few talky calls, but couldn't get anything suspicious to occur for me.

Flags: needinfo?(docfaraday)

Byron Campen [:bwc]

Reporter

Comment 32

•

9 years ago

This still happens pretty frequently for me. It seems to happen much more often when the window that is running a webrtc call is backgrounded (say, switching from the browser to a terminal window).

Flags: needinfo?(docfaraday)

Lee Salzman [:lsalzman]

Comment 33

•

9 years ago

Attached patch Patch to properly handle undoing clip paths in error cases — Details — Splinter Review

Okay, I put up a hypothetical patch to handle some clearly erroneous behavior when it is trying to destroy paths in failure cases.

The clip_path->prev link points to the old path stored in the clip info. But when it destroys clip_path, it never nulls out the prev link, so _clip_path_destroy tries to free it, even though the clip info got altered so that the clip info now points to the very same path (clip->path = clip->path->prev) that may get accidentally destroyed. This is a case of "but how did this ever work at all?" 

This seems to agree with the fact the first trace all happened on a single thread, so wasn't necessarily a result of a race condition.

While this is definitely a bug in the code, I'm not sure it is THE bug we are experiencing here, so if you could try applying this small patch that handles properly NULLing out the prev link before freeing the new clip path to your tree and see if it makes the problem go away for you... If it does I'll clean up the patch and make it more official-like.

Flags: needinfo?(docfaraday)

Andrew McCreight [:mccr8]

Updated

•

9 years ago

status-firefox41: --- → affected

Byron Campen [:bwc]

Reporter

Comment 34

•

9 years ago

Attached file Crash with patch applied — Details

Patch doesn't seem to have fixed the crashiness.

Flags: needinfo?(docfaraday)

Byron Campen [:bwc]

Reporter

Comment 35

•

9 years ago

Attached file Mozconfig I'm using. Probably needs to be scrubbed of distcc/ccache stuff for it to work for you. — Details

Lee Salzman [:lsalzman]

Comment 36

•

9 years ago

(In reply to Byron Campen [:bwc] from comment #12)
> I've only looked a little bit, but I'm already seeing signs that cairo is
> not doing an awesome job of keeping its refcounts correct:
> 
> https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-
> clip.c#270
> 
> Before this line, the object pointed to by clip->path (call it X) has a
> refcount of a, and the object pointed to by clip->path->prev (Y) has a
> refcount of b. After this line, the refcounts remain the same, but X should
> have a refcount of a - 1, and Y should have a refcount of b + 1 (X still
> points at Y, so now clip->path is a new reference). The best-case scenario
> here is for X to be leaked. However, if anything else is holding a ref to X,
> we're in real trouble and will UAF.

It seems like the intention of the reference counts is to keep track of how many clip structs are referring to clip_path structs, and that the primary means of decrementing the RC is through _cairo_clip_path_destroy. So, the bug here, as per the undo patch, seems to be rather that the discarded clip path didn't give up its own reference.

But still, it doesn't seem like these error handling cases, though clearly buggy, are what is causing the bug, as they only seem to trigger in out of memory conditions. Whatever is going on here is somewhat less than obvious...

Jeff Muizelaar [:jrmuizel]

Comment 37

•

9 years ago

Byron, do you think you could try reproducing this in rr and investigating what's going on?

Flags: needinfo?(docfaraday)

Byron Campen [:bwc]

Reporter

Comment 38

•

9 years ago

I'm not going to have the cycles for this in the near term, I'm afraid.

Flags: needinfo?(docfaraday)

Al Billings [:abillings - ex-MoCo]

Comment 39

•

9 years ago

Would taking the newer Cairo have a chance of addressing this?

Byron Campen [:bwc]

Reporter

Comment 40

•

9 years ago

It certainly seems plausible, but there are no guarantees.

Matt Wobensmith [:mwobensmith][:matt:]

Comment 41

•

9 years ago

This sec-critical bug has been around for almost six months now. We need either to find someone who can work on it or we need to reassess its severity. Any ideas, Lee?

Flags: needinfo?(lsalzman)

Lee Salzman [:lsalzman]

Comment 42

•

9 years ago

(In reply to Matt Wobensmith from comment #41)
> This sec-critical bug has been around for almost six months now. We need
> either to find someone who can work on it or we need to reassess its
> severity. Any ideas, Lee?

Further work is stalled until Byron can collaborate on reproducing this.

Flags: needinfo?(lsalzman)

Matt Wobensmith [:mwobensmith][:matt:]

Comment 43

•

9 years ago

Byron, see comment 37. I know you are busy, but we are blocked on this critical bug.

Flags: needinfo?(docfaraday)

Daniel Veditz [:dveditz]

Updated

•

9 years ago

status-firefox38: affected → wontfix

status-firefox39: affected → wontfix

status-firefox42: --- → affected

status-firefox-esr38: --- → affected

tracking-firefox40: --- → +

tracking-firefox41: --- → +

tracking-firefox42: --- → +

Byron Campen [:bwc]

Reporter

Comment 44

•

9 years ago

I'll only be able to help when it starts happening again, sadly.

Flags: needinfo?(docfaraday)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 45

•

9 years ago

OK, let's reopen when it starts happening again, so that we don't spin on this bug too much.

Assignee: lsalzman → nobody

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → WORKSFORME

Lawrence Mandel [:lmandel] (use needinfo)

Comment 46

•

9 years ago

We don't have a worksforme status option so I'm marking this as wontfix for all active branches. We should reset status if this bug is reopened.

status-firefox40: affected → wontfix

status-firefox41: affected → wontfix

status-firefox42: affected → wontfix

status-firefox-esr31: ? → wontfix

status-firefox-esr38: affected → wontfix

Daniel Veditz [:dveditz]

Updated

•

9 years ago

tracking-firefox40: + → -

BMO Automation

Updated

•

9 years ago

Group: core-security → core-security-release

Daniel Veditz [:dveditz]

Updated

•

9 years ago

Group: gfx-core-security

Byron Campen [:bwc]

Reporter

Comment 47

•

9 years ago

This is happening again, and I've developed a (temporary?) knack for making it happen reliably. Disabling the memory pooling for cairo fixes the problem. Digging into this some more.

Byron Campen [:bwc]

Reporter

Comment 48

•

9 years ago

Oh now that's interesting. The atomics stuff we're using (HAVE_CXX11_ATOMIC_PRIMITIVES) was written by us. We upstreamed it, but I'm not sure anyone but us is using it. Maybe there's some flaw?

Byron Campen [:bwc]

Reporter

Comment 49

•

9 years ago

https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairoint.h?case=true&from=cairoint.h#114

ಠ_ಠ

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Group: core-security-release

You need to log in before you can comment on or make changes to this bug.