Closed Bug 1533918 Opened 5 years ago Closed 5 years ago

Sandbox escape implications of CrossProcessMutex and CrossProcessSemaphore

Categories

(Core :: IPC, enhancement, P1)

Tracking

RESOLVED FIXED
mozilla72
Tracking Status
firefox72 --- fixed

People

(Reporter: jld, Assigned: gcp)

References

Details

(Keywords: csectype-wildptr, sec-low, Whiteboard: [adv-main72-])

Attachments

(1 file, 1 obsolete file)

Something that occurred to me while discussing bug 1479960 comment #7 on IRC: with the CrossProcessMutex and CrossProcessSemaphore classes on POSIX systems, we have (respectively) a pthread_mutex_t or a sem_t that's writeable by a potentially hostile process and accessed from a trusted process.

I don't know that we've ever evaluated the security implications here, and I don't think it's safe to assume that the various thread libraries' authors would have considered that threat model.
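
For context, this is the general shape of what those classes do on POSIX (a minimal sketch, not the actual CrossProcessMutex code; the helper name is made up):

    /* A process-shared pthread mutex living in memory that another,
     * possibly hostile, process can also map and write to. */
    #include <pthread.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    static pthread_mutex_t* CreateSharedMutex(void) {
      /* Anonymous shared mapping; in practice the mapping would be
       * handed to the child process over IPC. */
      pthread_mutex_t* m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      if (m == MAP_FAILED) abort();

      pthread_mutexattr_t attr;
      pthread_mutexattr_init(&attr);
      /* This makes the mutex usable across processes -- and also puts
       * its internal state (list pointers included) in memory the
       * other process can scribble on. */
      pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
      pthread_mutex_init(m, &attr);
      pthread_mutexattr_destroy(&attr);
      return m;
    }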

It also jumps out to me that neither mMutex nor mCount are marked volatile, even though I'm pretty sure they should be?

(In reply to Alex Gaynor [:Alex_Gaynor] from comment #1)

> It also jumps out to me that neither mMutex nor mCount are marked volatile, even though I'm pretty sure they should be?

As far as I can see, they're only ever set in the constructor, so I'm not sure why they would need to be volatile?

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #0)

> I don't know that we've ever evaluated the security implications here, and I don't think it's safe to assume that the various thread libraries' authors would have considered that threat model.

I agree and that's a really good catch!

I took a scan through the glibc pthread code and saw that a mutex includes linked list pointers. The ENQUEUE_MUTEX_BOTH[1] macro is called during pthread_mutex_lock() (via __pthread_mutex_lock_full and ENQUEUE_MUTEX) and modifies a thread-local linked list, setting the ->next pointer to the value of &mutex->__data.__list.__next, which could be set by the child process. More digging would be required to prove it, but I think there is a very good chance this means a child process could cause the parent to overwrite an address of its choosing.

Even if there wasn't a problematic code path today, one could be added in the future and it wouldn't be a pthread bug given that it isn't a pthread requirement to be hardened against this attack as Jed alluded to.

[1] https://code.woboq.org/userspace/glibc/nptl/descr.h.html#_M/ENQUEUE_MUTEX_BOTH

The other question to ask here is: assuming this is exploitable, what do we do about it? Implementing our own synchronization primitives is nontrivial and we'd have to do it separately for each platform (although it wouldn't be needed on platforms where we don't have sandboxing). The alternative would be to understand what graphics is doing with these constructs and finding another way to provide it, but that might end up reducing to the previous problem because of performance considerations.

To expand on comment #0 a little, this probably doesn't affect Windows: these classes map more or less directly to Windows APIs, which I'm assuming are backed by kernel objects rather than userspace-visible shared memory that could be maliciously mutated.

Assignee: nobody → haftandilian
Priority: -- → P2

This definitely shouldn't affect Windows. The Windows mutex APIs use HANDLEs, which are meant to be cross-process safe.

As for implementing our own primitives, I'm sure there are existing libraries with compatible licenses we can import.

Haik, is this something you still might work on? I'm going through the open sec-high issues that have been untouched for a while. Is this something you might take on for the 71/70 timeframe?

Flags: needinfo?(haftandilian)

(In reply to Liz Henry (:lizzard) from comment #7)

> Haik, is this something you still might work on? I'm going through the open sec-high issues that have been untouched for a while. Is this something you might take on for the 71/70 timeframe?

Probably not. I think this is going to be complicated to fix and haven't had the chance to dig into it yet.

Flags: needinfo?(haftandilian)
  1. Some superficial digging through Chromium code didn't turn up those primitives in their codebase. Base locks use POSIX mutexes. One dependency seems to implement them via file-based locks (see also 3).
  2. There are only a few uses in our code, mostly in gfx code. Maybe it's possible to avoid use of this construct entirely.
  3. There's a question as to what happens with a cross-process mutex when one of the processes dies. This is especially relevant as some of our cross-process mutexes involve content processes: the surviving process risks getting stuck in an unrecoverable deadlock. It's possible to avoid this problem on POSIX by requesting robust mutexes (a sketch of what that looks like follows this list), but our code does not do this. Hence, anything currently using this construct risks deadlocking Firefox entirely.
  4. It does not look like there's a robust version of semaphores.
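
For reference on point 3, a minimal sketch of requesting and handling robust mutexes on POSIX (illustrative only; the helper names are made up, and as discussed further down in this bug, the robust machinery has security implications of its own):

    #include <errno.h>
    #include <pthread.h>

    static void InitRobustSharedMutex(pthread_mutex_t* m) {
      pthread_mutexattr_t attr;
      pthread_mutexattr_init(&attr);
      pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
      /* The piece our code doesn't request: peer-death notification. */
      pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
      pthread_mutex_init(m, &attr);
      pthread_mutexattr_destroy(&attr);
    }

    static int LockRobust(pthread_mutex_t* m) {
      int rc = pthread_mutex_lock(m);
      if (rc == EOWNERDEAD) {
        /* The previous owner died while holding the lock. The protected
         * data may be inconsistent; repair it, then mark the mutex
         * usable again instead of deadlocking forever. */
        pthread_mutex_consistent(m);
        rc = 0;
      }
      return rc;
    }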

Unless I missed anything here, it seems the best way forward would be to consider users of these constructs as buggy and remove the code from our codebase entirely.

It looks like the manual re-implementation of Shmem that our VR classes use has exactly the same errors and vulnerability:
https://searchfox.org/mozilla-central/source/gfx/vr/VRShMem.cpp#234

(In reply to Gian-Carlo Pascutto [:gcp] from comment #9)

> 2. There are only a few uses in our code, mostly in gfx code. Maybe it's possible to avoid use of this construct entirely.

The current prototype of WebGL remoting uses cross-process semaphores (and doesn't work on Mac, where we don't have them), so that will also need to be redesigned to some extent.

The Mojo equivalent of what WebGL remoting is doing, the data pipe, uses normal async IPC messages internally to negotiate flow control. (More or less normal — it's similar to our internal messages like SHMEM_CREATED.) However, our IPC may not perform well enough for that to be a feasible approach.

> 3. There's a question as to what happens with a cross-process mutex when one of the processes dies.

That's a good point, and handling this usefully should be a requirement for any mutex/semaphore replacement.

For reference, Windows mutex objects have the notion of an “abandoned” mutex, if the thread holding it terminates; the mutex is effectively released but the next holder will be told that this happened (so that it can signal an error or try to recover). Semaphores don't appear to have anything similar (and it's not clear how that would be possible for a producer/consumer use case).
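
As a sketch (not taken from our codebase; the helper name is made up), the API-level pattern for handling abandonment looks roughly like:

    #include <windows.h>

    /* Returns TRUE on a clean acquisition. On WAIT_ABANDONED we still
     * own the mutex, but the previous holder's thread terminated
     * mid-critical-section, so the protected state must be treated as
     * suspect. */
    static BOOL LockWithAbandonCheck(HANDLE mutex) {
      DWORD rc = WaitForSingleObject(mutex, INFINITE);
      if (rc == WAIT_ABANDONED) {
        /* Signal an error or rebuild the shared state here. */
        return FALSE;
      }
      return rc == WAIT_OBJECT_0;
    }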

(In reply to Gian-Carlo Pascutto [:gcp] from comment #10)

> It looks like the manual re-implementation of Shmem that our VR classes use has exactly the same errors and vulnerability:
> https://searchfox.org/mozilla-central/source/gfx/vr/VRShMem.cpp#234

Do I understand correctly that that's used for communication with a process that shouldn't or can't link against libxul? That's another case where the Mojo system library could be useful.

> However, our IPC may not perform well enough for that to be a feasible approach.

Is the concern latency, or the amount of data? You can transfer the textures in the shmem and do the synchronization (only) over the pipe.

Checking back in on this, as it's sec-high it should be set to P1 according to the triage guidelines at https://wiki.mozilla.org/Security/Firefox/Security_Bug_Triage_Process. jld, is there anyone who can work on this for the 71/72 timeframe?

Flags: needinfo?(jld)

(In reply to Liz Henry (:lizzard) from comment #13)

> Checking back in on this, as it's sec-high it should be set to P1 according to the triage guidelines at https://wiki.mozilla.org/Security/Firefox/Security_Bug_Triage_Process. jld, is there anyone who can work on this for the 71/72 timeframe?

We're working on this. We discussed it in our weekly IPC meeting last week.

Flags: needinfo?(jld)
Priority: P2 → P1

> I took a scan through the glibc pthread code and saw that a mutex includes linked list pointers. The ENQUEUE_MUTEX_BOTH[1] macro is called during pthread_mutex_lock() (via __pthread_mutex_lock_full and ENQUEUE_MUTEX) and modifies a thread-local linked list, setting the ->next pointer to the value of &mutex->__data.__list.__next, which could be set by the child process. More digging would be required to prove it, but I think there is a very good chance this means a child process could cause the parent to overwrite an address of its choosing.

To link this with comment 9: those operations are done for the robust mutex case. For robust mutexes, there is a per-thread linked list in userspace of all robust mutexes, and to avoid a specific race where the process dies right after acquiring the corresponding mutex, the mutex itself contains a description of the change it wants to do to the linked list, so the kernel can undo it.

https://www.kernel.org/doc/Documentation/robust-futexes.txt

I still need to check if these operations are actually carried out for non-robust mutexes. If they're not, we dodged the security issue here by using the wrong mutex in the first place.
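
To make the mechanism concrete, here is a sketch of the userspace side of that protocol (glibc performs this registration automatically per thread; this is shown purely for illustration):

    #include <linux/futex.h>
    #include <stddef.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Per-thread list head the kernel walks if the thread dies. */
    static struct robust_list_head head = {
      .list = { &head.list },    /* empty circular list */
      .futex_offset = 0,         /* offset from list entry to futex word */
      .list_op_pending = NULL,   /* "in-flight" entry the kernel can undo */
    };

    static void RegisterRobustList(void) {
      syscall(SYS_set_robust_list, &head, sizeof(head));
    }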

Expanding on Haik's comment, the problematic lines are these:

mutex->__data.__list.__next = THREAD_GETMEM (THREAD_SELF,                    \
                                             robust_head.list);              \
mutex->__data.__list.__prev = (void *) &THREAD_SELF->robust_head;            \

Thus, we have an arbitrary memory overwrite with the contents of the robust mutex list. As we don't actually use robust mutexes, the value written would be a 0, but that might be enough to cause havoc.

I checked the code and those operations only seem to be carried out if the mutexes are robust (this matches the doc in the above comment).

So at least for CrossProcessMutex, our usage is actually not a security hole, just prone to deadlocking.

POSIX semaphores on Linux are just plain values; in fact, the stored state seems to be literally the semaphore value, a count of waiters, and a flag for whether it's shared with other processes. So I don't see any way to abuse them barring kernel bugs (which are out of scope here, and we don't harden futexes yet anyway, e.g. bug 1441993).

There's also no robust version, so any code of ours that uses them risks a complete deadlock between the processes involved, unless the timed versions are used and there's a fallback path.
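
A sketch of that timed-wait-with-fallback pattern (illustrative; the helper name is made up):

    #include <errno.h>
    #include <semaphore.h>
    #include <time.h>

    /* Returns 0 on success, or an errno value (e.g. ETIMEDOUT when the
     * peer may be dead) so the caller can recover instead of hanging. */
    static int WaitWithDeadline(sem_t* sem, unsigned timeout_ms) {
      struct timespec ts;
      clock_gettime(CLOCK_REALTIME, &ts); /* sem_timedwait uses CLOCK_REALTIME */
      ts.tv_sec += timeout_ms / 1000;
      ts.tv_nsec += (long)(timeout_ms % 1000) * 1000000;
      if (ts.tv_nsec >= 1000000000) {
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000;
      }
      while (sem_timedwait(sem, &ts) != 0) {
        if (errno == EINTR)
          continue;   /* interrupted by a signal; retry */
        return errno; /* timed out: peer likely dead, caller must recover */
      }
      return 0;
    }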

I think the conclusion is hence that our current usage of all these primitives is "safe" in the security sense. Nevertheless, we should document that they're prone to deadlock, consider removing the mutex version entirely (there's no timed version, and someone might get "smart" and make it robust), and emit warnings whenever an untimed wait is used, since that indicates the caller did not handle process death.

The musl libc code is more readable and has:

    volatile void *prev = m->_m_prev;
    volatile void *next = m->_m_next;
    *(volatile void *volatile *)prev = next;

So, uhm, that looks like a 100% arbitrary memory write?

Unfortunately there is an attack against non-robust mutexes: whether a mutex is robust or not is stored in the mutex itself, in a userland-accessible place. So an attacker can just take the mutex, set the robust bit, and voilà, the above code gets executed the next time the other side tries to unlock.
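
To make the attack shape concrete, here is a deliberately abstract sketch; the flag bit and the field offsets are libc internals that differ between glibc and musl, so every constant below is a placeholder, not a real value:

    #include <pthread.h>
    #include <string.h>

    /* Placeholder values: the real robust-flag bit and the offsets of
     * the type word and list pointers are libc-internal. */
    #define FAKE_ROBUST_BIT   0x10
    #define FAKE_TYPE_OFFSET  12
    #define FAKE_PREV_OFFSET  16

    static void HostileChild(pthread_mutex_t* shared, void* target) {
      unsigned char* raw = (unsigned char*)shared;
      /* Flip the (libc-internal) robust flag in the type word... */
      raw[FAKE_TYPE_OFFSET] |= FAKE_ROBUST_BIT;
      /* ...and plant a list pointer, so the parent's next unlock
       * performs *prev = next with a value we control. */
      memcpy(raw + FAKE_PREV_OFFSET, &target, sizeof(target));
    }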

That said, given that the only user of CrossProcessMutex is on Fennec/GeckoView, and that those are not sandboxed, this isn't an actual security issue, but one more reason to get rid of CrossProcessMutex.

It should be relatively simple to build CrossProcessMutex only on Android (and ifdef the graphics code where needed), to ensure that it doesn't gain any unwanted uses.

(As for semaphores, it would be nice if we could provide a more pipe-like abstraction with separate producer and consumer capabilities, so that there's a clearly defined condition where one end is dead and operations on the other end should fail / be interrupted, but that seems like a separate bug.)

To summarize what's been discovered, the Mac/Linux implementation of CrossProcessMutex could be used as a sandbox escape, but it is not used in our codebase on any platforms with sandboxing. It is used in AsyncPanZoomController, but only when the layers.progressive-paint static pref is true.

If someone more knowledgeable about Fenix can confirm it is only used there and sandboxing is not yet enabled, I don't think we have a reason to keep the bug private, but we will need to stop using this implementation of CrossProcessMutex eventually (for Fenix sandboxing).

For this bug fix, I think we could add the #ifdefs Jed suggested in comment 20. Another bug could factor out CrossProcessMutex from AsyncPanZoomController and remove CrossProcessMutex from the tree. (For AsyncPanZoomController, in the worst case, I think we could instead use atomic pointer swaps with 2 FrameMetrics per side. That would require 4 total FrameMetrics per AsyncPanZoomController instance, where each one is 200 bytes.) And another bug could add a safe Mac/Linux-compatible implementation of CrossProcessSemaphore.
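
A sketch of that atomic-swap idea (FrameMetricsBuf and all names here are made up, not the real FrameMetrics type; a real implementation would also need the second pair of buffers mentioned above to keep the writer from reusing a slot a reader is still copying):

    #include <stdatomic.h>

    typedef struct { unsigned char bytes[200]; } FrameMetricsBuf;

    typedef struct {
      FrameMetricsBuf slots[2];
      _Atomic unsigned front;  /* index of the slot readers should use */
    } SharedMetrics;

    /* Writer: fill the back slot, then publish it with a release store. */
    static void Publish(SharedMetrics* s, const FrameMetricsBuf* m) {
      unsigned back = 1u - atomic_load_explicit(&s->front, memory_order_relaxed);
      s->slots[back] = *m;
      atomic_store_explicit(&s->front, back, memory_order_release);
    }

    /* Reader: acquire-load the index, then copy that slot out. */
    static void Snapshot(SharedMetrics* s, FrameMetricsBuf* out) {
      unsigned front = atomic_load_explicit(&s->front, memory_order_acquire);
      *out = s->slots[front];
    }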

I've done some testing to compare the latency of a cross-process mutex unlock with pipe messaging; the limited results so far suggest the performance of a pipe-based unblocking primitive could be comparable to the pthread shared mutex. Tests are available at https://github.com/hafta/ipc-unblock-tests
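
The primitive being measured is roughly this shape (a sketch, assuming one pipe per direction; the function names are made up and this is not the benchmark code from the repository above). Note the robustness win: if the peer dies, read() sees EOF instead of hanging forever.

    #include <unistd.h>

    /* "Post": wake the other side by writing one byte. */
    static int PipePost(int write_fd) {
      char b = 1;
      return write(write_fd, &b, 1) == 1 ? 0 : -1;
    }

    /* "Wait": block until the other side posts. */
    static int PipeWait(int read_fd) {
      char b;
      ssize_t n = read(read_fd, &b, 1);
      if (n == 0)
        return -1; /* peer closed its end (e.g. crashed): recoverable error */
      return n == 1 ? 0 : -1;
    }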

Lowering the security rating to sec-low (missing best practice) and making the bug visible to ease discussion.

The proposal is to only allow compiling CrossProcessMutex on non-Windows if MOZ_SANDBOX is disabled. That should prevent accidental use in situations where it matters.
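
The guard could look roughly like this (a sketch; the actual patch's condition and location may differ):

    /* Hypothetical compile-time guard: refuse to build CrossProcessMutex
     * on sandboxed POSIX platforms. */
    #if !defined(XP_WIN) && defined(MOZ_SANDBOX)
    #  error "CrossProcessMutex is unsafe when a hostile process shares the memory; use IPC-based signaling instead."
    #endif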

Group: dom-core-security
Keywords: sec-high → sec-low
Assignee: haftandilian → gpascutto

The problem with removing CrossProcessMutex for insecure cases (i.e. Linux-like + sandboxing) is this:
https://searchfox.org/mozilla-central/source/gfx/layers/apz/public/MetricsSharingController.h

And it seems APZC uses this even in the same-process case (when it obviously wouldn't actually need the mutex):
https://searchfox.org/mozilla-central/source/gfx/layers/apz/src/APZCTreeManager.cpp#93

Now, surgically disabling those calls does seem to still result in a working browser. I'll let kats comment on how crazy this is. The alternative would likely be to MOZ_ASSERT (or MOZ_RELEASE_ASSERT?) on any attempt to use the mutexes.

Attachment #9104181 - Attachment is obsolete: true
Pushed by gpascutto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/59f69f1b19d7
Disallow CrossProcessMutex when sandboxing is enabled. r=botond
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
Whiteboard: [adv-main72-]
See Also: → 1710483