Bug 1218124 (CVE-2016-1972)

Race condition in |once| can cause use after free

RESOLVED FIXED in Firefox 45

Status

()

defect
RESOLVED FIXED
4 years ago
2 years ago

People

(Reporter: q1, Assigned: rillian)

Tracking

({csectype-race, sec-low})

41 Branch
mozilla45
Points:
---
Bug Flags:
sec-bounty +

Firefox Tracking Flags

(firefox44 wontfix, firefox45 fixed)

Details

(Whiteboard: [adv-main45+][post-critsmash-triage])

Attachments

(2 attachments, 3 obsolete attachments)

|once| (media\libvpx\vpx_ports\vpx_once.h) has a race condition that can cause it to read and write to a freed object.

This function is used by various en/decoder initialization functions in media\libvpx . I don't know whether any of these functions are, at present, called in more than one thread context.


Details
-------

The bug is in the handling of the deletion of the critical section object:

19: static void once(void (*func)(void))
20: {
21:     static CRITICAL_SECTION *lock;
22:     static LONG waiters;
23:     static int done;
24:     void *lock_ptr = &lock;
25: 
26:     /* If the initialization is complete, return early. This isn't just an
27:      * optimization, it prevents races on the destruction of the global
28:      * lock.
29:      */
30:     if(done)
31:         return;
32: 
33:     InterlockedIncrement(&waiters);
34: 
35:     /* Get a lock. We create one and try to make it the one-true-lock,
36:      * throwing it away if we lost the race.
37:      */
38: 
39:     {
40:         /* Scope to protect access to new_lock */
41:         CRITICAL_SECTION *new_lock = malloc(sizeof(CRITICAL_SECTION));
42:         InitializeCriticalSection(new_lock);
43:         if (InterlockedCompareExchangePointer(lock_ptr, new_lock, NULL) != NULL)
44:         {
45:             DeleteCriticalSection(new_lock);
46:             free(new_lock);
47:         }
48:     }
49: 
50:     /* At this point, we have a lock that can be synchronized on. We don't
51:      * care which thread actually performed the allocation.
52:      */
53: 
54:     EnterCriticalSection(lock);
55: 
56:     if (!done)
57:     {
58:         func();
59:         done = 1;
60:     }
61: 
62:     LeaveCriticalSection(lock);
63: 
64:     /* Last one out should free resources. The destructed objects are
65:      * protected by checking if(done) above.
66:      */
67:     if(!InterlockedDecrement(&waiters))
68:     {
69:         DeleteCriticalSection(lock);
70:         free(lock);
71:         lock = NULL;
72:     }
73: }

such that the following race can unfold, assuming |done == 0| and |waiters == 0| initially:

   thread 1                            thread 2
   --------                            --------

   execute up to (but not into)
   line 33, then get preempted

                                       execute through line 67,
                                       find that |waiters| is now 0,
                                       then get preempted

   resume execution at line 33,
   skip creating a new critical
   section object in lines 39-48
   (because the existing one is
   still pointed to by |lock|)
   then get preempted before
   executing line 54

                                       resume execution at line 68,
                                       execute lines 69-70 (deleting and freeing
                                       the critical section), then get
                                       preempted

   resume execution at line 54,
   calling |EnterCriticalSection|
   with a deleted and freed critical
   section object


There are also some other problems in this code.

1. The |lock = NULL| on line 71 (which does not use an interlocked instruction) is not properly interlocked with respect to |InterlockedCompareExchangePointer| on line 43.

2. The early-out check at lines 30-31 is questionable. I wonder whether the interlocked instruction presumably executed by LeaveCriticalSection on line 62 necessarily forces all processors in the system (under all architectures) to read the updated value of |done|, though this appears to be the case on Intel hardware; see http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.html s.8.2.3.9.

3. |once| is hazardous because nothing prevents it from being called with different function objects in the same compilation unit. In this case, the first use would set |done == 1|, causing the second use to be skipped:

void buggy () {
   once (f);
   once (g); // g() is skipped
}

4. |once| also allows its function object to be executed more than once if |once| is called with the same function object in different compilation units:

unit1.cpp
---------

#include "vpx_once.h"

void f ();

void z1 () {
   once (f); // f() is called
}

unit2.cpp
---------

#include "vpx_once.h"

void f ();

void z2 () {
   once (f); // f() is called again
}
The bug is still present in today's trunk: http://hg.mozilla.org/mozilla-central/file/05d85d9bbce4/media/libvpx/vpx_ports/vpx_once.h .
Ralph, can you look into this? Thanks.
Flags: needinfo?(giles)
Flags: sec-bounty?
I can take a look next week.
Group: core-security → media-core-security
Thanks for the very clear report, and sorry it took so long to get to it.

I agree there's a race, but couldn't figure out how to fix it without rewriting the function entirely. The API constraint of trying to allocate the CRITICAL_SECTION in a thread-safe manner complicates things considerably.

In the end, Kip Gilbert came up with this approach based on a single state machine variable with atomic increments and compares. Does this look better?

It doesn't look too serious; we still need to check our calls to vpx initializers to see if it's possible for the race to occur in gecko. For severity, libvpx only uses this to protect singleton initializers, so I suspect the worst it can do is leak.
Assignee: nobody → giles
Flags: needinfo?(giles)
Attachment #8684495 - Flags: feedback?(q1)
Attachment #8684495 - Flags: feedback?
Comment on attachment 8684495 [details] [diff] [review]
Use atomic increment and compare

Review of attachment 8684495 [details] [diff] [review]:
-----------------------------------------------------------------

This looks correct and clean, except that |state| should have the appropriate alignment directive (e.g., alignas) to ensure that it's aligned on at least a 32 bit boundary, per https://msdn.microsoft.com/en-us/library/windows/desktop/ms683560%28v=vs.85%29.aspx .
Comment on attachment 8684495 [details] [diff] [review]
Use atomic increment and compare

Review of attachment 8684495 [details] [diff] [review]:
-----------------------------------------------------------------

Darn, forgot the review flag.
Attachment #8684495 - Flags: feedback?(q1) → feedback+
Maybe green on try. There are some thread-foo leaks on win 7 mochitest.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1bfa596a6059
(In reply to q1 from comment #5)

> This looks correct and clean, except that |state| should have the
> appropriate alignment directive (e.g., alignas) to ensure that it's aligned
> on at least a 32 bit boundary, per
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms683560%28v=vs.
> 85%29.aspx .

Thanks for the review. According to docs, alignas requires Visual Studio 2015, and I believe we still support 2013, but we can use __declspec(align(4)) instead.

I couldn't find anything specifically about static variables, but Microsoft does say, "For simple data types, the compiler assigns addresses that are multiples of the size in bytes of the data type." And I can't find any instances elsewhere in gecko where we use an alignment attribute on something which isn't a struct or an array. So I don't think one is necessary here.

The other two arguments are presumedly immediate or moved onto the stack, so they should be 32-bit aligned as well. Can you elaborate why you believe alignas is necessary here?
Flags: needinfo?(q1)
(In reply to Ralph Giles (:rillian) from comment #8)
> (In reply to q1 from comment #5)
> 
> > This looks correct and clean, except that |state| should have the
> > appropriate alignment directive (e.g., alignas) to ensure that it's aligned
> > on at least a 32 bit boundary, per
> > https://msdn.microsoft.com/en-us/library/windows/desktop/ms683560%28v=vs.
> > 85%29.aspx .
> 
> Thanks for the review. According to docs, alignas requires Visual Studio
> 2015, and I believe we still support 2013, but we can use
> __declspec(align(4)) instead.
> 
> I couldn't find anything specifically about static variables, but Microsoft
> does say, "For simple data types, the compiler assigns addresses that are
> multiples of the size in bytes of the data type." And I can't find any
> instances elsewhere in gecko where we use an alignment attribute on
> something which isn't a struct or an array. So I don't think one is
> necessary here.
> 
> The other two arguments are presumedly immediate or moved onto the stack, so
> they should be 32-bit aligned as well. Can you elaborate why you believe
> alignas is necessary here?

https://msdn.microsoft.com/en-us/library/windows/desktop/ms683560%28v=vs.85%29.aspx says:

> The parameters for this function must be aligned on a 32-bit boundary; otherwise, the function will behave unpredictably on multiprocessor x86 systems and any non-x86 systems. See _aligned_malloc.

But if some other MS doc guarantees natural alignment, then what you wrote is fine.
Flags: needinfo?(q1)
Comment on attachment 8684495 [details] [diff] [review]
Use atomic increment and compare

Ok, thanks. I think we're ok with natural alignment from what I've seen in the docs.

Ready for review.
Attachment #8684495 - Flags: feedback? → review?(gsquelart)
Comment on attachment 8684495 [details] [diff] [review]
Use atomic increment and compare

Review of attachment 8684495 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with one change and a couple of nits:

::: vpx_ports/vpx_once.h
@@ +18,5 @@
>  #include <stdlib.h>
>  static void once(void (*func)(void))
>  {
> +    /* Use 'state' to track progress in calling func() only once. */
> +    static volatile LONG state = 0;

I don't think 'volatile' is needed here, as 'state' in only used in Interlocked* functions, which are memory barriers.
It doesn't hurt, but it gives a false sense of extra safety.

More importantly, static initialization of local variables is *not* thread-safe in MSVC! So there could be a race, however unlikely:
T1: state init yet? no
*switch*
T2: state init yet? no
T2: state = 0
T2: ...
T2: state = 1 or even 2
*switch*
T1: state = 0 !!!

'state' should be made compilation-unit-static (just move it outside of the function), so that it is initialized before threads are started. Please make sure it is still 32-bit aligned (I would imagine so, but you never know with MSVC...)

@@ +50,1 @@
>      }

Busy wait, eew!
But I think it's acceptable here, as we're only waiting for initialization functions that should complete quickly, and it's much easier to wait that way; and contentions should be rare anyway.
Would you mind adding a comment to explain that trade-off?

---

This patch fixes the most important issue of races, but doesn't address other concerns regarding potentially-incorrect usage of once(), as described in points 3 and 4 in comment 0.
At a glance, it seems to be used correctly in libvpx (i.e., at most one time per c file), so I'm okay with not adding more code now.
Maybe a comment near the top of the file would be nice, as a warning to future maintainers.
Attachment #8684495 - Flags: review?(gsquelart) → review+
Not sure why bugzilla only shows only one line of code with my "busy wait, eew" review comment.
The comment obviously applies to the while(){Sleep(0);} loop at lines 43-50.
> More importantly, static initialization of local variables is *not*
> thread-safe in MSVC! 

Eek! I don't believe I missed that! MS (https://msdn.microsoft.com/en-us/library/s1sb61xd.aspx ) says of this issue:

------
Assigning a value to a static local variable in a multithreaded application is not thread safe and we do not recommend it as a programming practice.
------

and making them file-level statics seems the best practice.

However, C++11 s.6.7(4) says (see especially the ***-ed clause):

------
The zero-initialization (8.5) of all block-scope variables with static storage duration (3.7.1) or thread storage duration (3.7.2) is performed before any other initialization takes place. [What does this mean? Before namespace-scope statics are initialized? Or before any further initialization of block-scope statics? I guess the latter.] Constant initialization (3.6.2) of a block-scope entity with static storage duration, if applicable, is performed before its block is first entered. An implementation is permitted to perform early initialization of other block-scope variables with static or thread storage duration under the same conditions that an implementation is permitted to statically initialize a variable with static or thread storage duration in namespace scope (3.6.2). Otherwise such a variable is initialized the first time control passes through its declaration; such a variable is considered initialized upon the completion of its initialization.... [***] If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization. [***] [89] If control re-enters the declaration recursively while the variable is being initialized, the behavior is undefined.

[89] The implementation must not introduce any deadlock around execution of the initializer.
------

I wonder how many implementations really obey the ***-ed clause?
FWIW VS 2015 initializes a block-level static with a constant initializer:

   void f () {
      static int i = 7;
   ...
   }

before main () is entered. In fact, a dump of the image file reveals that it compiles it into the image.
gcc and clang do the right thing, even for non-POD vars. But since MSVC doesn't, we cannot rely on it.

Otherwise it would have been real easy to implement 'once':
> static void once(void (*func)(void))
> {
>   static bool done = (func(), true);
>   return (void)done;
> }
:-)
(In reply to Gerald Squelart [:gerald] from comment #11)

> I don't think 'volatile' is needed here, as 'state' in only used in
> Interlocked* functions, which are memory barriers.

I agree it's not needed. Volatile helps prevent the compiler from re-ordering accesses, or eliding reads for when hardware is changing the value, but the intrinsics implementing the Interlocked functions should take care of that.

> More importantly, static initialization of local variables is *not*
> thread-safe in MSVC!

Wow! Good catch. I didn't even think about that.

> 'state' should be made compilation-unit-static (just move it outside of the
> function), so that it is initialized before threads are started.

I wondered about just dropping the initializer. C++ requires that static variables be zero-initialized, iirc. But, no. MSDN says:

> Zero initialization is performed at different times:
> 
>     At program startup, for all named variables that have static duration. These variables may later be initialized again.

https://msdn.microsoft.com/en-us/library/w7wd1177.aspx

Too bad, I find the function-scoped static easier to reason about. Are we sure compilation-unit statics are safe? :-)

> Would you mind adding a comment to explain that trade-off?

Sure.

> At a glance, it seems to be used correctly in libvpx (i.e., at most one time
> per c file), so I'm okay with not adding more code now.

Yes, I think this is a reasonable place to stop trying to fix it. All I could think of was to save the function pointer and assert if we're called with a different one.

> Maybe a comment near the top of the file would be nice, as a warning to
> future maintainers.
Here's a bit more background. The thread-safe initialization requirement was not present in C++ 1998 or C++ 2003, but was added in C++11 (or maybe TR1? I don't have that spec).

VS 2015 implements dynamically-initialized block-scope statics using interthread locking, via module crt\vcstartup\src\misc\thread_safe_statics.cpp, so it, at least, seems compliant with C++11 s.6.7(4). To see this, look at the generated code for:

void f () {
   static DWORD i = GetTickCount ();
}

int main () {
   f ();
   return 0;
}

VS 2013 does NOT implement this section correctly.

> I wondered about just dropping the initializer. C++ requires that static variables be 
> zero-initialized, iirc. But, no. MSDN says:
>
>> Zero initialization is performed at different times:
>> 
>>     At program startup, for all named variables that have static duration. These variables may 
>>     later be initialized again.

This is deep in the weeds, but technically interesting. I think it would be fine to drop the initializer. C++98 says:

------
The memory occupied by any object of static storage duration shall be zero-initialized at program startup before any other initialization takes place. [Note: in some cases, additional initialization is done later.]
------

The bracketed clause refers to initializing a static object with a constant or non-constant value. So theoretically the compiler can initialize

   {
      static int i = 0;
   }

twice: once "at program startup before any other initialization..." and again "before its block is first entered" because it is a "local object of POD type...with static storage duration initialized with a constant-expression[]..." (C++98 s.6.7(4)).

But

   {
      static int i;
   }

should be zero-initialized only "at program startup before any other initialization" because it has no explicit initializer.

> Are we sure compilation-unit statics are safe? :-)

As long as they have no initializers or constant initializers, I think they're fine. If they have non-constant ("dynamic") initializers, then you get into the awful complexities of C++98 s.3.6.2. Let's not.
That should read:

> As long as they have no initializers or [only] constant initializers, I think they're fine.
> If they have non-constant ("dynamic") initializers, then you get into the awful complexities of
> C++98 s.3.6.2. Let's not.
(In reply to q1 from comment #17)

> As long as they have no initializers or constant initializers, I think
> they're fine. If they have non-constant ("dynamic") initializers, then you
> get into the awful complexities of C++98 s.3.6.2. Let's not.

That's how I read it too. Thanks for the additional info.
(In reply to Ralph Giles (:rillian) from comment #19)
> (In reply to q1 from comment #17)
> 
> > As long as they have no initializers or constant initializers, I think
> > they're fine. If they have non-constant ("dynamic") initializers, then you
> > get into the awful complexities of C++98 s.3.6.2. Let's not.
> 
> That's how I read it too. Thanks for the additional info.

And I also agree that it's one way to make this thing work. It just feels so ugly to rely on behind-the-scene initialization when we (programmers of the world!) get in so much trouble when we forget to initialize non-static variables.
That's why I'd personally prefer a nearby compilation-unit static. That's the lesser of two evils to me!

But as it's a question of personal taste, if you want to use it, do go ahead, I'd just request that you document it.
It would be nice if there was some annotation (e.g. a MOZ_??? macro) to tell people&tools that this non-initialization is intended.
Oops, q1's quote was about compilation-unit statics. I was thinking of function-scoped statics when I started writing my comment. Sorry for the confusion

So to clarify:
- I prefer a nearby initialized global static to a non-initialized local static.
- But I'm okay with the non-init local static, if it's clearly documented.
How's this for the additional documentation comments?
Attachment #8684495 - Attachment is obsolete: true
Attachment #8686831 - Flags: review?(gsquelart)
Comment on attachment 8686831 [details] [diff] [review]
Use atomic increment and compare v2

Review of attachment 8686831 [details] [diff] [review]:
-----------------------------------------------------------------

Marvelicious.

::: media/libvpx/vpx_ports/vpx_once.h
@@ +51,3 @@
>  static void once(void (*func)(void))
>  {
> +    /* Try to advance state from its initial value of 0 to 1.

'state' -> 'once_state'. (And a few more times below)
Attachment #8686831 - Flags: review?(gsquelart) → review+
We definitely create decoders from multiple tasks across the thread pool, so I expect this bug is present in Firefox. However, I don't see any allocations wrapped in the the once() calls; it only seems to be used for function pointer and static table initialization (C doesn't have dynamic initializers). Which makes me wonder if there's a thread safety issue at all, or if this is just a performance optimization. Are the simd detection instructions racy?

Marking this sec-low since there doesn't seem to be exploit risk
Keywords: sec-low
Automation bits.
Attachment #8686899 - Flags: review?(gsquelart)
Apply "banana rule" to once_state. Carrying forward r=gerald.
Attachment #8686831 - Attachment is obsolete: true
Attachment #8686900 - Flags: review+
Comment on attachment 8686899 [details] [diff] [review]
Add vpx_once patch to update.sh

Review of attachment 8686899 [details] [diff] [review]:
-----------------------------------------------------------------

r+, but please check patch's argument:

::: media/libvpx/update.py
@@ +592,5 @@
>      # Cherry pick https://chromium-review.googlesource.com/#/c/276889/
>      # to fix crash on 32bit
>      os.system("patch -p1 < vp9_filter_restore_aligment.patch")
> +    # Patch win32 vpx_once.
> +    os.system("patch -p2 < vpx_once.patch")

Shouldn't this be '-p3' to remove "a/media/libvpx" from the filename?
(Just read the docs, I don't actually know patch much, so I may very well be wrong)
Attachment #8686899 - Flags: review?(gsquelart) → review+
Right you are. I actually ran the update script this time. Carrying forward r=gerald.
Attachment #8686924 - Flags: review+
The CondVar leaks with with Win7 debug Mochitest 3 are intermittent and I reproduced with with m-c, so I don't think it's the fault of this patch. https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa810938d894

Ready to land.
Comment on attachment 8686899 [details] [diff] [review]
Add vpx_once patch to update.sh

Should have been marked obsolete after v2 was added.
Attachment #8686899 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/10343f89ef92
https://hg.mozilla.org/mozilla-central/rev/2ee46231d2c0
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Flags: sec-bounty? → sec-bounty+
Group: media-core-security → core-security-release
Adding James Zern from the libvpx upstream team. James, please note this is a confidential bug. We typically remove the private flag after 6 months.
Whiteboard: [adv-main45+]
Whiteboard: [adv-main45+] → [adv-main45+][post-critsmash-triage]
Alias: CVE-2016-1972
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.