Closed Bug 1219339 (CVE-2016-1973) Opened 9 years ago Closed 8 years ago

Race condition in GetStaticInstance can cause use after free

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

41 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox43 --- wontfix
firefox44 + wontfix
firefox45 + fixed
firefox46 + fixed
firefox47 --- fixed
firefox-esr38 - wontfix
firefox-esr45 --- fixed
b2g-v2.0 --- wontfix
b2g-v2.0M --- wontfix
b2g-v2.1 --- wontfix
b2g-v2.1S --- wontfix
b2g-v2.2 --- affected
b2g-v2.5 --- affected
b2g-v2.2r --- affected
b2g-master --- fixed

People

(Reporter: q1, Assigned: pkerr)

References

Details

(Keywords: csectype-race, sec-high, Whiteboard: [adv-main45+][post-critsmash-triage])

Attachments

(4 files, 6 obsolete files)

18.19 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
6.43 KB, patch
jesup
: review+
Details | Diff | Splinter Review
6.35 KB, patch
Details | Diff | Splinter Review
17.01 KB, patch
Details | Diff | Splinter Review
GetStaticInstance (media\webrtc\trunk\webrtc\system_wrappers\interface\static_instance.h) can experience a race condition. If this occurs, it spuriously decrements its reference count, causing the object protected by it to be destroyed and its memory to be released before all of the code streams using it have been terminated.


Details
-------

The bug is in the handling of the kAddRef case (which is used in at least 2 places in the codebase [1]), and is illustrated by a thread race graph below the affected code:

34: template <class T>
35: // Construct On First Use idiom. Avoids
36: // "static initialization order fiasco".
37: static T* GetStaticInstance(CountOperation count_operation) {
38:   // TODO (hellner): use atomic wrapper instead.
39:   static volatile long instance_count = 0;
40:   static T* volatile instance = NULL;
41:   CreateOperation state = kInstanceExists;
42: #ifndef _WIN32
...
89: #else  // _WIN32
...
107:   } else if (count_operation == kAddRef) {
108:     if (instance_count == 0) {
109:       state = kCreate;
110:     } else {
111:       if (1 == InterlockedIncrement(&instance_count)) {
112:         // InterlockedDecrement because reference count should not be
113:         // updated just yet (that's done when the instance is created).
114:         InterlockedDecrement(&instance_count);
115:         state = kCreate;
116:       }
117:     }
118:   } else { [kRelease]
119:     int new_value = InterlockedDecrement(&instance_count);
120:     if (new_value == 0) {
121:       state = kDestroy;
122:     }
123:   }
124: 
125:   if (state == kCreate) {
126:     // Create instance and let whichever thread finishes first assign its
127:     // local copy to the global instance. All other threads reclaim their
128:     // local copy.
129:     T* new_instance = T::CreateInstance();
130:     if (1 == InterlockedIncrement(&instance_count)) {
131:       InterlockedExchangePointer(reinterpret_cast<void * volatile*>(&instance),
132:                                  new_instance);
133:     } else {
134:       InterlockedDecrement(&instance_count);
135:       if (new_instance) {
136:         delete static_cast<T*>(new_instance);
137:       }
138:     }
139:   } else if (state == kDestroy) {
140:     T* old_value = static_cast<T*>(InterlockedExchangePointer(
141:         reinterpret_cast<void * volatile*>(&instance), NULL));
142:     if (old_value) {
143:       delete static_cast<T*>(old_value);
144:     }
145:     return NULL;
146:   }
147: #endif  // #ifndef _WIN32
148:   return instance;
149: }


Assume |instance_count == 0| and |instance == nullptr| initially, and |count_operation == kAddRef| for both threads:

Thread 1                                       Thread 2
--------                                       --------

Execute lines 107-09, notice that              Execute lines 107-09, notice that     
|instance_count == 0| and thus set             |instance_count == 0| and thus set
|state = kCreate|                              |state = kCreate|

Execute lines 125-29, notice that              Execute lines 125-29, notice that     
|state == kCreate| and create a                |state == kCreate| and create a
new instance of T into |*new_instance|         new instance of T into |*new_instance|

Execute line 130, notice that
|instance_count == 1|. Execute lines 131-32,
replacing |instance| with |new_instance|,
then execute line 148,
returning |instance| to the caller.

                                               Execute line 130, notice that |instance_count == 2|,
                                               and execute lines 134-37, decrementing |instance_count| to 1,
                                               and deleting |new_instance|, then execute line 148,
                                               returning |instance| to the caller.

Use |instance| for awhile...                   Use |instance| for awhile...

Call GetStaticInstance with |kRelease|.
Execute lines 119-21, decrementing
|instance_count| to 0, noticing that,
and setting |state = kDestroy|.
Execute lines 140-41 (nulling |instance|)
and lines 142-43 (destroying and freeing
|instance|'s old value.

                                               Continue using the now-destroyed and freed |instance|.


BTW, I am also unsure whether the ordinary reads of |instance_count| and |instance| are compatible with the interlocked operations performed on those variables under all relevant architectures.

This bug is still present in the current trunk: http://hg.mozilla.org/mozilla-central/file/fc706d376f06/media/webrtc/trunk/webrtc/system_wrappers/interface/static_instance.h .

[1] media\webrtc\trunk\webrtc\modules\rtc_rtcp\source\ssrc_database.cc and media\webrtc\trunk\webrtc\system_wrappers\source\trace_impl.cc .
    It is also used in the test code media\webrtc\trunk\webrtc\test\channel_transport\udp_socket_manager_wrapper.cc .
Flags: sec-bounty?
Group: core-security → media-core-security
Assignee: nobody → rjesup
backlog: --- → webrtc/webaudio+
Rank: 5
Priority: -- → P1
Rewritten from scratch using Mozilla (c++11) Atomics.  Tries very hard to keep the normal paths away from the critical section via the Atomics.  Note that the common usage (especially for TraceImpl) uses pairs of kAddRefNoCreate/kRelease, with kAddRef being used ~once when logging is enabled for each call.
Attachment #8683486 - Flags: review?(nfroyd)
(In reply to Randell Jesup [:jesup] from comment #1)
> Created attachment 8683486 [details] [diff] [review]
> proposed rewrite of GetStaticInstance
> 
> Rewritten from scratch using Mozilla (c++11) Atomics.  Tries very hard to
> keep the normal paths away from the critical section via the Atomics.  Note
> that the common usage (especially for TraceImpl) uses pairs of
> kAddRefNoCreate/kRelease, with kAddRef being used ~once when logging is
> enabled for each call.

This seems more complex than necessary. Why not just do everything under the critical section, so everyone easily can see that it's correct? Is this code used in places where the performance impact justifies the complexity?
(In reply to q1 from comment #2)
> (In reply to Randell Jesup [:jesup] from comment #1)
> > Created attachment 8683486 [details] [diff] [review]
> > proposed rewrite of GetStaticInstance
> > 
> > Rewritten from scratch using Mozilla (c++11) Atomics.  Tries very hard to
> > keep the normal paths away from the critical section via the Atomics.  Note
> > that the common usage (especially for TraceImpl) uses pairs of
> > kAddRefNoCreate/kRelease, with kAddRef being used ~once when logging is
> > enabled for each call.
> 
> This seems more complex than necessary. Why not just do everything under the
> critical section, so everyone easily can see that it's correct? Is this code
> used in places where the performance impact justifies the complexity?

Yes, this is used on a hot-path heavily-contended logging function (WebRTC TraceImpl) in realtime code.  The other usage (SSRC database) isn't perf-critical.  I'd use something very like a simple "static T* GetStaticInstance() { static T instance; return &instance; }  IF TraceImpl wasn't big (megabytes when in use), because that would cause it to linger.

I wanted to take a run at an Atomic version of this; there are other ways to do it, but most involve a mutex somewhere.   We can try alternates as well, and a coming update of webrtc.org code gets rid of most of the buffers/size for TraceImpl, which would allow us to keep it around.  (MB of unused ram after one webrtc use will cause problems for mobile).
I think there's a bug in the patch.

Imagine that thread 1, handling a kRelease, decrements the refcnt to 0, then executes the |if| below and gets preempted before executing the .exchange:

      if (instance_count == 0) {
        delete_me = instance.exchange(NULL); // XXX is exchange needed?
      }

Now imagine that thread 2 does a kAddRef, bumping the refcnt to 1 and then getting preempted.

And then imagine that thread 3 does a kAddRef, bumping the refcnt to 2 and then executing:

    if (old_count > 0) {
      // don't need to create it
      T* value = instance;
      if (value) {
        return value;
      }

So thread 3 gets |instance|.

Now thread 1 executes, does the exchange, and deletes |instance|. Ick.
Attachment #8683486 - Flags: review?(nfroyd)
(In reply to q1 from comment #4)
> I think there's a bug in the patch.

> Now thread 1 executes, does the exchange, and deletes |instance|. Ick.

Yeah, that's why this sort of thing is "fun"....  We may need a post-exchange check of the count before doing anything with it.  And if so we need to be careful about the type of Atomic constraints used, and may need to change from ReleaseAcquire.

I'll probably put up a pure critical-section-based patch as an alternative.  There is some concern about use of that in the field, due to perf.  As a high-performance alternative, I'll also put up a c++11 static-constructor based patch, and see if I can find a way around the memory hit (perhaps some way to drop the data and let it by dynamically re-allocated if needed).
(In reply to Randell Jesup [:jesup] from comment #5)
> (In reply to q1 from comment #4)
> > I think there's a bug in the patch.
> 
> > Now thread 1 executes, does the exchange, and deletes |instance|. Ick.
> 
> Yeah, that's why this sort of thing is "fun"....  We may need a
> post-exchange check of the count before doing anything with it.  And if so
> we need to be careful about the type of Atomic constraints used, and may
> need to change from ReleaseAcquire.

There are some simple contention-reduction approaches using eventcounts and spinlocks, such as http://www.cs.uml.edu/~bill/cs515/Eventcounts_Sequencers_Reed_Kanodia_79.pdf .

> I'll probably put up a pure critical-section-based patch as an alternative. 
> There is some concern about use of that in the field, due to perf.  

It'd be interesting to see what impact that has on performance.

> As a
> high-performance alternative, I'll also put up a c++11 static-constructor
> based patch, and see if I can find a way around the memory hit (perhaps some
> way to drop the data and let it by dynamically re-allocated if needed).

How much memory does it use? Unless it's huge, perhaps it doesn't much matter because the OS eventually pushes unused memory out of the working set, meaning that really all it consumes is virtual address space.
(In reply to q1 from comment #6)
> (In reply to Randell Jesup [:jesup] from comment #5)
> > (In reply to q1 from comment #4)
> > > I think there's a bug in the patch.
> > 
> > > Now thread 1 executes, does the exchange, and deletes |instance|. Ick.
> > 
> > Yeah, that's why this sort of thing is "fun"....  We may need a
> > post-exchange check of the count before doing anything with it.  And if so
> > we need to be careful about the type of Atomic constraints used, and may
> > need to change from ReleaseAcquire.
> 
> There are some simple contention-reduction approaches using eventcounts and
> spinlocks, such as
> http://www.cs.uml.edu/~bill/cs515/Eventcounts_Sequencers_Reed_Kanodia_79.pdf

Perhaps...  Spinlocks are very bad in most realtime code.  And realtime code has deadlines; statistically taking a reasonable amount of time isn't good if it occasionally takes much longer.

> > I'll probably put up a pure critical-section-based patch as an alternative. 
> > There is some concern about use of that in the field, due to perf.  
> 
> It'd be interesting to see what impact that has on performance.

Yes.  And it might be ok-ish, though on the edges of performance it may suffer.

> > As a
> > high-performance alternative, I'll also put up a c++11 static-constructor
> > based patch, and see if I can find a way around the memory hit (perhaps some
> > way to drop the data and let it by dynamically re-allocated if needed).
> 
> How much memory does it use? Unless it's huge, perhaps it doesn't much
> matter because the OS eventually pushes unused memory out of the working
> set, meaning that really all it consumes is virtual address space.

one word: mobile.  And this is multi-megabytes; on b2g (firefoxos) this is a bug chunk of ram and no device to page it to.
(In reply to Randell Jesup [:jesup] from comment #7)
> (In reply to q1 from comment #6)
> > (In reply to Randell Jesup [:jesup] from comment #5)
> > > (In reply to q1 from comment #4)
> > > > I think there's a bug in the patch.
> > > 
> > > > Now thread 1 executes, does the exchange, and deletes |instance|. Ick.
> > > 
> > > Yeah, that's why this sort of thing is "fun"....  We may need a
> > > post-exchange check of the count before doing anything with it.  And if so
> > > we need to be careful about the type of Atomic constraints used, and may
> > > need to change from ReleaseAcquire.
> > 
> > There are some simple contention-reduction approaches using eventcounts and
> > spinlocks, such as
> > http://www.cs.uml.edu/~bill/cs515/Eventcounts_Sequencers_Reed_Kanodia_79.pdf
> 
> Perhaps...  Spinlocks are very bad in most realtime code.  

I disagree. They are the fastest way to implement critical sections. You just do:

   if (InterlockedTestAndSet (&lockbit)) { } // Wait while someone else is in critical section
   // You're in the critical section, do your thing.
   InterlockedClear (&lockbit); // Exit critical section

Within a few instructions' time of the owning thread exiting the critical section, a waiting thread enters it.

The other typical way of doing this involves calls into the OS scheduler (via mutexes, semaphores, and the like), which typically is much slower.

I'd also note that spinlocks are generally the only available multiprocessor synchronization mechanism for code executing in device drivers at interrupt (high) hardware priorities. (Of course in such applications you do the rock-bottom minimum while under a spinlock so that you don't delay other device drivers or the OS.)

The chief drawback of spinlocks is not lack of speed, but waste of CPU cycles when there is contention. You can obviate this problem (if required -- it might not be) by using sleep() and similar approaches, at the cost of increased wake latency. In a usermode environment like FF you can also combine spinlocks with scheduler primitives like Windows events. Another drawback is that plain spinlocks don't guarantee FIFO service to waiting threads, but I suspect that wouldn't be an issue in a soft realtime environment like FF.

> And realtime code
> has deadlines; statistically taking a reasonable amount of time isn't good
> if it occasionally takes much longer.

Agreed.

> > > I'll probably put up a pure critical-section-based patch as an alternative. 
> > > There is some concern about use of that in the field, due to perf.  
> > 
> > It'd be interesting to see what impact that has on performance.
> 
> Yes.  And it might be ok-ish, though on the edges of performance it may
> suffer.

Is there a good way to time it?
 
> > > As a
> > > high-performance alternative, I'll also put up a c++11 static-constructor
> > > based patch, and see if I can find a way around the memory hit (perhaps some
> > > way to drop the data and let it by dynamically re-allocated if needed).
> > 
> > How much memory does it use? Unless it's huge, perhaps it doesn't much
> > matter because the OS eventually pushes unused memory out of the working
> > set, meaning that really all it consumes is virtual address space.
> 
> one word: mobile.  And this is multi-megabytes; on b2g (firefoxos) this is a
> bug chunk of ram and no device to page it to.

Excellent point. I totally missed that issue.
Ack!

>    if (InterlockedTestAndSet (&lockbit)) { } // Wait while someone else is in critical section

should of course be

   while (InterlockedTestAndSet (&lockbit)) { } // Wait while someone else is in critical section
Bug 1198458 removed the memory hit, so a patch like
"static T* GetStaticInstance() { static T instance; return &instance; }"
no longer has the negative impact.  I'll put that up.
Need the ifdef for windows/posix because TraceImpl is abstract, btw
Attachment #8696261 - Flags: review?(nfroyd)
(In reply to Randell Jesup [:jesup] from comment #10)
> Bug 1198458 removed the memory hit, so a patch like
> "static T* GetStaticInstance() { static T instance; return &instance; }"
> no longer has the negative impact.  I'll put that up.

That also has a race in non-C++11-compliant compilers; see https://bugzilla.mozilla.org/show_bug.cgi?id=1230768 . You have to use a module-level static, but even then it's important to beware order-of-initialization bugs; see C++11 s.3.6.2(2).
We require c++11 compilers for mozilla's code.  MSVC 2013 is the lowest level supported; we use this pattern elsewhere as well.
(In reply to Randell Jesup [:jesup] from comment #13)
> We require c++11 compilers for mozilla's code.  MSVC 2013 is the lowest
> level supported; we use this pattern elsewhere as well.

VS 2013 also generates thread-unsafe code for the return-address-of-local-static pattern. I just tested it.
(In reply to q1 from comment #14)
> (In reply to Randell Jesup [:jesup] from comment #13)
> > We require c++11 compilers for mozilla's code.  MSVC 2013 is the lowest
> > level supported; we use this pattern elsewhere as well.
> 
> VS 2013 also generates thread-unsafe code for the
> return-address-of-local-static pattern. I just tested it.

I'm guessing that's bug 1230768?  That could be problematic, depending on what how it goes wrong.  A Module-level static doesn't help us with deferring creation to first use, note.
Flags: needinfo?(nfroyd)
Flags: needinfo?(dveditz)
(In reply to Randell Jesup [:jesup] from comment #15)
> (In reply to q1 from comment #14)
> > (In reply to Randell Jesup [:jesup] from comment #13)
> > > We require c++11 compilers for mozilla's code.  MSVC 2013 is the lowest
> > > level supported; we use this pattern elsewhere as well.
> > 
> > VS 2013 also generates thread-unsafe code for the
> > return-address-of-local-static pattern. I just tested it.
> 
> I'm guessing that's bug 1230768?  

Yes.

> That could be problematic, depending on
> what how it goes wrong.

Ya. The pattern is used widely, but I don't know the codebase nearly well enough to say which usages might be subject to races.

> A Module-level static doesn't help us with
> deferring creation to first use, note.

True. You need some kind of thread-safe template singleton class implemented with a class-level static that's accessed with interlocked instructions, and that allows lazy creation. There's something like this in security\sandbox\chromium\base\memory\singleton.h/.cc , but I haven't reviewed it for correctness.
Flags: needinfo?(nfroyd)
Comment on attachment 8696261 [details] [diff] [review]
Greatly simplify GetStaticInstance since we don't allocate MB of buffers now

Review of attachment 8696261 [details] [diff] [review]:
-----------------------------------------------------------------

The static variable here is subject to thread races, so I don't think this is going to fly.  Can we really not do double-checked locking here?  That would at least only lock the mutex once.
Attachment #8696261 - Flags: review?(nfroyd)
Flags: sec-bounty? → sec-bounty+
doesn't build for the unittests... darn linkage visibility issues
Comment on attachment 8702385 [details] [diff] [review]
switch GetStaticInstance to use IPC's Singleton<T> impl (WIP)

So here's a proposed patch (and template for dealing with singletons elsewhere and avoid the static T instance; return &instance; pattern that's broken in MSVC 2013).

The one downside is that this doesn't build for the unittests (I tried to work around the problem, but failed due to pain in fighting the compiler/build system).

The errors I get for all the unit tests look like this:

 0:11.50 ../../../../../../../ipc/chromium/src/base/atomicops_internals_x86_gcc.h:204: error: undefined reference to 'AtomicOps_Internalx86CPUFeatures'
 0:11.50 ../../../../../../../ipc/chromium/src/base/singleton.h:134: error: undefined reference to 'base::AtExitManager::RegisterCallback(void (*)(void*), void*)'
 0:11.50 ../../../../../../../ipc/chromium/src/base/singleton.h:150: error: undefined reference to 'PlatformThread::YieldCurrentThread()'
 0:11.50 ../../../../../../../ipc/chromium/src/base/singleton.h:134: error: undefined reference to 'base::AtExitManager::RegisterCallback(void (*)(void*), void*)'
 0:11.50 ../../../../../../../ipc/chromium/src/base/singleton.h:150: error: undefined reference to 'PlatformThread::YieldCurrentThread()'

Basically, it can't access the ipc objects from libxul (sigh).

Note that the attempt I made to sidestep this in the patch (using FakeIPC.h) fails; they still are undefined (not sure why).

I could use suggestions on how to resolve this without too much pain.  Thanks!
Attachment #8702385 - Flags: feedback?(nfroyd)
Attachment #8702385 - Flags: feedback?(mh+mozilla)
Ok, solved the linkage problem by splitting the impl from teh definition (strange, but ok).  Will run a try and see if we need to do something like I did to stub the atomicops stuff on Linux
Attachment #8702424 - Flags: feedback?(nfroyd)
Attachment #8702424 - Flags: feedback?(mh+mozilla)
Attachment #8702385 - Attachment is obsolete: true
Attachment #8702385 - Flags: feedback?(nfroyd)
Attachment #8702385 - Flags: feedback?(mh+mozilla)
Attachment #8702424 - Attachment is obsolete: true
Attachment #8702424 - Flags: feedback?(nfroyd)
Attachment #8702424 - Flags: feedback?(mh+mozilla)
q1 - you may want to check singleton.h's impl before we land.  Thanks!
Flags: needinfo?(q1)
(In reply to Randell Jesup [:jesup] from comment #22)
> q1 - you may want to check singleton.h's impl before we land.  Thanks!

I will try to do this review over the next few days.
Comment on attachment 8702916 [details] [diff] [review]
switch GetStaticInstance to use IPC's Singleton<T> impl

Review of attachment 8702916 [details] [diff] [review]:
-----------------------------------------------------------------

Apologies for the delay.  singleton.h looks reasonable.  r=me with satisfactory answers to the below questions.

::: media/webrtc/signaling/test/FakeIPC.cpp
@@ +4,5 @@
> +
> +#include "FakeIPC.h"
> +#include <unistd.h>
> +
> +// The implementations can't be in the .h file for some annoying reason

Is this comment still true, or is it a remnant of working things out on try?

::: media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/ssrc_database.h
@@ +32,4 @@
>      SSRCDatabase();
>      virtual ~SSRCDatabase();
>  
> +protected:

What is this change for?
Attachment #8702916 - Flags: review?(nfroyd) → review+
> ::: media/webrtc/signaling/test/FakeIPC.cpp
> @@ +4,5 @@
> > +
> > +#include "FakeIPC.h"
> > +#include <unistd.h>
> > +
> > +// The implementations can't be in the .h file for some annoying reason
> 
> Is this comment still true, or is it a remnant of working things out on try?

It's true.  C++ sucks sometimes.

> 
> ::: media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/ssrc_database.h
> @@ +32,4 @@
> >      SSRCDatabase();
> >      virtual ~SSRCDatabase();
> >  
> > +protected:
> 
> What is this change for?

This has to be exposed due using the revised GetStaticInstance with SSRCDatabase.  This is without that change:

 0:21.77 ../../../../../../../ipc/chromium/src/base/singleton.h: In instantiation of ‘static Type* DefaultSingletonTraits<Type>::New() [with Type = webrtc::SSRCDatabase]’:
 0:21.77 ../../../../../../../ipc/chromium/src/base/singleton.h:129:33:   required from ‘static Type* Singleton<Type, Traits, DifferentiatingType>::get() [with Type = webrtc::SSRCDatabase; Traits = DefaultSingletonTraits<webrtc::SSRCDatabase>; DifferentiatingType = webrtc::SSRCDatabase]’
 0:21.77 ../../../../../../../media/webrtc/trunk/webrtc/system_wrappers/interface/static_instance.h:39:27:   required from ‘T* webrtc::GetStaticInstance(webrtc::CountOperation) [with T = webrtc::SSRCDatabase]’
 0:21.78 /home/jesup/src/mozilla/inbound_prof/media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/ssrc_database.cc:37:54:   required from here
 0:21.78 /home/jesup/src/mozilla/inbound_prof/media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/ssrc_database.cc:85:1: error: ‘webrtc::SSRCDatabase::SSRCDatabase()’ is protected
Comment on attachment 8702916 [details] [diff] [review]
switch GetStaticInstance to use IPC's Singleton<T> impl

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Very tough.  Provoking a thread-race in GetStaticInstance would not be easy.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
Not really; though one could infer (knowing it landed on a hidden bug) that there's a race there somewhere.

Which older supported branches are affected by this flaw?
All.  

If not all supported branches, which bug introduced the flaw?
N/A - upstream bug

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?  Trivial backport.  Need to double-check that we don't waste memory for MB of trace buffers in the before-webrtc.org-43 revs (FF44 and earlier); I think we're good with singleton.h.  (c++11 static getInstance would waste memory in 44 and earlier).

How likely is this patch to cause regressions; how much testing does it need?  Unlikely to cause regressions; uses a widely-known singleton class and remove a bunch of custom code for doing that.
Attachment #8702916 - Flags: sec-approval?
Attachment #8702916 - Flags: approval-mozilla-beta?
Attachment #8702916 - Flags: approval-mozilla-aurora?
sec-approval+ for trunk. 

Shouldn't we be taking this on ESR38 as well?
Attachment #8702916 - Flags: sec-approval?
Attachment #8702916 - Flags: sec-approval+
Attachment #8702916 - Flags: approval-mozilla-beta?
Attachment #8702916 - Flags: approval-mozilla-beta+
Attachment #8702916 - Flags: approval-mozilla-aurora?
Attachment #8702916 - Flags: approval-mozilla-aurora+
(In reply to q1 from comment #23)
> (In reply to Randell Jesup [:jesup] from comment #22)
> > q1 - you may want to check singleton.h's impl before we land.  Thanks!
> 
> I will try to do this review over the next few days.

I'm sorry that I never did this review, but I plead excuse because there has been a medical emergency in my family. I will be pretty spotty on bugs for awhile, alas, but I will eventually be back at it.
Flags: needinfo?(q1)
per the comment, recursing in the singleton creation deadlocks, so don't do that.  Note that the code that logged during TraceImpl creation is windows-only.
Attachment #8707593 - Flags: review?(nfroyd)
Attachment #8702916 - Attachment is obsolete: true
(In reply to q1 from comment #28)
> (In reply to q1 from comment #23)
> > (In reply to Randell Jesup [:jesup] from comment #22)
> > > q1 - you may want to check singleton.h's impl before we land.  Thanks!
> > 
> > I will try to do this review over the next few days.
> 
> I'm sorry that I never did this review, but I plead excuse because there has
> been a medical emergency in my family. I will be pretty spotty on bugs for
> awhile, alas, but I will eventually be back at it.

Not in any way a problem; I hope everyone is ok.

I updated the patch because there was a windows-only deadlock (recursing trying to Log while instantiating the Log object).
Comment on attachment 8707593 [details] [diff] [review]
switch GetStaticInstance to use IPC's Singleton<T> impl

Review of attachment 8707593 [details] [diff] [review]:
-----------------------------------------------------------------

The deadlocks from the commented-out logging statements were 100% reproducible?  i.e. we shouldn't have to worry about other random deadlocks?
Attachment #8707593 - Flags: review?(nfroyd) → review+
100% reproducible, yes.  I was double-checking before landing and found windows wouldn't work at all.
Depends on: 1239825
On WinXP (only) it apparently hit a second set of WEBRTC_TRACE() macros while instantiating the TraceWindows object; they're commented out now.  I audited system_wrappers for any further trace calls it could hit, and there aren't any more.  (just 2 remain, in the posix impl of SetThreadPriority()).
Attachment #8708241 - Flags: review?(nfroyd)
Attachment #8683486 - Attachment is obsolete: true
Comment on attachment 8708241 [details] [diff] [review]
switch GetStaticInstance to use IPC's Singleton<T> impl

Review of attachment 8708241 [details] [diff] [review]:
-----------------------------------------------------------------

This is one of those patches where it would have been super-nice to have automatic interdiffs from MozReview...
Attachment #8708241 - Flags: review?(nfroyd) → review+
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6653283565a0
(I forced winXP tests this time - previous landing I hadn't forced them in Try)
Also I tested manually on a WinXP tester box remotely.
Attachment #8707593 - Attachment is obsolete: true
Attachment #8696261 - Attachment is obsolete: true
Seems risky to land into the last beta, bumping out to the next release.
After discussion with ritu and dveditz, we agreed to land in the next cycle and then uplift to 45.  It's very late in 44 beta.  I have a hopefully better patch ready to review in any case, so that's a plus.  (no risk of deadlocks on recursive GetStaticInstances, for example).
Comment on attachment 8702916 [details] [diff] [review]
switch GetStaticInstance to use IPC's Singleton<T> impl

Clearing the approvals for now so it doesn't show up on the needs-uplift radar.
Attachment #8702916 - Flags: approval-mozilla-beta+
Attachment #8702916 - Flags: approval-mozilla-aurora+
No pressure but it is the next release now, if you are (more) ready to land and uplift.
Flags: needinfo?(rjesup)
https://hg.mozilla.org/mozilla-central/rev/8ee641313088
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Group: media-core-security → core-security-release
Randell, could you fill the uplift requests? Thanks
Flags: needinfo?(rjesup)
Comment on attachment 8708241 [details] [diff] [review]
switch GetStaticInstance to use IPC's Singleton<T> impl

Note: already aurora/beta +'d by abillings -- see comment 27; we just had to defer checkin until after 44 had shipped.  I assumed it still held.

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Very tough.  Provoking a thread-race in GetStaticInstance would not be easy.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
Not really; though one could infer (knowing it landed on a hidden bug) that there's a race there somewhere.

Which older supported branches are affected by this flaw?
All.  

If not all supported branches, which bug introduced the flaw?
N/A - upstream bug

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?  Trivial backport.  Need to double-check that we don't waste memory for MB of trace buffers in the before-webrtc.org-43 revs (FF44 and earlier); I think we're good with singleton.h.  (c++11 static getInstance would waste memory in 44 and earlier).

How likely is this patch to cause regressions; how much testing does it need?  Unlikely to cause regressions; uses a widely-known singleton class and remove a bunch of custom code for doing that.
Flags: needinfo?(rjesup)
Attachment #8708241 - Flags: approval-mozilla-beta?
Attachment #8708241 - Flags: approval-mozilla-aurora?
Note: I may want to hold off for a few hours or a day on the uplift while we check if a minor tweak is needed when using Webrtc internal logging in debug builds.
Assignee: rjesup → pkerr
Comment on attachment 8717723 [details] [diff] [review]
Part2: Ensure close of webrtc trace file during shutdown

Created a StopWebRtcLog() call that closes any trace file opened either from the environment vars or using about:webrtc. I moved the EnableWebRtcLog() calls from VideoConduit.cpp and AudioConduit.cpp into the global init of PeerConnectionCtx. This makes the setup of logging from environment vars happen at the global singleton level in both MediaManager (where it already existed) and PeerConnection. Both these modules now call StopWebRtcLog() when shutting down. (Putting a StopWebRtcLog() call in VideoConduit.cpp or AudioConduit.cpp would have closed the log whenever a single session ended.)
Attachment #8717723 - Flags: review?(rjesup)
Comment on attachment 8717723 [details] [diff] [review]
Part2: Ensure close of webrtc trace file during shutdown

Review of attachment 8717723 [details] [diff] [review]:
-----------------------------------------------------------------

r- because of loss of logs from plain getUserMedia.  simply don't change AudioConduit of VideoConduit, and that should be r+, though try it!

::: media/webrtc/signaling/src/media-conduit/AudioConduit.cpp
@@ -256,5 @@
>      CSFLogError(logTag, "%s Unable to create voice engine", __FUNCTION__);
>      return kMediaConduitSessionNotInited;
>    }
>  
> -  EnableWebRtcLog();

Removing these means we can't collect logs for getUserMedia (without PeerConnection).  that's doable, but not preferable.  I think we can leave these in, as MediaManager will shut it down in an observer.

::: media/webrtc/signaling/src/media-conduit/VideoConduit.cpp
@@ -307,5 @@
>        }
>      }
>    }
>  
> -  EnableWebRtcLog();

ditto
Attachment #8717723 - Flags: review?(rjesup) → review-
There is already a call to EnableWebRtcLog() in MediaManager.cpp. It is called during the construction that is trigger by the first call to GetUserMedia. I have tested this with the GUM test and the log is created.

However, putting the enables back into the AudioConduit and VideoConduit will not hurt anything. I will go ahead if you feel it is needed.
Flags: needinfo?(rjesup)
So long as the logs are getting enabled for normal gUM, I'm good with it.  r+
Flags: needinfo?(rjesup)
Attachment #8717723 - Flags: review- → review+
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #8717723 - Flags: approval-mozilla-beta?
Attachment #8717723 - Flags: approval-mozilla-aurora?
Waiting for this to land on m-c before uplift to aurora and beta.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/b5e15ecccb6c
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Hi Randell, Paul, could you please nominate the patches for uplift to esr38? Thanks!
Flags: needinfo?(rjesup)
Flags: needinfo?(pkerr)
Comment on attachment 8708241 [details] [diff] [review]
switch GetStaticInstance to use IPC's Singleton<T> impl

Sec-high, taking it.
Should be in 45 beta 7
Attachment #8708241 - Flags: approval-mozilla-beta?
Attachment #8708241 - Flags: approval-mozilla-beta+
Attachment #8708241 - Flags: approval-mozilla-aurora?
Attachment #8708241 - Flags: approval-mozilla-aurora+
Comment on attachment 8717723 [details] [diff] [review]
Part2: Ensure close of webrtc trace file during shutdown

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: The LateWriteMonitor will be triggered when trace logging is enabled for the webrtc components due to using the singleton class to fix the security bug.
Fix Landed on Version: 47
Risk to taking this patch (and alternatives if risky): Very low risk: added log close calls in the same modules that open the logs.
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Flags: needinfo?(pkerr)
Attachment #8717723 - Flags: approval-mozilla-esr38?
Comment on attachment 8708241 [details] [diff] [review]
switch GetStaticInstance to use IPC's Singleton<T> impl

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: This is a sec bug fix to prevent race conditions during shutdown.
Fix Landed on Version: 47
Risk to taking this patch (and alternatives if risky): 
String or UUID changes made by this patch: none
Flags: needinfo?(rjesup)
Attachment #8708241 - Flags: approval-mozilla-esr38?
Attachment #8717723 - Flags: approval-mozilla-beta?
Attachment #8717723 - Flags: approval-mozilla-beta+
Attachment #8717723 - Flags: approval-mozilla-aurora?
Attachment #8717723 - Flags: approval-mozilla-aurora+
Comment on attachment 8708241 [details] [diff] [review]
switch GetStaticInstance to use IPC's Singleton<T> impl

Sec-high that should also be uplifted to ESR38.7
Attachment #8708241 - Flags: approval-mozilla-esr38? → approval-mozilla-esr38+
Attachment #8717723 - Flags: approval-mozilla-esr38? → approval-mozilla-esr38+
part 2 has problems uplifting to esr38 

warning: conflicts while merging dom/media/MediaManager.cpp! (edit, then use 'hg resolve --mark')
Flags: needinfo?(pkerr)
Taking a look at where to add the StopWebRtcLog() call in the older version. The shutdown logic is different.
Flags: needinfo?(pkerr)
I have a patch the applies on my clone of the esr38 repo. How do you want to receive it? Attach another file to this bug?
Flags: needinfo?(cbook)
(In reply to Paul Kerr [:pkerr] from comment #65)
> I have a patch the applies on my clone of the esr38 repo. How do you want to
> receive it? Attach another file to this bug?

Hi Paul, yes I think that is the right thing to do. Wes, please correct me if I am wrong.
Yeah, attach the new patch here, we'll carry forward the a+ when we push it.
Flags: needinfo?(cbook)
ESR38 specific patch. Works with older MediaManager shutdown code.
Comment on attachment 8722161 [details] [diff] [review]
Part2 (esr38): Ensure close of webrtc trace file during shutdown

[Approval Request Comment]
Re-implementation of MediaManager portion of patch. See previous esr38 request.
Attachment #8722161 - Flags: approval-mozilla-esr38?
Comment on attachment 8722161 [details] [diff] [review]
Part2 (esr38): Ensure close of webrtc trace file during shutdown

Sec-high, taking it in esr38.7
Attachment #8722161 - Flags: approval-mozilla-esr38? → approval-mozilla-esr38+
remote:   https://hg.mozilla.org/releases/mozilla-esr38/rev/8b0125a79e90
remote:   https://hg.mozilla.org/releases/mozilla-esr38/rev/6037b3d95c6b

Two of the tests part 1 modified don't exist on esr38, so I skipped those particular changes but left the rest of part 1 intact. Jesup says that's the right thing to do, so I hope he's right.
(In reply to Wes Kocher (:KWierso) from comment #71)
> Two of the tests part 1 modified don't exist on esr38, so I skipped those
> particular changes but left the rest of part 1 intact. Jesup says that's the
> right thing to do, so I hope he's right.

It is; both of those were added in ~mozilla-45
Will take a look.
Flags: needinfo?(pkerr)
Paul, I will gtb esr38.7 release candidate tomorrow (ideally it should have happened today noon PST). Could you please re-land an updated patch on esr38 branch asap? Thanks!
Flags: needinfo?(pkerr)
Flags: needinfo?(rjesup)
Whiteboard: [adv-main45+][adv-esr38.7+]
Whiteboard: [adv-main45+][adv-esr38.7+] → [adv-main45+][adv-esr38.7+][post-critsmash-triage]
Paul backed this out for Windows failures (that don't happen locally):
https://hg.mozilla.org/releases/mozilla-esr38/rev/f4220254d5bd
IMO, we should not block ESR38 on this patch.  This is a long-standing bug, with minimal actual risk:

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Very tough.  Provoking a thread-race in GetStaticInstance would not be easy.

In fact it would be very hard.  Perhaps not impossible, but very hard.
[Tracking Requested - why for this release]: Updating status from fixed to affected based on the backout commit in comment 79. Based on :Jesup's comment, I am putting this back in tracking-esr38 queue. We could either wontfix this for esr38.x or reconsider for esr38.8. The latter seems like a fine idea.
Alias: CVE-2016-1973
Whiteboard: [adv-main45+][adv-esr38.7+][post-critsmash-triage] → [adv-main45+][post-critsmash-triage]
ESR 38.8 did seem like a fine idea, but it's too late now.
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: