Closed Bug 845148 Opened 11 years ago Closed 11 years ago

Valgrind warning about use after free in SocketAcceptTask

Categories

(Core :: DOM: Device Interfaces, defect)

18 Branch
ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

VERIFIED FIXED
B2G C4 (2jan on)
blocking-b2g tef+
Tracking Status
firefox20 --- wontfix
firefox21 --- wontfix
firefox22 --- wontfix
firefox-esr17 --- wontfix
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- fixed

People

(Reporter: bent.mozilla, Assigned: mrbkap)

References

Details

(Keywords: csectype-uaf, sec-high)

Attachments

(2 files, 4 obsolete files)

==228== Invalid write of size 4
==228==    at 0x5BFC8E4: mozilla::ipc::SocketAcceptTask::Run() (in /system/b2g/libxul.so)
==228==    by 0x5C47FA1: MessageLoop::RunTask(Task*) (in /system/b2g/libxul.so)
==228==    by 0x5C48F9F: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) (in /system/b2g/libxul.so)
==228==    by 0x5C49017: MessageLoop::DoDelayedWork(base::TimeTicks*) (in /system/b2g/libxul.so)
==228==    by 0x5C5BB7D: base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) (in /system/b2g/libxul.so)
==228==    by 0x5C47F3D: MessageLoop::RunInternal() (in /system/b2g/libxul.so)
==228==    by 0x5C4801D: MessageLoop::Run() (in /system/b2g/libxul.so)
==228==    by 0x5C5104D: base::Thread::ThreadMain() (in /system/b2g/libxul.so)
==228==    by 0x5C5C065: ThreadFunc(void*) (in /system/b2g/libxul.so)
==228==    by 0x483AE17: __thread_entry (in /system/lib/libc.so)
==228==  Address 0xbae2bc8 is 48 bytes inside a block of size 84 free'd
==228==    at 0x48062B0: free (vg_replace_malloc.c:446)
==228==    by 0x60E3DA7: moz_free (in /system/b2g/libxul.so)
==228==    by 0x5BFCAB3: mozilla::ipc::UnixSocketImpl::~UnixSocketImpl() (in /system/b2g/libxul.so)
==228==    by 0x5BFBE79: mozilla::ipc::DeleteInstanceRunnable<mozilla::ipc::UnixSocketImpl>::Run() (in /system/b2g/libxul.so)
==228==    by 0x5C235DF: nsThread::ProcessNextEvent(bool, bool*) (in /system/b2g/libxul.so)
==228==    by 0x5C00A27: NS_ProcessNextEvent_P(nsIThread*, bool) (in /system/b2g/libxul.so)
==228==    by 0x5B05EE3: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (in /system/b2g/libxul.so)
==228==    by 0x5C47F3D: MessageLoop::RunInternal() (in /system/b2g/libxul.so)
==228==    by 0x5C4801D: MessageLoop::Run() (in /system/b2g/libxul.so)
==228==    by 0x5A81337: nsBaseAppShell::Run() (in /system/b2g/libxul.so)
==228==    by 0x59D5A91: nsAppStartup::Run() (in /system/b2g/libxul.so)
==228==    by 0x5381871: XREMain::XRE_mainRun() (in /system/b2g/libxul.so)

Looks like there is a Cancel method on SocketAcceptTask and SocketConnectTask but mImpl is used before checking mCanceled in Run(). Some of these other task classes hold mImpl members and are not cancelable, so I don't know how those will hold up either.

This may be the cause of bug 832385 since the first thing this Run() method does is set a random pointer to null.
(In reply to ben turner [:bent] from comment #0)
> ==228==  Address 0xbae2bc8 is 48 bytes inside a block of size 84 free'd

FWIW this is exactly the offset of the first word of mResultVal in IDBRequest.
Blocking since this results in random behaviour that could go as far as exploitable crash.
blocking-b2g: tef? → tef+
Assignee: nobody → bent.mozilla
Assignee: bent.mozilla → kyle
The problem is that in UnixSocket.cpp, we keep multiple bare pointers to UnixSocketImpl instances in tasks that run asynchronously on the IO Thread. The implementation pointer can be deleted before the task is run, and we have a good way to check this at the moment.
Attached patch patch v1 (obsolete) — Splinter Review
NOTE: This patch is against the b2g18 branch.

I haven't had a chance to fully test this patch, but wanted to get it in the bug. It attempts to fully clean up the destruction path for UnixSocketImpls. The basic idea is this:

We introduce two state bits: mShuttingDownMainThread and mShuttingDownIOThread. When CloseSocket is called (on the main thread), we set mShuttingDownMain to true, send an event to the IO thread and start enforcing that no more events are sent from the main thread to that Impl (for completeness, we also null out mImpl). Once the event runs on the IO thread, we clean up our state to stop receiving "data ready" events and set a bit that means that no more Accept calls are re-queued. From this point on, the Impl will be untouched on the IO thread. We then proxy the delete back to the main thread.

It's worth noting that this approach is similar to what happened on trunk, though that uses a lock (though it only locks around the get and set of a boolean, so might as well not) and this approach is a little more paranoid.
Assignee: kyle → mrbkap
Status: NEW → ASSIGNED
Attachment #718798 - Flags: review?(kyle)
Attachment #718798 - Flags: review?(bent.mozilla)
Oh, I realized after writing my comment that mShuttingDownMainThread is now redundant with mConsumer being non-null, so I removed it in favor of that.
Comment on attachment 718798 [details] [diff] [review]
patch v1

Review of attachment 718798 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great!

::: ipc/unixsocket/UnixSocket.cpp
@@ +419,5 @@
>    SocketAcceptTask(UnixSocketImpl* aImpl) : mCanceled(false), mImpl(aImpl) { }
> +
> +  virtual void Cancel()
> +  {
> +    mCanceled = true;

This should only happen on the IO thread, right? Let's assert (I thought we did already...).
Attachment #718798 - Flags: review?(bent.mozilla) → review+
Is this a sec-critical rated issue? Could this be triggered by web content?
(In reply to Al Billings [:abillings] from comment #7)
> Is this a sec-critical rated issue? Could this be triggered by web content?

Well, on B2G, all one has to do is disable bluetooth from the settings app. I don't know how we rate that kind of thing...
Keywords: sec-high
Keywords: csec-uaf
Attached patch patch v2 (obsolete) — Splinter Review
I'm sorry, I couldn't refrain from fixing Read... This gets most of the way there as far as my limited testing can tell. Still missing is a patch to make sure all callers of SendSocketData come from the main thread.
Attachment #718798 - Attachment is obsolete: true
Attachment #718798 - Flags: review?(kyle)
Attachment #719243 - Flags: review?(kyle)
Attachment #719243 - Flags: review?(bent.mozilla)
Attachment #719243 - Flags: review?(kyle) → review+
Are you moving the changed to BluetoothOppManager to another bug, or was that supposed to be in this one?
Attached patch patch v2.1 (obsolete) — Splinter Review
The last patch was overeager in removing the calls to StopWatchingFileDescriptor in the "socket failed" case. I assumed that the socket failing would prevent any more events coming in on it, but that is not the case. This adds them back. Testcase: Send a file. Try to open an application. Note no application opens.
Attachment #719243 - Attachment is obsolete: true
Attachment #719243 - Flags: review?(bent.mozilla)
Attachment #719286 - Flags: review?(kyle)
Attachment #719286 - Flags: review?(bent.mozilla)
Attached patch Fix for the OPP manager (obsolete) — Splinter Review
The OPP manager was playing fast and loose with threads: calling into the UnixSocket code from a third thread! Properly synchronizing a third thread would have forced us into the land of locks and even more difficult problems than the ones that we're facing in this bug. Fortunately, the third thread isn't even necessary: we can use the IO thread, meaning that all the synchronization is done for us. One note: it'd be really good for someone to double check that canceling a file transfer can't result in us sending data after we call CloseSocket. I didn't have time to check it fully, but there are MOZ_ASSERTs thrown in there.
Attachment #719287 - Flags: review?(kyle)
Attachment #719287 - Flags: review?(bent.mozilla)
Comment on attachment 719287 [details] [diff] [review]
Fix for the OPP manager

Bringing in echou on the OPP manager review since he's the expert on that particular subject. :)
Attachment #719287 - Flags: review?(echou)
Attachment #719286 - Flags: review?(kyle) → review+
Attachment #719287 - Flags: review?(kyle) → review+
This patch represents a really scary change right on a deadline. I've been testing sending and receiving files, but I don't have a bluetooth headset to test and I'm sure that I haven't stress tested things like error conditions enough. Can other people help out here?
Keywords: qawanted
(fwiw: if this lands 2/29 I honestly doubt anybody outside of moz will notice, except for maybe me and I'm over it)
To be clear, the result of this bug is that turning bluetooth off and on a few times in a row causes the parent process to crash. I'm not going to try to assign a severity to that.
Blocks: 845990
+Matt

I'll go hunt for a QA owner.
Comment on attachment 719286 [details] [diff] [review]
patch v2.1

Review of attachment 719286 [details] [diff] [review]:
-----------------------------------------------------------------

r=me.

I'm running this through valgrind as we speak.

::: ipc/unixsocket/UnixSocket.cpp
@@ +406,5 @@
> +  SocketAcceptTask(UnixSocketImpl* aImpl) : mImpl(aImpl) { }
> +
> +  virtual void Cancel()
> +  {
> +    mImpl = nullptr;

MOZ_ASSERT(!NS_IsMainThread()) here since this is only ever supposed to be called on IO thread.

@@ +675,2 @@
>          }
> +        else if (errno == EAGAIN || errno == EWOULDBLOCK) {

Nit: else-after-continue :)

@@ +687,5 @@
> +      // cause us to end up back here.
> +      mReadWatcher.StopWatchingFileDescriptor();
> +      mWriteWatcher.StopWatchingFileDescriptor();
> +
> +      nsRefPtr<SocketCloseTask> t = new SocketCloseTask(this);

SocketCloseTask assumes that it has a mConsumer... Do you need to check that? You added a |if (!mConsumer)| check elsewhere here...
Attachment #719286 - Flags: review?(bent.mozilla) → review+
Comment on attachment 719287 [details] [diff] [review]
Fix for the OPP manager

Review of attachment 719287 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/unixsocket/UnixSocket.cpp
@@ +611,5 @@
> +{
> +  MOZ_ASSERT(!NS_IsMainThread());
> +  MOZ_ASSERT(mImpl && !mImpl->IsShutdownOnIOThread());
> +
> +  mImpl->QueueWriteData(aData);

I don't know how we can guarantee that this is safe. Isn't mImpl set (and unset) on the main thread?
(Greg to cc: since he's tracking bug 845990 from over here)
Comment on attachment 719287 [details] [diff] [review]
Fix for the OPP manager

bent and I have a plan for how to fix this.
Attachment #719287 - Attachment is obsolete: true
Attachment #719287 - Flags: review?(echou)
Attachment #719287 - Flags: review?(bent.mozilla)
(In reply to ben turner [:bent] from comment #19)\
> SocketCloseTask assumes that it has a mConsumer... Do you need to check
> that? You added a |if (!mConsumer)| check elsewhere here...

It has a check in the !IsShutdownOnMainThread call.
Attached patch patch v2.2Splinter Review
Attachment #719286 - Attachment is obsolete: true
Attachment #719691 - Flags: review?(bent.mozilla)
This no longer races to get mImpl from the OppManager. It also no longer races to set mLastCommand off the main thread.

The relationship between the classes here makes hacking the code here really weird.
Attachment #719694 - Flags: review?(echou)
Attachment #719694 - Flags: review?(bent.mozilla)
Attachment #719691 - Flags: review?(bent.mozilla) → review+
Comment on attachment 719694 [details] [diff] [review]
OPP manager patch v2

Review of attachment 719694 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/bluetooth/BluetoothOppManager.cpp
@@ +124,5 @@
>  public:
> +  ReadFileTask(UnixSocketImpl* aImpl,
> +               nsIInputStream* aInputStream,
> +               uint32_t aRemoteMaxPacketSize)
> +    : mImpl(aImpl),

Assert mImpl

@@ +997,5 @@
> +  // responses come back from the socket.
> +  nsRefPtr<SetLastCommandRunnable> runnable(
> +      new SetLastCommandRunnable(ObexRequestCode::Put));
> +  nsresult rv = NS_DispatchToMainThread(runnable);
> +  NS_ENSURE_SUCCESS_VOID(rv);

This will leak req, let's put that in an autoptr

::: dom/bluetooth/BluetoothOppManager.h
@@ +73,5 @@
>    bool ExtractBlobHeaders();
>  
> +  void SetLastCommand(int aCommand)
> +  {
> +    mLastCommand = aCommand;

This needs a main thread assertion, maybe just move to the cpp.
Attachment #719694 - Flags: review?(bent.mozilla) → review+
This needs to be checked into Mozilla-B2g18_v1_0_1 only (I think!)
Keywords: checkin-needed
Oh, let's clone and have a new one for trunk after we figure out what's going on in bug 830290.
This was a branch-only fix. Let's track bug 846615 for anything that needs to happen on trunk (starting with me actually verifying that there is a real problem on trunk).
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
with one other tester, we ran a quick testpass against 03-01 nightly:

Gecko  http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/69e249bf251c
Gaia   f46906e594b613571bfcc8f146d60bffd42e5a5b
BuildID 20130301070202
Version 18.0

partial results of the testrun are here: https://moztrap.mozilla.org/runtests/run/898/env/305/

summary:

accepting phonecall via bt headset - OK
rejecting phonecall via bt headset  - OK
pairing bt device - ok
disconnecting bt device - OK
searching for devices when BT switch on  - OK

however we saw issues where bt file tranfer.
1.) in one case FFOS on the device crashed while trying to send a file from unagi (happened once);
2.) not able to establish connection between macbook air and unagi device to send a file from unagi to laptop (laptop says "failed to establish connection')

3.) in some cases though, after a repair, we can send files to laptop to unagi
 Note that we'll be investigating this more when we have more time but this is the case for now.
I'll wait for the followup bugs to be filed for the items in comment 32, then we can remove qawanted on the bug.
Target Milestone: --- → B2G C4 (2jan on)
John, have you written up the follow up bugs for this yet?  If so, can you please list them?
Flags: needinfo?(jhammink)
Hi,

Here are a few bugs currently related to this issue:

https://bugzilla.mozilla.org/show_bug.cgi?id=847963 -  sometimes cannot transfer files to a mac

https://bugzilla.mozilla.org/show_bug.cgi?id=823803 -  [Open_][bluetooth] Bluetooth file which The test machine to accept not be able to view, when accept completed-

To be clear, I want to try and test the above scenario on the latest nightly, to see where things are.
Flags: needinfo?(jhammink)
Tested BT File transfer, and and BT Headset (answering call) on the following builds.

accepting phonecall via bt headset - OK
terminating phonecall via bt headset - OK
pairing bt device - ok
disconnecting bt device - OK
searching for devices when BT switch on  - OK


Tested on both 1.0.1. and 1.1 channels.
Gecko  http://hg.mozilla.org/releases/mozilla-b2g18/rev/99680a32f297
Gaia   de3e5b9205e6cb1a6bd0858a98d159272ad96d11
BuildID 20130312070202
Version 18.0


Also tested on
Gecko  http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/cdf058e47197
Gaia   1a6e1cd7715a5192b32db4b7127c5ae5b8162a7a
BuildID 20130312070203
Version 18.0
Resolution: FIXED → WORKSFORME
Keywords: qawanted
Thanks John.
Resolution: WORKSFORME → FIXED
Status: RESOLVED → VERIFIED
Comment on attachment 719694 [details] [diff] [review]
OPP manager patch v2

Review of attachment 719694 [details] [diff] [review]:
-----------------------------------------------------------------

Remove review request since it has been resolved.
Attachment #719694 - Flags: review?(echou)
Group: core-security → core-security-release
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: