Closed Bug 981239 Opened 10 years ago Closed 10 years ago

Accessing dead UnixSocketImpl in UnixSocket.cpp:275

Categories

(Core :: DOM: Device Interfaces, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla31
blocking-b2g 1.3+
Tracking Status
firefox29 --- unaffected
firefox30 + fixed
firefox31 + fixed
firefox-esr24 --- unaffected
b2g18 --- wontfix
b2g-v1.1hd --- wontfix
b2g-v1.2 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed

People

(Reporter: gwagner, Assigned: tzimmermann)

References

Details

(Keywords: csectype-uaf, regression, sec-critical)

Attachments

(1 file, 2 obsolete files)

Seen on nexus 4 with debug gecko on todays trunk build after testing NFC:

Program received signal SIGSEGV, Segmentation fault.
0xb45fed34 in mozilla::ipc::SocketReceiveRunnable::Run (this=0xa9050560) at ../../../ipc/unixsocket/UnixSocket.cpp:275
275	    impl->mConsumer->ReceiveSocketData(mRawData);
(gdb) bt
#0  0xb45fed34 in mozilla::ipc::SocketReceiveRunnable::Run (this=0xa9050560) at ../../../ipc/unixsocket/UnixSocket.cpp:275
#1  0xb4374b6a in ProcessNextEvent (result=0xbee5f807, mayWait=false, this=0xb6a34680) at ../../../xpcom/threads/nsThread.cpp:643
#2  nsThread::ProcessNextEvent (this=0xb6a34680, mayWait=<optimized out>, result=0xbee5f807) at ../../../xpcom/threads/nsThread.cpp:567
#3  0xb432ec70 in NS_ProcessNextEvent (thread=0xb6a34680, mayWait=<optimized out>) at ../../../xpcom/glue/nsThreadUtils.cpp:263
#4  0xb45130b8 in mozilla::ipc::MessagePump::Run (this=0xb6a01df0, aDelegate=0xb6a4e1a0) at ../../../ipc/glue/MessagePump.cpp:95
#5  0xb45007f2 in MessageLoop::RunInternal (this=0xb6a4e1a0) at ../../../ipc/chromium/src/base/message_loop.cc:226
#6  0xb450080a in RunHandler (this=0xb6a4e1a0) at ../../../ipc/chromium/src/base/message_loop.cc:219
#7  MessageLoop::Run (this=0xb6a4e1a0) at ../../../ipc/chromium/src/base/message_loop.cc:193
#8  0xb4a7c9aa in nsBaseAppShell::Run (this=0xb1f3a880) at ../../../widget/xpwidgets/nsBaseAppShell.cpp:164
#9  0xb5330306 in nsAppStartup::Run (this=0xb2144c70) at ../../../../toolkit/components/startup/nsAppStartup.cpp:276
#10 0xb530b610 in XREMain::XRE_mainRun (this=0xbee5f98c) at ../../../toolkit/xre/nsAppRunner.cpp:4008
#11 0xb530b806 in XREMain::XRE_main (this=0xbee5f98c, argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>) at ../../../toolkit/xre/nsAppRunner.cpp:4075
#12 0xb530b95c in XRE_main (argc=1, argv=0xbee61b44, aAppData=0x26920, aFlags=<optimized out>) at ../../../toolkit/xre/nsAppRunner.cpp:4285
#13 0x0000a4b4 in do_main (argv=0xbee61b44, argc=1) at ../../../b2g/app/nsBrowserApp.cpp:163
#14 main (argc=<optimized out>, argv=<optimized out>) at ../../../b2g/app/nsBrowserApp.cpp:256
(gdb) p mRawData
$1 = {mRawPtr = 0xa9089180}
(gdb) p *mRawData
Could not find operator*.
(gdb) p imps
No symbol "imps" in current context.
(gdb) p impl
$2 = (mozilla::ipc::UnixSocketImpl *) 0xb14ac7c0
(gdb) p *impl
$3 = {<mozilla::ipc::UnixSocketWatcher> = {<mozilla::ipc::UnixFdWatcher> = {<base::MessagePumpLibevent::Watcher> = {_vptr.Watcher = 0x5a5a5a5a}, mIOLoop = 0x5a5a5a5a, mFd = {value = 1515870810, 
        _mCheckNotUsedAsTemporary = {statementDone = 90}}, mReadWatcher = {is_persistent_ = 90, event_ = 0x5a5a5a5a}, mWriteWatcher = {is_persistent_ = 90, event_ = 0x5a5a5a5a}}, 
    mConnectionStatus = 1515870810}, mConsumer = {ptr = 0x5a5a5a5a}, 
  mOutgoingQ = {<nsTArray_Impl<mozilla::ipc::UnixSocketRawData*, nsTArrayInfallibleAllocator>> = {<nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>> = {
        mHdr = 0x5a5a5a5a}, <nsTArray_TypedBase<mozilla::ipc::UnixSocketRawData*, nsTArray_Impl<mozilla::ipc::UnixSocketRawData*, nsTArrayInfallibleAllocator> >> = {<nsTArray_SafeElementAtHelper<mozilla::ipc::UnixSocketRawData*, nsTArray_Impl<mozilla::ipc::UnixSocketRawData*, nsTArrayInfallibleAllocator> >> = {<No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, mConnector = {
    mRawPtr = 0x5a5a5a5a}, mShuttingDownOnIOThread = false, mAddress = {<nsACString_internal> = {mData = 0x0, mLength = 2974468124, mFlags = 2974468124}, <No data fields>}, mAddrSize = -1320499172, mAddr = {
    storage = {ss_family = 51228, 
      __data = "J\261", '\000' <repeats 20 times>"\245, \245\245\245", '\000' <repeats 20 times>"\340, \257\315\260\000\000\000\000\000\000\000\000\377\377\377\377\000\000\031\000\245\245\245\245\245\245\245\245\200\261!\262", '\000' <repeats 24 times>"\340, \305J\261\377\377\377\377\000\000\000\000\245\245\245\245\245\245\245\245`\017#\262"}, un = {sun_family = 51228, 
      sun_path = "J\261", '\000' <repeats 20 times>"\245, \245\245\245", '\000' <repeats 20 times>"\340, \257\315\260\000\000\000\000\000\000\000\000\377\377\377\377\000\000\031\000\245\245\245\245\245\245\245\245\200\261!\262", '\000' <repeats 24 times>"\340, \305J\261\377\377"}, in = {sin_family = 51228, sin_port = 45386, sin_addr = {s_addr = 0}, __pad = "\000\000\000\000\000\000\000"}, in6 = {
      sin6_family = 51228, sin6_port = 45386, sin6_flowinfo = 0, sin6_addr = {in6_u = {u6_addr8 = '\000' <repeats 15 times>, u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, u6_addr32 = {0, 0, 0, 0}}}, 
      sin6_scope_id = 2779096485}}, mDelayedConnectTask = 0xb2230f60}
Looks like this came in with Bug 977672
So it's only in 1.4 then.
blocking-b2g: --- → 1.4+
Blocks: 977672
OS: Mac OS X → Gonk (Firefox OS)
Hardware: x86 → ARM
Assignee: nobody → tzimmermann
Hi Gregor,

do you have an STR? How often does this problem happen?
Status: NEW → ASSIGNED
Attached patch 01_close_fd.patch (obsolete) — Splinter Review
Hi Gregor,

Could you apply this patch during further tests of NFC. It doesn't fix the problem, but hopefully makes it more visible. See the next paragraphs for what might happen here. With the patch attached, the second assertion in |SocketCanReceiveWithoutBlocking| should fail (i.e., incorrect connection status).

In comment 0, |UnixSocketImpl| contains lots of nonsense, such as '0x5a5a5a5a', in its pointers. This indicates that it was already deleted, but was valid at some point.

Usually, |UnixSocketConsumer::Close| posts a |ShutdownSocketTask| to the I/O thread. This task removes the fd watchers from the I/O loop and we don't receive any data from the socket. Afterwards the instance of |UnixSocketImpl| gets deleted on the main thread.

Posting tasks to the I/O loop works by making a file descriptor read-able. I could imagine that the 'post-task' fd and the socket fd, became readable in the same call to |poll|. If the poll-task fd got handled first, the |ShutdownSocketTask| would have removed the watchers and dispatched a |DeleteInstanceRunnable| to the main thread for deleting the instance of |UnixSocketImpl|. The I/O thread could still read from the socket fd until |UnixSocketImpl| get's deleted on the main thread. The resulting |SocketReceiveRunnable| would use invalid data.
Attachment #8388443 - Flags: feedback?(anygregor)
Hm not sure what the correct STR here is. I see multiple problems :(
I can also reproduce one crash if I don't touch NFC at all and just toggle bluetooth.
I really need more information. You're using the Nexus 4, right? Toggling BT on the Nexus 4 works fine for me. Do you have a device paired? Are you transferring data?
Flags: needinfo?(anygregor)
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #6)
> I really need more information. You're using the Nexus 4, right? Toggling BT
> on the Nexus 4 works fine for me. Do you have a device paired? Are you
> transferring data?

This happened on nexus4 right after a reboot when I toggled Bluetooth a few times. Nothing is paired and no data transfer is going on. It's intermittent and I got it maybe 1 out of 10 times.
I guess the key here is to trigger a GC. Just by clicking on the bluetooth quick setting you don't trigger a GC and the intermittent part comes from some other timer that triggers a GC.
Flags: needinfo?(anygregor)
The Bluetooth issue seems to be something different. I opened bug 983590 for this. I still had no luck with reproducing the I/O crash.
It is very intermittent. I was debugging another crash today and hit it twice during startup.
STR is basically have a clean profile (after FTU) and start the phone.
Gregor, Thomas, what's the next step on this sec-critical bug?
Hi Johnny,

I'd love to help, but I'm unable to reproduce the problem ATM. And no one else seems to have reported a similar issue. Gregor, could you please attach your .userconfig? Do you have some feedback on the attached patch's changes? Thanks.

BTW, can we drop the sec-critical flag? I don't see how this bug is different from any other crash.
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #11)

> BTW, can we drop the sec-critical flag? I don't see how this bug is
> different from any other crash.

Is this not a security issue? If so, we need *some* rating on it.
Looks like this is a crash due to us calling functions on deleted objects which is likely exploitable, I think the security rating stands.

Gregor, seems like nobody other than yourself can reproduce this :( Can you help dig in here, or help others reproduce so we can move forward here?
Flags: needinfo?(anygregor)
(In reply to Johnny Stenback (:jst, jst@mozilla.com) from comment #13)
> Looks like this is a crash due to us calling functions on deleted objects
> which is likely exploitable, I think the security rating stands.
> 
> Gregor, seems like nobody other than yourself can reproduce this :( Can you
> help dig in here, or help others reproduce so we can move forward here?

Sorry I didn't see the earlier comment. I am quite busy with other stuff right now so I can't invest much time here.
Basically what I do is running a debug gecko on my nexus 4 device and this happens 'sometimes' maybe 1 out of 10 times during startup :(
If nobody else can reproduce we shouldn't block on it.
blocking-b2g: 1.4+ → 1.4?
Flags: needinfo?(anygregor)
We need security weigh in here for rating. backlog for now.
blocking-b2g: 1.4? → backlog
Flags: needinfo?(ptheriault)
(In reply to Preeti Raghunath(:Preeti) from comment #15)
> We need security weigh in here for rating. backlog for now.

JST evaluation sounded accurate to me. Leaving as sec-critical until proven otherwise.
blocking-b2g: backlog → 1.4?
Flags: needinfo?(ptheriault)
Comment on attachment 8388443 [details] [diff] [review]
01_close_fd.patch

Landing this in bug 990984.
Attachment #8388443 - Attachment is obsolete: true
Attachment #8388443 - Flags: feedback?(anygregor)
I think this needs to remain as a non-blocker until someone able to diagnose this. This doesn't seem to be actionable right now.
blocking-b2g: 1.4? → backlog
I can reproduce reliably \o/

Start the phone in GDB,
Press power button every second for a ca. 1min
Unlock the lockscreen.

Sometimes I hit this assertion during pressing the power button, sometimes during the unlock.
Found during debugging bug 980559. But now I am more likely to hit this bug rather than the other blocker sigh :(
This last STR finally allowed me to reproduce the bug, although not very reliably.

I've found that the |UnixSocketConsumer| gets destroyed by |SocketSendTask|'s destructor. |SocketSendTask| acquires the respective reference on the main thread and releases the reference on the I/O thread. There are no unrefs or cleanups of the class, so I strongly suspect that the bug is caused by non-thread-safe ref-counting. The attached patch fixes this.

Gregor, the bug is still very hard to reproduce for me, so I need your feedback if the patch works for you.
Attachment #8401254 - Flags: review?(kyle)
Attachment #8401254 - Flags: feedback?(anygregor)
Another bug that would have been caught if we weren't using this MFBT crap :(
Comment on attachment 8401254 [details] [diff] [review]
[01] Bug 977672: Use atomic ref-counting for |UnixSocketConsumer|

Review of attachment 8401254 [details] [diff] [review]:
-----------------------------------------------------------------

\o/ Its not crashing any more after some basic testing.
Attachment #8401254 - Flags: feedback?(anygregor) → feedback+
blocking-b2g: backlog → 1.4+
blocking-b2g: 1.4+ → 1.3?
blocking-b2g: 1.3? → 1.3+
According to comment 1, 1.3 is not affected.
Ah - fixing the blocking flag then.
blocking-b2g: 1.3+ → 1.4+
(In reply to Gregor Wagner [:gwagner] from comment #24)
> According to comment 1, 1.3 is not affected.

Comment 1 is not correct. The bug is in 1.3 [1] and has probably been there since 1.0.

[1] http://hg.mozilla.org/releases/mozilla-b2g28_v1_3/file/fba2e7d69356/ipc/unixsocket/UnixSocket.h#l160
blocking-b2g: 1.4+ → 1.3?
blocking-b2g: 1.3? → 1.3+
Updates since v1:

  - fixed bug number in patch description
  - fixed assert to test for !mImpl
Attachment #8401254 - Attachment is obsolete: true
Attachment #8401254 - Flags: review?(kyle)
Attachment #8402635 - Flags: review?(kyle)
Blocks: 992922
No longer blocks: 992922
Attachment #8402635 - Flags: review?(kyle) → review+
No longer going to block on Mozilla-specific 1.3 blockers unless it hits a special exception list, so moving to backlog.
blocking-b2g: 1.3+ → backlog
landed on central as https://hg.mozilla.org/mozilla-central/rev/f663d2714032
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
blocking-b2g: backlog → 1.3+
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: