Signaling unittest crashes on Linux in nr_socket_getfd

RESOLVED WORKSFORME

Status


Core
WebRTC: Networking
RESOLVED WORKSFORME
4 years ago
24 days ago

People

(Reporter: ehugg, Assigned: ekr)

Tracking

({sec-critical})

Trunk
x86_64
Linux
sec-critical
Points:
---

Firefox Tracking Flags

(firefox25 ?, firefox26 ?, firefox27 affected, firefox-esr17 unaffected, firefox-esr24 ?, b2g18 unaffected)

Details

Attachments

(1 attachment)

(Reporter)

Description

4 years ago
Signaling unittest is solid on OSX, but crashes consistently on Linux (Ubuntu 64).  Crash is not always on the same test, but the stack is always similar to this:

#0  0x00000000012b20ac in nr_socket_getfd (sock=0x7ffeb38a0f01, fd=0x7fffe58fc610)
    at /home/ehugg/mozilla/mozilla-central/media/mtransport/third_party/nICEr/src/net/nr_socket.c:94
#1  0x00000000012b1a7a in nr_ice_socket_close (isock=0x7ffebd6e18ec)
    at /home/ehugg/mozilla/mozilla-central/media/mtransport/third_party/nICEr/src/ice/ice_socket.c:257
#2  0x00000000012b1980 in nr_ice_socket_destroy (isockp=0x7fffe58fc680)
    at /home/ehugg/mozilla/mozilla-central/media/mtransport/third_party/nICEr/src/ice/ice_socket.c:228
#3  0x00000000012a6eac in nr_ice_component_destroy (componentp=0x7fffe58fc6f0)
    at /home/ehugg/mozilla/mozilla-central/media/mtransport/third_party/nICEr/src/ice/ice_component.c:157
#4  0x00000000012ac273 in nr_ice_media_stream_destroy (streamp=0x7fffe58fc750)
    at /home/ehugg/mozilla/mozilla-central/media/mtransport/third_party/nICEr/src/ice/ice_media_stream.c:101
#5  0x00000000012ab1b5 in nr_ice_ctx_destroy_cb (s=0x0, how=0, cb_arg=0x7ffedbba960c)
    at /home/ehugg/mozilla/mozilla-central/media/mtransport/third_party/nICEr/src/ice/ice_ctx.c:415
#6  0x000000000066e408 in mozilla::nrappkitTimerCallback::Notify (this=0x7ffeb09d89c0, timer=0x7ffeb09ce8c0)
    at /home/ehugg/mozilla/mozilla-central/media/mtransport/nr_timer.cpp:99
#7  0x00007ffff37884c4 in nsTimerImpl::Fire (this=0x7ffeb09ce8c0)
    at /home/ehugg/mozilla/mozilla-central/xpcom/threads/nsTimerImpl.cpp:549
#8  0x00007ffff378883f in nsTimerEvent::Run (this=0x7fffe4f89200)
    at /home/ehugg/mozilla/mozilla-central/xpcom/threads/nsTimerImpl.cpp:630
#9  0x00007ffff37804ba in nsThread::ProcessNextEvent (this=0x7fffe771b120, mayWait=true, result=0x7fffe58fc9ef)
    at /home/ehugg/mozilla/mozilla-central/xpcom/threads/nsThread.cpp:622
#10 0x00007ffff3700cbe in NS_ProcessNextEvent (thread=0x7fffe771b120, mayWait=true)
    at /home/ehugg/mozilla/mozilla-central/xpcom/glue/nsThreadUtils.cpp:238
#11 0x00007ffff12ed38c in nsSocketTransportService::Run (this=0x7fffe69c0220)
    at /home/ehugg/mozilla/mozilla-central/netwerk/base/src/nsSocketTransportService2.cpp:688
#12 0x00007ffff37804ba in nsThread::ProcessNextEvent (this=0x7fffe771b120, mayWait=true, result=0x7fffe58fcb5f)
    at /home/ehugg/mozilla/mozilla-central/xpcom/threads/nsThread.cpp:622
#13 0x00007ffff3700cbe in NS_ProcessNextEvent (thread=0x7fffe771b120, mayWait=true)
    at /home/ehugg/mozilla/mozilla-central/xpcom/glue/nsThreadUtils.cpp:238
#14 0x00007ffff377f336 in nsThread::ThreadFunc (arg=0x7fffe771b120)
    at /home/ehugg/mozilla/mozilla-central/xpcom/threads/nsThread.cpp:250
#15 0x00007ffff00abd91 in _pt_root (arg=0x7fffe77158c0)
    at /home/ehugg/mozilla/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:204
#16 0x00007ffff7bc4e9a in start_thread (arg=0x7fffe58fd700)
    at pthread_create.c:308
#17 0x00007fffeda99ccd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
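Both backtraces show the same teardown chain: nr_ice_ctx_destroy_cb → nr_ice_media_stream_destroy → nr_ice_component_destroy → nr_ice_socket_destroy → nr_ice_socket_close → nr_socket_getfd, with each destroy function receiving a pointer-to-pointer (streamp, componentp, isockp). A common reason for that convention is that the callee can free the object and then clear the caller's handle. The following is a minimal, hypothetical sketch of that idiom in C; the types and functions (sock_t, component_t, sock_destroy, component_destroy, component_create) are invented for illustration and are not nICEr's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types for illustration only; not the nICEr definitions. */
typedef struct sock_ {
  int fd;
} sock_t;

typedef struct component_ {
  sock_t *sock;
} component_t;

/* Free the socket and clear the caller's pointer, so a repeated
   destroy through the same handle is a harmless no-op. */
static int sock_destroy(sock_t **sockp) {
  assert(sockp != NULL);  /* caller must pass the address of its handle */
  if (*sockp == NULL)
    return 0;
  free(*sockp);
  *sockp = NULL;
  return 0;
}

/* Tear down children first, then the parent, clearing each handle. */
static int component_destroy(component_t **componentp) {
  assert(componentp != NULL);
  if (*componentp == NULL)
    return 0;
  sock_destroy(&(*componentp)->sock);
  free(*componentp);
  *componentp = NULL;
  return 0;
}

/* Hypothetical constructor so the sketch can be exercised end to end. */
static component_t *component_create(void) {
  component_t *c = calloc(1, sizeof(*c));
  if (c == NULL)
    return NULL;
  c->sock = calloc(1, sizeof(*c->sock));
  if (c->sock == NULL) {
    free(c);
    return NULL;
  }
  c->sock->fd = -1;  /* no real descriptor in this sketch */
  return c;
}
```

Clearing *componentp makes a repeated destroy through the same handle a no-op, but it cannot protect other copies of the pointer, such as one stored as a timer callback argument (cb_arg in frame #5). A stale copy like that would still point at freed memory.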

Updated

4 years ago
Group: media-core-security
(Reporter)

Comment 1

4 years ago
Just got it to happen under the debugger in OSX as well.  Each time so far I've seen a sock pointer that looks normal but either points to a bunch of 0x5a or some garbage bytes.

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
[Switching to process 3535 thread 0x2413]
0x0000000100e35598 in nr_socket_getfd (sock=0x116ac7c01, fd=0x1122035c8) at nr_socket.c:94
94	    return sock->vtbl->getfd(sock->obj, fd);
(gdb) bt
#0  0x0000000100e35598 in nr_socket_getfd (sock=0x116ac7c01, fd=0x1122035c8) at nr_socket.c:94
#1  0x0000000100e34d67 in nr_ice_socket_close (isock=0x11423498c) at ice_socket.c:257
#2  0x0000000100e34c06 in nr_ice_socket_destroy (isockp=0x1122036b0) at ice_socket.c:228
#3  0x0000000100e27f4d in nr_ice_component_destroy (componentp=0x112203720) at ice_component.c:157
#4  0x0000000100e2e24e in nr_ice_media_stream_destroy (streamp=0x112203798) at ice_media_stream.c:101
#5  0x0000000100e2cea0 in nr_ice_ctx_destroy_cb (s=0x0, how=0, cb_arg=0x11c6eca0c) at ice_ctx.c:415
#6  0x00000001000560c9 in mozilla::nrappkitTimerCallback::Notify (this=0x118bc1f80, timer=0x11172eb20) at /Users/ehugg/mozilla/work/media/mtransport/nr_timer.cpp:99
#7  0x00000001067c9b21 in nsTimerImpl::Fire (this=0x11172eb20) at /Users/ehugg/mozilla/work/xpcom/threads/nsTimerImpl.cpp:549
#8  0x00000001067c9f11 in nsTimerEvent::Run (this=0x113014380) at /Users/ehugg/mozilla/work/xpcom/threads/nsTimerImpl.cpp:630
#9  0x00000001067bf8c3 in nsThread::ProcessNextEvent (this=0x11174aa10, mayWait=true, result=0x112203b2e) at /Users/ehugg/mozilla/work/xpcom/threads/nsThread.cpp:622
#10 0x0000000106716cd9 in NS_ProcessNextEvent (thread=0x11174aa10, mayWait=true) at nsThreadUtils.cpp:238
#11 0x00000001039deb60 in nsSocketTransportService::Run (this=0x112785310) at /Users/ehugg/mozilla/work/netwerk/base/src/nsSocketTransportService2.cpp:688
#12 0x00000001039df55c in non-virtual thunk to nsSocketTransportService::Run() (this=0x112785328) at /Users/ehugg/mozilla/work/netwerk/base/src/nsSocketTransportService2.cpp:725
#13 0x00000001067bf8c3 in nsThread::ProcessNextEvent (this=0x11174aa10, mayWait=true, result=0x112203dde) at /Users/ehugg/mozilla/work/xpcom/threads/nsThread.cpp:622
#14 0x0000000106716cd9 in NS_ProcessNextEvent (thread=0x11174aa10, mayWait=true) at nsThreadUtils.cpp:238
#15 0x00000001067be237 in nsThread::ThreadFunc (arg=0x11174aa10) at /Users/ehugg/mozilla/work/xpcom/threads/nsThread.cpp:250
Could this be related to bug 922068?
Sounds sort of like a use-after-free so marking critical for now.  Feel free to adjust as needed.
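Comment 1 reports that the sock pointer often pointed at runs of 0x5a. That byte is a conventional freed-memory junk value (jemalloc, for example, fills freed blocks with 0x5a when junk filling is enabled), so a 0x5a-filled object is a strong hint that the memory had already been freed. Reading through it as nr_socket_getfd does would load sock->vtbl as 0x5a5a5a5a5a5a5a5a on a 64-bit build, and the following dereference would most likely fault. The sketch below simulates that effect without performing a real use-after-free: it junk-fills a live struct and checks whether its pointer fields now consist only of the junk byte. fake_sock_t, junk_fill, looks_junk_filled, and FREE_JUNK are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical junk byte; 0x5a matches the pattern seen in the debugger. */
#define FREE_JUNK 0x5a

/* Hypothetical stand-in for a socket object with a vtable pointer. */
typedef struct fake_sock_ {
  void *vtbl;
  void *obj;
} fake_sock_t;

/* Simulate what a junk-filling allocator does to a block on free:
   overwrite every byte so stale readers see an obviously bogus value. */
static void junk_fill(void *p, size_t n) {
  memset(p, FREE_JUNK, n);
}

/* Return nonzero if every byte of the pointer value equals the junk byte. */
static int looks_junk_filled(const void *ptr_value) {
  uintptr_t v = (uintptr_t)ptr_value;
  for (size_t i = 0; i < sizeof v; i++) {
    if (((v >> (8 * i)) & 0xff) != FREE_JUNK)
      return 0;
  }
  return 1;
}
```

A check like looks_junk_filled is only a debugging heuristic; tools such as ASan detect the stale access itself rather than its aftermath.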
Group: media-core-security → core-security
Keywords: sec-critical
(Reporter)

Comment 4

4 years ago
>Could this be related to bug 922068?

Probably, but I cannot say for sure.  I think this one also appeared with the trickle ICE patch and is probably also a sync bug.  I've had this crash twenty times or so and it always has getfd at the top of the stack and never the snprintf from 922068.  If a patch appears for that bug I can re-test and see if it fixes this one as well.
status-b2g18: --- → unaffected
status-firefox25: --- → ?
status-firefox26: --- → ?
status-firefox27: --- → affected
status-firefox-esr17: --- → unaffected
status-firefox-esr24: --- → ?
tracking-firefox27: --- → +
Jesup - assigning to you for starters, can you investigate further here or find an assignee?
Assignee: nobody → rjesup
(Assignee)

Comment 6

4 years ago
This is my code; I will investigate.
Assignee: rjesup → ekr
(Assignee)

Comment 7

4 years ago
Created attachment 815192 [details] [diff] [review]
WIP Fix for CloseSendStream memory error

In trying to reproduce this under ASan, I ran into a bad cast in the unit test that causes a reliable ASan failure. There's no strong reason to believe that this is the source of your problem, but since it scribbles over memory somewhere, can you try the attached change and see if it fixes your problem?
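A bad cast can, for example, make code treat an object as a larger type and write past its end, which would fit "scribbles memory somewhere". One way to make such casts fail loudly instead is to tag each object with its concrete kind and check the tag before downcasting (in C++, dynamic_cast on a polymorphic type serves a similar purpose). The following is a hypothetical C sketch of that pattern; base_t, small_t, large_t, kind_t, and as_large are invented names and do not correspond to the signaling unittest's types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags; not taken from the signaling unittest. */
typedef enum { KIND_SMALL = 1, KIND_LARGE = 2 } kind_t;

typedef struct base_ {
  kind_t kind;
} base_t;

/* Each derived struct embeds base_t as its first member, so a pointer
   to the derived object can be converted to base_t * and back. */
typedef struct small_ {
  base_t base;
  int id;
} small_t;

typedef struct large_ {
  base_t base;
  int id;
  char buffer[64];  /* storage a wrong cast from small_t would write past */
} large_t;

/* Checked downcast: returns NULL unless b really is a large_t. */
static large_t *as_large(base_t *b) {
  if (b == NULL || b->kind != KIND_LARGE)
    return NULL;
  return (large_t *)b;
}
```

Callers check the result for NULL instead of casting unconditionally, which turns a silent memory scribble into an explicit error path.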
(Assignee)

Updated

4 years ago
Flags: needinfo?(ethanhugg)
(Reporter)

Comment 8

4 years ago
I have not been able to reproduce this bug since including this patch: 5/5 crash-free runs, when it used to crash almost every time.
Flags: needinfo?(ethanhugg)
(Assignee)

Comment 9

4 years ago
I filed a separate bug for the issue fixed by the candidate patch (which is clearly a real issue):

https://bugzilla.mozilla.org/show_bug.cgi?id=925226

I have r+ on that from ABR, so I propose to land it and then keep a watch on this bug and see if it goes away.
(Assignee)

Comment 10

4 years ago
Current thinking is that this is a bug solely in the unit test framework. Ethan, any updates?

If we can't repro, I propose we:

1. Un-mark this as security, since it appears not to be a bug in the code under test.
2. Close.

Thoughts?
Flags: needinfo?(ethanhugg)
(Reporter)

Comment 11

4 years ago
After hundreds of attempts, I am unable to reproduce this now with the patch from bug 925226. I am hesitant to unmark it as security because I hold an irrational belief that it's simply hidden. Marking RESOLVED WORKSFORME; if it reappears, we can reopen it.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Flags: needinfo?(ethanhugg)
Resolution: --- → WORKSFORME
(Assignee)

Comment 12

4 years ago
That WFM
Removing from release tracking per comment #11.

Updated

4 years ago
tracking-firefox27: + → ---

Updated

2 years ago
Group: core-security → core-security-release
Group: core-security-release