Signaling unittest is solid on OSX, but crashes consistently on Linux (Ubuntu 64). Crash is not always on the same test, but the stack is always similar to this:

#0  0x00000000012b20ac in nr_socket_getfd (sock=0x7ffeb38a0f01, fd=0x7fffe58fc610) at /home/ehugg/mozilla/mozilla-central/media/mtransport/third_party/nICEr/src/net/nr_socket.c:94
#1  0x00000000012b1a7a in nr_ice_socket_close (isock=0x7ffebd6e18ec) at /home/ehugg/mozilla/mozilla-central/media/mtransport/third_party/nICEr/src/ice/ice_socket.c:257
#2  0x00000000012b1980 in nr_ice_socket_destroy (isockp=0x7fffe58fc680) at /home/ehugg/mozilla/mozilla-central/media/mtransport/third_party/nICEr/src/ice/ice_socket.c:228
#3  0x00000000012a6eac in nr_ice_component_destroy (componentp=0x7fffe58fc6f0) at /home/ehugg/mozilla/mozilla-central/media/mtransport/third_party/nICEr/src/ice/ice_component.c:157
#4  0x00000000012ac273 in nr_ice_media_stream_destroy (streamp=0x7fffe58fc750) at /home/ehugg/mozilla/mozilla-central/media/mtransport/third_party/nICEr/src/ice/ice_media_stream.c:101
#5  0x00000000012ab1b5 in nr_ice_ctx_destroy_cb (s=0x0, how=0, cb_arg=0x7ffedbba960c) at /home/ehugg/mozilla/mozilla-central/media/mtransport/third_party/nICEr/src/ice/ice_ctx.c:415
#6  0x000000000066e408 in mozilla::nrappkitTimerCallback::Notify (this=0x7ffeb09d89c0, timer=0x7ffeb09ce8c0) at /home/ehugg/mozilla/mozilla-central/media/mtransport/nr_timer.cpp:99
#7  0x00007ffff37884c4 in nsTimerImpl::Fire (this=0x7ffeb09ce8c0) at /home/ehugg/mozilla/mozilla-central/xpcom/threads/nsTimerImpl.cpp:549
#8  0x00007ffff378883f in nsTimerEvent::Run (this=0x7fffe4f89200) at /home/ehugg/mozilla/mozilla-central/xpcom/threads/nsTimerImpl.cpp:630
#9  0x00007ffff37804ba in nsThread::ProcessNextEvent (this=0x7fffe771b120, mayWait=true, result=0x7fffe58fc9ef) at /home/ehugg/mozilla/mozilla-central/xpcom/threads/nsThread.cpp:622
#10 0x00007ffff3700cbe in NS_ProcessNextEvent (thread=0x7fffe771b120, mayWait=true) at /home/ehugg/mozilla/mozilla-central/xpcom/glue/nsThreadUtils.cpp:238
#11 0x00007ffff12ed38c in nsSocketTransportService::Run (this=0x7fffe69c0220) at /home/ehugg/mozilla/mozilla-central/netwerk/base/src/nsSocketTransportService2.cpp:688
#12 0x00007ffff37804ba in nsThread::ProcessNextEvent (this=0x7fffe771b120, mayWait=true, result=0x7fffe58fcb5f) at /home/ehugg/mozilla/mozilla-central/xpcom/threads/nsThread.cpp:622
#13 0x00007ffff3700cbe in NS_ProcessNextEvent (thread=0x7fffe771b120, mayWait=true) at /home/ehugg/mozilla/mozilla-central/xpcom/glue/nsThreadUtils.cpp:238
#14 0x00007ffff377f336 in nsThread::ThreadFunc (arg=0x7fffe771b120) at /home/ehugg/mozilla/mozilla-central/xpcom/threads/nsThread.cpp:250
#15 0x00007ffff00abd91 in _pt_root (arg=0x7fffe77158c0) at /home/ehugg/mozilla/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:204
#16 0x00007ffff7bc4e9a in start_thread (arg=0x7fffe58fd700) at pthread_create.c:308
#17 0x00007fffeda99ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
Just got it to happen under the debugger in OSX as well. Each time so far I've seen a sock pointer that looks normal but either points to a bunch of 0x5a or some garbage bytes.

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
[Switching to process 3535 thread 0x2413]
0x0000000100e35598 in nr_socket_getfd (sock=0x116ac7c01, fd=0x1122035c8) at nr_socket.c:94
94        return sock->vtbl->getfd(sock->obj, fd);
(gdb) bt
#0  0x0000000100e35598 in nr_socket_getfd (sock=0x116ac7c01, fd=0x1122035c8) at nr_socket.c:94
#1  0x0000000100e34d67 in nr_ice_socket_close (isock=0x11423498c) at ice_socket.c:257
#2  0x0000000100e34c06 in nr_ice_socket_destroy (isockp=0x1122036b0) at ice_socket.c:228
#3  0x0000000100e27f4d in nr_ice_component_destroy (componentp=0x112203720) at ice_component.c:157
#4  0x0000000100e2e24e in nr_ice_media_stream_destroy (streamp=0x112203798) at ice_media_stream.c:101
#5  0x0000000100e2cea0 in nr_ice_ctx_destroy_cb (s=0x0, how=0, cb_arg=0x11c6eca0c) at ice_ctx.c:415
#6  0x00000001000560c9 in mozilla::nrappkitTimerCallback::Notify (this=0x118bc1f80, timer=0x11172eb20) at /Users/ehugg/mozilla/work/media/mtransport/nr_timer.cpp:99
#7  0x00000001067c9b21 in nsTimerImpl::Fire (this=0x11172eb20) at /Users/ehugg/mozilla/work/xpcom/threads/nsTimerImpl.cpp:549
#8  0x00000001067c9f11 in nsTimerEvent::Run (this=0x113014380) at /Users/ehugg/mozilla/work/xpcom/threads/nsTimerImpl.cpp:630
#9  0x00000001067bf8c3 in nsThread::ProcessNextEvent (this=0x11174aa10, mayWait=true, result=0x112203b2e) at /Users/ehugg/mozilla/work/xpcom/threads/nsThread.cpp:622
#10 0x0000000106716cd9 in NS_ProcessNextEvent (thread=0x11174aa10, mayWait=true) at nsThreadUtils.cpp:238
#11 0x00000001039deb60 in nsSocketTransportService::Run (this=0x112785310) at /Users/ehugg/mozilla/work/netwerk/base/src/nsSocketTransportService2.cpp:688
#12 0x00000001039df55c in non-virtual thunk to nsSocketTransportService::Run() (this=0x112785328) at /Users/ehugg/mozilla/work/netwerk/base/src/nsSocketTransportService2.cpp:725
#13 0x00000001067bf8c3 in nsThread::ProcessNextEvent (this=0x11174aa10, mayWait=true, result=0x112203dde) at /Users/ehugg/mozilla/work/xpcom/threads/nsThread.cpp:622
#14 0x0000000106716cd9 in NS_ProcessNextEvent (thread=0x11174aa10, mayWait=true) at nsThreadUtils.cpp:238
#15 0x00000001067be237 in nsThread::ThreadFunc (arg=0x11174aa10) at /Users/ehugg/mozilla/work/xpcom/threads/nsThread.cpp:250
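A run of 0x5a bytes behind an otherwise plausible pointer is what one would expect if the allocator fills freed blocks with that value (jemalloc-style junk-on-free is one such scheme; whether it was active in this build is an assumption, not something the traces confirm). A minimal sketch of a check for that pattern follows. The names FREE_POISON_BYTE and looks_poisoned are hypothetical helpers for illustration and are not part of nICEr or mtransport.

```c
#include <stddef.h>

/* Assumption: the allocator overwrites freed memory with 0x5a. */
#define FREE_POISON_BYTE 0x5a

/*
 * Hypothetical debugging helper: returns 1 if every one of the first
 * len bytes at p equals the poison byte, 0 otherwise (including len == 0).
 * It only reads memory; it does not prove the block was freed, it just
 * flags the pattern reported in the comment above.
 */
static int looks_poisoned(const void *p, size_t len)
{
    const unsigned char *bytes = (const unsigned char *)p;
    size_t i;

    if (p == NULL || len == 0) {
        return 0;
    }
    for (i = 0; i < len; i++) {
        if (bytes[i] != FREE_POISON_BYTE) {
            return 0;
        }
    }
    return 1;
}
```

A helper like this could be called on the first few bytes of a suspect structure from a debugging assertion. A positive result would only be a hint of use-after-free, and the garbage-byte case described above would not be caught by it.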
Could this be related to bug 922068?
Sounds sort of like a use-after-free so marking critical for now. Feel free to adjust as needed.
> Could this be related to bug 922068?

Probably, but I cannot say for sure. I think this one also appeared with the trickle ICE patch and is probably also a sync bug. I've had this crash twenty times or so and it always has getfd at the top of the stack and never the snprintf from 922068. If a patch appears for that bug I can re-test and see if it fixes this one as well.
Jesup - assigning to you for starters, can you investigate further here or find an assignee?
This is my code, I will investigate.
Created attachment 815192
WIP Fix for CloseSendStream memory error

In trying to reproduce this under ASan I ran into a bad cast in the unit test that causes a reliable ASan failure. There's no really good reason to believe that this is the source of your problem, but since it just scribbles memory somewhere, can you try the attached change and see if it fixes your problem?
I have not been able to repro this bug now that I have included this patch: 5/5 crashless runs, when it used to crash almost every time.
I filed a separate bug for the issue fixed by the candidate patch (which is clearly an issue): https://bugzilla.mozilla.org/show_bug.cgi?id=925226

I have r+ on that from ABR, so I propose to land it and then keep a watch on this bug and see if it goes away.
Current thinking is that this is a bug solely in the unit test framework. Ethan, any updates? If we can't repro, I propose we:

1. Un-mark this as security since it appears to not be a bug in the code.
2. Close.

Thoughts?
After hundreds of attempts I am unable to repro this now with the patch from bug 925226. I am hesitant to unmark as security because I hold an irrational belief that it's simply hidden. Marking resolved and worksforme; if it re-appears we can re-open.
Removing from release tracking per comment #11.