Closed
Bug 798323
Opened 12 years ago
Closed 12 years ago
Peer connections are leaking a lot of threads and memory, which can not be released by GC
Categories
(Core :: WebRTC, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla20
Tracking | Status | |
---|---|---|
firefox18 | --- | affected |
firefox19 | --- | unaffected |
firefox20 | --- | fixed |
People
(Reporter: whimboo, Assigned: jesup)
References
Details
(Keywords: crash, memory-leak, testcase, Whiteboard: [WebRTC][blocking-webrtc+][MemShrink:P2][qa-])
Attachments
(1 file, 1 obsolete file)
3.22 KB,
text/html
|
Details |
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:18.0) Gecko/18.0 Firefox/18.0 While working on the Mochitest for bug 796890 I have seen a crash on exit of the browser. Checking its details I have noticed a strange amount of running threads which were about 200 of them. I considered this as a massive leak and continued investigation. To reproduce the thread leak you have to do the following steps: 1. Checkout the latest code from the alder tree 2. Apply the attachment 668290 [details] [diff] [review] from bug 795367 to add the Mochitest makefiles 3. Apply the attachment 668414 [details] [diff] [review] of the video and audio peer connection WIP test 4. Run: TEST_PATH=dom/tests/mochitest/media/test_peerconnection_videoAudio.html make -C $(OBJ) mochitest-plain 5. Reload the page a couple of times or open new tabs with the test loaded 6. Close all the tabs 7. Shutdown Firefox It doesn't matter how long you wait the aquired threads are never released. Each time you load the test the number of thread will be increased by about 15. The crash information for my debug build is not that helpful so far: Operating system: Mac OS X 10.7.5 11G56 CPU: amd64 family 6 model 42 stepping 7 4 CPUs Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS Crash address: 0x0 Thread 15 (crashed) 0 XUL + 0x267f168 rbx = 0x0000000000001000 r12 = 0x0000000000000046 r13 = 0x0000000000000046 r14 = 0x000000013d7dd000 r15 = 0x00007fff7f2040a8 rip = 0x0000000103a4f168 rsp = 0x000000013671aef0 rbp = 0x000000013671b150 Found by: given as instruction pointer in context I will try to get more details. Also I will make it a standalone testcase so that steps 1-3 are not necessary. I consider this important enough to ask for blocking Firefox 18.0.
Comment 1•12 years ago
|
||
Adding Anant. I don't think we should block all of WebRTC on this bug since WebRTC is preffed off and there is a great deal of value in getting WebRTC onto mozilla-central and into 18. But we will want a fix for this ASAP (ideally before uplift), and I will strongly push for getting this uplifted into Aurora within the first few days if we can't fix it before uplift.
Reporter | ||
Comment 2•12 years ago
|
||
Simplified test which only has to be loaded once. It will auto-refresh each 5 seconds.
Reporter | ||
Comment 3•12 years ago
|
||
I have to say that the crash only happens in my debug build so far. I can't the tinderbox build from Tuesday to crash.
Comment 5•12 years ago
|
||
I can't reproduce this. The example in this bug doesn't work - you're using .src instead of .mozSrcObject, and createFakeMediaStream is no longer a valid function, you have to use gUM({fake:true}). On another test page, I see some leaks, but I don't see a shutdown crash. I'm investigating the leaks. I don't think this should block landing as we're not going to flip the pref so unless someone specifically turns the functionality on they will not see the leak or the crash.
Comment 6•12 years ago
|
||
I should explicitly mention that this should definitely block turning on the pref, which is a different question from getting PeerConnection landed into mozilla-inbound today.
Reporter | ||
Comment 7•12 years ago
|
||
Oh, so my build was from Monday so probably before that change went in. That would explain why it doesn't work for you anymore. You will be able to reproduce when you use a tinderbox build from Monday. I can update the test to make it work with the latest tinderbox build. But one question ahead, does it mean I can no longer create a peer connection without requesting the media via gUM before?
Comment 8•12 years ago
|
||
You can definitely create a PeerConnection, but you can't call addStream until you've acquired a valid stream (real or fake) from gUM.
Reporter | ||
Comment 9•12 years ago
|
||
(In reply to Anant Narayanan [:anant] from comment #8) > You can definitely create a PeerConnection, but you can't call addStream > until you've acquired a valid stream (real or fake) from gUM. I'm still not sure why I need gUM here. As the latest source on the alder branch shows the createFakeMediaStream() method is still available for the PeerConnection object: http://hg.mozilla.org/projects/alder/file/38146b56dbdd/dom/media/PeerConnection.js#l482 So I have tested my attached testcase with the latest tinderbox and own debug build and I can still reproduce this problem. Within a minute the threads are up to about 300 and cpu load >100%. On which platform are you testing on that you can't reproduce it? I talked with Erk on IRC and he was able to reproduce it. As he mentioned it's somewhat related to the event loop.
Reporter | ||
Updated•12 years ago
|
tracking-firefox18:
? → ---
Comment 10•12 years ago
|
||
createFakeMediaStream is on alder right now, but it won't be shortly. gUM is the correct way to obtain a fake stream. Oh, I can definitely reproduce the thread problem, just not the shutdown crash.
Reporter | ||
Comment 11•12 years ago
|
||
With the latest alder build from about 4h ago I get the following stack information from the Apple crash report: Thread 5 Crashed:: Socket Thread 0 libsystem_kernel.dylib 0x00007fff8e1a6ce2 __pthread_kill + 10 1 libsystem_c.dylib 0x00007fff98dc87d2 pthread_kill + 95 2 libsystem_c.dylib 0x00007fff98db9a7a abort + 143 3 libnspr4.dylib 0x00000001005090fd PR_Assert + 109 4 libnss3.dylib 0x000000010061c1d2 PK11_GetInternalKeySlot + 98 (pk11slot.c:1754) 5 XUL 0x00000001042f13bc _ZN7mozillaL18nr_crypto_nss_hmacEPhiS0_iS0_ + 188 (nricectx.cpp:114) 6 XUL 0x00000001042d4bc1 nr_stun_compute_message_integrity + 209 (stun_codec.c:757) 7 XUL 0x00000001042d1eb8 nr_stun_attr_codec_message_integrity_encode + 88 (stun_codec.c:775) 8 XUL 0x00000001042d3180 nr_stun_encode_message + 1056 (stun_codec.c:1270) 9 XUL 0x00000001042ced89 nr_stun_client_send_request + 1577 (stun_client_ctx.c:358) 10 XUL 0x00000001042ce6e5 nr_stun_client_start + 149 (stun_client_ctx.c:112) 11 XUL 0x00000001042bdea6 nr_ice_candidate_pair_start + 230 (ice_candidate_pair.c:366) 12 XUL 0x00000001042c4df7 nr_ice_media_stream_check_timer_cb + 567 (ice_media_stream.c:333) 13 XUL 0x00000001042fb54a mozilla::nrappkitTimerCallback::Notify(nsITimer*) + 90 (nr_timer.cpp:53) 14 XUL 0x00000001036d9869 nsTimerImpl::Fire() + 1065 (nsTimerImpl.cpp:477) 15 XUL 0x00000001036d9bd6 nsTimerEvent::Run() + 230 (nsTimerImpl.cpp:558) 16 XUL 0x00000001036cf60d nsThread::ProcessNextEvent(bool, bool*) + 1373 (nsThread.cpp:613) 17 XUL 0x000000010363d13f NS_ProcessNextEvent_P(nsIThread*, bool) + 159 (nsThreadUtils.cpp:220) 18 XUL 0x000000010133fb1b nsSocketTransportService::Run() + 331 (nsSocketTransportService2.cpp:652) 19 XUL 0x00000001013404bc non-virtual thunk to nsSocketTransportService::Run() + 28 20 XUL 0x00000001036cf60d nsThread::ProcessNextEvent(bool, bool*) + 1373 (nsThread.cpp:613) 21 XUL 0x000000010363d13f NS_ProcessNextEvent_P(nsIThread*, bool) + 159 (nsThreadUtils.cpp:220) 22 XUL 0x00000001036ce1f7 nsThread::ThreadFunc(void*) + 295 (nsThread.cpp:256) 23 libnspr4.dylib 0x0000000100538793 _pt_root + 355 (ptthread.c:159) 24 libsystem_c.dylib 0x00007fff98dc68bf _pthread_start + 335 25 libsystem_c.dylib 0x00007fff98dc9b75 thread_start + 13
Reporter | ||
Comment 12•12 years ago
|
||
Can we get some progress on it? As it looks like it will block all of our mochitests. Some minutes ago I have noticed the leak too when doing the following steps: 1. Open http://people.mozilla.com/~anarayanan/webrtc/gum_test.html 2. Activate Video and stop it right away 3. Activate Audio Some seconds later you will see a constant increase of the memory used. Even closing the tab and forcing a GC doesn't reduce the memory.
Reporter | ||
Comment 13•12 years ago
|
||
Updated testcase for the new gUM API.
Attachment #668421 -
Attachment is obsolete: true
Reporter | ||
Comment 14•12 years ago
|
||
I can't reproduce the crash on shutdown anymore on OS X. If I can I will file it as a separate bug.
OS: Mac OS X → All
Hardware: x86 → All
Summary: Peer connections are leaking a lot of threads which also cause a crash on shutdown → Peer connections are leaking a lot of threads and memory, which can not be released by GC
Comment 15•12 years ago
|
||
I'm pretty sure we'll need to hook up a bunch of the WebRTC classes to cycle collection. As an example, PeerConnectionImpl holds a window which makes it trivial to cause leaks if it's not CC'ed.
Updated•12 years ago
|
Whiteboard: [WebRTC], [blocking-webrtc+] → [WebRTC], [blocking-webrtc+][MemShrink]
Comment 16•12 years ago
|
||
> I should explicitly mention that this should definitely block turning on the
> pref
+1
Whiteboard: [WebRTC], [blocking-webrtc+][MemShrink] → [WebRTC][blocking-webrtc+][MemShrink:P2]
Updated•12 years ago
|
Priority: -- → P1
Updated•12 years ago
|
Flags: in-testsuite?
Reporter | ||
Comment 17•12 years ago
|
||
Randell, has one of the latest patches fixed that issue? Can you directly point to one? If not I can test the latest nightly builds so that we can figure that out. Good to see such a progress here!
Reporter | ||
Comment 18•12 years ago
|
||
Just to note, the dependent fix has not been landed for Firefox 18.0 yet, so Aurora is still affected.
status-firefox18:
--- → affected
status-firefox19:
--- → unaffected
Comment 19•12 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #18) > Just to note, the dependent fix has not been landed for Firefox 18.0 yet Can you add it to the Depends On field?
Reporter | ||
Comment 20•12 years ago
|
||
I don't know of this other landed patch. Please see my comment 17. I hope Randell can help us.
Comment 21•12 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #18) > Just to note, the dependent fix has not been landed for Firefox 18.0 yet, so > Aurora is still affected. Hmm...if this unaffected on Nightly 19, does this imply we should be able to enable crash tests on mozilla central now? Also - I think a larger decision (during the weekly meeting) needs to be made if we want to do a large spin of aurora uplifts or not to FF 18, cause there's a lot of them. I'm wondering if it's worth it vs. the overhead of maintaining two builds when there's a lot of stability issues we are still fighting through. I'll get more information on this.
Reporter | ||
Comment 22•12 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #21) > Hmm...if this unaffected on Nightly 19, does this imply we should be able to > enable crash tests on mozilla central now? I don't think so. There are probably some more leaks left which need to be fixed first. But you can try to run the crashtests on mozilla-central. If the leak check passes we would indeed be able to. That would be great!
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → rjesup
Reporter | ||
Comment 23•12 years ago
|
||
So the threading issue is fixed in Firefox 20 now but we still have a slight increase of memory over time. It's not that much but consistently growing.
Reporter | ||
Comment 24•12 years ago
|
||
Can't see see it leaking anymore with the last dependency bug fixed. Good work! I don't think that we can have a test for this specific issue. It takes too long and not sure how to measure memory and thread leakage. Instead a manual case would probably a good idea.
Status: NEW → RESOLVED
Closed: 12 years ago
status-firefox20:
--- → fixed
Flags: in-testsuite?
Flags: in-testsuite-
Flags: in-moztrap?(jsmith)
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Comment 25•12 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [away 12/21 - 01/01] from comment #24) > Can't see see it leaking anymore with the last dependency bug fixed. Good > work! > > I don't think that we can have a test for this specific issue. It takes too > long and not sure how to measure memory and thread leakage. Instead a manual > case would probably a good idea. Manual test cases defined typically don't test for leaks. We might do one-off investigations for leaks though. In future test runs, I'll probably be running debug builds with leak checking turned on to mitigate for this. That will help mitigate catching issues such as this bug.
Flags: in-moztrap?(jsmith) → in-moztrap-
Whiteboard: [WebRTC][blocking-webrtc+][MemShrink:P2] → [WebRTC][blocking-webrtc+][MemShrink:P2][qa-]
You need to log in
before you can comment on or make changes to this bug.
Description
•