Closed Bug 1150574 Opened 10 years ago Closed 10 years ago

Intermittent test_peerConnection_bug1013809.html,test_peerConnection_setLocalOfferInHaveRemoteOffer.html | application crashed [@ mozalloc_abort(char const*)] with libglib-2.0.so.0.3200.1 on the stack

Categories

(Core :: WebRTC, defect, P3)

x86_64
Linux
defect

Tracking

()

RESOLVED DUPLICATE of bug 1194397
Tracking Status
e10s + ---
firefox40 --- affected

People

(Reporter: RyanVM, Assigned: jesup)

References

(Blocks 1 open bug)

Details

06:22:43 INFO - 1992 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | PeerConnectionWrapper (pcRemote): legal ICE state transition from new to closed
06:22:43 INFO - 1993 INFO PeerConnectionWrapper (pcRemote): "onsignalingstatechange" event fired
06:22:43 INFO - 1994 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | signalingState is closed
06:22:43 INFO - 1995 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | PeerConnectionWrapper (pcRemote): legal signaling state transition from have-remote-offer to closed
06:22:43 INFO - 1996 INFO PeerConnectionWrapper (pcRemote): Closed connection.
06:22:43 INFO - 1997 INFO TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | The author of the test has indicated that flaky timeouts are expected. Reason: WebRTC inherently depends on timeouts
06:22:43 INFO - 1998 INFO MEMORY STAT vsize after test: 1194369024
06:22:43 INFO - 1999 INFO MEMORY STAT residentFast after test: 160112640
06:22:43 INFO - 2000 INFO MEMORY STAT heapAllocated after test: 106315008
06:22:43 INFO - 2001 INFO canplaythrough fired for media element pcLocal_local1_audio
06:22:43 INFO - 2002 INFO timeupdate fired for media element pcLocal_local1_audio
06:22:43 INFO - 2003 INFO time passed for media element pcLocal_local1_audio
06:22:43 INFO - 2004 INFO timeupdate fired for media element pcRemote_local1_audio
06:22:43 INFO - 2005 INFO canplaythrough fired for media element pcRemote_local1_audio
06:22:43 INFO - 2006 INFO timeupdate fired for media element pcRemote_local1_audio
06:22:43 WARNING - TEST-UNEXPECTED-FAIL | dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | application terminated with exit code 11
06:22:43 INFO - runtests.py | Application ran for: 0:18:20.130956
06:22:43 INFO - zombiecheck | Reading PID log: /tmp/tmpF3lQo2pidlog
06:22:43 INFO - ==> process 2061 launched child process 2108
06:22:43 INFO - ==> process 2108 launched child process 5808
06:22:43 INFO - zombiecheck | Checking for orphan process with PID: 2108
06:22:43 INFO - zombiecheck | Checking for orphan process with PID: 5808
06:22:43 INFO - mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/yQi0OfJuT1SyIhJEYhkmUA/artifacts/public/build/firefox-40.0a1.en-US.linux-x86_64.crashreporter-symbols.zip
06:22:56 INFO - mozcrash Saved minidump as /builds/slave/test/build/blobber_upload_dir/7272984c-ae72-5e80-4eb5f6f7-41f76ea6.dmp
06:22:56 INFO - mozcrash Saved app info as /builds/slave/test/build/blobber_upload_dir/7272984c-ae72-5e80-4eb5f6f7-41f76ea6.extra
06:22:56 WARNING - PROCESS-CRASH | dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | application crashed [@ mozalloc_abort(char const*)]
06:22:56 INFO - Crash dump filename: /tmp/tmpivaNrv.mozrunner/minidumps/7272984c-ae72-5e80-4eb5f6f7-41f76ea6.dmp
06:22:56 INFO - Operating system: Linux
06:22:56 INFO - 0.0.0 Linux 3.2.0-76-generic #111-Ubuntu SMP Tue Jan 13 22:16:09 UTC 2015 x86_64
06:22:56 INFO - CPU: amd64
06:22:56 INFO - family 6 model 62 stepping 4
06:22:56 INFO - 1 CPU
06:22:56 INFO - Crash reason: SIGSEGV
06:22:56 INFO - Crash address: 0x0
06:22:56 INFO - Thread 28 (crashed)
06:22:56 INFO - 0 firefox!mozalloc_abort(char const*) [mozalloc_abort.cpp:e99e71fa5089 : 33 + 0x0]
06:22:56 INFO - rbx = 0x00007fcfdb1cb828 r12 = 0x00007fcfd268cfc9
06:22:56 INFO - r13 = 0x00007fcfb10e6b00 r14 = 0x0000000000000000
06:22:56 INFO - r15 = 0x000000000000001d rip = 0x0000000000407381
06:22:56 INFO - rsp = 0x00007fcfb87fe390 rbp = 0x00007fcfb87fe3a0
06:22:56 INFO - Found by: given as instruction pointer in context
06:22:56 INFO - 1 firefox!abort [mozalloc_abort.cpp:e99e71fa5089 : 71 + 0x4]
06:22:56 INFO - rbx = 0x00007fcfb444dd30 r12 = 0x00007fcfd268cfc9
06:22:56 INFO - r13 = 0x00007fcfb10e6b00 r14 = 0x0000000000000000
06:22:56 INFO - r15 = 0x000000000000001d rip = 0x000000000040735f
06:22:56 INFO - rsp = 0x00007fcfb87fe3b0 rbp = 0x00007fcfb87fe3b0
06:22:56 INFO - Found by: call frame info
06:22:56 INFO - 2 libglib-2.0.so.0.3200.1 + 0x67f5c
06:22:56 INFO - rbx = 0x00007fcfb444dd30 r12 = 0x00007fcfd268cfc9
06:22:56 INFO - r13 = 0x00007fcfb10e6b00 r14 = 0x0000000000000000
06:22:56 INFO - r15 = 0x000000000000001d rip = 0x00007fcfd3700f5d
06:22:56 INFO - rsp = 0x00007fcfb87fe3c0 rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: call frame info
06:22:56 INFO - 3 libglib-2.0.so.0.3200.1 + 0x8fb64
06:22:56 INFO - rip = 0x00007fcfd3728b65 rsp = 0x00007fcfb87fe3c8
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
06:22:56 INFO - 4 libgdk-x11-2.0.so.0.2400.10 + 0x7afc8
06:22:56 INFO - rip = 0x00007fcfd268cfc9 rsp = 0x00007fcfb87fe3d0
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
06:22:56 INFO - 5 libglib-2.0.so.0.3200.1 + 0x8fb64
06:22:56 INFO - rip = 0x00007fcfd3728b65 rsp = 0x00007fcfb87fe3d8
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
06:22:56 INFO - 6 libglib-2.0.so.0.3200.1 + 0x8e30b
06:22:56 INFO - rip = 0x00007fcfd372730c rsp = 0x00007fcfb87fe3e0
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
06:22:56 INFO - 7 libgdk-x11-2.0.so.0.2400.10 + 0x7ad2d
06:22:56 INFO - rip = 0x00007fcfd268cd2e rsp = 0x00007fcfb87fe410
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
06:22:56 INFO - 8 libglib-2.0.so.0.3200.1 + 0x9130a
06:22:56 INFO - rip = 0x00007fcfd372a30b rsp = 0x00007fcfb87fe418
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
06:22:56 INFO - 9 libgdk-x11-2.0.so.0.2400.10 + 0x7ad2d
06:22:56 INFO - rip = 0x00007fcfd268cd2e rsp = 0x00007fcfb87fe428
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
06:22:56 INFO - 10 libgdk-x11-2.0.so.0.2400.10 + 0x7ae2f
06:22:56 INFO - rip = 0x00007fcfd268ce30 rsp = 0x00007fcfb87fe430
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
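Most frames in the crash log above appear only as a module name plus an offset, because no symbols were available for those libraries. As a rough aid for triage, the Python sketch below pulls such frames out of a log and counts them per module. The function names and the regular expression are illustrative assumptions written against the log format shown here; they are not part of mozcrash or any Mozilla tooling.

```python
import re
from collections import Counter

# Matches frames reported only as "<module> + <offset>", for example
# "INFO - 2 libglib-2.0.so.0.3200.1 + 0x67f5c". Symbolicated frames such as
# "firefox!abort [...]" do not match, because a module name contains no spaces.
FRAME_RE = re.compile(r"INFO\s+-\s+(\d+)\s+(\S+) \+ (0x[0-9a-fA-F]+)")


def unsymbolicated_frames(log_text):
    """Return (frame_index, module, offset) tuples for module+offset frames."""
    return [
        (int(index), module, int(offset, 16))
        for index, module, offset in FRAME_RE.findall(log_text)
    ]


def frames_per_module(log_text):
    """Count how many unsymbolicated frames belong to each module."""
    return Counter(module for _, module, _ in unsymbolicated_frames(log_text))
```

Running frames_per_module over a saved copy of the log gives a quick view of which libraries would need symbols before the stack becomes readable. The pattern does not depend on line breaks, so it works on a flattened copy of the log as well.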
Summary: Intermittent test_peerConnection_setLocalOfferInHaveRemoteOffer.html | application crashed [@ mozalloc_abort(char const*)] with libglib-2.0.so.0.3200.1 on the stack → Intermittent test_peerConnection_bug1013809.html,test_peerConnection_setLocalOfferInHaveRemoteOffer.html | application crashed [@ mozalloc_abort(char const*)] with libglib-2.0.so.0.3200.1 on the stack
Linux e10s is pretty crappy for webrtc tests lately. Anything we can do to get some attention on it?
Flags: needinfo?(mreavy)
Yeah, jesup and pkerr are taking the lead this quarter (Q2) to resolve any e10s blockers in the WebRTC code. Part of this work will include investigating and improving WebRTC perf under e10s and cleaning up the tests that are problematic. I believe this falls in the last category. I'm assigning this to jesup and cc'ing pkerr, but I expect both of them will work on it.
Assignee: nobody → rjesup
Flags: needinfo?(mreavy)
Bug 1154981 may well be the same problem (but triggering an assertion instead)
This reads for all the world like memory trashing leading to a failure deep in glib/X11 land. However, it seems to have shown up around 4/2, and it seems to hit only during one of several WebRTC tests. *Assuming* it has something to do with WebRTC, and that whatever caused it landed shortly before the first hits, it's roughly in the range of Bug 1149298 (dcf3bcce815d on inbound) and Bug 1143694 (21e4e3a2b33d). Does anyone see anything with *any* risk of memory trashing, refcount issues, etc. in:
hg diff -r dcf3bcce815d:21e4e3a2b33d media/webrtc media/mtransport dom/media/webrtc dom/media
(or dom/media/webrtc dom/media/MediaManager* dom/media/MediaStream* dom/media/Graph*)
I'm not seeing much; there are some conversions to use refcounting in media/mtransport/nricectx.cpp, but that looks sane to me. NI-ing people with patches in the range or knowledge of the code.
Flags: needinfo?(pkerr)
Flags: needinfo?(martin.thomson)
Flags: needinfo?(jib)
Flags: needinfo?(drno)
Flags: needinfo?(docfaraday)
It looks like lots of the webrtc tests are in this bug, including the getUserMedia tests, although since the peerConnection tests run first it could be that a previous peerConnection test is causing a crash in the getUserMedia tests.
Comment 16 is a failure in the browser chrome mochitest, but sadly the logs are not still around so I can't verify whether the stack looks similar...
Flags: needinfo?(docfaraday)
Comment 33 has a similar looking failure in browser/base/content/test/general/browser_devices_get_user_media.js (which is a browser chrome mochitest). Maybe this is GUM related?
FWIW, I don't see any failures in our datachannel-only test.
Wow that's a big diff, and I don't recognize much of this code. Perhaps we could bisect this with a couple try jobs? That's the only thing I can think of.
Flags: needinfo?(jib)
jib: the problem is that it's very low-frequency (<1/day) and spread across N tests. And the diff I gave is 300 lines or so, unless you expand it to more of dom/media. Even with *all* of dom/media added in, it's only 1600 lines (and that includes context lines).
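To make the low-frequency concern concrete: bisecting an intermittent failure means each candidate revision has to be run enough times that a clean result is meaningful. The Python sketch below estimates that count under the assumption that runs fail independently with a fixed probability. The failure rate passed in is hypothetical; the bug only says the crash occurred less than once a day, and the function name is illustrative.

```python
import math


def clean_runs_needed(failure_rate, confidence=0.95):
    """Number of consecutive passing runs needed before a revision can be
    called good with the given confidence, assuming independent runs that
    each fail with probability failure_rate when the bug is present."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # The chance that a buggy revision passes n runs in a row is
    # (1 - failure_rate) ** n. Solve for the smallest n that pushes this
    # below 1 - confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))
```

With a hypothetical 1% failure rate per run, roughly 300 clean runs per bisection step would be needed for 95% confidence, which is why a handful of try jobs is unlikely to settle the question.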
Your first hg command produced 11000 lines of diff for me, over 84 files, none of which include MediaManager or MediaStream.
Try:
hg diff -r 60d47f603817:8c068f0ce341 media/webrtc media/mtransport dom/media/webrtc
You probably got dom/media too, and for some reason the hashes I gave covered a slightly wider range (though check that, but not all of dom/media for that range).
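Since the two commenters got very different diff sizes from what was meant to be the same range, a quick count of files and lines makes it easier to compare results. This is a minimal Python sketch, assuming the diff text was saved from hg diff and that each file section starts with a line beginning with "diff " (true for both the default and the --git output styles). The function name is illustrative.

```python
def diff_stats(diff_text):
    """Return (file_count, line_count) for unified diff text.

    A file section is assumed to start with a header line beginning with
    "diff ". Content lines inside hunks are prefixed with "+", "-", or a
    space, so they are not mistaken for headers.
    """
    lines = diff_text.splitlines()
    file_count = sum(1 for line in lines if line.startswith("diff "))
    return file_count, len(lines)
```

For example, redirecting the hg diff output to a file and passing its contents to diff_stats shows at a glance whether the path arguments limited the diff as intended.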
I checked the diff; I'd say that we're not going to get much out of what is in that range. There are some changes to memory management, but nothing that is obviously bad, unless some of the code triggers other bad code.
Flags: needinfo?(martin.thomson)
backlog: --- → webRTC+
Rank: 31
Priority: -- → P3
I did not see anything in my first pass through the diffs.
Flags: needinfo?(pkerr)
A lot of these debug crashes are caused by assertions down in cairo, but not all of them.
See Also: → 1186989
This is another deep-in-glib crash set with no symbols; we need symbols for glib on the test machines. Mike, do you know the bug for that?
Flags: needinfo?(mh+mozilla)
Unfortunately, I don't.
Flags: needinfo?(mh+mozilla)
I'm going to optimistically call this fixed by bug 1194397.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
Clearing some old NI.
Flags: needinfo?(drno)