Closed Bug 1150574 Opened 9 years ago Closed 9 years ago

Intermittent test_peerConnection_bug1013809.html,test_peerConnection_setLocalOfferInHaveRemoteOffer.html | application crashed [@ mozalloc_abort(char const*)] with libglib-2.0.so.0.3200.1 on the stack

Categories

(Core :: WebRTC, defect, P3)

x86_64
Linux
defect

Tracking

()

RESOLVED DUPLICATE of bug 1194397
Tracking Status
e10s + ---
firefox40 --- affected

People

(Reporter: RyanVM, Assigned: jesup)

References

(Blocks 1 open bug)

Details

06:22:43 INFO - 1992 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | PeerConnectionWrapper (pcRemote): legal ICE state transition from new to closed
06:22:43 INFO - 1993 INFO PeerConnectionWrapper (pcRemote): "onsignalingstatechange" event fired
06:22:43 INFO - 1994 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | signalingState is closed
06:22:43 INFO - 1995 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | PeerConnectionWrapper (pcRemote): legal signaling state transition from have-remote-offer to closed
06:22:43 INFO - 1996 INFO PeerConnectionWrapper (pcRemote): Closed connection.
06:22:43 INFO - 1997 INFO TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | The author of the test has indicated that flaky timeouts are expected. Reason: WebRTC inherently depends on timeouts
06:22:43 INFO - 1998 INFO MEMORY STAT vsize after test: 1194369024
06:22:43 INFO - 1999 INFO MEMORY STAT residentFast after test: 160112640
06:22:43 INFO - 2000 INFO MEMORY STAT heapAllocated after test: 106315008
06:22:43 INFO - 2001 INFO canplaythrough fired for media element pcLocal_local1_audio
06:22:43 INFO - 2002 INFO timeupdate fired for media element pcLocal_local1_audio
06:22:43 INFO - 2003 INFO time passed for media element pcLocal_local1_audio
06:22:43 INFO - 2004 INFO timeupdate fired for media element pcRemote_local1_audio
06:22:43 INFO - 2005 INFO canplaythrough fired for media element pcRemote_local1_audio
06:22:43 INFO - 2006 INFO timeupdate fired for media element pcRemote_local1_audio
06:22:43 WARNING - TEST-UNEXPECTED-FAIL | dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | application terminated with exit code 11
06:22:43 INFO - runtests.py | Application ran for: 0:18:20.130956
06:22:43 INFO - zombiecheck | Reading PID log: /tmp/tmpF3lQo2pidlog
06:22:43 INFO - ==> process 2061 launched child process 2108
06:22:43 INFO - ==> process 2108 launched child process 5808
06:22:43 INFO - zombiecheck | Checking for orphan process with PID: 2108
06:22:43 INFO - zombiecheck | Checking for orphan process with PID: 5808
06:22:43 INFO - mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/yQi0OfJuT1SyIhJEYhkmUA/artifacts/public/build/firefox-40.0a1.en-US.linux-x86_64.crashreporter-symbols.zip
06:22:56 INFO - mozcrash Saved minidump as /builds/slave/test/build/blobber_upload_dir/7272984c-ae72-5e80-4eb5f6f7-41f76ea6.dmp
06:22:56 INFO - mozcrash Saved app info as /builds/slave/test/build/blobber_upload_dir/7272984c-ae72-5e80-4eb5f6f7-41f76ea6.extra
06:22:56 WARNING - PROCESS-CRASH | dom/media/tests/mochitest/test_peerConnection_setLocalOfferInHaveRemoteOffer.html | application crashed [@ mozalloc_abort(char const*)]
06:22:56 INFO - Crash dump filename: /tmp/tmpivaNrv.mozrunner/minidumps/7272984c-ae72-5e80-4eb5f6f7-41f76ea6.dmp
06:22:56 INFO - Operating system: Linux
06:22:56 INFO - 0.0.0 Linux 3.2.0-76-generic #111-Ubuntu SMP Tue Jan 13 22:16:09 UTC 2015 x86_64
06:22:56 INFO - CPU: amd64
06:22:56 INFO - family 6 model 62 stepping 4
06:22:56 INFO - 1 CPU
06:22:56 INFO - Crash reason: SIGSEGV
06:22:56 INFO - Crash address: 0x0
06:22:56 INFO - Thread 28 (crashed)
06:22:56 INFO - 0 firefox!mozalloc_abort(char const*) [mozalloc_abort.cpp:e99e71fa5089 : 33 + 0x0]
06:22:56 INFO - rbx = 0x00007fcfdb1cb828 r12 = 0x00007fcfd268cfc9
06:22:56 INFO - r13 = 0x00007fcfb10e6b00 r14 = 0x0000000000000000
06:22:56 INFO - r15 = 0x000000000000001d rip = 0x0000000000407381
06:22:56 INFO - rsp = 0x00007fcfb87fe390 rbp = 0x00007fcfb87fe3a0
06:22:56 INFO - Found by: given as instruction pointer in context
06:22:56 INFO - 1 firefox!abort [mozalloc_abort.cpp:e99e71fa5089 : 71 + 0x4]
06:22:56 INFO - rbx = 0x00007fcfb444dd30 r12 = 0x00007fcfd268cfc9
06:22:56 INFO - r13 = 0x00007fcfb10e6b00 r14 = 0x0000000000000000
06:22:56 INFO - r15 = 0x000000000000001d rip = 0x000000000040735f
06:22:56 INFO - rsp = 0x00007fcfb87fe3b0 rbp = 0x00007fcfb87fe3b0
06:22:56 INFO - Found by: call frame info
06:22:56 INFO - 2 libglib-2.0.so.0.3200.1 + 0x67f5c
06:22:56 INFO - rbx = 0x00007fcfb444dd30 r12 = 0x00007fcfd268cfc9
06:22:56 INFO - r13 = 0x00007fcfb10e6b00 r14 = 0x0000000000000000
06:22:56 INFO - r15 = 0x000000000000001d rip = 0x00007fcfd3700f5d
06:22:56 INFO - rsp = 0x00007fcfb87fe3c0 rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: call frame info
06:22:56 INFO - 3 libglib-2.0.so.0.3200.1 + 0x8fb64
06:22:56 INFO - rip = 0x00007fcfd3728b65 rsp = 0x00007fcfb87fe3c8
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
06:22:56 INFO - 4 libgdk-x11-2.0.so.0.2400.10 + 0x7afc8
06:22:56 INFO - rip = 0x00007fcfd268cfc9 rsp = 0x00007fcfb87fe3d0
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
06:22:56 INFO - 5 libglib-2.0.so.0.3200.1 + 0x8fb64
06:22:56 INFO - rip = 0x00007fcfd3728b65 rsp = 0x00007fcfb87fe3d8
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
06:22:56 INFO - 6 libglib-2.0.so.0.3200.1 + 0x8e30b
06:22:56 INFO - rip = 0x00007fcfd372730c rsp = 0x00007fcfb87fe3e0
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
06:22:56 INFO - 7 libgdk-x11-2.0.so.0.2400.10 + 0x7ad2d
06:22:56 INFO - rip = 0x00007fcfd268cd2e rsp = 0x00007fcfb87fe410
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
06:22:56 INFO - 8 libglib-2.0.so.0.3200.1 + 0x9130a
06:22:56 INFO - rip = 0x00007fcfd372a30b rsp = 0x00007fcfb87fe418
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
06:22:56 INFO - 9 libgdk-x11-2.0.so.0.2400.10 + 0x7ad2d
06:22:56 INFO - rip = 0x00007fcfd268cd2e rsp = 0x00007fcfb87fe428
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
06:22:56 INFO - 10 libgdk-x11-2.0.so.0.2400.10 + 0x7ae2f
06:22:56 INFO - rip = 0x00007fcfd268ce30 rsp = 0x00007fcfb87fe430
06:22:56 INFO - rbp = 0x00007fcfd398d260
06:22:56 INFO - Found by: stack scanning
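A note on reading the stack above: only frames 0 and 1 have symbols, frame 2 was recovered from call frame info, and frames 3 through 10 were recovered by stack scanning, which is a heuristic and can report stale return addresses. A minimal sketch for tallying that from a log excerpt follows. It assumes the `HH:MM:SS INFO - ` prefix and frame layout shown in this log; `summarize_frames` and `count_by_method` are hypothetical helper names, not part of mozcrash.

```python
import re
from collections import Counter

# A frame header looks like "INFO - 2 libglib-2.0.so.0.3200.1 + 0x67f5c"
# or "INFO - 0 firefox!mozalloc_abort(char const*) [...]".
# Requiring "!" or " + 0x" after the module avoids matching lines
# such as "INFO - 1 CPU" or numbered mochitest output.
FRAME_RE = re.compile(r"INFO - (\d+) (\S+?)(?:!| \+ 0x)")
FOUND_RE = re.compile(r"Found by: (.+)$")


def summarize_frames(log_text):
    """Return a list of dicts: frame index, module, and recovery method."""
    frames = []
    current = None
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            current = {
                "index": int(match.group(1)),
                "module": match.group(2),
                "found_by": None,
            }
            frames.append(current)
            continue
        match = FOUND_RE.search(line)
        if match and current is not None:
            current["found_by"] = match.group(1).strip()
    return frames


def count_by_method(frames):
    """Count frames per recovery method (e.g. 'stack scanning')."""
    return Counter(frame["found_by"] for frame in frames)
```

Passing the crashed thread's log lines to `summarize_frames` and then `count_by_method` shows how much of the stack rests on scanning rather than on unwind data.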
Summary: Intermittent test_peerConnection_setLocalOfferInHaveRemoteOffer.html | application crashed [@ mozalloc_abort(char const*)] with libglib-2.0.so.0.3200.1 on the stack → Intermittent test_peerConnection_bug1013809.html,test_peerConnection_setLocalOfferInHaveRemoteOffer.html | application crashed [@ mozalloc_abort(char const*)] with libglib-2.0.so.0.3200.1 on the stack
Linux e10s is pretty crappy for webrtc tests lately. Anything we can do to get some attention on it?
Flags: needinfo?(mreavy)
Yeah, jesup and pkerr are taking the lead this quarter (Q2) to resolve any e10s blockers in the WebRTC code. Part of this work will include investigating and improving WebRTC perf under e10s and cleaning up the tests that are problematic.  I believe this falls in the last category.  I'm assigning this to jesup and cc'ing pkerr, but I expect both of them will work on it.
Assignee: nobody → rjesup
Flags: needinfo?(mreavy)
Bug 1154981 may well be the same problem (but triggering an assertion instead)
This reads for all the world like memory trashing leading to a failure deep in glib/x11 land.  However, it seems to have shown up around 4/2, and it seems to hit only during one of several WebRTC tests.

*Assuming* it has something to do with WebRTC, and that whatever it was landed shortly before the first hits, it's roughly in the range of Bug 1149298 (dcf3bcce815d on inbound) and Bug 1143694 (21e4e3a2b33d).

Does anyone see anything with *any* risk of memory trashing or refcount issues/etc in 
hg diff -r dcf3bcce815d:21e4e3a2b33d media/webrtc media/mtransport dom/media/webrtc dom/media (or dom/media/webrtc dom/media/MediaManager* dom/media/MediaStream* dom/media/Graph*)

I'm not seeing much; there are some conversions to use refcounting in media/mtransport/nricectx.cpp, but that looks sane to me.

NI-ing people with patches in the range or knowledge of the code.
Flags: needinfo?(pkerr)
Flags: needinfo?(martin.thomson)
Flags: needinfo?(jib)
Flags: needinfo?(drno)
Flags: needinfo?(docfaraday)
It looks like lots of the webrtc tests are in this bug, including the getUserMedia tests, although since the peerConnection tests run first it could be that a previous peerConnection test is causing a crash in the getUserMedia tests.
Comment 16 is a failure in the browser chrome mochitest, but sadly the logs are no longer around, so I can't verify whether the stack looks similar...
Flags: needinfo?(docfaraday)
Comment 33 has a similar-looking failure in browser/base/content/test/general/browser_devices_get_user_media.js (which is a browser chrome mochitest). Maybe this is GUM related?
FWIW, I don't see any failures in our datachannel-only test.
Wow that's a big diff, and I don't recognize much of this code. Perhaps we could bisect this with a couple try jobs? That's the only thing I can think of.
Flags: needinfo?(jib)
jib: the problem is that it's very low-frequency... (<1/day) and spread across N tests

And the diff I gave is 300 lines or so, unless you expand it to more of dom/media.  Even *all* of dom/media added in is only 1600 lines (and that includes context lines)
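To put a rough number on why bisecting with try jobs is hard at this frequency: if a bad changeset fails a given run with probability p, the chance of n clean runs in a row is (1 - p)^n, so calling a changeset "good" with some confidence needs n large enough that this drops below the allowed error. A minimal sketch, assuming runs fail independently; the failure rates passed in are illustrative, not measured from this bug, and `runs_needed` is a hypothetical helper.

```python
import math


def runs_needed(p_fail, confidence=0.95):
    """Clean runs required before a changeset can be called good.

    Solves (1 - p_fail) ** n <= 1 - confidence for the smallest integer n.
    Assumes each run fails independently with probability p_fail.
    """
    if not 0 < p_fail < 1:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_fail))
```

With an assumed failure rate of 1% per run, this comes to roughly 300 clean runs per bisection step, which is why a handful of try pushes would not settle much.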
Your first hg command produced 11000 lines of diff for me, over 84 files, none of which include MediaManager or MediaStream.
Try
hg diff -r 60d47f603817:8c068f0ce341 media/webrtc media/mtransport dom/media/webrtc

You probably got dom/media too, and for some reason the hashes I gave covered a slightly wider range (do check dom/media as well, just not all of it for that range).
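Since the two of us got very different sizes for what was meant to be the same diff, a quick way to compare is to count files and lines in the `hg diff` output directly. A minimal sketch, assuming the default `hg diff` format, where each file section starts with a line beginning with `diff `; `diff_stats` is a hypothetical helper.

```python
def diff_stats(diff_text):
    """Return (file_count, line_count) for unified diff text.

    File sections are detected by header lines starting with "diff ",
    which covers both "diff -r ..." and "diff --git ..." headers.
    The line count includes headers and context lines.
    """
    files = 0
    lines = 0
    for line in diff_text.splitlines():
        lines += 1
        if line.startswith("diff "):
            files += 1
    return files, lines
```

Saving the output of each `hg diff` command to a file and passing its contents to `diff_stats` gives numbers that can be compared like for like.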
I checked the diff; I'd say that we're not going to get much out of what is in that range.  There are some changes to memory management, but nothing that is obviously bad, unless some of the code triggers other bad code.
Flags: needinfo?(martin.thomson)
backlog: --- → webRTC+
Rank: 31
Priority: -- → P3
I did not see anything in my first pass through the diffs.
Flags: needinfo?(pkerr)
A lot of these debug crashes are caused by assertions down in cairo, but not all of them.
See Also: → 1186989
This is another deep-in-glib crash set with no symbols; we need symbols for glib on the test machines.

Mike - do you know the bug for that?
Flags: needinfo?(mh+mozilla)
Unfortunately, I don't.
Flags: needinfo?(mh+mozilla)
I'm going to optimistically call this fixed by bug 1194397.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
Clearing some old NI.
Flags: needinfo?(drno)