Closed Bug 1564902 Opened 2 years ago Closed 2 years ago

Intermittent PROCESS-CRASH | Main app process exited normally | application crashed [@ abort] | Fatal error in src/media/webrtc/trunk/webrtc/modules/pacing/paced_sender.cc

Categories

(Core :: WebRTC: Audio/Video, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla71
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox67 --- unaffected
firefox68 --- unaffected
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: dminor)

References

Details

(Keywords: crash, intermittent-failure, regression)

Crash Data

Attachments

(1 file)

Filed by: apavel [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=255686560&repo=autoland
Full log: https://queue.taskcluster.net/v1/task/HcUEWDGlS7GAvo2DvBaj9A/runs/0/artifacts/public/logs/live_backing.log


10:52:16 INFO - TEST-START | dom/media/tests/mochitest/test_peerConnection_simulcastAnswer.html
10:52:16 INFO - [1632, Main Thread] WARNING: No active window: file /builds/worker/workspace/build/src/js/xpconnect/src/XPCJSContext.cpp, line 662
10:52:16 INFO - GECKO(1656) | --DOCSHELL 0x120920800 == 2 [pid = 1657] [id = {ea4357ff-e759-9943-ab5f-eee5634ce6b7}] [url = about:blank]
10:52:16 INFO - GECKO(1656) | ++DOMWINDOW == 8 (0x1377fb800) [pid = 1657] [serial = 451] [outer = 0x1209403e0]
10:52:16 INFO - GECKO(1656) | Timecard created 1562755935.671743

10:52:44 INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ abort]
10:52:44 INFO - Crash dump filename: /var/folders/t8/jf1n2tdd58n8f53fqq6hzvnr000017/T/tmpp7j6P7.mozrunner/minidumps/ABD1894C-93A8-4FB7-8297-3CE9DC82FC4A.dmp
10:52:44 INFO - Operating system: Mac OS X
10:52:44 INFO - 10.14.5 18F132
10:52:44 INFO - CPU: amd64
10:52:44 INFO - family 6 model 69 stepping 1
10:52:44 INFO - 4 CPUs
10:52:44 INFO -
10:52:44 INFO - GPU: UNKNOWN
10:52:44 INFO -
10:52:44 INFO - Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
10:52:44 INFO - Crash address: 0x0
10:52:44 INFO - Process uptime: 328 seconds
10:52:44 INFO -
10:52:44 INFO - Thread 51 (crashed)
10:52:44 INFO - 0 libmozglue.dylib!mozalloc_abort [mozalloc_abort.cpp:26592a2a32aaebbb33bd03042b6c283775b99745 : 33 + 0x16]
10:52:44 INFO - rax = 0x000000010867b3ea rdx = 0x0000000000000000
10:52:44 INFO - rcx = 0x0000000108689da8 rbx = 0x00007fffa501b9a0
10:52:44 INFO - rsi = 0x00000000000120a8 rdi = 0x00007fffa501b028
10:52:44 INFO - rbp = 0x00007000063e7a20 rsp = 0x00007000063e7a10
10:52:44 INFO - r8 = 0x00000000000130a8 r9 = 0x00007fffa501b048
10:52:44 INFO - r10 = 0x0000000000000000 r11 = 0x00007fffa501b040
10:52:44 INFO - r12 = 0x00007000063e7a40 r13 = 0x00007fffa501b9a0
10:52:44 INFO - r14 = 0x000000000000000a r15 = 0x00007000063e7aa0
10:52:44 INFO - rip = 0x000000010863b56a
10:52:44 INFO - Found by: given as instruction pointer in context
10:52:44 INFO - 1 libmozglue.dylib!abort [mozalloc_abort.cpp:26592a2a32aaebbb33bd03042b6c283775b99745 : 79 + 0xc]
10:52:44 INFO - rbp = 0x00007000063e7a30 rsp = 0x00007000063e7a30
10:52:44 INFO - rip = 0x000000010863b5d0
10:52:44 INFO - Found by: previous frame's frame pointer
10:52:44 INFO - 2 XUL!rtc::FatalMessage::~FatalMessage() [checks.cc:26592a2a32aaebbb33bd03042b6c283775b99745 : 69 + 0x5]
10:52:44 INFO - rbp = 0x00007000063e7a80 rsp = 0x00007000063e7a40
10:52:44 INFO - rip = 0x0000000111f46408
10:52:44 INFO - Found by: previous frame's frame pointer
10:52:44 INFO - 3 XUL!rtc::FatalMessage::~FatalMessage() [checks.cc:26592a2a32aaebbb33bd03042b6c283775b99745 : 63 + 0x9]
10:52:44 INFO - rbp = 0x00007000063e7a90 rsp = 0x00007000063e7a90
10:52:44 INFO - rip = 0x0000000111f43ff9
10:52:44 INFO - Found by: previous frame's frame pointer
10:52:44 INFO - 4 XUL!webrtc::PacedSender::SendPacket(webrtc::PacketQueue::Packet const&, webrtc::PacedPacketInfo const&) [paced_sender.cc:26592a2a32aaebbb33bd03042b6c283775b99745 : 318 + 0x3b]
10:52:44 INFO - rbp = 0x00007000063e7bd0 rsp = 0x00007000063e7aa0
10:52:44 INFO - rip = 0x0000000111ea78b1
10:52:44 INFO - Found by: previous frame's frame pointer
10:52:44 INFO - 5 XUL!webrtc::PacedSender::Process() [paced_sender.cc:26592a2a32aaebbb33bd03042b6c283775b99745 : 277 + 0xe]
10:52:44 INFO - rbp = 0x00007000063e7d60 rsp = 0x00007000063e7be0
10:52:44 INFO - rip = 0x0000000111ea74ee
10:52:44 INFO - Found by: previous frame's frame pointer
10:52:44 INFO - 6 XUL!webrtc::ProcessThreadImpl::Process() [process_thread_impl.cc:26592a2a32aaebbb33bd03042b6c283775b99745 : 195 + 0xb]
10:52:44 INFO - rbp = 0x00007000063e7e30 rsp = 0x00007000063e7d70
10:52:44 INFO - rip = 0x0000000111f0137a
10:52:44 INFO - Found by: previous frame's frame pointer
10:52:44 INFO - 7 XUL!rtc::PlatformThread::Run() [platform_thread.cc:26592a2a32aaebbb33bd03042b6c283775b99745 : 264 + 0x7]
10:52:44 INFO - rbp = 0x00007000063e9f00 rsp = 0x00007000063e7e40
10:52:44 INFO - rip = 0x0000000111f504d2
10:52:44 INFO - Found by: previous frame's frame pointer
10:52:44 INFO - 8 XUL!rtc::PlatformThread::StartThread(void*) [platform_thread.cc:26592a2a32aaebbb33bd03042b6c283775b99745 : 158 + 0x6]
10:52:44 INFO - rbp = 0x00007000063e9f10 rsp = 0x00007000063e9f10
10:52:44 INFO - rip = 0x0000000111f4ff0a
10:52:44 INFO - Found by: previous frame's frame pointer
10:52:44 INFO - 9 0x7fff6e8a72eb
10:52:44 INFO - rbp = 0x00007000063e9f30 rsp = 0x00007000063e9f20
10:52:44 INFO - rip = 0x00007fff6e8a72eb
10:52:44 INFO - Found by: previous frame's frame pointer
10:52:44 INFO - 10 0x7fff6e8aa249
10:52:44 INFO - rbp = 0x00007000063e9f50 rsp = 0x00007000063e9f40
10:52:44 INFO - rip = 0x00007fff6e8aa249
10:52:44 INFO - Found by: previous frame's frame pointer
10:52:44 INFO - 11 0x7fff6e8a640d
10:52:44 INFO - rbp = 0x00007000063e9f78 rsp = 0x00007000063e9f60
10:52:44 INFO - rip = 0x00007fff6e8a640d
10:52:44 INFO - Found by: previous frame's frame pointer
10:52:44 INFO - 12 XUL + 0x4fe9f00
10:52:44 INFO - rsp = 0x00007000063ea090 rip = 0x0000000111f4ff00
10:52:44 INFO - Found by: stack scanning

Type: -- → defect

The priority flag is not set for this bug.
:drno, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(drno)

Some of these failures are miscategorized, but we do have an intermittent failure here:

#
# Fatal error in /builds/worker/workspace/build/src/media/webrtc/trunk/webrtc/modules/pacing/paced_sender.cc, line 318
# last system error: 316
# Check failed: !paused_
#

Is there a way to make this bug more specific to this failure?

Flags: needinfo?(apavel)
Priority: -- → P2
Component: WebRTC → WebRTC: Audio/Video

Let's see how this goes.

Flags: needinfo?(apavel)
Summary: Intermittent PROCESS-CRASH | Main app process exited normally | application crashed [@ abort] → Intermittent PROCESS-CRASH | Main app process exited normally | application crashed [@ abort] | Fatal error in src/media/webrtc/trunk/webrtc/modules/pacing/paced_sender.cc

Looking closer, |paused_| seems to be guarded by |critsect_| in all cases, so this does not look like a simple data race. This also happens right after we close the PeerConnection, so it is likely teardown-related.

I think I see how this might happen. We exit the critical section briefly here:

https://searchfox.org/mozilla-central/rev/01d1011ca4a460f751da030d455d35c267c3e210/media/webrtc/trunk/webrtc/modules/pacing/paced_sender.cc#324-328

It is possible that we could set |paused_| during that window.

SendPacket is being called in a loop here:

https://searchfox.org/mozilla-central/rev/01d1011ca4a460f751da030d455d35c267c3e210/media/webrtc/trunk/webrtc/modules/pacing/paced_sender.cc#277

We are not checking |paused_| on each iteration of that loop, so if |paused_| is set during the window of opportunity described above, we could hit this assertion.

We didn't update the webrtc.org code in july, right? This may simply be a case of timing changes triggering an old bug. Is this fixed on webrtc.org now?

Flags: needinfo?(dminor)
Duplicate of this bug: 1567880
Crash Signature: [@ abort] → [@ abort] [@ ReceivePort::WaitForMessage(MachReceiveMessage*, unsigned int)]

No, no webrtc.org updates. The code has changed around a bit, but it appears that not checking paused is in fact the problem based upon this comment and the associated code [1], which has moved from paced_sender.cc to pacing_controller.cc.

[1] https://cs.chromium.org/chromium/src/third_party/webrtc/modules/pacing/pacing_controller.cc?q=pacing_controller.cc&sq=package:chromium&g=0&l=334.

Crash Signature: [@ abort] [@ ReceivePort::WaitForMessage(MachReceiveMessage*, unsigned int)] → [@ abort] [@ ReceivePort::WaitForMessage(MachReceiveMessage*, unsigned int)]
Flags: needinfo?(dminor)

:bwc, were you planning to fix this or do you want me to have a look?

Flags: needinfo?(drno) → needinfo?(docfaraday)

No, I was not, feel free to take a crack at it.

Flags: needinfo?(docfaraday)

It looks like we can check paused_ in that loop without ill effect. I'll do some testing and see how it goes.

Assignee: nobody → dminor
Status: NEW → ASSIGNED

There is a window inside of SendPacket where the critical section is released
which means that other code could set paused_. This would lead to us hitting
the assertion at the top of SendPacket. Checking paused_ in the while loop
will avoid this. Upstream has fixed this problem in a similar way, but the
code has changed there enough that it doesn't make sense to try to bring in
their fix directly.

Pushed by dminor@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2b676ea4de81
Checked paused_ prior to calling SendPacket; r=bwc
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71
No longer depends on: 1646904
You need to log in before you can comment on or make changes to this bug.