Closed Bug 974004 Opened 6 years ago Closed 2 years ago

Investigate spike in Linux mochitest shutdown hangs beginning ~10-Feb

Categories

(Core :: Audio/Video: Playback, defect)

All
Linux
defect
Not set

Tracking

()

RESOLVED INACTIVE

People

(Reporter: RyanVM, Unassigned)

References

Details

Beginning around 10-February, we've seen a spike in Linux mochitest shutdown hangs. Not sure if the media stuff on the stack is relevant.

Logs look like this:
https://tbpl.mozilla.org/php/getParsedLog.php?id=34506820&tree=Mozilla-Inbound

Ubuntu VM 12.04 x64 mozilla-inbound debug test mochitest-1 on 2014-02-11 11:32:30 PST for push b6200f26d4fc
slave: tst-linux64-ec2-029

12:49:04  WARNING -  TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output
12:49:05     INFO -  Xlib:  extension "RANDR" missing on display ":0".
12:49:07  WARNING -  TEST-UNEXPECTED-FAIL | Shutdown | application terminated with exit code 6
12:49:07     INFO -  INFO | runtests.py | Application ran for: 1:12:43.060513
12:49:07     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpO5MeSUpidlog
12:49:07     INFO -  ==> process 2362 launched child process 2465
12:49:07     INFO -  ==> process 2362 launched child process 2474
12:49:07     INFO -  ==> process 2362 launched child process 3811
12:49:07     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 2465
12:49:07     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 2474
12:49:07     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 3811
12:49:17  WARNING -  PROCESS-CRASH | Shutdown | application crashed [@ libpthread-2.15.so + 0xbd84]
12:49:17     INFO -  Crash dump filename: /tmp/tmph4y85_/minidumps/6fe47f9b-a3b8-f8de-2f4202cc-66156387.dmp
12:49:17     INFO -  Operating system: Linux
12:49:17     INFO -                    0.0.0 Linux 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64
12:49:17     INFO -  CPU: amd64
12:49:17     INFO -       family 6 model 62 stepping 4
12:49:17     INFO -       1 CPU
12:49:17     INFO -  Crash reason:  SIGABRT
12:49:17     INFO -  Crash address: 0x3e800000910
12:49:17     INFO -  Thread 0 (crashed)
12:49:17     INFO -   0  libpthread-2.15.so + 0xbd84
12:49:17     INFO -      rbx = 0x0000000001c62bd0   r12 = 0x0000000001c62be0
12:49:17     INFO -      r13 = 0x0000000001c62c10   r14 = 0x00000000ffffffff
12:49:17     INFO -      r15 = 0x00007f8757047740   rip = 0x00007f8756c33d84
12:49:17     INFO -      rsp = 0x00007fff689ac470   rbp = 0x00007fff689ac4f0
12:49:17     INFO -      Found by: given as instruction pointer in context
12:49:17     INFO -   1  libnspr4.so!PR_Wait [ptsynch.c:b6200f26d4fc : 689 + 0xb]
12:49:17     INFO -      rip = 0x00007f8755a2dabe   rsp = 0x00007fff689ac4a0
12:49:17     INFO -      rbp = 0x00007fff689ac4f0
12:49:17     INFO -      Found by: stack scanning
12:49:17     INFO -   2  libxul.so!mozilla::ReentrantMonitor::Wait(unsigned int) [BlockingResourceBase.cpp:b6200f26d4fc : 309 + 0x4]
12:49:17     INFO -      rbx = 0x0000000001c628e0   r12 = 0x0000000000000000
12:49:17     INFO -      r13 = 0x0000000001bce9f0   r14 = 0x0000000000000001
12:49:17     INFO -      r15 = 0x0000000000000001   rip = 0x00007f874fda4764
12:49:17     INFO -      rsp = 0x00007fff689ac500   rbp = 0x00007fff689ac530
12:49:17     INFO -      Found by: call frame info
12:49:17     INFO -   3  libxul.so!nsEventQueue::GetEvent(bool, nsIRunnable**) [ReentrantMonitor.h:b6200f26d4fc : 192 + 0xb]
12:49:17     INFO -      rbx = 0x0000000001c628e0   r12 = 0x00007fff689ac598
12:49:17     INFO -      r13 = 0x00007fff689ac601   r14 = 0x00000000048d7f01
12:49:17     INFO -      r15 = 0x0000000000000001   rip = 0x00007f874fe09e27
12:49:17     INFO -      rsp = 0x00007fff689ac540   rbp = 0x00007fff689ac570
12:49:17     INFO -      Found by: call frame info
12:49:17     INFO -   4  libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.h:b6200f26d4fc : 97 + 0x14]
12:49:17     INFO -      rbx = 0x0000000001c62860   r12 = 0x00000000080f7b01
12:49:17     INFO -      r13 = 0x00007fff689ac60f   r14 = 0x00000000048d7f01
12:49:17     INFO -      r15 = 0x0000000000000001   rip = 0x00007f874fe0efcb
12:49:17     INFO -      rsp = 0x00007fff689ac580   rbp = 0x00007fff689ac5f0
12:49:17     INFO -      Found by: call frame info
12:49:17     INFO -   5  libxul.so!mozilla::StateMachineThread::SpinUntilShutdownComplete() [MediaShutdownManager.cpp:b6200f26d4fc : 72 + 0x16]
12:49:17     INFO -      rbx = 0x0000000008b27420   r12 = 0x00000000080f7b70
12:49:17     INFO -      r13 = 0x00007f8750ca65f2   r14 = 0x00000000048d7ff8
12:49:17     INFO -      r15 = 0x0000000000000010   rip = 0x00007f8750ca94ce
12:49:17     INFO -      rsp = 0x00007fff689ac600   rbp = 0x00007fff689ac620
12:49:17     INFO -      Found by: call frame info
12:49:17     INFO -   6  libxul.so!mozilla::JoinStateMachineThreads [MediaShutdownManager.cpp:b6200f26d4fc : 201 + 0x7]
12:49:17     INFO -      rbx = 0x0000000008b27420   r12 = 0x00000000080f7b70
12:49:17     INFO -      r13 = 0x00007f8750ca65f2   r14 = 0x00000000048d7ff8
12:49:17     INFO -      r15 = 0x0000000000000010   rip = 0x00007f8750caa68c
12:49:17     INFO -      rsp = 0x00007fff689ac630   rbp = 0x00007fff689ac640
12:49:17     INFO -      Found by: call frame info
12:49:17     INFO -   7  libxul.so!PL_DHashTableEnumerate [pldhash.cpp:b6200f26d4fc : 632 + 0x16]
12:49:17     INFO -      rbx = 0x0000000000000010   r12 = 0x00000000080f7b70
12:49:17     INFO -      r13 = 0x00007f8750ca65f2   r14 = 0x00000000048d7ff8
12:49:17     INFO -      r15 = 0x0000000000000010   rip = 0x00007f874fdacbc3
12:49:17     INFO -      rsp = 0x00007fff689ac650   rbp = 0x00007fff689ac6a0
12:49:17     INFO -      Found by: call frame info
12:49:17     INFO -   8  libxul.so!mozilla::MediaShutdownManager::Shutdown() [nsTHashtable.h:b6200f26d4fc : 230 + 0x4]
12:49:17     INFO -      rbx = 0x00000000048d7fb0   r12 = 0x00007f8751dfa383
12:49:17     INFO -      r13 = 0x0000000001c8c258   r14 = 0x00007f8751dfa383
12:49:17     INFO -      r15 = 0x0000000000000000   rip = 0x00007f8750cb3892
12:49:17     INFO -      rsp = 0x00007fff689ac6b0   rbp = 0x00007fff689ac6d0
12:49:17     INFO -      Found by: call frame info
12:49:17     INFO -   9  libxul.so!mozilla::MediaShutdownManager::Observe(nsISupports*, char const*, char16_t const*) [MediaShutdownManager.cpp:b6200f26d4fc : 160 + 0x7]
12:49:17     INFO -      rbx = 0x00000000048d7fb0   r12 = 0x00007f8751dfa383
12:49:17     INFO -      r13 = 0x0000000001c8c258   r14 = 0x00007f8751dfa383
12:49:17     INFO -      r15 = 0x0000000000000000   rip = 0x00007f8750cb3915
12:49:17     INFO -      rsp = 0x00007fff689ac6e0   rbp = 0x00007fff689ac6f0
12:49:17     INFO -      Found by: call frame info
12:49:17     INFO -  10  libxul.so!nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) [nsObserverList.cpp:b6200f26d4fc : 96 + 0xe]
12:49:17     INFO -      rbx = 0x0000000000000024   r12 = 0x00007fff689ac708
12:49:17     INFO -      r13 = 0x0000000001c8c258   r14 = 0x00007f8751dfa383
12:49:17     INFO -      r15 = 0x0000000000000000   rip = 0x00007f874fde6888
12:49:17     INFO -      rsp = 0x00007fff689ac700   rbp = 0x00007fff689ac740
12:49:17     INFO -      Found by: call frame info
These are currently the #1 orange on trunk.
Roc/et al:  This appears to be a hang in mozilla::MediaShutdownManager::Shutdown().  Can you take a look and/or correlate against media checkins that affect threading/locking in that timeframe?  Thanks
Flags: needinfo?(roc)
Related to bug 961197 (just a guess)?
Maybe fixed by bug 949346, bug 973379, or bug 973139. Or something else.

There's active work in those bugs right now to fix issues like this.

Ryan: When you see shutdown waiting in MediaShutdownManager::Shutdown(), it means we're waiting for something else in the media pipeline to shutdown. So we can gleam more information by looking at the other active threads' stacks.

For example in the full log at:
https://tbpl.mozilla.org/php/getParsedLog.php?id=34506820&tree=Mozilla-Inbound

You can see a thread (31) blocked in StopDecodeThread(), and another thread (32) inside MediaDecoderStateMachine::DecodeThreadRun() that is blocked in GStreamerReader::ReadMetadata(). Which indicates Andrew's guess is on the money. ;)

(Just an FYI, so you can more effectively correlate the failures that are happening).
These seem to have fallen way down the list again. Any idea why?
Today's Windows nightly hung for me with MediaShutdownManager::Shutdown() spinning and another thread blocked in AudioStream::Write(). Different bug or would you like the stacks here?
Flags: needinfo?(cpearce)
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #6)
> Today's Windows nightly hung for me with MediaShutdownManager::Shutdown()
> spinning and another thread blocked in AudioStream::Write(). Different bug
> or would you like the stacks here?

Windows and Linux have different AudioStream implementations, the Linux and Windows hanging cases are probably different, and should be different bugs.

Incidentally, I've seen this hang sporadically on Windows, but I don't know how to reliably reproduce it.
Flags: needinfo?(cpearce)
Component: Audio/Video → Audio/Video: Playback
Mass closing do to inactivity.
Feel free to re-open if still needed.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INACTIVE
You need to log in before you can comment on or make changes to this bug.