Closed Bug 1379081 Opened 7 years ago Closed 4 years ago

Intermittent test_tabbar_session_restore_button.py TestTabbarSessionRestoreButton.test_click_restore | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]

Categories

(Core :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox56 --- affected
firefox57 --- affected

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(4 keywords)

This is actually a hang during shutdown of the application. I see a couple of warnings/errors and a final assertion:

19:02:39     INFO -  1499392959713	Marionette	TRACE	2 -> [0,174,"quit",{"flags":["eAttemptQuit"]}]
19:02:39     INFO -  1499392959716	Marionette	INFO	New connections will no longer be accepted
19:02:39     INFO -  [Parent 1612] WARNING: NS_ENSURE_TRUE(weak) failed: file /home/worker/workspace/build/src/dom/base/nsFrameMessageManager.cpp, line 343
19:02:40     INFO -  --DOCSHELL 0x12527f000 == 8 [pid = 1612] [id = {51337ebd-ca6a-104e-b30e-499164194a11}]
19:02:40     INFO -  1499392960284	Marionette	TRACE	2 <- [1,174,null,{"cause":"shutdown"}]
19:02:40     INFO -  [Parent 1612] WARNING: cannot post event if not initialized: file /home/worker/workspace/build/src/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 290
19:02:40     INFO -  [Parent 1612] WARNING: NS_ENSURE_TRUE(weak) failed: file /home/worker/workspace/build/src/dom/base/nsFrameMessageManager.cpp, line 343
19:02:40     INFO -  1499392960355	Marionette	DEBUG	Closed connection 2
19:02:40     INFO -  [Child 1613] WARNING: nsAppShell::Exit() called redundantly: file /home/worker/workspace/build/src/widget/cocoa/nsAppShell.mm, line 690
19:02:40     INFO -  ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
19:02:40     INFO -  [Child 1613] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /home/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 1040
19:02:40     INFO -  [Child 1614] WARNING: nsAppShell::Exit() called redundantly: file /home/worker/workspace/build/src/widget/cocoa/nsAppShell.mm, line 690
19:02:40     INFO -  ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
19:02:40     INFO -  [Child 1614] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /home/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 1040
19:02:40     INFO -  [Child 1615] WARNING: nsAppShell::Exit() called redundantly: file /home/worker/workspace/build/src/widget/cocoa/nsAppShell.mm, line 690
19:02:40     INFO -  [Child 1615] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /home/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 1040
19:02:40     INFO -  [Child 1613] WARNING: '!mMainThread', file /home/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp, line 330
[..]
19:02:40     INFO -  Assertion failure: 0 == rv, at /home/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:870
19:02:40     INFO -  Redirecting call to abort() to mozalloc_abort
19:02:40     INFO -  Hit MOZ_CRASH() at /home/worker/workspace/build/src/memory/mozalloc/mozalloc_abort.cpp:33
19:02:40     INFO -  nsStringStats
19:02:40     INFO -   => mAllocCount:          18517
19:02:40     INFO -   => mReallocCount:          344
19:02:40     INFO -   => mFreeCount:           18517
19:02:40     INFO -   => mShareCount:          15093
19:02:40     INFO -   => mAdoptCount:           1967
19:02:40     INFO -  Assertion failure: 0 == rv, at /home/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:292
19:02:40     INFO -  Redirecting call to abort() to mozalloc_abort
19:02:40     INFO -  Hit MOZ_CRASH() at /home/worker/workspace/build/src/memory/mozalloc/mozalloc_abort.cpp:33
[..]
19:02:41     INFO -  *** UTM:SVC TimerManager:registerTimer called after profile-before-change notification. Ignoring timer registration for id: telemetry_modules_ping
19:02:43     INFO -  --DOCSHELL 0x118aaa800 == 7 [pid = 1612] [id = {4aff02b3-457b-d14f-8ed3-9f932689ed2e}]
19:02:43     INFO -  --DOCSHELL 0x126b71800 == 6 [pid = 1612] [id = {73ba26ce-d4f9-fb45-8d34-ee8c815103e3}]
19:02:43     INFO -  1499392963204	Marionette	DEBUG	Received observer notification "xpcom-shutdown"
19:02:43     INFO -  1499392963205	Marionette	DEBUG	Received observer notification "xpcom-shutdown"
19:02:43     INFO -  [Parent 1612] WARNING: nsAppShell::Exit() called redundantly: file /home/worker/workspace/build/src/widget/cocoa/nsAppShell.mm, line 690
19:02:43     INFO -  [Parent 1612] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /home/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 1040
19:02:44     INFO -  [Parent 1612] WARNING: '!mMainThread', file /home/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp, line 330
19:02:44     INFO -  [Parent 1612] WARNING: '!mMainThread', file /home/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp, line 330

Benjamin, is that information helpful to see what could have been caused this hang?
Component: Marionette → Startup and Profile System
Flags: needinfo?(benjamin)
Keywords: assertion, hang
Product: Testing → Toolkit
Version: Version 3 → unspecified
Without stacks this is mostly meaningless. Automation should be providing stacks from assertions (typically because they save minidumps). For the hang itself, if the harness is killing the process after a hang it should take a minidump before it kills it off and dump the stack.
Flags: needinfo?(benjamin)
Hm, we have a 65s timeout for the process to be killed. Isn't Firefox killed after 60s if a thread hangs? Has this timeout been increased? Or should we wait a bit longer similar to application update which has 70s set?
Flags: needinfo?(benjamin)
Don't count on the internal shutdown hang monitor to be reliable. Depending on the context, it might itself deadlock or otherwise fail. The harness should do a dump+kill procedure after some reasonable timeout.
Flags: needinfo?(benjamin)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #4)
> Don't count on the internal shutdown hang monitor to be reliable. Depending
> on the context, it might itself deadlock or otherwise fail. The harness
> should do a dump+kill procedure after some reasonable timeout.

Which timeout would you find reasonable? I don't think it would hurt to increase it to somewhat 90 or even 120s.
Oh, I don't care about the actual value. It could be 30s for all I care. What I care about is the harness collecting a stack.
Ok, lets see if I can get to bug 1348961 soon.
Depends on: 1348961
Component: Startup and Profile System → Untriaged
Product: Toolkit → Core
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
The problem here is actually a too long shutdown which causes a crash:

01:18:42     INFO -  Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) at /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:160
01:18:42     INFO -  #01: PR_GetThreadName[/Users/cltbld/tasks/task_1505808525/build/application/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x144389]
01:18:42     INFO -  #02: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x405a]
01:18:42     INFO -  #03: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x3fd7]
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Summary: Intermittent test_tabbar_session_restore_button.py TestTabbarSessionRestoreButton.test_click_restore | IOError: Process killed because a requested application quit did not happen within 65s. Check gecko.log for errors. → Intermittent test_tabbar_session_restore_button.py TestTabbarSessionRestoreButton.test_click_restore | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]
Moving from Core::Untriaged to Core::General https://bugzilla.mozilla.org/show_bug.cgi?id=1407598
Component: Untriaged → General
Status: REOPENED → RESOLVED
Closed: 7 years ago4 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.