Intermittent test_tabbar_session_restore_button.py TestTabbarSessionRestoreButton.test_click_restore | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]

REOPENED
Unassigned

Status

()

REOPENED
a year ago
11 months ago

People

(Reporter: intermittent-bug-filer, Unassigned)

Tracking

(Depends on: 1 bug, Blocks: 1 bug, 4 keywords)

unspecified
assertion, crash, hang, intermittent-failure
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox56 affected, firefox57 affected)

Details

This is actually a hang during shutdown of the application. I see a couple of warnings/errors and a final assertion:

19:02:39     INFO -  1499392959713	Marionette	TRACE	2 -> [0,174,"quit",{"flags":["eAttemptQuit"]}]
19:02:39     INFO -  1499392959716	Marionette	INFO	New connections will no longer be accepted
19:02:39     INFO -  [Parent 1612] WARNING: NS_ENSURE_TRUE(weak) failed: file /home/worker/workspace/build/src/dom/base/nsFrameMessageManager.cpp, line 343
19:02:40     INFO -  --DOCSHELL 0x12527f000 == 8 [pid = 1612] [id = {51337ebd-ca6a-104e-b30e-499164194a11}]
19:02:40     INFO -  1499392960284	Marionette	TRACE	2 <- [1,174,null,{"cause":"shutdown"}]
19:02:40     INFO -  [Parent 1612] WARNING: cannot post event if not initialized: file /home/worker/workspace/build/src/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 290
19:02:40     INFO -  [Parent 1612] WARNING: NS_ENSURE_TRUE(weak) failed: file /home/worker/workspace/build/src/dom/base/nsFrameMessageManager.cpp, line 343
19:02:40     INFO -  1499392960355	Marionette	DEBUG	Closed connection 2
19:02:40     INFO -  [Child 1613] WARNING: nsAppShell::Exit() called redundantly: file /home/worker/workspace/build/src/widget/cocoa/nsAppShell.mm, line 690
19:02:40     INFO -  ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
19:02:40     INFO -  [Child 1613] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /home/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 1040
19:02:40     INFO -  [Child 1614] WARNING: nsAppShell::Exit() called redundantly: file /home/worker/workspace/build/src/widget/cocoa/nsAppShell.mm, line 690
19:02:40     INFO -  ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
19:02:40     INFO -  [Child 1614] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /home/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 1040
19:02:40     INFO -  [Child 1615] WARNING: nsAppShell::Exit() called redundantly: file /home/worker/workspace/build/src/widget/cocoa/nsAppShell.mm, line 690
19:02:40     INFO -  [Child 1615] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /home/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 1040
19:02:40     INFO -  [Child 1613] WARNING: '!mMainThread', file /home/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp, line 330
[..]
19:02:40     INFO -  Assertion failure: 0 == rv, at /home/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:870
19:02:40     INFO -  Redirecting call to abort() to mozalloc_abort
19:02:40     INFO -  Hit MOZ_CRASH() at /home/worker/workspace/build/src/memory/mozalloc/mozalloc_abort.cpp:33
19:02:40     INFO -  nsStringStats
19:02:40     INFO -   => mAllocCount:          18517
19:02:40     INFO -   => mReallocCount:          344
19:02:40     INFO -   => mFreeCount:           18517
19:02:40     INFO -   => mShareCount:          15093
19:02:40     INFO -   => mAdoptCount:           1967
19:02:40     INFO -  Assertion failure: 0 == rv, at /home/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:292
19:02:40     INFO -  Redirecting call to abort() to mozalloc_abort
19:02:40     INFO -  Hit MOZ_CRASH() at /home/worker/workspace/build/src/memory/mozalloc/mozalloc_abort.cpp:33
[..]
19:02:41     INFO -  *** UTM:SVC TimerManager:registerTimer called after profile-before-change notification. Ignoring timer registration for id: telemetry_modules_ping
19:02:43     INFO -  --DOCSHELL 0x118aaa800 == 7 [pid = 1612] [id = {4aff02b3-457b-d14f-8ed3-9f932689ed2e}]
19:02:43     INFO -  --DOCSHELL 0x126b71800 == 6 [pid = 1612] [id = {73ba26ce-d4f9-fb45-8d34-ee8c815103e3}]
19:02:43     INFO -  1499392963204	Marionette	DEBUG	Received observer notification "xpcom-shutdown"
19:02:43     INFO -  1499392963205	Marionette	DEBUG	Received observer notification "xpcom-shutdown"
19:02:43     INFO -  [Parent 1612] WARNING: nsAppShell::Exit() called redundantly: file /home/worker/workspace/build/src/widget/cocoa/nsAppShell.mm, line 690
19:02:43     INFO -  [Parent 1612] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /home/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 1040
19:02:44     INFO -  [Parent 1612] WARNING: '!mMainThread', file /home/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp, line 330
19:02:44     INFO -  [Parent 1612] WARNING: '!mMainThread', file /home/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp, line 330

Benjamin, is that information helpful to see what could have been caused this hang?
status-firefox56: --- → affected
Component: Marionette → Startup and Profile System
Flags: needinfo?(benjamin)
Keywords: assertion, hang
Product: Testing → Toolkit
Version: Version 3 → unspecified

Comment 2

a year ago
Without stacks this is mostly meaningless. Automation should be providing stacks from assertions (typically because they save minidumps). For the hang itself, if the harness is killing the process after a hang it should take a minidump before it kills it off and dump the stack.
Flags: needinfo?(benjamin)
Hm, we have a 65s timeout for the process to be killed. Isn't Firefox killed after 60s if a thread hangs? Has this timeout been increased? Or should we wait a bit longer similar to application update which has 70s set?
Flags: needinfo?(benjamin)

Comment 4

a year ago
Don't count on the internal shutdown hang monitor to be reliable. Depending on the context, it might itself deadlock or otherwise fail. The harness should do a dump+kill procedure after some reasonable timeout.
Flags: needinfo?(benjamin)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #4)
> Don't count on the internal shutdown hang monitor to be reliable. Depending
> on the context, it might itself deadlock or otherwise fail. The harness
> should do a dump+kill procedure after some reasonable timeout.

Which timeout would you find reasonable? I don't think it would hurt to increase it to somewhat 90 or even 120s.

Comment 6

a year ago
Oh, I don't care about the actual value. It could be 30s for all I care. What I care about is the harness collecting a stack.
Ok, lets see if I can get to bug 1348961 soon.
Depends on: 1348961

Updated

a year ago
Component: Startup and Profile System → Untriaged
Product: Toolkit → Core
2 failures in 656 pushes (0.003 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 1
* autoland: 1

Platform breakdown:
* osx-cross: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1379081&startday=2017-07-03&endday=2017-07-09&tree=all
1 failures in 1008 pushes (0.001 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 1

Platform breakdown:
* osx-10-10: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1379081&startday=2017-07-24&endday=2017-07-30&tree=all
1 failures in 888 pushes (0.001 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 1

Platform breakdown:
* osx-10-10: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1379081&startday=2017-07-31&endday=2017-08-06&tree=all
Depends on: 1394377
Status: NEW → RESOLVED
Last Resolved: a year ago
Keywords: bulk-close-intermittents
Resolution: --- → INCOMPLETE
The problem here is actually a too long shutdown which causes a crash:

01:18:42     INFO -  Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) at /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:160
01:18:42     INFO -  #01: PR_GetThreadName[/Users/cltbld/tasks/task_1505808525/build/application/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x144389]
01:18:42     INFO -  #02: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x405a]
01:18:42     INFO -  #03: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x3fd7]
Blocks: 1358898
Status: RESOLVED → REOPENED
status-firefox57: --- → affected
Keywords: bulk-close-intermittents → crash
Resolution: INCOMPLETE → ---
Summary: Intermittent test_tabbar_session_restore_button.py TestTabbarSessionRestoreButton.test_click_restore | IOError: Process killed because a requested application quit did not happen within 65s. Check gecko.log for errors. → Intermittent test_tabbar_session_restore_button.py TestTabbarSessionRestoreButton.test_click_restore | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]
Moving from Core::Untriaged to Core::General https://bugzilla.mozilla.org/show_bug.cgi?id=1407598
Component: Untriaged → General
You need to log in before you can comment on or make changes to this bug.