Intermittent test_tabbar_session_restore_button.py TestNoTabbarSessionRestoreButton.test_pref_off_button_does_not_show | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]

RESOLVED INVALID

Status

defect
P5
critical
RESOLVED INVALID
2 years ago
Last year

People

(Reporter: intermittent-bug-filer, Unassigned)

Tracking

(Blocks 1 bug, {crash, intermittent-failure})

Version 3
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox57 disabled, firefox58 disabled)

Details

(Whiteboard: [stockwell disabled], crash signature)

So there is a MOZCRASH assertion first which triggers a shutdown of Firefox:

19:12:41     INFO -  Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) at /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:160
19:12:41     INFO -  #01: PR_GetThreadName[/Users/cltbld/tasks/task_1504400945/build/application/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x142359]
19:12:41     INFO -  #02: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x405a]
19:12:41     INFO -  #03: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x3fd7]

But then the shutdown takes that long that Firefox gets killed:

19:14:01     INFO - PROCESS-CRASH | test_tabbar_session_restore_button.py TestNoTabbarSessionRestoreButton.test_pref_off_button_does_not_show | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]
19:14:01     INFO - Crash dump filename: /Users/cltbld/tasks/task_1504400945/build/tmppoVHnH.mozrunner-1504404463/minidumps/FB125D99-01D5-488B-ABD7-F971BE297EF7.dmp
19:14:01     INFO - Operating system: Mac OS X
19:14:01     INFO -                   10.10.5 14F27
19:14:01     INFO - CPU: amd64
19:14:01     INFO -      family 6 model 69 stepping 1
19:14:01     INFO -      4 CPUs
19:14:01     INFO - 
19:14:01     INFO - GPU: UNKNOWN
19:14:01     INFO - 
19:14:01     INFO - Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
19:14:01     INFO - Crash address: 0x0
19:14:01     INFO - Process uptime: 298 seconds
19:14:01     INFO - 
19:14:01     INFO - Thread 6 (crashed)
19:14:01     INFO -  0  XUL!mozilla::(anonymous namespace)::RunWatchdog(void*) [nsTerminator.cpp:4147186b278a : 160 + 0x0]
19:14:01     INFO -     rax = 0x0000000000000000   rdx = 0x00007fff7435f1f8
19:14:01     INFO -     rcx = 0x0000000000000000   rbx = 0x000000000000003f
19:14:01     INFO -     rsi = 0x00049b0000049b00   rdi = 0x00049a0000049b03
19:14:01     INFO -     rbp = 0x000000012ed80ec0   rsp = 0x000000012ed80eb0
19:14:01     INFO -      r8 = 0x000000012ed80e60    r9 = 0x000000012ed81000
19:14:01     INFO -     r10 = 0x00007fff849953ef   r11 = 0x00007fff849953c0
19:14:01     INFO -     r12 = 0x0000000000022a27   r13 = 0x00000000000008ff
19:14:01     INFO -     r14 = 0x000000011b744ff0   r15 = 0x0000000000000001
19:14:01     INFO -     rip = 0x000000010fc62130
19:14:01     INFO -     Found by: given as instruction pointer in context
19:14:01     INFO -  1  libnss3.dylib!_pt_root [ptthread.c:4147186b278a : 216 + 0x3]
19:14:01     INFO -     rbx = 0x000000012925f4a0   rbp = 0x000000012ed80ef0
19:14:01     INFO -     rsp = 0x000000012ed80ed0   r12 = 0x0000000000022a27
19:14:01     INFO -     r13 = 0x00000000000008ff   r14 = 0x000000012ed81000
19:14:01     INFO -     r15 = 0x0000000000000001   rip = 0x000000010b442359
19:14:01     INFO -     Found by: call frame info
19:14:01     INFO -  2  libsystem_pthread.dylib!_pthread_body + 0x83
19:14:01     INFO -     rbx = 0x000000012ed81000   rbp = 0x000000012ed80f10
19:14:01     INFO -     rsp = 0x000000012ed80f00   r12 = 0x0000000000022a27
19:14:01     INFO -     r13 = 0x00000000000008ff   r14 = 0x000000012925f4a0
19:14:01     INFO -     r15 = 0x000000010b442240   rip = 0x00007fff8499805a
19:14:01     INFO -     Found by: call frame info
19:14:01     INFO -  3  libsystem_pthread.dylib!_pthread_start + 0xb0
19:14:01     INFO -     rbp = 0x000000012ed80f50   rsp = 0x000000012ed80f20
19:14:01     INFO -     rip = 0x00007fff84997fd7
19:14:01     INFO -     Found by: previous frame's frame pointer
19:14:01     INFO -  4  libsystem_pthread.dylib!thread_start + 0xd
19:14:01     INFO -     rbp = 0x000000012ed80f78   rsp = 0x000000012ed80f60
19:14:01     INFO -     rip = 0x00007fff849953ed
19:14:01     INFO -     Found by: previous frame's frame pointer
19:14:01     INFO -  5  libnss3.dylib + 0x142240
19:14:01     INFO -     rsp = 0x000000012ed81030   rip = 0x000000010b442240
19:14:01     INFO -     Found by: stack scanning

Isn't the watchdog thread used to terminate hanging threads after ~60s? So is it hanging itself here?

I will leave this bug in fx-ui tests for now given the low number of crashes. I will take another look if the crash is spiking up.
Maybe this is related to bug 1395504, which doesn't let Firefox successfully shutdown.
Depends on: 1395504
Duplicate of this bug: 1385605
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
this started on september 26 as osx/debug failures for en-US local tests.

here is a recent log file:
https://public-artifacts.taskcluster.net/QkNSqMfwT8Sm446EvbEp6w/0/public/logs/live_backing.log

and data from the log:
14:58:54     INFO -  Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) at /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:160
14:58:55     INFO -  #01: PR_GetThreadName[/Users/cltbld/tasks/task_1506462723/build/application/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x1443a9]
14:58:55     INFO -  #02: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x405a]
14:58:55     INFO -  #03: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x3fd7]
14:59:50     INFO - TEST-UNEXPECTED-ERROR | test_tabbar_session_restore_button.py TestNoTabbarSessionRestoreButton.test_pref_off_button_does_not_show | IOError: Process killed because a requested application quit did not happen within 120s. Check gecko.log for errors.
14:59:50     INFO - Traceback (most recent call last):
14:59:50     INFO -   File "/Users/cltbld/tasks/task_1506462723/build/venv/lib/python2.7/site-packages/marionette_harness/marionette_test/testcases.py", line 137, in run
14:59:50     INFO -     self.setUp()
14:59:50     INFO -   File "/Users/cltbld/tasks/task_1506462723/build/tests/firefox-ui/tests/functional/sessionstore/test_tabbar_session_restore_button.py", line 99, in setUp
14:59:50     INFO -     super(TestNoTabbarSessionRestoreButton, self).setUp(restore_button_pref=0)
14:59:50     INFO -   File "/Users/cltbld/tasks/task_1506462723/build/tests/firefox-ui/tests/functional/sessionstore/test_tabbar_session_restore_button.py", line 24, in setUp
14:59:50     INFO -     self.marionette.quit(in_app=True)
14:59:50     INFO -   File "/Users/cltbld/tasks/task_1506462723/build/venv/lib/python2.7/site-packages/marionette_driver/decorators.py", line 23, in _
14:59:50     INFO -     return func(*args, **kwargs)
14:59:50     INFO -   File "/Users/cltbld/tasks/task_1506462723/


I am not familiar with the crash/error, possibly this is known already.


I did some backfilling/retriggers:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=osx%20debug%20en-us%20local&tochange=44c358447666fd00d38fe9b72724d194b874e682&fromchange=eab929170ab0898bc23d88509efef9f8b3817152

Ideally when those results come in we will have an idea of when this started and how frequent it is.

:whimboo, can you help prioritize this and ensure the root cause is documented.
Flags: needinfo?(hskupin)
Whiteboard: [stockwell needswork]
This crash happens because a shutdown happened longer than 120s! Usually this should not happen because the background monitor of Firefox should kill any thread itself after ~60s during shutdown. But that didn't happen here.

Maybe that is related to bug 1395504 and there was added on condition which also triggers it.

But checking recent logs I also see such kind of assertions:

11:42:52     INFO -  Assertion failure: tree.empty(), at /builds/worker/workspace/build/src/js/src/ds/MemoryProtectionExceptionHandler.cpp:67
11:42:52     INFO -  Assertion failure: 0 == rv, at /builds/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:292

This brings me to bug 1404258, which was also filed recently due to the exact assertion error. Bobby removed the diagnostic code yesterday via bug 1403397. So in that case we should see a drop of failures today for that overlaying issue.
Depends on: 1404258
Flags: needinfo?(hskupin)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Not sure whom exactly to ask about this crash during a long shutdown. But maybe Mike can give us a hint.
Flags: needinfo?(mconley)
Comment hidden (Intermittent Failures Robot)
The test here is covering session restore. It's component might not be the right, but better than Testing/Marionette.
:whimboo any idea why this would be osx only?
It might be bug 1395504 here too which is blocking Firefox from shutting down correctly. It's already in the dependency list.
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)

Comment 19

2 years ago
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a27d06c28634
Disable test_tabbar_session_restore_button.py TestNoTabbarSessionRestoreButton.test_pref_off_button_does_not_show on osx for frequent failures. r=me, a=test-only
please remember to enable this test when fixing the failure!
Keywords: leave-open
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
no traction in bug 1395504, I pinged again there- we will be at danger of disable-recommended by the end of the month.
in looking at this again, it looks fixed on oct 12th when I disabled the test- my mistake for overlooking the high number of failures.
Flags: needinfo?(mconley)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
This test doesn't exist anymore. Bug 1451709 removed it.
Status: NEW → RESOLVED
Closed: Last year
Depends on: 1451709
Keywords: leave-open
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.