Closed Bug 1396803 Opened 3 years ago Closed 2 years ago
_tabbar _session _restore _button .py Test No Tabbar Session Restore Button .test _pref _off _button _does _not _show | application crashed [@ mozilla::(anonymous namespace)::Run Watchdog(void*)]
So there is a MOZCRASH assertion first which triggers a shutdown of Firefox: 19:12:41 INFO - Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) at /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:160 19:12:41 INFO - #01: PR_GetThreadName[/Users/cltbld/tasks/task_1504400945/build/application/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x142359] 19:12:41 INFO - #02: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x405a] 19:12:41 INFO - #03: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x3fd7] But then the shutdown takes that long that Firefox gets killed: 19:14:01 INFO - PROCESS-CRASH | test_tabbar_session_restore_button.py TestNoTabbarSessionRestoreButton.test_pref_off_button_does_not_show | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)] 19:14:01 INFO - Crash dump filename: /Users/cltbld/tasks/task_1504400945/build/tmppoVHnH.mozrunner-1504404463/minidumps/FB125D99-01D5-488B-ABD7-F971BE297EF7.dmp 19:14:01 INFO - Operating system: Mac OS X 19:14:01 INFO - 10.10.5 14F27 19:14:01 INFO - CPU: amd64 19:14:01 INFO - family 6 model 69 stepping 1 19:14:01 INFO - 4 CPUs 19:14:01 INFO - 19:14:01 INFO - GPU: UNKNOWN 19:14:01 INFO - 19:14:01 INFO - Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 19:14:01 INFO - Crash address: 0x0 19:14:01 INFO - Process uptime: 298 seconds 19:14:01 INFO - 19:14:01 INFO - Thread 6 (crashed) 19:14:01 INFO - 0 XUL!mozilla::(anonymous namespace)::RunWatchdog(void*) [nsTerminator.cpp:4147186b278a : 160 + 0x0] 19:14:01 INFO - rax = 0x0000000000000000 rdx = 0x00007fff7435f1f8 19:14:01 INFO - rcx = 0x0000000000000000 rbx = 0x000000000000003f 19:14:01 INFO - rsi = 0x00049b0000049b00 rdi = 0x00049a0000049b03 19:14:01 INFO - rbp = 0x000000012ed80ec0 rsp = 0x000000012ed80eb0 19:14:01 INFO - r8 = 0x000000012ed80e60 r9 = 0x000000012ed81000 19:14:01 INFO - r10 = 0x00007fff849953ef r11 = 0x00007fff849953c0 19:14:01 INFO - r12 = 0x0000000000022a27 r13 = 0x00000000000008ff 19:14:01 INFO - r14 = 0x000000011b744ff0 r15 = 0x0000000000000001 19:14:01 INFO - rip = 0x000000010fc62130 19:14:01 INFO - Found by: given as instruction pointer in context 19:14:01 INFO - 1 libnss3.dylib!_pt_root [ptthread.c:4147186b278a : 216 + 0x3] 19:14:01 INFO - rbx = 0x000000012925f4a0 rbp = 0x000000012ed80ef0 19:14:01 INFO - rsp = 0x000000012ed80ed0 r12 = 0x0000000000022a27 19:14:01 INFO - r13 = 0x00000000000008ff r14 = 0x000000012ed81000 19:14:01 INFO - r15 = 0x0000000000000001 rip = 0x000000010b442359 19:14:01 INFO - Found by: call frame info 19:14:01 INFO - 2 libsystem_pthread.dylib!_pthread_body + 0x83 19:14:01 INFO - rbx = 0x000000012ed81000 rbp = 0x000000012ed80f10 19:14:01 INFO - rsp = 0x000000012ed80f00 r12 = 0x0000000000022a27 19:14:01 INFO - r13 = 0x00000000000008ff r14 = 0x000000012925f4a0 19:14:01 INFO - r15 = 0x000000010b442240 rip = 0x00007fff8499805a 19:14:01 INFO - Found by: call frame info 19:14:01 INFO - 3 libsystem_pthread.dylib!_pthread_start + 0xb0 19:14:01 INFO - rbp = 0x000000012ed80f50 rsp = 0x000000012ed80f20 19:14:01 INFO - rip = 0x00007fff84997fd7 19:14:01 INFO - Found by: previous frame's frame pointer 19:14:01 INFO - 4 libsystem_pthread.dylib!thread_start + 0xd 19:14:01 INFO - rbp = 0x000000012ed80f78 rsp = 0x000000012ed80f60 19:14:01 INFO - rip = 0x00007fff849953ed 19:14:01 INFO - Found by: previous frame's frame pointer 19:14:01 INFO - 5 libnss3.dylib + 0x142240 19:14:01 INFO - rsp = 0x000000012ed81030 rip = 0x000000010b442240 19:14:01 INFO - Found by: stack scanning Isn't the watchdog thread used to terminate hanging threads after ~60s? So is it hanging itself here? I will leave this bug in fx-ui tests for now given the low number of crashes. I will take another look if the crash is spiking up.
Maybe this is related to bug 1395504, which doesn't let Firefox successfully shutdown.
Depends on: 1395504
this started on september 26 as osx/debug failures for en-US local tests. here is a recent log file: https://public-artifacts.taskcluster.net/QkNSqMfwT8Sm446EvbEp6w/0/public/logs/live_backing.log and data from the log: 14:58:54 INFO - Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) at /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:160 14:58:55 INFO - #01: PR_GetThreadName[/Users/cltbld/tasks/task_1506462723/build/application/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x1443a9] 14:58:55 INFO - #02: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x405a] 14:58:55 INFO - #03: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x3fd7] 14:59:50 INFO - TEST-UNEXPECTED-ERROR | test_tabbar_session_restore_button.py TestNoTabbarSessionRestoreButton.test_pref_off_button_does_not_show | IOError: Process killed because a requested application quit did not happen within 120s. Check gecko.log for errors. 14:59:50 INFO - Traceback (most recent call last): 14:59:50 INFO - File "/Users/cltbld/tasks/task_1506462723/build/venv/lib/python2.7/site-packages/marionette_harness/marionette_test/testcases.py", line 137, in run 14:59:50 INFO - self.setUp() 14:59:50 INFO - File "/Users/cltbld/tasks/task_1506462723/build/tests/firefox-ui/tests/functional/sessionstore/test_tabbar_session_restore_button.py", line 99, in setUp 14:59:50 INFO - super(TestNoTabbarSessionRestoreButton, self).setUp(restore_button_pref=0) 14:59:50 INFO - File "/Users/cltbld/tasks/task_1506462723/build/tests/firefox-ui/tests/functional/sessionstore/test_tabbar_session_restore_button.py", line 24, in setUp 14:59:50 INFO - self.marionette.quit(in_app=True) 14:59:50 INFO - File "/Users/cltbld/tasks/task_1506462723/build/venv/lib/python2.7/site-packages/marionette_driver/decorators.py", line 23, in _ 14:59:50 INFO - return func(*args, **kwargs) 14:59:50 INFO - File "/Users/cltbld/tasks/task_1506462723/ I am not familiar with the crash/error, possibly this is known already. I did some backfilling/retriggers: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=osx%20debug%20en-us%20local&tochange=44c358447666fd00d38fe9b72724d194b874e682&fromchange=eab929170ab0898bc23d88509efef9f8b3817152 Ideally when those results come in we will have an idea of when this started and how frequent it is. :whimboo, can you help prioritize this and ensure the root cause is documented.
This crash happens because a shutdown happened longer than 120s! Usually this should not happen because the background monitor of Firefox should kill any thread itself after ~60s during shutdown. But that didn't happen here. Maybe that is related to bug 1395504 and there was added on condition which also triggers it. But checking recent logs I also see such kind of assertions: 11:42:52 INFO - Assertion failure: tree.empty(), at /builds/worker/workspace/build/src/js/src/ds/MemoryProtectionExceptionHandler.cpp:67 11:42:52 INFO - Assertion failure: 0 == rv, at /builds/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:292 This brings me to bug 1404258, which was also filed recently due to the exact assertion error. Bobby removed the diagnostic code yesterday via bug 1403397. So in that case we should see a drop of failures today for that overlaying issue.
Depends on: 1404258
Not sure whom exactly to ask about this crash during a long shutdown. But maybe Mike can give us a hint.
The test here is covering session restore. It's component might not be the right, but better than Testing/Marionette.
:whimboo any idea why this would be osx only?
It might be bug 1395504 here too which is blocking Firefox from shutting down correctly. It's already in the dependency list.
If it helps, the spike started on Beta with this push: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?changeset=6e3b901aec51
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/a27d06c28634 Disable test_tabbar_session_restore_button.py TestNoTabbarSessionRestoreButton.test_pref_off_button_does_not_show on osx for frequent failures. r=me, a=test-only
please remember to enable this test when fixing the failure!
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]
no traction in bug 1395504, I pinged again there- we will be at danger of disable-recommended by the end of the month.
in looking at this again, it looks fixed on oct 12th when I disabled the test- my mistake for overlooking the high number of failures.
This test doesn't exist anymore. Bug 1451709 removed it.
You need to log in before you can comment on or make changes to this bug.