Closed Bug 1525281 Opened 5 years ago Closed 2 years ago

Intermittent Last test finished | application timed out after 370 seconds with no output

Categories

(Testing :: Mochitest, defect, P5)

Version 3
defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [retriggered][stockwell unknown])

#[markdown(off)]
Filed by: apavel [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=226133317&repo=autoland

https://queue.taskcluster.net/v1/task/UE7jhBpzSiu3H5Yg0dImpA/runs/0/artifacts/public/logs/live_backing.log

https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/UE7jhBpzSiu3H5Yg0dImpA/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1

12:57:18 INFO - REFTEST TEST-START | file:///Z:/task_1549370432/build/tests/reftest/tests/layout/reftests/position-dynamic-changes/mixed-abspos-root/mixed-dimentionA.html?margin_parent == file:///Z:/task_1549370432/build/tests/reftest/tests/layout/reftests/position-dynamic-changes/mixed-abspos-root/mixed-dimentionA-ref.html?margin_parent

12:57:19 INFO - REFTEST INFO | Running tests in file:///Z:/task_1549370432/build/tests/reftest/tests/layout/reftests/transform-3d/reftest.list
12:57:19 INFO - REFTEST INFO | Running with e10s: True
12:57:19 INFO - REFTEST INFO | Application command: Z:\task_1549370432\build\application\firefox\firefox.exe -marionette --wait-for-browser -profile c:\users\task_1549370432\appdata\local\temp\tmpv61iyw.mozrunner
12:57:20 INFO - 1549371440302 addons.webextension.screenshots@mozilla.org WARN Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: mozillaAddons
12:57:20 INFO - 1549371440302 addons.webextension.screenshots@mozilla.org WARN Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: resource://pdf.js/
12:57:20 INFO - 1549371440303 addons.webextension.screenshots@mozilla.org WARN Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: about:reader*
12:57:20 INFO - 1549371440488 Marionette TRACE Received observer notification profile-after-change
12:57:20 INFO - 1549371440553 Marionette TRACE Received observer notification toplevel-window-ready
12:57:20 INFO - 1549371440646 Marionette TRACE Received observer notification command-line-startup
12:57:20 INFO - 1549371440647 Marionette TRACE Received observer notification nsPref:changed
12:57:20 INFO - 1549371440647 Marionette DEBUG Init aborted (running=false, enabled=true, finalUIStartup=false)
13:03:30 ERROR - REFTEST ERROR | Last test finished | application timed out after 370 seconds with no output
13:03:30 ERROR - REFTEST ERROR | Force-terminating active process(es).
13:03:30 INFO - REFTEST TEST-INFO | started process screenshot
13:03:30 INFO - REFTEST TEST-INFO | screenshot: exit 0
13:03:30 INFO - TEST-INFO | crashinject: exit 0
13:06:35 INFO - WARNING | IO Completion Port failed to signal process shutdown
13:06:35 INFO - Parent process 2344 exited with children alive:
13:06:35 INFO - PIDS: 3080, 3448
13:06:35 INFO - Attempting to kill them, but no guarantee of success
13:06:35 ERROR - TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 3221225477
13:06:35 INFO - Traceback (most recent call last):
13:06:35 INFO - File "Z:\task_1549370432\build\tests\reftest\runreftest.py", line 972, in <module>
13:06:35 INFO - sys.exit(run_test_harness(parser, options))
13:06:35 INFO - File "Z:\task_1549370432\build\tests\reftest\runreftest.py", line 964, in run_test_harness
13:06:35 INFO - result = reftest.runTests(options.tests, options)
13:06:35 INFO - File "Z:\task_1549370432\build\tests\reftest\runreftest.py", line 564, in runTests
13:06:35 INFO - return self.runSerialTests(manifests, options, cmdargs)
13:06:35 INFO - File "Z:\task_1549370432\build\tests\reftest\runreftest.py", line 912, in runSerialTests
13:06:35 INFO - status = run(tests=tests)
13:06:35 INFO - File "Z:\task_1549370432\build\tests\reftest\runreftest.py", line 889, in run
13:06:35 INFO - **kwargs)
13:06:35 INFO - File "Z:\task_1549370432\build\tests\reftest\runreftest.py", line 788, in runApp
13:06:35 INFO - marionette.start_session()
13:06:35 INFO - File "Z:\task_1549370432\build\venv\lib\site-packages\marionette_driver\decorators.py", line 36, in _
13:06:35 INFO - m._handle_socket_failure()
13:06:35 INFO - File "Z:\task_1549370432\build\venv\lib\site-packages\marionette_driver\marionette.py", line 816, in _handle_socket_failure
13:06:35 INFO - reraise(exc, val, tb)
13:06:35 INFO - File "Z:\task_1549370432\build\venv\lib\site-packages\marionette_driver\decorators.py", line 26, in _
13:06:35 INFO - return func(*args, **kwargs)
13:06:35 INFO - File "Z:\task_1549370432\build\venv\lib\site-packages\marionette_driver\marionette.py", line 1274, in start_session
13:06:35 INFO - self.raise_for_port(timeout=timeout)
13:06:35 INFO - File "Z:\task_1549370432\build\venv\lib\site-packages\marionette_driver\marionette.py", line 735, in raise_for_port
13:06:35 INFO - self.host, self.port))
13:06:35 INFO - socket.timeout: Timed out waiting for connection on 127.0.0.1:2828!
13:06:35 ERROR - Return code: 1

See Also: → 1570514

(In reply to Cristina Coroiu [:ccoroiu] from comment #20)

Retriggered:
https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=os%2Cx%2C10.14%2Cshippable%2Copt%2Cmochitests%2Ctest-macosx1014-64-shippable%2Fopt-browser-screenshots-e10s%2Cm%28ss%29&tochange=f5f5cbb377ac92a59c26a0f1de8b971ef878dee3&fromchange=4a66bfcbaca59d9aa9352431941773a91b77deee&selectedJob=259810594

It seems that this started to fail with:
https://hg.mozilla.org/integration/autoland/rev/8d423bbe5469cab10053b20a929dd949acf8222b
https://bugzilla.mozilla.org/show_bug.cgi?id=1555454

Edwin, can you please take a look?

I migrated the platform on which the test are run from macosx1010 to macosx1014, and that platform is sometimes prone to this failure (a known issue with bugs filed for it).

I'll try further disabling the test that is causing this timeout to see if that helps.

Flags: needinfo?(egao)

There are 21 total failures in the last 7 days on linux64 debug

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=270486776&repo=autoland&lineNumber=33715

[task 2019-10-09T14:13:22.990Z] 14:13:22 INFO - TEST-START | browser/components/extensions/test/browser/browser_ext_windows_update.js
[task 2019-10-09T14:13:23.006Z] 14:13:23 INFO - GECKO(1158) | ++DOCSHELL 0x7fabe0e10000 == 41 [pid = 1158] [id = {53aff1c3-2dff-4b37-ba28-f0caf37bc365}]
[task 2019-10-09T14:13:23.006Z] 14:13:23 INFO - GECKO(1158) | ++DOMWINDOW == 163 (0x7fac3059e6a0) [pid = 1158] [serial = 1253] [outer = (nil)]
[task 2019-10-09T14:13:23.006Z] 14:13:23 INFO - GECKO(1158) | ++DOMWINDOW == 164 (0x7fabe9c9d400) [pid = 1158] [serial = 1254] [outer = 0x7fac3059e6a0]
[task 2019-10-09T14:13:23.183Z] 14:13:23 INFO - GECKO(1158) | [Parent 1158, Main Thread] WARNING: Attempting to get a displayport from a content with no primary frame!: file /builds/worker/workspace/build/src/layout/base/nsLayoutUtils.cpp, line 767
[task 2019-10-09T14:13:23.353Z] 14:13:23 INFO - GECKO(1158) | ++DOCSHELL 0x7fabfa39c800 == 42 [pid = 1158] [id = {33fed89c-671b-4905-b97e-5dbf7f1ca45b}]

[task 2019-10-09T14:14:57.062Z] 14:14:57 INFO - GECKO(1158) | Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) at /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:213
[task 2019-10-09T14:14:57.099Z] 14:14:57 INFO - GECKO(1158) | #01: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
[task 2019-10-09T14:14:57.099Z] 14:14:57 INFO -
[task 2019-10-09T14:14:57.099Z] 14:14:57 INFO - GECKO(1158) | #02: libpthread.so.0 + 0x76ba
[task 2019-10-09T14:14:57.099Z] 14:14:57 INFO -
[task 2019-10-09T14:14:57.099Z] 14:14:57 INFO - GECKO(1158) | #03: libc.so.6 + 0x10741d
[task 2019-10-09T14:14:57.099Z] 14:14:57 INFO -
[task 2019-10-09T14:14:57.099Z] 14:14:57 INFO - GECKO(1158) | #04: ??? (???:???)
[task 2019-10-09T14:14:57.099Z] 14:14:57 INFO - GECKO(1158) | ExceptionHandler::GenerateDump cloned child 8242
[task 2019-10-09T14:14:57.099Z] 14:14:57 INFO - GECKO(1158) | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2019-10-09T14:14:57.099Z] 14:14:57 INFO - GECKO(1158) | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2019-10-09T14:16:51.202Z] 14:16:51 INFO - GECKO(1158) | [Child 4421, Main Thread] WARNING: IPC message discarded: actor cannot send: file /builds/worker/workspace/build/src/ipc/glue/ProtocolUtils.cpp, line 481
[task 2019-10-09T14:16:51.202Z] 14:16:51 INFO - GECKO(1158) | [Child 4421, Main Thread] WARNING: '!ipcActor->SendAccumulateChildKeyedHistograms(keyedHistogramsToSend)', file /builds/worker/workspace/build/src/toolkit/components/telemetry/core/ipc/TelemetryIPCAccumulator.cpp, line 286
[task 2019-10-09T14:16:51.203Z] 14:16:51 INFO - GECKO(1158) | [Child 4421, Main Thread] WARNING: IPC message discarded: actor cannot send: file /builds/worker/workspace/build/src/ipc/glue/ProtocolUtils.cpp, line 481
[task 2019-10-09T14:16:51.203Z] 14:16:51 INFO - GECKO(1158) | [Child 4421, Main Thread] WARNING: '!ipcActor->SendRecordDiscardedData(discardedData)', file /builds/worker/workspace/build/src/toolkit/components/telemetry/core/ipc/TelemetryIPCAccumulator.cpp, line 301
[task 2019-10-09T14:23:01.221Z] 14:23:01 INFO - Buffered messages finished
[task 2019-10-09T14:23:01.221Z] 14:23:01 ERROR - TEST-UNEXPECTED-TIMEOUT | Last test finished | application timed out after 370 seconds with no output
[task 2019-10-09T14:23:01.222Z] 14:23:01 ERROR - Force-terminating active process(es).
[task 2019-10-09T14:23:01.222Z] 14:23:01 INFO - Determining child pids from psutil...
[task 2019-10-09T14:23:01.223Z] 14:23:01 INFO - []
[task 2019-10-09T14:23:01.223Z] 14:23:01 INFO - ==> process 1158 launched child process 1176

Andrew can you take a look or assign someone?

Flags: needinfo?(ahal)
Whiteboard: [stockwell needswork:owner]

Looks like this is almost exclusively happening on fission, I'll defer to :kmag here.

Flags: needinfo?(ahal)

There was a spike between the 17th and 19th of november with 31 total failures.

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=276915217&repo=autoland&lineNumber=12717

[task 2019-11-19T04:55:47.955Z] 04:55:47 INFO - GECKO(6628) | XPCOM_MEM_BLOAT_LOG: /tmp/tmpB06fI9.mozrunner/runtests_leaks.log
[task 2019-11-19T04:55:47.955Z] 04:55:47 INFO - GECKO(6628) | Writing to log: /tmp/tmpB06fI9.mozrunner/runtests_leaks_default_pid6628.log
[task 2019-11-19T04:55:47.955Z] 04:55:47 INFO - GECKO(6628) | Hit MOZ_CRASH(Shutdown too long, probably frozen, causing a crash.) at /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:217
[task 2019-11-19T04:55:47.995Z] 04:55:47 INFO - GECKO(6628) | #01: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
[task 2019-11-19T04:55:47.995Z] 04:55:47 INFO -
[task 2019-11-19T04:55:47.996Z] 04:55:47 INFO - GECKO(6628) | #02: libpthread.so.0 + 0x76ba
[task 2019-11-19T04:55:47.996Z] 04:55:47 INFO -
[task 2019-11-19T04:55:47.996Z] 04:55:47 INFO - GECKO(6628) | #03: libc.so.6 + 0x10741d
[task 2019-11-19T04:55:47.997Z] 04:55:47 INFO -
[task 2019-11-19T04:55:47.997Z] 04:55:47 INFO - GECKO(6628) | #04: ??? (???:???)
[task 2019-11-19T04:55:47.998Z] 04:55:47 INFO - GECKO(6628) | ExceptionHandler::GenerateDump cloned child 7602
[task 2019-11-19T04:55:47.998Z] 04:55:47 INFO - GECKO(6628) | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2019-11-19T04:55:47.999Z] 04:55:47 INFO - GECKO(6628) | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2019-11-19T05:01:57.997Z] 05:01:57 INFO - Buffered messages finished
[task 2019-11-19T05:01:57.998Z] 05:01:57 ERROR - TEST-UNEXPECTED-TIMEOUT | Last test finished | application timed out after 370 seconds with no output
[task 2019-11-19T05:01:57.998Z] 05:01:57 ERROR - Force-terminating active process(es).
[task 2019-11-19T05:01:57.999Z] 05:01:57 INFO - Determining child pids from psutil...
[task 2019-11-19T05:01:58.006Z] 05:01:58 INFO - []
[task 2019-11-19T05:01:58.007Z] 05:01:58 INFO - ==> process 6628 launched child process 6646
[task 2019-11-19T05:01:58.008Z] 05:01:58 INFO - ==> process 6628 launched child process 6678
[task 2019-11-19T05:01:58.008Z] 05:01:58 INFO - ==> process 6628 launched child process 6700
[task 2019-11-19T05:01:58.009Z] 05:01:58 INFO - ==> process 6628 launched child process 6764
[task 2019-11-19T05:01:58.009Z] 05:01:58 INFO - ==> process 6628 launched child process 6792

Joel any suggestion here since there is no reply in a month?

Flags: needinfo?(jmaher)
Whiteboard: [stockwell unknown] → [stockwell needswork]

This failure is seen after the toolkit/components/reader/test/ directory is run, specifically on linux64/debug/fission. The browser doesn't shutdown properly and is hung, then we force kill the browser.

There are 6 tests in the manifest:
https://searchfox.org/mozilla-central/source/toolkit/components/reader/test/browser.ini

One is skipped on windows and fission as of a few days ago. The next step is to do a bunch of try pushes and see which of these tests (could be one or more) is causing the browser to not terminate with fission. Once identified we can skip on linux/debug/fission.

:bc, can you bisect this down with a bunch of try pushes and find the culprit?

Flags: needinfo?(jmaher) → needinfo?(bob)

try runs with single test per run

  • browser_readerMode_readingTime 5/20 (0 leak)
  • browser_bug1453818_samesite_cookie 1/20 (0 leak)
  • browser_bug1124271_readerModePinnedTab 3/20 (1 leak)
  • browser_readerMode_with_anchor 1/20 (1 leak)
  • browser_readerMode_hidden_nodes 0/20 (0 leak)
  • browser_readerMode 3/20 (5 leak)

Probably with more runs all would leak and/or hang.

Flags: needinfo?(bob)

:kmag, can we disable all but browser_readerMode_hidden_nodes for fission linux64 debug?

We should not disable these. I have a better workaround that does not disable the test entirely. The most recent spike is likely caused by bug 1586139, which re-enabled these tests with Fission.

Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(kmadan)

thanks :mccr8, I will trust that you can get these errors resolved or significantly reduced.

Depends on: 1598075

Thank you!

Whiteboard: [stockwell needswork]

Some of the failures here are actually bug 1358898

No failures since the 17th.

Clearing the needinfo for now as the problem seems to have subsided. Please needinfo again if it picks back up.

Aside: I think this is one of those bugs that likely has many issues rolled up into it due to the generic title. We should try avoid classifying issues against this bug if there is a more specific log message. And in the absence of something more specific, we should try to surface something better in the logs.

Flags: needinfo?(ahal)
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 3 years ago2 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.