Closed Bug 1400580 Opened 7 years ago Closed 4 years ago

Intermittent TEST-UNEXPECTED-ERROR | damp | timeout

Categories

(DevTools :: Console, defect, P5)

defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Assigned: ochameau)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

Filed by: archaeopteryx [at] coole-files.de https://treeherder.mozilla.org/logviewer.html#?job_id=131481341&repo=autoland https://mozilla-releng-blobs.s3.amazonaws.com/blobs/Autoland/sha512/7292c49501f306be6752e989d39ec048410d0a9c7dce53c3327b675f268066d3299302a8242e6880a6247d4956ad98c58734d3a99f9af8d107eb02ad5125bbd1 08:26:32 INFO - SUITE-START | Running 2 tests 08:26:32 INFO - TEST-START | damp 08:26:32 INFO - Initialising browser for damp test... 08:26:32 INFO - Application command: /builds/slave/test/build/application/firefox/firefox http://localhost:45957/getInfo.html -profile /tmp/tmpkEHyWm/profile 08:26:32 INFO - TEST-INFO | started process 25764 (/builds/slave/test/build/application/firefox/firefox http://localhost:45957/getInfo.html) 08:26:39 INFO - TEST-INFO | 25764: exit 0 08:26:39 INFO - Browser initialized. 08:26:39 INFO - Running cycle 1/1 for damp test... 08:26:39 INFO - Using env: {'DISPLAY': ':0', 08:26:39 INFO - 'HOME': '/home/cltbld', 08:26:39 INFO - 'JSGC_DISABLE_POISONING': '1', 08:26:39 INFO - 'LANG': 'en_US.UTF-8', 08:26:39 INFO - 'LANGUAGE': 'en_US:en', 08:26:39 INFO - 'LD_LIBRARY_PATH': '/builds/slave/test/build/application/firefox', 08:26:39 INFO - 'LOGNAME': 'cltbld', 08:26:39 INFO - 'MAIL': '/var/mail/cltbld', 08:26:39 INFO - 'MINIDUMP_SAVE_PATH': '/builds/slave/test/build/blobber_upload_dir', 08:26:39 INFO - 'MINIDUMP_STACKWALK': '/builds/slave/test/build/linux64-minidump_stackwalk', 08:26:39 INFO - 'MOZ_CRASHREPORTER': '1', 08:26:39 INFO - 'MOZ_CRASHREPORTER_NO_REPORT': '1', 08:26:39 INFO - 'MOZ_DISABLE_NONLOCAL_CONNECTIONS': '1', 08:26:39 INFO - 'MOZ_NO_REMOTE': '1', 08:26:39 INFO - 'MOZ_UPLOAD_DIR': '/builds/slave/test/build/blobber_upload_dir', 08:26:39 INFO - 'NODE_PATH': '/usr/lib/nodejs:/usr/lib/node_modules:/usr/share/javascript', 08:26:39 INFO - 'NO_EM_RESTART': '1', 08:26:39 INFO - 'PATH': '/builds/slave/test/build/venv/bin:/usr/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games', 08:26:39 INFO - 'PROPERTIES_FILE': '/builds/slave/test/buildprops.json', 08:26:39 INFO - 'PWD': '/builds/slave/test', 08:26:39 INFO - 'PYTHONPATH': '/builds/slave/test/build/tests/talos', 08:26:39 INFO - 'RUST_BACKTRACE': 'full', 08:26:39 INFO - 'SCRIPTSPATH': '/builds/slave/test/scripts', 08:26:39 INFO - 'SHELL': '/bin/bash', 08:26:39 INFO - 'SHLVL': '1', 08:26:39 INFO - 'TERM': 'linux', 08:26:39 INFO - 'TMOUT': '86400', 08:26:39 INFO - 'USER': 'cltbld', 08:26:39 INFO - 'XDG_SESSION_COOKIE': '5cf47e208febbcd81da23b2e0000026f-1505574715.193907-2033749545', 08:26:39 INFO - 'XPCOM_DEBUG_BREAK': 'warn', 08:26:39 INFO - '_': '/tools/buildbot/bin/python'} 08:26:39 INFO - TEST-INFO | started process 26004 (/builds/slave/test/build/application/firefox/firefox -profile /tmp/tmpkEHyWm/profile -tp file:/builds/slave/test/build/tests/talos/talos/tests/devtools/damp.manifest.develop -tpchrome -tpnoisy -tploadnocache -tpcycles 1 -tppagecycles 25) 08:26:40 INFO - PID 26004 | 08:26:40 INFO - PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26052): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied 08:26:40 INFO - PID 26004 | 08:26:50 INFO - PID 26004 | RSS: Main: 157704192 08:26:50 INFO - PID 26004 | 08:26:50 INFO - PID 26004 | 08:26:50 INFO - PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26126): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied 08:26:50 INFO - PID 26004 | 08:26:52 INFO - PID 26004 | 08:26:52 INFO - PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26179): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied 08:26:52 INFO - PID 26004 | 08:26:54 INFO - PID 26004 | 08:26:54 INFO - PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26226): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied 08:26:54 INFO - PID 26004 | 08:26:55 INFO - PID 26004 | 08:26:55 INFO - PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26280): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied 08:26:55 INFO - PID 26004 | 08:26:56 INFO - PID 26004 | 08:26:56 INFO - PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26328): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied 08:26:56 INFO - PID 26004 | 08:26:57 INFO - PID 26004 | 08:26:57 INFO - PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26375): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied 08:26:57 INFO - PID 26004 | 08:26:58 INFO - PID 26004 | 08:26:58 INFO - PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26421): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied 08:26:58 INFO - PID 26004 | 08:27:00 INFO - PID 26004 | 08:27:00 INFO - PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26469): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied 08:27:00 INFO - PID 26004 | 09:26:39 INFO - Timeout waiting for test completion; killing browser... 09:26:39 INFO - Terminating psutil.Process(pid=26004, name='firefox') 09:26:39 INFO - PID 26004 | ExceptionHandler::WaitForContinueSignal waiting for continue signal... 09:26:39 INFO - PID 26004 | ExceptionHandler::GenerateDump cloned child 27101 09:26:39 INFO - PID 26004 | ExceptionHandler::SendContinueSignalToChild sent continue signal to child 09:26:39 INFO - TEST-UNEXPECTED-ERROR | damp | timeout 09:26:39 ERROR - Traceback (most recent call last): 09:26:39 INFO - File "/builds/slave/test/build/tests/talos/talos/run_tests.py", line 281, in run_tests 09:26:39 INFO - talos_results.add(mytest.runTest(browser_config, test)) 09:26:39 INFO - File "/builds/slave/test/build/tests/talos/talos/ttest.py", line 61, in runTest 09:26:39 INFO - return self._runTest(browser_config, test_config, setup) 09:26:39 INFO - File "/builds/slave/test/build/tests/talos/talos/ttest.py", line 209, in _runTest 09:26:39 INFO - if counter_management else None), 09:26:39 INFO - File "/builds/slave/test/build/tests/talos/talos/talos_process.py", line 130, in run_browser 09:26:39 INFO - raise TalosError("timeout") 09:26:39 INFO - TalosError: timeout
Whiteboard: [stockwell unknown]
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
There are 48 failures in the last 7 days, with a spike from 25th -> 27th of Nov. They occur on Linux x64 and the affected build types are pgo and opt. Here is a recent log example: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=148022423&lineNumber=2381 :rwood, can you please take a look?
Flags: needinfo?(rwood)
Whiteboard: [stockwell unknown] → [stockwell needswork]
It happens much more often on PGO vs OPT, and the recent failures seem pretty consistent in failing at reloading of 'complicated.netmonitor' (the test is timing out after 60 minutes of waiting for that reload): 15:02:44 INFO - PID 7372 | Reload page on 'complicated.performance' 15:02:45 INFO - PID 7372 | Close toolbox on 'complicated.performance' 15:02:46 INFO - PID 7372 | 15:02:46 INFO - PID 7372 | (/builds/slave/test/build/application/firefox/firefox:13575): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied 15:02:46 INFO - PID 7372 | 15:02:47 INFO - PID 7372 | Open toolbox on 'complicated.netmonitor' 15:02:48 INFO - PID 7372 | Reload page on 'complicated.netmonitor' 16:00:25 INFO - Timeout waiting for test completion; killing browser... 16:00:25 INFO - Terminating psutil.Process(pid=7372, name='firefox') :bgrins, any ideas here as to why reload of 'complicated.performance' could be failing?
Flags: needinfo?(rwood) → needinfo?(bgrinstead)
(In reply to Robert Wood [:rwood] from comment #9) > It happens much more often on PGO vs OPT, and the recent failures seem > pretty consistent in failing at reloading of 'complicated.netmonitor' (the > test is timing out after 60 minutes of waiting for that reload): > > 15:02:44 INFO - PID 7372 | Reload page on 'complicated.performance' > 15:02:45 INFO - PID 7372 | Close toolbox on 'complicated.performance' > 15:02:46 INFO - PID 7372 | > 15:02:46 INFO - PID 7372 | > (/builds/slave/test/build/application/firefox/firefox:13575): Pango-WARNING > **: error opening config file '/home/cltbld/.pangorc': Permission denied > 15:02:46 INFO - PID 7372 | > 15:02:47 INFO - PID 7372 | Open toolbox on 'complicated.netmonitor' > 15:02:48 INFO - PID 7372 | Reload page on 'complicated.netmonitor' > 16:00:25 INFO - Timeout waiting for test completion; killing browser... > 16:00:25 INFO - Terminating psutil.Process(pid=7372, name='firefox') > > :bgrins, any ideas here as to why reload of 'complicated.performance' could > be failing? I think you mean 'complicated.netmonitor' reload? Forwarding the question to Alex as we've recently been making some changes as to what constitutes 'reload' in DAMP for various tools, although I don't see anything that looks particularly related for the netmonitor since Bug 1328553.
Flags: needinfo?(bgrinstead) → needinfo?(poirot.alex)
Yes sorry, 'netmonitor', thanks!
I think I identified something possibly wrong with this test while working on bug 1419327. This huge spike of failure on November 26th is surprising, it looks like it regressed on this day and was fixed immediately?
Assignee: nobody → poirot.alex
Flags: needinfo?(poirot.alex)
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → INCOMPLETE
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=157365676&repo=autoland&lineNumber=6647 20:52:33 INFO - PID 1743 | [Child 9109, Chrome_ChildThread] WARNING: pipe error: Broken pipe: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 709 20:52:33 INFO - PID 1743 | [Child 9109, Chrome_ChildThread] WARNING: pipe error: Broken pipe: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 709 20:52:33 INFO - PID 1743 | [Child 9109, Chrome_ChildThread] WARNING: pipe error: Broken pipe: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 709 20:52:33 INFO - PID 1743 | Open toolbox on 'complicated.performance' 20:52:34 INFO - PID 1743 | Garbage collect 20:52:35 INFO - PID 1743 | Reload page on 'complicated.performance' 20:52:36 INFO - PID 1743 | Close toolbox on 'complicated.performance' 20:52:36 INFO - PID 1743 | Garbage collect 20:52:37 INFO - PID 1743 | 20:52:37 INFO - PID 1743 | (/builds/slave/test/build/application/firefox/firefox:9760): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied 20:52:37 INFO - PID 1743 | 20:52:38 INFO - PID 1743 | Open toolbox on 'complicated.netmonitor' 20:52:39 INFO - PID 1743 | Garbage collect 20:52:39 INFO - PID 1743 | Reload page on 'complicated.netmonitor' 21:48:51 INFO - Timeout waiting for test completion; killing browser... 21:48:51 INFO - Terminating psutil.Process(pid=1743, name='firefox') 21:48:51 INFO - PID 1743 | ExceptionHandler::GenerateDump cloned child 10482 21:48:51 INFO - PID 1743 | ExceptionHandler::SendContinueSignalToChild sent continue signal to child 21:48:51 INFO - PID 1743 | ExceptionHandler::WaitForContinueSignal waiting for continue signal... 21:48:51 INFO - TEST-UNEXPECTED-ERROR | damp | timeout 21:48:51 ERROR - Traceback (most recent call last): 21:48:51 INFO - File "/builds/slave/test/build/tests/talos/talos/run_tests.py", line 289, in run_tests 21:48:51 INFO - talos_results.add(mytest.runTest(browser_config, test)) 21:48:51 INFO - File "/builds/slave/test/build/tests/talos/talos/ttest.py", line 62, in runTest 21:48:51 INFO - return self._runTest(browser_config, test_config, setup) 21:48:51 INFO - File "/builds/slave/test/build/tests/talos/talos/ttest.py", line 214, in _runTest 21:48:51 INFO - debugger_args=browser_config['debugger_args'] 21:48:51 INFO - File "/builds/slave/test/build/tests/talos/talos/talos_process.py", line 139, in run_browser 21:48:51 INFO - raise TalosError("timeout") 21:48:51 INFO - TalosError: timeout 21:48:51 INFO - TEST-INFO took 4502915ms 21:48:51 INFO - SUITE-END | took 4502s 21:48:51 INFO - PID 1743 | Sandbox: Unexpected EOF, op 0 flags 01101 path /tmp/GeckoChildCrash10085.extra 21:48:51 ERROR - Return code: 2 21:48:51 WARNING - setting return code to 2 21:48:51 ERROR - # TBPL FAILURE # 21:48:51 INFO - Running post-action listener: _package_coverage_data 21:48:51 INFO - Running post-action listener: _resource_record_post_action 21:48:51 INFO - [mozharness: 2018-01-19 05:48:51.940442Z] Finished run-tests step (success) 21:48:51 INFO - Running post-run listener: _resource_record_post_run 21:48:52 INFO - Total resource usage - Wall time: 4522s; CPU: 7.0%; Read bytes: 10911744; Write bytes: 513163264; Read time: 4120; Write time: 299708 21:48:52 INFO - TinderboxPrint: CPU usage<br/>6.8% 21:48:52 INFO - TinderboxPrint: I/O read bytes / time<br/>10,911,744 / 4,120 21:48:52 INFO - TinderboxPrint: I/O write bytes / time<br/>513,163,264 / 299,708 21:48:52 INFO - TinderboxPrint: CPU idle<br/>33,578.1 (93.2%) 21:48:52 INFO - TinderboxPrint: CPU system<br/>382.1 (1.1%) 21:48:52 INFO - TinderboxPrint: CPU user<br/>1,990.6 (5.5%) 21:48:52 INFO - TinderboxPrint: Swap in / out<br/>0 / 0 21:48:52 INFO - install - Wall time: 19s; CPU: 13.0%; Read bytes: 0; Write bytes: 7876608; Read time: 0; Write time: 1928
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
On the new Windows 10 talos taskcluster hardware (see Bug 1431161) this issue (test hanging on damp 'complicated.netmonitor') happens consistently. In Bug 1431161 I'm going to turn off the 'complicated.netmonitor' part of the damp test.
(In reply to Robert Wood [:rwood] from comment #21) > On the new Windows 10 talos taskcluster hardware (see Bug 1431161) this > issue (test hanging on damp 'complicated.netmonitor') happens consistently. > In Bug 1431161 I'm going to turn off the 'complicated.netmonitor' part of > the damp test. See https://bugzilla.mozilla.org/show_bug.cgi?id=1431161#c14
Depends on: 1437028
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → INCOMPLETE
Status: REOPENED → RESOLVED
Closed: 7 years ago6 years ago
Resolution: --- → INCOMPLETE
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

:ochameau can you take a look over these intermittent failures ?

Flags: needinfo?(poirot.alex)
Component: Talos → Console
Product: Testing → DevTools
Version: Version 3 → unspecified

This timeout doesn't seem to occur anymore, or is it triaged in another bug?

Flags: needinfo?(poirot.alex)
Status: REOPENED → RESOLVED
Closed: 6 years ago4 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.