Closed Bug 1400580 Opened 7 years ago Closed 3 years ago

Intermittent TEST-UNEXPECTED-ERROR | damp | timeout

Categories

(DevTools :: Console, defect, P5)

defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Assigned: ochameau)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

Filed by: archaeopteryx [at] coole-files.de

https://treeherder.mozilla.org/logviewer.html#?job_id=131481341&repo=autoland

https://mozilla-releng-blobs.s3.amazonaws.com/blobs/Autoland/sha512/7292c49501f306be6752e989d39ec048410d0a9c7dce53c3327b675f268066d3299302a8242e6880a6247d4956ad98c58734d3a99f9af8d107eb02ad5125bbd1

08:26:32     INFO -  SUITE-START | Running 2 tests
08:26:32     INFO -  TEST-START | damp
08:26:32     INFO -  Initialising browser for damp test...
08:26:32     INFO -  Application command: /builds/slave/test/build/application/firefox/firefox  http://localhost:45957/getInfo.html -profile /tmp/tmpkEHyWm/profile
08:26:32     INFO -  TEST-INFO | started process 25764 (/builds/slave/test/build/application/firefox/firefox  http://localhost:45957/getInfo.html)
08:26:39     INFO -  TEST-INFO | 25764: exit 0
08:26:39     INFO -  Browser initialized.
08:26:39     INFO -  Running cycle 1/1 for damp test...
08:26:39     INFO -  Using env: {'DISPLAY': ':0',
08:26:39     INFO -   'HOME': '/home/cltbld',
08:26:39     INFO -   'JSGC_DISABLE_POISONING': '1',
08:26:39     INFO -   'LANG': 'en_US.UTF-8',
08:26:39     INFO -   'LANGUAGE': 'en_US:en',
08:26:39     INFO -   'LD_LIBRARY_PATH': '/builds/slave/test/build/application/firefox',
08:26:39     INFO -   'LOGNAME': 'cltbld',
08:26:39     INFO -   'MAIL': '/var/mail/cltbld',
08:26:39     INFO -   'MINIDUMP_SAVE_PATH': '/builds/slave/test/build/blobber_upload_dir',
08:26:39     INFO -   'MINIDUMP_STACKWALK': '/builds/slave/test/build/linux64-minidump_stackwalk',
08:26:39     INFO -   'MOZ_CRASHREPORTER': '1',
08:26:39     INFO -   'MOZ_CRASHREPORTER_NO_REPORT': '1',
08:26:39     INFO -   'MOZ_DISABLE_NONLOCAL_CONNECTIONS': '1',
08:26:39     INFO -   'MOZ_NO_REMOTE': '1',
08:26:39     INFO -   'MOZ_UPLOAD_DIR': '/builds/slave/test/build/blobber_upload_dir',
08:26:39     INFO -   'NODE_PATH': '/usr/lib/nodejs:/usr/lib/node_modules:/usr/share/javascript',
08:26:39     INFO -   'NO_EM_RESTART': '1',
08:26:39     INFO -   'PATH': '/builds/slave/test/build/venv/bin:/usr/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games',
08:26:39     INFO -   'PROPERTIES_FILE': '/builds/slave/test/buildprops.json',
08:26:39     INFO -   'PWD': '/builds/slave/test',
08:26:39     INFO -   'PYTHONPATH': '/builds/slave/test/build/tests/talos',
08:26:39     INFO -   'RUST_BACKTRACE': 'full',
08:26:39     INFO -   'SCRIPTSPATH': '/builds/slave/test/scripts',
08:26:39     INFO -   'SHELL': '/bin/bash',
08:26:39     INFO -   'SHLVL': '1',
08:26:39     INFO -   'TERM': 'linux',
08:26:39     INFO -   'TMOUT': '86400',
08:26:39     INFO -   'USER': 'cltbld',
08:26:39     INFO -   'XDG_SESSION_COOKIE': '5cf47e208febbcd81da23b2e0000026f-1505574715.193907-2033749545',
08:26:39     INFO -   'XPCOM_DEBUG_BREAK': 'warn',
08:26:39     INFO -   '_': '/tools/buildbot/bin/python'}
08:26:39     INFO -  TEST-INFO | started process 26004 (/builds/slave/test/build/application/firefox/firefox -profile /tmp/tmpkEHyWm/profile -tp file:/builds/slave/test/build/tests/talos/talos/tests/devtools/damp.manifest.develop -tpchrome -tpnoisy -tploadnocache -tpcycles 1 -tppagecycles 25)
08:26:40     INFO -  PID 26004 |
08:26:40     INFO -  PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26052): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied
08:26:40     INFO -  PID 26004 |
08:26:50     INFO -  PID 26004 | RSS: Main: 157704192
08:26:50     INFO -  PID 26004 |
08:26:50     INFO -  PID 26004 |
08:26:50     INFO -  PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26126): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied
08:26:50     INFO -  PID 26004 |
08:26:52     INFO -  PID 26004 |
08:26:52     INFO -  PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26179): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied
08:26:52     INFO -  PID 26004 |
08:26:54     INFO -  PID 26004 |
08:26:54     INFO -  PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26226): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied
08:26:54     INFO -  PID 26004 |
08:26:55     INFO -  PID 26004 |
08:26:55     INFO -  PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26280): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied
08:26:55     INFO -  PID 26004 |
08:26:56     INFO -  PID 26004 |
08:26:56     INFO -  PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26328): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied
08:26:56     INFO -  PID 26004 |
08:26:57     INFO -  PID 26004 |
08:26:57     INFO -  PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26375): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied
08:26:57     INFO -  PID 26004 |
08:26:58     INFO -  PID 26004 |
08:26:58     INFO -  PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26421): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied
08:26:58     INFO -  PID 26004 |
08:27:00     INFO -  PID 26004 |
08:27:00     INFO -  PID 26004 | (/builds/slave/test/build/application/firefox/firefox:26469): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied
08:27:00     INFO -  PID 26004 |
09:26:39     INFO -  Timeout waiting for test completion; killing browser...
09:26:39     INFO -  Terminating psutil.Process(pid=26004, name='firefox')
09:26:39     INFO -  PID 26004 | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
09:26:39     INFO -  PID 26004 | ExceptionHandler::GenerateDump cloned child 27101
09:26:39     INFO -  PID 26004 | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
09:26:39     INFO -  TEST-UNEXPECTED-ERROR | damp | timeout
09:26:39    ERROR -  Traceback (most recent call last):
09:26:39     INFO -    File "/builds/slave/test/build/tests/talos/talos/run_tests.py", line 281, in run_tests
09:26:39     INFO -      talos_results.add(mytest.runTest(browser_config, test))
09:26:39     INFO -    File "/builds/slave/test/build/tests/talos/talos/ttest.py", line 61, in runTest
09:26:39     INFO -      return self._runTest(browser_config, test_config, setup)
09:26:39     INFO -    File "/builds/slave/test/build/tests/talos/talos/ttest.py", line 209, in _runTest
09:26:39     INFO -      if counter_management else None),
09:26:39     INFO -    File "/builds/slave/test/build/tests/talos/talos/talos_process.py", line 130, in run_browser
09:26:39     INFO -      raise TalosError("timeout")
09:26:39     INFO -  TalosError: timeout
Whiteboard: [stockwell unknown]
https://wiki.mozilla.org/Bugmasters#Intermittent_Test_Failure_Cleanup
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
There are 48 failures in the last 7 days, with a spike from 25th -> 27th of Nov.
They occur on Linux x64 and the affected build types are pgo and opt.
Here is a recent log example: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=148022423&lineNumber=2381
:rwood, can you please take a look?
Flags: needinfo?(rwood)
Whiteboard: [stockwell unknown] → [stockwell needswork]
It happens much more often on PGO vs OPT, and the recent failures seem pretty consistent in failing at reloading of 'complicated.netmonitor' (the test is timing out after 60 minutes of waiting for that reload):

15:02:44     INFO -  PID 7372 | Reload page on 'complicated.performance'
15:02:45     INFO -  PID 7372 | Close toolbox on 'complicated.performance'
15:02:46     INFO -  PID 7372 |
15:02:46     INFO -  PID 7372 | (/builds/slave/test/build/application/firefox/firefox:13575): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied
15:02:46     INFO -  PID 7372 |
15:02:47     INFO -  PID 7372 | Open toolbox on 'complicated.netmonitor'
15:02:48     INFO -  PID 7372 | Reload page on 'complicated.netmonitor'
16:00:25     INFO -  Timeout waiting for test completion; killing browser...
16:00:25     INFO -  Terminating psutil.Process(pid=7372, name='firefox')

:bgrins, any ideas here as to why reload of 'complicated.performance' could be failing?
Flags: needinfo?(rwood) → needinfo?(bgrinstead)
(In reply to Robert Wood [:rwood] from comment #9)
> It happens much more often on PGO vs OPT, and the recent failures seem
> pretty consistent in failing at reloading of 'complicated.netmonitor' (the
> test is timing out after 60 minutes of waiting for that reload):
> 
> 15:02:44     INFO -  PID 7372 | Reload page on 'complicated.performance'
> 15:02:45     INFO -  PID 7372 | Close toolbox on 'complicated.performance'
> 15:02:46     INFO -  PID 7372 |
> 15:02:46     INFO -  PID 7372 |
> (/builds/slave/test/build/application/firefox/firefox:13575): Pango-WARNING
> **: error opening config file '/home/cltbld/.pangorc': Permission denied
> 15:02:46     INFO -  PID 7372 |
> 15:02:47     INFO -  PID 7372 | Open toolbox on 'complicated.netmonitor'
> 15:02:48     INFO -  PID 7372 | Reload page on 'complicated.netmonitor'
> 16:00:25     INFO -  Timeout waiting for test completion; killing browser...
> 16:00:25     INFO -  Terminating psutil.Process(pid=7372, name='firefox')
> 
> :bgrins, any ideas here as to why reload of 'complicated.performance' could
> be failing?

I think you mean 'complicated.netmonitor' reload? Forwarding the question to Alex as we've recently been making some changes as to what constitutes 'reload' in DAMP for various tools, although I don't see anything that looks particularly related for the netmonitor since Bug 1328553.
Flags: needinfo?(bgrinstead) → needinfo?(poirot.alex)
Yes sorry, 'netmonitor', thanks!
I think I identified something possibly wrong with this test while working on bug 1419327.
This huge spike of failure on November 26th is surprising, it looks like it regressed on this day and was fixed immediately?
Assignee: nobody → poirot.alex
Flags: needinfo?(poirot.alex)
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: REOPENED → RESOLVED
Closed: 7 years ago6 years ago
Resolution: --- → INCOMPLETE
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=157365676&repo=autoland&lineNumber=6647

20:52:33     INFO -  PID 1743 | [Child 9109, Chrome_ChildThread] WARNING: pipe error: Broken pipe: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 709
20:52:33     INFO -  PID 1743 | [Child 9109, Chrome_ChildThread] WARNING: pipe error: Broken pipe: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 709
20:52:33     INFO -  PID 1743 | [Child 9109, Chrome_ChildThread] WARNING: pipe error: Broken pipe: file /builds/worker/workspace/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 709
20:52:33     INFO -  PID 1743 | Open toolbox on 'complicated.performance'
20:52:34     INFO -  PID 1743 | Garbage collect
20:52:35     INFO -  PID 1743 | Reload page on 'complicated.performance'
20:52:36     INFO -  PID 1743 | Close toolbox on 'complicated.performance'
20:52:36     INFO -  PID 1743 | Garbage collect
20:52:37     INFO -  PID 1743 |
20:52:37     INFO -  PID 1743 | (/builds/slave/test/build/application/firefox/firefox:9760): Pango-WARNING **: error opening config file '/home/cltbld/.pangorc': Permission denied
20:52:37     INFO -  PID 1743 |
20:52:38     INFO -  PID 1743 | Open toolbox on 'complicated.netmonitor'
20:52:39     INFO -  PID 1743 | Garbage collect
20:52:39     INFO -  PID 1743 | Reload page on 'complicated.netmonitor'
21:48:51     INFO -  Timeout waiting for test completion; killing browser...
21:48:51     INFO -  Terminating psutil.Process(pid=1743, name='firefox')
21:48:51     INFO -  PID 1743 | ExceptionHandler::GenerateDump cloned child 10482
21:48:51     INFO -  PID 1743 | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
21:48:51     INFO -  PID 1743 | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
21:48:51     INFO -  TEST-UNEXPECTED-ERROR | damp | timeout
21:48:51    ERROR -  Traceback (most recent call last):
21:48:51     INFO -    File "/builds/slave/test/build/tests/talos/talos/run_tests.py", line 289, in run_tests
21:48:51     INFO -      talos_results.add(mytest.runTest(browser_config, test))
21:48:51     INFO -    File "/builds/slave/test/build/tests/talos/talos/ttest.py", line 62, in runTest
21:48:51     INFO -      return self._runTest(browser_config, test_config, setup)
21:48:51     INFO -    File "/builds/slave/test/build/tests/talos/talos/ttest.py", line 214, in _runTest
21:48:51     INFO -      debugger_args=browser_config['debugger_args']
21:48:51     INFO -    File "/builds/slave/test/build/tests/talos/talos/talos_process.py", line 139, in run_browser
21:48:51     INFO -      raise TalosError("timeout")
21:48:51     INFO -  TalosError: timeout
21:48:51     INFO -  TEST-INFO took 4502915ms
21:48:51     INFO -  SUITE-END | took 4502s
21:48:51     INFO -  PID 1743 | Sandbox: Unexpected EOF, op 0 flags 01101 path /tmp/GeckoChildCrash10085.extra
21:48:51    ERROR - Return code: 2
21:48:51  WARNING - setting return code to 2
21:48:51    ERROR - # TBPL FAILURE #
21:48:51     INFO - Running post-action listener: _package_coverage_data
21:48:51     INFO - Running post-action listener: _resource_record_post_action
21:48:51     INFO - [mozharness: 2018-01-19 05:48:51.940442Z] Finished run-tests step (success)
21:48:51     INFO - Running post-run listener: _resource_record_post_run
21:48:52     INFO - Total resource usage - Wall time: 4522s; CPU: 7.0%; Read bytes: 10911744; Write bytes: 513163264; Read time: 4120; Write time: 299708
21:48:52     INFO - TinderboxPrint: CPU usage<br/>6.8%
21:48:52     INFO - TinderboxPrint: I/O read bytes / time<br/>10,911,744 / 4,120
21:48:52     INFO - TinderboxPrint: I/O write bytes / time<br/>513,163,264 / 299,708
21:48:52     INFO - TinderboxPrint: CPU idle<br/>33,578.1 (93.2%)
21:48:52     INFO - TinderboxPrint: CPU system<br/>382.1 (1.1%)
21:48:52     INFO - TinderboxPrint: CPU user<br/>1,990.6 (5.5%)
21:48:52     INFO - TinderboxPrint: Swap in / out<br/>0 / 0
21:48:52     INFO - install - Wall time: 19s; CPU: 13.0%; Read bytes: 0; Write bytes: 7876608; Read time: 0; Write time: 1928
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
On the new Windows 10 talos taskcluster hardware (see Bug 1431161) this issue (test hanging on damp 'complicated.netmonitor') happens consistently. In Bug 1431161 I'm going to turn off the 'complicated.netmonitor' part of the damp test.
(In reply to Robert Wood [:rwood] from comment #21)
> On the new Windows 10 talos taskcluster hardware (see Bug 1431161) this
> issue (test hanging on damp 'complicated.netmonitor') happens consistently.
> In Bug 1431161 I'm going to turn off the 'complicated.netmonitor' part of
> the damp test.

See https://bugzilla.mozilla.org/show_bug.cgi?id=1431161#c14
Depends on: 1437028
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → INCOMPLETE
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → INCOMPLETE
Status: REOPENED → RESOLVED
Closed: 6 years ago5 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

:ochameau can you take a look over these intermittent failures ?

Flags: needinfo?(poirot.alex)
Component: Talos → Console
Product: Testing → DevTools
Version: Version 3 → unspecified

This timeout doesn't seem to occur anymore, or is it triaged in another bug?

Flags: needinfo?(poirot.alex)
Status: REOPENED → RESOLVED
Closed: 5 years ago4 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 4 years ago3 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.