Closed
Bug 951704
Opened 11 years ago
Closed 10 years ago
10.6 talos jobs are starting to timeout since Dec. 9th
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: armenzg, Unassigned)
Details
On Dec. 9th [1], we started seeing a lot of talos timeouts [2] and it seems to be infra related.

13:07:57 ERROR - Traceback (most recent call last):
13:07:57 INFO - File "/tools/python27/lib/python2.7/threading.py", line 551, in __bootstrap_inner
13:07:57 INFO - self.run()
13:07:57 INFO - File "/tools/python27/lib/python2.7/threading.py", line 504, in run
13:07:57 INFO - self.__target(*self.__args, **self.__kwargs)
13:07:57 INFO - File "/builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages/mozprocess/processhandler.py", line 710, in _processOutput
13:07:57 INFO - self.onTimeout()
13:07:57 INFO - File "/builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages/talos/talosProcess.py", line 67, in onTimeout
13:07:57 CRITICAL - raise talosError("timeout")
13:07:57 CRITICAL - talosError: timeout

There's a talos change from the 3rd [3] that is unlikely to be related, but we would like to be sure through a try push [4].

This is happening for 10.6 talos svgr and tp5o jobs.
I see it happening quite often on m-i:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=10.6.*talos

but not on Aurora or Beta:
https://tbpl.mozilla.org/?tree=Mozilla-Aurora&jobname=10.6.*talos
https://tbpl.mozilla.org/?tree=Mozilla-Beta&jobname=10.6.*talos

I do see the tbpl push robot reporting a bunch of beta and aurora jobs. However, it's worth pointing out that the talos lines where the exceptions are happening do not match exactly what we see on trunk trees (different talos code?):

16:03:16 ERROR - Traceback (most recent call last):
16:03:16 INFO - File "/builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages/talos/run_tests.py", line 277, in run_tests
16:03:16 INFO - talos_results.add(mytest.runTest(browser_config, test))
16:03:16 INFO - File "/builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages/talos/ttest.py", line 407, in runTest
16:03:16 CRITICAL - raise talosError("timeout exceeded")

Sources of external changes that could affect a talos job outside of in-code landings:
* http://hg.mozilla.org/build/puppet
* http://hg.mozilla.org/build/talos
* http://hg.mozilla.org/build/buildbot-configs/graph
* http://hg.mozilla.org/build/buildbotcustom/graph
* http://hg.mozilla.org/build/mozharness/log/default/mozharness/mozilla/testing/talos.py
* http://hg.mozilla.org/build/mozharness/log/d163222e0366/configs/talos/mac_config.py
* Deployed python packages (which should show up on mozharness code)

There were two reconfigs on that day:
http://hg.mozilla.org/build/buildbot-configs/rev/d6e1e8576ad5
http://hg.mozilla.org/build/buildbot-configs/rev/e2705dee682b

This landed on mozharness on that day:
http://hg.mozilla.org/build/mozharness/rev/1913406e2d96

The puppet changes did not seem relevant.
Nothing on buildbotcustom seems relevant.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=798219#c3067
[2] https://tbpl.mozilla.org/php/getParsedLog.php?id=32108592&tree=Mozilla-Inbound&full=1#error1
[3] http://hg.mozilla.org/build/talos/rev/2bcf422011d1
[4] https://tbpl.mozilla.org/?tree=Try&rev=e06e059534f9
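
The two tracebacks quoted in this bug raise talosError with different messages: "timeout" (from talosProcess.py) and "timeout exceeded" (from ttest.py). As a rough aid for triaging logs, the sketch below counts the raise sites by message so the two signatures can be told apart. The helper name classify_timeouts and the regular expression are hypothetical and not part of talos or mozharness; the sketch assumes the log lines look like the ones quoted here.

```python
import re

# Hypothetical helper for log triage; not part of talos or mozharness.
# Matches the raise lines as they appear in the quoted logs, e.g.
#   13:07:57 CRITICAL - raise talosError("timeout")
RAISE_RE = re.compile(r'raise talosError\("(?P<message>[^"]*)"\)')


def classify_timeouts(log_text):
    """Count talosError raise sites in a log, keyed by the error message."""
    counts = {}
    for match in RAISE_RE.finditer(log_text):
        message = match.group("message")
        counts[message] = counts.get(message, 0) + 1
    return counts


if __name__ == "__main__":
    sample = (
        '13:07:57 CRITICAL - raise talosError("timeout")\n'
        '16:03:16 CRITICAL - raise talosError("timeout exceeded")\n'
    )
    for message, count in sorted(classify_timeouts(sample).items()):
        print("%s: %d" % (message, count))
```

Running this over full logs from trunk and from Aurora/Beta would show which signature dominates on each tree, which may help confirm whether the branches really run different talos code.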
Updated•10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INCOMPLETE
Updated•6 years ago
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard