Closed
Bug 994321
Opened 11 years ago
Closed 11 years ago
10.5 hour wait time for mtnlion try test builds due to around 50 mtnlion slaves breaking last night
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jlund, Unassigned)
Details
No description provided.
Reporter
Updated•11 years ago
Summary: 10.5 hour wait time for 10.8 try test builds due to around 50 10.8 slaves braking last night → 10.5 hour wait time for mtnlion try test builds due to around 50 mtnlion slaves breaking last night
Reporter
Comment 1•11 years ago
last job from one of the test slaves that stopped taking jobs:
00:01:54 INFO - Calling ['/builds/slave/talos-slave/test/build/venv/bin/python', '-u', '/builds/slave/talos-slave/test/build/tests/mochitest/runtests.py', '--appname=/builds/slave/talos-slave/test/build/application/FirefoxNightly.app/Contents/MacOS/firefox', '--utility-path=tests/bin', '--extra-profile-file=tests/bin/plugins', '--symbols-path=https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/mwoodrow@mozilla.com-1061354610d7/try-macosx64/firefox-31.0a1.en-US.mac.crashreporter-symbols.zip', '--certificate-path=tests/certs', '--autorun', '--close-when-done', '--console-level=INFO', '--quiet', '--chrome'] with output_timeout 1000
00:01:54 INFO - /builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages/mozrunner/utils.py:19: UserWarning: Module manifestparser was already imported from /builds/slave/talos-slave/test/build/tests/mochitest/manifestparser.py, but /builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages is being added to sys.path
00:01:54 INFO - import pkg_resources
00:01:55 INFO - MochitestServer : launching [u'/builds/slave/talos-slave/test/build/tests/bin/xpcshell', '-g', '/builds/slave/talos-slave/test/build/application/FirefoxNightly.app/Contents/MacOS', '-v', '170', '-f', '/builds/slave/talos-slave/test/build/tests/mochitest/httpd.js', '-e', "const _PROFILE_PATH = '/var/folders/vl/6t1nwr3x54v2b5fs6p8qnxjc00000w/T/tmp1ugbYm'; const _SERVER_PORT = '8888'; const _SERVER_ADDR = '127.0.0.1'; const _TEST_PREFIX = undefined; const _DISPLAY_RESULTS = false;", '-f', './server.js']
00:01:55 INFO - runtests.py | Server pid: 1259
00:01:56 INFO - runtests.py | Websocket server pid: 1260
00:01:56 INFO - Warning: test_bug357450.js from manifest /builds/slave/talos-slave/test/build/tests/mochitest/chrome/content/base/test/chrome.ini is not a valid test
00:01:56 INFO - Warning: test_bug380418.html^headers^ from manifest /builds/slave/talos-slave/test/build/tests/mochitest/chrome/content/base/test/chrome/chrome.ini is not a valid test
00:01:56 INFO - Warning: test_operator_app_install.js from manifest /builds/slave/talos-slave/test/build/tests/mochitest/chrome/dom/apps/tests/chrome.ini is not a valid test
00:01:56 INFO - Warning: test_bug336682.js from manifest /builds/slave/talos-slave/test/build/tests/mochitest/chrome/dom/events/test/chrome.ini is not a valid test
00:01:56 INFO - Warning: test_offsets.css from manifest /builds/slave/talos-slave/test/build/tests/mochitest/chrome/dom/tests/mochitest/general/chrome.ini is not a valid test
00:01:56 INFO - Warning: test_offsets.js from manifest /builds/slave/talos-slave/test/build/tests/mochitest/chrome/dom/tests/mochitest/general/chrome.ini is not a valid test
00:01:56 INFO - Warning: test_bug883784.jsm from manifest /builds/slave/talos-slave/test/build/tests/mochitest/chrome/dom/workers/test/chrome.ini is not a valid test
00:01:56 INFO - Warning: test_bug467669.css from manifest /builds/slave/talos-slave/test/build/tests/mochitest/chrome/layout/inspector/tests/chrome/chrome.ini is not a valid test
00:01:56 INFO - Warning: test_bug695639.css from manifest /builds/slave/talos-slave/test/build/tests/mochitest/chrome/layout/inspector/tests/chrome/chrome.ini is not a valid test
00:01:56 INFO - Warning: test_bug708874.css from manifest /builds/slave/talos-slave/test/build/tests/mochitest/chrome/layout/inspector/tests/chrome/chrome.ini is not a valid test
00:01:56 INFO - Warning: test_bug727834.css from manifest /builds/slave/talos-slave/test/build/tests/mochitest/chrome/layout/inspector/tests/chrome/chrome.ini is not a valid test
00:01:56 INFO - runtests.py | Running tests: start.
00:01:56 INFO - TEST-INFO | certutil: exit 0
00:01:56 INFO - TEST-INFO | certutil: exit 0
00:01:56 INFO - TEST-INFO | certutil: exit 0
00:01:56 INFO - TEST-INFO | certutil: exit 0
00:01:56 INFO - TEST-INFO | certutil: exit 0
00:01:56 INFO - TEST-INFO | certutil: exit 0
00:01:56 INFO - pk12util: PKCS12 IMPORT SUCCESSFUL
00:01:56 INFO - TEST-INFO | pk2util: exit 0
00:01:56 INFO - TEST-INFO | certutil: exit 0
00:01:56 INFO - INFO | runtests.py | SSL tunnel pid: 1269
00:01:56 INFO - INFO | runtests.py | Application pid: 1270
00:02:04 INFO - Apr 9 00:02:04 talos-mtnlion-r5-038.test.releng.scl3.mozilla.com firefox[1270] <Error>: clip: empty path.
00:02:04 INFO - Apr 9 00:02:04 talos-mtnlion-r5-038.test.releng.scl3.mozilla.com firefox[1270] <Error>: clip: empty path.
00:02:04 INFO - Apr 9 00:02:04 talos-mtnlion-r5-038.test.releng.scl3.mozilla.com firefox[1270] <Error>: clip: empty path.
Reporter
Comment 2•11 years ago
comment 1 was from talos-mtnlion-r5-038.
slavealloc says it's still enabled.
I cannot reach it:
jlund@Hastings163:~
> ping talos-mtnlion-r5-038.test.releng.scl3.mozilla.com
PING talos-mtnlion-r5-038.test.releng.scl3.mozilla.com (10.26.56.58): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
^C
--- talos-mtnlion-r5-038.test.releng.scl3.mozilla.com ping statistics ---
4 packets transmitted, 0 packets received, 100.0% packet loss
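With around 50 slaves to check, reading ping output by eye does not scale. The following is a minimal sketch, not part of slaveapi or any existing tool, that classifies a host from the summary line ping prints, such as the one above. The function names `parse_ping_summary` and `is_unreachable` are hypothetical, and the regular expression is an assumption based on the macOS summary format shown in this comment plus the common Linux variant.

```python
import re

# Summary line printed by ping, for example (macOS, as in the comment above):
#   "4 packets transmitted, 0 packets received, 100.0% packet loss"
# The Linux variant omits the second "packets" and appends a time field.
PING_SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received.*?([\d.]+)% packet loss"
)


def parse_ping_summary(output):
    """Return (sent, received, loss_percent) from ping output, or None."""
    match = PING_SUMMARY.search(output)
    if match is None:
        return None
    sent, received, loss = match.groups()
    return int(sent), int(received), float(loss)


def is_unreachable(output):
    """Treat a host as unreachable when packets were sent but none came back."""
    summary = parse_ping_summary(output)
    return summary is not None and summary[0] > 0 and summary[1] == 0
```

The text passed in would be the captured stdout of a ping run against each slave; how that output is collected is left out of the sketch.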
Reporter
Comment 3•11 years ago
I can reach the next slave in the list that went down: talos-mtnlion-r5-050
buildbot.tac reports the slave is disabled. slavealloc says it's enabled.
twistd.log just has the last command as its last line of output: 'desktop_unittest.py --suite reftest'
but its last job log points to the same failure as well:
00:02:34 INFO - Apr 9 00:02:34 talos-mtnlion-r5-050.test.releng.scl3.mozilla.com firefox-bin[1255] <Error>: clip: empty path.
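Since both slaves end on the same log signature, one way to find other affected machines is to scan their last job logs for it. This is a rough sketch under assumptions: the line layout is taken from the two excerpts in this bug, and the helper name `hosts_with_clip_error` is hypothetical rather than an existing releng tool.

```python
import re

# Syslog-style line echoed into the job log, for example:
#   "00:02:34 INFO - Apr 9 00:02:34 <host> firefox-bin[1255] <Error>: clip: empty path."
# Assumption: the hostname is the token right before "<process>[<pid>]".
CLIP_ERROR = re.compile(
    r"(?P<host>\S+) (?P<process>[\w.-]+)\[(?P<pid>\d+)\] <Error>: clip: empty path\."
)


def hosts_with_clip_error(log_lines):
    """Map each hostname to the number of 'clip: empty path' lines seen for it."""
    counts = {}
    for line in log_lines:
        match = CLIP_ERROR.search(line)
        if match:
            host = match.group("host")
            counts[host] = counts.get(host, 0) + 1
    return counts
```

A host showing up in the result only means its log contains the signature; it does not by itself prove the machine is hung.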
Reporter
Comment 4•11 years ago
mac kernel issues involving 'clip:' logs:
https://bugzilla.mozilla.org/show_bug.cgi?id=536444#c5
not sure if this is related or why we lost so many mtnlion machines.
Reporter
Comment 5•11 years ago
callek discovered that slaveapi wasn't able to reboot slaves due to the new requirement of 'https' in our configs.
rebooting broken slaves now.
looks like a cset caused '<Error>: clip: empty path.' on a bunch of slaves, which kept them from rebooting.
Reporter
Comment 6•11 years ago
still cannot manually (ssh) reach almost all mtnlion machines.
slaveapi is still having issues, so I cannot reboot via that either:
https://callek.pastebin.mozilla.org/4780984
Reporter
Comment 7•11 years ago
mtnlion machines were hung hard by a bad cset in try: https://tbpl.mozilla.org/?tree=Try&rev=1061354610d7
buildbot set each of these builds to RETRY, causing this build to be retried on, and take down, many of our mtnlion machines. many reboots failed to recover the machines even after a pdu request.
here is what slaveapi filed after the failed reboots:
994373 talos-mtnlion-r5-067 is unreachable 16:16:03
994375 talos-mtnlion-r5-085 is unreachable 16:16:29
994378 talos-mtnlion-r5-003 is unreachable 16:16:56
994379 talos-mtnlion-r5-063 is unreachable 16:17:10
994380 talos-mtnlion-r5-084 is unreachable 16:17:19
994381 talos-mtnlion-r5-079 is unreachable 16:17:47
994382 talos-mtnlion-r5-075 is unreachable 16:18:00
994383 talos-mtnlion-r5-069 is unreachable 16:18:02
994385 talos-mtnlion-r5-004 is unreachable 16:18:15
994387 talos-mtnlion-r5-081 is unreachable 16:18:28
994389 talos-mtnlion-r5-064 is unreachable 16:18:56
994390 talos-mtnlion-r5-060 is unreachable 16:21:09
994393 talos-mtnlion-r5-066 is unreachable 16:24:56
994396 talos-mtnlion-r5-008 is unreachable 16:25:13
994398 talos-mtnlion-r5-077 is unreachable 16:25:35
994399 talos-mtnlion-r5-002 is unreachable 16:25:58
994403 talos-mtnlion-r5-041 is unreachable 16:26:51
994404 talos-mtnlion-r5-073 is unreachable 16:27:04
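For follow-up it can help to turn a list like the one above into structured records, for example to cross-check against the slaves that later recovered. The sketch below is illustrative only: `UnreachableReport` and `parse_unreachable_reports` are hypothetical names, and treating the last column as a timestamp for the filed bug is an assumption, since the comment does not label it.

```python
from typing import NamedTuple


class UnreachableReport(NamedTuple):
    bug_id: int
    slave: str
    timestamp: str  # HH:MM:SS exactly as listed; assumed to be the filing time


def parse_unreachable_reports(text):
    """Parse lines shaped like '<bug id> <slave> is unreachable <HH:MM:SS>'."""
    reports = []
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that does not match the expected five-token shape.
        if (
            len(parts) == 5
            and parts[0].isdigit()
            and parts[2:4] == ["is", "unreachable"]
        ):
            reports.append(UnreachableReport(int(parts[0]), parts[1], parts[4]))
    return reports
```

Sorting the result by `slave` or `bug_id` then gives a quick checklist of machines that still need hands-on attention.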
Comment 8•11 years ago
Thanks Jordan. It looks like Van has resolved these, and I've begun rebooting them. So far looks good. Will update here again once reboots have finished.
Comment 9•11 years ago
Still problems with 7 slaves that can't be rebooted:
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=test&type=talos-mtnlion-r5&name=talos-mtnlion-r5-005
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=test&type=talos-mtnlion-r5&name=talos-mtnlion-r5-006
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=test&type=talos-mtnlion-r5&name=talos-mtnlion-r5-061
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=test&type=talos-mtnlion-r5&name=talos-mtnlion-r5-065
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=test&type=talos-mtnlion-r5&name=talos-mtnlion-r5-074
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=test&type=talos-mtnlion-r5&name=talos-mtnlion-r5-086
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=test&type=talos-mtnlion-r5&name=talos-mtnlion-r5-089
Slave API has open bugs on all of these.
Updated•11 years ago
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard