Closed Bug 753257 (tegra-049) Opened 13 years ago Closed 13 years ago

tegra-049 problem tracking

Categories: Infrastructure & Operations Graveyard :: CIDuty, task
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: Callek, Unassigned
Whiteboard: [buildduty][buildslave][capacity]
So tegra-049 needs some TLC.
I was going through last-job-per-slave and saw this one was over 2 weeks old. It was pingable but not telnetable (over port 20701 as normal), so I killed cp, powercycled it, then brought it up.
After it was up, while tailing the output of cp itself, I saw another obvious problem, which is likely a code problem:
2012-05-09 00:18:41,660 INFO MainProcess: Running verify code
2012-05-09 00:18:41,660 DEBUG MainProcess: calling [python /builds/sut_tools/verify.py tegra-049]
2012-05-09 00:24:13,128 DEBUG dialback: error during dialback loop
2012-05-09 00:24:13,130 DEBUG dialback: Traceback (most recent call last):
2012-05-09 00:24:13,131 DEBUG dialback:
2012-05-09 00:24:13,131 DEBUG dialback: File "clientproxy.py", line 219, in handleDialback
2012-05-09 00:24:13,131 DEBUG dialback: asyncore.loop(timeout=1, count=1)
2012-05-09 00:24:13,131 DEBUG dialback:
2012-05-09 00:24:13,131 DEBUG dialback: File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/asyncore.py", line 214, in loop
2012-05-09 00:24:13,131 DEBUG dialback: poll_fun(timeout, map)
2012-05-09 00:24:13,132 DEBUG dialback:
2012-05-09 00:24:13,132 DEBUG dialback: File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/asyncore.py", line 140, in poll
2012-05-09 00:24:13,132 DEBUG dialback: r, w, e = select.select(r, w, e, timeout)
2012-05-09 00:24:13,132 DEBUG dialback:
2012-05-09 00:24:13,132 DEBUG dialback: File "clientproxy.py", line 456, in handleSigTERM
2012-05-09 00:24:13,132 DEBUG dialback: db.close()
2012-05-09 00:24:13,133 DEBUG dialback:
2012-05-09 00:24:13,133 DEBUG dialback: AttributeError: 'Process' object has no attribute 'close'
2012-05-09 00:24:13,133 DEBUG dialback:
2012-05-09 00:24:13,133 DEBUG dialback: Traceback End
2012-05-09 00:24:13,133 INFO dialback: process shutting down
2012-05-09 00:24:13,133 DEBUG dialback: running all "atexit" finalizers with priority >= 0
2012-05-09 00:24:13,134 DEBUG dialback: running the remaining "atexit" finalizers
2012-05-09 00:24:13,134 INFO dialback: process exiting with exitcode 0
^C
bash-3.2$
So it lost the connection to tegra-049 shortly after it was brought up from a powercycle, then hurt itself with a code error from which it can't recover.
Ran stop_cp.sh and moved to offline
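The traceback above points at clientproxy.py's handleSigTERM calling db.close() on what is actually a multiprocessing Process, which has no close() in Python 2.6. A minimal sketch of a safer shutdown handler, assuming dialback_proc stands in for that dialback child process; the names and structure are illustrative, not the actual clientproxy code:

# Illustrative only; not the actual clientproxy implementation.
import multiprocessing
import signal
import sys

dialback_proc = None  # set elsewhere to the running dialback multiprocessing.Process

def handle_sigterm(signum, frame):
    # multiprocessing.Process has no close() in Python 2.6, so db.close() raises the
    # AttributeError seen above; terminate() + join() is the supported shutdown path.
    if dialback_proc is not None and dialback_proc.is_alive():
        dialback_proc.terminate()
        dialback_proc.join(10)
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)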
Reporter
Updated • 13 years ago

Reporter
Comment 1 • 13 years ago
So it's back up and pingable, BUT we still cannot telnet in to it.
It was reimaged in Bug 752954, but that did not help; filed Bug 754406 to have it pulled for ateam to investigate.
Whiteboard: [buildslave][capacity]
I ran mochitests 5-6 times and rebooted it 4 times in succession. It came back up every time. It does take a little while (20-30s) to come back from a reboot, so perhaps this is a race condition in the clientproxy code where clientproxy thinks the tegra rebooted but it is still starting back up.
At any rate, I can't find anything to fix on this one. Going to go put it on Jake's desk with the others.
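If the race is simply that clientproxy checks too soon after a reboot, one mitigation is to poll the SUT agent port (20701, per the description above) for a grace period before declaring the tegra down. A minimal sketch under that assumption; wait_for_agent and the timings are illustrative, not existing clientproxy code:

import socket
import time

def wait_for_agent(host, port=20701, deadline=120, interval=5):
    """Return True once host:port accepts a TCP connection, False after `deadline` seconds."""
    end = time.time() + deadline
    while time.time() < end:
        try:
            sock = socket.create_connection((host, port), timeout=interval)
            sock.close()
            return True
        except socket.error:
            time.sleep(interval)
    return False

# Usage: only mark the device offline after it has been silent well past the
# observed 20-30s reboot window.
# if not wait_for_agent("tegra-049"):
#     ...  # treat as genuinely down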
Updated • 13 years ago
Whiteboard: [buildslave][capacity] → [buildduty][buildslave][capacity]

Comment 4 • 13 years ago
I've flashed/formatted/reinitialized it and put it back into its rack space.
Comment 5 • 13 years ago
What do we do with this Tegra? It seems to be back, but it's not listed in the dashboard, so I don't know how to bring it up.
Assignee: jwatkins → bear
Comment 6 • 13 years ago
Added to foopy08, but it's down and not rescuable via pdu reboots.
Assignee: bear → nobody
Reporter
Comment 7 • 13 years ago
(In reply to Aki Sasaki [:aki] from comment #6)
> Added to foopy08, but it's down and not rescuable via pdu reboots.
Ummm, can you take it back off there, unless there are in fact only 11 tegras on foopy08 including this one?
$ python tegras_per_foopy.py
PRODUCTION:
foopy07 contains 11 tegras
foopy08 contains 11 tegras
foopy09 contains 11 tegras
foopy10 contains 11 tegras
foopy11 contains 11 tegras
foopy12 contains 13 tegras
foopy13 contains 13 tegras
foopy14 contains 13 tegras
foopy15 contains 13 tegras
foopy16 contains 13 tegras
foopy17 contains 13 tegras
foopy18 contains 13 tegras
foopy19 contains 13 tegras
foopy20 contains 13 tegras
foopy22 contains 13 tegras
foopy23 contains 13 tegras
foopy24 contains 13 tegras
We have 211 tegras in 17 foopies which means a ratio of 12 tegras per foopy
Either way, wherever you put it, it needs a tegras.json update from build-tools/buildfarm/mobile.
Foopy08, from investigation, can't reliably handle more than 11 tegras.
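For reference, a counting script along the lines of tegras_per_foopy.py could be as simple as the sketch below; it assumes tegras.json (in build-tools/buildfarm/mobile) maps each tegra name to an object with a "foopy" field, which is an assumption about the schema rather than the tool's actual code:

import json
from collections import defaultdict

def tegras_per_foopy(path="tegras.json"):
    # Assumed schema: {"tegra-049": {"foopy": "foopy08", ...}, ...}
    with open(path) as f:
        tegras = json.load(f)

    counts = defaultdict(int)
    for name, info in tegras.items():
        foopy = info.get("foopy")
        if foopy:
            counts[foopy] += 1

    for foopy in sorted(counts):
        print("{0} contains {1} tegras".format(foopy, counts[foopy]))
    print("We have {0} tegras in {1} foopies".format(sum(counts.values()), len(counts)))

if __name__ == "__main__":
    tegras_per_foopy()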
Assignee: nobody → aki
Updated • 13 years ago
Assignee: aki → bugspam.Callek
Comment 8 • 13 years ago
Removed, still needs rescuing.
Comment 9 • 13 years ago
We have flashed and reimaged the Tegra board but it is still not coming up. Also tried a different switch port. Appears that the board is bad.
Van
Comment 10 • 13 years ago
Let's decommission it then. I'll file bugs to get monitoring updated.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Reporter
Comment 11 • 12 years ago
So, DCOps claims this tegra is still happily attached to a PDU, but it is already removed from DNS.
We need to finish the decomm story here.
Flags: needinfo?(bugspam.Callek)
Reporter
Updated • 12 years ago

Assignee
Updated • 12 years ago
Product: mozilla.org → Release Engineering

Reporter
Updated • 11 years ago
Flags: needinfo?(bugspam.Callek)

Reporter
Updated • 11 years ago
Assignee: bugspam.Callek → nobody
QA Contact: armenzg → bugspam.Callek

Updated • 7 years ago
Product: Release Engineering → Infrastructure & Operations

Updated • 5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard