Closed Bug 753257 (tegra-049) Opened 13 years ago Closed 13 years ago

tegra-049 problem tracking

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86_64
Windows 7
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Callek, Unassigned)

References

()

Details

(Whiteboard: [buildduty][buildslave][capacity])

So tegra-049 needs some TLC. I was going through last-job-per-slave and saw this was over 2 weeks old, pingable but not telnetable (over 20701 like normal), so I killed cp, powercycled it, then brought it up. After it was up, tailing the run of cp itself, I saw another obvious problem, which is likely a code problem:

2012-05-09 00:18:41,660 INFO MainProcess: Running verify code
2012-05-09 00:18:41,660 DEBUG MainProcess: calling [python /builds/sut_tools/verify.py tegra-049]
2012-05-09 00:24:13,128 DEBUG dialback: error during dialback loop
2012-05-09 00:24:13,130 DEBUG dialback: Traceback (most recent call last):
2012-05-09 00:24:13,131 DEBUG dialback:
2012-05-09 00:24:13,131 DEBUG dialback:   File "clientproxy.py", line 219, in handleDialback
2012-05-09 00:24:13,131 DEBUG dialback:     asyncore.loop(timeout=1, count=1)
2012-05-09 00:24:13,131 DEBUG dialback:
2012-05-09 00:24:13,131 DEBUG dialback:   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/asyncore.py", line 214, in loop
2012-05-09 00:24:13,131 DEBUG dialback:     poll_fun(timeout, map)
2012-05-09 00:24:13,132 DEBUG dialback:
2012-05-09 00:24:13,132 DEBUG dialback:   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/asyncore.py", line 140, in poll
2012-05-09 00:24:13,132 DEBUG dialback:     r, w, e = select.select(r, w, e, timeout)
2012-05-09 00:24:13,132 DEBUG dialback:
2012-05-09 00:24:13,132 DEBUG dialback:   File "clientproxy.py", line 456, in handleSigTERM
2012-05-09 00:24:13,132 DEBUG dialback:     db.close()
2012-05-09 00:24:13,133 DEBUG dialback:
2012-05-09 00:24:13,133 DEBUG dialback: AttributeError: 'Process' object has no attribute 'close'
2012-05-09 00:24:13,133 DEBUG dialback:
2012-05-09 00:24:13,133 DEBUG dialback: Traceback End
2012-05-09 00:24:13,133 INFO dialback: process shutting down
2012-05-09 00:24:13,133 DEBUG dialback: running all "atexit" finalizers with priority >= 0
2012-05-09 00:24:13,134 DEBUG dialback: running the remaining "atexit" finalizers
2012-05-09 00:24:13,134 INFO dialback: process exiting with exitcode 0
^C
bash-3.2$

So it lost the connection to tegra-049 shortly after it was brought up from a powercycle, then hurt itself with a code error it can't recover from. Ran stop_cp.sh and moved it to offline.
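For context, multiprocessing.Process has no close() method in Python 2.6 (it only gained one much later, in Python 3.7), which matches the AttributeError above where handleSigTERM calls db.close() on the dialback Process. A minimal sketch of a more defensive shutdown, assuming db is that Process object (the helper name shutdown_dialback is hypothetical, not the actual clientproxy.py code):

    import multiprocessing

    def shutdown_dialback(db, timeout=5):
        """Stop the dialback child process without assuming it has close()."""
        if db is None:
            return
        if db.is_alive():
            db.terminate()   # signal the child to exit
        db.join(timeout)     # reap it so we don't leave a zombie behind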
No longer blocks: 752954
Depends on: 752954
Depends on: 754406
So it's back up and pingable, BUT we still cannot telnet in to it. It was reimaged in Bug 752954 but that did not help; filed Bug 754406 to have it pulled for ateam to investigate.
Whiteboard: [buildslave][capacity]
I ran mochitests 5-6 times and rebooted it 4 times in succession. It came back up every time. It does take a little while (20-30s) to come back from a reboot, so perhaps this is a race condition in the clientproxy code where clientproxy thinks it has rebooted but it is still starting up. At any rate, I can't find anything to fix on this one. Going to put it on Jake's desk with the others.
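If it really is a startup race, one way to paper over it (a sketch only, not the actual clientproxy logic; wait_for_sut is a hypothetical helper, and 20701 is the SUT agent port mentioned in comment 0) is to poll the agent port for a grace period before declaring the device down:

    import socket
    import time

    def wait_for_sut(host, port=20701, timeout=120, interval=5):
        """Poll the SUT agent port until it accepts a connection or we give up."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                s = socket.create_connection((host, port), timeout=interval)
                s.close()
                return True   # device answered; reboot has finished
            except socket.error:
                time.sleep(interval)   # still rebooting, try again shortly
        return False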
Whiteboard: [buildslave][capacity] → [buildduty][buildslave][capacity]
Jake - what's up with this tegra?
Assignee: nobody → jwatkins
I've flashed/formatted/reinitialized it and put it back into its rack space.
What do we do with this Tegra? It seems to be back, but it's not listed in the dashboard, so I don't know how to bring it up.
Assignee: jwatkins → bear
Added to foopy08, but it's down and not rescuable via PDU reboots.
Assignee: bear → nobody
Depends on: 770519
(In reply to Aki Sasaki [:aki] from comment #6)
> Added to foopy08, but it's down and not rescuable via pdu reboots.

Ummm, can you take it back off here, unless there are in fact only 11 tegras on foopy08 including this one?

$ python tegras_per_foopy.py
PRODUCTION:
foopy07 contains 11 tegras
foopy08 contains 11 tegras
foopy09 contains 11 tegras
foopy10 contains 11 tegras
foopy11 contains 11 tegras
foopy12 contains 13 tegras
foopy13 contains 13 tegras
foopy14 contains 13 tegras
foopy15 contains 13 tegras
foopy16 contains 13 tegras
foopy17 contains 13 tegras
foopy18 contains 13 tegras
foopy19 contains 13 tegras
foopy20 contains 13 tegras
foopy22 contains 13 tegras
foopy23 contains 13 tegras
foopy24 contains 13 tegras
We have 211 tegras in 17 foopies which means a ratio of 12 tegras per foopy

Either way, wherever you put it, it needs a tegras.json update from build-tools/buildfarm/mobile. Foopy08, from investigation, can't reliably handle more than 11 tegras.
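For reference, a rough sketch of how a count like the one above can be produced (the real tegras_per_foopy.py lives in build-tools and may differ; this assumes tegras.json maps each tegra name to a dict with a "foopy" key):

    import json
    from collections import defaultdict

    def count_tegras(path="tegras.json"):
        with open(path) as f:
            tegras = json.load(f)
        per_foopy = defaultdict(int)
        for name, info in tegras.items():
            foopy = info.get("foopy")
            if foopy and foopy != "None":
                per_foopy[foopy] += 1
        for foopy in sorted(per_foopy):
            print("%s contains %d tegras" % (foopy, per_foopy[foopy]))
        total = sum(per_foopy.values())
        print("We have %d tegras in %d foopies" % (total, len(per_foopy)))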
Assignee: nobody → aki
Assignee: aki → bugspam.Callek
Removed, still needs rescuing.
We have flashed and reimaged the Tegra board, but it is still not coming up. We also tried a different switch port. It appears that the board is bad. - Van
Let's decommission it then. I'll file bugs to get monitoring updated.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
So, DCOps claims this tegra is still happily attached to a PDU, but it is already removed from DNS. We need to finish the decomm story here.
Flags: needinfo?(bugspam.Callek)
Product: mozilla.org → Release Engineering
Flags: needinfo?(bugspam.Callek)
Assignee: bugspam.Callek → nobody
QA Contact: armenzg → bugspam.Callek
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard