Closed
Bug 806152
Opened 12 years ago
Closed 12 years ago
Panda relay boards issuing connection refused....
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: Callek, Unassigned)
References
Details
Sooo, in an attempt to help recover some panda boards, and while testing code, I noticed that we can get "connection refused" on the relay boards (log from a manual Python shell below).
Do we have rate limiting here? If so, that is a problem: multiple connections to the relay board at *once* are entirely possible in production, and will be needed.
-----------------
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/builds/sut_tools/sut_lib.py", line 520, in reboot_relay
    return relayModule.powercycle(relay_host, bank, relay)
  File "/builds/sut_tools/relay.py", line 109, in powercycle
    with connected_socket(relay_hostname, PORT) as sock:
  File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
    return self.gen.next()
  File "/builds/sut_tools/relay.py", line 53, in connected_socket
    sock.connect((hostname, port))
  File "<string>", line 1, in connect
socket.error: [Errno 111] Connection refused
>>> for x in sut_lib.pandas:
...     print x
...     try:
...         sut_lib.reboot_relay(x)
...     except:
...         pass
...
panda-0048
10/27/2012 17:55:56: INFO: Calling PDU powercycle for panda-0048, panda-relay-004.build.scl1.mozilla.com:1:3
True
panda-0028
10/27/2012 17:55:57: INFO: Calling PDU powercycle for panda-0028, panda-relay-002.build.scl1.mozilla.com:2:3
True
panda-0032
10/27/2012 17:55:58: INFO: Calling PDU powercycle for panda-0032, panda-relay-002.build.scl1.mozilla.com:2:7
panda-0033
10/27/2012 17:55:58: INFO: Calling PDU powercycle for panda-0033, panda-relay-002.build.scl1.mozilla.com:2:8
panda-0030
10/27/2012 17:55:58: INFO: Calling PDU powercycle for panda-0030, panda-relay-002.build.scl1.mozilla.com:2:5
panda-0031
10/27/2012 17:55:58: INFO: Calling PDU powercycle for panda-0031, panda-relay-002.build.scl1.mozilla.com:2:6
panda-0054
10/27/2012 17:55:58: INFO: Calling PDU powercycle for panda-0054, panda-relay-004.build.scl1.mozilla.com:2:5
True
panda-0055
10/27/2012 17:55:59: INFO: Calling PDU powercycle for panda-0055, panda-relay-004.build.scl1.mozilla.com:2:6
panda-0056
10/27/2012 17:55:59: INFO: Calling PDU powercycle for panda-0056, panda-relay-004.build.scl1.mozilla.com:2:7
......
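Note how the successful ones return True; the others hit the connection refused error shown in the traceback, which the bare except in the loop swallows.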
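The loop above swallows every exception with a bare except, so a failed board only shows up as a missing True. A minimal sketch of a loop that records failures instead follows. `powercycle_all` and its parameters are hypothetical names, not part of sut_lib, and the sketch uses Python 3 syntax rather than the Python 2.6 of the session above.

```python
def powercycle_all(devices, reboot):
    """Power-cycle each device, returning (succeeded, failed).

    `reboot` is any callable that takes a device name and returns True on
    success; in the session above that role is played by
    sut_lib.reboot_relay. This helper itself is hypothetical.
    """
    succeeded = []
    failed = {}  # device name -> exception, or None if reboot returned falsy
    for name in devices:
        try:
            ok = reboot(name)
        except Exception as exc:
            # Keep the error so the caller can see which boards were refused.
            failed[name] = exc
        else:
            if ok:
                succeeded.append(name)
            else:
                failed[name] = None
    return succeeded, failed
```

Called as `powercycle_all(sut_lib.pandas, sut_lib.reboot_relay)`, this would give a list of boards to retry later rather than silently skipping them.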
Comment 1•12 years ago (Reporter)
Adding in a time.sleep(20) works so far, so either we need a 2-10 second delay plus retry in the relay script when it fails to connect, or there might be a relay config setting that could/should be tweaked.
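A minimal sketch of the delay plus retry idea, assuming the failure surfaces as a socket error with errno ECONNREFUSED as in the traceback. The names `retry_on_refused` and `open_relay_socket` are hypothetical and do not come from relay.py; the attempt count and delay are placeholders, and the sketch uses Python 3 syntax.

```python
import errno
import socket
import time


def retry_on_refused(connect, attempts=5, delay=2.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    Only ECONNREFUSED is retried; any other socket error propagates at
    once, and the final refusal is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except socket.error as exc:
            if exc.errno != errno.ECONNREFUSED or attempt == attempts:
                raise
            # The relay is presumably busy with another client; wait a bit.
            sleep(delay)


def open_relay_socket(hostname, port, timeout=10.0):
    # Hypothetical stand-in for the connect step inside relay.py.
    return socket.create_connection((hostname, port), timeout=timeout)
```

Usage would look like `sock = retry_on_refused(lambda: open_relay_socket(relay_host, PORT))`. Passing the connect step in as a callable keeps the retry policy separate from the socket code, so it can be exercised without a real relay board.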
Comment 2•12 years ago
I don't actually know much about how these boards are configured; maybe Jake knows. It wouldn't entirely surprise me to find that they only support one simultaneous connection.
Comment 3•12 years ago
Agreed. That should be fairly easy to add to BMM. Presumably you could do the same in foopy code?
Comment 4•12 years ago
I filed bug 806337 for BMM.
Assignee: server-ops-releng → nobody
Component: Server Operations: RelEng → Release Engineering
QA Contact: arich
Comment 5•12 years ago
Is BMM production-ready right now? If so, we should plan on using it. Otherwise we need a retry loop to ensure we can handle the load until we have BMM or some API that queues up the requests.
Comment 6•12 years ago
It is, but we decided last week not to use it until B2G is ready to go. In the interim, DCOps will be using it for reboots and reimages.
Comment 7•12 years ago (Reporter)
Marking needinfo?jake: can the relay boards be configured to accept multiple simultaneous connections? Otherwise we'll need to bake some retry magic into our sut code [and bmm].
Flags: needinfo?(jwatkins)
Comment 8•12 years ago
I'm fairly certain the boards themselves can't be configured that way - they're not particularly sophisticated. I haven't verified that the boards actually *don't* accept multiple connections, but it sounds like you have. In that case, bake away (and otherwise, verify then bake).
Flags: needinfo?(jwatkins)
Comment 9•12 years ago
I highly doubt the relay board supports multiple connections; it is essentially a TCP socket to serial port bridge, and I don't believe the firmware holds a serial command queue. Calls to reboot a panda board should ultimately go through a central API (probably mozpool/lifeguard/bmm) so that overlapping calls can be prevented and locks and queues can be used.
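To illustrate the locking idea, here is a minimal in-process sketch that lets only one command at a time reach a given relay host. `RelayGate` is a hypothetical name, not how mozpool/lifeguard/bmm work, and a lock like this only protects callers inside one process; a real central API would need to serialize requests across all clients. Python 3 syntax is assumed.

```python
import threading
from collections import defaultdict


class RelayGate:
    """Serialize commands per relay host so connections never overlap."""

    def __init__(self):
        # Guards the lock table itself; per-host locks guard the relays.
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def _lock_for(self, relay_host):
        with self._guard:
            return self._locks[relay_host]

    def run(self, relay_host, command, *args):
        """Run command(relay_host, *args) while holding that host's lock."""
        with self._lock_for(relay_host):
            return command(relay_host, *args)
```

A caller would wrap the existing call, for example `gate.run(relay_host, relayModule.powercycle, bank, relay)`, so requests to the same relay host wait their turn while different relay hosts proceed in parallel.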
Comment 10•12 years ago (Reporter)
Ok, WONTFIX in favor of short-term bug 811641 and long-term MozPool/BMM supporting retries/being our interface.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Updated•11 years ago (Assignee)
Product: mozilla.org → Release Engineering