Closed Bug 835420 Opened 13 years ago Closed 12 years ago

add 2way comm test to relay.py plus api call for nagios to check relay board status

Categories

(Testing Graveyard :: Mozpool, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dividehex, Assigned: dividehex)

References

Details

Attachments

(1 file, 2 obsolete files)

We need an additional bmm api so nagios can check a relay boards 2 way serial communications. The test command for the relay board is 33(dec) or 0x21(hex). This should be fairly simple except for the fact that the target won't be a 'device' but a relay board.
Attached patch Adds comms test api (obsolete) — Splinter Review
Attachment #744028 - Flags: review?(dustin)
Comment on attachment 744028 [details] [diff] [review] Adds comms test api Review of attachment 744028 [details] [diff] [review]: ----------------------------------------------------------------- This will need an UPDATING.md change, too, to describe the schema changes. Also, this will need tests for the test_comms method and for the API handler. This patch doesn't link the `devices` table to the `relay_boards` table. I'm not sure if that's problematic or not. On the one hand, duplicating data in a DB (in this case, relay board fqdns) is frowned upon, and it makes it tricky to tie a given relay to its controlled devices. On the other hand, if we end up controlling devices with something *other* than relay boards, such as PDUs, we'll want the flexibility of the `relay_info` string, rather than a link to a relay. What do you think? ::: API.txt @@ +180,5 @@ > half-second. > > +/api/relay/{id}/test/ > +* GET to test the two way comms of this relay board. Returns a JSON object with a `success` key, and > + value true or false. The two happens synchronously per relay board, and times out after about 10 secs. "The two" -> "The test" ::: mozpool/bmm/api.py @@ +31,5 @@ > self.db = db > > @async_operation(max_time=30) > + def test_comms(self, relay_name): > + """ Shouldn't this be max_time=10? ::: mozpool/db/inventorysync.py @@ +98,5 @@ > model.hardware_types.c.model==hardware_model))) > return self.singleton(res) > > + def _insert_relay_board(self, fqdn, imaging_server_id): > + # try inserting, ignoring failures (most likely due to duplicate row) I was going to complain that this assumes a relay is associated with exactly one imaging server. But we assume that anyway by serializing access to a relay on a single imaging server. It might be worth checking whether our current implementation has any devices that violate this assumption, and maybe adding code to the inventorysync to do that check regularly. Probably not in this bug. ::: mozpool/db/model.py @@ +135,5 @@ > + sa.ForeignKey('imaging_servers.id', ondelete='RESTRICT'), > + nullable=False), > + sa.Column('state', sa.String(32), nullable=False), > + sa.Column('state_counters', sa.Text, nullable=False), > + sa.Column('state_timeout', sa.DateTime, nullable=True), Do you still intend to make state machines for these? If so, the states should be initialized in the insert so that when they are used, they have a sane value. Otherwise, they should be removed.
Attachment #744028 - Flags: review?(dustin) → review-
(In reply to Dustin J. Mitchell [:dustin] from comment #2) > This will need an UPDATING.md change, too, to describe the schema changes. > Also, this will need tests for the test_comms method and for the API handler. > > This patch doesn't link the `devices` table to the `relay_boards` table. > I'm not sure if that's problematic or not. On the one hand, duplicating > data in a DB (in this case, relay board fqdns) is frowned upon, and it makes > it tricky to tie a given relay to its controlled devices. On the other > hand, if we end up controlling devices with something *other* than relay > boards, such as PDUs, we'll want the flexibility of the `relay_info` string, > rather than a link to a relay. What do you think? This patch was full of 'forks in the road' but in this case, I think have the flexibility would be worth the dup data. Although I don't intend to ever have use PDUs for future device power control, it may end up happening anyway. In fact, I considered calling the table power_control_devices. Do you think there is a enough cause for that? > "The two" -> "The test" I must have had 2's on the mind :} > > Shouldn't this be max_time=10? Yes it should > I was going to complain that this assumes a relay is associated with exactly > one imaging server. But we assume that anyway by serializing access to a > relay on a single imaging server. It might be worth checking whether our > current implementation has any devices that violate this assumption, and > maybe adding code to the inventorysync to do that check regularly. Probably > not in this bug. Unfortunately we need to enforce this assumption with the current relay board hardware since by its nature of being an ethernet serial bridge with out a buffer, it doesn't handle multiple tcp connections to the serial socket. This is really a failure of the relay board design and I'm looking into ways to fix this on the firmware level. > Do you still intend to make state machines for these? If so, the states > should be initialized in the insert so that when they are used, they have a > sane value. Otherwise, they should be removed. I'm not a fan of mozpool maintaining and controlling states on devices used to help control states of devices. And honestly, I thought our back and forth jokes about having mozpool control power devices all the way up to the Santa Clara power substation was enough to conclude it might be a silly idea. But... the pragmatic side of me says, being able to reboot relay boards via telnet automatically might be useful. I also don't like changing API or DB Schema too often and if it must happen, do it all at once for posterity. Thoughts?
(In reply to Jake Watkins [:dividehex] from comment #3) > Unfortunately we need to enforce this assumption with the current relay > board hardware since by its nature of being an ethernet serial bridge with > out a buffer, it doesn't handle multiple tcp connections to the serial > socket. This is really a failure of the relay board design and I'm looking > into ways to fix this on the firmware level. So we're assuming it, but not enforcing it right now. That's something that inventorysync.py could scan for while it's reading inventory, and report on any problems. Now that I look again at inventorysync, it has some other problems: As written the inventory sync will not recognize changes to the relationship between relays and imaging servers -- once the row is recorded, it's in there for life. Also, the uniqueness constraint on the `relay_boards` table is only on the id, so this will generate lots of rows with the same name/fqdn -- not what you want. Finally, the `fqdn` argument to _insert_relay_board isn't really an fqdn at all, so the name is a bit confusing.. > boards via telnet automatically might be useful. I also don't like changing > API or DB Schema too often and if it must happen, do it all at once for > posterity. Thoughts? Sounds good to me. Can you add a comment to schema.sql as to why those fields are present but unused, and also fill them in (probably with state='ready') when initializing in inventorysync.py?
Attached patch Adds comms test api (obsolete) — Splinter Review
Attachment #744028 - Attachment is obsolete: true
Attachment #748336 - Flags: review?(dustin)
Comment on attachment 748336 [details] [diff] [review] Adds comms test api Review of attachment 748336 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, but mozpool/db/relay_boards.py is not included in the patch. Can you re-post the patch with that file included? I'd like to double-check that tests pass with mysql and sqlite.
Attachment #748336 - Flags: review?(dustin) → review+
oops. I forgot to stage that file before grabbing the diff.
Attachment #748336 - Attachment is obsolete: true
Attachment #748443 - Flags: review?(dustin)
Attachment #748443 - Flags: review?(dustin) → review+
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: