Closed Bug 1033251 Opened 10 years ago Closed 10 years ago

mozpool does not function correctly after panda chassis are moved

Categories

Product: Infrastructure & Operations
Component: RelOps: General
Type: task
Hardware: x86
OS: macOS
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: arich, Assigned: arich)

After moving the panda chassis, mozpool is frequently logging the following error:

request ERROR - [2014-07-02 02:26:43,525] failure in _tick
Traceback (most recent call last):
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/mozpool/statedriver.py", line 93, in _tick
    self.poll_for_timeouts()
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/mozpool/statedriver.py", line 127, in poll_for_timeouts
    for machine_name in self._get_timed_out_machine_names():
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/mozpool/mozpool/requestmachine.py", line 64, in _get_timed_out_machine_names
    return self.db.requests.list_timed_out(self.imaging_server_id)
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/mozpool/mozpool/requestmachine.py", line 73, in imaging_server_id
    self._imaging_server_id = self.db.imaging_servers.get_id(config.get('server', 'fqdn'))
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/mozpool/db/imaging_servers.py", line 18, in get_id
    return self.singleton(res)
  File "/opt/mozpool/frontend/lib/python2.7/site-packages/mozpool/db/base.py", line 28, in singleton
    raise exceptions.NotFound
NotFound
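
For readers unfamiliar with the code path, the last frames of the traceback suggest a lookup of the server's own FQDN that expects exactly one matching row and raises NotFound otherwise. The following is only a minimal sketch of that pattern, not mozpool's actual code: the table layout, the `get_imaging_server_id` and `demo` helpers, and the example hostnames are all assumptions made for illustration, and it uses Python 3 with an in-memory SQLite database rather than mozpool's real database layer.

```python
import sqlite3


class NotFound(Exception):
    """Raised when a lookup that expects one row finds none."""


def singleton(rows):
    # Hypothetical stand-in for the helper named in the traceback:
    # return the single value, or raise NotFound when nothing matched.
    if not rows:
        raise NotFound
    return rows[0][0]


def get_imaging_server_id(conn, fqdn):
    # Look up the server's own row by FQDN (assumed column names).
    rows = conn.execute(
        "SELECT id FROM imaging_servers WHERE fqdn = ?", (fqdn,)
    ).fetchall()
    return singleton(rows)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE imaging_servers (id INTEGER PRIMARY KEY, fqdn TEXT UNIQUE)"
    )
    # Only the old server was ever written to the table.
    conn.execute(
        "INSERT INTO imaging_servers (fqdn) VALUES (?)",
        ("old-server.example.com",),
    )
    known = get_imaging_server_id(conn, "old-server.example.com")
    try:
        get_imaging_server_id(conn, "new-server.example.com")
        missing = False
    except NotFound:
        # A server whose FQDN was never synced cannot find itself.
        missing = True
    return known, missing
```

In this sketch, a server whose FQDN was never written to the table hits NotFound on every lookup, which would match the error repeating on each `_tick`.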
Severity: normal → major
They're failing to find themselves in the imaging_servers table.  The inventory sync should have taken care of that, assuming the servers appear in the pandas' k/v store.

And that assumption is false.  For example,
  https://inventory.mozilla.org/en-US/systems/show/6992/
panda-0033 has three 'system.imaging_server.0' keys, one of which is the old scl1 location.  When the keys are represented as a JSON dictionary, only one value per key can survive, and presumably it is the scl1 one.  So it's as if these values were never changed, and no pandas are assigned to the imaging server.
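
To illustrate the duplicate-key problem, here is a small sketch of how repeated keys collapse when parsed into a dictionary, and how they can be detected beforehand. The hostnames and the `duplicate_keys` helper are invented for the example, and the key order is chosen so that the stale value comes last; how inventory actually serializes these keys is not known from this bug. Python's `json` module keeps the last occurrence of a repeated key, but other serializers may behave differently.

```python
import json
from collections import Counter

# Hypothetical key/value export for one panda; hostnames are invented.
raw = (
    '{"system.imaging_server.0": "new-server-a.example.com",'
    ' "system.imaging_server.0": "new-server-b.example.com",'
    ' "system.imaging_server.0": "old-server.scl1.example.com"}'
)


def duplicate_keys(text):
    """Return the keys that occur more than once in a flat JSON object."""
    # object_pairs_hook=list preserves every (key, value) pair
    # instead of collapsing them into a dict.
    pairs = json.loads(text, object_pairs_hook=list)
    counts = Counter(key for key, _ in pairs)
    return sorted(key for key, n in counts.items() if n > 1)


# Plain parsing silently drops all but one value for the repeated key.
collapsed = json.loads(raw)
dupes = duplicate_keys(raw)

print(collapsed)
print(dupes)
```

A check along these lines before the sync would flag the panda for cleanup rather than silently picking one of the three values.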
Assignee: dustin → juber
Depends on: 1033350
Over to Amy to verify once the dependency bug is done.
Assignee: juber → arich
Even after the database was fixed, a mozpool code fix was still required before the database sync would work.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED