Closed Bug 486357 Opened 16 years ago Closed 13 years ago

IT should monitor that all machines hosting AMO are at the same revision

Categories

(mozilla.org Graveyard :: Server Operations: Projects, task)

All
Other
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: clouserw, Assigned: ashish)

References

Details

Bug 481007 is checking to make sure that git doesn't trip on itself and break a push to our servers; however, we have no check for whether a server is being updated at all. We pushed the s/add-on/change-around/ code yesterday, and mrapp09 never got updated until reed put it back in "lvs-cluster.conf" (I don't know the details of what that means). During a major push like the one we are doing next week, that would really mess things up. There should be a check to make sure all boxes hosting AMO are actually at the same revision. (This bug is specific to AMO. I'm not sure how other sites are set up, but if this could benefit them too, we should expand the description to include them.)
mrapp09 got updated, but because it wasn't in lvs-cluster.conf, it didn't have Apache restarted (at least, that's what I think). I'm running a manual update on mrapp09 to check.
Proposal for app server nagios check:
    mradm02_list = last n (4?) git revisions on mradm02 (git-log -n 4 --pretty=format:%H)
    if app server's current revision (git-log -n 1 --pretty=format:%H) in mradm02_list:
        return OK
    else:
        return CRITICAL
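A minimal Python sketch of the proposed check, assuming both checkouts are reachable as local git repositories (how the app server's revision actually reaches the admin host, e.g. over ssh or NRPE, is left open). The function names are hypothetical; it uses the modern `git log` spelling and the standard Nagios plugin exit codes (0 = OK, 2 = CRITICAL).

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def recent_revisions(repo_path, n=4):
    """Return the last n commit hashes in repo_path, newest first."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "-n", str(n), "--pretty=format:%H"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()


def current_revision(repo_path):
    """Return the hash of the commit currently checked out in repo_path."""
    return recent_revisions(repo_path, n=1)[0]


def check_revision(app_rev, admin_revs):
    """OK if the app server's revision is among the admin host's recent ones."""
    if app_rev in admin_revs:
        return OK, f"OK: app server at {app_rev[:12]}"
    return CRITICAL, (
        f"CRITICAL: app server at {app_rev[:12]}, "
        f"not in last {len(admin_revs)} admin revisions"
    )
```

Allowing any of the last n revisions (rather than requiring an exact match) keeps the check from firing while a push is still rolling out, at the cost of not catching a box that is only a few commits behind.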
Assignee: server-ops → oremj
Whiteboard: Blocked on push improvements
Depends on: 488762
Component: Server Operations → Server Operations: Projects
I'm tempted to wontfix this because, as described, it won't really fit our new way of pushing, but I'm pulling people in to take note of it. As we refine our push process for rolling updates, a blanket check would fail many times during the update. However, whatever process we end up with needs to ensure that all webheads were properly updated in the end. As far as I know this has not been an issue for a long time, so we will roll this into our new push process rather than making a blanket monitor.
Do we still care about implementing this monitor?
Not until it happens again. ;) Our pushes have been pretty solid lately, but it'd be nice to just check the git version of the checkout on each webhead (a very fast command). Except you don't sync the git dirs over... umm. Oh, https://addons.mozilla.org/media/git-rev.txt. Just compare that file (on disk) on each webhead every 15 minutes. It's a fast operation and doesn't need to happen often, but if it ever doesn't match, it's a big deal.
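Comparing git-rev.txt across webheads could look roughly like this sketch. It assumes the per-webhead file contents have already been collected into a dict keyed by hostname (the collection step and the hostnames are hypothetical), and it simply groups hosts by revision so any disagreement is visible at a glance.

```python
from pathlib import Path


def read_rev(path):
    """Read a git-rev.txt file and return the revision string it holds."""
    return Path(path).read_text().strip()


def group_by_revision(webhead_revs):
    """Map each distinct revision to the sorted list of webheads reporting it."""
    groups = {}
    for host, rev in webhead_revs.items():
        groups.setdefault(rev.strip(), []).append(host)
    return {rev: sorted(hosts) for rev, hosts in groups.items()}


def all_in_sync(webhead_revs):
    """True when every webhead reports the same revision."""
    return len(group_by_revision(webhead_revs)) <= 1
```

With no reference host, this only says whether the webheads agree with each other; it can't tell which group is the stale one.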
Whiteboard: Blocked on push improvements
Added this to nagios as a cluster check. It checks /data/addons/www/addons.mozilla.org/zamboni/media/git-rev.txt on the admin host against http://addons.mozilla.org/media/git-rev.txt on each webhead (via a Host header) and sends out an alert if they don't match.
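The check described above can be sketched in Python as follows. Only the admin-side path and the URL come from this bug. The webhead list, the 10-second timeout, and treating an unreachable webhead as a mismatch are assumptions; the exit codes follow the standard Nagios plugin convention.

```python
import urllib.request
from pathlib import Path

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2
ADMIN_REV_FILE = "/data/addons/www/addons.mozilla.org/zamboni/media/git-rev.txt"


def fetch_webhead_rev(webhead, host_header="addons.mozilla.org", timeout=10):
    """Fetch git-rev.txt from one webhead, sending the site's Host header."""
    req = urllib.request.Request(
        f"http://{webhead}/media/git-rev.txt", headers={"Host": host_header}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode().strip()


def compare(admin_rev, webhead_revs):
    """CRITICAL if any webhead's revision differs from the admin host's."""
    stale = sorted(h for h, r in webhead_revs.items() if r != admin_rev)
    if stale:
        return CRITICAL, "CRITICAL: git-rev.txt mismatch on " + ", ".join(stale)
    return OK, f"OK: {len(webhead_revs)} webheads at {admin_rev}"


def run_check(webheads):
    """Read the admin copy from disk, fetch each webhead's copy, compare."""
    admin_rev = Path(ADMIN_REV_FILE).read_text().strip()
    webhead_revs = {}
    for w in webheads:
        try:
            webhead_revs[w] = fetch_webhead_rev(w)
        except OSError as exc:  # URLError and timeouts are OSError subclasses
            # An unreachable webhead is reported as a mismatch.
            webhead_revs[w] = f"<unreachable: {exc}>"
    return compare(admin_rev, webhead_revs)
```

A Nagios wrapper would call `run_check(...)`, print the message, and exit with the returned code.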
Assignee: oremj → ashish
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard