Closed Bug 697269 Opened 13 years ago Closed 13 years ago

modify hg-mirrors sync check to operate on a per-repo basis

Categories

(Infrastructure & Operations :: RelOps: General, task)

Platform: x86 macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: arich, Assigned: dustin)

Details

Right now the hg-mirrors check looks at all the repos at once. We would be better served by checking each repo individually. Suppose one repo falls out of sync (because of an ongoing push) and recovers, but another falls out of sync before the next check (because of a different ongoing push). The combined check then fails every time and notifies us even though no single repo ever stayed out of sync (a false positive). Per-repo checks would not alert in that case. Handing this to dustin to whip up a new check.
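To make the false positive concrete, here is a small simulation in Python (the language of the check script). It assumes, hypothetically, that an alert only fires after three consecutive failing checks; the repo names and sync histories are invented for illustration and are not taken from the real mirrors.

```python
from typing import Dict, List

# Hypothetical retry threshold: a service only alerts after this many
# consecutive failing checks (similar in spirit to Nagios max_check_attempts).
MAX_ATTEMPTS = 3


def alerts(results: List[bool], max_attempts: int = MAX_ATTEMPTS) -> bool:
    """Return True if `results` (True = in sync) contains `max_attempts`
    consecutive failures, i.e. the check would have fired an alert."""
    streak = 0
    for ok in results:
        streak = 0 if ok else streak + 1
        if streak >= max_attempts:
            return True
    return False


def combined(history: Dict[str, List[bool]]) -> List[bool]:
    """Old behaviour: one check that passes only if every repo is in sync."""
    n_checks = len(next(iter(history.values())))
    return [all(states[i] for states in history.values()) for i in range(n_checks)]


# Hypothetical sync state of three repos at three successive check times.
# Each repo is out of sync for only one check (during its own push), but the
# pushes are staggered so that some repo is always lagging.
history = {
    "repo-a": [False, True, True],
    "repo-b": [True, False, True],
    "repo-c": [True, True, False],
}

if __name__ == "__main__":
    print("combined check alerts:", alerts(combined(history)))
    for repo, states in history.items():
        print(repo, "alerts:", alerts(states))
```

The combined check sees three failures in a row and alerts, while each per-repo check sees only one failure and stays quiet.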
I added the following as check_hg_mirror (separate from the existing check_hg_mirrorsync). This is committed in puppet, but the nagios checks are still using check_hg_mirrorsync, so no alerts should occur. I'll make the nagios changes next week.

```python
from __future__ import print_function

import argparse
import os
import sys
import time

import yaml

parser = argparse.ArgumentParser(description='check status of a mirrored hg repo')
parser.add_argument('-r', dest='repo', required=True, help='repo to check')
parser.add_argument('-W', dest='warning', type=int, default=200,
                    help='data age (s) for WARNING')
parser.add_argument('-C', dest='critical', type=int, default=300,
                    help='data age (s) for CRITICAL')
args = parser.parse_args()

# per-repo state, read from check_hg_mirrorsync's state file
datafile = "/dev/shm/check_hg_mirrorsync/state"
statd = os.stat(datafile)
now = time.time()
data_age = now - statd.st_mtime  # seconds since the state file was last modified
with open(datafile) as f:
    data = yaml.safe_load(f)

if args.repo not in data:
    print("'%s' not a known repo" % args.repo)
    sys.exit(3)  # UNKNOWN

# check sync before data age, since it's more important
master_tip = data[args.repo]['upstream_tip']
local_tip = data[args.repo]['mirror_tip']
if master_tip != local_tip:
    print("repo '%s' is out of sync" % args.repo)
    sys.exit(2)  # CRITICAL

if data_age > min(args.warning, args.critical):
    print("sync data is stale. %i seconds" % data_age)
    if data_age > args.critical:
        sys.exit(2)  # CRITICAL
    sys.exit(1)  # WARNING
else:
    print("SYNC OK")
    sys.exit(0)
```
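The decision logic of check_hg_mirror can be factored into a pure function so it is testable without the /dev/shm state file. This is a sketch, not the committed script: the function name `evaluate` and the sample state are hypothetical, and the state layout (repo name mapped to `upstream_tip` and `mirror_tip`) is inferred only from the keys the script reads. The exit codes 0 to 3 are the standard Nagios plugin return codes.

```python
from typing import Any, Dict, Tuple

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(data: Dict[str, Dict[str, Any]], repo: str, data_age: float,
             warning: int = 200, critical: int = 300) -> Tuple[int, str]:
    """Hypothetical pure-function version of check_hg_mirror's decision logic.

    Returns (exit_code, message) instead of printing and calling sys.exit().
    """
    if repo not in data:
        return UNKNOWN, "'%s' not a known repo" % repo
    # Sync is checked before data age: an out-of-sync repo is CRITICAL
    # even if the state file is also stale.
    if data[repo]["upstream_tip"] != data[repo]["mirror_tip"]:
        return CRITICAL, "repo '%s' is out of sync" % repo
    if data_age > min(warning, critical):
        message = "sync data is stale. %i seconds" % data_age
        return (CRITICAL if data_age > critical else WARNING), message
    return OK, "SYNC OK"


# Made-up state in the shape the script reads: repo -> {upstream_tip, mirror_tip}.
state = {
    "repo-a": {"upstream_tip": "1a2b3c", "mirror_tip": "1a2b3c"},
    "repo-b": {"upstream_tip": "4d5e6f", "mirror_tip": "0f9e8d"},
}

if __name__ == "__main__":
    for repo, age in [("repo-a", 60), ("repo-a", 250), ("repo-a", 400),
                      ("repo-b", 60), ("repo-c", 60)]:
        print(repo, age, evaluate(state, repo, age))
```

With the default thresholds (-W 200, -C 300), repo-a is OK at 60 s, WARNING at 250 s and CRITICAL at 400 s; repo-b is CRITICAL at any age because its tips differ; repo-c is UNKNOWN because it is not in the state data.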
These are starting to go green in nagios.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations