Closed Bug 1237811 Opened 5 years ago Closed 2 years ago

Establish a unified log of hg.mo events

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Assigned: gps)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

This bug is about building a unified log of hg.mo events that is derived from the replication log. It will be used to power publishing to Pulse, creating an aggregate Gecko repo, etc. More details will be provided in the commit messages.
Blocks: 1212002
No longer blocks: 1108729
Blocks: 1264999
Depends on: 1265831
fubar: can you please add a Nagios monitor to the hgssh master server for a process with "vcsreplicator-aggregator /etc/mercurial/pushdataaggregator.ini" in its arguments? The full command line is "/var/hg/venv_tools/bin/python2.7 /var/hg/venv_tools/bin/vcsreplicator-aggregator /etc/mercurial/pushdataaggregator.ini", but the actual Python path may change over time. There should be at most 1 process.
Flags: needinfo?(klibby)
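For illustration, the requested check could be sketched roughly as below. This is a minimal sketch, not the check that was deployed: the function names are hypothetical, and treating 0 processes as CRITICAL is an assumption (the request only states that there should be at most 1). It matches on the argument substring rather than the full command line, since the Python path may change.

```python
import subprocess

# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Argument substring from the request; the interpreter path is deliberately
# left out because it may change over time.
NEEDLE = "vcsreplicator-aggregator /etc/mercurial/pushdataaggregator.ini"


def count_matching(ps_lines, needle=NEEDLE):
    """Count command lines that contain the needle."""
    return sum(1 for line in ps_lines if needle in line)


def evaluate(count):
    """Map a process count to (exit_code, message).

    Assumption: exactly 1 process is healthy, 0 or more than 1 is CRITICAL.
    """
    if count == 1:
        return OK, "OK: 1 aggregator process"
    if count == 0:
        return CRITICAL, "CRITICAL: aggregator not running"
    return CRITICAL, "CRITICAL: %d aggregator processes, expected at most 1" % count


def list_command_lines():
    """Return the full command line of every process, via `ps -eo args`."""
    out = subprocess.check_output(["ps", "-eo", "args"])
    return out.decode("utf-8", "replace").splitlines()


def main():
    """Print the status line and return the Nagios exit code."""
    code, message = evaluate(count_matching(list_command_lines()))
    print(message)
    return code
```

To run it as a plugin, the script would end with `sys.exit(main())` so that Nagios/NRPE sees the exit code.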
I can, though what happens when we have a failover event? There isn't any automated way for Nagios to know which machine is the master at any given time, afaik.
Flags: needinfo?(klibby)
We could have Puppet or something put a file on the machine indicating which machine is master, or whether the current machine is master. Then we could write custom Nagios checks that take the master into consideration. For example, if the current machine isn't the master, the Nagios check verifies that 0 processes are present.

This all draws more attention to the fact that our failover situation is far from robust...
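A master-aware check along those lines might look something like the sketch below. The marker file path and the function names are hypothetical. The sketch assumes the simplest variant of the idea, where the file exists only on the master, and leaves process counting to the caller.

```python
import os

# Nagios plugin exit codes (standard plugin convention).
OK, CRITICAL = 0, 2

# Hypothetical marker file that Puppet (or similar) would create only on
# the current master. The real path, if any, is not given in this bug.
MASTER_MARKER = "/etc/mercurial/is-master"


def expected_count(marker_path=MASTER_MARKER):
    """Return 1 if this machine is marked as master, otherwise 0."""
    return 1 if os.path.exists(marker_path) else 0


def evaluate(count, expected):
    """Compare the observed process count with the expected one."""
    if count == expected:
        return OK, "OK: %d aggregator process(es), as expected" % count
    return CRITICAL, "CRITICAL: %d aggregator process(es), expected %d" % (
        count,
        expected,
    )
```

With this shape, the same check can be deployed to every hgssh host: the master alerts when the aggregator is missing, and non-masters alert when one is unexpectedly running.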
Meh. Between this and 1196915, I'll just make a hostgroup for just hgssh3 for the short term.

Added a 'procs - hg vcsreplicator aggregator' NRPE check. It will need a mana page and docs on what to do when it goes off.
QA Contact: hwine → klibby

This has been fixed for ages.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED