Closed
Bug 1268131
Opened 9 years ago
Closed 8 years ago
Better management of single-homed master services on hgssh
Categories
(Developer Services :: Mercurial: hg.mozilla.org, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gps, Assigned: gps)
Attachments
(2 files)
Back in the day, we only had an sshd service running on hgssh. If the master server went down, the ZLB failed over to the warm standby and all was well.
We now have a few other services running on the master (hgssh3 currently). These include pulsenotifier.service, which sends messages to Pulse and will soon be relied on by many consumers, including Firefox automation.
It is important that we only have a single instance of pulsenotifier.service running at a time; otherwise there may be race conditions, double posting, and other badness.
Furthermore, we don't have a good way of transitioning one server from standby to master. This would require a bunch of manually performed `systemctl enable` commands. Heck, we don't even have docs on what commands those should be. We need to make it turnkey to change the "state" of a server from standby to master and vice versa.
Since we have systemd now, I was thinking we could establish a target unit, e.g. hgmaster.target. The systemd services that need to run on the master would have their dependencies tied to this target, so starting everything a master requires would only take `systemctl enable/start hgmaster.target` or something like that.
I haven't verified this, but I /think/ we can also create a "counter-target" that "blocks" services from running. What I was thinking here is we'd have a hgstandby.target that is mutually exclusive with hgmaster.target. This could somehow prevent all the master-only services from running.
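One way the "counter-target" idea might be expressed is with systemd's `Conflicts=` directive, which stops the listed unit when this one starts, and vice versa. The sketch below is only an illustration: the unit contents are hypothetical (only the names hgmaster.target and hgstandby.target come from the proposal above), and the Python helper `conflicts_of` is an invented name that just parses the text to show the relationship.

```python
from __future__ import annotations

import configparser

# Hypothetical unit contents. Only the unit names come from the proposal;
# nothing here is taken from a deployed configuration.
HGMASTER_TARGET = """\
[Unit]
Description=Services that run only on the active hg master
Conflicts=hgstandby.target
"""

HGSTANDBY_TARGET = """\
[Unit]
Description=Marks this host as a warm standby
Conflicts=hgmaster.target
"""


def conflicts_of(unit_text: str) -> list[str]:
    """Return the units named in Conflicts= in a unit file's [Unit] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(unit_text)
    return parser.get("Unit", "Conflicts", fallback="").split()
```

Note that `Conflicts=` by itself would only stop the other target. The master-only services would stop along with it only if they are also bound to the master target in some way, for example via `PartOf=`.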
systemd targets solve the turnkey part, but we still need to ensure that 2 servers aren't both in the "master" state. If we do this naively, the master could go down, the standby could get promoted to master, and then when the old master comes back up it will start all its master services again, leaving us with 2 copies of the master services running. No bueno. The easy solution is to have the master target not start on boot and require a human to start it. We can provide some Ansible magic that ensures at most 1 server has the master target running.
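The "at most 1 server" rule can be sketched as a small check. This is not the Ansible logic itself, just an illustration of the invariant: it assumes some tooling has already collected, per host, the one-word state that `systemctl is-active hgmaster.target` prints. The function names and the host name "hgssh-standby" are hypothetical.

```python
from __future__ import annotations


def active_masters(target_state: dict[str, str]) -> list[str]:
    """Return the hosts whose master target is reported as "active"."""
    return sorted(host for host, state in target_state.items() if state == "active")


def check_at_most_one_master(target_state: dict[str, str]) -> str | None:
    """Return the single active master, or None if no host is active.

    Raises RuntimeError when more than one host reports the target active,
    which is the double-master situation described above.
    """
    masters = active_masters(target_state)
    if len(masters) > 1:
        raise RuntimeError("multiple active masters: " + ", ".join(masters))
    return masters[0] if masters else None
```

For example, `check_at_most_one_master({"hgssh3": "active", "hgssh-standby": "inactive"})` returns `"hgssh3"`, while marking both hosts "active" raises an error instead of letting a promotion proceed.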
Comment 1•9 years ago
fencing. whee. excuse me while I have a bunch of RHEL and Solaris clustering flashbacks!
how about querying zeus to see which pool the hg.m.o VS is using?
Comment 2 (Assignee)•9 years ago
I'm going to take a stab at this.
Will submit reviews shortly.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Comment 3 (Assignee)•9 years ago
There are multiple services that need to run on the hg master server and
only the active hg master server. We create a systemd target unit to
control them as a group.
The target has a condition on a file being present on the NFS mount that
specifies the current active master. This should prevent the target from
starting unless it is the current master server.
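The check described here can be pictured as follows. In a unit file it would presumably be a `ConditionPathExists=` line; the Python below only mirrors that test so the behaviour is easy to see. The marker name "master.<hostname>" and the function name are assumptions for illustration, since the actual file name and mount path are not given in this bug.

```python
from pathlib import Path


def is_designated_master(shared_dir: str, hostname: str) -> bool:
    """True if the per-host marker file exists on the shared mount."""
    # Assumption: the marker is named "master.<hostname>". The real file
    # name and location are not stated in this bug.
    return (Path(shared_dir) / f"master.{hostname}").is_file()
```

Because every server sees the same shared mount, moving the marker from one host name to another is what would change which server is allowed to start the target.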
Review commit: https://reviewboard.mozilla.org/r/49315/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/49315/
Attachment #8746223 - Flags: review?(klibby)
Attachment #8746224 - Flags: review?(klibby)
Comment 4 (Assignee)•9 years ago
Our hg-master.target unit will now control behavior of the various
systemd services that should only run on the master. If we stop the
hg-master.target unit, all master-related services should also stop.
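For readers unfamiliar with the directives involved, here is a rough sketch of what a master-only service unit could look like. `WantedBy=` in `[Install]` makes `systemctl enable` hook the service into the target so it is pulled in when the target starts; it does not by itself stop the service when the target stops. `PartOf=` is one systemd directive that propagates stop and restart from the target to the service. Whether the reviewed patches use `PartOf=` is not shown in this bug, so treat that line, the helper name, and the example command path as assumptions.

```python
def render_master_service(description: str, exec_start: str,
                          target: str = "hg-master.target") -> str:
    """Render a minimal, hypothetical service unit tied to the master target."""
    lines = [
        "[Unit]",
        f"Description={description}",
        # Assumption: PartOf= is used so stopping the target stops this unit.
        f"PartOf={target}",
        "",
        "[Service]",
        f"ExecStart={exec_start}",
        "",
        "[Install]",
        # Enabling the unit links it into the target's .wants directory.
        f"WantedBy={target}",
    ]
    return "\n".join(lines) + "\n"
```

Calling `print(render_master_service("Notify Pulse of pushes", "/path/to/pulsenotifier"))` prints the unit text; the `ExecStart` path there is a placeholder, not the real command.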
Review commit: https://reviewboard.mozilla.org/r/49317/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/49317/
Comment 5•9 years ago
Comment on attachment 8746223 [details]
MozReview Request: ansible/hg-ssh: create hg-master.target systemd unit (bug 1268131); r?fubar
https://reviewboard.mozilla.org/r/49315/#review46329
Attachment #8746223 - Flags: review?(klibby) → review+
Comment 6•9 years ago
Comment on attachment 8746224 [details]
MozReview Request: ansible/hg-ssh: make hg master units WantedBy hg-master.target; r?fubar
https://reviewboard.mozilla.org/r/49317/#review46331
Attachment #8746224 - Flags: review?(klibby) → review+
Comment 7 (Assignee)•9 years ago
Hmmm. This didn't do everything I expected. I'm going to follow up with some tweaks.
Comment 8 (Assignee)•9 years ago
https://hg.mozilla.org/hgcustom/version-control-tools/rev/281f47d24ccec41022f81ac8d5fdb343ec28d8e6
ansible/hg-ssh: add AssertPathExists on all units tied to master (bug 1268131)
Comment 9 (Assignee)•9 years ago
https://hg.mozilla.org/hgcustom/version-control-tools/rev/324d7d53694ac2ca587af1264ae0d1b9805a931d
docs: document hg-master.target systemd unit (bug 1268131)
Comment 10 (Assignee)•8 years ago
This landed.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED