Closed Bug 1268131 Opened 9 years ago Closed 8 years ago

Better management of single-homed master services on hgssh

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Assigned: gps)

Attachments

(2 files)

Back in the day, we only had an sshd service running on hgssh. If the master server went down, the zlb failed over to the warm standby and all was well.

We now have a few other services running on the master (hgssh3 currently). These include pulsenotifier.service, which sends messages to Pulse and will soon be relied on by many consumers, including Firefox automation. It is important that only a single instance of pulsenotifier.service runs at a time; otherwise there may be race conditions, double posting, and other badness.

Furthermore, we don't have a good way of transitioning a server from standby to master. This would require a bunch of manually performed `systemctl enable` commands. Heck, we don't even have docs on what those commands should be.

We need to make it turnkey to change the "state" of a server from standby to master and vice versa.

Since we have systemd now, I was thinking we could establish a target unit, e.g. hgmaster.target. The systemd services that need to run on the master will have their dependencies tied to this target. So starting all services required of a master will only require `systemctl enable/start hgmaster.target` or something like that.

I haven't verified this, but I /think/ we can also create a "counter-target" that "blocks" services from running. What I was thinking here is we'd have a hgstandby.target that is mutually exclusive with hgmaster.target. This could somehow prevent all the master-only services from running.

systemd targets solve the turnkey part. We also need to ensure that two servers aren't both in the "master" state. If we do this naively, the master could go down, the standby could get promoted to master, and then when the old master comes back up it will start all its master services with it, leaving us with two copies of the master services running. No bueno.

The easy solution to this is to have the master target not start on boot and require a human to start it. We can provide some Ansible magic that ensures at most one server has the master target running.
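
The mutually exclusive targets floated in this comment could be sketched roughly as follows. This is an untested illustration, not what was deployed. The unit names hgmaster.target and hgstandby.target come from the comment; the file locations and descriptions are made up, and it assumes systemd's `Conflicts=` directive is the mechanism behind the "counter-target". Neither unit has an `[Install]` section, so neither can be enabled to start at boot; something has to start one explicitly.

```ini
# /etc/systemd/system/hgmaster.target (hypothetical location)
[Unit]
Description=Services that run only on the active hg master
# Starting this target stops hgstandby.target, and vice versa.
Conflicts=hgstandby.target

# /etc/systemd/system/hgstandby.target (hypothetical location)
[Unit]
Description=Marker target for a warm standby hg server
Conflicts=hgmaster.target
```

With units like these, `systemctl start hgstandby.target` would stop hgmaster.target. Whether the master-only services stop along with it depends on how they are bound to the target, which this sketch does not cover.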
Fencing. Whee. Excuse me while I have a bunch of RHEL and Solaris clustering flashbacks! How about querying Zeus to see which pool the hg.m.o VS is using?
I'm going to take a stab at this. Will submit reviews shortly.
Assignee: nobody → gps
Status: NEW → ASSIGNED
There are multiple services that need to run on the hg master server, and only on the active hg master server. We create a systemd target unit to control them as a group. The target has a condition on a file being present on the NFS mount that specifies the current active master. This should prevent the target from starting unless the server is the current master.

Review commit: https://reviewboard.mozilla.org/r/49315/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/49315/
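
For illustration, a target conditioned on a marker file, as this commit describes, might look roughly like the sketch below. It assumes `ConditionPathExists=` is the condition used. The path `/repo/hg/master.%H` is a hypothetical placeholder, since the bug does not give the real file name or NFS mount point. `%H` is systemd's specifier for the machine's host name.

```ini
# hg-master.target (sketch; the path below is an assumption)
[Unit]
Description=Mercurial master server services
# Skip activation unless the per-host marker file exists on the NFS mount.
# %H expands to this machine's host name.
ConditionPathExists=/repo/hg/master.%H
```

systemd evaluates conditions only when the start job runs. A unit whose condition fails is skipped rather than marked failed, and removing the file later does not stop a target that is already active.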
Attachment #8746223 - Flags: review?(klibby)
Attachment #8746224 - Flags: review?(klibby)
Our hg-master.target unit will now control the behavior of the various systemd services that should only run on the master. If we stop the hg-master.target unit, all master-related services should also stop.

Review commit: https://reviewboard.mozilla.org/r/49317/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/49317/
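
A master-only service tied to the target could be sketched along these lines. The `ExecStart` path and the description are placeholders, not taken from the real unit. The commit title only mentions `WantedBy`; `PartOf=` is added here as an assumption, because a `Wants` dependency only pulls the service in when the target starts and does not stop it when the target stops.

```ini
# pulsenotifier.service (sketch; ExecStart is a hypothetical path)
[Unit]
Description=Send Pulse notifications for hg pushes
# Propagate stop and restart of the target to this service.
PartOf=hg-master.target

[Service]
ExecStart=/usr/local/bin/pulsenotifier
Restart=always

[Install]
# "systemctl enable" links this unit into hg-master.target.wants/,
# so starting the target also starts this service.
WantedBy=hg-master.target
```

After `systemctl enable pulsenotifier.service`, starting hg-master.target would start the service, and with `PartOf=` in place, `systemctl stop hg-master.target` would stop it too.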
Comment on attachment 8746223
MozReview Request: ansible/hg-ssh: create hg-master.target systemd unit (bug 1268131); r?fubar
https://reviewboard.mozilla.org/r/49315/#review46329
Attachment #8746223 - Flags: review?(klibby) → review+
Comment on attachment 8746224
MozReview Request: ansible/hg-ssh: make hg master units WantedBy hg-master.target; r?fubar
https://reviewboard.mozilla.org/r/49317/#review46331
Attachment #8746224 - Flags: review?(klibby) → review+
Hmmm. This didn't do everything I expected. I'm going to follow up with some tweaks.
Blocks: 1263679
This landed.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED