Closed Bug 1470606 Opened 6 years ago Closed 6 years ago

Expose repository data more atomically

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Assigned: gps)

References

Details

Attachments

(7 files)

The hg.mozilla.org replication mechanism has a known deficiency: race conditions during replication open a window in which each hgweb machine can expose different repository state.

This is described at https://hg.mozilla.org/hgcustom/version-control-tools/file/536fa31ce353/docs/hgmo/replication.rst#l206.

The mirror inconsistency is known to cause issues like bug 1462323.

Ideally, we want each hgweb machine to expose new data atomically with every other machine: even if one hgweb machine is slow to replicate/receive new data, the fast machines shouldn't expose the new data until the slowest replicator has it.

Since we're using separate filesystems to store repository data, we need another mechanism to keep the fully replicated data in sync.

In terms of ideal atomicity guarantees, the solution to this problem is to use a network service to hold the fully replicated state. This service can be centralized (e.g. SQL) or decentralized (e.g. ZooKeeper), or some kind of hybrid (like Redis).

However, we've lived without a "proper" solution for ~3 years and it hasn't caused many problems. Only after we swapped in slow machines, which increased the replication inconsistency window from tens or hundreds of milliseconds to multiple seconds, did it become a problem. I'm sure failures due to inconsistency did occur. But the failure rate was perceived to be so low that it wasn't a problem.

Given that a homogeneous server environment with a non-atomic window of milliseconds seemed to cause few problems, there is a much simpler solution on the table that doesn't require a new network service and point of failure. After connor and I consulted with bbangert, we decided that simplicity won out and went with that solution.

The proposed solution is to build on top of the existing Kafka-based replication system. We already have a process on the hgssh server that republishes Kafka/replication messages once all consumers have acknowledged / applied that message. This Kafka topic is what the Pulse and SNS notifiers are built on top of.

Our plan is to introduce a new Kafka consumer on each hgweb machine to consume the "fully replicated" Kafka topic. When it receives a message, it will atomically write a local file containing the state of the repo replicated up to that message (the file likely contains the set of heads in the changelog). We will introduce a repoview/filter that takes this file into account and serves a repository accordingly.

Essentially:

1) Push occurs and replication messages written to Kafka
2) Each hgweb machine applies replication messages and acknowledges Kafka offset
3) Once each hgweb consumer has applied a message, that message is republished in a new topic
4) Each hgweb machine pulls republished message in near real time and updates state on local filesystem
5) hgweb process starts exposing new repository state
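The critical invariant in steps 2-3 is that a message gets republished only once every per-machine consumer has acknowledged it. Stripped of the Kafka plumbing, the aggregator's decision reduces to something like the following sketch (function names are illustrative, not the actual vcsreplicator API):

```python
def fully_replicated_offset(consumer_offsets):
    """A message is fully replicated only once *every* consumer has
    acknowledged it, i.e. up to the minimum acknowledged offset."""
    return min(consumer_offsets.values())


def messages_to_republish(log, consumer_offsets, republished_through):
    """Return messages newly safe to copy into the fully-replicated
    topic. `log` is the topic's message list indexed by offset;
    `republished_through` is the last offset already republished."""
    safe = fully_replicated_offset(consumer_offsets)
    return log[republished_through + 1:safe + 1]
```

A single slow consumer holds back the minimum, so no machine starts serving data the slowest replicator hasn't applied yet.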

There are still race conditions on the hgweb servers due to them not processing Kafka messages at the same rate. But from our experience with our low-volume Kafka cluster, messages are delivered across machines within a very tight window. As long as that holds and the work being done by the consumer is simple (e.g. writing a file) and that process is prioritized accordingly so it can't be preempted by other work on the system, the inconsistency window should be very short (single digit milliseconds) and the chances of anyone hitting a race condition should be very low.

We believe the simplicity of this design (building on top of existing infrastructure) - while not resulting in the ideal atomic guarantees - is a better trade-off than the complexity involved with standing up a new service to hold the atomic state. We can likely implement this solution in a day or 2 of work since it builds on top of existing services.

FWIW, once hgweb is moved to AWS or GCP, a hosted service like DynamoDB or Athena may be acceptable. We just don't want to stand up a new service in a Mozilla datacenter, especially since we're in the middle of migrating datacenters and plan to eventually move to a non-physical datacenter. It's not worth the hassle of standing up a new network service in Mozilla's datacenter, nor the latency and reliability concerns of making network requests outside the datacenter, just to provide the atomic replication feature.

glob: I would appreciate an evaluation of my thoughts here. I can hop on a video call to discuss in more detail. Service architecture decision making is hard to do in text!
Flags: needinfo?(glob)
as per our video chat, this looks good to me.
Flags: needinfo?(glob)
Depends on: 1471732
The docs change capture what will be implemented in subsequent commits.
This will allow us to more easily introduce another consumer that
reuses most of the existing code but handles individual messages
differently (e.g. processes a different set of messages).
INCOMPLETE: TESTS CURRENTLY FAIL. DO NOT APPROVE.

Our current replication mechanism is essentially:

1) (hgssh) Write message to pushdata topic
2) (hgweb) Apply message in pushdata, expose data to hgweb
3) (hgssh) Copy message to replicatedpushdata topic
4) (hgssh) React to fully replicated messages in replicatedpushdata

Our new mechanism to reduce the inconsistency window on hgweb machines
will involve a 2-phase replication:

1) (hgssh) Write message to pushdata topic
2) (hgweb) Apply message from pushdata
3) (hgssh) Copy message to replicatedpushdatapending topic
4a) (hgweb) Write updated heads from message in replicatedpushdatapending
4b) (hgweb) Start serving new heads via hgweb
5) (hgssh) Copy message to replicatedpushdata
6) (hgssh) React to fully replicated messages in replicatedpushdata

Essentially, we're inserting a new topic, "replicatedpushdatapending",
between "pushdata" and "replicatedpushdata".

This commit implements that new middle topic.

In message processing order, the following new functionality was
implemented:

* The "replicatedpushdatapending" topic is created in Ansible and in
  our test environment.
* A new "pushdataaggregator-pending" service has been added to hgssh. It
  works just like pushdataaggregator except it copies messages to
  "replicatedpushdatapending" instead of "replicatedpushdata." A nagios
  check for the new service is included.
* A new `vcsreplicator-headsconsumer` process has been added. It calls
  a function that no-ops on all messages (functionality will be
  implemented in a later commit). It shares most code with
  `vcsreplicator-consumer`.
* A new systemd service has been added to run
  `vcsreplicator-headsconsumer` on hgweb. It monitors the
  "replicatedpushdatapending" topic.
* The "pushdataaggregator" service on hgssh has been changed to monitor
  "replicatedpushdatapending" instead of "pushdata."
Comment on attachment 8988639 [details]
docs: document planned replication improvements to reduce inconsistency window (bug 1470606); r?glob, sheehan

Connor Sheehan [:sheehan] has approved the revision.

https://phabricator.services.mozilla.com/D1870
Attachment #8988639 - Flags: review+
Comment on attachment 8988640 [details]
vcsreplicator: allow different message processing functions to be used (bug 1470606); r?glob, sheehan

Connor Sheehan [:sheehan] has approved the revision.

https://phabricator.services.mozilla.com/D1872
Attachment #8988640 - Flags: review+
fubar: I just wanted to give a heads up that we'll be introducing 2 new daemon processes as part of this bug (1 on hgweb and another on hgssh). We'll need to hook up some Nagios monitoring of those. We may also need to tweak existing Nagios monitoring.
Attachment #8988641 - Attachment description: vcsreplicator: replicate messages to replicatedpushdatapending (bug 1470606) → vcsreplicator: replicate messages to replicatedpushdatapending (bug 1470606); r?glob, sheehan
The previous commit established a new aggregated "pending" topic
for all messages that had previously been replicated along with
a new consumer daemon for processing messages in it.

This commit teaches that new consumer daemon to handle the
"hg-heads-1" message. Whenever we receive a message listing
repository heads, we write a file in the repository containing
that list of heads.

We first write out to a temporary file then do an atomic rename.
This ensures that readers see an atomic snapshot of the file content.
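The write-then-rename pattern described above can be sketched as follows (the `.hg/replicated-heads` path is from this bug; the helper name and one-head-per-line format are assumptions):

```python
import os
import tempfile


def write_heads_file(repo_path, heads):
    """Atomically publish .hg/replicated-heads: write a temp file in
    the same directory, fsync it, then rename over the destination.
    rename() is atomic on POSIX when source and target are on the same
    filesystem, so readers see either the old file or the complete new
    one, never a partial write."""
    final = os.path.join(repo_path, '.hg', 'replicated-heads')
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(final),
                               prefix='replicated-heads.')
    try:
        with os.fdopen(fd, 'w') as fh:
            fh.write('\n'.join(heads) + '\n')
            fh.flush()
            os.fsync(fh.fileno())
        os.rename(tmp, final)
    except Exception:
        os.unlink(tmp)
        raise
```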

A test for the new functionality has been added. It foreshadows
hgweb using the heads file to determine which changesets should
be exposed.
INCOMPLETE: there are still some test failures

Now that we write out a file containing the set of fully replicated
heads in a repository, it's time to put it to use.

This commit implements a new Mercurial extension that provides a new
repoview/filter that takes the replicated-heads file into account.

A repoview is simply a function that returns a set of revisions that
should be hidden. By default, hgweb uses the "served" repoview. This
view filters out hidden/obsolete and secret revisions.

We implement a "replicatedserved" repoview. It starts by collecting
the revisions filtered by the "served" repoview. If the
".hg/replicated-heads" file exists, we perform a DAG traversal
starting at all DAG heads and stopping at revisions specified in
this file and revisions hidden by the "served" repoview. These
revisions are unioned with the revisions from the "served" repoview.
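As a rough sketch of that traversal, using a plain parent map instead of Mercurial's revlog API (all names here are illustrative, and this ignores merge-DAG corner cases the real filter must handle):

```python
def replicated_served_hidden(parents, dag_heads, served_hidden,
                             replicated_heads):
    """Revisions hidden by a hypothetical 'replicatedserved' filter:
    everything reachable from the current DAG heads but not yet at or
    below a fully replicated head, unioned with what 'served' hides."""
    hidden = set(served_hidden)
    stack = [head for head in dag_heads if head not in replicated_heads]
    while stack:
        rev = stack.pop()
        if rev in hidden:
            continue
        hidden.add(rev)
        # Stop descending once we reach a fully replicated head.
        for parent in parents.get(rev, ()):
            if parent not in replicated_heads:
                stack.append(parent)
    return hidden
```

For a linear history 0-1-2-3 where only head 2 is fully replicated, revision 3 is hidden until the replicated-heads file advances.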
Comment on attachment 8988639 [details]
docs: document planned replication improvements to reduce inconsistency window (bug 1470606); r?glob, sheehan

Byron Jones ‹:glob› has approved the revision.

https://phabricator.services.mozilla.com/D1870
Attachment #8988639 - Flags: review+
Comment on attachment 8988950 [details]
vcsreplicator: only expose replicated revisions via hgweb (bug 1470606); r?glob, sheehan

Byron Jones ‹:glob› has approved the revision.

https://phabricator.services.mozilla.com/D1894
Attachment #8988950 - Flags: review+
Our intent is to use the heads message as a coordination event to
control what data is exposed on hgweb machines. As part of testing
this eventual mechanism, I discovered that the pushlog needs to
be integrated with this new mechanism because it assumes hidden
changesets are obsolete.

This commit adds the last push ID from the pushlog to the heads
replication message, if available.

Strictly speaking, this change is backwards incompatible and we
should introduce a new message to convey the new data. However,
nobody is using the heads message yet. So we can get away with this
change without introducing a new message type.
Attachment #8988950 - Attachment description: vcsreplicator: only expose replicated revisions via hgweb (bug 1470606) → vcsreplicator: only expose replicated revisions via hgweb (bug 1470606); r?glob, sheehan
Comment on attachment 8989559 [details]
vcsreplicator: add last push ID to heads message (bug 1470606); r?sheehan, glob

Connor Sheehan [:sheehan] has approved the revision.

https://phabricator.services.mozilla.com/D1931
Attachment #8989559 - Flags: review+
Comment on attachment 8988640 [details]
vcsreplicator: allow different message processing functions to be used (bug 1470606); r?glob, sheehan

Byron Jones ‹:glob› has approved the revision.

https://phabricator.services.mozilla.com/D1872
Attachment #8988640 - Flags: review+
Comment on attachment 8988641 [details]
vcsreplicator: replicate messages to replicatedpushdatapending (bug 1470606); r?glob, sheehan

Byron Jones ‹:glob› has approved the revision.

https://phabricator.services.mozilla.com/D1873
Attachment #8988641 - Flags: review+
Comment on attachment 8988949 [details]
vcsreplicator: implement heads message handling (bug 1470606); r?glob, sheehan

Byron Jones ‹:glob› has approved the revision.

https://phabricator.services.mozilla.com/D1893
Attachment #8988949 - Flags: review+
Comment on attachment 8988641 [details]
vcsreplicator: replicate messages to replicatedpushdatapending (bug 1470606); r?glob, sheehan

Connor Sheehan [:sheehan] has approved the revision.

https://phabricator.services.mozilla.com/D1873
Attachment #8988641 - Flags: review+
The replicated-data file contains the set of heads that are fully
replicated along with the most recently replicated pushlog ID.

This commit teaches the pushlog extension to take this data into
account.

Pushlog queries via hgweb now filter out unreplicated pushes by
default. In other words, the pushlog won't expose updated data until
it has been fully replicated. This should significantly narrow the
window of inconsistency that pushlog pollers see.
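Conceptually, the filtering amounts to dropping any push whose ID exceeds the last fully replicated push ID. A sketch, with the real pushlog storage and query layer abstracted away (names are illustrative):

```python
def visible_pushes(pushes, last_replicated_push_id):
    """Filter a pushlog to entries fully replicated everywhere.
    `pushes` maps push ID -> push data. When no replication data is
    available, expose everything (the pre-existing behavior)."""
    if last_replicated_push_id is None:
        return dict(pushes)
    return {push_id: push for push_id, push in pushes.items()
            if push_id <= last_replicated_push_id}
```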

test-served-heads.t changes demonstrate the new, desired behavior.
fubar: the patches in this bug will require some new nagios alerts on hgssh and hgweb. What do you need from me to get those running?

(We're on track to have this deployed as early as Monday and we need monitoring once it is deployed, otherwise the replication system could silently break.)
Flags: needinfo?(klibby)
Comment on attachment 8988949 [details]
vcsreplicator: implement heads message handling (bug 1470606); r?glob, sheehan

Connor Sheehan [:sheehan] has approved the revision.

https://phabricator.services.mozilla.com/D1893
Attachment #8988949 - Flags: review+
Comment on attachment 8988950 [details]
vcsreplicator: only expose replicated revisions via hgweb (bug 1470606); r?glob, sheehan

Connor Sheehan [:sheehan] has approved the revision.

https://phabricator.services.mozilla.com/D1894
Attachment #8988950 - Flags: review+
(In reply to Gregory Szorc [:gps] from comment #19)
> fubar: the patches in this bug will require some new nagios alerts on hgssh
> and hgweb. What do you need from me to get those running?
> 
> (We're on track to have this deployed as early as Monday and we need
> monitoring once it is otherwise the replication system could silently break.)

depends on what you want monitored! :-) tell me what you've added and what needs looking after, and I'll have a go at things on Mon/Tues.
Flags: needinfo?(klibby)
Comment on attachment 8990148 [details]
pushlog: only expose replicated pushlog entries (bug 1470606); r?glob, sheehan

Connor Sheehan [:sheehan] has approved the revision.

https://phabricator.services.mozilla.com/D1990
Attachment #8990148 - Flags: review+
Keywords: leave-open
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/cd5d797eaec8
vcsreplicator: add last push ID to heads message ; r=sheehan
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/ea3ad31197f3
docs: document planned replication improvements to reduce inconsistency window ; r=glob,sheehan
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/bf125ca90e39
vcsreplicator: allow different message processing functions to be used ; r=glob,sheehan
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/25c655d42e6e
vcsreplicator: replicate messages to replicatedpushdatapending ; r=glob,sheehan
We'll need the following checks on hgssh master only:

Process existence for:

/var/hg/venv_tools/bin/python2.7 /var/hg/venv_tools/bin/vcsreplicator-aggregator --max-polls 1800 /etc/mercurial/pushdataaggregator-pending.ini
/var/hg/venv_tools/bin/python2.7 /var/hg/venv_tools/bin/vcsreplicator-aggregator --max-polls 1800 /etc/mercurial/pushdataaggregator.ini

I think checking for exactly 2 processes with "vcsreplicator-aggregator" in them would be fine.

The check_pushdataaggregator_pending_lag check should also be executed. We added this to the Nagios config as part of a recent push. It should be configured identically to check_pushdataaggregator_lag (it is a variation of that check).
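For context on what a lag check like check_pushdataaggregator_pending_lag measures: per-partition lag is the distance between the newest Kafka offset and the consumer's committed offset. A simplified sketch (the thresholds are made up; the real check's options may differ):

```python
def consumer_lag(latest_offsets, committed_offsets):
    """Per-partition lag: how far the consumer's committed offset
    trails the newest message in each partition."""
    return {part: latest_offsets[part] - committed_offsets.get(part, 0)
            for part in latest_offsets}


def lag_exit_code(lag, warn=10, crit=100):
    """Map worst-partition lag to a Nagios-style exit code."""
    worst = max(lag.values())
    if worst >= crit:
        return 2  # CRITICAL
    if worst >= warn:
        return 1  # WARNING
    return 0      # OK
```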
On all hgweb nodes...

A process existence check for:

/var/hg/venv_replication/bin/python2.7 /var/hg/venv_replication/bin/vcsreplicator-headsconsumer /etc/mercurial/vcsreplicator-pending.ini

Looking for a single `vcsreplicator-headsconsumer` process should be fine. We should have existing checks for `vcsreplicator-consumer` processes we can crib off of. But there is only 1 process for `vcsreplicator-headsconsumer`, not 8.

We'll also need the check_vcsreplicator_pending_lag configured similarly to check_vcsreplicator_lag. However, that check isn't yet defined (looks like I forgot to create its Nagios check variation). I'll get that deployed in the next ~15 minutes.
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/33ca0461ccdd
vcsreplicator: implement heads message handling ; r=glob, sheehan
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/7c0a6e43af5f
ansible/hg-web: add check_vcsreplicator_pending_lag check variation
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/e0ddaf0d493d
vcsreplicator: only expose replicated revisions via hgweb ; r=glob, sheehan
https://hg.mozilla.org/hgcustom/version-control-tools/rev/c44fcd524112
pushlog: only expose replicated pushlog entries ; r=glob, sheehan
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/d7c356958e2d
ansible/hg-web: disable vcsreplicator extension
Had to do a partial backout of this because of exceptions being thrown on hgweb machines.

[Mon Jul 09 21:09:35.741213 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694] Traceback (most recent call last):
[Mon Jul 09 21:09:35.741428 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/hgweb/hgwebdir_mod.py", line 229, in run_wsgi
[Mon Jul 09 21:09:35.741489 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     for r in self._runwsgi(req):
[Mon Jul 09 21:09:35.741499 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/hgweb/hgweb_mod.py", line 318, in run_wsgi
[Mon Jul 09 21:09:35.747761 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     for r in self._runwsgi(req, repo):
[Mon Jul 09 21:09:35.747795 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/version-control-tools/hgext/serverlog/__init__.py", line 316, in _runwsgi
[Mon Jul 09 21:09:35.747836 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     for what in super(hgwebwrapped, self)._runwsgi(req, repo):
[Mon Jul 09 21:09:35.747845 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/hgweb/hgweb_mod.py", line 370, in _runwsgi
[Mon Jul 09 21:09:35.747862 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     return protocol.call(rctx.repo, req, cmd)
[Mon Jul 09 21:09:35.747876 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/version-control-tools/hgext/serverlog/__init__.py", line 233, in protocolcall
[Mon Jul 09 21:09:35.747889 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     return origcall(repo, req, cmd)
[Mon Jul 09 21:09:35.747896 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/hgweb/protocol.py", line 165, in call
[Mon Jul 09 21:09:35.750272 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     rsp = wireproto.dispatch(repo, p, cmd)
[Mon Jul 09 21:09:35.750296 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/wireproto.py", line 586, in dispatch
[Mon Jul 09 21:09:35.758011 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     return func(repo, proto, *args)
[Mon Jul 09 21:09:35.758058 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/wireproto.py", line 908, in getbundle
[Mon Jul 09 21:09:35.758099 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     **pycompat.strkwargs(opts))
[Mon Jul 09 21:09:35.758108 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/exchange.py", line 1771, in getbundlechunks
[Mon Jul 09 21:09:35.764881 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     **pycompat.strkwargs(kwargs))
[Mon Jul 09 21:09:35.764937 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/exchange.py", line 1818, in _getbundlechangegrouppart
[Mon Jul 09 21:09:35.764992 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     if outgoing.missing:
[Mon Jul 09 21:09:35.765008 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/util.py", line 933, in __get__
[Mon Jul 09 21:09:35.773071 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     result = self.func(obj)
[Mon Jul 09 21:09:35.773109 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/discovery.py", line 127, in missing
[Mon Jul 09 21:09:35.777627 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     self._computecommonmissing()
[Mon Jul 09 21:09:35.777676 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/discovery.py", line 115, in _computecommonmissing
[Mon Jul 09 21:09:35.777721 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     self.missingheads)
[Mon Jul 09 21:09:35.777733 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/revlog.py", line 976, in findcommonmissing
[Mon Jul 09 21:09:35.784386 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     heads = [self.rev(n) for n in heads]
[Mon Jul 09 21:09:35.784426 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/changelog.py", line 358, in rev
[Mon Jul 09 21:09:35.788960 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     r = super(changelog, self).rev(node)
[Mon Jul 09 21:09:35.788995 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/revlog.py", line 757, in rev
[Mon Jul 09 21:09:35.789038 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694]     raise LookupError(node, self.indexfile, _('no node'))
[Mon Jul 09 21:09:35.789080 2018] [wsgi:error] [pid 24463] [remote 10.22.74.212:35694] LookupError: 00changelog.i@d7c356958e2d: no node

[Mon Jul 09 21:10:17.941616 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222] Traceback (most recent call last):
[Mon Jul 09 21:10:17.941644 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/hgweb/hgwebdir_mod.py", line 229, in run_wsgi
[Mon Jul 09 21:10:17.941684 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]     for r in self._runwsgi(req):
[Mon Jul 09 21:10:17.941696 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/hgweb/hgweb_mod.py", line 318, in run_wsgi
[Mon Jul 09 21:10:17.942221 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]     for r in self._runwsgi(req, repo):
[Mon Jul 09 21:10:17.942241 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]   File "/var/hg/version-control-tools/hgext/serverlog/__init__.py", line 316, in _runwsgi
[Mon Jul 09 21:10:17.942266 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]     for what in super(hgwebwrapped, self)._runwsgi(req, repo):
[Mon Jul 09 21:10:17.942277 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/hgweb/hgweb_mod.py", line 446, in _runwsgi
[Mon Jul 09 21:10:17.942294 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]     content = getattr(webcommands, cmd)(rctx, req, tmpl)
[Mon Jul 09 21:10:17.942304 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/hgweb/webcommands.py", line 453, in changeset
[Mon Jul 09 21:10:17.943042 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]     ctx = webutil.changectx(web.repo, req)
[Mon Jul 09 21:10:17.943058 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/hgweb/webutil.py", line 306, in changectx
[Mon Jul 09 21:10:17.943725 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]     return changeidctx(repo, changeid)
[Mon Jul 09 21:10:17.943743 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/hgweb/webutil.py", line 289, in changeidctx
[Mon Jul 09 21:10:17.943765 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]     ctx = repo[changeid]
[Mon Jul 09 21:10:17.943777 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/localrepo.py", line 745, in __getitem__
[Mon Jul 09 21:10:17.943798 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]     return context.changectx(self, changeid)
[Mon Jul 09 21:10:17.943808 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/context.py", line 533, in __init__
[Mon Jul 09 21:10:17.944962 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]     self._node = repo.names.singlenode(repo, changeid)
[Mon Jul 09 21:10:17.944982 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/namespaces.py", line 104, in singlenode
[Mon Jul 09 21:10:17.945239 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]     n = v.namemap(repo, name)
[Mon Jul 09 21:10:17.945257 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/namespaces.py", line 54, in <lambda>
[Mon Jul 09 21:10:17.945279 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]     bnamemap = lambda repo, name: tolist(repo.branchtip(name, True))
[Mon Jul 09 21:10:17.945290 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/localrepo.py", line 964, in branchtip
[Mon Jul 09 21:10:17.945307 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]     return self.branchmap().branchtip(branch)
[Mon Jul 09 21:10:17.945316 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/localrepo.py", line 946, in branchmap
[Mon Jul 09 21:10:17.945337 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]     branchmap.updatecache(self)
[Mon Jul 09 21:10:17.945348 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]   File "/var/hg/venv_hgweb/lib/python2.7/site-packages/mercurial/branchmap.py", line 114, in updatecache
[Mon Jul 09 21:10:17.945952 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222]     assert partial.validfor(repo), filtername
[Mon Jul 09 21:10:17.945984 2018] [wsgi:error] [pid 24677] [remote 10.22.74.212:39222] AssertionError: replicatedserved

The 2nd one appears to be a bug in our new extension's branchmap cache code. Should be easy to reproduce.

The 1st one occurs during the wire protocol. Essentially `hg pull` failed in the window between when hgweb had replicated the data and when its heads were being exposed. Perhaps the wire protocol isn't taking the repoview into account properly?

I have some work to do...
(In reply to Gregory Szorc [:gps] from comment #28)
> We'll need the following checks on hgssh master only:
> 
> Process existence for:
> 
> /var/hg/venv_tools/bin/python2.7
> /var/hg/venv_tools/bin/vcsreplicator-aggregator --max-polls 1800
> /etc/mercurial/pushdataaggregator-pending.ini
> /var/hg/venv_tools/bin/python2.7
> /var/hg/venv_tools/bin/vcsreplicator-aggregator --max-polls 1800
> /etc/mercurial/pushdataaggregator.ini
> 
> I think checking for exactly 2 processes with "vcsreplicator-aggregator" in
> them would be fine.

Initial add made, but we need to find a better solution. Right now, nagios is configured to run this check on the hg-ssh-master hostgroup, which only includes hgssh4. Need to come up with a clever way to handle failing over to other hosts.


> The check_pushdataaggregator_pending_lag check should also be executed. We
> added this to the Nagios config as part of a recent push. It should be
> configured identically to check_pushdataaggregator_lag (it is a variation of
> that check).

Added, with same caveat as above.


(In reply to Gregory Szorc [:gps] from comment #29)
> On all hgweb nodes...
> 
> A process existence check for:
> 
> /var/hg/venv_replication/bin/python2.7
> /var/hg/venv_replication/bin/vcsreplicator-headsconsumer
> /etc/mercurial/vcsreplicator-pending.ini
> 
> Looking for a single `vcsreplicator-headsconsumer` process should be fine.
> We should have existing checks for `vscreplicator-consumer` processes we can
> curb off of. But there is only 1 process for `vcsreplicator-headsconsumer`,
> not 8.
> 
> We'll also need the check_vcsreplicator_pending_lag configured similarly to
> check_vcsreplicator_lag. 

Both added.
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/91ac12c26323
vcsreplicator: disable branchmap integration 
https://hg.mozilla.org/hgcustom/version-control-tools/rev/e7b7b42b992c
ansible/hg-web: enable vcsreplicator extension
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/7e064ccae112
ansible/hg-web: disable vcsreplicator extension
Two problems with the last attempted deploy.

One was excessive memory consumption. I believe I isolated this to branchmap cache generation on initial repository load, specifically on the Try repo. When the first push to the Try repo came in, CI bulk `hg pull`ing the repo resulted in multiple processes all generating the branchmap cache at the same time. This consumed a lot of CPU and memory and led to swapping.

I mitigated this issue by pregenerating the branchmap cache for all repositories on all hgweb machines. So we hopefully shouldn't encounter this issue again.

The second was that the pushlog was still advertising new changesets before they were fully replicated. I'm unsure why; we have a test for this scenario. I have an upcoming patch for one theory (hasattr() not working properly). My other theory is code caching in persistent processes. Next time I do a deploy, I'll bounce server processes to force code eviction. If the issue still persists, something wonky is going on and I'll likely need to add forensic logging to see where things are failing.
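On the hasattr() theory: under Python 2, hasattr() swallows any exception raised while computing an attribute, not just AttributeError, so a property that fails transiently is misreported as absent. Probing with getattr() and a default avoids that. A distilled sketch with hypothetical names:

```python
class Repo(object):
    """Stand-in for a Mercurial repo; the real attribute is attached
    by the vcsreplicator extension when the heads file is present."""


def replication_data(repo):
    # getattr() with a default only masks a *missing* attribute.
    # Under Python 2, hasattr() additionally swallows any exception
    # raised while computing the attribute (e.g. an IOError inside a
    # property), misreporting a transient failure as "absent".
    return getattr(repo, 'replicated_data', None)
```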
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/56852940ac38
vcsreplicator: enable hgweb extension again 
https://hg.mozilla.org/hgcustom/version-control-tools/rev/c1727538a75d
vcsreplicator: use getattr() for testing for replicated_data attribute
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/d27e00d8832a
ansible/hg-web: disable vcsreplicator extension
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/64633fe1be05
vcsreplicator: use self.repo instead of repo
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/ae9fcd2023d2
vcsreplicator: enable hgweb extension again
So, I nearly tore my hair out the past few days trying to debug things. tl;dr: repo.replicated_data was intermittently available on repositories, causing the replicated state to be only intermittently factored into what was being served. This didn't make any sense and seemingly violated my understanding of how the world should work.

The fix was a one-liner: https://hg.mozilla.org/hgcustom/version-control-tools/rev/64633fe1be05. We were monkeypatching a repo class and a method was referencing a "repo" variable from the outer scope instead of "self." This definitely led to the memory leak. Why it resulted in repo.replicated_data not being reliably available, I'm still not sure.
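The bug class is easy to distill: when monkeypatching a repo class (as Mercurial extensions commonly do from a reposetup() hook), a method that closes over the outer repo variable both pins that object in memory and silently operates on the wrong instance. A minimal reproduction with hypothetical names:

```python
class Repo(object):
    def __init__(self, value):
        self.value = value


def wrap_class(repo):
    """Monkeypatch in the style of a Mercurial reposetup() hook."""
    class wrappedrepo(repo.__class__):
        def data_buggy(self):
            # Closes over the OUTER repo: reads the wrong object and
            # keeps it alive forever (the memory leak).
            return repo.value

        def data_fixed(self):
            # Uses the instance the method was actually called on.
            return self.value
    return wrappedrepo
```

Calling the buggy method on a different instance returns the captured object's data, not the caller's, which matches the "intermittently wrong" symptom described above.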

But, testing of hg.mozilla.org reveals that things are working now. The pushlog and other APIs on https://hg.mozilla.org/ should no longer expose info about a changeset until it has been fully replicated to all servers!

There is still an issue with the "automationrelevance" API. I think it occurs whenever the set of local changesets is not the set of fully replicated changesets. That API is used by decision tasks, and only decision tasks AFAIK, and the decision task retries after failure. I have yet to see a decision task fail as a result of this. So I'm inclined to address it as a follow-up. Will get a bug on file and try to debug it today.

But for all intents and purposes, I think we can close this bug!
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Keywords: leave-open
Resolution: --- → FIXED
Blocks: 1475656
Blocks: 1475666
As of ~16 hours ago, the lingering issue tracked in bug 1475656 is resolved. I'm pretty confident that this feature is now fully working as intended. https://hg.mozilla.org/ should now - for all intents and purposes - yield atomic snapshots of repositories without undesired side-effects.