Open Bug 1513276 Opened 4 years ago Updated 2 years ago

Mass upgrade repos to sparserevlog

Categories

(Developer Services :: Mercurial: hg.mozilla.org, enhancement, P2)

Tracking

(Not tracked)

REOPENED

People

(Reporter: gps, Assigned: sheehan)

Mercurial 4.9 makes the "sparserevlog" repository feature enabled by default. This incremental improvement to revlog storage makes delta chains shorter (by trading I/O reads across larger distances). This makes revlogs smaller and makes fulltext revision reading faster. See https://www.mercurial-scm.org/repo/hg/rev/3764330f76a6 for performance numbers.

At some point after Mercurial 4.9 is released, we should mass upgrade repos on hg.mozilla.org to use it. This will be similar to what we did in bug 1351859.

Note: upgrading the repos will mean legacy clients won't be able to "stream clone." This could have an adverse impact on CI, which has historically used stream clone heavily.

I locally upgraded my Firefox repo to sparse revlogs and... wow - manifest performance is much improved! Operations like rebasing a series are significantly faster.

Type: defect → enhancement
Priority: -- → P2

Duplicate of this bug: 1562856

Copy/pasting the whole comment from bug 1562856 + adding my own comment.

(In reply to Connor Sheehan [:sheehan] from bug 1562856 comment #1)

(In reply to Mike Hommey [:glandium] from bug 1562856 comment #0)

As seen in https://glandium.org/blog/?p=3913, cloning mozilla-unified (or mozilla-central, for that matter) currently takes an awfully long time (except when doing a streaming clone).

One of the reasons is that the clonebundle is suboptimal wrt delta chains, and sparse-revlog (a feature new to mercurial 4.7) improves things.

The hg --config format.sparse-revlog=yes debugupgraderepo --run command should work to convert a repository.
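Before converting, it can help to check whether a repository already carries the requirement. The following is a minimal sketch, assuming the repository records its requirements one per line in `.hg/requires` and that the requirement is named `sparserevlog` (the name used later in this bug). The helper names are hypothetical and not part of Mercurial.

```python
from pathlib import Path


def repo_requirements(repo_root):
    """Return the set of requirements listed in <repo_root>/.hg/requires.

    Assumption: requirements are stored one per line in that file.
    """
    requires = Path(repo_root) / ".hg" / "requires"
    lines = requires.read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def needs_sparserevlog_upgrade(repo_root):
    """True if the repository does not yet list the sparserevlog requirement."""
    return "sparserevlog" not in repo_requirements(repo_root)
```

A repository for which `needs_sparserevlog_upgrade` returns True would be a candidate for the `debugupgraderepo` command quoted above.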

If I convert a mozilla-unified clone, and create a new bundle from it, unbundling that bundle takes 10 minutes instead of the 20 minutes it takes to unbundle the last mozilla-unified zstd bundle from hg.cdn.mozilla.net.

Interesting, that's a much larger improvement than I had previously seen for sparse-revlog.

Strictly speaking, this is better to apply on all repositories, and shouldn't affect their clonability with older versions of mercurial. Ideally, the web heads would be updated too. But the most important part is applying this to the repositories that are used to create the clonebundles.

As I understand it, this is not true for stream-clone bundles, since they essentially send the raw revlog data over the wire with no extra processing. From the output of hg help clonebundles:

'hg debugcreatestreamclonebundle' can be used to produce a special streaming
clonebundle. These are bundle files that are extremely efficient to produce
and consume (read: fast). However, they are larger than traditional bundle
formats and require that clients support the exact set of repository data
store formats in use by the repository that created them. Typically, a newer
server can serve data that is compatible with older clients. However,
streaming clone bundles don't have this guarantee. Server operators need
to be aware that newer versions of Mercurial may produce streaming clone
bundles incompatible with older Mercurial versions.

This is the main blocker for us, since we don't have a consistent version of hg in use in CI. We should be on 4.8 almost everywhere, however I still see older versions in Taskcluster logs from time to time. IIRC there are some versions of hg baked into Docker images, where upgrading the Docker image is undesirable. Decision tasks come to mind as one of the main cases of this, the last time I looked into it.

Oh right, stream clones would be affected. But nothing on CI should be doing stream clones of mozilla-unified, so we could start there.

With this new timing info it may be easier to justify chasing down those last few upgrades, or just doing the upgrade and letting things break to flush them out.

(In reply to Mike Hommey [:glandium] from comment #3)

Oh right, stream clones would be affected. But nothing on CI should be doing stream clones of mozilla-unified, so we could start there.

Most tasks in CI are using streamed-clones of mozilla-unified. robustcheckout is designed to work specifically with that repo. See here for example.

However I took a look through a few of the tasks that I had previously believed were on older versions of hg (like decision tasks and tasks on Windows) and they all seem to be running at least 4.7. I'm going to try and carve out some time this week and see if I can find any outstanding locations. Otherwise we should do the upgrade soon.

Most tasks in CI are using streamed-clones of mozilla-unified.

Oh, I thought they were stream-cloning the branch they're on.

(In reply to Connor Sheehan [:sheehan] from comment #4)

However I took a look through a few of the tasks that I had previously believed were on older versions of hg (like decision tasks and tasks on Windows) and they all seem to be running at least 4.7. I'm going to try and carve out some time this week and see if I can find any outstanding locations. Otherwise we should do the upgrade soon.

Is 4.7 sufficient, or do we need 4.9 as #c0 seems to imply?

How much time do you need to upgrade the repos? I have tentative approval for July 27 but need more info (eg how much time, when to start).

Flags: needinfo?(sheehan)

The feature was added in mercurial 4.7. 4.9 made it the default for new repositories.
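As a rough illustration of that version cutoff, the sketch below parses a version string such as the first line of `hg version` output and compares it against 4.7. The function names and the exact wording of the sample string are assumptions for illustration only.

```python
import re

# Per the comment above: sparserevlog support was added in Mercurial 4.7.
SPARSEREVLOG_MIN_VERSION = (4, 7)


def parse_hg_version(text):
    """Extract (major, minor) from a string containing a version like '4.8.2'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return int(match.group(1)), int(match.group(2))


def client_supports_sparserevlog(version_text):
    """True if the reported client version is at least 4.7."""
    return parse_hg_version(version_text) >= SPARSEREVLOG_MIN_VERSION
```

A check like this could be run over version strings collected from CI logs to find clients that would not be able to consume the upgraded stream clone bundles.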

(In reply to Kendall Libby [:fubar] (he/him) from comment #6)

How much time do you need to upgrade the repos? I have tentative approval for July 27 but need more info (eg how much time, when to start).

The upgrade takes about an hour per repo. We have a script that can upgrade repos in parallel, so we would just need to pass the set of repos to upgrade to that script and let it run.
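The script itself is not shown in this bug. As a hypothetical sketch of the idea, the code below runs the upgrade command quoted earlier against several repositories with a bounded number of workers. The function names, the use of the global `-R` option to select the repository, and the default worker count are all assumptions, not the actual hg.mozilla.org tooling.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def upgrade_command(repo_path):
    """Build the upgrade command from this bug, targeting one repository."""
    return [
        "hg", "--config", "format.sparse-revlog=yes",
        "-R", repo_path,
        "debugupgraderepo", "--run",
    ]


def run_upgrade(repo_path, runner=subprocess.run):
    """Run the upgrade for one repo and return (repo_path, exit code)."""
    result = runner(upgrade_command(repo_path), capture_output=True, text=True)
    return repo_path, result.returncode


def upgrade_all(repo_paths, max_workers=2, runner=subprocess.run):
    """Upgrade repos concurrently; max_workers caps load on shared storage."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda path: run_upgrade(path, runner), repo_paths)
        return dict(results)
```

The `runner` parameter exists only so the sketch can be exercised without invoking `hg`. A small worker count is chosen deliberately, since each upgrade rewrites revlogs and is I/O heavy.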

I have a prior obligation on the 27th, so I'm not sure I'll be able to babysit the upgrade process myself, unfortunately. The upgrade should be as simple as running an ad-hoc Ansible command to execute the script on each host in the hg.mo cluster, so hopefully someone else can do it if I define an upgrade/backout plan. I'll iron out those details next week.

Flags: needinfo?(sheehan)

(In reply to Connor Sheehan [:sheehan] from comment #8)

The upgrade takes about an hour per repo. We have a script that can upgrade repos in parallel, so we would just need to pass the set of repos to upgrade to that script and let it run.

Sure, but at some point we'll hit perf issues on the NFS mounts. What is the set of repos we need to update?

I have a prior obligation on the 27th, so I'm not sure I'll be able to babysit the upgrade process myself, unfortunately. The upgrade should be as simple as running an ad-hoc Ansible command to execute the script on each host in the hg.mo cluster, so hopefully someone else can do it if I define an upgrade/backout plan. I'll iron out those details next week.

At this point I am uncomfortable with doing these upgrades this weekend. We don't have a good sense of how long it will take, nor who is doing the work and can recover if anything should go sideways. What are the risks if we delay this until the next TCW (Sept 21, and likely when Taskcluster services will migrate to GCP, so a good-sized window)?

Flags: needinfo?(sheehan)

(In reply to Kendall Libby [:fubar] (he/him) from comment #9)

(In reply to Connor Sheehan [:sheehan] from comment #8)

The upgrade takes about an hour per repo. We have a script that can upgrade repos in parallel, so we would just need to pass the set of repos to upgrade to that script and let it run.

Sure, but at some point we'll hit perf issues on the NFS mounts. What is the set of repos we need to update?

I have a prior obligation on the 27th, so I'm not sure I'll be able to babysit the upgrade process myself, unfortunately. The upgrade should be as simple as running an ad-hoc Ansible command to execute the script on each host in the hg.mo cluster, so hopefully someone else can do it if I define an upgrade/backout plan. I'll iron out those details next week.

At this point I am uncomfortable with doing these upgrades this weekend. We don't have a good sense of how long it will take, nor who is doing the work and can recover if anything should go sideways. What are the risks if we delay this until the next TCW (Sept 21, and likely when Taskcluster services will migrate to GCP, so a good-sized window)?

At minimum I'd like to update the "important" repos (central, autoland, release repos, anything running in CI, etc.) to get the perf wins in CI and on developer machines. Eventually I'd like to have every repo on hgmo upgraded. That will take far too long for this TCW, though. There are no huge risks to delaying an upgrade; we just won't be using the latest and greatest storage formats until we do. I understand the caution, and since I won't be around to help with a rollback in the event something goes horribly wrong, I'm fine with this upgrade not taking place over this weekend.

Taking a look at the previous repository format upgrade (bug 1351859), we didn't actually use a TCW for many of the critical repos. Most of the information in that bug is still relevant such as the information in bug 1351859 comment 0, regarding the operation being safe to abort and creating a backup bundle of the repo. We could probably use the same strategy for this upgrade as the previous - perform the upgrade for critical repos whenever there is a low-traffic opportunity (evenings, weekends, etc), and upgrade the remaining repos during the next TCW.

Flags: needinfo?(sheehan)

I've started upgrading the most "important" repos to sparserevlog, beginning with the CI-only hgweb mirrors.

Assignee: nobody → sheehan

This commit updates the bundle spec for stream clone bundles
to include the sparserevlog repo requirement. After the
repo format upgrade in the upcoming TCW, this requirement
will be present on all stream clone bundles and thus must be
advertised in the clonebundles manifest for each repository.

Since sparserevlog was introduced in Mercurial 4.7, the
CDN landing page message is updated to note the requirement
of the newer Mercurial version. The warning for Mercurial 4.1
is removed since most users should be on a newer Mercurial
by now.

I resolved the bundlespec by creating a bundle from a local
repository that already has sparserevlog using the exact
bundle command arguments we use in production:

hg bundle -a -t none-v2;stream=v2

and running hg debugbundle --spec on the produced bundle.
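To illustrate how a requirement shows up in a bundle spec, here is a minimal sketch of a parser. It assumes the spec has the form `<compression>-<version>` followed by `;`-separated parameters that are URL-quoted `key=value` pairs, with `requirements` holding a comma-separated list. The function names are hypothetical, and the spec string in the usage note is illustrative rather than the value used in production.

```python
from urllib.parse import unquote


def parse_bundlespec(spec):
    """Split a bundle spec into (compression, version, params dict)."""
    base, _, rest = spec.partition(";")
    compression, _, version = base.partition("-")
    params = {}
    for part in filter(None, rest.split(";")):
        # Parameters may be URL-quoted, e.g. "requirements%3Da%2Cb".
        key, _, value = unquote(part).partition("=")
        params[key] = value
    return compression, version, params


def bundle_requirements(spec):
    """Return the set of repository requirements advertised by a spec."""
    _, _, params = parse_bundlespec(spec)
    return set(filter(None, params.get("requirements", "").split(",")))
```

For an illustrative spec such as `none-v2;stream=v2;requirements%3Dgeneraldelta%2Crevlogv1%2Csparserevlog`, `bundle_requirements` would include `sparserevlog`, which is the kind of entry a client needs to see in the clonebundles manifest to decide whether it can consume the bundle.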

Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/ca8f4a90db9d
bundles: update bundlespec to include sparserevlog requirement r=zeid

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---