1513276 - Mass upgrade repos to sparserevlog

Reporter

Description

7 years ago

Mercurial 4.9 makes the "sparserevlog" repository feature enabled by default. This incremental improvement to revlog storage makes delta chains shorter (by trading I/O reads across larger distances). This makes revlogs smaller and makes fulltext revision reading faster. See https://www.mercurial-scm.org/repo/hg/rev/3764330f76a6 for performance numbers. At some point after Mercurial 4.9 is released, we should mass upgrade repos on hg.mozilla.org to use it. This will be similar to what we did in bug 1351859. Note: upgrading the repos will mean legacy clients won't be able to "stream clone." This could have adverse impact on CI, which has historically used stream clone heavily.

Gregory Szorc [:gps]

Reporter

Comment 1

7 years ago

I locally upgraded my Firefox repo to sparse revlogs and... wow - manifest performance is much improved! Operations like rebasing a series are significantly faster.

Connor Sheehan [:sheehan]

Assignee

Updated

6 years ago

Type: defect → enhancement

Priority: -- → P2

Mike Hommey [:glandium]

Comment 3

6 years ago

Copy/pasting the whole comment from bug 1562856 + adding my own comment.

(In reply to Connor Sheehan [:sheehan] from bug 1562856 comment #1)

(In reply to Mike Hommey [:glandium] from bug 1562856 comment #0)

As seen in https://glandium.org/blog/?p=3913, currently, cloning mozilla-unified (or mozilla-central, for that matter) takes an awfully long time (except when doing a streaming clone).

One of the reasons is that the clonebundle is suboptimal wrt delta chains, and sparse-revlog (a feature new to Mercurial 4.7) improves things.

The hg --config format.sparse-revlog=yes debugupgraderepo --run command should work to convert a repository.
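
To make the conversion step concrete, here is a minimal Python sketch that checks whether a local clone already carries the sparserevlog requirement and builds the upgrade command quoted above. It assumes the requirement is recorded in .hg/requires (newer Mercurial releases may record store requirements elsewhere), and the helper names and the use of the global -R option are illustrative, not part of any existing tooling.

```python
from pathlib import Path


def has_sparserevlog(repo_root):
    """Return True if .hg/requires lists the sparserevlog requirement.

    Assumption: requirements are stored one per line in .hg/requires.
    """
    requires = Path(repo_root) / ".hg" / "requires"
    if not requires.is_file():
        return False
    entries = {line.strip() for line in requires.read_text().splitlines()}
    return "sparserevlog" in entries


def upgrade_command(repo_root):
    """Build (but do not run) the argv for the upgrade command quoted above.

    -R is added here only to point hg at a specific repository.
    """
    return [
        "hg", "-R", str(repo_root),
        "--config", "format.sparse-revlog=yes",
        "debugupgraderepo", "--run",
    ]
```

A caller could skip repositories for which has_sparserevlog() already returns True and pass the result of upgrade_command() to subprocess.run() for the rest.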

If I convert a mozilla-unified clone, and create a new bundle from it, unbundling that bundle takes 10 minutes instead of the 20 minutes it takes to unbundle the last mozilla-unified zstd bundle from hg.cdn.mozilla.net.

Interesting, that's a much larger improvement than I had previously seen for sparse-revlog.

Strictly speaking, this is better to apply on all repositories, and shouldn't affect their clonability with older versions of mercurial. Ideally, the web heads would be updated too. But the most important part is applying this to the repositories that are used to create the clonebundles.

As I understand it, this is not true for stream-clone bundles, since they essentially send the raw revlog data over the wire with no extra processing. From the output of hg help clonebundles:

'hg debugcreatestreamclonebundle' can be used to produce a special streaming
clonebundle. These are bundle files that are extremely efficient to produce
and consume (read: fast). However, they are larger than traditional bundle
formats and require that clients support the exact set of repository data
store formats in use by the repository that created them. Typically, a newer
server can serve data that is compatible with older clients. However,
streaming clone bundles don't have this guarantee. Server operators need
to be aware that newer versions of Mercurial may produce streaming clone
bundles incompatible with older Mercurial versions.

This is the main blocker for us, since we don't have a consistent version of hg in use in CI. We should be on 4.8 almost everywhere, however I still see older versions in Taskcluster logs from time to time. IIRC there are some versions of hg baked into Docker images, where upgrading the Docker image is undesirable. Decision tasks come to mind as one of the main cases of this, the last time I looked into it.
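
The compatibility constraint described in the help text can be pictured as a set comparison: a stream clone only works when the client understands every storage requirement of the source repository. The sketch below is a simplified illustration, not Mercurial's actual check, and the requirement sets in the comments are hypothetical examples.

```python
def can_stream_clone(client_supported, repo_requirements):
    """Return (ok, missing) for a stream clone attempt.

    ok is True only when the client supports every requirement of the
    repository; missing lists the requirements the client lacks.
    """
    missing = set(repo_requirements) - set(client_supported)
    return (not missing, sorted(missing))


# Hypothetical example: a client that predates sparserevlog support
# against a repository that has been upgraded.
# old_client = {"revlogv1", "store", "fncache", "dotencode", "generaldelta"}
# upgraded_repo = old_client | {"sparserevlog"}
# can_stream_clone(old_client, upgraded_repo) reports sparserevlog as missing.
```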

Oh right, stream clones would be affected. But nothing on CI should be doing stream clones of mozilla-unified, so we could start there.

With this new timing info it may be easier to justify chasing down those last few upgrades, or just doing the upgrade and letting things break to flush them out.

Connor Sheehan [:sheehan]

Assignee

Comment 4

6 years ago

(In reply to Mike Hommey [:glandium] from comment #3)

Oh right, stream clones would be affected. But nothing on CI should be doing stream clones of mozilla-unified, so we could start there.

Most tasks in CI are using streamed-clones of mozilla-unified. robustcheckout is designed to work specifically with that repo. See here for example.

However I took a look through a few of the tasks that I had previously believed were on older versions of hg (like decision tasks and tasks on Windows) and they all seem to be running at least 4.7. I'm going to try and carve out some time this week and see if I can find any outstanding locations. Otherwise we should do the upgrade soon.

Mike Hommey [:glandium]

Comment 5

6 years ago

Most tasks in CI are using streamed-clones of mozilla-unified.

Oh, I thought they were stream-cloning the branch they're on.

Kendall Libby [:fubar] (he/him)

Comment 6

6 years ago

(In reply to Connor Sheehan [:sheehan] from comment #4)

However I took a look through a few of the tasks that I had previously believed were on older versions of hg (like decision tasks and tasks on Windows) and they all seem to be running at least 4.7. I'm going to try and carve out some time this week and see if I can find any outstanding locations. Otherwise we should do the upgrade soon.

Is 4.7 sufficient, or do we need 4.9 as #c0 seems to imply?

How much time do you need to upgrade the repos? I have tentative approval for July 27 but need more info (eg how much time, when to start).

Flags: needinfo?(sheehan)

Mike Hommey [:glandium]

Comment 7

6 years ago

The feature was added in Mercurial 4.7. Mercurial 4.9 made it the default for new repositories.

Connor Sheehan [:sheehan]

Assignee

Comment 8

6 years ago

(In reply to Kendall Libby [:fubar] (he/him) from comment #6)

How much time do you need to upgrade the repos? I have tentative approval for July 27 but need more info (eg how much time, when to start).

The upgrade takes about an hour per repo. We have a script that can upgrade repos in parallel, so we would just need to pass the set of repos to upgrade to that script and let it run.
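
For illustration only, a parallel upgrade along these lines could be sketched as follows. This is not the script referred to above (which is not shown in this bug); the function names, the default worker count, and the injectable runner are assumptions made for the example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def upgrade_repo(repo):
    """Run the sparse-revlog upgrade on one repo and return hg's exit code."""
    cmd = [
        "hg", "-R", repo,
        "--config", "format.sparse-revlog=yes",
        "debugupgraderepo", "--run",
    ]
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def upgrade_all(repos, max_workers=2, runner=upgrade_repo):
    """Upgrade several repos concurrently and map each repo to its exit code.

    max_workers caps concurrency so shared storage is not overwhelmed;
    runner is injectable so the scheduling can be exercised without hg.
    """
    repos = list(repos)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(runner, repos))
    return dict(zip(repos, codes))
```

Keeping max_workers small is a deliberate choice here: each upgrade is I/O heavy, so running many at once mostly moves the bottleneck to the disks.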

I have a prior obligation on the 27th, so I'm not sure I'll be able to babysit the upgrade process myself, unfortunately. The upgrade should be as simple as running an ad-hoc Ansible command to execute the script on each host in the hg.mo cluster, so hopefully someone else can do it if I define an upgrade/backout plan. I'll iron out those details next week.

Flags: needinfo?(sheehan)

Kendall Libby [:fubar] (he/him)

Comment 9

6 years ago

(In reply to Connor Sheehan [:sheehan] from comment #8)

The upgrade takes about an hour per repo. We have a script that can upgrade repos in parallel, so we would just need to pass the set of repos to upgrade to that script and let it run.

Sure, but at some point we'll hit perf issues on the NFS mounts. What is the set of repos we need to update?

I have a prior obligation on the 27th, so I'm not sure I'll be able to babysit the upgrade process myself, unfortunately. The upgrade should be as simple as running an ad-hoc Ansible command to execute the script on each host in the hg.mo cluster, so hopefully someone else can do it if I define an upgrade/backout plan. I'll iron out those details next week.

At this point I am uncomfortable with doing these upgrades this weekend. We don't have a good sense of how long it will take, nor who is doing the work and can recover if anything should go sideways. What are the risks if we delay this until the next TCW (Sept 21, and likely when Taskcluster services will migrate to GCP, so a good-sized window)?

Flags: needinfo?(sheehan)

Connor Sheehan [:sheehan]

Assignee

Comment 10

6 years ago

(In reply to Kendall Libby [:fubar] (he/him) from comment #9)

(In reply to Connor Sheehan [:sheehan] from comment #8)

The upgrade takes about an hour per repo. We have a script that can upgrade repos in parallel, so we would just need to pass the set of repos to upgrade to that script and let it run.

Sure, but at some point we'll hit perf issues on the NFS mounts. What is the set of repos we need to update?

I have a prior obligation on the 27th, so I'm not sure I'll be able to babysit the upgrade process myself, unfortunately. The upgrade should be as simple as running an ad-hoc Ansible command to execute the script on each host in the hg.mo cluster, so hopefully someone else can do it if I define an upgrade/backout plan. I'll iron out those details next week.

At this point I am uncomfortable with doing these upgrades this weekend. We don't have a good sense of how long it will take, nor who is doing the work and can recover if anything should go sideways. What are the risks if we delay this until the next TCW (Sept 21, and likely when Taskcluster services will migrate to GCP, so a good-sized window)?

At minimum I'd like to update the "important" repos (central, autoland, release repos, anything running in CI, etc) to get the perf wins in CI and on developer machines. Eventually I'd like to have every repo on hgmo upgraded. That will take far too long for this TCW, though. There are no huge risks to delaying the upgrade, we just won't be using the latest and greatest storage formats until we do. I understand the caution, and since I won't be around to help with a rollback in the event something goes horribly wrong, I'm fine with this upgrade not taking place over this weekend.

Taking a look at the previous repository format upgrade (bug 1351859), we didn't actually use a TCW for many of the critical repos. Most of the information in that bug is still relevant, such as the notes in bug 1351859 comment 0 about the operation being safe to abort and about creating a backup bundle of the repo. We could probably use the same strategy for this upgrade as for the previous one: perform the upgrade for critical repos whenever there is a low-traffic opportunity (evenings, weekends, etc), and upgrade the remaining repos during the next TCW.

Flags: needinfo?(sheehan)

Connor Sheehan [:sheehan]

Assignee

Comment 11

5 years ago

I've started upgrading the most "important" repos to sparserevlog, beginning with the CI-only hgweb mirrors.

Assignee: nobody → sheehan

Connor Sheehan [:sheehan]

Assignee

Comment 12

5 years ago

Attached file: bundles: update bundlespec to include sparserevlog requirement (Bug 1513276) r?zeid

This commit updates the bundle spec for stream clone bundles
to include the sparserevlog repo requirement. After the
repo format upgrade in the upcoming TCW, this requirement
will be present on all stream clone bundles and thus must be
advertised in the clonebundles manifest for each repository.

Since sparserevlog was introduced in Mercurial 4.7, the
CDN landing page message is updated to note the requirement
of the newer Mercurial version. The warning for Mercurial 4.1
is removed since most users should be on a newer Mercurial
by now.

I resolved the bundlespec by creating a bundle from a local
repository that already has sparserevlog using the exact
bundle command arguments we use in production:

hg bundle -a -t none-v2;stream=v2

and running hg debugbundle --spec on the produced bundle.
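
As a rough illustration of that procedure, the sketch below builds the two commands and pulls the requirement list out of a bundlespec string. The output file argument is an addition (hg bundle needs a destination file), the parser is a simplified stand-in rather than Mercurial's own, and the spec string in the comment is an illustrative example, not the value recorded in the patch.

```python
from urllib.parse import unquote


def bundle_commands(repo, out_file):
    """Build (but do not run) the bundle and spec-inspection commands.

    Passing the bundle type as a single argv element avoids the shell
    treating the ';' in none-v2;stream=v2 as a command separator.
    """
    return [
        ["hg", "-R", repo, "bundle", "-a", "-t", "none-v2;stream=v2", out_file],
        ["hg", "debugbundle", "--spec", out_file],
    ]


def spec_requirements(spec):
    """Extract the requirements advertised in a bundlespec string.

    Simplified parsing: split parameters on ';', percent-decode each one,
    then look for a requirements=<comma-separated list> entry.
    """
    for segment in spec.split(";")[1:]:
        key, sep, value = unquote(segment).partition("=")
        if sep and key == "requirements":
            return set(value.split(","))
    return set()


# Illustrative spec string (an assumption, not copied from the patch):
# "none-v2;stream=v2;requirements%3Dgeneraldelta%2Crevlogv1%2Csparserevlog"
```

Checking that "sparserevlog" is in spec_requirements(...) for the string printed by hg debugbundle --spec is one way to confirm the manifest entry advertises the new requirement.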

Pulsebot

Comment 13

5 years ago

Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/ca8f4a90db9d
bundles: update bundlespec to include sparserevlog requirement r=zeid

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → FIXED

Connor Sheehan [:sheehan]

Assignee

Updated

5 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

5 years ago

Regressions: 1631243

Connor Sheehan [:sheehan]

Assignee

Comment 14

3 years ago

All of our production repos are upgraded, as well as any newly created repos.

Status: REOPENED → RESOLVED

Closed: 5 years ago → 3 years ago

Resolution: --- → FIXED

Categories

(Developer Services :: Mercurial: hg.mozilla.org, enhancement, P2)

Tracking

(Not tracked)

People

(Reporter: gps, Assigned: sheehan)

Security

(public)

Attachments

(1 file)
