Bug 1351859 (Closed) - Opened 7 years ago, Closed 5 years ago

Mass upgrade repos to generaldelta

Categories: Developer Services :: Mercurial: hg.mozilla.org
Type: enhancement
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: gps; Assignee: Unassigned
Keywords: leave-open

Mercurial 3.7 uses the "generaldelta" storage mechanism by default. This storage format computes a revision's delta against one of its parents rather than against whatever revision happens to be stored immediately before it in the revlog. For repos with lots of merges, this can have a drastic impact on repo size. For example, non-generaldelta mozilla-central is ~2,183 MB on disk (not including inode overhead). With generaldelta, it is ~1,758 MB on disk.
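For reference, a quick way to tell whether a local clone already uses generaldelta and to measure its store, as a minimal shell sketch (the clone path is just an example):

# "generaldelta" appears in .hg/requires once the storage format is in use.
cd ~/src/mozilla-central        # example path to a local clone
grep generaldelta .hg/requires && echo "already generaldelta"

# Rough on-disk size of the store, for before/after comparisons.
du -sh .hg/store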

generaldelta is why the mozilla-unified repo (a superset of central, aurora, beta, release, etc) is smaller than mozilla-central.

Since generaldelta repos use more optimal storage, they generally result in faster repo operations, including clone, pull, and push, as well as more common local operations like committing.

We haven't yet mass converted repos to generaldelta on the server out of concern for performance. A drawback of generaldelta is that when the server sends data to legacy clients, it may have to re-encode that data to the non-generaldelta format. On large repos (like the Firefox repo), this can result in a ton of extra CPU.

Automated clients under Mozilla's control are responsible for most requests against hg.mozilla.org. Most are running Mercurial 3.9 or 4.1, which means they can consume generaldelta repos optimally without incurring any extra server-side load.

And, enough time has passed that the number of legacy clients not supporting generaldelta data exchange has decreased to a point where we can afford the server-side CPU hit.

Mercurial 4.1 introduced an `hg debugupgraderepo` command that can be used to upgrade a Mercurial repository in place to use the latest/greatest storage format. It can also re-encode data during the upgrade so optimal/minimal storage is used.
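As a rough sketch (hypothetical repo path), running the command without `--run` should only report what it would do, which makes for a cheap preview of an upgrade:

# Dry run: reports missing format requirements and available optimizations;
# nothing is modified until --run is passed.
hg -R /repo/hg/mozilla/some-repo debugupgraderepo

# Add --optimize flags and --run to actually perform the upgrade.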

That command works by creating a new, empty "store" directory and then copying revlogs into it one by one using an internal API. It basically replays all data through the new code paths so optimal storage is used. During the conversion, the repo is locked and can't be written to. At the end of the conversion, it does a mostly-atomic swap of the .hg/store directory. It also leaves a backup of the old .hg/store directory around in the .hg directory.

The process for upgrading repositories to generaldelta basically involves running `hg debugupgraderepo` on each repository. We should be running with `--optimize redeltamultibase` and either `--optimize redeltaall` or `--optimize redeltaparent`. This combination will minimize repo storage by forcing deltas to be recomputed, taking advantage of generaldelta.

The time it takes to run `hg debugupgraderepo` with delta recalculation varies with the size of the repo. For a mozilla-central-like repo, it likely takes 2-4 hours on the hgssh server and NFS. During that time, the repo is read-only. This is unfortunate. The command can be aborted at any time without data loss, so scheduling the upgrade of critical repos may not be necessary. e.g. we can ask the sheriffs when a quiet time will be and just do it. If there is a chemspill and we need to land something, we abort the upgrade and try again later.

Once a repo is upgraded on hgssh4, we'll need to re-clone the repo on each hgweb node so it serves the generaldelta version. I think there is an Ansible playbook for that which will even do the re-clone in a way that results in no client-visible downtime.

Another concern with mass upgrading the repos is inode bloat and backup overhead on the NFS server. Snapshot backups will encounter hundreds of thousands or millions of new files from the "store backup" directory. This could potentially bring the backup system to its knees. We may want to coordinate with the backup process operators when we do mass upgrading.
I have started manually converting a cherry-picked list of repos to generaldelta on hgssh. Last night, I converted build/mozharness, build/talos, build/tools, comm-central, hgcustom/version-control-tools, and releases/mozilla-esr31. I did this by running the command `sudo -u hg /var/hg/venv_hg_pre/bin/hg --config format.usegeneraldelta=true debugupgraderepo --optimize redeltamultibase --optimize redeltaall --run`. When that completed, I also had to run the `repo-permissions` script to restore the group owner because `sudo -g` isn't currently working on the server.

The upgrade itself seemed to go off without a hitch. Bundles for the new repos were generated without error.
I am currently upgrading releases/mozilla-beta on hgssh5. I've announced this in #sheriffs and they are aware they need to ping me if the repo needs to be urgently unlocked.

Also, I haven't yet propagated the generaldelta upgrade of any of these repos to the hgweb machines. At this point, I'm more interested in getting the canonical repo upgraded: we can replicate the upgrade later.
I aborted the mozilla-beta upgrade because it was going to take another 3-5 hours and I'm not going to be around for another 3 hours to babysit it.

I suspect that repo was taking forever to convert because of the unholy number of DAG heads it has. There's a lot more expensive delta work to do on that repo compared to more linear repos.
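For the curious, a rough way to count DAG heads (a hypothetical invocation against the server-side path):

hg -R /repo/hg/mozilla/releases/mozilla-beta log -r 'head()' -T '{node}\n' | wc -l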
releases/mozilla-esr38 upgraded. releases/mozilla-release should finish upgrading in ~10 minutes.
I'm upgrading releases/mozilla-esr45 and a bunch of releases/mozilla-b2g repos. At this point, I'm basically working my way through the repos we generate bundles for because repos with bundles tend to be higher traffic or more important and are therefore the repos we care most about upgrading.
gcox: could you please tell me about the backup situation for the hg NFS mount? Specifically:

1) Will creating potentially a few million short-lived files during the upgrade cause problems for the backup mechanism?

2) How many days of backups do we retain and what is the process for obtaining files from a backup? In the rare case we have to revert the upgrade, it is nice to have a backup to fall back on. Currently, we're keeping Mercurial's backup around. But if we can delete that immediately and restore from the NFS backup up to N days later, that might be acceptable.
(In reply to Gregory Szorc [:gps] from comment #7)
> gcox: could you please tell me about the backup situation for the hg NFS
> mount?

Back-reference: ancient-history bug 1311022, for some "how we got to where we are now" context.

> 1) Will creating potentially a few million short-lived files during the
> upgrade cause problems for the backup mechanism?

So, hg has ~26M inodes.  hg/mozilla/users has ~39M inodes.  users gets backed up weekly by bacula, main is nightly.
We were having timing issues when all the inodes were in one volume: 60-some million inodes was taking 20-some hours to back up.  The splitup got things back into a better space.

If you need "a few" million, it's probably fine to just roll with it, but DO clean up as soon as feasible.
If it's more than "a few", can we give you a separate volume that isn't on the daily backup location?  However, if they're all interspersed and can't be lumped into one subvolume (e.g. /repo/hg/tmp ?), probably best to just go for it.
If it's "a lot", let's look at it so we don't blow out the volume.

> 2) How many days of backups do we retain

On the filer:
users: 7 nightlies (at 0010 UTC), 6 'hourlies' (at 0800, 1200, 1600, 2000 UTC)
main:  4 nightlies (at 0010 UTC), 6 'hourlies' (at 0800, 1200, 1600, 2000 UTC)
(we skip 0400 UTC based on generally low changes)

Both are also backed up to bacula; I don't know their retention period there, but it's probably in the 'months' range.

> and what is the process for obtaining files from a backup?

Files are pretty easy.
Short version: (cp or rsync) /repo/hg/.snapshot/(some snapshot)/(yourpathandfile) /repo/hg/(yourpathandfile).  If it's a few files, just do it as a self-service.
If it's a LOT of copies, let us know so we can make sure the snapshot doesn't age off in the middle of your restore.
Gentle reminder here: make sure you cd out of the .snapshot directory before it ages off and leaves you with a stale mount, a la bug 1269855 comment 11.
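As a concrete example of the short version above (hypothetical snapshot name and file; use whatever an ls of the .snapshot directory actually shows):

ls /repo/hg/.snapshot/
cp -a /repo/hg/.snapshot/nightly.0/mozilla/some-repo/.hg/hgrc \
      /repo/hg/mozilla/some-repo/.hg/hgrc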

If you're to the point where things are so far out of reality that you need to go from bacula, you're at the point of "raise a flag", as there's no self-service capability on it and we'd need to work together on a return to service.

> In the rare case we have to revert the
> upgrade, it is nice to have a backup to fall back on. Currently, we're
> keeping Mercurial's backup around. But if we can delete that immediately and
> restore from the NFS backup up to N days later, that might be acceptable.

If you can do plain file restores, a cp from .snapshot is pretty good, but keep in mind there are no hooks to force a db-style 'hot backup mode' where on-disk consistency is a perfect restore point. My snapshots are "your filesystem looked like THIS, at THAT point in time"... which is usually good enough for most applications, but it's obviously not as tested as a backup from the actual vendor.

There's also a 'snap restore' option. This isn't self-service; you'd need us to do it. It's an instantaneous one-way trip into the past, taking the volume back to what it looked like at snapshot time. We love these for maintenance windows, when you halt a service (making sure the snapshot is a copy of an unchanging data set), try your upgrade, and have a good rollback point. Rolling back to a hot snapshot is possible, but it carries the risk of not-cold data (you didn't halt or stabilize the service for the snapshot), combined with unplanned data loss (people have been checking in commits between then and now). It's a pretty nuclear option, but it's there if we need it.
Thanks, Greg. That's very useful context and I think that provides enough to guide how we should eventually do the automated "mass" upgrade (which will likely run for days).

Before we kick off that high-volume upgrade, I'll try to remember to run the process by you so you can raise any red flags.
l10n-central/* were also upgraded today. I figured I'd do some smaller repos to get a feel for how quickly they run. (5-20s per repo.)
I upgraded the following repos today:

* integration/b2g-inbound
* integration/fx-team
* mozilla-central
* projects/ash
* projects/cedar
* projects/holly
* projects/jamun
* projects/larch
* projects/oak
* releases/mozilla-aurora
* releases/mozilla-beta

That leaves integration/mozilla-inbound as the single repo we generate bundles for still not using generaldelta. I'm not sure when I'll be able to convert inbound: lots of developers use it and a 6+ hour closure won't be appetizing. I may try to sneak it in on a quiet Sunday or wait for a TCW. Or I may do a one-off process that doesn't require the repo be read-only during the bulk of the upgrade. I'll figure out something.

The upgrades today took a while. I ran like 8 concurrently. I'm pretty sure it exhausted I/O or CPU because the conversions took upwards of 8 hours.

As part of investigating the slowness, I think I found a significant performance bug with the upgrade. Notably, writing each revlog revision performs a file open(), lstat(), seek(), write(), and close() instead of reusing an open file descriptor. Over NFS, I think that translates to a substantial slowdown. I'll see if I can fix that upstream for the 4.2 release.
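This isn't Mercurial's actual code, but a crude shell illustration of why a per-write open/close pattern hurts, especially on NFS (file names are examples; point them at an NFS mount to see a real difference):

# Loop 1 re-opens the output file for every single write.
time ( for i in $(seq 1 2000); do echo x >> reopen-per-write.txt; done )

# Loop 2 writes the same data through one file descriptor held open for the
# whole loop.
time ( for i in $(seq 1 2000); do echo x; done > single-descriptor.txt )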

I still have yet to reclone any repo on the HTTP servers. I attempted that today. But the Ansible playbook we have is out of date *and* the load balancer integration is busted for reasons I don't yet understand. I may do it manually tomorrow if I can't figure out what's busted, as I really want these high volume repos switched over to generaldelta.
(In reply to Gregory Szorc [:gps] from comment #11)
> I upgraded the following repos today:
> 
> * integration/b2g-inbound
> * integration/fx-team
> * mozilla-central
> * projects/ash
> * projects/cedar
> * projects/holly
> * projects/jamun
> * projects/larch
> * projects/oak
> * releases/mozilla-aurora
> * releases/mozilla-beta
> 
> That leaves integration/mozilla-inbound as the single repo we generate
> bundles for still not using generaldelta.

Can you also upgrade `projects/date`, please? (releng uses it as a tc-nightly testbed, and thus I have it locally too)
I recloned many converted repos to hgweb11 and hgweb12 today. The process was manual and looked similar to the steps below (a condensed shell sketch follows the list):

1. Remove host from load balancer
2. downtime host in nagios
3. rm -rf /repo/hg/mozilla/$repo
4. sudo -u hg /var/hg/venv_replication/bin/hg init /repo/hg/mozilla/$repo
5. (from hgssh) /var/hg/venv_pash/bin/hg -R /repo/hg/mozilla/$repo replicatehgrc
6. sudo systemctl stop vcsreplicator@*.service
7. sudo -u hg /var/hg/venv_replication/bin/hg --config ui.clonebundleprefers=VERSION=packed1 -R /repo/hg/mozilla/$repo pull ssh://hg.mozilla.org/$repo
8. sudo /var/hg/version-control-tools/scripts/repo-permissions /repo/hg/mozilla/$repo hg hg wwr
9. for i in 0 1 2 3 4 5 6 7; do sudo systemctl start vcsreplicator@$i.service; done
10. (from hgssh) /var/hg/venv_pash/bin/hg -R /repo/hg/mozilla/$repo replicatesync
11. <wait for vcsreplicator to process backlog>
12. undowntime host in nagios
13. put host back in load balancer
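A condensed, hypothetical shell wrapper for steps 3-10 (repo names are just examples; the hgssh-side replication steps are left as comments, and the load balancer/nagios/backlog steps still happen around it):

for repo in integration/fx-team releases/mozilla-aurora; do   # example repos
  rm -rf /repo/hg/mozilla/$repo
  sudo -u hg /var/hg/venv_replication/bin/hg init /repo/hg/mozilla/$repo
  # (from hgssh) /var/hg/venv_pash/bin/hg -R /repo/hg/mozilla/$repo replicatehgrc
  sudo systemctl stop 'vcsreplicator@*.service'
  sudo -u hg /var/hg/venv_replication/bin/hg \
    --config ui.clonebundleprefers=VERSION=packed1 \
    -R /repo/hg/mozilla/$repo pull ssh://hg.mozilla.org/$repo
  sudo /var/hg/version-control-tools/scripts/repo-permissions /repo/hg/mozilla/$repo hg hg wwr
  for i in 0 1 2 3 4 5 6 7; do sudo systemctl start vcsreplicator@$i.service; done
  # (from hgssh) /var/hg/venv_pash/bin/hg -R /repo/hg/mozilla/$repo replicatesync
done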

hgweb11 and 12 are in the load balancer and serving requests. I figure I'll let things be this way over night to flush out any unexpected problems. If things are good in the morning, I'll proceed with cloning these repos on hgweb13 and 14. Then I'll send out an announcement email and encourage people to re-clone or upgrade their repos so they get better performance.
Setting a needinfo on myself to get projects/date upgraded and have bundles generated so it behaves like the prod repos.
Flags: needinfo?(gps)
I performed a one-off upgrade of mozilla-inbound over the course of yesterday and today. I basically rsync'd the repo, did an upgrade offline, then swapped in the new store and requirements while the repo was read-only and renamed on the hgssh server. After spot checking it, I put it back in service then recloned the repo on the HTTP servers.
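As a sketch of that kind of offline upgrade-and-swap, with hypothetical paths (not the exact commands that were run):

# Copy the repo aside and upgrade the copy while the original stays in service.
rsync -a /repo/hg/mozilla/integration/mozilla-inbound/ /repo/hg/scratch/inbound-upgrade/
hg -R /repo/hg/scratch/inbound-upgrade --config format.usegeneraldelta=true \
   debugupgraderepo --optimize redeltamultibase --optimize redeltaall --run

# With the live repo read-only: pull any pushes that landed meanwhile into the
# upgraded copy, then swap in the new store and requirements and fix perms.
hg -R /repo/hg/scratch/inbound-upgrade pull /repo/hg/mozilla/integration/mozilla-inbound
mv /repo/hg/mozilla/integration/mozilla-inbound/.hg/store{,.pre-gd}
mv /repo/hg/scratch/inbound-upgrade/.hg/store /repo/hg/mozilla/integration/mozilla-inbound/.hg/
cp /repo/hg/scratch/inbound-upgrade/.hg/requires /repo/hg/mozilla/integration/mozilla-inbound/.hg/requires
sudo /var/hg/version-control-tools/scripts/repo-permissions /repo/hg/mozilla/integration/mozilla-inbound hg hg wwr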

As part of recloning the repo on the HTTP servers, I made my first real mistake of this conversion: I accidentally removed inbound on an HTTP server that was active in the load balancer. There were a few minutes where 50% of requests to mozilla-inbound failed. Oops.

At this point, all high-traffic repos (the set of repos we generate bundles for) have been converted to generaldelta on both the SSH and HTTP servers. The remaining work will be to convert the long tail of other repos. I'm not sure when we'll do this. Given the amount of work involved, it is definitely something we should automate. If nothing else, that should prevent the mistake I made earlier today that resulted in inbound being intermittently unavailable for a few minutes.
Blocks: 1354356
I upgraded projects/date today.
Flags: needinfo?(gps)
Assignee: gps → nobody
Status: ASSIGNED → NEW
Keywords: leave-open
I have resumed activity on this bug. I'm actively mass upgrading repos to generaldelta on hgssh4.
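For flavor, a crude way to run a bounded number of upgrades in parallel (hypothetical paths; this is not the actual script, which landed in version-control-tools below):

# Upgrade repos under projects/ at most four at a time. "hg" here stands in
# for whichever server-side hg/venv is appropriate.
printf '%s\n' /repo/hg/mozilla/projects/* | xargs -P 4 -I{} \
  sudo -u hg hg -R {} --config format.usegeneraldelta=true \
  debugupgraderepo --optimize redeltamultibase --optimize redeltaall --run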
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/ed9e3b65f21d
scripts: script to upgrade Mercurial repos
All non-user repositories have been upgraded to generaldelta.

We still have several GB of .hg/upgradebackup.* directories sitting around. I'll purge them after running `hg verify` on the upgraded repos.
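A hypothetical sketch of that verify-then-purge pass (repo discovery depth and the hg binary path would need adjusting for the real layout):

find /repo/hg/mozilla -maxdepth 4 -type d -name .hg | while read hgdir; do
  repo=$(dirname "$hgdir")
  sudo -u hg hg -R "$repo" verify || { echo "verify FAILED: $repo" >&2; continue; }
  rm -rf "$repo"/.hg/upgradebackup.*
done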

Next up are the user repos. 2191/2318 need to be upgraded.
I have stopped the mass repo upgrade in preparation for the winter break.

903/2318 user repos still need to be upgraded.

We'll finish the upgrade sometime in 2018.
I have resumed the upgrades. Upgrade processes running on hgssh4.
The upgrades finished sometime on Saturday!

The backups are being removed as I type this.

Next step is to mass re-clone the repos on to the hgweb machines so they are also serving generaldelta repos.
Blocks: 1513276
All repos should now be using generaldelta on all servers. I'm pretty sure we mass upgraded repos on hgweb long ago. If not, they were upgraded as a side-effect of the MDC1 migration.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED