Closed Bug 1232442 Opened 9 years ago Closed 8 years ago

Pre-seed AMI images with hg bundles from hg.cdn.mozilla.net

Categories

(Release Engineering :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rail, Assigned: gps)

References

Details

Attachments

(1 file)

1) it fails (https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-beta-noarch/1450124939/mozilla-beta-bundle-bm73-build1-build0.txt.gz):

  added 7141 changesets with 53959 changes to 16912 files
  updating to branch default
  129284 files updated, 0 files merged, 0 files removed, 0 files unresolved
  304034 changesets found
  scp: /home/ftp/pub/firefox/bundles/mozilla-beta.hg.upload: No such file or directory

2) we use the bundleclone extension!
For posterity, the new official home of the bundles is at https://hg.cdn.mozilla.net/. If you want a machine readable index, we can start producing one. File a bug against Developer Services :: hg.mozilla.org.
See Also: → 1229532
I pushed https://treeherder.mozilla.org/#/jobs?repo=try&revision=09bd975cdcf5 to remove the --bundle arg from our hgtool usage. When looking at the logs it's hard to see this having any effect, because machines already have a hg share for try. However I did see some newly started AWS instances pull in 11k changesets, presumably because they're prepopulated with stale repos (bug 1229532).
I'm not sure of the implications of changing this, but from my perspective as a server operator of hg.mozilla.org, I'd rather have clients re-clone from S3/CDN and pull up to 1k changesets than pull 10k changesets on top of an old snapshot. Also, I believe we had an uplift today, so the bundles for aurora, beta, release, etc. might be out of date by a Gecko release for the next few hours yet.
Yes, I agree. Ideally we'll stop generating bundles on our side, and rely on the mercurial server, but I'm flagging up the preseeding of AWS instances to Rail. A machine readable manifest (in json or similar) on hg.cdn.mozilla.net would probably help him achieve that.
Depends on: 1232733
I actually removed the builders in bug 1229532, so let's morph this to handle the pre-seeding. Rail, where are we doing that? I see support via hg_bundles but not anything using it, e.g. https://dxr.mozilla.org/build-central/search?q=hg_bundles&redirect=false&case=true
Flags: needinfo?(rail)
Summary: Stop generating hg bundles → Pre-seed AMI images with hg bundles from hg.cdn.mozilla.net
We stopped using bundles because they were outdated (broken builders) and caused a lot of pull traffic. We thought that using hg clone with bundleclone enabled would be cheaper.
Flags: needinfo?(rail)
https://hg.cdn.mozilla.net/bundles.json now exists. Pre-seed away.
blocking-b2g: 2.2r? → ---
Bug 1270317 changed how mozharness does hg repo management.

We now use the "auto pooled storage" feature of the share extension. The code is aggressive and requires the auto pooled storage feature to be enabled. It goes so far as to blow away existing clones not using pooled storage.

The auto pooled storage stores repos under <share_base>/<sha1>, where <sha1> is the 40 character SHA-1 of rev 0 of the repo. This means the existing seeding in AMIs is effectively worthless now, as the new code won't use the data.

We should update the seeding to create a single Firefox repo in <share_base>/8ba995b74e18334ab3707f27e9eb8f4e37ba3d29. Ideally this would be a generaldelta repo created from a unified Firefox repo (like https://hg.mozilla.org/experimental/firefox-unified). However, that repo is still experimental and we're not currently generating bundles for it. So perhaps we can live with seeding from mozilla-central instead.
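The pooled layout described above can be illustrated with a minimal Python sketch. The helper names pooled_store_path and root_sha1_argv are hypothetical and are not part of mozharness or Mercurial; the only things taken from the comment are the <share_base>/<sha1> layout and the Firefox rev 0 hash. The sketch only computes paths and an hg command line, it does not run anything.

```python
import posixpath
import re

# SHA-1 of rev 0 of the Firefox repos, as quoted in the comment above.
FIREFOX_SHA1 = "8ba995b74e18334ab3707f27e9eb8f4e37ba3d29"

_SHA1_RE = re.compile(r"^[0-9a-f]{40}$")


def pooled_store_path(share_base, root_sha1):
    """Return <share_base>/<sha1> for a repo whose rev 0 hash is root_sha1.

    Hypothetical helper: it mirrors the layout described in the bug and
    rejects anything that is not a full 40 character lowercase hex hash.
    """
    if not _SHA1_RE.match(root_sha1):
        raise ValueError("expected a 40 character hex SHA-1: %r" % root_sha1)
    return posixpath.join(share_base, root_sha1)


def root_sha1_argv(repo_path):
    """Build (but do not run) an hg command that prints the rev 0 hash."""
    return ["hg", "-R", repo_path, "log", "-r", "0", "-T", "{node}"]
```

Because every Firefox repo shares the same rev 0, all of them map to the same directory under the share base, which is why a single seeded store is enough.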
Depends on: 1270317
(In reply to Gregory Szorc [:gps] from comment #9)
> Bug 1270317 changed how mozharness does hg repo management.
>
> We now use the "auto pooled storage" feature of the share extension. The
> code is aggressive and requires the auto pooled storage feature to be
> enabled. It goes so far as to blow away existing clones not using pooled
> storage.
>
> The auto pooled storage stores repos under <share_base>/<sha1>, where <sha1>
> is the 40 character SHA-1 of rev 0 of the repo. This means the existing
> seeding in AMIs is effectively worthless now, as the new code won't use the
> data.
>
> We should update the seeding to create a single Firefox repo in
> <share_base>/8ba995b74e18334ab3707f27e9eb8f4e37ba3d29. Ideally this would be
> a generaldelta repo created from a unified Firefox repo (like
> https://hg.mozilla.org/experimental/firefox-unified). However, that repo is
> still experimental and we're not currently generating bundles for it. So
> perhaps we can live with seeding from mozilla-central instead.

we'd likely want a seed pulled there for:

* m-c
* m-beta
* m-release
* m-esr[all-that-we-use]

For the mere fact that release and beta have lots of heads and csets not in central.
Ideally we should use the same mechanism as in-tree to prepopulate the images, i.e. adapt

02:44:34 INFO - Copy/paste: hg --config ui.merge=internal:merge --config extensions.robustcheckout=/builds/slave/try-and-api-15-000000000000000/scripts/external_tools/robustcheckout.py robustcheckout https://hg.mozilla.org/try /builds/slave/try-and-api-15-000000000000000/build/src --sharebase /builds/hg-shared --purge --upstream https://hg.mozilla.org/mozilla-central --revision 64f88603f59ac386ea7ff737c1168d1c0a6f6eb3

We'd have to get a copy of mozharness, maybe just from the archiver using the default of mozilla-central. Alternatively, bug 1270951 just landed the robustcheckout extension in tools.
I'd prefer we seed from https://hg.mozilla.org/experimental/firefox-unified because that repo:

1) has all heads
2) is smaller
3) uses the generaldelta storage format

If the number of operations per day is small, you /could/ `hg clone -U --uncompressed https://hg.mozilla.org/experimental/firefox-unified` to get the seed for this repo. However, before we do that we should

a) consider removing the "experimental" label
b) stand up bundle generation for this repo so clones are served from CDN/S3.

That being said, seeding from a clone of mozilla-central should be fine for the short term. Although the first pull from aurora, beta, release, or esr will be a bit painful. I suppose the seeding mechanism could pull from all those repos so all the heads are present.
We're now generating bundles for the experimental/firefox-unified repo. However, I'm also standing up a "firefox" repo that will be a near exact copy of experimental/firefox-unified. That should be fully deployed in the next ~24h.

At that time, we should seed the AMI with a stream clone bundle of that repo. That will be in the "stream (generaldelta)" column of https://hg.cdn.mozilla.net and the "packed1-gd" bundle listed at https://hg.cdn.mozilla.net/bundles.json.

You can apply the bundle and populate repo caches for optimal initial consumption by doing something like:

1. hg --config format.generaldelta=true init 8ba995b74e18334ab3707f27e9eb8f4e37ba3d29
2. cd 8ba995b74e18334ab3707f27e9eb8f4e37ba3d29
3. hg debugapplystreamclonebundle <file>
4. hg pull https://hg.mozilla.org/firefox
5. hg branches
6. hg tags
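The six steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not the seeding code that eventually landed: seed_steps and run_seed are hypothetical names, the bundle is assumed to have been downloaded already, and its path should be absolute because each command after the init runs inside the new store directory (the "cd" step is expressed as the cwd of the later commands).

```python
import os
import subprocess

FIREFOX_SHA1 = "8ba995b74e18334ab3707f27e9eb8f4e37ba3d29"
FIREFOX_REPO = "https://hg.mozilla.org/firefox"


def seed_steps(share_base, bundle_file):
    """Return (argv, cwd) pairs mirroring the numbered steps in the comment."""
    store = os.path.join(share_base, FIREFOX_SHA1)
    return [
        # 1. create an empty generaldelta repo named after rev 0
        (["hg", "--config", "format.generaldelta=true", "init", store], share_base),
        # 2-3. from inside the store, apply the stream clone bundle
        (["hg", "debugapplystreamclonebundle", bundle_file], store),
        # 4. fetch whatever landed after the bundle was generated
        (["hg", "pull", FIREFOX_REPO], store),
        # 5-6. warm the branch and tag caches
        (["hg", "branches"], store),
        (["hg", "tags"], store),
    ]


def run_seed(share_base, bundle_file, runner=subprocess.check_call):
    """Run every step in order; the first failing command raises."""
    for argv, cwd in seed_steps(share_base, bundle_file):
        runner(argv, cwd=cwd)
```

Passing a different runner makes it possible to log or dry-run the commands without touching hg, for example run_seed("/builds/hg-shared", "/tmp/firefox.hg", runner=lambda argv, cwd: print(cwd, argv)).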
The https://hg.mozilla.org/firefox repo is a single Mercurial repository with relevant heads from all important repos (mozilla-central, mozilla-inbound, mozilla-aurora, mozilla-release, esrs, etc). The repository is encoded as generaldelta, which means it is smaller than mozilla-central (even though it contains 30,000+ more commits!)

Recent work in automation (namely bug 1270317) changed automation to always use shared, pooled storage for Mercurial repos. This meant that we only need a single store for Firefox repos.

When this change was made, we didn't change AMI seeding. This means that a worker would clone the Firefox repo on first job that needed it. This is obviously inefficient.

This commit changes the shared repo seeding so the pooled/shared repo now populated in automation is seeded at AMI generation time. So on first job run, most commits will be present and we'll only do an incremental pull. This restores the behavior from before bug 1270317 landed.

There are multiple benefits:

1) Shared repo population will complete quicker (because we're only populating 1 repo)
2) We'll use less disk space for local repos (because we will only populate 1)
3) Jobs will start faster since most commits from most Firefox repos will already be present in the pre-populated shared repo.

The previous version of this file had code to map the instance's current availability zone to an S3 bucket. As of bug 1249197, hg.mozilla.org advertised bundle URLs to the appropriate S3 endpoint based on the requesting IP. This favors same-AZ serving and means there should be 0 cost for data downloads from S3. Since this mapping is now done server side as part of clone bundles, we remove this feature.

The previous version of this file downloaded a tar file of the .hg directory for various repos and uncompressed it. The new version just does an `hg clone` preferring "streaming clones." "Streaming clones" are effectively `tar | nc` and are extremely fast. IMO the tar file provides little value so it has been removed from the equation.

A downside of not using a tar file is that seeding now talks to hg.mozilla.org instead of only S3. This could potentially drive a lot of load to hg.mozilla.org if multiple machines perform this seeding at the same time. However, 99+% of clone load will be offloaded to S3 via the clone bundles and hg.mozilla.org will only need to serve commits since the bundle was created. This should not be more than a few hundred commits and should not require much effort on behalf of the server. But if this does overwhelm the server, we can restore tar files.

This commit assumes that all machines have Mercurial 3.7 as `hg` in PATH. If an older version of Mercurial is present, the clone will take several minutes longer than it should or it will fail due to the client not having bundle2 support (the firefox repo requires bundle2).

A downside of this commit is that jobs not having the new shared/pooled storage code deployed will need to perform a full clone on first job because the old paths (e.g. /builds/hg-shared/mozilla-central) are no longer present. This only impacts legacy commits/jobs and the number of jobs should diminish over time. Furthermore, once hgtool is updated to use shared/pooled storage, this won't be an issue (that is tracked in bug 1270951).

Review commit: https://reviewboard.mozilla.org/r/58922/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/58922/
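The Mercurial 3.7 assumption in the commit message could be guarded with a small check before cloning. The Python below is a sketch, not the code in the reviewed patch: parse_hg_version, supports_stream_clone_bundles and clone_argv are hypothetical names, and the use of `ui.clonebundleprefers=VERSION=packed1` to express "preferring streaming clones" is an assumption about how that preference is spelled, since the commit message does not quote the exact option. The functions only parse text and build an argument list; nothing is executed.

```python
import re

MIN_HG_VERSION = (3, 7)
FIREFOX_REPO = "https://hg.mozilla.org/firefox"


def parse_hg_version(version_output):
    """Extract (major, minor) from `hg --version` output.

    The first line normally looks like
    "Mercurial Distributed SCM (version 3.7.3)".
    """
    match = re.search(r"\(version (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find a version in: %r" % version_output)
    return int(match.group(1)), int(match.group(2))


def supports_stream_clone_bundles(version_output):
    """True when the client meets the 3.7 minimum the commit assumes."""
    return parse_hg_version(version_output) >= MIN_HG_VERSION


def clone_argv(dest_dir):
    """Build an hg clone command that asks for a stream clone bundle.

    Assumption: ui.clonebundleprefers is the relevant knob. --noupdate
    skips the working directory, since only the store is being seeded.
    """
    return [
        "hg",
        "--config", "ui.clonebundleprefers=VERSION=packed1",
        "clone", "--noupdate", FIREFOX_REPO, dest_dir,
    ]
```

A seeding script could call supports_stream_clone_bundles on the output of `hg --version` and log a warning instead of cloning when it returns False.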
Attachment #8761881 - Flags: review?(catlee)
Assignee: nobody → gps
Status: NEW → ASSIGNED
Comment on attachment 8761881 [details] Bug 1232442 - Seed images with a stream clone of the unified Firefox repo; Review request updated; see interdiff: https://reviewboard.mozilla.org/r/58922/diff/1-2/
Attachment #8761881 - Flags: review?(jlund)
Comment on attachment 8761881 [details] Bug 1232442 - Seed images with a stream clone of the unified Firefox repo; https://reviewboard.mozilla.org/r/58922/#review56254 I don't think I would be a good candidate for this review as I am unfamiliar with this code. If you would like me to review it for knowledge sharing purposes, please re-request me and I will have a look.
Attachment #8761881 - Flags: review?(jlund)
https://reviewboard.mozilla.org/r/58922/#review57432

::: modules/runner/templates/tasks/populate_shared_repos.erb:32
(Diff revision 2)
>
> -def is_try_slave(hostname):
> -    return hostname.startswith("try-")
> -
> -
> -def get_availability_zone():
> +def clone_firefox():
> +    """Clone the Firefox repo to the hg-shared directory."""
> +    dest_dir = os.path.join(SHARE_BASE_DIR, FIREFOX_SHA1)
> +    if os.path.exists(dest_dir):
> +        log.info('%s already exists; skipping' % dest_dir)

need to actually skip the operation here?

::: modules/runner/templates/tasks/populate_shared_repos.erb:89
(Diff revision 2)
>          log.warn("%s is not supported", hostname)
>          exit(0)
>
> -    if is_try_slave(hostname):
> -        log.info("Try slave detected")
> -        dirs = get_prepopulated_dirs(is_try=True)
> +    # The Firefox repo is the only one large enough to warrant
> +    # seeding.
> +    exit(clone_firefox())

This breaks the behaviour below of exiting 0 even in the case of failure. This means that if we fail to clone this unified repo for any reason, then the machine will not be able to run any jobs.

::: modules/runner/templates/tasks/populate_shared_repos.erb:96
(Diff revision 2)
>
>  if __name__ == "__main__":
>      try:
>          main()
>      except Exception:
>          log.exception("Failed to fetch tarballs, gracefully exiting...")

This exception message needs updating.
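One way the three review points could be addressed is sketched below in Python. This is not the revision that landed; it is a minimal illustration that reuses the names clone_firefox, SHARE_BASE_DIR, FIREFOX_SHA1 and log from the diff. The do_clone callable and the share_base parameter are hypothetical additions that keep the sketch testable without hg, and the SHARE_BASE_DIR value is assumed from the /builds/hg-shared paths mentioned elsewhere in this bug.

```python
import logging
import os

log = logging.getLogger("populate_shared_repos")

SHARE_BASE_DIR = "/builds/hg-shared"  # assumed value
FIREFOX_SHA1 = "8ba995b74e18334ab3707f27e9eb8f4e37ba3d29"


def clone_firefox(do_clone, share_base=SHARE_BASE_DIR):
    """Seed the pooled Firefox store; return True if a clone was attempted."""
    dest_dir = os.path.join(share_base, FIREFOX_SHA1)
    if os.path.exists(dest_dir):
        log.info("%s already exists; skipping", dest_dir)
        return False  # first review point: actually skip the operation
    do_clone(dest_dir)
    return True


def main(do_clone, share_base=SHARE_BASE_DIR):
    """Always return 0 so a failed seed never stops the machine taking jobs."""
    try:
        clone_firefox(do_clone, share_base=share_base)
    except Exception:
        # third review point: the message no longer mentions tarballs
        log.exception("Failed to seed the Firefox repo, gracefully exiting...")
    return 0  # second review point: exit 0 even on failure
```

Keeping the exit status at 0 trades visibility for availability: a machine with an unseeded store is slower on its first job but still usable, which matches the behaviour the reviewer asked to preserve.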
Comment on attachment 8761881 [details] Bug 1232442 - Seed images with a stream clone of the unified Firefox repo; https://reviewboard.mozilla.org/r/58922/#review57692
Attachment #8761881 - Flags: review?(catlee)
Comment on attachment 8761881 [details] Bug 1232442 - Seed images with a stream clone of the unified Firefox repo; Review request updated; see interdiff: https://reviewboard.mozilla.org/r/58922/diff/2-3/
Attachment #8761881 - Flags: review?(catlee)
(In reply to Nick Thomas [:nthomas] from comment #21)
> Worth considering if these kind of prepopulations
> https://github.com/mozilla/build-cloud-tools/blob/master/instance_data/us-east-1.instance_data_prod.json
> https://github.com/mozilla/build-cloud-tools/blob/master/instance_data/us-east-1.instance_data_try.json
> https://github.com/mozilla/build-cloud-tools/blob/master/configs/Ec2UserdataUtils.psm1#L831
> are also obsolete with the changes here.

Looks like it. What are these used for? Windows machines? Does a fix belong in this bug or elsewhere?
Comment on attachment 8761881 [details] Bug 1232442 - Seed images with a stream clone of the unified Firefox repo; https://reviewboard.mozilla.org/r/58922/#review60762
Attachment #8761881 - Flags: review?(catlee) → review+
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Blocks: 1286335
Blocks: 1286336
Blocks: 1286430
Component: General Automation → General