Optimize hgmo for GCP
Categories
(Developer Services :: Mercurial: hg.mozilla.org, task)
Tracking
(Not tracked)
People
(Reporter: sheehan, Assigned: sheehan)
References
Details
Attachments
(7 files)
We need to do some work to port various AWS optimizations for hg.mo over to GCP. Namely:
- Uploading bundles to the GCP equivalent of AWS S3.
- Determining whether the origin IP address for a request comes from a GCP-advertised IP block, i.e. the GCP equivalent of AWS' IP ranges document, which appears to be this process (a minimal sketch of the resulting IP check follows this list).
- Prioritizing stream clone bundles from the same GCP region for requests coming from these IP blocks.
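To make the second and third items concrete, here is a minimal sketch of the IP check, assuming the scraped netblocks are stored one per line in the form ip4:<cidr> / ip6:<cidr> (the format shown later in comment 6). The file path and function names are illustrative, not the actual hgmo code:

import ipaddress

def load_gcp_networks(path):
    """Parse the scraped netblock file into ip_network objects."""
    networks = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or ':' not in line:
                continue
            # Entries look like "ip4:35.199.0.0/17" or "ip6:2600:1900::/35";
            # strip the ip4:/ip6: prefix and parse the CIDR block.
            networks.append(ipaddress.ip_network(line.split(':', 1)[1]))
    return networks

def origin_is_gcp(ip, networks):
    """Return True if the request's origin IP falls inside any GCP netblock."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks if net.version == addr.version)

# Illustrative usage (assumed path):
#   networks = load_gcp_networks('/var/hg/gcp-ip-ranges.txt')
#   if origin_is_gcp('35.199.12.34', networks):
#       ... serve the stream clone bundle hosted in the matching GCP region ...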
Comment 1•5 years ago
(In reply to Connor Sheehan [:sheehan] from comment #0)
We need to do some work to port various AWS optimizations for hg.mo over to GCP. Namely:
- Uploading bundles to the GCP equivalent of AWS S3.
- Determining if the origin IP address for a request comes from a GCP advertised IP block, ie the GCP equivalent of AWS' IP ranges document, which appears to be this process.
- Prioritizing stream clone bundles from the same GCP region to requests coming from these IP blocks.
Connor: can you estimate how much work would be required for this? Are we talking days/weeks/months?
Assignee
Comment 2•5 years ago
It should be a few days' work - I've already made considerable progress on it. I'm away starting tomorrow afternoon, returning next Thursday (Oct 3-9, back the 10th). I'll do my best to have it deployed shortly after I return.
Comment 3•5 years ago
(In reply to Connor Sheehan [:sheehan] from comment #2)
It should be a few days work - I've already made some considerable progress on it. I'm away starting tomorrow afternoon, returning next Thursday (Oct3-9, back the 10th). I'll do my best to have it deployed shortly after I return.
Connor: checking in now that you're back, do you still think you'll be able to tackle this ASAP? We're definitely still hitting hg bottlenecks in GCP.
Assignee
Comment 4•5 years ago
(In reply to Chris Cooper [:coop] pronoun: he from comment #3)
Connor: checking in now that you're back, do you still think you'll be able to tackle this ASAP? We're definitely still hitting hg bottlenecks in GCP.
Yes, I'm making progress here still. The remaining work is to determine what GCS bucket storage class we should be using, then create the bundles and point them at the new buckets. If we're experimenting in a single GCE region we can simply create a bucket there and serve bundles from that region for all incoming GCP requests. If we're already in multiple regions, we may need to do some more work, and I will need to speak with someone from CloudOps to determine the best path forward, due to limitations in the GCP APIs.
Which GCE regions are we running the builds out of?
Assignee
Comment 5•5 years ago
I spoke with Brian and apparently we are in us-central1 only, for the time being. This will allow me to complete this optimization fairly easily, after which I'm going to get started on the work to stand up private hgweb mirrors for GCP. Having the mirrors stood up will allow me to work around the aforementioned API limitations in GCP.
Assignee
Comment 6•5 years ago
This commit adds a new subcommand to scrape-manifest-ip-ranges.py which scrapes Google's DNS records to gather information about its public IP address blocks. The process implemented in this commit is outlined in Google's cloud support docs. [1]
To summarize, we use the dnspython DNS toolkit to first run a query for _cloud-netblocks.googleusercontent.com. This query returns a list of domains, each of which returns a set of IP blocks for Google Cloud Platform services. The resulting blocks are then saved to a file on disk. An example output looks like:
ip4:35.199.0.0/17
ip4:35.199.128.0/18
ip4:35.235.216.0/21
ip6:2600:1900::/35
ip4:35.190.224.0/20
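A rough sketch of that scrape (illustrative names, not the actual scrape-manifest-ip-ranges.py code), assuming the dnspython 1.x API under Python 3 (dns.resolver.query; dnspython 2.x renames this to resolve):

import dns.resolver

NETBLOCK_DOMAIN = '_cloud-netblocks.googleusercontent.com'

def txt_records(domain):
    """Yield the concatenated TXT record strings for a domain."""
    for rdata in dns.resolver.query(domain, 'TXT'):
        yield b''.join(rdata.strings).decode('ascii')

def scrape_gcp_blocks():
    """Follow the SPF-style include: chain and collect ip4:/ip6: entries."""
    blocks = []
    for record in txt_records(NETBLOCK_DOMAIN):
        # The top-level record lists per-netblock domains as "include:" tokens.
        includes = [token.split(':', 1)[1] for token in record.split()
                    if token.startswith('include:')]
        for domain in includes:
            for sub_record in txt_records(domain):
                blocks.extend(token for token in sub_record.split()
                              if token.startswith(('ip4:', 'ip6:')))
    return blocks

if __name__ == '__main__':
    # The real script saves the result to a file on disk; print it here instead.
    print('\n'.join(scrape_gcp_blocks()))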
Assignee
Comment 7•5 years ago
This commit adds a systemd unit and timer to schedule runs of the GCP address scraper on hg-web. The unit and timer are copies of the AWS scraper's unit/timer, except that the gcp subcommand of the manifest scraper script is called instead.
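As an illustration only (unit names, paths, arguments, and the schedule below are assumptions, not the actual ansible/hg-web files), such a unit/timer pair might look like:

# scrape-gcp-ip-ranges.service
[Unit]
Description=Scrape GCP IP address ranges for clonebundles

[Service]
Type=oneshot
# The gcp subcommand and output path are assumed arguments.
ExecStart=/var/hg/venv_tools3/bin/python /var/hg/version-control-tools/scripts/scrape-manifest-ip-ranges.py gcp /var/hg/gcp-ip-ranges.txt

# scrape-gcp-ip-ranges.timer
[Unit]
Description=Periodically scrape GCP IP address ranges

[Timer]
OnBootSec=5min
OnUnitActiveSec=1h

[Install]
WantedBy=timers.target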
Assignee
Comment 8•5 years ago
We will need this dependency for an upcoming commit which adds code
to query DNS records.
Assignee
Comment 9•5 years ago
The bundle generation script will soon need to upload files to Google Cloud Storage. This commit updates requirements-bundles.txt to add the required SDK dependencies.
Assignee
Comment 10•5 years ago
This commit adds Terraform configs for a GCS bucket and service account required to publish Mercurial clonebundles to GCP. The service account represents the hgssh master server process which generates the bundles and uploads them to GCP, with the corresponding key being used for credentials.
The bucket is created with a 7-day retention policy and a 7-day lifecycle policy. The retention policy holds data as undeletable for a minimum of 7 days, and the lifecycle policy deletes the data once it is past that 7-day expiration time.
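A sketch of what those resources might look like (resource names and exact rule values are assumptions beyond the 7-day policies, the bucket name from comment 17, and the hgbundler service account from comment 15):

resource "google_service_account" "hgbundler" {
  account_id   = "hgbundler"
  display_name = "hg.mozilla.org clonebundles uploader"
}

resource "google_storage_bucket" "clonebundles" {
  name     = "moz-hg-bundles-gcp-us-central1"
  location = "US-CENTRAL1"

  retention_policy {
    # Objects cannot be deleted for 7 days after they are written.
    retention_period = 604800  # 7 days, in seconds
  }

  lifecycle_rule {
    # Delete objects once they are 7 days old.
    condition {
      age = 7
    }
    action {
      type = "Delete"
    }
  }
}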
Assignee
Comment 11•5 years ago
This commit extends the clonebundle generation and upload script to also upload generated bundles to a GCS bucket in us-central1. The format of the S3 bundle upload was mostly replicated, with GCS APIs substituted for the S3 APIs. Most region-specific operations are left in loops to make it easy to extend to more GCS regions.
test-clonebundles.t was updated to reflect the new clonebundles manifest entries, and the bundleclone.rst documentation now describes the new gceregion bundle attribute.
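A rough sketch of the GCS half of that upload (not the actual bundle generation code; the credentials path, BUNDLESPEC value, and helper name are illustrative, while the bucket name, URL shape, and gceregion attribute come from this bug), using the google-cloud-storage package added in comment 9:

from google.cloud import storage

# Region-specific work stays in a loop so more GCS regions can be added later.
GCS_REGIONS = ('us-central1',)

def upload_bundle_to_gcs(repo, bundle_path, object_name):
    manifest_entries = []
    for region in GCS_REGIONS:
        bucket_name = 'moz-hg-bundles-gcp-%s' % region
        client = storage.Client.from_service_account_json(
            '/etc/mercurial/hgbundler-gcs.json')  # illustrative credentials path
        blob = client.bucket(bucket_name).blob('%s/%s' % (repo, object_name))
        blob.upload_from_filename(bundle_path)

        url = 'https://storage.googleapis.com/%s/%s/%s' % (
            bucket_name, repo, object_name)
        # Each entry carries a gceregion attribute so the server can prefer
        # bundles hosted in the requester's region.
        manifest_entries.append(
            '%s BUNDLESPEC=none-packed1 gceregion=%s' % (url, region))
    return manifest_entries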
Assignee
Comment 12•5 years ago
This commit teaches the hgmo extension to prioritize stream clone bundles when responding to clone requests from IP addresses in GCP. To do so we make the filter_manifest_for_aws_region function more generic to account for the new GCP regions. We add a new config option which points to a path on disk where the previously added GCP IP scraper will dump a file containing IP addresses for known GCP blocks. This file is mocked out by adding an example file to the docker-hg-web Ansible role.
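Conceptually, the prioritization amounts to something like the following (illustrative sketch, not the extension's actual filter function or config names):

def filter_manifest_for_gce_region(manifest_lines, region):
    """Move clonebundles manifest entries tagged gceregion=<region> to the front."""
    preferred, rest = [], []
    for line in manifest_lines:
        # Each manifest line is "<url> key1=value1 key2=value2 ...".
        attrs = dict(kv.split('=', 1) for kv in line.split()[1:] if '=' in kv)
        (preferred if attrs.get('gceregion') == region else rest).append(line)
    return preferred + rest

# Illustrative usage, reusing the IP check sketched in comment 0:
#   if origin_is_gcp(client_ip, gcp_networks):
#       manifest_lines = filter_manifest_for_gce_region(manifest_lines, 'us-central1')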
Comment 13•5 years ago
Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/4cbcfeb98791
ansible/hg-web: add dnspython to venv_tools3 on hgweb r=smacleod
https://hg.mozilla.org/hgcustom/version-control-tools/rev/833e7e7f3c2f
scripts: add gcp option to scrape-manifest-ip-ranges.py r=smacleod
https://hg.mozilla.org/hgcustom/version-control-tools/rev/b86cd8ce560a
ansible/hg-web: add systemd unit and timer for GCP IP address scrape r=smacleod
https://hg.mozilla.org/hgcustom/version-control-tools/rev/7bf7c9fbab4f
ansible/hg-ssh: add google-cloud-storage dependency to requirements-bundles.txt r=smacleod
https://hg.mozilla.org/hgcustom/version-control-tools/rev/dfc58dbfcca3
terraform: create resources to store Mercurial clonebundles in GCP r=smacleod
https://hg.mozilla.org/hgcustom/version-control-tools/rev/9e2203249fdb
hgserver: extend bundle generation script to upload to GCS r=smacleod
https://hg.mozilla.org/hgcustom/version-control-tools/rev/6c25d57a7552
hgmo: prioritize stream clone bundles when cloning from GCP r=smacleod
Assignee
Comment 14•5 years ago
This has landed but needs to be deployed and tested in production to assert the GCP upload/download works as intended. I'll be taking care of that tomorrow morning.
Comment 15•5 years ago
Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/9d5409716a91
terraform: switch from bucket ACL to IAM member policy
https://hg.mozilla.org/hgcustom/version-control-tools/rev/b7590e298924
terraform: grant admin bucket privileges for hgbundler service account
https://hg.mozilla.org/hgcustom/version-control-tools/rev/48bc0a9aa838
bundles: fix busted import of Google cloud SDK
https://hg.mozilla.org/hgcustom/version-control-tools/rev/9ecc4d94fa5b
ansible/hg-ssh: specify path to hgbundler credentials file
Assignee
Comment 16•5 years ago
Need to deploy and test one last piece.
Assignee
Comment 17•5 years ago
This is deployed. Now when running clone tasks from within GCP the initial download of the bundle will come from GCS. It will also be a stream-clone bundle, which is better on fast networks.
Full download and working directory checkout:
cosheehan@instance-test-google-bundles:~$ time /home/cosheehan/.local/bin/hg clone https://hg.mozilla.org/mozilla-unified
destination directory: mozilla-unified
applying clone bundle from https://storage.googleapis.com/moz-hg-bundles-gcp-us-central1/mozilla-unified/fa97283e9f5d89b55d24eeb4171036bd34d12f00.packed1.hg
545204 files to transfer, 3.08 GB of data
transferred 3.08 GB in 62.0 seconds (50.8 MB/sec)
finished applying clone bundle
searching for changes
adding changesets
adding manifests
adding file changes
added 389 changesets with 6891 changes to 6244 files (+2 heads)
new changesets 5d748daa45d3:8a47372311a9
updating to branch default
(warning: large working directory being used without fsmonitor enabled; enable fsmonitor to improve performance; see "hg help -e fsmonitor")
282800 files updated, 0 files merged, 0 files removed, 0 files unresolved
real 3m45.904s
user 3m33.910s
sys 0m54.175s
Working directory checkout with a cached repo:
cosheehan@instance-test-google-bundles:~$ time /home/cosheehan/.local/bin/hg share mozilla-unified/ share-unified
updating working directory
(warning: large working directory being used without fsmonitor enabled; enable fsmonitor to improve performance; see "hg help -e fsmonitor")
282800 files updated, 0 files merged, 0 files removed, 0 files unresolved
real 2m4.927s
user 2m33.870s
sys 0m36.591s
Tested on an n2-standard-2 (2 vCPUs, 8 GB memory) with the premium network tier and a standard persistent disk. The performance for checkouts might be better on the build instances (as is the case here, where it only takes ~45s); this test was mostly to make sure the new code was functioning correctly in production. But the initial clone, which previously took 18-20m, should now be as fast as Google's networks will allow us to transfer the bits.
Comment 18•5 years ago
Thanks, Connor.
I kicked off a Try run to test this: https://treeherder.mozilla.org/#/jobs?repo=try&revision=20c99f633f3c035d2f2f66d017f97f9677ed7201
If there's an existing clone on a worker, am I going to see any improvement, or any evidence otherwise in the log?
Builds are still completing, but if I look at the linux64 opt plain build that I normally use as a metric, I can't tell from the log whether anything has changed. Granted, it's an existing clone, but should I be worried about the "region gecko-1 not yet supported" message?
[vcs 2019-10-25T00:14:52.667Z] fetching hgmointernal config from http://taskcluster/secrets/v1/secret/project/taskcluster/gecko/hgmointernal
[vcs 2019-10-25T00:14:53.057Z] region gecko-1 not yet supported; using public hg.mozilla.org service
[vcs 2019-10-25T00:14:53.057Z] fetching hg.mozilla.org fingerprint from http://taskcluster/secrets/v1/secret/project/taskcluster/gecko/hgfingerprint
[vcs 2019-10-25T00:14:53.184Z] executing ['hg', 'robustcheckout', '--sharebase', '/builds/worker/checkouts/hg-store', '--purge', '--config', 'hostsecurity.hg.mozilla.org:fingerprints=sha256:17:38:aa:92:0b:84:3e:aa:8e:52:52:e9:4c:2f:98:a9:0e:bf:6c:3e:e9:15:ff:0a:29:80:f7:06:02:5b:e8:48,sha256:8e:ad:f7:6a:eb:44:06:15:ed:f3:e4:69:a6:64:60:37:2d:ff:98:88:37:bf:d7:b8:40:84:01:48:9c:26:ce:d9', '--upstream', 'https://hg.mozilla.org/mozilla-unified', '--revision', '20c99f633f3c035d2f2f66d017f97f9677ed7201', 'https://hg.mozilla.org/try', '/builds/worker/workspace/build/src']
[vcs 2019-10-25T00:14:53.282Z] (using Mercurial 4.8.1)
[vcs 2019-10-25T00:14:53.282Z] ensuring https://hg.mozilla.org/try@20c99f633f3c035d2f2f66d017f97f9677ed7201 is available at /builds/worker/workspace/build/src
[vcs 2019-10-25T00:14:53.862Z] (cloning from upstream repo https://hg.mozilla.org/mozilla-unified)
[vcs 2019-10-25T00:14:54.136Z] (sharing from existing pooled repository 8ba995b74e18334ab3707f27e9eb8f4e37ba3d29)
Updated•5 years ago
Assignee
Comment 19•5 years ago
No, this won't make a difference if there's an existing clone on the worker. In that case we would still need to hg pull the new changes from the public hgweb endpoint in MDC1 (which looks to take about 10s from the log in comment 18), then perform a working directory checkout on the worker, which takes about 45s.
The line about "region gecko-1 not yet supported" relates to private mirrors. run-task fetches a Taskcluster secret and checks whether the value of the TASKCLUSTER_WORKER_GROUP environment variable is a key in the secret. If the key exists, the worker group is supported for private hgweb mirrors, and the value that maps to the key contains configuration describing how to communicate with the private mirror. Since we don't have mirrors for GCP yet, it's expected that we see that line in the logs.
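A rough sketch of that check (not the actual run-task code; the response shape and helper name are assumptions, while the secret URL and log message are taken from the log in comment 18):

import os
import requests

SECRET_URL = ('http://taskcluster/secrets/v1/secret/'
              'project/taskcluster/gecko/hgmointernal')

def private_mirror_config():
    """Return the mirror config for this worker group, or None if unsupported."""
    # Taskcluster's secrets service wraps the payload in a "secret" key.
    secret = requests.get(SECRET_URL).json()['secret']
    worker_group = os.environ.get('TASKCLUSTER_WORKER_GROUP', '')
    config = secret.get(worker_group)
    if config is None:
        print('region %s not yet supported; using public hg.mozilla.org service'
              % worker_group)
    return config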
In tasks where this change will make a difference, after "cloning from upstream repo", we won't see "sharing from existing pooled repository". Instead we'll see the output from comment 17, "applying clone bundle from https://storage.googleapis.com/moz-hg-bundles-gcp-us-central1/<repo>/<revision>.packed1.hg".