Closed Bug 1249197 Opened 6 years ago Closed 6 years ago

Advertise S3 URLs to clients in EC2

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Assigned: gps)

Attachments

(3 files)

Currently, clone bundles / bundleclone manifests expose CDN URLs first and list S3 URLs for clients that prefer them. This means clients must explicitly set some hgrc options to "prefer" the S3 URLs over the CDN.

We want clients in EC2 to use the S3 URLs (assuming they are in a region where we have the bundles uploaded to S3) because S3 is faster and transfers from S3 are free within EC2. If we require clients to explicitly set a config option, someone will inevitably forget and we'll have slower and costlier clones.

Let's sniff source IPs on the servers and, when a request comes from EC2, advertise only the S3 URLs for the region the request is coming from. All other clients will continue to get the full manifest with the CDN first.
We'll need this to examine source IPs to see if they are from AWS.

File obtained from https://ip-ranges.amazonaws.com/ip-ranges.json
and checked in unmodified.
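
As a rough illustration of how that file can be used to map a source IP to a region, here is a minimal sketch. It assumes the published layout of ip-ranges.json (a "prefixes" list whose entries carry "ip_prefix", "region" and "service"). The sample data uses reserved documentation CIDR blocks rather than real AWS prefixes, the function names are hypothetical, and filtering on the "EC2" service is an assumption, not necessarily what the hgmo extension does.

```python
import ipaddress
import json

# Trimmed-down stand-in for ip-ranges.json. The CIDR blocks are reserved
# documentation ranges, NOT real AWS prefixes.
SAMPLE_IP_RANGES = json.loads("""
{
  "prefixes": [
    {"ip_prefix": "192.0.2.0/24", "region": "us-west-2", "service": "EC2"},
    {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "EC2"},
    {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "S3"}
  ]
}
""")


def load_ec2_networks(ip_ranges):
    """Return a list of (network, region) pairs for EC2 prefixes.

    Restricting to service == "EC2" is an assumption of this sketch.
    """
    networks = []
    for entry in ip_ranges.get("prefixes", []):
        if entry.get("service") != "EC2":
            continue
        network = ipaddress.ip_network(entry["ip_prefix"])
        networks.append((network, entry["region"]))
    return networks


def region_for_ip(source_ip, networks):
    """Return the region whose prefix contains source_ip, else None."""
    try:
        addr = ipaddress.ip_address(source_ip)
    except ValueError:
        # Unparseable address: treat it as "not from AWS".
        return None
    for network, region in networks:
        if addr in network:
            return region
    return None


EC2_NETWORKS = load_ec2_networks(SAMPLE_IP_RANGES)
print(region_for_ip("192.0.2.10", EC2_NETWORKS))
```

A linear scan is fine for a sketch; with the full file, a real implementation might want to pre-sort or index the networks.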

Review commit: https://reviewboard.mozilla.org/r/35567/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/35567/
Attachment #8721093 - Flags: review?(klibby)
Python 2 doesn't have a nice API for parsing IP addresses and doing
netmask foo. Python 3.3 introduced the "ipaddress" module in the standard
library. There is a backport for Python 2, which we install in this commit.
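
For readers unfamiliar with the module, this is a small sketch of the kind of netmask work it handles. The addresses are arbitrary private-range examples. The u"" prefixes are kept on the assumption that the Python 2 backport expects unicode strings; on Python 3 they are harmless.

```python
import ipaddress

# Arbitrary example values, not taken from the AWS ranges file.
network = ipaddress.ip_network(u"10.20.0.0/16")
inside = ipaddress.ip_address(u"10.20.30.40")
outside = ipaddress.ip_address(u"10.21.0.1")

print(network.netmask)     # netmask derived from the /16 prefix
print(network.prefixlen)   # prefix length as an integer
print(inside in network)   # membership test, no manual bit twiddling
print(outside in network)
```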

Review commit: https://reviewboard.mozilla.org/r/35569/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/35569/
Attachment #8721094 - Flags: review?(klibby)
This commit modifies the hgmo extension to inspect the source IP
address of HTTP requests for bundle clone and clone bundles manifests
and compare against known AWS IP ranges. If the request is coming from
an AWS region that we have S3 URLs for (identified by the presence of
"ec2region=" in the manifest metadata), we filter the manifest to only
advertise URLs in the same region. This guarantees an intra-region
transfer without any client configuration. This is fast and free
since intra-region S3 transfers don't cost anything!
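
The filtering step might look roughly like the sketch below. It assumes each manifest line is a URL followed by space-separated key=value attributes, and that S3 entries are tagged with "ec2region=". The URLs, the sample manifest and the function names are hypothetical; this is not the code from the commit.

```python
# Hypothetical manifest: one URL per line, then key=value attributes.
SAMPLE_MANIFEST = "\n".join([
    "https://cdn.example.com/bundle.hg BUNDLESPEC=gzip-v2 cdn=true",
    "https://s3.example.com/us-west-2/bundle.hg BUNDLESPEC=gzip-v2 ec2region=us-west-2",
    "https://s3.example.com/us-east-1/bundle.hg BUNDLESPEC=gzip-v2 ec2region=us-east-1",
])


def parse_manifest_line(line):
    """Split 'URL key=value ...' into (url, attrs dict)."""
    fields = line.split()
    url = fields[0]
    attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
    return url, attrs


def filter_manifest_for_region(manifest, region):
    """Return only the lines for `region`, or every line if none match.

    `region` is None when the client is not coming from a known AWS range.
    """
    lines = [line for line in manifest.splitlines() if line.strip()]
    if region is None:
        return lines
    same_region = [
        line for line in lines
        if parse_manifest_line(line)[1].get("ec2region") == region
    ]
    # No bundles uploaded for this region: fall back to the full manifest.
    return same_region or lines


for line in filter_manifest_for_region(SAMPLE_MANIFEST, "us-west-2"):
    print(line)
```

Falling back to the full manifest when the region has no entries keeps clients in unsupported regions on the existing CDN-first behavior.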

In addition, we also reorder the manifest to advertise the stream clone
bundles first. We recommend clients in automation apply stream clone
bundles (over the default gzip bundles) because they are the fastest
mechanism to clone repositories (a stream clone bundle can apply at
>80 MB/s).
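
One way to do such a reordering is a stable sort that moves stream clone entries to the front and leaves everything else in its original order. The sketch below assumes stream clone bundles can be recognized by a "BUNDLESPEC=none-packed1" attribute; how the hgmo extension actually identifies them is not stated here, and the function names are hypothetical.

```python
def is_stream_bundle(line):
    """Assumption: stream clone bundles carry BUNDLESPEC=none-packed1."""
    return "BUNDLESPEC=none-packed1" in line.split()


def stream_bundles_first(lines):
    """Move stream clone bundles to the front, preserving relative order.

    sorted() is stable and False sorts before True, so entries for which
    `not is_stream_bundle(line)` is False (the stream bundles) come first.
    """
    return sorted(lines, key=lambda line: not is_stream_bundle(line))


example = [
    "https://s3.example.com/a.hg BUNDLESPEC=gzip-v2",
    "https://s3.example.com/b.hg BUNDLESPEC=none-packed1",
    "https://s3.example.com/c.hg BUNDLESPEC=bzip2-v2",
]
for line in stream_bundles_first(example):
    print(line)
```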

Tests for the new features have been added to the hgserver tests because
we have existing tests that run the code for generating manifests
and we want to use realistic values for testing.

Documentation for the bundle hosting has been updated to reflect the
change in behavior.

Review commit: https://reviewboard.mozilla.org/r/35571/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/35571/
Attachment #8721095 - Flags: review?(klibby)
Attachment #8721094 - Flags: review?(klibby) → review+
Comment on attachment 8721094 [details]
MozReview Request: ansible/hg-web: install ipaddress Python package (bug 1249197); r?fubar

https://reviewboard.mozilla.org/r/35569/#review32283

lgtm
Comment on attachment 8721095 [details]
MozReview Request: hgmo: serve AWS clients bundles from local S3 region (bug 1249197); r?fubar

https://reviewboard.mozilla.org/r/35571/#review32285

lgtm, but tbh you might want a third set of brains on the python (I'm still new to it and am slightly fried from patching everything)
Attachment #8721095 - Flags: review?(klibby) → review+
Comment on attachment 8721093 [details]
MozReview Request: ansible/hg-web: install AWS IP ranges JSON file (bug 1249197); r?fubar

https://reviewboard.mozilla.org/r/35567/#review32287

my only question is: what's the process for determining when the AWS IP ranges have changed and need updating?
Attachment #8721093 - Flags: review?(klibby) → review+
Attachment #8721095 - Flags: review?(dminor)
https://reviewboard.mozilla.org/r/35567/#review32287

Yeah, I was trying to figure that out too. I was feeling too lazy to write a cron job. https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html says they send updates to an SNS topic when the file changes. Perhaps I could subscribe and have an e-mail sent to developer-services@ so we know when we need to update. If the frequency is too high we can develop an automated system.
https://reviewboard.mozilla.org/r/35567/#review32287

developer-services@ is now receiving an e-mail notification when the IP ranges change.
Comment on attachment 8721095 [details]
MozReview Request: hgmo: serve AWS clients bundles from local S3 region (bug 1249197); r?fubar

https://reviewboard.mozilla.org/r/35571/#review32317

::: hgext/hgmo/__init__.py:562
(Diff revision 1)
> +                              proto.req.env.get('REMOTE_ADDR'))

nit: I think pep8 would say this is underindented

::: hgext/hgmo/__init__.py:618
(Diff revision 1)
> +    except Exception:

It would be nice to add a warning here that an exception occurred and we're using the default values instead.
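
A minimal sketch of that suggestion, using the standard logging module rather than Mercurial's own ui facilities (which the real extension would more likely use). The helper name and logger name are hypothetical.

```python
import logging

logger = logging.getLogger("hgmo.sketch")  # hypothetical logger name


def filtered_or_default(manifest, filter_func):
    """Apply filter_func to manifest, falling back to the original on error."""
    try:
        return filter_func(manifest)
    except Exception:
        # Record that we fell back instead of failing silently;
        # exc_info=True attaches the traceback to the log record.
        logger.warning(
            "manifest filtering failed; serving the default manifest",
            exc_info=True,
        )
        return manifest
```

The broad except is kept so a filtering bug never breaks clones, while the warning makes the fallback visible in the logs.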
Attachment #8721095 - Flags: review?(dminor) → review+
(In reply to Gregory Szorc [:gps] from comment #8)
> https://reviewboard.mozilla.org/r/35567/#review32287
> 
> developer-services@ is now receiving an e-mail notification when the IP
> ranges change.

perfect! now let's not forget why when it doesn't change for 9 months...  ;-)
This is deployed to prod.
Assignee: nobody → gps
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Blocks: 1249765
Blocks: 1252624