AWS region-local caches for https stuff

RESOLVED FIXED

Status

Release Engineering
General Automation
P1
normal
RESOLVED FIXED
3 years ago
a year ago

People

(Reporter: catlee, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1956] )

Attachments

(5 attachments, 12 obsolete attachments)

13.98 KB, patch
jlund
: review+
emorley
: checked-in-
Details | Diff | Splinter Review
1.16 KB, patch
catlee
: review+
emorley
: checked-in-
Details | Diff | Splinter Review
768 bytes, patch
pmoore
: review+
catlee
: checked-in+
Details | Diff | Splinter Review
5.31 KB, patch
catlee
: review+
massimo
: checked-in+
Details | Diff | Splinter Review
3.30 KB, patch
jlund
: review+
massimo
: checked-in+
Details | Diff | Splinter Review
(Reporter)

Description

3 years ago
We'd like to set up local-to-each-datacenter varnish caches for things like S3, FTP, tooltool.

Clients should then request files from their local varnish server(s), falling back to the canonical URLs only if fetching from varnish fails horribly.

Comment 1

3 years ago
Gozer,
We'd like to setup a reverse proxy in each AWS region that caches ftp and other Mozilla SSL http endpoints.

After positive experience with Varnish that was my first suggestion. However, the SSL req might make us look at nginx instead or varnish + something else.

Can you think of a solution for this?

We'd like to have the cache look like:

http://ftp.mozilla.org.ourcache.internal.aws.dns/foo -> https://ftp.mozilla.org/foo

We'd like to use docker to be able to specify domains to cache and size of each cache.

Updated

3 years ago
Summary: Local varnish caches for http stuff → AWS region-local varnish caches for https stuff

Comment 2

3 years ago
This important to avoid overloading our netscalers. Our loaners are scheduled for removal.
Flags: needinfo?(gozer)
Priority: -- → P1
(Reporter)

Comment 3

3 years ago
Also, could be nginx since varnish doesn't do SSL.
Summary: AWS region-local varnish caches for https stuff → AWS region-local caches for https stuff
(In reply to Taras Glek (:taras) from comment #1)
> Gozer,
> We'd like to setup a reverse proxy in each AWS region that caches ftp and
> other Mozilla SSL http endpoints.

A very good idea!

> After positive experience with Varnish that was my first suggestion.
> However, the SSL req might make us look at nginx instead or varnish +
> something else.
> 
> Can you think of a solution for this?

Varnish is an HTTP cache by design, and implementing SSL just isn't part of their plan.

However, since you are talking about possibly multiple end-points for this cache, I would not
give up on Varnish. Given that SSL is only a requirement for making connections to origin servers,
I would set thing up like this;

varnish => httpd/mod_proxy => origin

One of the main questions in designing this, however, would have to do with the amount of data being cached and the predicted hit rate.

Makes a big difference if what you anticipate is a very hot, but small working set, versus a relatively cold but large working set.

> We'd like to have the cache look like:
> 
> http://ftp.mozilla.org.ourcache.internal.aws.dns/foo ->
> https://ftp.mozilla.org/foo

That's simple enough for the varnish+httpd (varnish+nginx even) combo.

> We'd like to use docker to be able to specify domains to cache and size of
> each cache.

That would make a lot of sense, to build a docker image that takes as bootup argument something like:

{ 'caches':
  '/foo': {
    'parent': 'https://ftp.mozilla.org/foo',
    'size'  : '16G',
    'ttl'   : '1d',
  },
  '/people': {
    'parent': 'https://people.mozilla.org/',
    'size'  : '1G',
    'ttl'   : '1h',
  }
}

Or something like that. Then build the proprer httpd/mod_proxy config out of that, and let Varnish handle
the traffic.
Flags: needinfo?(gozer)

Comment 5

3 years ago
Thanks,
I do anticipate a hot and bounded working set. I think 80gb ssds on m3.xlarge will work well for this. 

George can you get us a docker image that works like Gozer's spec?
Assignee: nobody → george.miroshnykov
(Reporter)

Comment 6

3 years ago
> However, since you are talking about possibly multiple end-points for this
> cache, I would not
> give up on Varnish. Given that SSL is only a requirement for making
> connections to origin servers,
> I would set thing up like this;

I don't think SSL is a requirement only for making connections or origin servers. I think we should have SSL from the cache to the internal nodes as well.
Here's the initial implementation:
https://github.com/laggyluke/proxxy

(sorry, naming things is hard)

This allows you to configure and run nginx inside a Docker container.
We don't need varnish as nginx can handle LRU just fine.

I've used the config provided by Gozer and it seems to work, but maybe someone could give me an actual example URL that are going to be cached, so I can make sure it actually works?
Status: NEW → ASSIGNED

Comment 8

3 years ago
George,
This doesn't take account of disk space. Can you update the setup to set a total(or per cache) limit. Otherwise nginx will run out of disk and everyone will be sad, right?


Chris,
Can you provide a list of hosts to cache?
Flags: needinfo?(catlee)

Comment 9

3 years ago
George says there is no easy way to go a global size limited, have to rely on data partition size, quotas, etc
(Reporter)

Comment 10

3 years ago
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_path says you can add a 'max_size' parameter to the proxy_cache_path configuration.
Flags: needinfo?(catlee)
(Reporter)

Comment 11

3 years ago
As for host names, let's start by proxying https://ftp.mozilla.org/...
I got a test version up and running on http://proxxy-dev.elasticbeanstalk.com/

In order to try it out, you'll have to play around with the DNS like so:
1. Run "nslookup proxxy-dev.elasticbeanstalk.com" and take a note of the load balancer's IP.
2. Add the following line to your /etc/hosts:

<ip-from-step-1> ftp.mozilla.org.proxy people.mozilla.org.proxy

3. Open http://ftp.mozilla.org.proxy or http://people.mozilla.org.proxy in your browser.
4. To make sure caching actually works, you can use curl to inspect X-Proxxy header like this:
curl -I http://people.mozilla.org.proxy/
HTTP/1.1 200 OK
...
X-Proxxy: HIT
...

Comment 13

3 years ago
Just clarified with George: dns is not setup so ftp.mozilla.org.proxy would be an entry in /etc/hosts until we setup route53 or some equivalent.
(Reporter)

Comment 14

3 years ago
You can also do this directly with curl without messing around with /etc/hosts. e.g.

curl -v -O -H "Host: ftp.mozilla.org.proxy" http://proxxy-dev.elasticbeanstalk.com/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64/1401821123/firefox-32.0a1.en-US.linux-x86_64.tests.zip

Comment 15

3 years ago
(In reply to Chris AtLee [:catlee] from comment #14)
> You can also do this directly with curl without messing around with
> /etc/hosts. e.g.
> 
> curl -v -O -H "Host: ftp.mozilla.org.proxy"
> http://proxxy-dev.elasticbeanstalk.com/pub/mozilla.org/firefox/tinderbox-
> builds/mozilla-inbound-linux64/1401821123/firefox-32.0a1.en-US.linux-x86_64.
> tests.zip

I meant dns is not setup. This is a better hack indeed :)

George:
< Server: nginx/1.4.7
< X-Backend-Server: ftp1.dmz.scl3.mozilla.com
< X-Cache-Info: not cacheable; response file size too large
Please disregard X-Cache-Info, it's coming from the backend server (ftp.mozilla.org) and is not relevant here.
You should only look at X-Proxxy to see if it was a hit or miss.
(Reporter)

Comment 17

3 years ago
I set up DNS:
ftp.mozilla.com.proxxy-dev.srv.releng.use1.mozilla.com -> CNAME to proxxy-dev.elasticbeanstalk.com

just need the app to recognize the new vhost?

*.proxxy-dev.srv.releng.use1.mozilla.com should resolve to the same CNAME.
ftp.mozilla.com.... is a typo which should be .org ? Also, do we stop using ftp-ssl.mozilla.org with this ?
That's right, I've only set up proxying for mozilla.org, not mozilla.com.
Please let me know it we need to proxy mozilla.com as well.

The new version is deployed, two routes are available:
http://ftp.mozilla.org.proxxy-dev.srv.releng.use1.mozilla.com/
http://people.mozilla.org.proxxy-dev.srv.releng.use1.mozilla.com/

As the app is running inside a Docker container, I've pushed the corresponding image into public Docker registry:
https://index.docker.io/u/laggyluke/proxxy/

Any config change requires new registry push, so eventually we'll need to either transfer it to Mozilla account on the public registry or host our own private registry.
Depending on how we plan to handle config secrets with Docker, the second option may be preferable.
The logs are now available on host the machine at the following location:
/var/log/eb-docker/containers/eb-current-app/access.log

They can be accessed via Elastic Beanstalk log interface.
(Reporter)

Comment 21

3 years ago
One thought - if we're going to be proxying private contents (e.g. tooltool), then these will need to be created inside the VPC and not be accessible from the public internet.

Comment 22

3 years ago
(In reply to Chris AtLee [:catlee] from comment #21)
> One thought - if we're going to be proxying private contents (e.g.
> tooltool), then these will need to be created inside the VPC and not be
> accessible from the public internet.

wonder if we should do that restriction on nginx side(eg to not serve certain paths from public ip)

Comment 23

3 years ago
What the plan for giving this endpoint ssl? Do we do self-signed certs in VPC?
(Reporter)

Comment 24

3 years ago
That's what we've done before for internal services. self-signed certs for the server, and distribute the public key to clients to verify.
(Reporter)

Comment 25

3 years ago
Looks like the us-east-1 instances are misconfigured. e.g.
https://github.com/laggyluke/proxxy/blob/master/config.json#L3 I think should be .use1 instead of .usw2.

This results in some interesting behaviour:
curl http://ftp.mozilla.org.proxxy-us-east-1-dev.srv.releng.use1.mozilla.com/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux/1402296158/jsshell-linux-i686.zip

returns the string "200 OK" in the body of the response. This should probably be a 404 or some other error in the HTTP response and an empty body.
Fixed that misconfiguration, now it returns 404 when used with unknown hostname.
Redeployed both regions.
(Reporter)

Comment 27

3 years ago
I'm testing this on Ash: https://tbpl.mozilla.org/?tree=Ash

Quite a few new failures to address:
- partial responses (show up as Caught exception: <urlopen error Download incomplete; content-length was 139627718, but only received 3592330>)
- Caught exception: HTTP Error 503: Service Unavailable: Back-end server is at capacity
- switched from m1.micro to m3.medium instances
- enabled logging to s3
- enabled rolling updates to avoid downtime

Comment 29

3 years ago
(In reply to George Miroshnykov ( :gmiroshnykov ) from comment #28)
> - switched from m1.micro to m3.medium instances
> - enabled logging to s3
> - enabled rolling updates to avoid downtime

Given catlee's problems with timeouts, don't use crap instances :)

Use c3.xlarge for now with cache stored on ssds. c3 have IO-SRV enhanced networking among other improvements over m3(according to docs anyway).
(Reporter)

Comment 30

3 years ago
Still TODO:
- good metrics on caching hit rates
- moving proxxy inside VPC so we can cache pvtbuild results
(Reporter)

Updated

3 years ago
Depends on: 1035891
Blocks: 1042722
(Reporter)

Comment 31

3 years ago
Created attachment 8464860 [details] [diff] [review]
add ProxxyMixin to mozharness

We've been testing this on Ash off and on for the past few weeks. It's been working pretty well I'd say.

The idea is to use a local HTTP cache for certain base URLs. E.g. we want to download build and test packages via the cache. This is to reduce the amount of data we transfer over the network from mozilla.
Attachment #8464860 - Flags: feedback?(jlund)
Comment on attachment 8464860 [details] [diff] [review]
add ProxxyMixin to mozharness

Review of attachment 8464860 [details] [diff] [review]:
-----------------------------------------------------------------

neato! lvgtm :D

::: mozharness/mozilla/proxxy.py
@@ +27,5 @@
> +            return urls
> +
> +        proxxy_urls = cfg.get('urls', None)
> +        proxxy_instances = cfg.get('instances', None)
> +        if proxxy_urls is None or proxxy_instances is None:

I feel like if we make it here we are in a situation where we are expecting to use a proxy server but we did not configure correctly. Maybe we should add some logs stating so? I suppose we could always grep the logs for 'URL Candidate: {url}' or '{url} matches {prefix}'

@@ +42,5 @@
> +            if url.startswith(prefix):
> +                self.info("%s matches %s" % (url, prefix))
> +                for instance in proxxy_instances:
> +                    new_url = "http://{target}.{instance}{url_path}".format(
> +                        target=target, instance=instance, url_path=url_path)

I know we've used format elsewhere for individual scripts but IIRC we have been bitten where the py version used against mozharness is < 2.7 and it would barf here. Although if that still is the case, I suppose there is the argument we should fix the builder's interpreter being passed to ScriptFactory for those scripts over switching to %(foo)s here.

@@ +55,5 @@
> +        """Sorting function for urls.
> +
> +        Clients should try proxxy instances in their own region first, and then
> +        proxxy instances in other regions, finally the original url."""
> +        # TODO: Don't hardcode these regions

I agree. I like how we have no mention of AWS in this mixin at all up to this point. Maybe could do something like (untested):

        fqdn = socket.getfqdn()
        for region in self.config.get('proxxy', {}).get('regions', []):
            if all(region in target for target in [url, fqdn]):
                return 0
        if "proxxy" in url:
            return 1
        else:
            return 2

we could even put the regions we are worried about today as default if we didn't want to add that key to every proxxy config. That way it is at least overwrite-able:
        for region in self.config.get('proxxy', {}).get('regions', ['.use1.', '.usw2.']):

@@ +66,5 @@
> +            return 1
> +        else:
> +            return 2
> +
> +    def download_proxied_file(self, url, file_name=None, parent_dir=None,

how many platforms/builders are going to eventually use this? j/w bc TestingMixin is used extensively across our test jobs so it may mean we need to flood a lot of configs with identical proxxy dicts. If you are just testing on linux desktop unittests and we plan to do it everywhere, maybe at that point we could have a default PROXXY_CONFIG in ProxxyMixin or a proxy_config param in download_proxied_file signature?
Attachment #8464860 - Flags: feedback?(jlund) → feedback+
(Reporter)

Updated

3 years ago
Depends on: 1046770
(Reporter)

Comment 33

3 years ago
Created attachment 8465550 [details] [diff] [review]
add ProxxyMixin to mozharness

Changes from preview patch:
- Created PROXXY_CONFIG in ProxxyMixin with our current default configs, since this will be shared among many different scripts. It can be overridden by self.config

- Change URL sorting to discard proxxy URLs that aren't local. The way we have things set up right now, fetching from a remote proxxy instance would incur at least 2 more round trips for the data, actually making things worse for the network.

- Moved region hostname identifiers into config
Attachment #8464860 - Attachment is obsolete: true
Attachment #8465550 - Flags: review?(jlund)
Comment on attachment 8465550 [details] [diff] [review]
add ProxxyMixin to mozharness

Review of attachment 8465550 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm assuming current ash run goes well.
Attachment #8465550 - Flags: review?(jlund) → review+
(Reporter)

Comment 35

3 years ago
Comment on attachment 8465550 [details] [diff] [review]
add ProxxyMixin to mozharness

Review of attachment 8465550 [details] [diff] [review]:
-----------------------------------------------------------------

landed on default
Attachment #8465550 - Flags: checked-in+
Created attachment 8465734 [details] [diff] [review]
sets retry download attempts to 3 and sleeptime to 30 per url

r=catlee

pushed to default -> https://hg.mozilla.org/build/mozharness/rev/9b3bcca7aab9
Attachment #8465734 - Flags: review+
Created attachment 8465749 [details] [diff] [review]
proxxy_set_retry_sleeptime_to_default_60.patch

keeps defaults of 60, r=catlee
on default -> https://hg.mozilla.org/build/mozharness/rev/9f9a1f6cb032
Attachment #8465749 - Flags: review+
Attachment #8465734 - Flags: checked-in+
Attachment #8465749 - Flags: checked-in+
Created attachment 8465796 [details] [diff] [review]
gaia-try-proxxy-fix

r=jgriffin

on default and merged to prod
Attachment #8465796 - Flags: review+
Attachment #8465796 - Flags: checked-in+
looks like there has been some fallout from this today.

1) when proxxy server fails we retry 5 times each time with a 60 second increase in sleep time. if each download fails that can chew up too much time in our overall script maxtime. So for download_proxied_file we lowered the retry attempts and sleeptime so we stop timing out

2) gaia scripts that avail of GaiaTest class, use an overrided _retry_download_file [1]. This method had no concept of multiple urls (proxy and non-proxy equiv) and when it failed its retries, it returned -1. This was bad as download_proxied_file[2] and other methods that use download_file, expect something that evals to False returned when the download fails.

both issues *should* be fixed.

[1] http://mxr.mozilla.org/build/source/mozharness/mozharness/mozilla/testing/gaia_test.py#220
[2] http://mxr.mozilla.org/build/source/ash-mozharness/mozharness/mozilla/proxxy.py#81
Created attachment 8465808 [details] [diff] [review]
gaia-try-retry-sig-sleeptime-and-attempts.patch

r=catlee

one more fix hopefully.

if this does not resolve edge cases, I am going to back everything out and we can try again tomorrow.
Attachment #8465808 - Flags: review+
Attachment #8465808 - Flags: checked-in+
Created attachment 8465821 [details] [diff] [review]
backout_proxxy_test_work.patch

still seeing issues. backing out till we get a handle on everything.

this leaves ProxxyMixin file but removes reference to it
Attachment #8465821 - Flags: review?
Comment on attachment 8465821 [details] [diff] [review]
backout_proxxy_test_work.patch

It'll be easier to reland if proxxy.py gets removed too.
Attachment #8465821 - Flags: review? → review+
Comment on attachment 8465821 [details] [diff] [review]
backout_proxxy_test_work.patch

with proxxy.py removed too: https://hg.mozilla.org/build/mozharness/rev/a5db02723864
Attachment #8465821 - Flags: checked-in+
for the record we had this as last error:

Uncaught exception: Traceback (most recent call last):
File "/builds/slave/test/scripts/mozharness/base/script.py", line 1242, in run
self.run_action(action)
File "/builds/slave/test/scripts/mozharness/base/script.py", line 1184, in run_action
self._possibly_run_method(method_name, error_if_missing=True)
File "/builds/slave/test/scripts/mozharness/base/script.py", line 1125, in _possibly_run_method
return getattr(self, method_name)()
File "/builds/slave/test/scripts/mozharness/mozilla/testing/gaia_test.py", line 245, in download_and_extract
super(GaiaTest, self).download_and_extract()
File "/builds/slave/test/scripts/mozharness/mozilla/testing/testbase.py", line 327, in download_and_extract
self._download_test_zip()
File "/builds/slave/test/scripts/mozharness/mozilla/testing/testbase.py", line 224, in _download_test_zip
error_level=FATAL)
File "/builds/slave/test/scripts/mozharness/mozilla/proxxy.py", line 91, in download_proxied_file
exit_code=exit_code, attempts=3, sleeptime=30)
File "/builds/slave/test/scripts/mozharness/base/script.py", line 269, in download_file
url, file_name, error_level, attempts=attempts, sleeptime=sleeptime
TypeError: _retry_download_file() got multiple values for keyword argument 'attempts'
Running post_fatal callback...

It was another easy fix but at this point I feel like we have maxed out our attempts at hot fixing. We should fix this without the pressure of bustage and do it right. e.g.: possibly not have GaiaTest use ProxxyMixin at all.
Suspect https://tbpl.mozilla.org/php/getParsedLog.php?id=45029804&full=1&branch=mozilla-inbound is related
Depends on: 1047244
Attachment #8465796 - Attachment is patch: true
Attachment #8465796 - Attachment mime type: text/x-patch → text/plain
Attachment #8465734 - Attachment is patch: true
Attachment #8465734 - Attachment mime type: text/x-patch → text/plain
(In reply to Pete Moore [:pete][:pmoore] from comment #45)
> Suspect
> https://tbpl.mozilla.org/php/getParsedLog.
> php?id=45029804&full=1&branch=mozilla-inbound is related

Not related after all.
No longer depends on: 1047244
(Reporter)

Comment 47

3 years ago
Ok, I figured out why the latest patches were failing.

compare gaia_test.py's version of _retry_download_file with script.py's version:

def _retry_download_file(self, url, file_name, error_level, attempts=None, sleeptime=60) # script.py
def _retry_download_file(self, url, file_name, attempts=None, sleeptime=60, error_level=FATAL) # gaia_test.py

The order of the arguments is different. script.py calls _retry_download_file with 3 positional arguments (url, file_name, error_level), and then some keyword arguments.
(Reporter)

Comment 48

3 years ago
Created attachment 8466244 [details] [diff] [review]
add ProxxyMixin to mozharness
Attachment #8465550 - Attachment is obsolete: true
Attachment #8465734 - Attachment is obsolete: true
Attachment #8465749 - Attachment is obsolete: true
Attachment #8465796 - Attachment is obsolete: true
Attachment #8465808 - Attachment is obsolete: true
Attachment #8465821 - Attachment is obsolete: true
Attachment #8466244 - Flags: review?(jlund)
Comment on attachment 8466244 [details] [diff] [review]
add ProxxyMixin to mozharness

lgtm. I'll attach the patch to disable Proxxy on GaiaTest after this
Attachment #8466244 - Flags: review?(jlund) → review+
Created attachment 8466340 [details] [diff] [review]
rm_gaia_from_proxxy3
Attachment #8466340 - Flags: review?(catlee)
Comment on attachment 8466340 [details] [diff] [review]
rm_gaia_from_proxxy3

my only concern is that maybe we should be using proxxy here: http://mxr.mozilla.org/build/source/mozharness/mozharness/mozilla/testing/gaia_test.py#239

But I don't think that has to be solved today, we can add those jobs to proxxy once everything looks good with what our current patches will enable
(Reporter)

Updated

3 years ago
Attachment #8466340 - Attachment is patch: true
Attachment #8466340 - Attachment mime type: text/x-patch → text/plain
Created attachment 8466366 [details] [diff] [review]
rm_gaia_from_proxxy4

achieves the same... in a much simpler way.
Attachment #8466340 - Attachment is obsolete: true
Attachment #8466340 - Flags: review?(catlee)
Attachment #8466366 - Flags: review?(catlee)
(Reporter)

Updated

3 years ago
Attachment #8466366 - Attachment is patch: true
Attachment #8466366 - Attachment mime type: text/x-patch → text/plain
(Reporter)

Updated

3 years ago
Attachment #8466366 - Flags: review?(catlee) → review+
(Reporter)

Updated

3 years ago
Attachment #8466244 - Flags: checked-in+
Comment on attachment 8466366 [details] [diff] [review]
rm_gaia_from_proxxy4

https://hg.mozilla.org/build/mozharness/rev/550419ca6ff9
Attachment #8466366 - Flags: checked-in+
Merged to production: https://hg.mozilla.org/build/mozharness/rev/0c2710a85246
We're seeing lots of failures on ASan jobs, eg:
https://tbpl.mozilla.org/php/getParsedLog.php?id=45178921&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=45180231&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=45179590&tree=Mozilla-Inbound
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=asan&fromchange=972daa5dd3c5&tochange=ac94341843e2
It looks like most of these errors are related to usw2.
I've tuned use1 to use a bigger instance type earlier today, let me do the same for usw2 now.
This should help both with timeouts and with poor cache hit/miss ratio.
Could this also be the cause of:
https://tbpl.mozilla.org/php/getParsedLog.php?id=45182016&full=1&branch=mozilla-inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=45181461&full=1&branch=mozilla-inbound

...
retry: Calling <function run_with_timeout at 0x7f4f645be938> with args: (['/tools/tooltool.py', '--url', 'http://runtime-binaries.pvt.build.mozilla.org/tooltool', '--overwrite', '-m', 'mobile/android/config/tooltool-manifests/android-x86/releng.manifest', 'fetch', '-c', '/builds/tooltool_cache'], 300, None, None, False, True), kwargs: {}, attempt #10
Executing: ['/tools/tooltool.py', '--url', 'http://runtime-binaries.pvt.build.mozilla.org/tooltool', '--overwrite', '-m', 'mobile/android/config/tooltool-manifests/android-x86/releng.manifest', 'fetch', '-c', '/builds/tooltool_cache']
WARNING: Timeout (300) exceeded, killing process 26266
Traceback (most recent call last):
  File "/tools/tooltool.py", line 898, in <module>
    main()
  File "/tools/tooltool.py", line 895, in main
    exit(0 if process_command(options, args) else 1)
  File "/tools/tooltool.py", line 790, in process_command
    cache_folder=options['cache_folder'])
  File "/tools/tooltool.py", line 562, in fetch_files
    temp_file_name = fetch_file(base_urls, f)
  File "/tools/tooltool.py", line 442, in fetch_file
    indata = f.read(grabchunk)
  File "/tools/python27/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/tools/python27/lib/python2.7/httplib.py", line 561, in read
    s = self.fp.read(amt)
  File "/tools/python27/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
KeyboardInterrupt
retry: Giving up on <function run_with_timeout at 0x7f4f645be938>
Unable to successfully run ['/tools/tooltool.py', '--url', 'http://runtime-binaries.pvt.build.mozilla.org/tooltool', '--overwrite', '-m', 'mobile/android/config/tooltool-manifests/android-x86/releng.manifest', 'fetch', '-c', '/builds/tooltool_cache'] after 10 attempts
program finished with exit code 1
elapsedTime=5002.783194
========= Finished 'sh /builds/slave/m-in-and-x86-00000000000000000/tools/scripts/tooltool/tooltool_wrapper.sh ...' failed (results: 2, elapsed: 1 hrs, 23 mins, 22 secs) (at 2014-08-04 09:01:03.501903) =========
I've retriggered the failing jobs - let's see how they fare :-)
We're still seeing issues, eg:
https://tbpl.mozilla.org/php/getParsedLog.php?id=45227521&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=45225510&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=45223978&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=45225522&tree=Mozilla-Inbound
Backed out for now:
remote:   https://hg.mozilla.org/build/mozharness/rev/b3c1dc7f4979
remote:   https://hg.mozilla.org/build/mozharness/rev/7508d2b5f1a6

And merged to prod:
remote:   https://hg.mozilla.org/build/mozharness/rev/136cf767a41f
Attachment #8466244 - Flags: checked-in+ → checked-in-
Attachment #8466366 - Flags: checked-in+ → checked-in-

Comment 61

3 years ago
According to graphite:

availability zone a is our hottest in us-west-2

zone c in us-east-1

Placing our cache node in one of those should maximize bandwidth. Looks our algorithm/amazon-api doesn't change AZs often:
https://www.hostedgraphite.com/da5c920d/af587ddb-3e87-432e-8d10-7ed541694a6a/graphite/render/?width=588&height=310&_salt=1393631665.848&title=Spot%20instances&target=stacked%28groupByNode%28ec2.spot.us-east.*.*medium,3,%22sum%22%29%29&hideLegend=false&tz=America/Los_Angeles&lineMode=connected&from=-2months

https://www.hostedgraphite.com/da5c920d/af587ddb-3e87-432e-8d10-7ed541694a6a/graphite/render/?width=588&height=310&_salt=1393631665.848&title=Spot%20instances&target=stacked%28groupByNode%28ec2.spot.us-west.*.*medium,3,%22sum%22%29%29&hideLegend=false&tz=America/Los_Angeles&lineMode=connected&from=-2months

In last 2 months, only us-east changed our hot zone.
Proxxy instances are now running in us-east-1c and us-west-2a.
(Reporter)

Comment 63

3 years ago
Created attachment 8470034 [details] [diff] [review]
Use proxxy for pvtbuilds access
Attachment #8470034 - Flags: review?(pmoore)
Comment on attachment 8470034 [details] [diff] [review]
Use proxxy for pvtbuilds access

Review of attachment 8470034 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm!
Attachment #8470034 - Flags: review?(pmoore) → review+
(Reporter)

Comment 65

3 years ago
Comment on attachment 8470034 [details] [diff] [review]
Use proxxy for pvtbuilds access

https://hg.mozilla.org/build/mozharness/rev/6e4aab331905
Attachment #8470034 - Flags: checked-in+
(Reporter)

Comment 66

3 years ago
In production.

We're now serving files for pvtbuilds from the proxxy caches.
(Reporter)

Comment 67

3 years ago
We should also proxxy http://pypi.pvt.build.mozilla.org/pub, http://pypi.pub.build.mozilla.org/pub and http://runtime-binaries.pvt.build.mozilla.org
(Reporter)

Comment 68

3 years ago
some graphs for today so far.

us-west-2: http://people.mozilla.org/~catlee/sattap/38cb255a.png
us-east-1: http://people.mozilla.org/~catlee/sattap/63df9dc7.png

(the colours are reversed between the graphs...I don't know why)

we're clearly saving bandwidth, just by comparing the NetworkIn vs NetworkOut lines. Looks like about 10x savings.

Comment 69

3 years ago
Those c3.8xls are expensive buggers :). Lets keep them around for now, but figure out a way to move off them. One way is to replace them with cheaper nodes during low times...Another is to hash the url and have a host per hash. Eg if we peak at 3gb, we could have 3 weaker instances per region.
Created attachment 8471633 [details] [diff] [review]
Use Proxxy in Tooltool

This patch adds: 
* proxxy support for tooltool in mozharness
* pep8 fixes

I'd like to test this on ash. George is there anything to configure on the server side to proxy the content from:

* http://tooltool.pub.build.mozilla.org
* http://runtime-binaries.pvt.build.mozilla.org
* http://secure.pub.build.mozilla.org
Attachment #8471633 - Flags: review?(jlund)
Flags: needinfo?(george.miroshnykov)
Yes, we need to update Proxxy config and deploy a new version before applying the patch.

We use a special set of credentials baked into Proxxy to access pvtbuilds.
I've just checked and those credentials do not work for secure.pub.build.mozilla.org and runtime-binaries.pvt.build.mozilla.org.
We should get new credentials for those servers or extend permissions for existing credentials.
I'll ask around...
Flags: needinfo?(george.miroshnykov)
Dustin may have some ideas re comment 70 and comment 71
Comment on attachment 8471633 [details] [diff] [review]
Use Proxxy in Tooltool

Review of attachment 8471633 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm. sounds like you got a good handle on how this works. see in line comments for questions

I'm guessing we can't do anything about this yet till we get the server side of proxxy working with the required urls?

I'm going to r- for now to reset my queue but I'm open to discussion and could r+ later on.

::: mozharness/mozilla/proxxy.py
@@ +28,5 @@
>              ('http://pvtbuilds.pvt.build.mozilla.org', 'pvtbuilds.mozilla.org'),
> +            # tooltool
> +            ('http://tooltool.pub.build.mozilla.org', 'tooltool.pub.build.mozilla.org'),
> +            ('http://runtime-binaries.pvt.build.mozilla.org', 'runtime-binaries.pvt.build.mozilla.org'),
> +            ('http://secure.pub.build.mozilla.org', 'secure.pub.build.mozilla.org'),

can these be whatever we want as long as the server side knows about them?

as in, like pvtbuilds.mozilla.org, can we do:
secure.mozilla.org
runtime-binaries.mozilla.org
tooltool.mozilla.org

just to stick with the same concise format. Also, do we need to worry about the http://tooltool.pub.build.mozilla.org or http://secure.pub.build.mozilla.org case? IIUC, they are only used in developer configs. Asking cause I honestly don't know.

::: mozharness/mozilla/tooltool.py
@@ +28,5 @@
> +
> +        # find the proxyied url, if any
> +        proxxy_urls = []
> +        for url in default_urls:
> +            proxxy_urls.extend(self.query_proxy_urls(url))

hmm, I see what you mean, query_proxy_urls doesn't work well with a list of servers.

is len(c['tooltool_servers']) ever > 1? It doesn't look that way from a brief mxr poke. If we can confirm there is no need for tooltool_servers to be a list anymore, this would clean up the code a lot more here and we can change all the appropriate configs. maybe simone might know?
Attachment #8471633 - Flags: review?(jlund) → review-
Created attachment 8472386 [details] [diff] [review]
[mozharness] Bug 1017759 - Use Proxxy for Tooltool.patch

Thanks Jordan!

Here's a newer version of tooltool/proxxy patch.

I've tried to clean up the code in ProxxyMixing rather than hacking in TooltoolMixin.

I have created _get_proxies_for_url(). It returns a list of proxied urls without the input url. This method is used by query_proxy_urls() (accepts a list of urls) query_proxy_url() (accepts a sting) they both take care of managing the input url(s).

I have also renamed the tooltool urls in PROXXY_CONFIG as you have suggested.

Simone, do you know what urls are used for tooltool? are http://tooltool.pub.build.mozilla.org and http://secure.pub.build.mozilla.org needed? Do we need to proxy them in AWS?
Attachment #8471633 - Attachment is obsolete: true
Attachment #8472386 - Flags: review?(jlund)
Simone, I have forgot to add you to the needinfo field, in my previous comment.
Flags: needinfo?(sbruno)
http://tooltool.pub.build.mozilla.org is currently being used by Sea Monkey devs only, while http://secure.pub.build.mozilla.org is used to allow direct download of tooltool artifacts (see Bug 1019011), so I think at the moment we do not need to proxy them in AWS.
Flags: needinfo?(sbruno)
Comment on attachment 8472386 [details] [diff] [review]
[mozharness] Bug 1017759 - Use Proxxy for Tooltool.patch

Review of attachment 8472386 [details] [diff] [review]:
-----------------------------------------------------------------

cool, I think this is a better approach. before we stamp it though, I want to clear up a point of confusion.

As this is we have:

query_proxy_url(url) -> returns a list of [proxy_url, url] if theres a proxy_url else [url]

query_proxy_urls(urls) -> returns a list of [proxy_url1, proxy_url2, url1, url2] for each url that has a proxy_equiv

_get_proxies_for_url(url) -> returns a list of [proxy_url] if url has a proxy equiv else []


issue 1:
IIUC, the way we have it you can't have more than 1 item returned from _get_proxies_for_url. This means it should be _get_proxy_for_url() which returns None or a string depending on if there is a proxy_url

issue 2:
this still feels complicated. i.e. query_proxy_urls and query_proxy_url both return lists and pretty much accomplish similar things. Bear with me being a nit monster here, but what do you think about just two methods:

_get_proxy_for_url(url) -> # returns proxxy_url or None

get_proxies_and_urls(urls) -> return [self._get_proxies_for_url(url) for url in urls] + urls

then we:

# from tooltool
proxxy_urls = self.get_urls_with_proxies(c['tooltool_servers'])

# from download_proxied_file()
urls = self.get_urls_with_proxies([url])

thoughts? Does that accomplish the same thing or am I missing something? I can be persuaded into an r+ if you push back.
Attachment #8472386 - Flags: review?(jlund) → review-
that was bad, even for pseudo code.

> ... [self._get_proxies_for_url(url) for url in urls] ...
should be: [self._get_proxy_for_url(url) for url in urls]

> # from tooltool
> proxxy_urls = self.get_urls_with_proxies(c['tooltool_servers'])
should be: proxxy_urls = self.get_proxies_and_urls(c['tooltool_servers'])

> # from download_proxied_file()
> urls = self.get_urls_with_proxies([url])
should be: urls = self.get_proxies_and_urls([url])
Thanks Jordan,

> issue 1:
> IIUC, the way we have it you can't have more than 1 item returned from
> _get_proxies_for_url. This means it should be _get_proxy_for_url() which
> returns None or a string depending on if there is a proxy_url
Actually, it returns an empty list if there are no proxy urls and a list of urls in any other cases and because of our configuration (single proxxy server per region) this list has a single element. If we keep the list style we don't have to worry about changes in the "instances" data structure.

> issue 2:
> this still feels complicated..
Agreed! 

I like your suggested approach, new patch coming soon!

Thanks again!
Created attachment 8473044 [details] [diff] [review]
[mozharness] Bug 1017759 - Use Proxxy for Tooltool.patch

Hi Jordan,

here's another patch!

Changes:
* removed tooltool.mozilla.org from PROXXY_CONFIG (the other 2 urls are used by some scripts)
* renamed updated fixed get_proxies_for_url() and get_proxies_and_urls()
Attachment #8472386 - Attachment is obsolete: true
Attachment #8473044 - Flags: review?(jlund)
Comment on attachment 8473044 [details] [diff] [review]
[mozharness] Bug 1017759 - Use Proxxy for Tooltool.patch

Review of attachment 8473044 [details] [diff] [review]:
-----------------------------------------------------------------

this lvgtm :)

thanks for iterating on this and also for fixing up the pep issues.

I assume the plan is to give this a whirl on ash once the server side of proxxy is good 2 go?

::: mozharness/mozilla/tooltool.py
@@ +19,5 @@
>          """docstring for tooltool_fetch"""
>          tooltool = self.query_exe('tooltool.py', return_type='list')
>          cmd = tooltool
> +        # get the tooltools servers from configuration
> +        default_urls = [s for s in self.config['tooltool_servers']]

is default_urls var and its list comprehension still needed?
Attachment #8473044 - Flags: review?(jlund) → review+
(Reporter)

Comment 82

3 years ago
Comment on attachment 8473044 [details] [diff] [review]
[mozharness] Bug 1017759 - Use Proxxy for Tooltool.patch

Review of attachment 8473044 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozharness/mozilla/proxxy.py
@@ +27,5 @@
>              ('https://ftp-ssl.mozilla.org', 'ftp.mozilla.org'),
>              ('http://pvtbuilds.pvt.build.mozilla.org', 'pvtbuilds.mozilla.org'),
> +            # tooltool
> +            ('http://runtime-binaries.pvt.build.mozilla.org', 'runtime-binaries.mozilla.org'),
> +            ('http://secure.pub.build.mozilla.org', 'secure.mozilla.org'),

I would actually keep the .pvt/.pub bits of the domains in here. In the pvtbuilds case we were able to strip it out because the host 'pvtbuilds.mozilla.org' actually exists, and is where we're fetching the data from. In the tooltool case, the aliases you have aren't hosts in their own right.

@@ +77,5 @@
>          return urls
>  
> +    def get_proxies_and_urls(self, urls):
> +        """Gets a list of urls and returns a list of proxied urls, the list
> +           of input urls is appended at the return values"""

s/at the/at the end of the/
Created attachment 8475179 [details] [diff] [review]
[mozharness] Bug 1017759 - Use Proxxy for Tooltool.patch

Thanks Jordan and Chris for the reviews.

In this new version:
* updated tooltool urls
* fixed typo in comments
Attachment #8473044 - Attachment is obsolete: true
Attachment #8475179 - Flags: review?(catlee)
(In reply to Massimo Gervasini [:mgerva] from comment #83)
> Created attachment 8475179 [details] [diff] [review]
> [mozharness] Bug 1017759 - Use Proxxy for Tooltool.patch
> 
> Thanks Jordan and Chris for the reviews.
> 
> In this new version:
> * updated tooltool urls
> * fixed typo in comments

... and removed the redundant list comprehension spotted by Jordan in comment 81
(Reporter)

Comment 85

3 years ago
Comment on attachment 8475179 [details] [diff] [review]
[mozharness] Bug 1017759 - Use Proxxy for Tooltool.patch

Review of attachment 8475179 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozharness/mozilla/proxxy.py
@@ +27,5 @@
>              ('https://ftp-ssl.mozilla.org', 'ftp.mozilla.org'),
>              ('http://pvtbuilds.pvt.build.mozilla.org', 'pvtbuilds.mozilla.org'),
> +            # tooltool
> +            ('http://runtime-binaries.pvt.build.mozilla.org', 'runtime-binaries.pvt.build.mozilla.org'),
> +            ('https://secure.pub.build.mozilla.org', 'secure.pub.build.mozilla.org'),

I don't think this second URL is required, let's remove it.
Attachment #8475179 - Flags: review?(catlee) → review+
Patch landed on ash-mozharness; new proxxy-dev hosts are just working fine.

Updated inventory with the new IPs for proxxy {use1,usw2} waiting for dns changes to propagate before landing attachment 8475179 [details] [diff] [review]
Attachment #8475179 - Flags: checked-in+
No longer blocks: 1042722
(Reporter)

Updated

3 years ago
Depends on: 1058844
Created attachment 8481269 [details] [diff] [review]
[mozharness] Bug 1017759 - Use Proxxy when installing packages with virtualenv.patch

This patch adds proxxy urls support in virtualenv. It populates '--find-links' with extra proxxy urls (if any).

--pypi-url option is left unchanged because only a single value can be added here and there's no fall back mechanism to a different url. (Callek this option is used only by configs/users/callek/tegra1-foopy.py and configs/users/callek/tegra1.py, do we still need it?)
Attachment #8481269 - Flags: review?(jlund)
Flags: needinfo?(bugspam.Callek)
Comment on attachment 8481269 [details] [diff] [review]
[mozharness] Bug 1017759 - Use Proxxy when installing packages with virtualenv.patch

Review of attachment 8481269 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozharness/mozilla/proxxy.py
@@ +31,5 @@
>              # tooltool
>              ('http://runtime-binaries.pvt.build.mozilla.org', 'runtime-binaries.pvt.build.mozilla.org'),
> +            # pypi
> +            ('http://pypi.pvt.build.mozilla.org/pub', 'pypi.pvt.build.mozilla.org'),
> +            ('http://pypi.pub.build.mozilla.org/pub', 'pypi.pub.build.mozilla.org'),

Please remove the '/pub' path from these lines.
While this will probably work for PyPI, adding path breaks relative links and can cause other unintended side effects.
Comment on attachment 8481269 [details] [diff] [review]
[mozharness] Bug 1017759 - Use Proxxy when installing packages with virtualenv.patch

Review of attachment 8481269 [details] [diff] [review]:
-----------------------------------------------------------------

This is really cool! I completely agree that we can use proxxy as a standalone obj. I think I'd prefer to have it completely converted (stripped of 'Mixin' in classname, given an init with cfg/log_obj, and other proxxy uses patched up accordingly). However, I don't want to block.

I personally would love to see a few mixins ripped out and have more isolated objects. Maybe we can tackle this together in the future :)

r+ with the request from https://bugzilla.mozilla.org/show_bug.cgi?id=1017759#c88 and assuming callek has no issues. Since this reaches nearly everywhere, should we test this on ash

::: mozharness/mozilla/proxxy.py
@@ +47,5 @@
>          defined, returns PROXXY_CONFIG instead"""
> +        try:
> +            cfg = self.config.get('proxxy', self.PROXXY_CONFIG)
> +        except AttributeError:
> +            # no self.config, proxxy mixing has been created

s/mixing/mixin ?
Attachment #8481269 - Flags: review?(jlund) → review+
Attachment #8481269 - Flags: checked-in+
Comment on attachment 8481269 [details] [diff] [review]
[mozharness] Bug 1017759 - Use Proxxy when installing packages with virtualenv.patch

backed out. 

it causes some scripts to fail with:   

TypeError: Error when calling the metaclass bases Cannot create a consistent method resolution order (MRO)

(from "scripts/android_panda.py")

I am creating a separate bug to address this issue.
Attachment #8481269 - Flags: checked-in+ → checked-in-
Depends on: 1063532
(In reply to Massimo Gervasini [:mgerva] from comment #87)
> (Callek this
> option is used only by configs/users/callek/tegra1-foopy.py and
> configs/users/callek/tegra1.py, do we still need it?)

Nope, infact I recommend hg rm'ing those files.
Flags: needinfo?(bugspam.Callek)
Created attachment 8493269 [details] [diff] [review]
[mozharness] Bug 1017759 - Use proxxy to install virtualenv dependencies.patch

This patch adds the proxxy support to virtualenv dependencies downloads and removes the pypi_url options (per comment #91)
Attachment #8481269 - Attachment is obsolete: true
Attachment #8493269 - Flags: review?(jlund)
Comment on attachment 8493269 [details] [diff] [review]
[mozharness] Bug 1017759 - Use proxxy to install virtualenv dependencies.patch

Review of attachment 8493269 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm. :)

I think, like all proxxy patches and its reach, this needs ash or cypress testing/verification before merging to prod
Attachment #8493269 - Flags: review?(jlund) → review+
(In reply to Justin Wood (:Callek) from comment #91)
> (In reply to Massimo Gervasini [:mgerva] from comment #87)
> > (Callek this
> > option is used only by configs/users/callek/tegra1-foopy.py and
> > configs/users/callek/tegra1.py, do we still need it?)
> 
> Nope, infact I recommend hg rm'ing those files.

oh, while we are at it, can you add this to the patch and r+ me too :) it will be confusing with --pypi-url option being removed. please please please. :)
Comment on attachment 8493269 [details] [diff] [review]
[mozharness] Bug 1017759 - Use proxxy to install virtualenv dependencies.patch

Thanks for the review, Jordan.

Removed configs/users/callek/tegra*.py files too.
Attachment #8493269 - Flags: checked-in+
Depends on: 1011629

Updated

3 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1956]
Depends on: 1123706
Assignee: gmiroshnykov → nobody
Status: ASSIGNED → NEW
fixed -- I think
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.