Closed Bug 1305804 Opened 3 years ago Closed 3 years ago

Don't use pypi.pvt.build.mozilla.org/pub unless necessary

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set

Tracking

(firefox52 fixed)

RESOLVED FIXED

People

(Reporter: gps, Assigned: gps)

References

(Blocks 1 open bug)

Attachments

(2 files)

A side-effect of bug 1304176 is that mozharness now automatically adds --trusted-host for every host in the configured pip --find-links URLs. As a result, various mozharness scripts now attempt to connect to http://pypi.pvt.build.mozilla.org/pub, which they previously did not.
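A minimal sketch of the behavior change described above. The helper `build_pip_install_args` is hypothetical, not the actual mozharness code, and its argument order may differ from the real command in the log below. It shows the effect: once every host in the find-links list is also passed as --trusted-host, pip will fetch from each plain-HTTP link, including the pvt one it previously did not contact.

```python
from urllib.parse import urlparse


def build_pip_install_args(find_links, packages, timeout=120):
    """Hypothetical helper: assemble `pip install` arguments.

    Every find-links URL is passed through, and every host appearing in
    those URLs is also marked --trusted-host, so pip will fetch from it
    over plain HTTP.
    """
    args = ["install", "--timeout", str(timeout), "--no-index"]
    for link in find_links:
        args += ["--find-links", link]
    # Collect each distinct host once, preserving first-seen order.
    trusted = []
    for link in find_links:
        host = urlparse(link).hostname
        if host and host not in trusted:
            trusted.append(host)
    for host in trusted:
        args += ["--trusted-host", host]
    return args + list(packages)
```

With both pypi.pvt and pypi.pub in the list, both hosts end up trusted, so pip tries the pvt link even where its name does not resolve.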

pip attempts to load every --find-links URL when searching for packages. In some environments (namely TaskCluster), pypi.pvt.build.mozilla.org does not resolve, and pip's internal retry logic then spins for several seconds trying to connect to it, e.g.:

[task 2016-09-27T17:32:11.386122Z] 17:32:11     INFO - Running command: ['/home/worker/workspace/build/venv/bin/pip', 'install', '--timeout', '120', '--no-index', '--find-links', 'http://pypi.pvt.build.mozilla.org/pub', '--find-links', 'http://pypi.pub.build.mozilla.org/pub', '--trusted-host', 'pypi.pub.build.mozilla.org', '--trusted-host', 'pypi.pvt.build.mozilla.org', 'psutil>=3.1.1'] in /home/worker/workspace/build
[task 2016-09-27T17:32:11.386705Z] 17:32:11     INFO - Copy/paste: /home/worker/workspace/build/venv/bin/pip install --timeout 120 --no-index --find-links http://pypi.pvt.build.mozilla.org/pub --find-links http://pypi.pub.build.mozilla.org/pub --trusted-host pypi.pub.build.mozilla.org --trusted-host pypi.pvt.build.mozilla.org psutil>=3.1.1
[task 2016-09-27T17:32:11.776441Z] 17:32:11     INFO -  Ignoring indexes: https://pypi.python.org/simple
[task 2016-09-27T17:32:11.785740Z] 17:32:11     INFO -  Collecting psutil>=3.1.1
[task 2016-09-27T17:32:11.801642Z] 17:32:11     INFO -    Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7fcc79fd8550>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /pub
[task 2016-09-27T17:32:12.313623Z] 17:32:12     INFO -    Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7fcc79fd8710>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /pub
[task 2016-09-27T17:32:13.326213Z] 17:32:13     INFO -    Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7fcc79fd88d0>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /pub
[task 2016-09-27T17:32:15.339897Z] 17:32:15     INFO -    Retrying (Retry(total=1, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7fcc79fd8a90>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /pub
[task 2016-09-27T17:32:19.355619Z] 17:32:19     INFO -    Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7fcc79fd8c50>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /pub
[task 2016-09-27T17:32:19.998835Z] 17:32:19     INFO -    Downloading http://pypi.pub.build.mozilla.org/pub/psutil-3.1.1.tar.gz (247kB)
[task 2016-09-27T17:32:20.433005Z] 17:32:20     INFO -  Building wheels for collected packages: psutil
[task 2016-09-27T17:32:20.433526Z] 17:32:20     INFO -    Running setup.py bdist_wheel for psutil: started
[task 2016-09-27T17:32:21.209408Z] 17:32:21     INFO -    Running setup.py bdist_wheel for psutil: finished with status 'done'
[task 2016-09-27T17:32:21.210844Z] 17:32:21     INFO -    Stored in directory: /home/worker/.cache/pip/wheels/10/ab/a8/d516edc511515105beb8238e28239b0aabf25d33b624e4c286
[task 2016-09-27T17:32:21.226506Z] 17:32:21     INFO -  Successfully built psutil
[task 2016-09-27T17:32:21.227574Z] 17:32:21     INFO -  Installing collected packages: psutil
[task 2016-09-27T17:32:21.302307Z] 17:32:21     INFO -  Successfully installed psutil-3.1.1
[task 2016-09-27T17:32:21.331671Z] 17:32:21     INFO - Return code: 0

That's 8 seconds lost retrying. And that's just for a single package. I think we repeat this pattern 3+ times per test.
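The gaps between the "Retrying" lines above (about 0.5 s, 1 s, 2 s, then 4 s) fit urllib3-style exponential backoff. A small sketch of that arithmetic follows. It assumes pip's 5 retries (the log starts at total=4 after the first failure) and a backoff_factor of 0.25. That factor is an assumption chosen to match the observed intervals. It also assumes the older urllib3 rule that the first retry does not sleep.

```python
def urllib3_backoff_delays(retries, backoff_factor):
    """Sleep taken before each retry under urllib3-style exponential backoff.

    Assumption: retry 1 sleeps 0 s and retry n (n >= 2) sleeps
    backoff_factor * 2 ** (n - 1).
    """
    delays = []
    for n in range(1, retries + 1):
        delays.append(0.0 if n == 1 else backoff_factor * 2 ** (n - 1))
    return delays


delays = urllib3_backoff_delays(retries=5, backoff_factor=0.25)
print(delays)       # [0.0, 0.5, 1.0, 2.0, 4.0]
print(sum(delays))  # 7.5
```

About 7.5 s of sleeping per lookup accounts for nearly all of the ~8 seconds lost in the log. The DNS failures themselves return almost immediately.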

We should not pass pypi.pvt.build.mozilla.org to pip --find-links unless it will work.

I was chatting with aki the other day about whether we actually need to use pypi.pvt.build.mozilla.org. Apparently it reaches the same service as pypi.pub.build.mozilla.org, just over a separate network interface or route, and sending traffic through the private interface is preferred.

If directing traffic to pypi.pvt.build.mozilla.org is justified, I guess we need a way to determine whether that network service is available. I'm thinking we can either:

1) Infer it from some environment variable or some such
2) Do a DNS lookup within mozharness and filter URLs appropriately

I like avoiding network lookups because networks are unreliable. Then again, DNS for service discovery is the whole point of DNS. So I can't argue too strongly against doing the DNS lookup within Python (probably via socket.gethostbyname()).
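Option 2 can be sketched with socket.gethostbyname(), as suggested. This is a minimal illustration, not the patch that eventually landed. The function name `filter_resolvable_links` and its injectable `resolve` parameter are assumptions, added so the filter can be exercised without network access.

```python
import socket
from urllib.parse import urlparse


def filter_resolvable_links(links, resolve=socket.gethostbyname):
    """Keep only find-links URLs whose hostname resolves via DNS.

    `resolve` defaults to socket.gethostbyname but can be replaced
    (e.g. in tests) by any callable that raises socket.gaierror on failure.
    """
    usable = []
    for link in links:
        host = urlparse(link).hostname
        if host is None:
            # No hostname (e.g. a file:// URL); nothing to look up, keep it.
            usable.append(link)
            continue
        try:
            resolve(host)
        except socket.gaierror:
            # Name does not resolve here, the "[Errno -2] Name or service
            # not known" case from the log; drop the link.
            continue
        usable.append(link)
    return usable
```

Because the lookup happens once per link before pip runs, an unresolvable host costs one fast DNS failure instead of a full round of pip retries per package. A slow DNS server could still make gethostbyname() itself block, which is the network-unreliability concern raised above.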
Is there a reason we can't just fix DNS or whatever so that pypi.pvt works in taskcluster? Alternately, just define a new hostname, like `pypi.best.build.mozilla.org` that resolves to the right thing for the environment?
Due to this issue our Firefox-ui tests as run via mozmill-ci are completely busted for mozilla-central. Can we get a fix for this issue ASAP? Or are there workarounds we could use if a short-term fix is not possible?
Flags: needinfo?(gps)
(In reply to Henrik Skupin (:whimboo) from comment #2)
> Due to this issue our Firefox-ui tests as run via mozmill-ci are completely
> busted for mozilla-central. Can we get a fix for this issue ASAP? Or are
> there workaround we could do if a short-term fix is not possible?

That shouldn't happen. The worst that should happen is that this issue slows down virtualenv setup. Perhaps there is some kind of timeout on the virtualenv command invocation on mozharness's end? Please provide logs showing the failure.
Flags: needinfo?(gps)
Assignee: nobody → gps
Status: NEW → ASSIGNED
Attached file mozmill-ci log
Ok, so I checked again, and the reason is that Jenkins kills our jobs after 60 minutes. By that time pip still hasn't finished installing packages! So this is a serious regression for us.
The intervals between retries in that log are really long. I suspect you have a pip.conf or a custom pip retry interval configured somewhere else. It is also possible some older versions of pip had a much longer default retry interval.
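Another mechanism that could produce long gaps between retries, without any custom configuration, is connection attempts that hang until the timeout instead of failing fast. This can happen, for example, if the pvt host resolves but is unreachable. The back-of-envelope sketch below uses the `--timeout 120` from the command in the TaskCluster log and pip's 5 retries. Whether this is what happened in the mozmill-ci job is an assumption. The urllib3-style backoff with a factor of 0.25 is also assumed.

```python
def worst_case_seconds_per_link(timeout, retries, backoff_factor):
    """Upper bound on time spent on one unreachable find-links URL.

    Assumes every attempt (1 initial + `retries` retries) hangs for the
    full `timeout` before failing, plus urllib3-style backoff sleeps
    between attempts (0 s before the first retry, then
    backoff_factor * 2 ** (n - 1) before retry n).
    """
    attempts = 1 + retries
    backoff = sum(backoff_factor * 2 ** (n - 1) for n in range(2, retries + 1))
    return attempts * timeout + backoff


print(worst_case_seconds_per_link(timeout=120, retries=5, backoff_factor=0.25))
# 727.5, i.e. roughly 12 minutes per package lookup
```

At roughly 12 minutes per lookup, only a handful of packages would exhaust a 60-minute Jenkins budget.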

Hopefully my patch fixes the problem for you.
We don't have any custom pip.conf in use. We simply use what we get via mozharness when running our firefox-ui-* scripts. But after checking our config, it looks like the problem is located there:

https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/firefox_ui_tests/qa_jenkins.py
Comment on attachment 8795459 [details]
Bug 1305804 - Resolve hostname before attempting to use pip link;

https://reviewboard.mozilla.org/r/81504/#review80114

The try push for this seemed happy. I confirmed from TC logs it is properly filtering pypi.pvt.build.mozilla.org away from pip.
Comment on attachment 8795459 [details]
Bug 1305804 - Resolve hostname before attempting to use pip link;

https://reviewboard.mozilla.org/r/81504/#review80322

This seems fine for avoiding the currently bad behavior, but could we file a followup to just create a hostname that does the right thing in either buildbot or taskcluster? I think DNS is the right place for this kind of info, and it seems like we should just have a hostname that resolves properly in whatever environment it's used in.
Attachment #8795459 - Flags: review?(ted) → review+
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e1763ef7d6b9
Resolve hostname before attempting to use pip link; r=ted
Comment on attachment 8795459 [details]
Bug 1305804 - Resolve hostname before attempting to use pip link;

https://reviewboard.mozilla.org/r/81504/#review80322

I agree that a single hostname that resolves to the most appropriate IP is the ideal solution here.

Historically, I've had really bad luck getting "geo DNS" entries for mozilla.org created (I've long wanted one for hg.mozilla.org so we can have automation use mirrors in the local AWS region). So don't hold your breath :(
https://hg.mozilla.org/mozilla-central/rev/e1763ef7d6b9
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED