Bug 1309912 - Explicitly set urlopen()'s timeout for download and extract
Status: RESOLVED FIXED · Opened 8 years ago · Closed 8 years ago
Product/Component: Testing :: General (defect)
Tracking: firefox52 fixed · Target Milestone: mozilla52
People: Reporter: armenzg · Assigned: armenzg
Attachments: 1 file
We want to try to add a timeout to urlopen() [1][2] since we're not sure if the global timeout is doing the trick. If this fails to solve the issue in bug 1300413 we will have to look into using requests.

[1] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/base/script.py?q=x-amz&redirect_type=single#390
[2] https://docs.python.org/2/library/urllib2.html#urllib2.urlopen

> The optional timeout parameter specifies a timeout in seconds for blocking
> operations like the connection attempt (if not specified, the global default
> timeout setting will be used). This actually only works for HTTP, HTTPS and
> FTP connections.
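The difference between the process-wide default and a per-call timeout can be seen directly; a minimal sketch (Python 3 shown, urllib2 in Python 2 behaves the same way):

```python
import socket

# A fresh interpreter has no global socket timeout: sockets are fully
# blocking, so connects and reads wait as long as the OS allows.
default_before = socket.getdefaulttimeout()

# Option 1: set a process-wide default, picked up by every new socket.
socket.setdefaulttimeout(30)
default_after = socket.getdefaulttimeout()

# Option 2 (what this bug proposes): pass timeout= per call, e.g.
#   urllib.request.urlopen(request, timeout=30)
# which bounds the connect and each blocking read for that request only.
print(default_before, default_after)
```

The per-call form is safer in shared code like mozharness, since it doesn't silently change the behavior of every other socket in the process.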
Comment 1 (Assignee) • 8 years ago
Here's the docs for the specific version of the Mac machines: https://docs.python.org/release/2.7.3/library/urllib2.html?highlight=urlopen#urllib2.urlopen It reads the same as expected.
Comment hidden (mozreview-request)
Updated (Assignee) • 8 years ago

Attachment #8800724 - Flags: review?(dustin)
Comment 3 (Assignee) • 8 years ago
Nothing broke on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=67ff6167e020

r? dustin
Comment 4 • 8 years ago
FWIW, looking at the Python 2.7 source, `urlopen` does pass a default value for `timeout`:
http://svn.python.org/view/python/branches/release27-maint/Lib/urllib2.py?revision=87450&view=markup#l122

...but it turns out that it's not actually a number, just a singleton:
http://svn.python.org/view/python/branches/release27-maint/Lib/socket.py?revision=84603&view=markup#l535

...and if that's the value passed as timeout, the socket code just doesn't set a timeout:
http://svn.python.org/view/python/branches/release27-maint/Lib/socket.py?revision=84603&view=markup#l558

...which means it's just going to call the OS `connect` function in blocking mode:
http://svn.python.org/view/python/branches/release27-maint/Modules/socketmodule.c?revision=85870&view=markup#l1980

...and you're at the mercy of whatever the OS-level TCP connect timeout is.

Poking around some more, https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man4/tcp.4.html says "The TCP_CONNECTIONTIMEOUT option allows to specify the timeout, in seconds, for new, non established TCP connections. This option can be useful for both active and passive TCP connections. The default value is specified by the MIB variable net.inet.tcp.keepinit."

On my Mac that value is ludicrously big:

$ sysctl net.inet.tcp.keepidle
net.inet.tcp.keepidle: 7200000

That's in milliseconds, but it's still set to *2 hours*. I guess that makes sense as a value for "how long can an existing TCP connection be idle with no traffic" but it is pretty nonsensical for the connect timeout.
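The singleton behavior described above survives in Python 3 as `socket._GLOBAL_DEFAULT_TIMEOUT` (a private name, so an implementation detail): it is a bare `object()`, not a number, and a socket that falls through the sentinel check ends up with no timeout at all. A small sketch:

```python
import socket

# The sentinel urlopen() passes down is not a number; the socket layer
# compares against it with `is` and, on a match, never calls
# settimeout() -- leaving the socket in blocking mode.
sentinel = socket._GLOBAL_DEFAULT_TIMEOUT
is_number = isinstance(sentinel, (int, float))

# A freshly created socket therefore reports no timeout: gettimeout()
# returns None, meaning "block until the OS gives up".
s = socket.socket()
no_timeout = s.gettimeout()
s.close()
print(is_number, no_timeout)
```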
Comment 5 (Assignee) • 8 years ago
Should we set the timeout in a few other places? I would not be surprised if more intermittent oranges without output turn out to be related to this.
Updated (Assignee) • 8 years ago
Summary: Try to add timeout to urlopen() for download and extract → Explicitely set urlopen()'s timeout for download and extract
Comment 6 (Assignee) • 8 years ago
Could the TCP timeout being so high affect test jobs?
Comment 7 • 8 years ago (mozreview-review)
Comment on attachment 8800724 [details]
Bug 1309912 - Add explicit timeout for urllib2.urlopen() instead of relying on global timeout

https://reviewboard.mozilla.org/r/85598/#review84324

::: testing/mozharness/mozharness/base/script.py:392
(Diff revision 1)
> -    # Bug 1301855 - URLError: <urlopen error [Errno 60] Operation timed out>
> -    # Bug 1302237 - URLError: <urlopen error [Errno 104] Connection reset by peer>
> -    # Bug 1301807 - BadStatusLine: ''
> -    response = urllib2.urlopen(request)
> +    # * Bug 1301855 - URLError: <urlopen error [Errno 60] Operation timed out>
> +    # * Bug 1302237 - URLError: <urlopen error [Errno 104] Connection reset by peer>
> +    # * Bug 1301807 - BadStatusLine: ''
> +    #
> +    # Bug 1309912 - Adding timeout in hopes to solve blocking on response.read() (bug 1300413)
> +    response = urllib2.urlopen(request, timeout=30)

Holy cow, there's your list of reasons to use requests!
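The transient failures listed in the diff are the kind of thing a small retry wrapper can absorb. A hypothetical sketch (Python 3 names; `fetch_with_retries` is not mozharness code, and the injectable `opener` parameter exists purely so it can be exercised without a network):

```python
import http.client
import time
import urllib.error
import urllib.request


def fetch_with_retries(request, timeout=30, attempts=3, delay=0,
                       opener=urllib.request.urlopen):
    """Retry the transient failures called out in the review comment.

    Hypothetical helper, not a mozharness API; `opener` is a parameter
    only so the function can be tested offline.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return opener(request, timeout=timeout)
        except (urllib.error.URLError, http.client.BadStatusLine) as e:
            last_error = e
            time.sleep(delay)
    raise last_error


# Exercise it with a fake opener: fail twice, then succeed.
calls = {"n": 0}

def flaky_opener(request, timeout):
    calls["n"] += 1
    if calls["n"] < 3:
        raise urllib.error.URLError("Connection reset by peer")
    return "response-body"

result = fetch_with_retries("http://example.invalid/", opener=flaky_opener)
print(result, calls["n"])
```

The real mozharness `_download_file` already has retry logic around this call site; the sketch only illustrates the combination of a per-call timeout with retries for the listed exception types.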
Comment 8 • 8 years ago (mozreview-review)
Comment on attachment 8800724 [details]
Bug 1309912 - Add explicit timeout for urllib2.urlopen() instead of relying on global timeout

https://reviewboard.mozilla.org/r/85598/#review84326
Attachment #8800724 - Flags: review?(dustin) → review+
Comment 9 • 8 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #4)
> This option can be useful for both active and passive TCP connections. The
> default value is specified by the MIB variable net.inet.tcp.keepinit."
>
> On my Mac that value is ludicrously big:
> $ sysctl net.inet.tcp.keepidle
> net.inet.tcp.keepidle: 7200000

Re-reading this I can see that "net.inet.tcp.keepinit" and "net.inet.tcp.keepidle" are not the same thing! My machine says:

$ sysctl net.inet.tcp.keepinit
net.inet.tcp.keepinit: 75000

...which is a slightly more reasonable 75 seconds. You could certainly double-check that value on our build+test machines.
Comment 10 • 8 years ago
Pushed by armenzg@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7aa1eb0ca75e
Add explicit timeout for urllib2.urlopen() instead of relying on global timeout r=dustin
Comment 11 (Assignee) • 8 years ago
And from the test machines:

net.inet.tcp.keepinit: 75000
Comment 12 • 8 years ago
Keep in mind that MIBs aren't the same as kernel sysctls, which also vary wildly between OSes:

dustin@ramanujan ~ $ sudo sysctl net.inet.tcp.keepinit
sysctl: cannot stat /proc/sys/net/inet/tcp/keepinit: No such file or directory

(on Fedora 24) and

dustin@ramanujan ~ $ strace -t python -c 'import socket; socket.socket().connect(("169.254.169.254", 80))'
...
09:43:06 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
09:43:06 connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("169.254.169.254")}, 16) = -1 ETIMEDOUT (Connection timed out)
09:45:13 stat("systemd", 0x7ffc61c85070) = -1 ENOENT (No such file or directory)
...

...so, 127 seconds.

But, I think you determined earlier that this was a timeout during a read operation. In that case, there is no core TCP functionality that permits any kind of timeout. That's why your SSH session can stay open at the shell prompt for hours with no traffic. TCP keepalives can help there, but in order to keep traffic low they default to one every 2 hours, and are only used when the application sets an option on the socket.

It's generally better for applications that do not expect long delays (such as HTTP clients) to set a timeout on each read operation, rather than relying on keepalives. In other words, I think passing timeout=.. is the fix here. I'd still rather see requests used, since it embodies a bunch of other HTTP client best practices that may address some of the dozen other bugs referenced in scripts.py, but this is a start :)
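The per-read timeout recommended above can be demonstrated without a network using a connected socket pair (a sketch; `socketpair()` needs a Unix-like platform or Python 3.5+ on Windows):

```python
import socket

# A connected pair of sockets stands in for an HTTP client and a stalled
# server that accepted the connection but never sends a response.
client, server = socket.socketpair()

# Bound every blocking operation on the client side to 100 ms -- the same
# mechanism urlopen(..., timeout=30) applies under the hood.
client.settimeout(0.1)

try:
    client.recv(1)          # the server never writes, so this cannot succeed
    timed_out = False
except socket.timeout:      # raised once the 100 ms budget is exhausted
    timed_out = True
finally:
    client.close()
    server.close()

print(timed_out)
```

Without the `settimeout()` call, that `recv()` would block indefinitely, which is exactly the hang this bug is chasing.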
Comment 13 • 8 years ago
Ah, if this is a read timeout then yes, the 2 hour TCP keepalive applies. `urlopen` defaulting to no timeout is just ridiculous default behavior.
Comment 14 • 8 years ago (bugherder)
https://hg.mozilla.org/mozilla-central/rev/7aa1eb0ca75e
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox52:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52