Closed Bug 1309912 Opened 8 years ago Closed 8 years ago

Explicitely set urlopen()'s timeout for download and extract

Categories

(Testing :: General, defect)

defect
Not set
normal

Tracking

(firefox52 fixed)

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: armenzg, Assigned: armenzg)

References

Details

Attachments

(1 file)

We want to try to add a timeout to urlopen() [1][2] since we're not sure if the global timeout is doing the trick.

If this fails to solve the issue in bug 1300413 we will have to look into using requests.

[1] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/base/script.py?q=x-amz&redirect_type=single#390

[2] https://docs.python.org/2/library/urllib2.html#urllib2.urlopen
> The optional timeout parameter specifies a timeout in seconds for blocking
> operations like the connection attempt (if not specified, the global default
> timeout setting will be used). This actually only works for HTTP, HTTPS and
> FTP connections.
Here's the docs for the specific version of the Mac machines:
https://docs.python.org/release/2.7.3/library/urllib2.html?highlight=urlopen#urllib2.urlopen

It reads the same as expected.
Attachment #8800724 - Flags: review?(dustin)
FWIW, looking at the Python 2.7 source, `urlopen` does pass a default value for `timeout`:
http://svn.python.org/view/python/branches/release27-maint/Lib/urllib2.py?revision=87450&view=markup#l122
...but it turns out that it's not actually a number, just a singleton: 
http://svn.python.org/view/python/branches/release27-maint/Lib/socket.py?revision=84603&view=markup#l535
...and if that's the value passed as timeout, the socket code just doesn't set a timeout:
http://svn.python.org/view/python/branches/release27-maint/Lib/socket.py?revision=84603&view=markup#l558
...which means it's just going to call the OS `connect` function in blocking mode:
http://svn.python.org/view/python/branches/release27-maint/Modules/socketmodule.c?revision=85870&view=markup#l1980
...and you're at the mercy of whatever the OS-level TCP connect timeout is.

Poking around some more, https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man4/tcp.4.html says "The TCP_CONNECTIONTIMEOUT option allows to specify the timeout, in seconds, for new, non established TCP connections. This option can be useful for both active and passive TCP connections. The default value is specified by the MIB variable net.inet.tcp.keepinit."

On my Mac that value is ludicrously big:
$ sysctl net.inet.tcp.keepidle
net.inet.tcp.keepidle: 7200000

That's in milliseconds, but it's still set to *2 hours*. I guess that makes sense as a value for "how long can an existing TCP connection be idle with no traffic" but it is pretty nonsensical for the connect timeout.
Should we set the timeout in few other places?
I would not be surprised if more intermittent oranges without output could be related to this.
Summary: Try to add timeout to urlopen() for download and extract → Explicitely set urlopen()'s timeout for download and extract
Could the TCP timeout being so high affect test jobs?
Comment on attachment 8800724 [details]
Bug 1309912 - Add explicit timeout for urllib2.urlopen() instead of relying on global timeout

https://reviewboard.mozilla.org/r/85598/#review84324

::: testing/mozharness/mozharness/base/script.py:392
(Diff revision 1)
> -        # Bug 1301855 - URLError: <urlopen error [Errno 60] Operation timed out>
> -        # Bug 1302237 - URLError: <urlopen error [Errno 104] Connection reset by peer>
> -        # Bug 1301807 - BadStatusLine: ''
> -        response = urllib2.urlopen(request)
> +        # * Bug 1301855 - URLError: <urlopen error [Errno 60] Operation timed out>
> +        # * Bug 1302237 - URLError: <urlopen error [Errno 104] Connection reset by peer>
> +        # * Bug 1301807 - BadStatusLine: ''
> +        #
> +        # Bug 1309912 - Adding timeout in hopes to solve blocking on response.read() (bug 1300413)
> +        response = urllib2.urlopen(request, timeout=30)

Holy cow, there's your list of reasons to use requests!
Comment on attachment 8800724 [details]
Bug 1309912 - Add explicit timeout for urllib2.urlopen() instead of relying on global timeout

https://reviewboard.mozilla.org/r/85598/#review84326
Attachment #8800724 - Flags: review?(dustin) → review+
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #4)
> This option can be useful for both active and passive TCP connections. The
> default value is specified by the MIB variable net.inet.tcp.keepinit."
> 
> On my Mac that value is ludicrously big:
> $ sysctl net.inet.tcp.keepidle
> net.inet.tcp.keepidle: 7200000

Re-reading this I can see that "net.inet.tcp.keepinit" and "net.inet.tcp.keepidle" are not the same thing! My machine says:
$ sysctl net.inet.tcp.keepinit
net.inet.tcp.keepinit: 75000

...which is a slightly more reasonable 75 seconds. You could certainly double-check that value on our build+test machines.
Pushed by armenzg@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7aa1eb0ca75e
Add explicit timeout for urllib2.urlopen() instead of relying on global timeout r=dustin
And from the test machines: net.inet.tcp.keepinit: 75000
Keep in mind that MIBs aren't the same as kernel sysctls, which also vary wildly between OS's

  dustin@ramanujan ~ $ sudo sysctl net.inet.tcp.keepinit
  sysctl: cannot stat /proc/sys/net/inet/tcp/keepinit: No such file or directory

(on Fedora 24)

and

  dustin@ramanujan ~ $ strace -t python -c 'import socket; socket.socket().connect(("169.254.169.254", 80))'
  ...
  09:43:06 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
  09:43:06 connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("169.254.169.254")}, 16) = -1 ETIMEDOUT (Connection timed out)
  09:45:13 stat("systemd", 0x7ffc61c85070) = -1 ENOENT (No such file or directory)
  ...

so, 127 seconds.

But, I think you determined earlier that this was a timeout during a read operation.  In that case, there is no core TCP functionality that permits any kind of timeout.  That's why your SSH session can stay open at the shell prompt for hours with no traffic.  TCP Keepalives can help there, but in order to keep traffic low they default to one every 2 hours, and are only used when the application sets an option on the socket.  It's generally better for applications that do not expect long delays (such as HTTP clients) to set a timeout on each read operation, rather than relying on keepalives.  In other words, I think passing timeout=.. is the fix here.  I'd still rather see requests used, since it embodies a bunch of other HTTP client best practices that may address some of the dozen other bugs referenced in scripts.py, but this is a start :)
Ah, if this is a read timeout then yes, the 2 hour TCP keepalive applies. `urlopen` defaulting to no timeout is just ridiculous default behavior.
https://hg.mozilla.org/mozilla-central/rev/7aa1eb0ca75e
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: