Closed Bug 826117 Opened 12 years ago Closed 12 years ago

[mozdevice] Timeout not respected when connecting socket in DeviceManagerSUT

Categories

(Testing :: Mozbase, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mcote, Unassigned)

References

Details

Attachments

(2 files)

The method of timing out socket.connect() calls in DeviceManagerSUT is broken.  It calls connect() followed by select(); however, connect() is a blocking call, so, without a timeout set explicitly on the socket, it will return at a time of its own choosing, which may be longer than the requested timeout.

The attached patch adds some debug print statements to DeviceManagerSUT to show this problem.  Here's some output when trying to use a default timeout of 15 seconds:

python -c "import mozdevice; mozdevice.DeviceManagerSUT.default_timeout = 15; mozdevice.DeviceManagerSUT('192.168.1.88')"

1357173029.250030 connecting... (timeout 15)
1357173105.115223 socket error
Could not connect; sleeping for 5 seconds.

The host 192.168.1.88 does not exist, so it is using some timeout as determined by the network or kernel or something.  This is much longer than the specified timeout (76 s vs 15 s).

Note that the recv() timeout is not affected by this bug.  It correctly calls select() *before* recv().  This is actually impossible with connect() and a blocking socket; the only solution is to make it a nonblocking socket or use socket.settimeout().  I think the latter is the way to go.
With this patch, we see the correct timeout when trying to connect to an unreachable host:

python -c "import mozdevice; mozdevice.DeviceManagerSUT.default_timeout = 15; mozdevice.DeviceManagerSUT('192.168.1.143')"
1357231390.424417 connecting... (timeout 15)
1357231405.432886 socket error
Could not connect; sleeping for 5 seconds.

It seems that sometimes a retry will fail immediately after an attempt that timed out, but I think that's kernel-level stuff.  You can get the same behaviour out of nc if you try to connect repeatedly to an unreachable host, so I think this is okay.
Attachment #697488 - Flags: review?(wlachance)
Attachment #697488 - Flags: feedback?(jmaher)
Comment on attachment 697488 [details] [diff] [review]
Use settimeout with connect

Your solution makes tons of sense to me. Thanks!
Attachment #697488 - Flags: review?(wlachance) → review+
Comment on attachment 697488 [details] [diff] [review]
Use settimeout with connect

Review of attachment 697488 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozdevice/mozdevice/devicemanagerSUT.py
@@ +172,5 @@
> +                self._sock.recv(1024)
> +            except socket.error, msg:
> +                self._sock.close()
> +                self._sock = None
> +                raise DMError("Remote Device Error: Unable to connect socket: "+str(msg), fatal=True)

is this the right error to bubble up?  I would think it would be "Remote Device Error: Unable to get a SUT prompt after connecting"
Attachment #697488 - Flags: feedback?(jmaher) → feedback+
Ah good catch.  Will fix.
Assignee: nobody → mcote
Status: NEW → ASSIGNED
Try run for 24935a02828d is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=24935a02828d
Results (out of 19 total builds):
    success: 19
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mcote@mozilla.com-24935a02828d
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: