Closed Bug 1256300 Opened 5 years ago Closed 5 years ago

Intermittent Android "command timed out: 2400 seconds without output running ['/tools/buildbot/bin/python', 'scripts/scripts/'..."


(Testing :: General, defect)

Not set


(firefox47 unaffected, firefox48 fixed, firefox49 fixed)

Tracking Status
firefox47 --- unaffected
firefox48 --- fixed
firefox49 --- fixed


(Reporter: gbrown, Assigned: gbrown)



(Keywords: intermittent-failure)


(2 files)

As reported in bug 1051908 for the last 2 weeks. Maybe a hang in
See Also: → 1237706
There is variability in the 2400 second timeout logs collected in this bug, so there may be multiple issues at play here. A few logs (the minority), like, look like:

 04:14:56     INFO - Copy/paste: /tools/ --url --authentication-file /builds/relengapi.tok fetch -m /builds/slave/test/build/ -o -c /builds/tooltool_cache
 04:14:56     INFO -  INFO - File android-sdk_r24.0.2-linux.tgz not present in local cache folder /builds/tooltool_cache
 04:14:56     INFO -  INFO - Attempting to fetch from ''...
 command timed out: 2400 seconds without output running ['/tools/buildbot/bin/python', 'scripts/scripts/', '--cfg', 'android/', '--test-suite', 'mochitest-1', '--blob-upload-branch', 'mozilla-central', '--download-symbols', 'ondemand'], attempting to kill
 process killed by signal 9
 program finished with exit code -1

I think tooltool might be hung. At any rate, it seems possible that tooltool could hang, or download very, very slowly sometimes. I'd prefer to see these cases fail with a specific mozharness timeout rather than wait for the buildbot timeout.

A random sampling of test logs suggest tooltool completes in less than 30 seconds for most of our current downloads, so this 600 second output timeout seems reasonably safe. suggests this does no harm.

I intend to leave-open for follow-up on other timeout causes.
Attachment #8750480 - Flags: review?(rail)
Attachment #8750480 - Flags: review?(rail) → review+
Keywords: leave-open
The other, more common, log variation is:

 09:38:17     INFO - Writing buildbot properties ['build_url'] to /builds/slave/test/properties/build_url
 09:38:17     INFO - Writing to file /builds/slave/test/properties/build_url
 09:38:17     INFO - Contents:
 09:38:17     INFO -  build_url:
 command timed out: 2400 seconds without output running ['/tools/buildbot/bin/python', 'scripts/scripts/', '--cfg', 'android/', '--test-suite', 'mochitest-16', '--blob-upload-branch', 'mozilla-inbound', '--download-symbols', 'ondemand'], attempting to kill
 process killed by signal 9
 program finished with exit code -1

The final "build_url:" log comes from:


Note that this variation occurs for both opt and debug jobs. For debug, the next non-trivial operation is to download symbols; for opt, the next non-trivial operation is downloading host utilities.

The first, next, missing log statement is the "proxxy config: ..." reported by get_proxies_for_url.
This is a speculative fix for the log variation shown in Comment 7. A hang in urllib2.urlopen() here would be consistent with those logs for both opt and debug cases.

Most (all?) other urllib2.urlopen calls in mozharness use a timeout parameter, typically between 10 and 60 seconds. In this case, additional logging in treeherder logs show urlopen taking approximately 20 seconds in the majority of cases - - so I've opted for the longer 120 second timeout.

On timeout, I see urllib2.urlopen throwing an SSLError, which is a socket.error, but there may be some variation across urllib2 implementations, and I note other mozharness code handling socket.timeout also, so I've included that as well.
Attachment #8751366 - Flags: review?(jlund)
Comment on attachment 8751366 [details] [diff] [review]
add timeout to urllib2.urlopen call

Review of attachment 8751366 [details] [diff] [review]:

looks good to me. just one comment about proposing to fold up exception blocks

::: testing/mozharness/mozharness/mozilla/testing/
@@ +201,5 @@
>              except urllib2.URLError:
>                  self.warning("Can't figure out symbols_url from installer_url: %s!" %
>                               self.installer_url)
> +            except socket.timeout, e:
> +                self.warning("Can't figure out symbols_url from installer_url: %s - %s" %

so exception message and error_level is the same each time? I wonder if we should fold up all of these into one except block. and also use self.exception[1] that will grab/log traceback for us.


except (urllib2.URLError, socket.error, socket.timeout):
    self.exception("Can't figure out symbols_url from installer_url: %s!" % self.installer_url, level=WARNING)

Attachment #8751366 - Flags: review?(jlund) → review+
Great idea - thanks jlund!
Failures continue on mozilla-aurora (48). It looks like this has never been reported on 47; I wonder why?
Closed: 5 years ago
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.