Closed
Bug 1256300
Opened 8 years ago
Closed 8 years ago
Intermittent Android "command timed out: 2400 seconds without output running ['/tools/buildbot/bin/python', 'scripts/scripts/android_emulator_unittest.py'..."
Categories
(Testing :: General, defect)
Testing
General
Tracking
(firefox47 unaffected, firefox48 fixed, firefox49 fixed)
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
firefox47 | --- | unaffected |
firefox48 | --- | fixed |
firefox49 | --- | fixed |
People
(Reporter: gbrown, Assigned: gbrown)
References
Details
(Keywords: intermittent-failure)
Attachments
(2 files)
1.32 KB,
patch
|
rail
:
review+
|
Details | Diff | Splinter Review |
1.82 KB,
patch
|
jlund
:
review+
|
Details | Diff | Splinter Review |
As reported in bug 1051908 for the last 2 weeks. Maybe a hang in tooltool.py?
Assignee | ||
Updated•8 years ago
|
Keywords: intermittent-failure
Assignee | ||
Comment 1•8 years ago
|
||
https://bugzilla.mozilla.org/show_bug.cgi?id=1237706#c41
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 5•8 years ago
|
||
There is variability in the 2400 second timeout logs collected in this bug, so there may be multiple issues at play here. A few logs (the minority), like https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=3785227#L848, look like: 04:14:56 INFO - Copy/paste: /tools/tooltool.py --url https://api.pub.build.mozilla.org/tooltool/ --authentication-file /builds/relengapi.tok fetch -m /builds/slave/test/build/tooltool.tt -o -c /builds/tooltool_cache 04:14:56 INFO - INFO - File android-sdk_r24.0.2-linux.tgz not present in local cache folder /builds/tooltool_cache 04:14:56 INFO - INFO - Attempting to fetch from 'https://api.pub.build.mozilla.org/tooltool/'... command timed out: 2400 seconds without output running ['/tools/buildbot/bin/python', 'scripts/scripts/android_emulator_unittest.py', '--cfg', 'android/androidarm_4_3.py', '--test-suite', 'mochitest-1', '--blob-upload-branch', 'mozilla-central', '--download-symbols', 'ondemand'], attempting to kill process killed by signal 9 program finished with exit code -1 elapsedTime=2417.380167 I think tooltool might be hung. At any rate, it seems possible that tooltool could hang, or download very, very slowly sometimes. I'd prefer to see these cases fail with a specific mozharness timeout rather than wait for the buildbot timeout. A random sampling of test logs suggest tooltool completes in less than 30 seconds for most of our current downloads, so this 600 second output timeout seems reasonably safe. https://treeherder.mozilla.org/#/jobs?repo=try&revision=610594efd242d3f5534534acd33886bc544ec48a suggests this does no harm. I intend to leave-open for follow-up on other timeout causes.
Attachment #8750480 -
Flags: review?(rail)
Updated•8 years ago
|
Attachment #8750480 -
Flags: review?(rail) → review+
Assignee | ||
Updated•8 years ago
|
Keywords: leave-open
Assignee | ||
Comment 7•8 years ago
|
||
The other, more common, log variation is: 09:38:17 INFO - Writing buildbot properties ['build_url'] to /builds/slave/test/properties/build_url 09:38:17 INFO - Writing to file /builds/slave/test/properties/build_url 09:38:17 INFO - Contents: 09:38:17 INFO - build_url:https://queue.taskcluster.net/v1/task/b0gIAvQVTUuQuR06YXr4zw/artifacts/public/build/fennec-49.0a1.en-US.android-arm.apk command timed out: 2400 seconds without output running ['/tools/buildbot/bin/python', 'scripts/scripts/android_emulator_unittest.py', '--cfg', 'android/androidarm_4_3.py', '--test-suite', 'mochitest-16', '--blob-upload-branch', 'mozilla-inbound', '--download-symbols', 'ondemand'], attempting to kill process killed by signal 9 program finished with exit code -1 elapsedTime=2449.852416 The final "build_url:" log comes from: _download_and_extract _download_installer set_buildbot_properties dump_buildbot_properties write_to_file Note that this variation occurs for both opt and debug jobs. For debug, the next non-trivial operation is to download symbols; for opt, the next non-trivial operation is downloading host utilities. The first, next, missing log statement is the "proxxy config: ..." reported by get_proxies_for_url.
Comment 8•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b9079ed81182
Assignee | ||
Comment 9•8 years ago
|
||
This is a speculative fix for the log variation shown in Comment 7. A hang in urllib2.urlopen() here would be consistent with those logs for both opt and debug cases. Most (all?) other urllib2.urlopen calls in mozharness use a timeout parameter, typically between 10 and 60 seconds. In this case, additional logging in treeherder logs show urlopen taking approximately 20 seconds in the majority of cases - https://treeherder.mozilla.org/#/jobs?repo=try&revision=9087cd124b23 - so I've opted for the longer 120 second timeout. On timeout, I see urllib2.urlopen throwing an SSLError, which is a socket.error, but there may be some variation across urllib2 implementations, and I note other mozharness code handling socket.timeout also, so I've included that as well.
Attachment #8751366 -
Flags: review?(jlund)
Comment 10•8 years ago
|
||
Comment on attachment 8751366 [details] [diff] [review] add timeout to urllib2.urlopen call Review of attachment 8751366 [details] [diff] [review]: ----------------------------------------------------------------- looks good to me. just one comment about proposing to fold up exception blocks ::: testing/mozharness/mozharness/mozilla/testing/testbase.py @@ +201,5 @@ > except urllib2.URLError: > self.warning("Can't figure out symbols_url from installer_url: %s!" % > self.installer_url) > + except socket.timeout, e: > + self.warning("Can't figure out symbols_url from installer_url: %s - %s" % so exception message and error_level is the same each time? I wonder if we should fold up all of these into one except block. and also use self.exception[1] that will grab/log traceback for us. e.g. except (urllib2.URLError, socket.error, socket.timeout): self.exception("Can't figure out symbols_url from installer_url: %s!" % self.installer_url, level=WARNING) [1] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/base/log.py#157
Attachment #8751366 -
Flags: review?(jlund) → review+
Assignee | ||
Comment 11•8 years ago
|
||
Great idea - thanks jlund!
Comment 14•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/bf088a188e27 https://hg.mozilla.org/mozilla-central/rev/537ad1398c52
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 16•8 years ago
|
||
Failures continue on mozilla-aurora (48). It looks like this has never been reported on 47; I wonder why?
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox47:
--- → unaffected
status-firefox48:
--- → affected
status-firefox49:
--- → fixed
Keywords: leave-open
Resolution: --- → FIXED
Comment 17•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/53d94e77fd50
You need to log in
before you can comment on or make changes to this bug.
Description
•