Closed Bug 1049525 Opened 11 years ago Closed 8 years ago

Tooltool timeouts do not produce a TBPL-compatible error message

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: emorley, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: sheriffing-P2, Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2574] )

Several times over the last few days, whilst we've been having infra issues, I've seen unhandled exceptions in tooltool which do not result in TBPL-friendly error output.

eg:
https://tbpl.mozilla.org/php/getParsedLog.php?id=45307376&full=1&branch=mozilla-inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=45307279&full=1&branch=mozilla-inbound

{
...
retry: Calling <function run_with_timeout at 0x7f944e579938> with args: (['/tools/tooltool.py', '--url', 'http://runtime-binaries.pvt.build.mozilla.org/tooltool', '--overwrite', '-m', 'mobile/android/config/tooltool-manifests/android/releng.manifest', 'fetch', '-c', '/builds/tooltool_cache'], 300, None, None, False, True), kwargs: {}, attempt #9
Executing: ['/tools/tooltool.py', '--url', 'http://runtime-binaries.pvt.build.mozilla.org/tooltool', '--overwrite', '-m', 'mobile/android/config/tooltool-manifests/android/releng.manifest', 'fetch', '-c', '/builds/tooltool_cache']
WARNING: Timeout (300) exceeded, killing process 2177
Traceback (most recent call last):
  File "/tools/tooltool.py", line 898, in <module>
    main()
  File "/tools/tooltool.py", line 895, in main
    exit(0 if process_command(options, args) else 1)
  File "/tools/tooltool.py", line 790, in process_command
    cache_folder=options['cache_folder'])
  File "/tools/tooltool.py", line 562, in fetch_files
    temp_file_name = fetch_file(base_urls, f)
  File "/tools/tooltool.py", line 442, in fetch_file
    indata = f.read(grabchunk)
  File "/tools/python27/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/tools/python27/lib/python2.7/httplib.py", line 561, in read
    s = self.fp.read(amt)
  File "/tools/python27/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
KeyboardInterrupt
retry: Failed, sleeping 300 seconds before retrying
retry: Calling <function run_with_timeout at 0x7f944e579938> with args: (['/tools/tooltool.py', '--url', 'http://runtime-binaries.pvt.build.mozilla.org/tooltool', '--overwrite', '-m', 'mobile/android/config/tooltool-manifests/android/releng.manifest', 'fetch', '-c', '/builds/tooltool_cache'], 300, None, None, False, True), kwargs: {}, attempt #10
Executing: ['/tools/tooltool.py', '--url', 'http://runtime-binaries.pvt.build.mozilla.org/tooltool', '--overwrite', '-m', 'mobile/android/config/tooltool-manifests/android/releng.manifest', 'fetch', '-c', '/builds/tooltool_cache']
WARNING: Timeout (300) exceeded, killing process 2192
Traceback (most recent call last):
  File "/tools/tooltool.py", line 898, in <module>
    main()
  File "/tools/tooltool.py", line 895, in main
    exit(0 if process_command(options, args) else 1)
  File "/tools/tooltool.py", line 790, in process_command
    cache_folder=options['cache_folder'])
  File "/tools/tooltool.py", line 562, in fetch_files
    temp_file_name = fetch_file(base_urls, f)
  File "/tools/tooltool.py", line 442, in fetch_file
    indata = f.read(grabchunk)
  File "/tools/python27/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/tools/python27/lib/python2.7/httplib.py", line 561, in read
    s = self.fp.read(amt)
  File "/tools/python27/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
KeyboardInterrupt
retry: Giving up on <function run_with_timeout at 0x7f944e579938>
Unable to successfully run ['/tools/tooltool.py', '--url', 'http://runtime-binaries.pvt.build.mozilla.org/tooltool', '--overwrite', '-m', 'mobile/android/config/tooltool-manifests/android/releng.manifest', 'fetch', '-c', '/builds/tooltool_cache'] after 10 attempts
program finished with exit code 1
elapsedTime=5002.372033
========= Finished 'sh /builds/slave/m-in-and-d-0000000000000000000/tools/scripts/tooltool/tooltool_wrapper.sh ...' failed (results: 2, elapsed: 1 hrs, 23 mins, 40 secs) (at 2014-08-05 23:31:06.272956) =========
}

This is coming from:
https://github.com/mozilla/build-tooltool/blob/master/tooltool.py#L442

It would be great if we could get TBPL-compatible output here - not sure where best to handle this - in tooltool.py, retry.py or tooltool_wrapper.sh
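For the tooltool.py option mentioned above, one possibility is to catch the KeyboardInterrupt at the entry point and print a single summary line instead of a traceback. The following is only a sketch under assumptions: `run_cli` and `ERROR_PREFIX` are hypothetical names, not part of tooltool, and the prefix string is a placeholder because this bug never settles on the text TBPL should match.

```python
import sys

# Placeholder prefix (assumption): the exact TBPL-visible string is undecided.
ERROR_PREFIX = "tooltool error: "


def run_cli(main, stream=sys.stderr):
    """Run main() and return an exit code.

    A KeyboardInterrupt (as seen in the log when the wrapper kills the
    process on timeout) becomes a one-line message and exit code 1.
    """
    try:
        return 0 if main() else 1
    except KeyboardInterrupt:
        stream.write(ERROR_PREFIX + "interrupted while running (possibly killed on timeout)\n")
        return 1
```

An entry point could then call `sys.exit(run_cli(main))`. This only covers the case where the kill arrives as an interrupt, so it would not help if the process were terminated by a signal Python cannot catch.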
Summary: Tooltool unhandled KeyboardInterrupt exception during f.read(grabchunk) with no TBPL-friendly error message → Tooltool timeouts do not produce a TBPL-compatible error message
The simplest solution is probably to change the message logged by retry.py:
https://hg.mozilla.org/build/tools/file/tip/buildfarm/utils/retry.py#l80

This would make all timeouts managed by retry.py display a tbpl compatible message (not only tooltool ones, if any).

Ed, which message would you like to display here?
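A rough sketch of what that could look like, written against Python 3's subprocess module rather than the actual retry.py code. The names `run_with_timeout_sketch` and `TIMEOUT_PREFIX` are hypothetical, and the prefix is a placeholder since the message text is still an open question here.

```python
import subprocess
import sys

# Placeholder prefix (assumption): the real TBPL-compatible string is undecided.
TIMEOUT_PREFIX = "retry.py timeout: "


def run_with_timeout_sketch(cmd, timeout, out=sys.stdout):
    """Return True if cmd exits 0 within `timeout` seconds, else False.

    On timeout the child is killed by subprocess.run, and one summary
    line is written so a log parser has a single line to match.
    """
    try:
        return subprocess.run(cmd, timeout=timeout).returncode == 0
    except subprocess.TimeoutExpired:
        out.write("%scommand timed out after %s seconds: %s\n"
                  % (TIMEOUT_PREFIX, timeout, " ".join(cmd)))
        return False
```

Only the timeout path writes the extra line, so ordinary non-zero exits would keep whatever output they produce today.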
Flags: needinfo?(emorley)
(In reply to Simone Bruno [:simone] from comment #2)
> This would make all timeouts managed by retry.py display a tbpl compatible
> message (not only tooltool ones, if any).
>
> Ed, which message would you like to display here?

Yeah that sounds sensible - I'll just need to check that this won't cause redundant error summary output for other failure modes & think of the best string to use; added to my list for next week.
Flags: needinfo?(emorley)
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2574]
Component: Tools → General
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE