We need to exit with a non-zero status on failure, but currently we exit with 0 when something fails! https://public-artifacts.taskcluster.net/cTSOoLz4RKuUlteYTtDnog/0/public/logs/live_backing.log This happens when test.sh calls test-linux.sh, and test-linux.sh then runs curl and gets a 404. Our .sh scripts need better error handling.
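As a rough sketch of what stricter error handling could look like (the contents of test-linux.sh are not shown here, so the `download` helper and the `MOZHARNESS_URL` variable are hypothetical names used only for illustration): `set -eu` makes the script stop at the first failing command, and `curl --fail` makes curl return a non-zero status on an HTTP error such as a 404 instead of saving the error page and returning 0.

```shell
#!/bin/sh
# Stop at the first failing command and treat unset variables as errors.
set -eu

# Hypothetical helper (not taken from the real scripts): fetch URL into DEST.
# --fail makes curl exit non-zero on HTTP errors (for example a 404),
# so the failure reaches the caller instead of being silently ignored.
download() {
    url="$1"
    dest="$2"
    curl --fail --silent --show-error --location --output "$dest" "$url"
}

# Example call, commented out so the sketch makes no network request;
# MOZHARNESS_URL is an assumed variable name:
# download "$MOZHARNESS_URL" mozharness.zip
```

With `set -e` in effect, a failed `download` call ends the script with curl's non-zero status, which is what the task runner needs to see in order to mark the job as failed.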
Besides fixing the test jobs, should we make the build fail?
https://public-artifacts.taskcluster.net/Zs-Qz_hZTEixDMoJIBZOzA/0/public/logs/live_backing.log
6:54:24 INFO - Notification center failed: Install the python dbus module to get a notification when the build finishes.
16:54:24 ERROR - Return code: 2
16:54:24 WARNING - setting return code to 2
16:54:24 FATAL - 'mach build' did not run successfully. Please check log for errors.
16:54:24 FATAL - Running post_fatal callback...
16:54:24 FATAL - Exiting -1
16:54:24 INFO - Running post-action listener: influxdb_recording_post_action
16:54:24 INFO - Running post-action listener: record_mach_stats
16:54:24 INFO - No build_resources.json found, not logging stats
16:54:24 INFO - Running post-run listener: _summarize
16:54:24 ERROR - # TBPL FAILURE #
16:54:24 INFO - #####
16:54:24 INFO - ##### FxDesktopBuild summary:
16:54:24 INFO - #####
16:54:24 ERROR - # TBPL FAILURE #
cleanup
+ cleanup
+ '[' -n 58 ']'
+ kill 58
[taskcluster] === Task Finished ===
[taskcluster] Artifact "public/build" not found at "/home/worker/artifacts/"
[taskcluster] Successful task run with exit code: 0 completed in 1711.082 seconds
I suspect that the script itself exited with 1 -- if not, then we would never have seen an orange test run! As you suggested on IRC, a good guess is that something in the wrapper used by --interactive did not properly carry forward the exit status.
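To illustrate the suspected failure mode (this is only a guess at the shape of the wrapper, which is not shown in this bug; `run_and_cleanup`, `cleanup`, and `helper_pid` are hypothetical names): if a wrapper runs the task command and then a cleanup step, the wrapper's exit status becomes that of the cleanup unless the command's status is saved first. A minimal sketch of carrying the status forward:

```shell
#!/bin/sh
# Hypothetical stand-in for the wrapper's cleanup step: stop a background
# helper process if one was started (helper_pid is an assumed variable).
cleanup() {
    if [ -n "${helper_pid:-}" ]; then
        kill "$helper_pid" 2>/dev/null || true
    fi
}

# Run the given command, always clean up afterwards, and return the
# command's own exit status rather than the status of the cleanup.
run_and_cleanup() {
    status=0
    "$@" || status=$?   # save the status before another command overwrites $?
    cleanup
    return "$status"
}

# Example: a wrapper script would typically end with
#   run_and_cleanup "$@"
#   exit $?
```

Without the saved `status`, a successful `kill` in the cleanup would leave the wrapper exiting with 0 even though the wrapped command failed, which would match the "exit code: 0" reported after `kill 58` in the log above.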
Summary: when we fail to download mozharness.zip, the job finishes and turns GREEN → Interactive feature causes failing jobs to be marked successful
What evidence do we have that it is related? I've got some oranges on a --interactive push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a94a73d41899
All those oranges are 3600-second timeouts.
Probably an issue with our wrapper script... I'll take a look tomorrow...
Component: Docker-Worker → Worker
I'll remove my own ni, as I clearly didn't take a look.