Interactive feature causes failing jobs to be marked successful

Status: NEW, Unassigned
Product: Taskcluster
Component: Worker
Opened 2 years ago, updated a year ago

People: (Reporter: jmaher, Unassigned)

Details: (Whiteboard: [docker-worker])

(Reporter)

Description

2 years ago
We need to exit with a nonzero status on failure, but on failure we currently exit with 0!

https://public-artifacts.taskcluster.net/cTSOoLz4RKuUlteYTtDnog/0/public/logs/live_backing.log

This happens when test.sh calls test-linux.sh, which then runs curl and gets a 404.

Our .sh scripts need to have better error handling.
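
As a rough sketch of what "better error handling" could look like here (the real test.sh and test-linux.sh are not shown in this bug, so the function name and arguments below are hypothetical): `set -e` stops the script at the first failing command, and curl's `--fail` turns an HTTP error such as a 404 into a nonzero exit instead of silently saving the error page.

```shell
#!/bin/sh
# Sketch only: download_or_fail and its arguments are hypothetical names,
# not taken from the actual test.sh / test-linux.sh.

# Abort on the first failing command and on use of unset variables.
set -eu

# Download URL ($1) to DEST ($2); return nonzero if the download fails.
download_or_fail() {
    # --fail makes curl exit nonzero (22) when the server answers with
    # an HTTP status >= 400, e.g. a 404.
    if ! curl --fail --silent --show-error --location --output "$2" "$1"; then
        echo "ERROR: could not download $1" >&2
        return 1
    fi
}

# Illustrative call (URL variable is an assumption):
#   download_or_fail "$MOZHARNESS_URL" mozharness.zip
```

With `set -e` active, an unguarded failing call to `download_or_fail` ends the script with a nonzero status, which the parent script can then pass on.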

Comment 1

2 years ago
Besides fixing the test jobs, should we make the build fail?

https://public-artifacts.taskcluster.net/Zs-Qz_hZTEixDMoJIBZOzA/0/public/logs/live_backing.log

16:54:24     INFO -  Notification center failed: Install the python dbus module to get a notification when the build finishes.
16:54:24    ERROR - Return code: 2
16:54:24  WARNING - setting return code to 2
16:54:24    FATAL - 'mach build' did not run successfully. Please check log for errors.
16:54:24    FATAL - Running post_fatal callback...
16:54:24    FATAL - Exiting -1
16:54:24     INFO - Running post-action listener: influxdb_recording_post_action
16:54:24     INFO - Running post-action listener: record_mach_stats
16:54:24     INFO - No build_resources.json found, not logging stats
16:54:24     INFO - Running post-run listener: _summarize
16:54:24    ERROR - # TBPL FAILURE #
16:54:24     INFO - #####
16:54:24     INFO - ##### FxDesktopBuild summary:
16:54:24     INFO - #####
16:54:24    ERROR - # TBPL FAILURE #
cleanup
+ cleanup
+ '[' -n 58 ']'
+ kill 58
[taskcluster] === Task Finished ===
[taskcluster] Artifact "public/build" not found at "/home/worker/artifacts/"
[taskcluster] Successful task run with exit code: 0 completed in 1711.082 seconds
I suspect that the script itself exited with 1 -- if not, then we would never have seen an orange test run!

As you suggested on IRC, a good guess is that something in the wrapper used by --interactive did not properly carry forward the exit status.
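
For illustration, here is a general pattern for a wrapper that does carry the exit status forward. This is not the actual --interactive wrapper, which is not shown in this bug: the function name and the `sleep` helper process are made up for the sketch. The idea is to save the command's status before running any cleanup, then return the saved value.

```shell
#!/bin/sh
# Hypothetical wrapper sketch; not the real docker-worker interactive wrapper.

# Run the given command alongside a helper process, clean up the helper,
# and return the command's own exit status.
run_with_cleanup() {
    # Stand-in for whatever helper the wrapper keeps running (assumption).
    sleep 3600 &
    helper_pid=$!

    # Capture the command's status before cleanup can overwrite $?.
    status=0
    "$@" || status=$?

    # Cleanup: stop the helper and ignore the statuses of kill and wait.
    kill "$helper_pid" 2>/dev/null || true
    wait "$helper_pid" 2>/dev/null || true

    # Report the wrapped command's status, not the cleanup's.
    return "$status"
}
```

A wrapper whose last step is the cleanup and that ends with a plain `exit 0` (or `exit $?` after the cleanup commands) would report success no matter what the wrapped command returned.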
Component: General → Docker-Worker
Duplicate of this bug: 1232387
Summary: when we fail to download mozharness.zip, the job finishes and turns GREEN → Interactive feature causes failing jobs to be marked successful

Comment 4

2 years ago
What evidence do we have that it is related?

I've got some oranges on a --interactive push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a94a73d41899
(Reporter)

Comment 5

2 years ago
All those oranges are 3600-second timeouts.
Probably an issue with our wrapper script... I'll take a look tomorrow...
Flags: needinfo?(jopsen)
Whiteboard: [docker-worker]
Component: Docker-Worker → Worker
I'll remove my own needinfo, as I clearly didn't take a look.
Flags: needinfo?(jopsen)