Closed Bug 1272792 Opened 8 years ago Closed 8 years ago

Ensure jobs turn red if there's an error prior to mozharness

Categories

(Taskcluster :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: ahal, Unassigned)

References

Details

I was playing around with some scripts in the desktop-test image and made a mistake which prevented mozharness from getting invoked:
https://tools.taskcluster.net/task-inspector/#GRpTXETSQCS5hY-Jy9uMUw/0

But because mozharness is the thing that detects errors, taskcluster assumed the job worked out properly and didn't set the appropriate failure status (note: I don't actually know how this works).

Anyway, we should have some kind of built-in way to detect whether a failure that prevented mozharness from running happened.
Tasks are either marked as failed or completed depending on exit status of the task.  In this case, the exit status is 0 as can be seen here from the task log:
cleanup
++ cleanup
++ local rv=0
++ [[ -s /home/worker/.xsession-errors ]]
++ cp /home/worker/.xsession-errors /home/worker/artifacts/public/xsession-errors.log
++ '[' -n 33 ']'
++ '[' false == false ']'
++ kill 33
++ exit 0


I'm not sure if this is an issue with trapping signals from exec when a command is not found or what, but this minimized case can be used to see what happens:
#!/bin/bash
set -e -x

cleanup() {
    local rv=$?
    exit $rv
}

trap cleanup EXIT INT
exec some_command


However, if exec is changed to 
status=$(exec some_command)
exit status

Then "cleanup" exits with the expected code, 127 in this case.
:ahal -- I followed up with Greg in IRC and he said: 

> once someone fixes that script to report the right exit code, the task will show up with the correct failure status in treeherder.  As far as parsing out errors, if this happens within mozharness, there's already bits in there to parse mozharness errors.  For the errors that happen before it, work was done to have some of those formatted in a way treeherder likes.

So, as long as $rv is set, this should be complete. Do you want to keep this bug open for that work to happen? There shouldn't be additional work required on taskcluster. 

I'm opening a docs bug for treeherder about how to format error messages in a way that the system likes.
Flags: needinfo?(ahalberstadt)
The key here is the "right exit code".  It's getting set, but it's being set to 0.  Some reason trapping the exit of "exec' in this case is returning a code of 0 rather than what the code should have been (127).
Makes sense.. But if an error happens in-between that trap, and the 'exit status', then it will still come back successful right?

Anyway, this isn't blocking me or anything and probably isn't a huge deal. I'll do a quick test and we can call this good enough.
Flags: needinfo?(ahalberstadt)
Here's a quick try run I did:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=727167bdcc64384d471bb88c7f2b931761bcf8f5

The suggested fix works for this case at least. Because this script isn't using exec yet, I'm just going to roll this change into my patch from bug 1250904.

One minor detail.. the job color should probably be red. Orange usually means a test failure, and red is failures in the infra.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.