Closed Bug 1232385 Opened 9 years ago Closed 6 years ago

Interactive feature causes failing jobs to be marked successful

Categories

(Taskcluster :: Workers, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jmaher, Unassigned)

References

Details

(Whiteboard: [docker-worker])

We need to exit with a nonzero status on failure, but upon failure we exit with 0!

https://public-artifacts.taskcluster.net/cTSOoLz4RKuUlteYTtDnog/0/public/logs/live_backing.log

This happens when test.sh calls test-linux.sh, and test-linux.sh then runs curl and gets a 404.

Our .sh scripts need to have better error handling.
Besides fixing the test jobs, should we make the build fail?
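
As a minimal sketch of the kind of error handling meant here (not the actual contents of test-linux.sh): by default curl returns 0 even when the server answers 404, so the download step needs `--fail` and the caller needs to act on the status. The helper name `fetch_or_fail` is hypothetical.

```shell
#!/bin/bash
# Sketch only: fetch_or_fail is a hypothetical helper, not code from test-linux.sh.

# Download the URL in $1 to the local path in $2.
# Returns 1 if curl reports any failure, including HTTP errors such as 404.
fetch_or_fail() {
    local url="$1" dest="$2"
    # --fail: return nonzero on HTTP errors instead of saving the error page
    # --silent --show-error: no progress meter, but still print error messages
    # --location: follow redirects
    if ! curl --fail --silent --show-error --location --output "$dest" "$url"; then
        echo "ERROR: could not download $url" >&2
        return 1
    fi
}
```

A calling script could then stop on the first failed download with something like `fetch_or_fail "$url" mozharness.zip || exit 1`, so that the failure reaches the worker as a nonzero exit code.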

https://public-artifacts.taskcluster.net/Zs-Qz_hZTEixDMoJIBZOzA/0/public/logs/live_backing.log

16:54:24     INFO -  Notification center failed: Install the python dbus module to get a notification when the build finishes.
16:54:24    ERROR - Return code: 2
16:54:24  WARNING - setting return code to 2
16:54:24    FATAL - 'mach build' did not run successfully. Please check log for errors.
16:54:24    FATAL - Running post_fatal callback...
16:54:24    FATAL - Exiting -1
16:54:24     INFO - Running post-action listener: influxdb_recording_post_action
16:54:24     INFO - Running post-action listener: record_mach_stats
16:54:24     INFO - No build_resources.json found, not logging stats
16:54:24     INFO - Running post-run listener: _summarize
16:54:24    ERROR - # TBPL FAILURE #
16:54:24     INFO - #####
16:54:24     INFO - ##### FxDesktopBuild summary:
16:54:24     INFO - #####
16:54:24    ERROR - # TBPL FAILURE #
cleanup
+ cleanup
+ '[' -n 58 ']'
+ kill 58
[taskcluster] === Task Finished ===
[taskcluster] Artifact "public/build" not found at "/home/worker/artifacts/"
[taskcluster] Successful task run with exit code: 0 completed in 1711.082 seconds
I suspect that the script itself exited with 1 -- if not, then we would never have seen an orange test run!

As you suggested in irc, a good guess is that something in the wrapper used by --interactive did not properly carry forward the exit status.
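
To illustrate the guess above: in a shell wrapper, `$?` is overwritten by every later command, so a cleanup step that succeeds can turn a failed task into exit code 0 unless the status is saved first. The following is a sketch of a wrapper that carries the status forward. The function name `run_and_preserve_status` is hypothetical, and this is not the real `--interactive` wrapper, whose source is not shown in this bug.

```shell
#!/bin/bash
# Sketch only: a hypothetical stand-in for a task wrapper.

# Run the command given as arguments, perform cleanup, and return the
# command's own exit status rather than the status of the cleanup.
run_and_preserve_status() {
    local status=0
    # Capture the status immediately; any later command would overwrite $?.
    "$@" || status=$?
    # Placeholder for cleanup work (for example, stopping a helper process).
    echo "cleanup" >&2
    return "$status"
}
```

A wrapper script would end with `run_and_preserve_status "$@"; exit $?`, or simply make that call its last command, so the worker sees the task's real exit code.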
Component: General → Docker-Worker
Summary: when we fail to download mozharness.zip, the job finishes and turns GREEN → Interactive feature causes failing jobs to be marked successful
What evidence do we have that it is related?

I've got some oranges on a --interactive push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a94a73d41899
all those oranges are 3600 second timeouts.
Probably an issue with our wrapper script... I'll take a look tomorrow...
Flags: needinfo?(jopsen)
Whiteboard: [docker-worker]
Component: Docker-Worker → Worker
I'll remove my own ni, as I clearly didn't take a look.
Flags: needinfo?(jopsen)
What is the value of the exit code for an interactive task? I think we should resolve as something that indicates it was an interactive task, and has no meaningful resolution status.

exception/interactive-task ?
exception/not-applicable ?
Flags: needinfo?(jopsen)
QA Contact: pmoore
You don't necessarily have to attach to an interactive task.
It's not a concept the queue is otherwise engaged in.

I agree, don't depend on interactive tasks...
Yet, I don't see it as necessary to further clutter APIs with specifics about this.

I think it's smarter to add an entry to the task log saying that an interactive session was connected.
Maybe even write what IP was connected, if you really want to do something about security.
But that would be a different bug.
Flags: needinfo?(jopsen)
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #9)
> You don't necessarily have to attach to an interactive task.
> It's not a concept the queue is otherwise engaged in.
> 
> I agree, don't depend on interactive tasks...
> Yet, I don't see it as necessary to further clutter APIs with specifics about
> this.
> 
> I think it's smarter to add an entry to the task log saying that an
> interactive session was connected.
> Maybe even write what IP was connected, if you really want to do something
> about security.
> But that would be a different bug.

Filed bug 1445094 for this.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Component: Worker → Workers