Closed Bug 1186848 Opened 7 years ago Closed 6 years ago

Build failure on taskcluster build jobs turn them orange on treeherder instead of red

Categories

(Taskcluster :: Services, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: glandium, Unassigned)

No description provided.
:garndt and :armenzg met last Thursday to discuss the current state and how to move forward. The underlying problem is that a task in TaskCluster isn't a set of steps. We may also have a problem with the assumptions Treeherder makes about submitting task status information.
Treeherder status colors are documented here: https://wiki.mozilla.org/Sheriffing/How:To:TBPL#Red.2C_Green.2C_whatever_:.29

TaskCluster job states as produced by mozilla-taskcluster are defined here: http://mozilla-taskcluster.readthedocs.org/en/latest/treeherder/#actions
Reading the status here, I am not clear on what is being done to address this.
The link Selena posted shows Buildbot jobs as red and TaskCluster jobs as orange.
What are the issues with orange vs. red? Don't both indicate a failure?

Do we want to make TaskCluster build failures red? If so, is this a priority and who is doing it? If not, can we indicate that here?
Yes, some of the differences in the color displayed on Treeherder between TaskCluster and Buildbot jobs come from the fact that Buildbot runs a series of steps and knows which step failed, so it can report a different result (build failure vs. test failure), whereas TaskCluster has no insight into what is running in the isolated environment. It is given a command and then just waits for the exit code.
During this quarter we will be working on refactoring how jobs are submitted to Treeherder. When that is live, a task will declare itself to be a build, test, or "other", which will make it easier to decide that a failed build task should be red. Currently we cannot tell what type of task it is without inferring it from other properties of the task.
Why not codify that a job returning exit code 1 would be red and a job returning exit code 2 would be orange? That would also allow a different color on build jobs that fail to build at all from build jobs that fail during some post-build step.
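A minimal sketch of what that convention could look like on the task side, assuming a hypothetical wrapper (`run_build` and both constants are illustrative names, not an existing TaskCluster or mozharness API). The wrapper runs the build step and a post-build step, then collapses whatever the underlying commands returned into the two agreed codes:

```python
import subprocess
import sys

# Hypothetical convention from the proposal above:
# 1 = the build itself failed (red), 2 = a post-build step failed (orange).
EXIT_BUILD_FAILED = 1
EXIT_POST_BUILD_FAILED = 2


def run_build(build_cmd, post_build_cmd):
    """Run both steps and return the exit code the task should report."""
    if subprocess.run(build_cmd).returncode != 0:
        # Whatever code the build command used is replaced by the agreed one.
        return EXIT_BUILD_FAILED
    if subprocess.run(post_build_cmd).returncode != 0:
        return EXIT_POST_BUILD_FAILED
    return 0


if __name__ == "__main__":
    # Placeholder commands; a real task would pass its own build steps.
    ok = [sys.executable, "-c", "pass"]
    sys.exit(run_build(ok, ok))
```

Note that every layer between the failing command and the worker has to preserve the normalized code for this to work.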
I would prefer not to rely on exit codes, since it would mean that every command in the chain has to exit with the same agreed code no matter what the real code is.  It's also not immediately obvious to someone which code they should exit with to get the right color on Treeherder.

We are going to be adding a field to task definitions that will define if they are a build, test, or other.  Our integration component that updates the job status on treeherder could make the determination of what color to update it to based on this information.  This is work that could be added to our in-tree task definitions (it needs to be there anyways for the new work being done) and mozilla-taskcluster could be updated to look for this field and if present, adjust the status that is set.
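As a rough sketch of that determination, assuming the declared type arrives as a plain string (`kind` and `treeherder_result` are hypothetical names, and the result strings `busted` for red and `testfailed` for orange are assumed to be the ones Treeherder expects):

```python
def treeherder_result(kind: str, exit_code: int) -> str:
    """Pick a Treeherder result from the task's declared kind.

    Assumed rule: any failed build is red ("busted"); a failed test or
    "other" task is orange ("testfailed").
    """
    if exit_code == 0:
        return "success"
    return "busted" if kind == "build" else "testfailed"
```

This only looks at whether the task failed, not at which exit code it returned, so it cannot tell a compile failure from a failing post-build step.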

Are we in agreement that if anything within a build fails then it's red, and anything failing within a test is orange?
As I mentioned, builds can fail in post-build steps, and in that case they are orange on buildbot. Tests can also fail in hard ways that end up red on buildbot.
Thank you.  For some reason I misread the previous comment, but I do understand what you mean (now).
I opened up a thread to discuss this on the taskcluster list.

https://groups.google.com/d/msg/mozilla.tools.taskcluster/qgghtzNty1w/W9d1Lt92AwAJ
It's worth noting that mozharness already has quite a bit of support for handling exit statuses.
Component: Integration → Platform and Services
Duplicate of this bug: 1292353
Summarizing some thoughts around this: a new JSON artifact type could be added, stored in Azure table storage, that holds some task metadata upon task completion.  One such piece of information would be the exit code.

In the case of reporting job status, the exit code could be reviewed, and based on some information provided in task.extra.treeherder an appropriate treeherder status can be assigned.

This also would allow other components to trigger actions based on the exit status as well.
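A sketch of how such a lookup might work, assuming the recorded exit code is available and that the task carries a mapping under `task.extra.treeherder`. The `exitStatusMap` key, the function name, and the fallback to `testfailed` are all hypothetical, and the result strings are assumed to match Treeherder's names:

```python
DEFAULT_FAILURE = "testfailed"  # assumed fallback for unmapped non-zero codes


def result_from_exit_code(task: dict, exit_code: int) -> str:
    """Map a recorded exit code to a Treeherder result using task metadata."""
    if exit_code == 0:
        return "success"
    treeherder = task.get("extra", {}).get("treeherder", {})
    # Hypothetical key; JSON object keys are strings, so look up str(exit_code).
    mapping = treeherder.get("exitStatusMap", {})
    return mapping.get(str(exit_code), DEFAULT_FAILURE)


# Example task definition fragment (illustrative values only).
task = {
    "extra": {
        "treeherder": {"exitStatusMap": {"1": "busted", "3": "exception"}}
    }
}
```

Keeping the mapping in the task definition means each task documents its own exit codes instead of relying on a global convention.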
Adding a use case: In Bug 1278702, we'd like to have compiled-code tests fail early when run against an artifact build and report 'exception' (exit code 3), for example, to differentiate them from valid jobs.
Blocks: 1278702
Greg, is there more to do here, or does the division of tasks into kinds in tc-treeherder suffice?
Flags: needinfo?(garndt)
As far as the original problem, treeherder does differentiate between builds and tests now as long as that's specified as the jobKind in the task definition.  

We have not done anything around reporting jobs as a certain resolution status depending on exit status.  That has largely been frowned upon with no workable solution for it yet.
Flags: needinfo?(garndt)
I think that this is all we need.  We need to separate `make check` into another job anyway (bug 992323) to support cross-compiling on OS X.  At that point, any failure of the build job should turn red, which as Greg indicates is already implemented.
What about purple? (Comment 17) 
Should I file a separate bug for that?
Flags: needinfo?(dustin)
I think that would apply to the make-check task that's developed out of bug 992323 (I assume "compiled-code tests" is referring to `make check`?).  So yes, please file a new bug.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(dustin)
Resolution: --- → FIXED
Component: Platform and Services → Services