Closed Bug 1462511 Opened 6 years ago Closed 4 years ago

More flexible task definition metadata

Categories

(Tree Management :: Treeherder, enhancement, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Unassigned)

https://github.com/taskcluster/taskcluster-treeherder/blob/master/schemas/task-treeherder-config.yml defines how tasks get mapped into symbols, etc. in Treeherder.

Treeherder requires a job to have a well-defined "jobKind," which is limited to "build", "test", or "other". And "machine" is composed of "platform", "os", and "architecture".

I /think/ this schema is a cargo cult (possibly from buildbot).

The more tasks we cram into Firefox CI, the more apparent it is that Treeherder's existing schema is constraining flexibility and natural groupings. It is also leading to a cluttered Treeherder UI.

For example, each "machine" in Treeherder's schema maps to a row in the Treeherder web UI. Because "machine" is supposed to map to some kind of "build" configuration or target architecture, we end up with rows like "OS X Cross Compiled opt", "OS X Cross Compiled asan", "OS X Cross Compiled debug", and "OS X Cross Compiled NoOpt debug", each containing a single task. This increases the vertical height of each revision. And the fact that Mac builds are cross-compiled doesn't really add much value to end-users of Treeherder: they just care "did the macOS build complete."
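
As a rough illustration of the problem (a sketch only: the task labels are made up, and the grouping function is a simplification, not Treeherder's actual code), keying rows on the "machine" string gives every build variant its own row:

```python
from collections import defaultdict

# Hypothetical (task label, Treeherder "machine" row name) pairs.
# The row names come from this bug; the task labels are invented.
tasks = [
    ("build-macosx64/opt", "OS X Cross Compiled opt"),
    ("build-macosx64-asan/opt", "OS X Cross Compiled asan"),
    ("build-macosx64/debug", "OS X Cross Compiled debug"),
    ("build-macosx64-noopt/debug", "OS X Cross Compiled NoOpt debug"),
]


def rows_by_machine(tasks):
    """Group task labels into UI rows keyed by their "machine" string."""
    rows = defaultdict(list)
    for label, machine in tasks:
        rows[machine].append(label)
    return dict(rows)


rows = rows_by_machine(tasks)
```

Here `rows` ends up with four keys, each holding exactly one task, which is the vertical clutter described above.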

I'd like to request that Treeherder change its task schema to be more flexible.

Instead of attempting to codify "job kind" or "machine" in the Treeherder metadata, allow this type of metadata to be defined as arbitrary "labels."

Instead of implying that tasks sharing the same "machine" are rendered on the same row, allow tasks to explicitly define the "row" they belong in. (This would allow e.g. grouping all build tasks on the same row and sub-grouping builds by platform within that row.)

Instead of implying behavior due to setting "job kind" (e.g. failed "build" tasks are rendered red and failed "test" tasks are rendered "orange"), allow each task to define those semantics.
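
To make the three requests above concrete, here is a minimal sketch of what task-supplied display metadata could look like. Every name in it (`TaskDisplay`, `row`, `group`, `labels`, `failure_color`, and the sample values) is hypothetical and only meant to illustrate the idea; it is not the existing schema or a proposed final one.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TaskDisplay:
    """Hypothetical task-defined display metadata (not the real schema)."""
    symbol: str
    row: str                        # explicit row, not implied by "machine"
    group: str = ""                 # optional sub-group within the row
    labels: dict = field(default_factory=dict)  # arbitrary metadata
    failure_color: str = "orange"   # failure semantics chosen by the task


def layout(tasks):
    """Return {row: {group: [symbols]}} using only the explicit fields."""
    rows = defaultdict(lambda: defaultdict(list))
    for task in tasks:
        rows[task.row][task.group].append(task.symbol)
    return {row: dict(groups) for row, groups in rows.items()}


tasks = [
    TaskDisplay("B", row="Builds", group="macOS opt",
                labels={"os": "macosx", "cross-compiled": "true"},
                failure_color="red"),
    TaskDisplay("B", row="Builds", group="macOS debug",
                labels={"os": "macosx", "cross-compiled": "true"},
                failure_color="red"),
    TaskDisplay("B", row="Builds", group="Linux opt",
                labels={"os": "linux"}, failure_color="red"),
    TaskDisplay("ES", row="Lint", labels={"kind": "lint"}),
]

view = layout(tasks)
```

With this shape, all builds land on a single "Builds" row, sub-grouped by platform, and a lint task gets the default orange failure color without having to pretend to be a "test".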

In other words, I want Treeherder's task schema to be less opinionated and to give whoever is defining tasks (taskgraph in Firefox CI's case) more control and flexibility over how things are rendered. This will allow us to more easily "refactor" how Firefox CI tasks are displayed in Treeherder and to iterate on new and potentially more readable and concise views of the data.

(FWIW this came up because a few of the build peers have been contemplating moving all the build tasks to a single row in Treeherder. This would require completely lying about the "machine" metadata. Also, a number of tasks masquerade as "build" or "test" in order to get their failure semantics. This all feels awkward and wrong.)
(In reply to Gregory Szorc [:gps] from comment #0)
> I /think/ this schema is a cargo cult (possibly from buildbot).

Yeah that is the reason. I agree this could be cleaned up now that:
(a) buildbot is on the way out (note: still being used by comm-central and ESR52 for now),
(b) the number of jobs that don't fit into that pattern (such as lint/cross-compiled/...) is only going to increase.

Though since this touches multiple repositories/teams and the benefits are more long-term than immediate, this is one of those issues that ends up not being prioritised.

See also:
* Bug 1060769 - PLATFORMS_BUILDERNAME's 'os_platform' is a mixture of OS type, OS version, architecture & product name
* Bug 1291689 - Add support for platforms that don't have a build type ("lint opt" -> "lint")
* Bug 1174186 - treeherder etl layer should use mozinfo platform names, not buildbot ones
* Bug 1056928 - Fix machine_platform or else remove it if it's not needed
* Bug 1458560 - [meta] Move platform display name mappings out of Treeherder's UI
Thanks for the triage, Ed!

Something else I considered is that changing the "platform" name has the potential to confuse Perfherder. That could be significantly disruptive. If we move forward with this, we may have to define the "backwards compatible name" in the schema to ease the transition.
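
A sketch of what I mean by a backwards compatible name, assuming a hypothetical `legacy-platform` field and made-up platform values (neither exists in the current schema, and this is not how Perfherder actually keys its data):

```python
def perf_platform(meta):
    """Pick the platform name used to key a performance series.

    Prefers the hypothetical "legacy-platform" field so that a renamed
    platform keeps feeding the series recorded under its old name.
    """
    return meta.get("legacy-platform", meta["platform"])


# Made-up metadata for a renamed platform and an untouched one.
renamed = {"platform": "macosx64", "legacy-platform": "osx-cross"}
untouched = {"platform": "linux64"}
```

The renamed task would keep reporting under "osx-cross", while tasks without the extra field fall back to their current platform name.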
Agree this would cause a Perfherder continuity break - though bug 1458560 comment 5 seems to suggest that is ok. (Guess it could be landed as a standalone change on mozilla-central and numerous retriggers performed on the before/after.)
Priority: -- → P3
See Also: → 1458560

:camd I think this is fixed, so I closed it. Reopen if I am wrong.

Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(cdawson)
Resolution: --- → FIXED

I think it is. Thanks for closing.

Flags: needinfo?(cdawson)