Classifying tasks using the pulse messages
Categories: Taskcluster :: General, defect
Tracking: Not tracked
People: Reporter: sfraser, Unassigned
At the moment, when we're examining CI costs and efficiency, we're looking at the pushlog and index to work out what's been happening. This has high latency from event to metric and involves a lot of S3 crawling for artifact data.
We were looking at using pulse, instead, and collecting metrics from task-completed, task-failed, task-exception and artifact-created.
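As a sketch of the consumer side, a minimal dispatcher could route incoming messages to per-event handlers by exchange name. The exchange names below follow taskcluster-queue's real naming scheme; the payload shape in the usage example is simplified and illustrative, not the exact Pulse schema.

```python
# Route incoming Pulse messages to handlers keyed by queue-event exchange.
QUEUE_EVENTS = ("task-defined", "task-completed", "task-failed",
                "task-exception", "artifact-created")

def queue_exchange(event):
    """Exchange name for a taskcluster-queue event."""
    return f"exchange/taskcluster-queue/v1/{event}"

class PulseDispatcher:
    def __init__(self):
        self.handlers = {}

    def on(self, event, handler):
        self.handlers[queue_exchange(event)] = handler

    def dispatch(self, exchange, payload):
        handler = self.handlers.get(exchange)
        if handler:
            handler(payload)

# Usage with a simplified payload:
seen = []
d = PulseDispatcher()
d.on("task-completed", lambda p: seen.append(p["status"]["taskId"]))
d.dispatch(queue_exchange("task-completed"), {"status": {"taskId": "abc123"}})
```

The AMQP binding itself (one queue bound to each of these exchanges on the Pulse broker) is left out; any AMQP client can feed `dispatch`.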
This looks like it'll work well, although some of the ways people will want to categorise the metrics would have to be added in post-processing, as information about the product, target platform and repository isn't in the task status. "Show me only mobile builds on mozilla-central" or "Show me linux64 firefox releases" are example ways we'd want to classify things when summarising the numbers.
What would be the best way of getting that information from the event stream? Could we add task tags to the data sent for task-* messages? (Assuming we audit them first and remove any cruft.) Even if they were only added to task-defined messages, we could then merge the data afterwards without making expensive listTaskGroup calls, examining the decision task, or similar.
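The merge described above could look something like this: cache tags from task-defined messages, then attach them to later task-completed/failed/exception events by taskId. This assumes tags were added to task-defined messages as proposed; the field names are illustrative, not the exact Pulse payload schema.

```python
# Join task-* events with tags captured earlier from task-defined messages.
tag_cache = {}  # taskId -> tags

def on_task_defined(msg):
    """Remember the tags (hypothetical field) for this taskId."""
    tag_cache[msg["status"]["taskId"]] = msg.get("tags", {})

def on_task_completed(msg):
    """Emit a metric record with the cached tags merged in."""
    task_id = msg["status"]["taskId"]
    return {
        "taskId": task_id,
        "state": msg["status"]["state"],
        # pop() frees memory once the task reaches a resolved state
        "tags": tag_cache.pop(task_id, {}),
    }
```

Tasks that resolve via task-failed or task-exception would need the same pop, and long-lived entries for tasks that never resolve would want an expiry.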
Comment 1•6 years ago
This is a great idea! We export almost enough data now to support this kind of arbitrary, detailed query.
Tasks are not included in the pulse payload, but we could add them without too much trouble. I'd like to avoid adding too much data to the messages, just to keep total bandwidth down. There may come a point where whatever is listening to these events needs to fetch and cache the task definition separately.
I think we talked about a few missing things that will soon be available.
- artifact sizes (should be possible once we are using blob artifacts / object service)
- instance startup and shutdown times (should be possible with the worker manager)
Reporter
Comment 2•6 years ago
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #1)
Tasks are not included in the pulse payload, but we could add them without too much trouble. I'd like to avoid adding too much data to the messages, just to keep total bandwidth down.
Agreed, it would be easy for those messages to become huge. I can start to go through the tags and make sure they've got the fields we need, if this is a good option to take.
There may come a point where whatever is listening to these events needs to fetch and cache the task definition separately.
Won't that be a heavy workload? We'd be doing this for every task.
I think we talked about a few missing things that will soon be available.
- artifact sizes (should be possible once we are using blob artifacts / object service)
- instance startup and shutdown times (should be possible with the worker manager)
This sounds great! Do we know roughly when these might appear in the messages?
Comment 3•6 years ago
Tasks are not included
I think you understood correctly, but for the record I meant tags! Tasks are way too big :(
Won't that be a heavy workload?
Especially when we transition to using postgres for the queue, I don't think fetching the task definition will be a big load. That task definition will most likely be "hot" in the postgres server's memory since it was just active. We can also easily add caching of task definitions, since they are immutable.
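Because task definitions are immutable, the caching mentioned here is straightforward. A minimal sketch, where `fetch_task` is a stand-in for the real Queue API call (e.g. `queue.task(taskId)`):

```python
# Cache immutable task definitions so each is fetched at most once.
from functools import lru_cache

calls = []  # track simulated API calls for demonstration

def fetch_task(task_id):
    """Stand-in for the expensive HTTP fetch of a task definition."""
    calls.append(task_id)
    return {"taskId": task_id, "tags": {"kind": "build"}}

@lru_cache(maxsize=100_000)
def get_task(task_id):
    # Safe to cache indefinitely: definitions never change after creation.
    return fetch_task(task_id)
```

Since entries never go stale, the only tuning knob is `maxsize`, which bounds memory rather than freshness.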
This sounds great! Do we know roughly when these might appear in the messages?
Q3, ish? Both services are still in early stages of development, so it's hard to say precisely.
Reporter
Comment 4•6 years ago
Notes about the items we're hoping to add:
Platform:
- Inheriting this from the build tasks seems most reliable, but it also involves a lot of transform and schema modification to pass the values down.
- Normalising the Treeherder platform does produce plausible-looking short platform names, and can be done in transforms/task.py
Product:
- There are release_product and release_type for releases, but mostly these aren't populated
Origin:
- target_tasks_method seems best to identify push / nightly / release graph. It might also do double duty identifying the product with names like ship_fennec, nightly_desktop, and so on.
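Pulling the three items above together, the post-processing classifier might look roughly like this. The tag names (`target_tasks_method`, `platform`, `release_product`) and the mapping rules are illustrative guesses based on the notes above, not a confirmed schema.

```python
# Derive platform / product / origin for a task from its tags (hypothetical
# tag names; real taskgraph attribute names may differ).
def classify(tags):
    method = tags.get("target_tasks_method", "")
    if method.startswith("ship_") or "release" in method:
        origin = "release"
    elif "nightly" in method:
        origin = "nightly"
    else:
        origin = "push"  # default for ordinary pushes
    return {
        "platform": tags.get("platform", "unknown"),
        "product": tags.get("release_product") or "firefox",  # assumed default
        "origin": origin,
    }
```

As noted above, `target_tasks_method` can do double duty: `ship_fennec` implies both a release graph and the fennec product, so a fuller version could derive product from it too.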
Comment 5•6 years ago
(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #3)
Q3, ish? Both services are still in early stages of development, so it's hard to say precisely.
These won't happen until Q4.
Comment 6•5 years ago
Simon: we lost track of this, sorry. Do you still need this? Have your requirements changed?
Reporter
Comment 7•5 years ago
Hi coop,
Our current ETL listens for the task definition over Pulse as well, and we merge the data in using that. So our immediate problem was solved that way, although I think there's still a good discussion to be had about metadata, tags and attributes and how we could use them better. That could easily be a different bug, though.