Closed
Bug 1289823
Opened 9 years ago
Closed 8 years ago
Add "backfill" support for TaskCluster jobs
Categories
(Firefox Build System :: Task Configuration, task)
Firefox Build System
Task Configuration
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: armenzg, Assigned: bstack)
References
Details
Attachments
(2 files)
We're soon adding SETA for TaskCluster and when we find an orange for test jobs we someitmes would like to request backfilling of that job across pushes.
For Buildbot, mozci can determine which pushes lack the job as well as if the builds for those tests have completed.
With TC is easier since I can know what build a test job depends on.
Anyways, once SETA is enabled we can look into adding this feature.
Comment 1•8 years ago
|
||
Will this be done in pulse_actions?
Reporter | ||
Comment 2•8 years ago
|
||
Yes
Summary: Add backfilling support for TaskCluster jobs → pulse_actions - Add "backfill" support for TaskCluster jobs
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → bstack
Reporter | ||
Comment 3•8 years ago
|
||
Removing pulse_actions from the bug summary since this might be implemented under taskcluster-treeherder.
Summary: pulse_actions - Add "backfill" support for TaskCluster jobs → Add "backfill" support for TaskCluster jobs
Comment hidden (mozreview-request) |
Assignee | ||
Comment 5•8 years ago
|
||
Reporter | ||
Comment 6•8 years ago
|
||
mozreview-review |
Comment on attachment 8815086 [details]
Bug 1289823 - Make taskcluster action-task more flexible
https://reviewboard.mozilla.org/r/96072/#review96476
Attachment #8815086 -
Flags: review?(armenzg) → review+
Comment 7•8 years ago
|
||
mozreview-review |
Comment on attachment 8815086 [details]
Bug 1289823 - Make taskcluster action-task more flexible
https://reviewboard.mozilla.org/r/96072/#review96480
Attachment #8815086 -
Flags: review?(dustin) → review+
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•8 years ago
|
Attachment #8817283 -
Flags: review?(dustin)
Attachment #8817283 -
Flags: review?(armenzg)
Reporter | ||
Comment 11•8 years ago
|
||
mozreview-review |
Comment on attachment 8817283 [details]
Bug 1289823 - Add backfilling as an action-task
https://reviewboard.mozilla.org/r/97634/#review98116
This is on a good path.
Could you please address the comments and re-flag for review?
::: taskcluster/mach_commands.py:184
(Diff revision 3)
> + required=True,
> + help="Decision Task ID of the reference decision task")
> + @CommandArgument('--task-labels',
> + required=True,
> + help='Comma separated list of task labels to be scheduled')
> + def taskgraph_add_tasks(self, **options):
The subcommand is called 'action-task' while the function is called '_add_tasks'.
The next command is 'add-tasks' while the function is '_action'.
Is this intentional?
::: taskcluster/taskgraph/action.py:22
(Diff revision 3)
> from .taskgraph import TaskGraph
>
> logger = logging.getLogger(__name__)
> TASKCLUSTER_QUEUE_URL = "https://queue.taskcluster.net/v1/task/"
> +TREEHERDER_URL = "https://treeherder.mozilla.org/api/"
> +MAX_BACKFILL_RESULTSETS = 5
Let's add a note that currently SETA sets the counter for every repository to the same value.
If this was to change we would need an API on Treeherder to let you know the correct value for each repository.
::: taskcluster/taskgraph/action.py:82
(Diff revision 3)
> + a successful job is found or `N` jobs have been scheduled.
> + """
> + s = requests.Session()
> + s.headers.update({"User-Agent": "gecko-intree-backfill-task"})
> +
> + job_url = TREEHERDER_URL + "project/" + project + "/jobs/" + job_id + "/"
I believe this is easier to read:
```py
job_url = "{}/path/{}".format(foo, bar)
```
::: taskcluster/taskgraph/action.py:86
(Diff revision 3)
> +
> + job_url = TREEHERDER_URL + "project/" + project + "/jobs/" + job_id + "/"
> + job = s.get(url=job_url).json()
> +
> + if job["build_system_type"] != "taskcluster":
> + raise ValueError("Invalid build system type! Must be a Taskcluster job.")
Can we get to this situation?
How can someone request a backfill of a non-taskcluster job and this backfill task be triggered?
I thought the pulse messages from Treeherder would only notify you of TaskCluster requests.
::: taskcluster/taskgraph/action.py:90
(Diff revision 3)
> + if job["build_system_type"] != "taskcluster":
> + raise ValueError("Invalid build system type! Must be a Taskcluster job.")
> +
> + platform = job["build_platform_id"]
> + platform_option = job["platform_option"]
> + job_type = job["job_type_id"]
You can probably turn this into:
```py
job_to_backfill = {
'build_platform_id': ...
...
}
```
This will make it so you can call load_decisions() with just passing `job_to_backfill` and make the comparison of is_job_type() be a comparison between two dictionaries.
::: taskcluster/taskgraph/action.py:93
(Diff revision 3)
> + platform = job["build_platform_id"]
> + platform_option = job["platform_option"]
> + job_type = job["job_type_id"]
> + task_label = job["job_type_name"]
> +
> + resultset_url = TREEHERDER_URL + "project/" + project + "/resultset/"
Same as above.
::: taskcluster/taskgraph/action.py:103
(Diff revision 3)
> + decisions = load_decisions(s, project, platform, platform_option, resultsets, job_type)
> +
> + [add_tasks(decision, [task_label], '{}-'.format(decision)) for decision in decisions]
> +
> +
> +def load_decisions(s, project, platform, platform_option, resultsets, job_type):
Could you please add a docstring describing what we're doing in here?
::: taskcluster/taskgraph/action.py:104
(Diff revision 3)
> +
> + [add_tasks(decision, [task_label], '{}-'.format(decision)) for decision in decisions]
> +
> +
> +def load_decisions(s, project, platform, platform_option, resultsets, job_type):
> + url = TREEHERDER_URL + "project/" + project + "/jobs/"
Please get rid of this variable since you don't need it for both loops.
::: taskcluster/taskgraph/action.py:107
(Diff revision 3)
> +
> +def load_decisions(s, project, platform, platform_option, resultsets, job_type):
> + url = TREEHERDER_URL + "project/" + project + "/jobs/"
> + decisions = []
> + for resultset in resultsets:
> + params = {"push_id": resultset, "count": 2000}
I think that we're getting to the place where we can't use the magical 2000 count.
IIRC it returns 2000 jobs for a push.
We will soon be hitting that limit if we don't already.
Could you please file a bug for TH team to have a way to simply say "give me all jobs for a push" rather than "give me X jobs for a push".
Not blocking on landing. Double the number for now.
If you've seen that number anywhere else please let me know! :D
::: taskcluster/taskgraph/action.py:111
(Diff revision 3)
> + for resultset in resultsets:
> + params = {"push_id": resultset, "count": 2000}
> + results = s.get(url=url, params=params).json()["results"]
> + filtered = [j for j in results if is_job_type(j, platform, job_type, platform_option)]
> + if len(filtered) > 1:
> + raise Exception("Too many jobs matched. Aborting.")
I believe this raises an exception if too many jobs are found for the job we want to backfill on one of the many pushes we're considering.
In such case, will we schedule anything?
::: taskcluster/taskgraph/action.py:124
(Diff revision 3)
> + details = s.get(url=TREEHERDER_URL + "jobdetail/", params=params).json()["results"]
> + inspect = [detail["url"] for detail in details if detail["value"] == "Inspect Task"][0]
> +
> + # Pull out the taskId from the URL e.g.
> + # oN1NErz_Rf2DZJ1hi7YVfA from tools.taskcluster.net/task-inspector/#oN1NErz_Rf2DZJ1hi7YVfA/
> + decision_ids.append(inspect.partition('#')[-1].rpartition('/')[0])
Could you please start a thread for the TH team?
I would like the jobdetail (or another API) to have the task ID of each TaskCluster job.
You should not have to do this.
Not a blocker issue.
Attachment #8817283 -
Flags: review?(armenzg) → review-
Comment 12•8 years ago
|
||
mozreview-review |
Comment on attachment 8817283 [details]
Bug 1289823 - Add backfilling as an action-task
https://reviewboard.mozilla.org/r/97634/#review98152
::: taskcluster/taskgraph/action.py:86
(Diff revision 3)
> +
> + job_url = TREEHERDER_URL + "project/" + project + "/jobs/" + job_id + "/"
> + job = s.get(url=job_url).json()
> +
> + if job["build_system_type"] != "taskcluster":
> + raise ValueError("Invalid build system type! Must be a Taskcluster job.")
This might be better as a short-circuit return -- no need to mark this action task as a failure in this csae..
::: taskcluster/taskgraph/action.py:100
(Diff revision 3)
> + results = s.get(url=resultset_url, params=params).json()["results"]
> + resultsets = [resultset["id"] for resultset in results]
> +
> + decisions = load_decisions(s, project, platform, platform_option, resultsets, job_type)
> +
> + [add_tasks(decision, [task_label], '{}-'.format(decision)) for decision in decisions]
I think the gecko style is to prefer explicit iteration (`for ..: ..`) when the side effect is more important than the resulting list.
Attachment #8817283 -
Flags: review?(dustin)
Comment 13•8 years ago
|
||
mozreview-review-reply |
Comment on attachment 8817283 [details]
Bug 1289823 - Add backfilling as an action-task
https://reviewboard.mozilla.org/r/97634/#review98116
> Let's add a note that currently SETA sets the counter for every repository to the same value.
>
> If this was to change we would need an API on Treeherder to let you know the correct value for each repository.
This is the maximum to backfill, right? In which case this value can be fixed, but must be at least as large as the maximum SETA counter.
> I think that we're getting to the place where we can't use the magical 2000 count.
> IIRC it returns 2000 jobs for a push.
> We will soon be hitting that limit if we don't already.
>
> Could you please file a bug for TH team to have a way to simply say "give me all jobs for a push" rather than "give me X jobs for a push".
>
> Not blocking on landing. Double the number for now.
> If you've seen that number anywhere else please let me know! :D
If there's not an established way to use a continuation token of some sort to iterate over these results, we should probably raise that as an issue for the TH folks. 2000 is a lot for one HTTP request. 4000 is a LOT. Even more will be very problematic..
Assignee | ||
Comment 14•8 years ago
|
||
mozreview-review |
Comment on attachment 8817283 [details]
Bug 1289823 - Add backfilling as an action-task
https://reviewboard.mozilla.org/r/97634/#review98416
::: taskcluster/taskgraph/action.py:86
(Diff revision 3)
> +
> + job_url = TREEHERDER_URL + "project/" + project + "/jobs/" + job_id + "/"
> + job = s.get(url=job_url).json()
> +
> + if job["build_system_type"] != "taskcluster":
> + raise ValueError("Invalid build system type! Must be a Taskcluster job.")
This is partially here just as a sanity check in case it gets triggered for a non-tc job. I can remove it if we think it is unnecessary, but I can see situations where I might mess up the service somehow and this would get triggered for some other sort of job.
I can see how this is kinda like a ```a = 2 + 2; assert(a == 4)```, but hopefully not that bad.
::: taskcluster/taskgraph/action.py:104
(Diff revision 3)
> +
> + [add_tasks(decision, [task_label], '{}-'.format(decision)) for decision in decisions]
> +
> +
> +def load_decisions(s, project, platform, platform_option, resultsets, job_type):
> + url = TREEHERDER_URL + "project/" + project + "/jobs/"
I've fixed this by naming it more nicely and pulling the other url up to the top as well. It flows a bit more nicely this way I think.
::: taskcluster/taskgraph/action.py:107
(Diff revision 3)
> +
> +def load_decisions(s, project, platform, platform_option, resultsets, job_type):
> + url = TREEHERDER_URL + "project/" + project + "/jobs/"
> + decisions = []
> + for resultset in resultsets:
> + params = {"push_id": resultset, "count": 2000}
I just saw that number in Treeherder UI itself and cargo-culted it here. The MAX_JOBS_COUNT is 2000 afaict, but it does appear to have "offset" available, so we can iterate through.
::: taskcluster/taskgraph/action.py:111
(Diff revision 3)
> + for resultset in resultsets:
> + params = {"push_id": resultset, "count": 2000}
> + results = s.get(url=url, params=params).json()["results"]
> + filtered = [j for j in results if is_job_type(j, platform, job_type, platform_option)]
> + if len(filtered) > 1:
> + raise Exception("Too many jobs matched. Aborting.")
I had this in here to assert that my understanding of the system was accurate. If I had too few filtered attributes, this check would fail. If we know that the amount of attributes we're filtering on is correct, then we can remove this.
::: taskcluster/taskgraph/action.py:124
(Diff revision 3)
> + details = s.get(url=TREEHERDER_URL + "jobdetail/", params=params).json()["results"]
> + inspect = [detail["url"] for detail in details if detail["value"] == "Inspect Task"][0]
> +
> + # Pull out the taskId from the URL e.g.
> + # oN1NErz_Rf2DZJ1hi7YVfA from tools.taskcluster.net/task-inspector/#oN1NErz_Rf2DZJ1hi7YVfA/
> + decision_ids.append(inspect.partition('#')[-1].rpartition('/')[0])
Created https://bugzilla.mozilla.org/show_bug.cgi?id=1323110
Comment hidden (mozreview-request) |
Reporter | ||
Comment 16•8 years ago
|
||
mozreview-review |
Comment on attachment 8817283 [details]
Bug 1289823 - Add backfilling as an action-task
https://reviewboard.mozilla.org/r/97634/#review98574
LGTM
Attachment #8817283 -
Flags: review?(armenzg) → review+
Reporter | ||
Comment 17•8 years ago
|
||
mozreview-review-reply |
Comment on attachment 8817283 [details]
Bug 1289823 - Add backfilling as an action-task
https://reviewboard.mozilla.org/r/97634/#review98416
> I had this in here to assert that my understanding of the system was accurate. If I had too few filtered attributes, this check would fail. If we know that the amount of attributes we're filtering on is correct, then we can remove this.
This is OK. I don't know how to clear the issue on mozreview. Ignore that is still active.
Comment 18•8 years ago
|
||
mozreview-review |
Comment on attachment 8817283 [details]
Bug 1289823 - Add backfilling as an action-task
https://reviewboard.mozilla.org/r/97634/#review98592
Attachment #8817283 -
Flags: review?(dustin) → review+
Assignee | ||
Updated•8 years ago
|
Status: NEW → ASSIGNED
Keywords: checkin-needed
Assignee | ||
Updated•8 years ago
|
Keywords: checkin-needed
Assignee | ||
Updated•8 years ago
|
Keywords: checkin-needed
Comment 19•8 years ago
|
||
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/1bdd821a52e2
Make taskcluster action-task more flexible r=armenzg,dustin
https://hg.mozilla.org/integration/autoland/rev/5d8a7572ef59
Add backfilling as an action-task r=armenzg,dustin
Keywords: checkin-needed
Comment 20•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/1bdd821a52e2
https://hg.mozilla.org/mozilla-central/rev/5d8a7572ef59
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 21•8 years ago
|
||
I see a pulse_actions PR so I assume we're not yet done.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 22•8 years ago
|
||
I've merged and deployed the latest pulse_actions' PR:
https://github.com/mozilla/pulse_actions/pull/121
Assignee | ||
Comment 23•8 years ago
|
||
Here's the current state of this:
* Treeherder uses tc login, but doesn't store the credentials needed to access tc services directly with a logged-in user's client. This means that until this is fixed (bug 1273096), we need to continue to use pulse_actions to trigger these tasks. Even once that bug is fixed, a decision must be made on bug 1278986.
* pulse_actions currently only has tc scopes to add jobs to try pushes. Pending bug 1284989, we won't be able to enable this for backfilling for any of the repos where backfilling is actually useful. The scopes in question are particularly ["queue:route:tc-treeherder.v2.autoland.*", "queue:route:tc-treeherder-stage.v2.autoland.*"].
* I temporarily gave pulse_actions those scopes to see what would happen. https://tools.taskcluster.net/task-inspector/#USPq6YayRhW0JgnpacFBrw/0 is the result. As you can see, it failed because the action task itself assumes it will only have access to level-1 caches. I assume some discussion about this will also happen in a risk assesment.
Status: REOPENED → ASSIGNED
Assignee | ||
Comment 24•8 years ago
|
||
Backfilling should be working now. Tested and shown to work on this autoland range: https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=linux%2064%20pgo%20tc%20mochitest&fromchange=487f977ef67d5ab211dd3fa5ca565948ededef4e&tochange=ff9a07a874fd1ef273f23aa26cac61a95ba0f342
bc5 tests were backfilled from ff9a07a874fd and added to the previous 5 jobs. Marking this resolved for now. Please reopen if this doesn't work!
Status: ASSIGNED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Updated•8 years ago
|
Component: Treeherder → Task Configuration
Product: Tree Management → Taskcluster
Version: --- → unspecified
Updated•7 years ago
|
Product: TaskCluster → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•