Closed Bug 1523848 Opened Last year Closed 4 months ago

JobModel.getList() doesn't include the "taskcluster_metadata" property, unlike JobModel.get()

Categories

(Tree Management :: Treeherder: API, enhancement, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: KWierso, Assigned: camd)

References

(Blocks 1 open bug)

Details

Bug 1519599 changed the way retriggering jobs works in Treeherder.

Previously, it would take the list of job IDs to be retriggered and loop through them, converting each one into a JobModel job via JobModel.get(repoName, id), then looking up the action tasks available for that job via TaskclusterModel.load(decisionTaskId, job), then calling the 'retrigger' action for it. If a lot of jobs were requested, this spawned a lot of action tasks, and it also took a long time because each job's information was fetched with its own get() call.

After bug 1519599, we take the list of job IDs to be retriggered and use JobModel.getList() to fetch the job info for all of the requested jobs at once. We then split the list of JobModel jobs by the push that contains them, call TaskclusterModel.load(decisionTaskId) to fetch the per-push actions, and use the 'add-new-jobs' action to submit the jobs to be re-run. This cut down the turnaround time for fetching a large list of jobs, since it's one request to the API for all of them, and it only creates one action task per push involved in the request.
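The per-push split described above can be sketched roughly as follows. This is only an illustration in Python, not the actual UI code; the function name is hypothetical, and only the first id and the push_id 427838 come from the sample job in this bug.

```python
from collections import defaultdict


def group_jobs_by_push(jobs):
    """Return {push_id: [job, ...]}, keeping jobs in the order they were given."""
    by_push = defaultdict(list)
    for job in jobs:
        by_push[job["push_id"]].append(job)
    return dict(by_push)


# Illustrative data: the second and third jobs are made up.
jobs = [
    {"id": 224608906, "push_id": 427838},
    {"id": 1, "push_id": 427838},
    {"id": 2, "push_id": 500000},
]
grouped = group_jobs_by_push(jobs)

# One action task would then be submitted per key of `grouped`.
for push_id, push_jobs in grouped.items():
    print(push_id, [job["id"] for job in push_jobs])
```

With this shape, the number of action tasks depends on the number of pushes involved rather than the number of jobs requested.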

That's a good improvement, but I want to make retriggering even better in bug 1521032 (and the overlapping/complementary bug 1510002) by batching up retrigger requests for the same job. That means going back to the 'retrigger' action, because it has a 'times' value you can set to request multiple copies of the job with a single request. For example, select a job in Treeherder and press the 'r' shortcut X times: ideally, this will run all X copies of the selected job with a single action task spawning them. Less ideally, all X copies will be run with X divided by https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/actions/retrigger.py#78 action tasks spawning them.
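A rough sketch of the batching idea, in Python for illustration only. The function name and the max_times parameter are hypothetical; the cap of 5 used below is a placeholder and is not taken from retrigger.py.

```python
from collections import Counter


def batch_retriggers(requested_job_ids, max_times):
    """Collapse repeated job ids into (job_id, times) requests with times <= max_times."""
    requests = []
    for job_id, count in Counter(requested_job_ids).items():
        while count > 0:
            times = min(count, max_times)
            requests.append((job_id, times))
            count -= times
    return requests


# Pressing 'r' seven times on one job, with a placeholder cap of 5 per request.
requests = batch_retriggers([224608906] * 7, max_times=5)
print(requests)
```

If the requested count fits under the cap, this produces a single request per job; otherwise it produces the requested count divided by the cap, rounded up.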

I'm most of the way there, but I can't find the 'retrigger' action anymore for jobs in the TaskclusterModel.

I'd like to keep using getList() to fetch all of the job information at once, but the "taskcluster_metadata" property doesn't seem to be included on the JobModel jobs it returns. As a result, we don't meet this condition: https://github.com/mozilla/treeherder/blob/6eeff4bafe3f87829304d36926ff4ddfd943235c/ui/models/taskcluster.js#L91 which I guess makes us fall back to the per-push actions?

Job info fetched via getList():

{
  "job_type_symbol": "wpt5",
  "reason": "scheduled",
  "job_group_description": "",
  "result_set_id": 427838,
  "tier": 1,
  "machine_platform_os": "-",
  "failure_classification_id": 1,
  "option_collection_hash": "32faaecac742100f7753f0c1d0aa0add01b4046b",
  "build_platform_id": 121,
  "last_modified": "2019-01-29T01:57:36.305553",
  "job_group_id": 461,
  "start_timestamp": 1548726704,
  "ref_data_name": "04fb0806d8272f0852eae30d32e9e92f198c2ca5",
  "build_architecture": "-",
  "who": "wkocher@mozilla.com",
  "result": "testfailed",
  "job_type_id": 40256,
  "id": 224608906,
  "machine_platform_architecture": "-",
  "end_timestamp": 1548727042,
  "push_id": 427838,
  "job_group_symbol": "W",
  "signature": "04fb0806d8272f0852eae30d32e9e92f198c2ca5",
  "job_group_name": "Web platform tests",
  "submit_timestamp": 1548726262,
  "build_system_type": "taskcluster",
  "machine_name": "i-098bbb6d8ba6bdec5",
  "job_guid": "5600fa65-f998-40d8-b81e-c999f0783e6b/0",
  "platform_option": "debug",
  "job_type_description": "",
  "platform": "linux32",
  "state": "completed",
  "build_os": "-",
  "build_platform": "linux32",
  "job_type_name": "test-linux32/debug-web-platform-tests-5"
}

Job info fetched via get():

{
  "push_id": 427838,
  "job_type_description": "",
  "reason": "scheduled",
  "job_type_id": 40256,
  "build_platform": "linux32",
  "build_platform_id": 121,
  "failure_classification_id": 1,
  "id": 224608906,
  "result": "testfailed",
  "job_group_name": "Web platform tests",
  "ref_data_name": "04fb0806d8272f0852eae30d32e9e92f198c2ca5",
  "option_collection_hash": "32faaecac742100f7753f0c1d0aa0add01b4046b",
  "result_set_id": 427838,
  "job_group_symbol": "W",
  "job_guid": "5600fa65-f998-40d8-b81e-c999f0783e6b/0",
  "machine_platform_architecture": "-",
  "job_group_description": "",
  "submit_timestamp": 1548726262,
  "start_timestamp": 1548726704,
  "machine_platform_os": "-",
  "signature": "04fb0806d8272f0852eae30d32e9e92f198c2ca5",
  "machine_name": "i-098bbb6d8ba6bdec5",
  "tier": 1,
  "last_modified": "2019-01-29T01:57:36.305553",
  "build_os": "-",
  "build_architecture": "-",
  "job_type_name": "test-linux32/debug-web-platform-tests-5",
  "state": "completed",
  "platform": "linux32",
  "who": "wkocher@mozilla.com",
  "end_timestamp": 1548727042,
  "job_type_symbol": "wpt5",
  "job_group_id": 461,
  "build_system_type": "taskcluster",
  "resource_uri": "/api/project/try/jobs/224608906/",
  "logs": [
    {
      "name": "builds-4h",
      "url": "https://queue.taskcluster.net/v1/task/VgD6ZfmYQNi4HsmZ8Hg-aw/runs/0/artifacts/public/logs/live_backing.log"
    },
    {
      "name": "errorsummary_json",
      "url": "https://queue.taskcluster.net/v1/task/VgD6ZfmYQNi4HsmZ8Hg-aw/runs/0/artifacts/public/test_info//wpt_errorsummary.log"
    }
  ],
  "platform_option": "debug",
  "taskcluster_metadata": {
    "task_id": "VgD6ZfmYQNi4HsmZ8Hg-aw",
    "retry_id": 0
  },
  "autoclassify_status": "skipped"
}

Is there any way to have getList() jobs include the taskcluster_metadata? Or some other way for me to get the 'retrigger' action to be listed?
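One observation from the sample data above: the job_guid looks like the UUID form of the task_id, followed by "/" and the retry_id. The sketch below (Python, for illustration) re-encodes the UUID bytes as unpadded URL-safe base64. Whether this relationship holds for every job is an assumption that this bug does not confirm, and the function name is hypothetical.

```python
import base64
import uuid


def task_from_job_guid(job_guid):
    """Split '<uuid>/<retry>' and re-encode the UUID as unpadded URL-safe base64."""
    uuid_part, _, retry_part = job_guid.partition("/")
    raw = uuid.UUID(uuid_part).bytes
    task_id = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return {"task_id": task_id, "retry_id": int(retry_part)}


# job_guid copied from the getList() sample above.
metadata = task_from_job_guid("5600fa65-f998-40d8-b81e-c999f0783e6b/0")
print(metadata)
```

For the sample job, this gives the same task_id and retry_id that get() reports under "taskcluster_metadata".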

Flags: needinfo?(emorley)

Hi!

The single job detail response is generated here:
https://github.com/mozilla/treeherder/blob/660200ecbc49c2ae79977e9ad486aac64d249f07/treeherder/webapp/api/jobs.py#L198-L236

In that block are a few lines that add the taskcluster metadata.

In comparison the job list response is generated here:
https://github.com/mozilla/treeherder/blob/660200ecbc49c2ae79977e9ad486aac64d249f07/treeherder/webapp/api/jobs.py#L238-L304

Flags: needinfo?(emorley)

At first glance I think adjusting the select_related should work - I would try it out locally and see what happens :-) (and add a test)

See Also: → 1496858
Priority: -- → P3

So I adjusted the select_related, but that didn't seem to change the results.

I added a 'taskcluster_metadata' entry to https://github.com/mozilla/treeherder/blob/660200ecbc49c2ae79977e9ad486aac64d249f07/treeherder/webapp/api/jobs.py#L117-L152 and now I see 'taskcluster_metadata' in the results, but its value always seems to match the 'id' value rather than containing the task_id and retry_id.

Any ideas?

Flags: needinfo?(emorley)

Ordinarily one would modify the serializer being used to add the nested model metadata. However, the jobs endpoint uses a custom approach to serialization [1], so it will need to be changed a little differently. I think Cameron wrote the current handling, so he might know the best way to change it. (I'm also pretty busy wrapping up the last few things before I leave, so I'm tight for time at the moment.)

[1] It skips the duplicate key names in the response, I believe to try to improve performance, though it might be worth benchmarking against a standard serializer approach at some point to see if it's still worth the added complexity.
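For readers unfamiliar with that style of response, here is a minimal sketch of the general idea in [1]: key names are sent once, and each job becomes a plain list of values. The function names and the "property_names"/"results" keys are hypothetical and are not taken from Treeherder's code.

```python
def to_compact(rows, property_names):
    """Emit key names once, then one list of values per row (hypothetical format)."""
    return {
        "property_names": list(property_names),
        "results": [[row.get(name) for name in property_names] for row in rows],
    }


def from_compact(payload):
    """Rebuild one dict per row from the compact payload."""
    names = payload["property_names"]
    return [dict(zip(names, values)) for values in payload["results"]]


rows = [
    {"id": 224608906, "push_id": 427838, "state": "completed"},
    {"id": 1, "push_id": 427838, "state": "running"},  # made-up second job
]
payload = to_compact(rows, ["id", "push_id"])
restored = from_compact(payload)
print(payload)
```

In a scheme like this, a property that is not in the name list ("state" here) never reaches the client, so any new field has to be added to that list explicitly.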

Flags: needinfo?(emorley)

I'm going to fix this in my next PR. I had to add it to the "select_related" and also to the output fields.
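As a rough illustration of what "adding it to the output fields" can mean, the sketch below maps Django-style "a__b" lookup paths onto flat output keys using plain dicts. It is not the actual change from the PR; the mapping, the function names, and the flat "task_id"/"retry_id" output keys are all assumptions.

```python
def resolve(obj, lookup):
    """Follow an 'a__b' style lookup through nested dicts; None if any hop is missing."""
    for part in lookup.split("__"):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


# Hypothetical mapping of lookup path -> output key.
OUTPUT_FIELDS = {
    "id": "id",
    "push_id": "push_id",
    "taskcluster_metadata__task_id": "task_id",
    "taskcluster_metadata__retry_id": "retry_id",
}


def serialize(job):
    """Flatten one job using only the lookups listed in OUTPUT_FIELDS."""
    return {out: resolve(job, lookup) for lookup, out in OUTPUT_FIELDS.items()}


# Values copied from the get() sample earlier in this bug.
job = {
    "id": 224608906,
    "push_id": 427838,
    "taskcluster_metadata": {"task_id": "VgD6ZfmYQNi4HsmZ8Hg-aw", "retry_id": 0},
}
row = serialize(job)
print(row)
```

In this sketch the related data has to be both available on the job and named in the field mapping before it shows up in the output.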

Assignee: nobody → cdawson
Status: NEW → ASSIGNED
Priority: P3 → P2
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED