decision task opt decision task for cron job nightly-mochitest-valgrind cron(vg) broken on central

RESOLVED FIXED

Status

Taskcluster
General
a year ago
a year ago

People

(Reporter: Tomcat, Assigned: pmoore)

Attachments

(1 attachment)

(Reporter)

Description

a year ago
like https://treeherder.mozilla.org/logviewer.html#?job_id=75018174&repo=mozilla-central&lineNumber=779 

[task 2017-02-07T04:01:56.234330Z] HTTPError: 409 Client Error: Conflict for url: http://taskcluster/queue/v1/task/XYQVC7MnQA2wZSjl949hXg

Retriggering does not help. I guess the real error/conflict behind this message is:

[task 2017-02-07T04:01:56.227014Z] "PUT /queue/v1/task/XYQVC7MnQA2wZSjl949hXg HTTP/1.1" 409 10376
[task 2017-02-07T04:01:56.228066Z] Task group HDMoXSanTVCGzQB-ODJ-4g contains tasks with
[task 2017-02-07T04:01:56.228133Z] schedulerId gecko-level-3-cron. You are attempting
[task 2017-02-07T04:01:56.228208Z] to include tasks from schedulerId gecko-level-3,
[task 2017-02-07T04:01:56.228275Z] which is not permitted.
[task 2017-02-07T04:01:56.228347Z] All tasks in the same task-group must have the same schedulerId.
[task 2017-02-07T04:01:56.228515Z] ----
[task 2017-02-07T04:01:56.228573Z] errorCode:  RequestConflict
[task 2017-02-07T04:01:56.228601Z] statusCode: 409
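The 409 comes down to one rule quoted in the log: every task in a task group must carry the same schedulerId. Below is a minimal Python model of that rule. The `TaskGroupModel` class and the `SchedulerIdConflict` exception are hypothetical and only illustrate the check; this is not the queue service's actual implementation.

```python
class SchedulerIdConflict(Exception):
    """Stands in for the queue's 409 RequestConflict response."""


class TaskGroupModel:
    """Hypothetical model of the queue rule: one schedulerId per task group."""

    def __init__(self):
        # Maps taskGroupId -> schedulerId of the first task created in it.
        self._scheduler_by_group = {}

    def create_task(self, task_group_id, scheduler_id):
        # The first task in a group fixes that group's schedulerId.
        existing = self._scheduler_by_group.setdefault(task_group_id, scheduler_id)
        if existing != scheduler_id:
            raise SchedulerIdConflict(
                "Task group {} contains tasks with schedulerId {}. You are attempting "
                "to include tasks from schedulerId {}, which is not permitted.".format(
                    task_group_id, existing, scheduler_id))


queue = TaskGroupModel()
group = "HDMoXSanTVCGzQB-ODJ-4g"
queue.create_task(group, "gecko-level-3-cron")  # cron decision task: accepted
try:
    queue.create_task(group, "gecko-level-3")   # task it creates: rejected
except SchedulerIdConflict as err:
    print("409 RequestConflict:", err)
```

Retriggering replays the same pair of schedulerIds into the same kind of group, so in this model it fails the same way every time.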
(Assignee)

Comment 1

a year ago
I suspect https://hg.mozilla.org/mozilla-central/file/af8a2573d0f1/taskcluster/taskgraph/cron/decision.py#l95 should be changed to remove the -cron suffix from the schedulerId, or the cron tasks should be put in a dedicated task group.

The choice about whether one task group or two task groups should be used is probably mostly an aesthetic one.
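To make the two options concrete, here is a hypothetical sketch that assigns (taskGroupId, schedulerId) pairs under each remedy and checks the one-schedulerId-per-group rule. `plan_cron_tasks` and `groups_consistent` are illustrative names, not in-tree code. `nice_slug` imitates the slugid "nice" id format (a 22-character URL-safe base64 encoding of a v4 UUID with the top bit cleared); real code would more likely call the slugid package.

```python
import base64
import uuid


def nice_slug():
    """Sketch of a slugid-style "nice" id: URL-safe base64 of a random v4 UUID.

    Clearing the top bit of the first byte means the 22-character result
    always starts with [A-Za-f], never '-' or '_'.
    """
    raw = bytearray(uuid.uuid4().bytes)
    raw[0] &= 0x7F
    return base64.urlsafe_b64encode(bytes(raw)).decode("ascii").rstrip("=")


def plan_cron_tasks(level, dedicated_group):
    """Hypothetical: (taskGroupId, schedulerId) for the cron decision task and
    one task it creates, under either remedy suggested in Comment 1."""
    decision_group = nice_slug()
    if dedicated_group:
        # Remedy 2: keep "-cron" on the decision task, but put the tasks it
        # creates into a fresh task group of their own.
        return [(decision_group, "gecko-level-{}-cron".format(level)),
                (nice_slug(), "gecko-level-{}".format(level))]
    # Remedy 1: drop the "-cron" suffix so every task shares one schedulerId.
    scheduler_id = "gecko-level-{}".format(level)
    return [(decision_group, scheduler_id), (decision_group, scheduler_id)]


def groups_consistent(pairs):
    """True if no task group mixes schedulerIds (the rule the queue enforces)."""
    seen = {}
    return all(seen.setdefault(group, sid) == sid for group, sid in pairs)


for dedicated in (False, True):
    print("dedicated group:", dedicated,
          "consistent:", groups_consistent(plan_cron_tasks(3, dedicated)))
```

Both remedies satisfy the rule; the only difference is whether the cron tasks show up in one task group or two, which matches the point that the choice is mostly aesthetic.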

Looks like fallout from bug 1252948.
Flags: needinfo?(dustin)
See Also: → bug 1252948
(Assignee)

Comment 3

a year ago
Ah, it looks like these cron decision jobs are not scheduled on try; I see no jobs created there.
(Assignee)

Comment 4

a year ago
Well, I guess these are scheduled by cron, so that isn't surprising. Perhaps these tasks would be added to the try push later by the taskcluster-hooks service.

From https://bugzilla.mozilla.org/show_bug.cgi?id=1252948#c19 it looks like these valgrind tasks only run once per week. At a guess, though, all cron tasks are affected, not just the valgrind ones, since I expect every task added by taskcluster-hooks gets this new schedulerId.

Long story short: I haven't dived deeply into the code, but based on my superficial understanding, I'm guessing that https://hg.mozilla.org/try/rev/329b2478d8e8af9550a53015874127eef095fe3b will fix things. However, it could affect any roles that were set up with scopes containing the schedulerId, so some taskcluster roles might also need adjusting.

It's probably best for us to wait until dustin/Callek/kmoir, who know this stuff far better than me, get in. :-)
(Assignee)

Comment 5

a year ago
However, if this becomes tree-closing, I'm happy to work on rolling out the patch, looking for auth failures, and adjusting roles as necessary.
(Assignee)

Comment 6

a year ago
So it looks like this cron hook runs every 15 minutes and then, presumably based on the in-tree cron schedules, decides what to run (so jobs can be scheduled less frequently than every 15 minutes). When it runs, it uses the head of the default branch of mozilla-central.

https://tools.taskcluster.net/hooks/#project-releng/cron-task-mozilla-central
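The model described above (a fixed 15-minute tick, with the in-tree schedule deciding which jobs actually run on that tick) can be sketched as follows. The schedule format (dicts with optional `weekday`, `hour` and `minute` keys) and the function names are assumptions for illustration; the real in-tree cron configuration may differ.

```python
from datetime import datetime

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def current_slot(now):
    """Round `now` down to the start of its 15-minute slot."""
    return now.replace(minute=now.minute - now.minute % 15,
                       second=0, microsecond=0)


def should_run(schedule, now):
    """True if any schedule entry matches the 15-minute slot containing `now`.

    Each entry is a dict with optional "weekday", "hour" and "minute" keys
    (hypothetical format); a key that is absent matches any value.
    """
    slot = current_slot(now)
    fields = {"weekday": WEEKDAYS[slot.weekday()],
              "hour": slot.hour,
              "minute": slot.minute}
    return any(all(entry[key] == value
                   for key, value in fields.items() if key in entry)
               for entry in schedule)


# A hypothetical once-a-week job, checked on the tick at 04:01:56.
weekly = [{"weekday": "Tuesday", "hour": 4, "minute": 0}]
print(should_run(weekly, datetime(2017, 2, 7, 4, 1, 56)))  # True (a Tuesday)
print(should_run(weekly, datetime(2017, 2, 7, 4, 16)))     # False (next slot)
```

Rounding down to the slot start means a hook firing a little late (04:01:56 here) still matches the 04:00 entry.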
(Assignee)

Comment 7

a year ago
From that hook, we can see these are the scopes available to the cron task that runs:

https://tools.taskcluster.net/auth/roles/#hook-id:project-releng%252fcron-task-mozilla-central

So this task already has

queue:create-task:aws-provisioner-v1/gecko-1-*
queue:create-task:aws-provisioner-v1/gecko-2-*
queue:create-task:aws-provisioner-v1/gecko-3-*

which means that removing '-cron' from the schedulerId name (e.g. gecko-level-3-cron => gecko-level-3) shouldn't break anything, since the 'gecko-{level}-*' patterns still match the shortened scheduler name.
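The argument in Comment 7 relies on how Taskcluster matches scopes: a granted scope satisfies a required one if the two are identical, or if the granted scope ends in `*` and the required scope starts with everything before the `*`. A minimal sketch of that rule follows; the function name is illustrative and `gecko-3-b-linux` is only an example worker type name.

```python
def scope_satisfied(granted, required):
    """True if any scope in `granted` satisfies the `required` scope.

    A granted scope matches if it is identical to `required`, or if it ends
    in '*' and `required` starts with everything before the '*'.
    """
    for scope in granted:
        if scope == required:
            return True
        if scope.endswith("*") and required.startswith(scope[:-1]):
            return True
    return False


granted = [
    "queue:create-task:aws-provisioner-v1/gecko-1-*",
    "queue:create-task:aws-provisioner-v1/gecko-2-*",
    "queue:create-task:aws-provisioner-v1/gecko-3-*",
]
# "gecko-3-b-linux" is only an example worker type name.
print(scope_satisfied(granted, "queue:create-task:aws-provisioner-v1/gecko-3-b-linux"))  # True
```

Because `*` only acts as a suffix wildcard, checking whether a role still covers a renamed identifier comes down to a prefix comparison against each granted pattern.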
(Assignee)

Comment 8

a year ago
Created attachment 8834324 [details] [diff] [review]
bug1337300_gecko_v1.patch
Assignee: nobody → pmoore
Status: NEW → ASSIGNED
Attachment #8834324 - Flags: review?(rgarbas)
(Assignee)

Updated

a year ago
Flags: needinfo?(dustin)
Attachment #8834324 - Flags: review?(rgarbas) → review?(dustin)
:pmoore: I'm not sure I understand (yet) what this change may cause (I haven't played much with the in-tree stuff yet). But thank you for adding me; I'm following the discussion and hopefully learning something. :dustin: might be a better person to review it.
Comment on attachment 8834324 [details] [diff] [review]
bug1337300_gecko_v1.patch

Review of attachment 8834324 [details] [diff] [review]:
-----------------------------------------------------------------

Great minds think alike:
  https://hg.mozilla.org/integration/mozilla-inbound/rev/e74dc930625cea6f16fb5f9f9bb13f1431261521
so r+, but no need to land this since it's already landed.
Attachment #8834324 - Flags: review?(dustin) → review+
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
5 failures in 836 pushes (0.006 failures/push) were associated with this bug in the last 7 days.  
Repository breakdown:
* mozilla-central: 5

Platform breakdown:
* gecko-decision: 5

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1337300&startday=2017-02-06&endday=2017-02-12&tree=all