Closed Bug 1364421 Opened 7 years ago Closed 7 years ago

unable to backfill or add new jobs for buildbot bridge job (linux64 talos, OSX *)

Categories

(Taskcluster :: General, enhancement)

Type: enhancement
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jmaher, Assigned: bstack)

Details

Attachments

(5 files)

I have been trying for the last hour to 'add new jobs' and 'backfill' for some missing talos data on:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&group_state=expanded&filter-searchStr=linux%20talos%20e10s&tochange=0515ebda07af3263ab124ba1a6eabe212e9e1b89&fromchange=a5e5a6e086f8689b1a481af2393a52deeca25e27

This is frustrating, as performance is a priority item. I would like to request that we close the trees until this is fixed.
:garndt, this looks to be a Taskcluster issue. Can you get someone on the Taskcluster team to look into this?
Flags: needinfo?(garndt)
Severity: normal → blocker
As a note, I can 'backfill' and 'add new jobs' for Windows talos tests, so this looks to be exclusively related to Taskcluster.
Summary: unable to backfill or add new jobs for linux64 talos → unable to backfill or add new jobs for buildbot bridge job (linux64 talos, OSX *)
So far this seems to be limited to buildbot-bridge jobs.

For the OS X and Linux jobs that were requested to be backfilled (both of which are BBB jobs), these errors appear within pulse_actions:
https://tc-gp-public-31d.s3-us-west-2.amazonaws.com/ateam/pulse-action-dev/9ea6fed3-63ab-402e-9e8b-1e9679e7d73d

Backfilling may be having trouble tracing a job back to its builder.

Investigation is ongoing and involves looking into: https://github.com/mozilla/mozilla_ci_tools/blob/master/mozci/platforms.py#L172
So far the investigation points to builder schedulers no longer being defined for OS X and Linux, because those jobs either run entirely in TC (Linux) or are scheduled via BBB (OS X).

Linux has been this way for quite some time, and OS X was changed on May 4th.

There are some candidate changes in mozci and pulse_actions, but nothing is 100% certain yet.
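
A rough sketch of the suspected failure mode, for illustration only. It assumes backfilling resolves a test job to its upstream builder through a scheduler table; the table, the builder names, and `find_upstream_builder` are all hypothetical and are not mozci's actual API. Jobs that run entirely in TC or are scheduled via BBB have no entry, so the lookup comes back empty:

```python
from typing import Dict, Optional

# Hypothetical scheduler table mapping a test builder to the build that feeds it.
# Only platforms still scheduled directly by buildbot have entries.
SCHEDULER_TABLE: Dict[str, str] = {
    "windows-talos": "windows-build",
}


def find_upstream_builder(test_builder: str,
                          table: Dict[str, str] = SCHEDULER_TABLE) -> Optional[str]:
    """Return the upstream builder for a test job, or None if no scheduler defines it."""
    return table.get(test_builder)


def can_backfill(test_builder: str) -> bool:
    """In this sketch, a job can be backfilled only if it traces back to a builder."""
    return find_upstream_builder(test_builder) is not None
```

Under these assumptions `can_backfill("windows-talos")` succeeds while a Linux or OS X talos job fails, which would match what is being seen.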
Flags: needinfo?(garndt)
That would seem to suggest that this is not a new issue, so can we get trees reopened while it's being fixed?
I would suggest turning SETA off and opening the trees until this is fixed.  It will greatly increase our load, but allow us to not depend on backfilling or adding arbitrary jobs.
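
To illustrate why reduced scheduling leans on backfilling, here is a minimal sketch. It assumes that with SETA on, a given job runs on only some pushes, and that backfilling means triggering that job on the earlier pushes that skipped it, back to the last push that has a result. `pushes_to_backfill` and its inputs are hypothetical, not Treeherder or pulse_actions code:

```python
from typing import List


def pushes_to_backfill(ran: List[bool], target: int) -> List[int]:
    """Return indices of earlier pushes that are missing the job, newest first.

    ran[i] says whether the job ran on push i (oldest push first);
    target is the index of the push where a change in results was noticed.
    """
    if not 0 <= target < len(ran):
        raise IndexError("target push out of range")
    missing = []
    # Walk backwards from the push just before the target.
    for i in range(target - 1, -1, -1):
        if ran[i]:
            break  # reached the last push that already has data
        missing.append(i)
    return missing
```

If every job runs on every push, the returned list is always empty, which is why turning SETA off removes the dependency on backfilling at the cost of extra load.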
Attachment #8867306 - Flags: review?(bstack)
Keywords: leave-open
Attachment #8867306 - Flags: review?(bstack) → review+
Assignee: nobody → bstack
Status: NEW → ASSIGNED
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0f8c033cd3e9
temporarily disable SETA. r=bstack, a=CLOSED TREE
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fec1331f50b8
temporarily disable SETA for BBB only. r=bstack, a=CLOSED TREE
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/mozilla-central/rev/73b3fc64525b
actually disable SETA, instead of never running talos; r=bstack a=infra-fix
Comment on attachment 8869544 [details]
Bug 1364421 - Allow BBB jobs to be backfilled

https://reviewboard.mozilla.org/r/141128/#review144710
Attachment #8869544 - Flags: review?(garndt) → review+
Attachment #8869565 - Flags: review?(cdawson)
Attachment #8869565 - Flags: review?(cdawson) → review+
Pushed by kwierso@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/1cd72f93f155
Allow BBB jobs to be backfilled r=garndt
So, once the gecko patch merges to mozilla-central (sometime over the weekend or monday morning, most likely) and the Treeherder patch gets deployed to production (probably sometime on Monday), I think something can land to re-enable SETA for the affected jobs.

Another patch to allow BBB jobs to be triggered via the "Add New Jobs" feature in Treeherder would probably be good at some point, too. Jmaher can probably speak to whether backfilling alone is sufficient for reenabling SETA.
Flags: needinfo?(jmaher)
We can work around 'add new jobs' by using backfilling, so on Monday I will get this enabled!
Flags: needinfo?(jmaher)
Can you verify that backfilling works from Treeherder stage? You'll need to be on a branch/push with that gecko commit on it (and probably need to have had it landed a few pushes earlier).
I don't think we ever made backfilling work on stage. There's another patch for it floating around that I can try to land again. I broke everything the last time I tried to land it though. Might be bitrotted so I can try to make it work on Monday.
Attachment #8870127 - Flags: review?(cdawson)
Attachment #8870127 - Flags: review?(cdawson) → review+
should we go forward and enable SETA again?  I am not sure if all the pieces we know about are landed and fully deployed.
AFAICT we should be good to re-enable SETA. I believe the backfill patch for Treeherder is in production and the add-new-jobs one has landed in master. As far as the in-tree patch is concerned, it is done once it has been merged to all of the branches.

* This all assumes that there are no new bugs introduced by the changes, of course.

But I think the next step would be to have somebody with the permissions to backfill these jobs try one out in the real world, and then turn SETA back on if it works!
(In reply to Joel Maher ( :jmaher) from comment #28)
> should we go forward and enable SETA again?  I am not sure if all the pieces
> we know about are landed and fully deployed.

SETA could be enabled again I think.  We've done about as much testing as I think we could at this point.  SETA/backfilling is pretty hard to test in a non-live environment.
Flags: needinfo?(jmaher)
Flags: needinfo?(jmaher)
Attachment #8871792 - Flags: review?(dustin)
Attachment #8871792 - Flags: review?(dustin) → review+
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open