A lot of stylo jobs got enabled on more trees than just central, and SETA seems to let them run on every push. This could be because the 2-week grace period is kicking in. We should probably remove it or reduce it to a couple of days. Stylo jobs don't show up in either of these listings, which is why they are always being scheduled: https://treeherder.mozilla.org/api/project/mozilla-inbound/seta/job-priorities/?build_system_type=buildbot https://treeherder.mozilla.org/api/project/mozilla-inbound/seta/job-priorities/?build_system_type=taskcluster However, the jobs do show up as valid job types, e.g. [ "reftest-stylo-e10s-14", "opt", "macosx64-stylo" ], in https://treeherder.mozilla.org/api/project/mozilla-inbound/seta/job-types/ There's an expiration column somewhere that we can change to some value (going off memory here). Could someone remind me where I need to connect? If I'm right about this, we should add this information to the documentation, so that the next time something like this happens we know how to remedy it.
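The check described above (job types that are valid but missing from the job-priorities listing) could be sketched roughly as follows. This is only an illustration working on already-downloaded data: the function name `always_scheduled` is made up, the second job type is a hypothetical sample, and the assumption that the job-priorities listing can be reduced to the same [test, build type, platform] triples as job-types is mine, not something confirmed by the API.

```python
from typing import Iterable, List, Sequence, Tuple

# (test name, build type, platform), mirroring the job-types triple quoted above.
JobKey = Tuple[str, str, str]


def always_scheduled(
    job_types: Iterable[Sequence[str]],
    listed_priorities: Iterable[Sequence[str]],
) -> List[JobKey]:
    """Return job types that are absent from the job-priorities listing.

    Hypothetical helper: assumes both inputs are iterables of
    [test, build type, platform] triples.
    """
    listed = {tuple(job) for job in listed_priorities}
    return sorted({tuple(job) for job in job_types} - listed)


# Sample data. Only the stylo triple comes from the bug; the other
# entry is a made-up placeholder.
job_types = [
    ["reftest-stylo-e10s-14", "opt", "macosx64-stylo"],
    ["example-test-1", "opt", "example-platform"],
]
listed_priorities = [
    ["example-test-1", "opt", "example-platform"],
]

missing = always_scheduled(job_types, listed_priorities)
for job in missing:
    print("not in job-priorities, so always scheduled:", job)
```

In practice the two inputs would come from the job-types and job-priorities URLs above, after adapting the parsing to whatever shape those responses actually have.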
All Mac stylo jobs are marked as "high value" (priority=1) and their expiration date is set to the 13th of August. I've requested that they get updated. You can edit this query of Treeherder's job priorities to see for yourself: https://sql.telemetry.mozilla.org/queries/10649/source#table
Assignee: nobody → armenzg
Hi Joel, I would like to remove the 2-week grace period from SETA's code. We had to disable Mac stylo jobs in most places because they tipped over our Mac test capacity. My concern with the 2-week grace period is that it will bite us again. Getting into the same situation again is more troublesome than not having the grace period. OK with removing it?
I really don't like the idea of removing this. Without it, a brand new job that we enable would only be run periodically. Can we reduce it to 1 week? Can we special-case OS X stylo?
What is the worst that could happen if we did not have such a grace period? We find a regression on a change being considered for merge and need to wait for backfill results? Too many hours were wasted this week trying to understand what was going on and getting us out of the hole. We're still not out of it. Another solution would be to run new jobs *first* on a single repository for a few days (reducing the grace period to those N days) and use that as the reference. We currently use 'mozilla-inbound' as our reference repository and a 2-week grace period. This would be a procedure change and require human enforcement (maybe some code could be placed in-tree to enforce it).
iirc when we switched Android to running on emulators on AWS we pre-seeded the seta data with try runs on specific revisions so when we made the switch our AWS bill didn't spike dramatically.
I don't want to make a change because of one fire drill. This 2-week period has been in place for more than a year, and I would like to think carefully before getting rid of it. If we did get rid of it, any new job could be a perma-fail and we would have few or no data points to determine what is going on. Right now, in the self-serve model, any developer can add new jobs (and they do). Sheriffs don't have a clear picture of all the jobs to expect, and if a job is perma-failing or intermittently passing but runs only on every 5th push (which in practice is often skipped), it is easy to miss the pattern and assume each failure is unique. It would take a few days to get a signal that things are bad; yes, we could back out the patch then, so that would be the worst case. What usually happens is that we turn on a job and there is much confusion and randomization because people don't see the new job running. By default it will run on every 5th push. I would prefer to pre-seed the tests in SETA rather than turn off the 2-week period.
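To make the trade-off concrete, here is a minimal sketch of the scheduling behaviour as described in this bug: a job marked priority=1 runs on every push until its expiration date, and otherwise falls back to every 5th push. This is not SETA's actual code; the function `should_run`, its parameters, and the year used in the example dates are assumptions for illustration only.

```python
from datetime import date
from typing import Optional


def should_run(
    push_index: int,
    priority: int,
    expiration: Optional[date],
    today: date,
    skip_interval: int = 5,
) -> bool:
    """Decide whether a job is scheduled on a given push (illustrative only).

    priority == 1 means "high value". While such a job has not passed its
    expiration date (the grace period), it runs on every push. Everything
    else runs only on every `skip_interval`-th push.
    """
    expired = expiration is not None and today > expiration
    if priority == 1 and not expired:
        return True
    return push_index % skip_interval == 0


# The 13th of August comes from the bug; the year is a placeholder.
grace_end = date(2017, 8, 13)

# Inside the grace period: the new job runs on every push.
print(should_run(3, priority=1, expiration=grace_end, today=date(2017, 8, 1)))

# After the grace period: the same job only runs on every 5th push.
print(should_run(3, priority=1, expiration=grace_end, today=date(2017, 8, 20)))
print(should_run(5, priority=1, expiration=grace_end, today=date(2017, 8, 20)))
```

Removing the grace period corresponds to every new job starting out in the second branch, which is the "few or no data points" situation described above.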
I see your point there. I don't have a suggestion on how to make devs preseed SETA since it is so hands-off these days. In any case, this is done.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED