Prepare credentials for backfill bot
Categories: Tree Management :: Perfherder, task, P3
Tracking: Not tracked
People: Reporter: igoldan, Assigned: mtabara
Attachments: 1 file, 1 obsolete file
The bot we plan to define in bug 1612547 will definitely require Taskcluster credentials; otherwise it won't be able to request any backfills for our perf jobs.
At the moment, we're able to simulate this ability using the clientId & access token from our own Taskcluster accounts. But we agree this is not ideal.
We need to create dedicated credentials for the bot. Currently, I'm not sure which of these two approaches is best:
- should we require one of our current Taskcluster accounts to be granted the ability to create client credentials? (I'm referring to accounts with Perf sheriff-specific scopes)
- or should we require a standalone client credential (via a Bugzilla bug maybe)?
Reporter
Comment 1•4 years ago
Dustin, could I grab an answer from you on this?
Comment 2•4 years ago
This is a better question for release engineering. I suspect your first approach is closer to the desired process.
Comment 4•4 years ago
(In reply to Ionuț Goldan [:igoldan] from comment #3)
Or maybe Jordan could also?
Ionuț Goldan: could you provide some context on what you are currently doing vs. what you intend to do with the bot? What will automatically be rerun? I'd like to get a sense of the net change in CI load this bot would bring. Tom can help you with creds/clients.
Reporter
Updated•4 years ago
Reporter
Comment 5•4 years ago
Greg, once you sync with Jordan, you can reassign this to me.
Reporter
Comment 7•4 years ago
The automatic backfill algorithm is as follows.
For every performance alert summary (i.e.), we cherry-pick at most 5 alerts (even if that summary has 50 of them!).
Then, we generate a backfill record for each of the cherry-picked alerts.
A backfill record basically stores at most 5 perf jobs, which span the suspect range in which a perf change (regression or improvement) happened.
These jobs are used as guidelines for performing the actual backfill. We'll backfill between them.
E.g. for 5 jobs, we'll do 4 backfills. For 3 jobs, we'll do 2 backfills.
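A minimal sketch of the record/backfill relationship described above, assuming a summary is given as a mapping from alert id to its ordered suspect-range job ids. `BackfillRecord`, `build_records` and the two limit constants are hypothetical names, not Perfherder's actual code, and "cherry-picking" is simplified here to taking the first alerts in order:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical limits mirroring the numbers in the comment above.
MAX_ALERTS_PER_SUMMARY = 5
MAX_JOBS_PER_RECORD = 5


@dataclass
class BackfillRecord:
    """Hypothetical stand-in for a backfill record: the alert it was built
    for and up to MAX_JOBS_PER_RECORD perf job ids spanning the suspect range."""
    alert_id: int
    job_ids: List[int]

    def backfill_ranges(self) -> List[Tuple[int, int]]:
        # One backfill between each pair of consecutive jobs,
        # so n jobs yield n - 1 backfills.
        return list(zip(self.job_ids, self.job_ids[1:]))


def build_records(alert_jobs: Dict[int, List[int]]) -> List[BackfillRecord]:
    """alert_jobs maps alert id -> ordered perf job ids around that alert."""
    # Keep at most MAX_ALERTS_PER_SUMMARY alerts, even if the summary has more.
    picked = list(alert_jobs.items())[:MAX_ALERTS_PER_SUMMARY]
    return [
        BackfillRecord(alert_id, jobs[:MAX_JOBS_PER_RECORD])
        for alert_id, jobs in picked
    ]
```

For a summary with 50 alerts, only 5 records are produced, and a record with 5 jobs yields 4 backfill ranges (3 jobs yield 2).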
Below are some rough cost estimates, in terms of tasks triggered by the automated backfills.
- last week had 200 records = 200 (per week)
- last 2 weeks had 340 records = 170 (per week)
- last month had 660 records = 165 (per week)
- last quarter had 1,500 records = 125 (per week)
- average of the above = 165 (per week)
- each record would amount to ~4 backfills
- each backfill would amount to 1 decision task + at most 9 build tasks + 9 perf tasks
During a week, that would mean on average:
WEEKLY_TASK_AVERAGE = 165 records * 4 backfills * (1 decision task + 9 build tasks + 9 perf tasks)
Ignoring the decision tasks, since they execute quickly, gives us:
WEEKLY_TASK_AVERAGE = 660 * 9 build tasks + 660 * 9 perf tasks
= 5,940 build tasks + 5,940 perf tasks
- each build task takes ~30 minutes to finish (though I believe Jordan or someone from the build team has a better estimate for this)
- each perf task takes ~20 minutes to finish
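The estimate can be reproduced as a short script, using only the figures from this comment: 165 records × 4 backfills × 9 tasks = 5,940 tasks of each kind per week. The task-hours figure is an extra derived number (an assumption of this sketch), summing serial task runtime and ignoring parallelism. Per comments 8 and 14, builds usually already exist, so the build figures are an upper bound.

```python
# (records observed, weeks covered), taken from the comment above.
OBSERVED = {
    "last week": (200, 1),
    "last 2 weeks": (340, 2),
    "last month": (660, 4),
    "last quarter": (1500, 12),
}

BACKFILLS_PER_RECORD = 4      # ~4 backfills per record (5 jobs -> 4 gaps)
BUILD_TASKS_PER_BACKFILL = 9
PERF_TASKS_PER_BACKFILL = 9
BUILD_MINUTES = 30            # rough estimate from the comment
PERF_MINUTES = 20             # rough estimate from the comment


def weekly_records() -> float:
    """Average of the per-week rates over the four observation windows."""
    rates = [records / weeks for records, weeks in OBSERVED.values()]
    return sum(rates) / len(rates)


def weekly_estimate():
    """Return (build tasks, perf tasks, task-hours) per week.

    Decision tasks are ignored, as in the comment, since they finish quickly.
    """
    backfills = weekly_records() * BACKFILLS_PER_RECORD
    builds = backfills * BUILD_TASKS_PER_BACKFILL
    perfs = backfills * PERF_TASKS_PER_BACKFILL
    task_hours = (builds * BUILD_MINUTES + perfs * PERF_MINUTES) / 60
    return builds, perfs, task_hours


if __name__ == "__main__":
    builds, perfs, hours = weekly_estimate()
    print(f"{builds:,.0f} build tasks, {perfs:,.0f} perf tasks, ~{hours:,.0f} task-hours per week")
```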
Comment 8•4 years ago
Currently builds are guaranteed to exist; if they do not exist, it is either an intermittent issue or they are broken, so your estimated total time seems high.
Can you cross reference this with actual work done by perf sheriffs in reality? I assume humans might be more efficient (which is ok), but it is good to know what we see for number of jobs.
questions:
- Is this for a specific platform?
- Can you explain what a backfill is?
- Can you explain why the need for ~4 backfills?
- are there no retriggers as part of this?
I only ask so I can help provide context to all in concrete terms without ambiguity.
Reporter
Comment 9•4 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #8)
[...]
questions:
- Is this for a specific platform?
My cost estimates weren't platform-specific, but this is a good point. For 2020/Q1, we plan to backfill only on Linux.
Gut feeling: this would further reduce the above costs by ~50%.
- Can you explain what a backfill is?
It's an AC(Bk) task.
- Can you explain why the need for ~4 backfills?
That's the general suspect range perf sheriffs tend to backfill/retrigger.
- are there no retriggers as part of this?
Nope. For 2020/Q1 we're only planning to add automated backfilling.
[...]
Comment 10•4 years ago
Thanks for the update. To clarify: for a given alert on Linux, the backfill bot would look 2 revisions (with perf data) forward and 2 revisions backward in history and fill in the holes, giving a 40-revision window of complete data.
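The 40-revision figure follows if the "2 revisions forward/backward" are revisions that already have perf data, with perf jobs running on every 10th push; that reading is an assumption of this sketch, and the function names are hypothetical:

```python
def window_revisions(tested_before: int, tested_after: int, push_interval: int) -> int:
    """Revisions spanned when filling from `tested_before` tested pushes behind
    the alert to `tested_after` tested pushes ahead of it, assuming perf jobs
    run on every `push_interval`-th push."""
    # Each gap between consecutive tested pushes covers push_interval pushes.
    return (tested_before + tested_after) * push_interval


def pushes_to_backfill(tested_before: int, tested_after: int, push_interval: int) -> int:
    """Untested pushes inside that window, i.e. the ones a backfill must fill."""
    # Each gap holds push_interval - 1 untested pushes.
    return (tested_before + tested_after) * (push_interval - 1)


if __name__ == "__main__":
    print(window_revisions(2, 2, 10), pushes_to_backfill(2, 2, 10))
```

With 2 tested pushes on each side there are 4 gaps, which lines up with the ~4 backfills per record from comment 7.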
On Linux, I count 120 jobs run for talos/raptor (I assume that is the scope of this; please correct me if not) on autoland, and these run every 10th push. Simple math dictates that if our volume is 700 commits/week, then:
max load = 120 * 700 = 84,000
As we currently run every 10th push (but more frequently during slow times), I will say the default load is every 9th push:
default load = 120 * 700 / 9 ≈ 9,333
Doing a rough query, I see 1,430 backfill jobs in total on autoland for January and February. I assume some are for unittests as well and that this covers all platforms; this helps validate your math.
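The load figures above as a small helper, assuming (as the comment does) that weekly commits and pushes can be treated interchangeably; the constant and function names are illustrative:

```python
JOBS_PER_PUSH = 120       # talos/raptor jobs on Linux (autoland), per the comment
PUSHES_PER_WEEK = 700     # weekly commit volume used in the comment


def weekly_load(every_nth_push: int = 1) -> int:
    """Perf jobs per week if the full Linux set runs on every n-th push."""
    # Integer division rounds down, matching the 9,333 quoted for every 9th push.
    return JOBS_PER_PUSH * PUSHES_PER_WEEK // every_nth_push


if __name__ == "__main__":
    print(f"max load: {weekly_load():,}")
    print(f"default load (every 9th push): {weekly_load(9):,}")
```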
Comment 11•4 years ago
Can you cross reference this with actual work done by perf sheriffs in reality? I assume humans might be more efficient (which is ok), but it is good to know what we see for number of jobs.
This is the main thing I am interested in: what would be the delta between the sheriffs' manual work and what this bot would do? At any rate, if this bot has dials that can make it more or less aggressive at backfilling, I don't want to block this, and we can proceed. Feel free to reach out to Tom.
Reporter
Comment 12•4 years ago
We're still working on the query script that shows the actual figures.
Currently, the perf sheriffs as a whole would roughly perform 3,000 build tasks + 15,000 perf tasks every week.
The delta (the bot will perform) should be: 3,000 build tasks + 3,000 perf tasks every week.
The other 12,000 perf tasks will be retriggered manually by the perf sheriffs.
We do have dials that can make the bot more or less aggressive at backfilling.
We've expressed them as hardcoded limits. If the bot exceeds a limit, it won't backfill anymore.
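One way such a hardcoded limit could work, sketched under assumptions: a fixed budget of backfills per period, after which requests are refused. `BackfillLimiter`, its methods and the idea of a periodic reset are hypothetical and not taken from the bot's code:

```python
class BackfillLimiter:
    """Hypothetical hardcoded-limit 'dial': once the bot has spent its
    budget for the current period, further backfill requests are refused."""

    def __init__(self, max_backfills: int) -> None:
        self.max_backfills = max_backfills
        self.used = 0

    def try_acquire(self) -> bool:
        # Refuse once the limit is reached; the bot then stops backfilling.
        if self.used >= self.max_backfills:
            return False
        self.used += 1
        return True

    def reset(self) -> None:
        # Called at the start of each period (the period length is an assumption).
        self.used = 0


if __name__ == "__main__":
    limiter = BackfillLimiter(max_backfills=3)
    print([limiter.try_acquire() for _ in range(5)])
```

Lowering `max_backfills` makes the bot less aggressive; raising it lets it take over more of the sheriffs' manual work.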
Comment 13•4 years ago
I mentioned in comment 8 that builds are going to exist, can you outline where builds won't exist?
Reporter
Comment 14•4 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #13)
I mentioned in comment 8 that builds are going to exist, can you outline where builds won't exist?
Oh, I think I didn't read that properly. Cool! That means the bot's delta will only be 3,000 perf tasks.
Comment 15•4 years ago
(In reply to Ionuț Goldan [:igoldan] from comment #12)
We're still working on the query script that shows the actual figures.
Currently, the perf sheriffs as a whole would roughly perform 3,000 build tasks + 15,000 perf tasks every week.
The delta (the bot will perform) should be: 3,000 build tasks + 3,000 perf tasks every week.
The other 12,000 perf tasks will be retriggered manually by the perf sheriffs.
We do have dials that can make the bot more or less aggressive at backfilling.
We've expressed them as hardcoded limits. If the bot exceeds a limit, it won't backfill anymore.
To me, this reads as though we won't be adding much additional load, merely offloading manual work that sheriffs do, for the most part. If I have that right, it works for me. Feel free to get creds from Tom.
Reporter
Comment 16•4 years ago
(In reply to Jordan Lund (:jlund) from comment #15)
[...]
To me, this reads as though we won't be adding much additional load, merely offloading manual work that sheriffs do, for the most part. If I have that right, it works for me. Feel free to get creds from Tom.
That's entirely correct.
Reporter
Comment 17•4 years ago
Tom, could you help me set up the credentials? Out of the 2 options highlighted in comment 0, could we go with the 1st option, as Dustin hinted?
That is:
require one of our current Taskcluster accounts to be granted the ability to create client credentials? (I'm referring to accounts with Perf sheriff-specific scopes)
Reporter
Updated•4 years ago
Updated•4 years ago
Reporter
Updated•4 years ago
Comment 18•4 years ago
For testing, individual credentials can be used. Once this is ready to be deployed, I can create the appropriate clients.
Reporter
Updated•4 years ago
Assignee
Comment 19•3 years ago
Updated•3 years ago
Assignee
Comment 20•3 years ago
Updated•3 years ago
Comment 21•3 years ago
Pushed by mtabara@mozilla.com: https://hg.mozilla.org/ci/ci-configuration/rev/63beab494973 add Treeherder Sheriffs bot client. r=releng-reviewers,jmaher
Assignee
Comment 22•3 years ago
The bot landed and the credentials have been moved into Heroku env vars. I think we're done here; please reopen if I'm wrong.
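A minimal sketch of how a bot could read its Taskcluster credentials from environment variables such as Heroku config vars. The `TASKCLUSTER_ROOT_URL` / `TASKCLUSTER_CLIENT_ID` / `TASKCLUSTER_ACCESS_TOKEN` names follow the usual Taskcluster client convention; whether the deployed bot uses exactly these names is an assumption, and `credentials_from_env` is a hypothetical helper:

```python
import os
from typing import Mapping, NamedTuple, Optional

# Assumed variable names (common Taskcluster client convention).
ENV_NAMES = ("TASKCLUSTER_ROOT_URL", "TASKCLUSTER_CLIENT_ID", "TASKCLUSTER_ACCESS_TOKEN")


class TaskclusterCredentials(NamedTuple):
    root_url: str
    client_id: str
    access_token: str


def credentials_from_env(env: Optional[Mapping[str, str]] = None) -> TaskclusterCredentials:
    """Read the bot's credentials from the environment (e.g. Heroku config vars)."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES if not env.get(name)]
    if missing:
        # Fail loudly rather than silently running without credentials.
        raise RuntimeError("missing credentials: " + ", ".join(missing))
    return TaskclusterCredentials(*(env[name] for name in ENV_NAMES))
```

For local testing (comment 18), the same variables can hold individual credentials; in production they would hold the dedicated bot client's.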
Assignee
Updated•3 years ago