Add mach try command to rerun performance tests from a specific alert
Categories
(Testing :: Performance, enhancement, P1)
Tracking
(firefox118 fixed)
Tracking | Status
---|---
firefox118 | fixed
People
(Reporter: kimberlythegeek, Assigned: myeongjun.ko)
References
(Blocks 3 open bugs)
Details
(Whiteboard: [fxp])
Attachments
(1 file)
Add a ./mach try command that will re-run the failing tests from a performance alert. Something like mach try perfalert 88348, or just mach try 88348, where 88348 is the bug number for that alert; alternatively, the command could take the alert number from Perfherder.
I guess the difference would be that with an alert number, tests could re-run as soon as the alert is generated, whereas with a bug number they would re-run when a bug is filed for that alert. This came up in a brainstorming session; further input is welcome.
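A rough sketch of how the command might tell the two apart, trying the id as a Perfherder alert summary first and falling back to Bugzilla (the endpoints are real; the helper and the fallback logic are invented for illustration):

import requests

PERFHERDER = "https://treeherder.mozilla.org/api/performance/alertsummary/"
BUGZILLA = "https://bugzilla.mozilla.org/rest/bug/"

def find_alert_summary(identifier):
    # First see whether Perfherder knows this id as an alert summary.
    resp = requests.get(PERFHERDER, params={"id": identifier}, timeout=30)
    resp.raise_for_status()
    summaries = resp.json().get("results", [])
    if summaries:
        return summaries[0]
    # Otherwise treat it as a bug number and look it up in Bugzilla.
    resp = requests.get(f"{BUGZILLA}{identifier}", timeout=30)
    resp.raise_for_status()
    bugs = resp.json().get("bugs", [])
    return bugs[0] if bugs else None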
Comment 2•4 years ago
I think there are good ideas here. I would like some clarification:
1. ./mach try perfalert https://treeherder.mozilla.org/perf.html#/alerts?id=27035. This would push to the try server, run the builds, and then run the tests. The description mentions automatically rerunning tests as soon as the alert is generated; is this designed as a developer tool for reproducing tests, or for automation?
2. ./mach try perfalert https://bugzilla.mozilla.org/show_bug.cgi?id=1656546. If this takes a bug, it would be nice to run exactly the tests needed, but I am not sure how to get that set. Would the intention be to query the Perfherder database for all alert summaries associated with the referenced bug and run their tests?
To make this work, we will need a clear mapping from alert name to scheduled test name. This would be more straightforward for AWSY and Raptor; Talos would need a mapping file, something like the sketch below.
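Purely illustrative, assuming a hand-maintained table; the suite and task names here are invented, not taken from any real mapping file:

TALOS_TASKS = {
    # Alert suite name -> try task label(s) that run it. A real mapping
    # file would enumerate every Talos suite.
    "tp5o": ["test-linux1804-64-shippable/opt-talos-tp5o-e10s"],
    "tsvgx": ["test-windows10-64-shippable/opt-talos-svgr-e10s"],
}

def tasks_for_suite(suite):
    # Fail loudly when an alert names a suite we have no mapping for.
    try:
        return TALOS_TASKS[suite]
    except KeyError:
        raise ValueError(f"no task mapping for Talos suite {suite!r}")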
I guess the question is what specific problem or use case you are trying to solve with this, so we can make sure the final technical solution meets it.
Overall, I assume this would take a few days to hack together. I would want to make sure it could work easily for:
- bisecting a failure
- testing a new fix
- running on a specific platform (if 200 tests need to run but you can reproduce the issue on Windows, run only the Windows tests)
Comment 3•4 years ago
Good questions, Joel. I think option 1 would be the better implementation, with the use case being a developer manually re-running tests.
Comment 4•4 years ago
I see this being valuable for confirming/bisecting a regression and for testing a potential fix. By referencing the alert summary and automatically selecting the jobs containing the affected tests, the developer doesn't need to understand the different harnesses or tests. I feel that this could significantly improve the user experience for handling regressions.
Joel brings up a good point on filtering, which could be a valuable advanced feature. If we're concerned about running a large number of tests, we could build in a limit, or default to only running jobs for alerts that have been marked as 'important'.
Comment 5•4 years ago
I am concerned about the volume of tests, as many people do --rebuild 20; this tool should rebuild the correct amount.
Comment 7•3 years ago
A Redash query to generate this set of tasks, given a summary id:
-- Distinct job type names for every alert in (or reassigned to) the
-- given alert summary, via the data recorded on the summary's push.
select distinct jt.name
from performance_alert_summary pas
join performance_alert pa
  on (pa.summary_id = pas.id or pa.related_summary_id = pas.id)
join performance_signature ps
  on ps.id = pa.series_signature_id
join performance_datum pd
  on pd.signature_id = ps.id
  and pd.repository_id = ps.repository_id
  and pd.push_id = pas.push_id
join job j
  on j.id = pd.job_id
join job_type jt
  on jt.id = j.job_type_id
where pas.id = 31099
Theoretically we just need to hook this up to an API; maybe https://treeherder.mozilla.org/api/performance/alertsummary/?id=31099 could be extended to https://treeherder.mozilla.org/api/performance/alertsummary/?id=31099&format=try_tasks, or instead we could add a dedicated endpoint:
https://treeherder.mozilla.org/api/performance/alertsummary_tasks/?id=31099
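If we went the dedicated-endpoint route, the client side would be small. A sketch, with the caveat that the endpoint does not exist yet and its response shape is an assumption:

import requests

def try_tasks_for_summary(summary_id):
    # Proposed endpoint, not implemented yet. Assume it would return the
    # distinct job_type names the Redash query above produces, as JSON.
    url = "https://treeherder.mozilla.org/api/performance/alertsummary_tasks/"
    resp = requests.get(url, params={"id": summary_id}, timeout=30)
    resp.raise_for_status()
    return resp.json()

print(try_tasks_for_summary(31099))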
Comment 8•1 year ago
I doubt that one wants to push to try by the alert summary id. There is a lot of reassignment: what starts as an alert summary with a couple of items can end up with a dozen more items reassigned to it. And developers usually use profiles and code changes to fix regressions, so they end up needing more control over their try pushes than just passing the summary id.
BUT this would be useful at the alert item level, and it would help to have the task id associated with each alert item. That would be a step toward closing the data gap between the alerts view and the jobs view, and would ultimately let us show the triggered side-by-side task id directly in the alerts view.
Comment 9•1 year ago
(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #8)
> I doubt that one wants to push to try by the alert summary id. There is a lot of reassignment: what starts as an alert summary with a couple of items can end up with a dozen more items reassigned to it. And developers usually use profiles and code changes to fix regressions, so they end up needing more control over their try pushes than just passing the summary id.
I believe this came out of developer surveys. I don't think this needs to be perfect to start, and we can improve it with other work around alert reassignments (or maybe we can make sure the API endpoint handles them?). We'll be implementing this as an --alert option in mach try perf, so developers will still be able to use everything else the tool offers. I just realized there haven't been any comments spelling this out, so: the command will look like ./mach try perf --alert <ALERT-ID>
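For illustration, the option could be wired into the parser roughly like this (a sketch only; the real tryselect code may differ):

import argparse

parser = argparse.ArgumentParser(prog="./mach try perf")
parser.add_argument(
    "--alert",
    type=int,
    metavar="ALERT-ID",
    help="re-run the tasks associated with this Perfherder alert summary",
)

args = parser.parse_args(["--alert", "31099"])
print(args.alert)  # 31099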
Comment 10•1 year ago
I remember this coming up as early as 2019 when we were brainstorming improvements for performance testing. The idea was that an engineer identified as the author of a regression patch could easily confirm and iterate on fixing the issue without needing to work out what tasks to run in CI.
It's worth keeping in mind that the alert/summary distinction is not one that's exposed in the Perfherder UI, so for most users the summary is the alert. I don't think we want to complicate things by exposing this implementation detail. I would include any reassigned alerts, and even though we have some big alert summaries, I imagine this is mostly due to the many metrics we have, and that the number of actual tasks will be relatively few.
I also don't think we need to worry too much about the volume of tests now that we're storing replicates, but perhaps we could warn users who run with --rebuild, as it will take a while to break this habit. Rather than interrupting their flow, we could log a clear warning and reduce any --rebuild value to a hardcoded maximum, optionally with a command line override; something like the sketch below.
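A sketch of that clamping behaviour; the cap and the override flag are invented for the example:

import logging

MAX_REBUILD = 5  # invented cap; the real maximum would be a tuning choice

def effective_rebuild(requested, override=False):
    # Warn and clamp rather than erroring, so the user's flow isn't broken.
    if requested > MAX_REBUILD and not override:
        logging.warning(
            "--rebuild %d is more than a perf alert push needs; using %d "
            "(a hypothetical override flag could restore the full value)",
            requested, MAX_REBUILD,
        )
        return MAX_REBUILD
    return requested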
I'm very excited about this. Being able to say in comment 0 that you can just run ./mach try perf --alert XXXXXX to reproduce the regression is a huge win for the user experience!