Remove empty alert summaries
Categories
(Tree Management :: Perfherder, task, P1)
Tracking
(Not tracked)
People
(Reporter: igoldan, Assigned: igoldan)
References
Details
Attachments
(3 files)
The perf data expiring algorithm can remove inactive perf signatures, which cascades to deleting perf alerts.
In turn, there are cases when alert summaries no longer have any perf alerts. This makes them useless & increases the query time without any real benefit.
Thus, we should expire perf alert summaries that are empty.
Note: normally, alert summaries & alerts should be created in a transaction. However, I don't think Perfherder used this approach. We should be careful how we delete summaries, such that we don't remove one of them while it's alerts are being created. A naive approach would be to simply let an alert summary cool down for a day or two & then remove it if it's still empty.
Comment 1•4 years ago
|
||
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 2•4 years ago
•
|
||
I've tested this on treeherder-stage
& treeherder-prototype
& it worked OK, without any problems. However, I've just realized the PR is a bit risky.
It's too eager to remove expired data. I mean: if the data is stale for more than an hour, it deletes it.
In the long term, if some extra buggy filter clauses are added here (in the highlighted code), we risk deleting all alert summaries older than an hour.
Perf sheriffing would run into chaos for possibly 2 weeks (it would be basically blind & clueless).
I'd rather revert this from master & increase the essential filter condition from created__lt=one_hour_ago
to created__lt=one_year_ago
.
It will make it much more bug tolerant & still expire stale summaries. Perf sheriffs really don't care about perf data older than 1 year.
Sarah, could you revert PR 6786 so I can make it safer?
Assignee | ||
Updated•4 years ago
|
Comment 3•4 years ago
|
||
Comment 4•4 years ago
|
||
(In reply to Ionuț Goldan [:igoldan] from comment #2)
I've tested this on
treeherder-stage
&treeherder-prototype
& it worked OK, without any problems. However, I've just realized the PR is a bit risky.
It's too eager to remove expired data. I mean: if the data is stale for more than an hour, it deletes it.In the long term, if some extra buggy filter clauses are added here (in the highlighted code), we risk deleting all alert summaries older than an hour.
Perf sheriffing would run into chaos for possibly 2 weeks (it would be basically blind & clueless).I'd rather revert this from master & increase the essential filter condition from
created__lt=one_hour_ago
tocreated__lt=one_year_ago
.
It will make it much more bug tolerant & still expire stale summaries. Perf sheriffs really don't care about perf data older than 1 year.Sarah, could you revert PR 6786 so I can make it safer?
That sounds like a good call. What about alert summaries older than 6 months? I've reverted your pr and merged it to master.
Comment 5•4 years ago
|
||
Assignee | ||
Comment 6•4 years ago
|
||
(In reply to Sarah Clements [:sclements] from comment #4)
[...] What about alert summaries older than 6 months? I've reverted your pr and merged it to master.
Yes, good idea. I got approval from @fstrugariu & @davehunt to expire summaries older than 6 months.
Assignee | ||
Updated•4 years ago
|
Description
•