IrrelevantDataRemoval may stop early depending on repository order
Categories
(Tree Management :: Perfherder, defect)
Tracking
(Not tracked)
People
(Reporter: myeongjun.ko, Unassigned)
Details
IrrelevantDataRemoval[0] removes performance_datum rows that are more than 6 months old, except for repositories listed in RELEVANT_REPO_NAMES.
However, there is a case where some repositories may be skipped depending on the repository iteration order.
Example:
Target repositories and target row counts:
- mozilla-esr140: 100 rows
- firefox-ios: 0 rows
- mozilla-release: 30 rows
- mozilla-esr128: 20 rows
These repositories are processed sequentially. If the strategy encounters a repository with 0 removable rows (i.e., firefox-ios), the cleanup process may stop early, causing the subsequent repositories (mozilla-release, mozilla-esr128) to miss their cleanup opportunity.
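The failure mode can be reproduced with a simplified model. This is only a sketch, not the Treeherder code: the function name `cleanup_stop_on_empty` is hypothetical, and it assumes the caller treats "0 rows deleted" as the signal that the whole strategy is finished.

```python
def cleanup_stop_on_empty(repos):
    """Process repositories in order and stop at the first one with nothing to remove.

    `repos` is a list of (name, removable_row_count) pairs.
    Assumption: a delete that removes 0 rows is taken to mean "all done".
    """
    removed = {}
    for name, rows in repos:
        if rows == 0:
            # Nothing deleted for this repository, so the loop ends here
            # and every repository after it is never visited.
            break
        removed[name] = rows
    return removed


repos = [
    ("mozilla-esr140", 100),
    ("firefox-ios", 0),
    ("mozilla-release", 30),
    ("mozilla-esr128", 20),
]

removed = cleanup_stop_on_empty(repos)
print(removed)  # {'mozilla-esr140': 100}
```

In this model, only mozilla-esr140 is cleaned up. Moving firefox-ios to the end of the list would let all three non-empty repositories be processed, which is why the outcome depends on iteration order.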
Suggestion
Instead of removing data directly in the remove method[1], IrrelevantDataRemoval could follow a retry-style approach similar to StalledDataRemoval or TryDataRemoval.
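One way to read "retry style" is: when a delete attempt removes nothing, move on to the next candidate and try again, and report 0 rows only once every candidate is exhausted. The sketch below illustrates that idea under stated assumptions. The class `RetryingRemoval`, its `_attempt_remove` helper, and the `run` driver are all hypothetical names for illustration, and the in-memory row counts stand in for real database deletes.

```python
from collections import deque


class RetryingRemoval:
    """Hypothetical model of a retry-style removal strategy (not Treeherder code)."""

    def __init__(self, repos, chunk_size):
        self._remaining = dict(repos)  # name -> removable rows left
        self._queue = deque(name for name, _ in repos)
        self._chunk_size = chunk_size
        self.removed = {name: 0 for name, _ in repos}

    def _attempt_remove(self, name):
        # Stand-in for a chunked DELETE against a single repository.
        deleted = min(self._chunk_size, self._remaining[name])
        self._remaining[name] -= deleted
        self.removed[name] += deleted
        return deleted

    def remove(self):
        """Delete one chunk; return 0 only when every repository is exhausted."""
        while self._queue:
            deleted = self._attempt_remove(self._queue[0])
            if deleted > 0:
                return deleted
            # Nothing left for this repository: retry with the next one
            # instead of reporting 0 rows to the caller.
            self._queue.popleft()
        return 0


def run(strategy):
    # Assumed caller behavior: keep going until a call deletes 0 rows.
    while strategy.remove() > 0:
        pass
    return strategy.removed


repos = [
    ("mozilla-esr140", 100),
    ("firefox-ios", 0),
    ("mozilla-release", 30),
    ("mozilla-esr128", 20),
]
result = run(RetryingRemoval(repos, chunk_size=50))
print(result)
```

With this shape, the empty firefox-ios entry is skipped inside `remove`, so the caller never sees a 0 until mozilla-release and mozilla-esr128 have also been processed, regardless of ordering.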
Note
Currently, the skipped data does not accumulate permanently: any rows missed here will be removed later by MainRemovalStrategy.
[0] https://github.com/mozilla/treeherder/blob/a82c683f60df2372e6e6995a44ad5f91b2136f73/treeherder/model/data_cycling/removal_strategies.py#L195
[1] https://github.com/mozilla/treeherder/blob/a82c683f60df2372e6e6995a44ad5f91b2136f73/treeherder/model/data_cycling/removal_strategies.py#L247-L254