Open Bug 2008333 Opened 3 days ago

IrrelevantDataRemoval may stop early depending on repository order

Categories

(Tree Management :: Perfherder, defect)

Tracking

(Not tracked)

People

(Reporter: myeongjun.ko, Unassigned)

Details

IrrelevantDataRemoval[0] removes performance_datum rows that are more than 6 months old, except for repositories listed in RELEVANT_REPO_NAMES.
However, depending on the order in which repositories are iterated, some repositories may be skipped.

Example:

Target repositories and target row counts:

  • mozilla-esr140: 100 rows
  • firefox-ios: 0 rows
  • mozilla-release: 30 rows
  • mozilla-esr128: 20 rows

These repositories are processed sequentially. If the strategy encounters a repository with 0 removable rows (i.e., firefox-ios), the cleanup process may stop early, causing the remaining repositories (mozilla-release, mozilla-esr128) to miss their cleanup opportunity.
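
The scenario can be modelled with a short standalone script. This is only a simplified sketch of the reported behaviour, not the Treeherder code: the function name `cleanup_stops_on_empty` is hypothetical, and it assumes the caller treats a repository with 0 deleted rows as the signal that cleanup is finished.

```python
def cleanup_stops_on_empty(targets):
    """Hypothetical model: walk repositories in order and stop at the
    first one that has nothing to delete."""
    removed = {}
    for repo, rows in targets.items():
        if rows == 0:
            # A zero-row result is (wrongly) read as "all done".
            break
        removed[repo] = rows
    return removed


# Row counts taken from the example in this report.
targets = {
    "mozilla-esr140": 100,
    "firefox-ios": 0,
    "mozilla-release": 30,
    "mozilla-esr128": 20,
}

removed = cleanup_stops_on_empty(targets)
skipped = [repo for repo, rows in targets.items() if rows > 0 and repo not in removed]

print("removed:", removed)
print("skipped:", skipped)
```

In this model only mozilla-esr140 is cleaned; mozilla-release and mozilla-esr128 are skipped purely because firefox-ios happens to come before them.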

Suggestion

Instead of removing data directly in the remove method[1], IrrelevantDataRemoval could follow a retry-style approach similar to StalledDataRemoval or TryDataRemoval.
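
One possible shape for such a retry-style approach is sketched below. It is an illustration only: `RepoCleanup`, `NothingLeftToRemove`, and `run` are hypothetical names, the row counts stand in for real queries, and it does not claim to mirror how StalledDataRemoval or TryDataRemoval are actually written. The idea is that `remove` keeps moving on to the next repository until it deletes something, and signals completion only once every repository is exhausted.

```python
class NothingLeftToRemove(Exception):
    """Hypothetical signal: every repository has been exhausted."""


class RepoCleanup:
    """Hypothetical stand-in for a removal strategy; holds row counts
    instead of talking to a database."""

    def __init__(self, targets, chunk_size):
        self._pending = dict(targets)  # repo -> removable rows remaining
        self._chunk_size = chunk_size

    def remove(self):
        """Delete one chunk, skipping empty repositories instead of
        reporting a zero-row result to the caller."""
        while self._pending:
            repo = next(iter(self._pending))
            rows = self._pending[repo]
            if rows == 0:
                # Nothing to delete here: retry with the next repository.
                del self._pending[repo]
                continue
            deleted = min(rows, self._chunk_size)
            self._pending[repo] = rows - deleted
            return repo, deleted
        raise NothingLeftToRemove()


def run(cleanup):
    """Hypothetical driver: call remove() until nothing is left."""
    totals = {}
    while True:
        try:
            repo, deleted = cleanup.remove()
        except NothingLeftToRemove:
            return totals
        totals[repo] = totals.get(repo, 0) + deleted


totals = run(
    RepoCleanup(
        {
            "mozilla-esr140": 100,
            "firefox-ios": 0,
            "mozilla-release": 30,
            "mozilla-esr128": 20,
        },
        chunk_size=40,
    )
)
print(totals)
```

With this structure, a repository with 0 removable rows no longer ends the run, so the outcome does not depend on repository order.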

Note

Currently, the skipped data does not accumulate permanently: it is removed later by MainRemovalStrategy.

[0]
https://github.com/mozilla/treeherder/blob/a82c683f60df2372e6e6995a44ad5f91b2136f73/treeherder/model/data_cycling/removal_strategies.py#L195
[1]
https://github.com/mozilla/treeherder/blob/a82c683f60df2372e6e6995a44ad5f91b2136f73/treeherder/model/data_cycling/removal_strategies.py#L247-L254
