Open Bug 1578195 Opened 5 years ago Updated 5 years ago

Not cleaning up old machine data

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P3)

Tracking

(Not tracked)

People

(Reporter: aryx, Unassigned)

Details

It seems old data about machines doesn't get expunged.

select count(*)
from machine

returns 8905675.

select count(*)
from machine
left join job
on machine.id = job.machine_id
where job.id is NULL

returns 2112934.
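
For anyone who wants to try the orphan check outside production, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration: the two-table schema is a toy that keeps just the columns the queries above use (it is not Treeherder's real schema), the sample rows are made up, and the NOT EXISTS delete is one possible cleanup, not what cycle_data actually does.

```python
import sqlite3

# Toy schema (assumption): only the columns the queries above rely on.
# This is NOT Treeherder's real schema; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE machine (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE job (id INTEGER PRIMARY KEY, machine_id INTEGER);
    INSERT INTO machine (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO job (id, machine_id) VALUES (10, 1), (11, 1), (12, 3);
""")

# Same anti-join as in the report: machines with no matching job row.
ORPHAN_COUNT = """
    SELECT COUNT(*)
    FROM machine
    LEFT JOIN job ON machine.id = job.machine_id
    WHERE job.id IS NULL
"""

# Hypothetical cleanup: delete machines that no job references.
DELETE_ORPHANS = """
    DELETE FROM machine
    WHERE NOT EXISTS (SELECT 1 FROM job WHERE job.machine_id = machine.id)
"""

total_before = conn.execute("SELECT COUNT(*) FROM machine").fetchone()[0]
orphans_before = conn.execute(ORPHAN_COUNT).fetchone()[0]
deleted = conn.execute(DELETE_ORPHANS).rowcount
conn.commit()
total_after = conn.execute("SELECT COUNT(*) FROM machine").fetchone()[0]
orphans_after = conn.execute(ORPHAN_COUNT).fetchone()[0]

print(total_before, orphans_before, deleted, total_after, orphans_after)
```

On a table with millions of rows, a real cleanup would presumably need to delete in batches and have an index on job.machine_id. Both of those are guesses about the production setup.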

The documentation says cycle_data runs daily. Its filter for what to delete looks sane on first inspection: https://github.com/mozilla/treeherder/blob/a5df8a966b1202f3f80872a78f6093ea060cdb77/treeherder/model/management/commands/cycle_data.py#L61-L75

Any idea why there are 2 million machine names without jobs associated?

Component: Treeherder → Treeherder: Data Ingestion
Summary: old machine data not expunged? → Not cleaning up old machine data

This would be good to get to. In fact, I'm not sure it's still valuable to even KEEP machine data. Aren't they all virtual these days? We should probably just stop saving them. We don't download them with the jobs in the Treeherder view. So unless somebody fetches them (which I doubt), let's kill the table completely.

Priority: -- → P3

Don't. We have several scripts on sql.telemetry.mozilla.org which aggregate machine health and rely on that.
