Closed Bug 1124723 Opened 9 years ago Closed 9 years ago

Add the performance_artifact table to the list of data cycling targets

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mdoglio, Assigned: wlach)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

At the moment there's no data cycling setup for those.
Blocks: 1120019
Blocks: 1078392
Priority: -- → P1
Assignee: nobody → wlachance
Working on this.
Attachment #8553353 - Flags: review?(mdoglio) → review+
The patch looks good, it just needs to be rebased on top of master. The github ui says there are some conflicts.
Rebased and pushed.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
I think this will break as-is - performance_artifact has a foreign key constraint on jobs.id:
https://github.com/mozilla/treeherder-service/blob/c1713cab92e353c12a0288be6b16dc2de141bebe/treeherder/model/sql/template_schema/project_jobs_1.sql.tmpl#L278

But the change here put the jobs.deletes.cycle_performance_artifact after the jobs delete:
https://github.com/mozilla/treeherder-service/blob/93ae4541feb7013bd7181f5a714644a51521e55b/treeherder/model/derived/jobs.py#L162
(See order dependency comment a few lines above that)

Also I don't see any handling for the performance_series and series_signature tables?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
No longer blocks: 1120019
(In reply to Ed Morley [:edmorley] from comment #6)
> I think this will break as-is - performance_artifact has a foreign key
> constraint on jobs.id:
> https://github.com/mozilla/treeherder-service/blob/
> c1713cab92e353c12a0288be6b16dc2de141bebe/treeherder/model/sql/
> template_schema/project_jobs_1.sql.tmpl#L278
> 
> But the change here put the jobs.deletes.cycle_performance_artifact after
> the jobs delete:
> https://github.com/mozilla/treeherder-service/blob/
> 93ae4541feb7013bd7181f5a714644a51521e55b/treeherder/model/derived/jobs.
> py#L162
> (See order dependency comment a few lines above that)

Hmm, this worked locally for me? In any case, can we fix this just by shuffling the performance artifact cycling above the jobs delete?

> Also I don't see any handling for the performance_series and
> series_signature tables?

Correct; we should do a followup for those. That said, the performance artifacts are taking up the vast majority of space.
(In reply to William Lachance (:wlach) from comment #7)
> Hmm, this worked locally for me? In any case, can we fix this just by
> shuffling the performance artifact cycling above the jobs delete?

Indeed :-)

> > Also I don't see any handling for the performance_series and
> > series_signature tables?
> 
> Correct; we should do a followup for those. That said, the performance
> artifacts are taking up the vast majority of space.

My concern was more that once we purge performance_artifact we lose easier ways to rescue orphans (eg job_id < N) and have to resort to checking if every signature in the other tables is present in performance_artifact.
Though tbh we're going to have to mess around already with orphan rows, once the various issues are fixed, so it can probably be fixed in another bug.
(In reply to Ed Morley [:edmorley] from comment #8)
> (In reply to William Lachance (:wlach) from comment #7)
> > Hmm, this worked locally for me? In any case, can we fix this just by
> > shuffling the performance artifact cycling above the jobs delete?
> 
> Indeed :-)

Ok, I'll look into this.

> > > Also I don't see any handling for the performance_series and
> > > series_signature tables?
> > 
> > Correct; we should do a followup for those. That said, the performance
> > artifacts are taking up the vast majority of space.
> 
> My concern was more that once we purge performance_artifact we lose easier
> ways to rescue orphans (eg job_id < N) and have to resort to checking if
> every signature in the other tables is present in performance_artifact.

Hmm, I don't think perfomance artifacts are really all that helpful in determining when series signatures and the series themselves should be expired. I think we're just going to plain have to dig into the contents of these tables to determine whether the data is still valid or not. In the case of the series themselves, we should expire tables which are empty. In the case of series signatures, we should expire entries that no longer have a series associated with them.
Ok, let's break them out :-)
I've filed bug 1126937 for performance_series and series_signature tables.

Morphing summary to what landed here (I only reopened because the summary and comment 0 implied perfherder tables in general).
Summary: Add the perfherder tables to the list of data cycling targets → Add the performance_artifact table to the list of data cycling targets
Attachment #8556005 - Flags: review?(emorley)
Comment on attachment 8556005 [details] [review]
Cycle performance artifacts before jobs

Thanks :-)

(btw I think maybe the reason this didn't break is that InnoDB does cascade deletes by default? I could be wrong, just skimmed some google foo. Either way I guess we want to avoid the cascade deletes for perf reasons)
Attachment #8556005 - Flags: review?(emorley) → review+
Gonna resolve this for now, feel free to reopen if there are more issues. :)
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: