Closed Bug 1598333 Opened 5 years ago Closed 4 years ago

Optimise job_details inserts

Categories

(Tree Management :: Treeherder: Infrastructure, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jgraham, Assigned: camd)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

(I don't really know if this is related to 1597136 but was asked to file)

The code at https://github.com/mozilla/treeherder/blob/4704764a896417c58e9cc3daf97bcdebb2f33b72/treeherder/etl/artifact.py#L24-L45 looks like it's doing one insert per row in the job_details table, which seems suspicious on performance grounds. I also wonder how common it is to reprocess jobs with different artifacts (i.e. it's unclear why this operation uses update_or_create). A possible solution would be:

  • Use the bulk_create API to insert the new rows with a single query
  • If that fails because of duplicates, select the existing rows, remove the entries that already exist, and try again (I assume it's not possible to get duplicates with different auxiliary data here, since creating an artifact is idempotent).
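
The two steps above could be sketched roughly as follows. This is not Treeherder code: it uses the standard-library sqlite3 module as a stand-in for the Django ORM so that it runs on its own, and the table schema, the uniqueness key (job_id, title, value) and the function name bulk_insert_job_details are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema; the real job_details table may differ.
SCHEMA = """
CREATE TABLE job_details (
    job_id INTEGER NOT NULL,
    title  TEXT NOT NULL,
    value  TEXT NOT NULL,
    url    TEXT,
    UNIQUE (job_id, title, value)
)
"""

INSERT_SQL = "INSERT INTO job_details (job_id, title, value, url) VALUES (?, ?, ?, ?)"


def bulk_insert_job_details(conn, rows):
    """Insert (job_id, title, value, url) tuples; return how many were added."""
    # Drop duplicates inside the batch itself, keeping the first occurrence.
    seen = {}
    for row in rows:
        seen.setdefault(row[:3], row)
    batch = list(seen.values())

    # Step 1: optimistic path, the whole batch in one executemany call.
    try:
        with conn:  # commits on success, rolls back on error
            conn.executemany(INSERT_SQL, batch)
        return len(batch)
    except sqlite3.IntegrityError:
        pass  # at least one row already exists; fall through to step 2

    # Step 2: select the existing keys, filter them out, and retry once.
    job_ids = sorted({row[0] for row in batch})
    placeholders = ",".join("?" * len(job_ids))
    existing = set(
        conn.execute(
            "SELECT job_id, title, value FROM job_details "
            f"WHERE job_id IN ({placeholders})",
            job_ids,
        )
    )
    missing = [row for row in batch if row[:3] not in existing]
    with conn:
        conn.executemany(INSERT_SQL, missing)
    return len(missing)
```

The retry relies on the same assumption as the proposal: a row that already exists carries the same auxiliary data, so skipping it loses nothing. In the Django ORM the first step would correspond to QuerySet.bulk_create; from Django 2.2 onward bulk_create also accepts ignore_conflicts=True, which may make the manual fallback unnecessary on databases that support it.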
Blocks: 1599095
Assignee: nobody → cdawson
Priority: -- → P1
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED