Use a single transaction per channel when aggregating.

RESOLVED FIXED

Status

Product: Cloud Services
Component: Metrics: Pipeline
Priority: P2
Severity: normal
Status: RESOLVED FIXED
Reported: 3 years ago
Modified: 3 years ago

People

(Reporter: rvitillo, Assigned: rvitillo)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

The v4 aggregation job uses multiple separate connections to speed up upserts. To avoid having to roll back to a backup in case of failure, though, we should parallelize across channels, i.e. use a single transaction for each channel, rather than parallelizing the individual upserts.
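A minimal sketch of the proposal, assuming a hypothetical `aggregates` table keyed by (channel, build_id, metric) and SQLite standing in for the job's real database. `upsert_channel`, `aggregate`, `create_db` and the schema are illustrative names, not the job's code. Each channel's upserts run in one transaction on their own connection, so a failure undoes only that channel instead of leaving the database half-updated.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical schema; the real v4 tables are not described in this bug.
SCHEMA = """
CREATE TABLE IF NOT EXISTS aggregates (
    channel  TEXT    NOT NULL,
    build_id TEXT    NOT NULL,
    metric   TEXT    NOT NULL,
    total    INTEGER NOT NULL,
    PRIMARY KEY (channel, build_id, metric)
)
"""

UPSERT = (
    "INSERT INTO aggregates (channel, build_id, metric, total) VALUES (?, ?, ?, ?) "
    "ON CONFLICT (channel, build_id, metric) "
    "DO UPDATE SET total = total + excluded.total"
)


def create_db():
    """Create a throwaway on-disk database (SQLite stands in for the real DB)."""
    path = os.path.join(tempfile.mkdtemp(), "aggregates.db")
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.close()
    return path


def upsert_channel(db_path, channel, rows):
    """Apply all upserts for one channel in a single transaction.

    Returns True if the batch committed, False if it was rolled back.
    """
    # Each worker opens its own connection; isolation_level=None lets us
    # issue BEGIN/COMMIT/ROLLBACK explicitly.
    conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # acquire the write lock up front
        try:
            for build_id, metric, value in rows:
                if value < 0:  # simulated failure, for illustration only
                    raise ValueError(f"bad aggregate for {channel}/{build_id}")
                conn.execute(UPSERT, (channel, build_id, metric, value))
            conn.execute("COMMIT")
            return True
        except (ValueError, sqlite3.Error):
            conn.execute("ROLLBACK")  # undoes this channel's batch only
            return False
    finally:
        conn.close()


def aggregate(db_path, batches, max_workers=4):
    """Run one transaction per channel, with channels processed in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            channel: pool.submit(upsert_channel, db_path, channel, rows)
            for channel, rows in batches.items()
        }
        return {channel: fut.result() for channel, fut in futures.items()}
```

If the "beta" batch contains a bad row while "nightly" does not, the result is `{"nightly": True, "beta": False}`: nightly is committed and beta has no rows at all, so only beta needs to be rerun. SQLite serializes writers, so this shows the transaction boundary rather than the speedup; with a server database each worker would hold its own connection and write concurrently.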

Updated 3 years ago
Priority: -- → P2

Comment 1 (Assignee), 3 years ago
As the db is partitioned into subtables by (channel, build-id), I decided to use one transaction per subtable. There is a rather long tail when updating recent build-ids, but since upsert performance degraded only by about 2x, I still consider it acceptable given the gains. By combining multiple channels we should be able to alleviate the long-tail effect.
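A minimal sketch of splitting the work into one unit per (channel, build-id) subtable, assuming rows arrive as (channel, build_id, metric, value) tuples. `group_by_subtable`, `run_units` and the `apply_in_transaction` callback are hypothetical names; the callback is where each subtable's single transaction would live. Putting the units of all channels into one shared pool is one way to read "combining multiple channels": smaller subtables keep the workers busy while a large, recent build-id is still being processed.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_subtable(rows):
    """Bucket (channel, build_id, metric, value) rows by their subtable key."""
    units = defaultdict(list)
    for channel, build_id, metric, value in rows:
        units[(channel, build_id)].append((metric, value))
    return dict(units)


def run_units(units, apply_in_transaction, max_workers=8):
    """Commit each (channel, build_id) subtable in its own transaction.

    Units from every channel share one pool. apply_in_transaction(key, rows)
    is a hypothetical callback that must commit or roll back the whole unit
    and return True on commit, so failed units can be retried on their own.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {key: pool.submit(apply_in_transaction, key, rows)
                   for key, rows in units.items()}
        return {key: fut.result() for key, fut in futures.items()}
```

The result maps each subtable key to its commit flag, so a partial failure is visible per subtable rather than per job.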

I am also keeping track of the dates on which each subtable was updated, which effectively makes the aggregation updates idempotent.
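A minimal sketch of how recording update dates can make the job idempotent, assuming a hypothetical `processed_dates` bookkeeping table and SQLite in place of the real database. The subtable is modelled as a column value for brevity; in the partitioned layout it would be a separate physical table. The date record is written in the same transaction as the upserts, so either both land or neither does, and replaying an already-applied date changes nothing.

```python
import sqlite3


def init_schema(conn):
    """Create hypothetical aggregate and bookkeeping tables."""
    conn.executescript("""
        CREATE TABLE aggregates (
            subtable TEXT NOT NULL, metric TEXT NOT NULL, total INTEGER NOT NULL,
            PRIMARY KEY (subtable, metric));
        CREATE TABLE processed_dates (
            subtable TEXT NOT NULL, submission_date TEXT NOT NULL,
            PRIMARY KEY (subtable, submission_date));
    """)


def apply_day(conn, subtable, submission_date, rows):
    """Upsert one day's (metric, value) rows into a subtable at most once.

    Returns False, changing nothing, if that date was already applied.
    """
    with conn:  # one transaction: commits on success, rolls back on error
        seen = conn.execute(
            "SELECT 1 FROM processed_dates WHERE subtable = ? AND submission_date = ?",
            (subtable, submission_date)).fetchone()
        if seen:
            return False
        conn.executemany(
            "INSERT INTO aggregates VALUES (?, ?, ?) "
            "ON CONFLICT (subtable, metric) DO UPDATE SET total = total + excluded.total",
            [(subtable, metric, value) for metric, value in rows])
        # The primary key also rejects a duplicate (subtable, date) record,
        # which aborts and rolls back a concurrent duplicate run.
        conn.execute("INSERT INTO processed_dates VALUES (?, ?)",
                     (subtable, submission_date))
        return True
```

Rerunning the job for a date that already succeeded returns False for every subtable it touched, so a failed run can simply be restarted from the beginning.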
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED