Closed Bug 1171046 Opened 9 years ago Closed 9 years ago

Use a single transaction per channel when aggregating.

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rvitillo, Assigned: rvitillo)

Details

The v4 aggregation job uses multiple separate connections to speed up upserts. To avoid having to roll back to a backup when a run fails, though, we should parallelize on channels rather than on individual upserts, i.e. use a single transaction for each channel.
Priority: -- → P2
As the db is partitioned into subtables by (channel, build-id), I decided to use one transaction per subtable. There is a rather long tail when updating recent build-ids but, since upsert performance degraded by only about 2x, I consider that acceptable given the gains. By combining multiple channels we should be able to alleviate the long-tail effect.
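
A minimal sketch of the one-transaction-per-subtable idea, not the actual pipeline code. It assumes an in-memory SQLite database in place of the real one, and the table name `aggregates`, its columns, and the helpers `make_db`, `upsert` and `upsert_by_subtable` are all hypothetical. Rows are grouped by (channel, build-id) and each group is applied in its own transaction, so a failure rolls back only that group.

```python
import sqlite3
from collections import defaultdict


def make_db():
    # Hypothetical schema: one row per (channel, build_id, metric).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE aggregates ("
        "channel TEXT, build_id TEXT, metric TEXT, total INTEGER, "
        "PRIMARY KEY (channel, build_id, metric))"
    )
    conn.commit()
    return conn


def upsert(conn, channel, build_id, metric, delta):
    # Update-then-insert upsert; a negative delta stands in for any failure.
    if delta < 0:
        raise ValueError("negative delta")
    cur = conn.execute(
        "UPDATE aggregates SET total = total + ? "
        "WHERE channel = ? AND build_id = ? AND metric = ?",
        (delta, channel, build_id, metric),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO aggregates VALUES (?, ?, ?, ?)",
            (channel, build_id, metric, delta),
        )


def upsert_by_subtable(conn, rows):
    """Apply rows grouped by (channel, build_id), one transaction per group.

    Returns the groups whose transaction failed and was rolled back.
    """
    groups = defaultdict(list)
    for channel, build_id, metric, delta in rows:
        groups[(channel, build_id)].append((metric, delta))

    failed = []
    for (channel, build_id), items in groups.items():
        try:
            with conn:  # commit on success, roll back on exception
                for metric, delta in items:
                    upsert(conn, channel, build_id, metric, delta)
        except ValueError:
            failed.append((channel, build_id))
    return failed
```

In this sketch a failed group leaves no partial rows behind, so only that group needs to be retried instead of restoring the whole database from a backup.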

I am also keeping track of the dates on which each subtable was updated, which effectively makes the aggregation updates idempotent.
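
The date tracking can be sketched roughly as follows, under stated assumptions: SQLite stands in for the real database, and the tables `totals` and `processed_dates` and the helpers `make_db` and `apply_day` are hypothetical names. The date marker is written in the same transaction as the updates, so re-submitting a day that was already applied changes nothing.

```python
import sqlite3


def make_db():
    # Hypothetical schema: running totals plus a log of applied dates.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE totals ("
        "subtable TEXT, metric TEXT, total INTEGER, "
        "PRIMARY KEY (subtable, metric))"
    )
    conn.execute(
        "CREATE TABLE processed_dates ("
        "subtable TEXT, day TEXT, PRIMARY KEY (subtable, day))"
    )
    conn.commit()
    return conn


def apply_day(conn, subtable, day, deltas):
    """Add one day's deltas to a subtable unless that day was already applied.

    Returns True if the deltas were applied, False if the day was skipped.
    """
    with conn:  # the marker and the updates commit or roll back together
        seen = conn.execute(
            "SELECT 1 FROM processed_dates WHERE subtable = ? AND day = ?",
            (subtable, day),
        ).fetchone()
        if seen:
            return False
        conn.execute(
            "INSERT INTO processed_dates VALUES (?, ?)", (subtable, day)
        )
        for metric, delta in deltas.items():
            cur = conn.execute(
                "UPDATE totals SET total = total + ? "
                "WHERE subtable = ? AND metric = ?",
                (delta, subtable, metric),
            )
            if cur.rowcount == 0:
                conn.execute(
                    "INSERT INTO totals VALUES (?, ?, ?)",
                    (subtable, metric, delta),
                )
    return True
```

Because the marker and the totals share a transaction, a crash midway leaves neither behind, and the same day can safely be submitted again.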
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard