Bug 985794 (Closed) · Opened 11 years ago · Closed 2 years ago

More robust handling of broken storage nodes in tokenserver cleanup script

Categories

(Cloud Services Graveyard :: Server: Sync, defect)

Hardware: x86_64
OS: Windows 7
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: rfkelly, Unassigned)

References

Details

(Whiteboard: [qa+])

This is a follow-up to Bug 984297, which added a script for purging old records from the tokenserver database. That script uses the tokenserver database directly as its work queue, so if deleting the data for any row fails (for example because its storage node is broken or unreachable), all subsequent rows are blocked until that failure is resolved. We should make this more robust. One suggestion (Bug 984297 Comment 8) is to adjust the replaced_at timestamp on a failed record to simulate "moving it to the back of the queue". A more general approach might be to use an explicit workflow solution based on AWS SWF.
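
For illustration, here is a minimal sketch of the "move it to the back of the queue" idea. It assumes a simple users(uid, node, replaced_at) table and stand-in helper names; it is not the actual purge_old_records.py implementation:

```python
"""Sketch: don't let one broken storage node block every later row.
On failure, bump replaced_at so the record is retried after everything else.
Table layout, helper names, and the sqlite3 backend are assumptions."""

import sqlite3
import time
import logging

log = logging.getLogger("purge_old_records")


def get_purge_candidates(db, grace_period_s, limit):
    """Oldest replaced rows first, skipping anything inside the grace period."""
    cutoff_ms = int((time.time() - grace_period_s) * 1000)
    return db.execute(
        "SELECT uid, node FROM users"
        " WHERE replaced_at IS NOT NULL AND replaced_at < ?"
        " ORDER BY replaced_at LIMIT ?",
        (cutoff_ms, limit),
    ).fetchall()


def requeue(db, uid):
    """Refresh replaced_at to push a failed record to the back of the queue
    (the suggestion from Bug 984297 Comment 8)."""
    db.execute(
        "UPDATE users SET replaced_at = ? WHERE uid = ?",
        (int(time.time() * 1000), uid),
    )


def purge_old_records(db, delete_service_data, grace_period_s=86400, limit=100):
    """delete_service_data(node, uid) is a stand-in for whatever call wipes the
    user's sync data on the storage node; it may raise for broken nodes."""
    for uid, node in get_purge_candidates(db, grace_period_s, limit):
        try:
            delete_service_data(node, uid)
        except Exception:
            log.exception("purge failed for uid=%s on %s; requeueing", uid, node)
            requeue(db, uid)
            continue
        db.execute("DELETE FROM users WHERE uid = ?", (uid,))
```

Requeueing keeps the loop moving past a broken node, but a record stuck on a permanently broken node will just keep cycling back; an explicit workflow engine (the SWF idea above) would add per-record retry limits and visibility into stuck records.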
Whiteboard: [qa+]
Priority: -- → P2
Assignee: nobody → rfkelly
I'm realistically not going to be able to work on this bug, so I'm relinquishing ownership.
Assignee: rfkelly → nobody
Priority: P2 → --
This script (tokenserver/scripts/purge_old_records.py) is an important part of our end-to-end data handling story, as it's the thing that ultimately deletes a user's abandoned sync data from the storage nodes. What sort of monitoring do we have for it in production to ensure that it's working as intended?
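
For illustration only (nothing here is known about the current deployment), the cheapest kind of monitoring would be for each run of the script to log a machine-parseable summary that an alert can watch, e.g. firing if runs stop appearing or if the backlog keeps growing:

```python
# Hypothetical per-run summary; the function name and log format are illustrative.
import logging

log = logging.getLogger("purge_old_records")

def report_run(purged, requeued, remaining):
    """One summary line per run: alert if the script goes silent or if
    `remaining` (rows still awaiting purge) trends upward."""
    log.info("purge_run purged=%d requeued=%d remaining=%d",
             purged, requeued, remaining)
```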
Flags: needinfo?(jrgm)

This is an old bug, but it still seems important; +:pjenvey for consideration.

Flags: needinfo?(jrgm)
Component: Server: Token → Server: Sync

Transferring to internal issue tracking and closing this.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX
Product: Cloud Services → Cloud Services Graveyard