Closed Bug 996751 Opened 12 years ago Closed 11 years ago

Run the tokenserver purge_old_records script from the tokenserver webheads

Categories

(Cloud Services :: Operations: Miscellaneous, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: rfkelly, Unassigned)

References

Details

(Whiteboard: [qa+])

(Spinning this off from Bug 971907 Comment 37)

The tokenserver provides a "purge_old_records" script that searches for old records in the tokenserver database, deletes the corresponding data from the sync node via an HTTP DELETE request, then removes the old record from the tokenserver db. This is necessary both for general garbage-reduction purposes and to meet the timely-deletion-of-user-data requirements from Bug 971907.

Please arrange for it to be run as a background process, to ensure that old records are cleaned up in a timely manner. The necessary command is:

    ./bin/python -m tokenserver.scripts.purge_old_records -v ./path/to/tokenserver/config.ini

It will run in a loop, periodically querying the db and doing the necessary cleanup, and logging to stderr.
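For reference, one way to keep that command running in the background on a webhead might look like the following. This is only an illustrative sketch; the log file path and the use of nohup (rather than whatever supervision puppet sets up) are assumptions, not part of the script itself.

```shell
# Hypothetical background invocation: run the purge loop detached,
# capturing its stderr logging to a file (log path is an assumption).
nohup ./bin/python -m tokenserver.scripts.purge_old_records -v \
    ./path/to/tokenserver/config.ini \
    >> /var/log/tokenserver/purge_old_records.log 2>&1 &
```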
Do you mind if we run this from the tokenserver itself?
Whiteboard: [qa+]
> Do you mind if we run this from the tokenserver itself?

I don't mind where it runs from. The only trick with running it on the webheads is preventing several webheads from running it concurrently. This won't *break* anything, but it's wasteful.
This is the process that needs to connect to individual sync nodes to delete data?
> This is the process that needs to connect to individual sync nodes to delete data?

Correct.
Not in Stage yet. Is this a prerequisite for bug 996915? (noting that 1.2.0 is already out in Stage and Prod)
Status: NEW → ASSIGNED
> Is this a prerequisite for bug 996915?

No, it's not a prereq for any particular deployment, but we do need to figure it out in time for FF29 to satisfy Bug 971907.
Well, we have less than a week for Fx29 in Beta. Is this a hard blocker or a soft blocker for Fx29?
The node_manager would be a nice place to put it, since it will only be a single server and we wouldn't have to do any coordination between nodes. However, it depends on tokenserver's configuration/secrets; though it already has a copy of these, it'd be more things to keep in sync.

If we did it on the tokenserver, which is a cluster of machines, we would need some sort of semaphore or synchronization system so that only one process runs at a time. We don't want to go full SWF (Simple Workflow). I think an SQS queue would be enough: we could have a wrapper script that receives a message from the queue, runs the script, deletes the old message, and then puts a new message on the queue with a timeout for running the job again. It's a lot more complicated than a cronjob, a lot less complicated than SWF, and less technical debt than keeping duplicate values in the node_manager. I think this would be my preferred solution.
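The SQS wrapper idea above could be sketched roughly as follows. The queue client is injected here so the scheduling logic can be shown without real AWS credentials; with boto3 the corresponding calls would be receive_message / delete_message / send_message with DelaySeconds. All names, the FakeQueue class, and the daily delay are hypothetical illustrations, not tokenserver code.

```python
RESCHEDULE_DELAY = 24 * 60 * 60  # re-run the purge roughly daily (assumption)

def run_purge_cycle(queue, run_purge_script):
    """Receive one token message, run the purge job, then re-queue it.

    Exactly one "run the purge" token circulates in the queue, so only
    one webhead runs the job per cycle.
    """
    msg = queue.receive_message()
    if msg is None:
        return False  # another webhead holds the token right now
    try:
        run_purge_script()
    finally:
        # Delete the consumed message and schedule the next run.
        queue.delete_message(msg)
        queue.send_message("run-purge", delay_seconds=RESCHEDULE_DELAY)
    return True

class FakeQueue:
    """Minimal in-memory stand-in for an SQS queue (illustration only)."""
    def __init__(self, messages=None):
        self.messages = list(messages or [])
        self.sent = []
    def receive_message(self):
        return self.messages.pop(0) if self.messages else None
    def delete_message(self, msg):
        pass
    def send_message(self, body, delay_seconds=0):
        self.sent.append((body, delay_seconds))
```

With a real queue, the SQS visibility timeout would cover the window where a message has been received but not yet deleted, which is exactly where the loss/duplication concerns raised below come in.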
> If we did it on the tokenserver, which is a cluster of machines, we would
> need some sort of semaphore or synchronization system so only one process is happening at a time

We could use the database for this, via "SELECT ... FOR UPDATE" or similar.

> I think an SQS queue would be enough. We could have a wrapper script that receives a message
> from the queue, runs the script, deletes the old message and then puts a new message on the
> queue with a timeout for running the job again.

There are still windows for job loss or duplication here; I think a db-based approach may be better. Will have to dig in in more detail to see for sure.

I propose the following:

* We run this on the webheads as a persistent background process, not via cron. The process will do a run, sleep for a bit, do a run, sleep for a bit, repeat.
* Initially, we use a randomized sleep interval so that webheads are unlikely to be running the process at the same time. Even if they do happen to overlap by chance, this won't break anything; it will just result in some useless duplicate work.
* For a future release, we move to a more nuanced scheduling system, the details of which are yet to be determined.

Thoughts?
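The randomized-sleep proposal above can be sketched like this. The interval bounds and function names are illustrative assumptions; the point is only that each webhead picks its own random sleep, so concurrent runs become unlikely without any coordination.

```python
import random
import time

MIN_SLEEP = 30 * 60   # 30 minutes (assumed bound, for illustration)
MAX_SLEEP = 120 * 60  # 2 hours (assumed bound, for illustration)

def next_sleep_interval(rng=random.random):
    """Pick a random sleep so webheads rarely purge at the same time."""
    return MIN_SLEEP + (MAX_SLEEP - MIN_SLEEP) * rng()

def purge_loop(do_purge, sleep=time.sleep, cycles=None):
    """Run the purge, sleep a randomized interval, repeat.

    An overlap with another webhead is harmless here: it only means
    some useless duplicate work, never incorrect behavior.
    """
    n = 0
    while cycles is None or n < cycles:
        do_purge()
        sleep(next_sleep_interval())
        n += 1
```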
:rfkelly, I agree. Having it run on multiple machines as a background process at randomized times is a good solution. Moving later to a scheme where they coordinate (maybe through the DB, since they're already talking to it, and these are infrequent, roughly daily coordination events) sounds good.
Summary: Periodically run the tokenserver purge_old_records script from the node-admin server → Run the tokenserver purge_old_records script from the tokenserver webheads
OK, here's what we need to do, in order:

* Get 1.2.1 deployed, which we need for db migrations but doesn't include this code change (Bug 996915)
* Update the config to run purge_old_records (https://github.com/mozilla-services/puppet-config/pull/396)
* Do another deployment with this config change in place
In the interests of getting this out before FF29 release, I've tweaked the upcoming deployments to include it *before* the db migration stuff. So the new plan is:

* Update the config to run purge_old_records (https://github.com/mozilla-services/puppet-config/pull/396)
* Get 1.2.3 deployed (Bug 996915)
* Follow up with a second deployment to complete the db migrations, which we can do after FF29
Built a new stack in Stage with this running. It seems to work, judging by the logs. It would be good if the log entries had a timestamp. Maybe in 1.2.4 ;)
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
I can verify this once bug 1001305 is Verified.
Which, of course, is already out there and running in Prod...
Status: RESOLVED → VERIFIED