Closed
Bug 996751
Opened 12 years ago
Closed 11 years ago
Run the tokenserver purge_old_records script from the tokenserver webheads
Categories
(Cloud Services :: Operations: Miscellaneous, task)
Cloud Services
Operations: Miscellaneous
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: rfkelly, Unassigned)
References
Details
(Whiteboard: [qa+])
(Spinning this off from Bug 971907 Comment 37)
The tokenserver provides a "purge_old_records" script that searches for old records in the tokenserver database, deletes the corresponding data from the sync node via an HTTP DELETE request, and then removes the old record from the tokenserver db. This is necessary both for general garbage-reduction purposes and to meet the timely-deletion-of-user-data requirements from Bug 971907.
Please arrange for it to be run as a background process to ensure that old records are cleaned up in a timely manner. The necessary command is:
./bin/python -m tokenserver.scripts.purge_old_records -v ./path/to/tokenserver/config.ini
It will run in a loop, periodically querying the db and doing the necessary cleanup, and logging to stderr.
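For illustration, one pass of the cleanup described above might look like the following minimal sketch. The helper names (`purge_pass`, `delete_from_node`) and the dict-based "db" are made up for this example; the real script talks to the tokenserver SQL backend and issues live HTTP requests.

```python
def purge_pass(db, delete_from_node):
    """Run one cleanup pass: for each old record, delete the user's
    data from its sync node, then drop the record from the db.

    `db` is an illustrative stand-in for the tokenserver database;
    `delete_from_node(node, uid)` stands in for the HTTP DELETE call
    and returns True on success.
    """
    purged = 0
    for record in list(db["old_records"]):
        # Only remove the tokenserver row once the node-side delete
        # has succeeded, so a failed delete is retried on a later pass.
        if delete_from_node(record["node"], record["uid"]):
            db["old_records"].remove(record)
            purged += 1
    return purged
```

The actual script wraps a pass like this in a sleep-and-repeat loop, logging to stderr as noted above.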
Comment 1•12 years ago
Do you mind if we run this from the tokenserver itself?
Updated•12 years ago
Whiteboard: [qa+]
Comment 2 (Reporter)•12 years ago
> Do you mind if we run this from the tokenserver itself?
I don't mind where it runs from. The only trick with running it on the webheads is to prevent several webheads from running it concurrently. This won't *break* anything but it's wasteful.
Comment 3•12 years ago
This is the process that needs to connect to individual sync nodes to delete data?
Comment 4 (Reporter)•12 years ago
> This is the process that needs to connect to individual sync nodes to delete data?
Correct.
Comment 5•12 years ago
Not in Stage yet. Is this a prerequisite for bug 996915?
(noting that 1.2.0 is already out in Stage and Prod)
Status: NEW → ASSIGNED
Comment 6 (Reporter)•12 years ago
> Is this a prerequisite for bug 996915?
No, it's not a prereq for any particular deployment, but we do need to figure it out in time for FF29 to satisfy Bug 971907.
Comment 7•11 years ago
Well, we have less than a week for Fx29 in Beta. Is this a hard blocker or a soft blocker for Fx29?
Comment 8•11 years ago
The node_manager would be a nice place to put it, since it is only a single server and we wouldn't have to do any coordination between nodes. However, it depends on the tokenserver's configuration/secrets; though it already has a copy of these, it'd be more things to keep in sync.
If we ran it on the tokenserver, which is a cluster of machines, we would need some sort of semaphore or synchronization system so that only one process runs at a time. We don't want to go full SWF (Simple Workflow).
I think an SQS queue would be enough. We could have a wrapper script that receives a message from the queue, runs the script, deletes the old message and then puts a new message on the queue with a timeout for running the job again.
It is a lot more complicated than a cronjob, a lot less complicated than SWF and less technical debt than having duplicate values in the node manager. I think this is what my preferred solution would be.
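The wrapper pattern proposed above (receive a message, run the purge, delete the message, re-enqueue with a delay) could be sketched like this. A tiny in-memory `FakeQueue` stands in for SQS here so the example is self-contained; in production this would be the SQS receive/delete/send calls with a delivery delay, and `run_purge` is a placeholder for invoking the purge script.

```python
class FakeQueue:
    """In-memory stand-in for an SQS queue: messages become visible
    only after a delay, measured here in ticks instead of seconds."""

    def __init__(self):
        self._messages = []  # list of (visible_at_tick, body)
        self.now = 0         # advance manually to simulate time

    def send(self, body, delay=0):
        self._messages.append((self.now + delay, body))

    def receive(self):
        for msg in self._messages:
            if msg[0] <= self.now:
                return msg
        return None  # nothing visible yet

    def delete(self, msg):
        self._messages.remove(msg)


def run_job_once(queue, run_purge, rerun_delay=10):
    """Receive a message, run the purge, delete the message, then
    re-enqueue a new message with a delay for the next run."""
    msg = queue.receive()
    if msg is None:
        return False  # nothing due yet, or another webhead holds it
    run_purge()
    queue.delete(msg)
    queue.send("purge", delay=rerun_delay)
    return True
```

Because only one webhead can receive (and then delete) a given message, at most one purge run is triggered per scheduled slot, without any shared lock.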
Comment 9 (Reporter)•11 years ago
> If we did it on the tokenserver, which is a cluster of machines, we would
> need some sort of semaphore or synchronization system so only one process is happening at a time
We could use the database for this, via "SELECT ... FOR UPDATE" or similar.
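As one hypothetical shape for that db-based exclusion: a single advisory-lock row claimed with an atomic UPDATE. The sketch below uses sqlite purely so the example runs anywhere; a MySQL deployment could equally use `SELECT ... FOR UPDATE` or `GET_LOCK()`. Table and column names are invented for the example.

```python
import sqlite3

def try_acquire_lock(conn, name, holder):
    """Claim the named lock row atomically; only the one caller whose
    UPDATE actually changed the row gets True."""
    cur = conn.execute(
        "UPDATE locks SET holder = ? WHERE name = ? AND holder IS NULL",
        (holder, name),
    )
    conn.commit()
    return cur.rowcount == 1

def release_lock(conn, name, holder):
    """Release only if we are still the holder."""
    conn.execute(
        "UPDATE locks SET holder = NULL WHERE name = ? AND holder = ?",
        (name, holder),
    )
    conn.commit()

# One-time setup: a single row per named lock.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locks (name TEXT PRIMARY KEY, holder TEXT)")
conn.execute("INSERT INTO locks VALUES ('purge_old_records', NULL)")
conn.commit()
```

A real deployment would also want a timestamp on the row so a lock held by a crashed webhead can be stolen after it goes stale.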
> I think an SQS queue would be enough. We could have a wrapper script that receives a message
> from the queue, runs the script, deletes the old message and then puts a new message on the
> queue with a timeout for running the job again.
There are still windows for job loss or duplication here; I think a db-based approach may be better. Will have to dig into it in more detail to see for sure.
I propose the following:
* We run this on the webheads as a persistent background process, not via cron. The process will
do a run, sleep for a bit, do a run, sleep for a bit, repeat.
* Initially, we use a randomized sleep interval so that webheads are unlikely to be running the
process at the same time. Even if they do happen to overlap by chance, this won't break
anything, it will just result in some useless duplicate work.
* For a future release we move to a more nuanced scheduling system, the details of which are yet
to be determined.
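The randomized-sleep proposal above can be sketched in a few lines. The interval and jitter values are made up for illustration, and the `sleep` and `max_runs` parameters exist only so the loop can be exercised without actually sleeping.

```python
import random
import time

def purge_loop(do_purge_pass, base_interval=3600, jitter=1800,
               sleep=time.sleep, max_runs=None):
    """Run purge passes forever (or max_runs times, for testing),
    sleeping a randomized interval between passes so that multiple
    webheads are unlikely to run at the same moment."""
    runs = 0
    while max_runs is None or runs < max_runs:
        do_purge_pass()
        runs += 1
        # Uniform jitter on top of the base interval; overlap is still
        # possible by chance, but harmless per the discussion above.
        sleep(base_interval + random.uniform(0, jitter))
```

Each webhead runs this as a persistent background process rather than a cron job, matching the first bullet of the proposal.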
Thoughts?
Comment 10•11 years ago
:rfkelly, I agree. Having it run on multiple machines as a background process which runs at randomized times is a good solution.
Moving later to something where they coordinate, maybe through the DB (they're already talking to it, and these coordination events are infrequent, roughly daily), sounds good.
Comment 11 (Reporter)•11 years ago
I'm taking an r=me liberty with this simple change:
https://github.com/mozilla-services/tokenserver/commit/240330432396576509b776cc10cf37e42856f76a
Updated (Reporter)•11 years ago
Summary: Periodically run the tokenserver purge_old_records script from the node-admin server → Run the tokenserver purge_old_records script from the tokenserver webheads
Comment 12 (Reporter)•11 years ago
OK, here's what we need to do, in order:
* Get 1.2.1 deployed, which we need for db migrations but doesn't include this code change (Bug 996915)
* Update the config to run purge_old_records (https://github.com/mozilla-services/puppet-config/pull/396)
* Do another deployment with this config change in place
Comment 13 (Reporter)•11 years ago
In the interests of getting this out before FF29 release, I've tweaked the upcoming deployments to include it *before* the db migration stuff. So the new plan is:
* Update the config to run purge_old_records (https://github.com/mozilla-services/puppet-config/pull/396)
* Get 1.2.3 deployed (Bug 996915)
* Follow up with a second deployment to complete the db migrations, which we can do after FF29
Comment 14•11 years ago
Built a new stack in stage with this running. It seems to work, judging by the logs.
It would be good if the log entries had a timestamp. Maybe in 1.2.4 ;)
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 15•11 years ago
I can verify this once bug 1001305 is Verified.
Comment 16•11 years ago
Which, of course, is already out there and running in Prod...
Status: RESOLVED → VERIFIED