Closed Bug 664658 Opened 13 years ago Closed 12 years ago

tool to wipe all memcache data for given user(s)

Categories

(Cloud Services Graveyard :: Server: Sync, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: Atoll, Assigned: telliott)

Details

(Whiteboard: [qa+])

Attachments

(2 files, 1 obsolete file)

As part of solving the HMAC mismatch bug, it turns out that we're not wiping the user's data in memcache. So let's do that. We need a tool that can wipe out the data in memcache for a given list of userids on STDIN. Userids can be in any format, just let us know. (Username, numeric userid, LDAP DN, whatever.)

This is also known as the "HMAC mismatch bug prevention tool", decide ETA accordingly.
Background explanation: an HMAC mismatch occurs when a client downloads a record which was encrypted with a different key than the one they are holding for that collection.

In this case, it's possible for the following to happen:

* Client A uploads key K1.
* Client A uploads tab record K1(R1).
* Server is partially wiped: DB-stored data (K1) is eliminated.
* Client B generates and uploads new key K2.
* Client B downloads tab record K1(R1).
* HMAC mismatch.
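
(Illustrative aside: a minimal sketch of why the last step fails. The key names and payload below are made up, and real Sync records carry the HMAC alongside the ciphertext; the point is only that data signed with K1 can never verify against K2.)

import hmac, hashlib, os

k1 = os.urandom(32)                # key K1, what Client A used
k2 = os.urandom(32)                # key K2, what Client B now holds
record = b"ciphertext-of-R1"       # placeholder for the encrypted tab record

stored = hmac.new(k1, record, hashlib.sha256).hexdigest()    # uploaded with R1
computed = hmac.new(k2, record, hashlib.sha256).hexdigest()  # Client B's check

if computed != stored:
    print('HMAC mismatch')         # the failure described above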

Builds after Bug 650208 are no longer susceptible to this bug, because when a new key is uploaded, all server data is deleted. However, this won't land for another 3 or 4 months (Firefox 6), so it would be valuable to avoid the situation. The most direct way to do that is to kill the contents of memcache for a migrated user.

(A workaround for tab HMAC problems is to temporarily disable, sync, then re-enable tab sync.)
To clarify: "server is partially wiped" means that we do the wiping, due to a disk problem or migration?
yes
(oops.)

Yes, that was the known cause.  There could be other ways to trigger that scenario, but this tool was requested specifically to address the one scenario where we know we're causing the problem.
Group: services-infra
Assignee: nobody → telliott
cat data | memcache_clean.py server1 server2 server3 server4

Will output something like:

Cleaning id: 1 (OK)
Cleaning id: 2 (OK)
Cleaning id: 3 (OK)
...


It's an ops script, so I don't think it belongs in any of our hg repos. Just wherever you have the rest of this chain.
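
(For readers without access to the attachment, a rough sketch of what such a script could look like. This is not attachment 539934: the port, the delete-based approach, and the key suffixes are assumptions; the suffixes follow the corrected Python list given later in this bug.)

#!/usr/bin/env python
# Sketch only. Reads userids on STDIN, memcache hosts on argv.
import sys
import memcache  # python-memcached

# Per-user key suffixes to wipe (assumed; see the list later in this bug).
KEYS = ('tabs', 'meta:global', 'size', 'stamps')

def main():
    servers = ['%s:11211' % host for host in sys.argv[1:]]  # assumed port
    mc = memcache.Client(servers)
    for line in sys.stdin:
        uid = line.strip()
        if not uid:
            continue
        ok = all(mc.delete('%s:%s' % (uid, key)) for key in KEYS)
        print('Cleaning id: %s (%s)' % (uid, 'OK' if ok else 'FAIL'))

if __name__ == '__main__':
    main()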
Attachment #539934 - Flags: review?(rsoderberg)
Attachment #539934 - Flags: review?(rsoderberg) → review+
Curious if attachment 539934 [details] works for python and php, or just python.
It doesn't actually care about the data, so unless the keys are encoded differently (which would shock me) or the hashing algo used to determine a server is different (which wouldn't), it should work for either.
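
(Context for the hashing caveat: a memcache client maps each key to a server by hashing the key and taking it modulo the server count. Simplified sketch; the crc32 hash is an assumption about the client's default, and real clients also weight servers and route around dead ones.)

import binascii

def pick_server(key, servers):
    # simplified bucket selection: hash the key, mod the server count
    serverhash = binascii.crc32(key.encode('utf-8')) & 0xffffffff
    return servers[serverhash % len(servers)]

# e.g. pick_server('12345:tabs', ['server1', 'server2', 'server3'])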
Linkage: Bug 646269.

atoll, if this is r+, how close is this to RESOLVED FIXED?
OS: Mac OS X → All
Hardware: x86 → All
(In reply to comment #8)
> Linkage: Bug 646269.
> 
> atoll, if this is r+, how close is this to RESOLVED FIXED?

This bug appears to live outside of process (no hg, no developer r?, no qa), so there's no clear answer to your question.

I would expect it no sooner than when Python sync is deployed to production.
(In reply to comment #7)
> It doesn't actually care about the data, so unless the keys are encoded
> differently (which would shock me) or the hashing algo to determine a server
> is different (which wouldn't), should work for either.

"just python", then. Thanks!
(In reply to comment #9)

> This bug appears to live outside of process (no hg, no developer r?, no qa),
> so there's no clear answer to your question.

I assume you're going to take it and put it in the same location as the rest of the scripts in that chain. This is really just an ops script that happened to be written by a developer, so I'd put it through the same process you put the rest of that chain through.
Maybe https://hg.mozilla.org/services/admin-scripts/ could be the place where we collect all those scripts.

In the future, it could be packaged and contain more Python goodies to deal with the various servers, use the core library, etc.
Comment on attachment 539934 [details]
Script to clean out a user's memcache. Designed to work in the ops node purge chain

Since we did not retain the memcached Python/PHP compatibility, the keys are different in Python.

The keys to wipe are:

"UID:tabs"
"UID:meta:global"
"UID:size"
"UID:collections:stamps:NAME"
"UID:stamps"

with UID = user id, NAME = collection name
Scratch this one (typo): "UID:collections:stamps:NAME"
Attaching one with the new names. We should run this against an actual python install, though I don't know how we'd pick up ones that we missed.
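
(Re "how we'd pick up ones that we missed": one hypothetical spot-check is to dump the corrected key set for each uid and eyeball the output for leftovers. The server list below is a placeholder.)

import sys
import memcache

KEYS = ('tabs', 'meta:global', 'size', 'stamps')  # corrected list from above

mc = memcache.Client(['server1:11211'])           # placeholder server
for line in sys.stdin:
    uid = line.strip()
    for key in KEYS:
        full = '%s:%s' % (uid, key)
        print('%s = %r' % (full, mc.get(full)))   # None means already wiped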
Attachment #542199 - Attachment is obsolete: true
Attachment #542204 - Flags: review+
While testing to see if it can handle uidnumbers, I determined that it cannot:

[root@wp-web01.phx.weave petef]# echo 2222 | ./memcache_dump.py
Traceback (most recent call last):
  File "./memcache_dump.py", line 21, in <module>
    print "%s:%s\t%s" % (username, key, memc.get("%s:%s" % (id, key)))
  File "/usr/lib/python2.6/site-packages/memcache.py", line 793, in get
    return self._get('get', key)
  File "/usr/lib/python2.6/site-packages/memcache.py", line 762, in _get
    server, key = self._get_server(key)
  File "/usr/lib/python2.6/site-packages/memcache.py", line 296, in _get_server
    server = self.buckets[serverhash % len(self.buckets)]
ZeroDivisionError: integer division or modulo by zero
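
(For what it's worth, that ZeroDivisionError means len(self.buckets) == 0, i.e. python-memcached was constructed with an empty server list; presumably memcache_dump.py also expects hosts on argv. Minimal reproduction:)

import memcache

mc = memcache.Client([])   # no servers -> zero buckets
mc.get('2222:tabs')        # serverhash % len(self.buckets) raises ZeroDivisionError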
above comment in wrong bug, sorry
Script in use and seems happy.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
lives in sysadmins/svc/scripts/ now, which only ops can see.
Whiteboard: [qa+]
Status: RESOLVED → VERIFIED
Product: Cloud Services → Cloud Services Graveyard