As part of solving the HMAC mismatch bug, it turns out that we're not wiping the user's data in memcache. So let's do that. We need a tool that can wipe out the data in memcache for a given list of userids on STDIN. Userids can be in any format, just let us know which. (Username, numeric userid, LDAP DN, whatever.) This is also known as the "HMAC mismatch bug prevention tool"; decide ETA accordingly.
Background explanation: an HMAC mismatch occurs when a client downloads a record which was encrypted with a different key than the one they are holding for that collection. In this case, it's possible for the following to happen:

* Client A uploads key K1.
* Client A uploads tab record K1(R1).
* Server is partially wiped: DB-stored data (K1) is eliminated.
* Client B generates and uploads new key K2.
* Client B downloads tab record K1(R1).
* HMAC mismatch.

Builds after Bug 650208 are no longer susceptible to this bug, because when a new key is uploaded, all server data is deleted. However, this won't land for another 3 or 4 months (Firefox 6), so it would be valuable to avoid the situation. The most direct way to do that is to kill the contents of memcache for a migrated user.

(A workaround for tab HMAC problems is to temporarily disable, sync, then re-enable tab sync.)
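The sequence above can be replayed in a few lines. This is only a sketch under stated assumptions: the two dicts are hypothetical stand-ins for the DB and memcache, the keys are fixed dummy bytes, and the record payload is left unencrypted so that only the HMAC check is shown. It is not the actual Sync client or server code.

```python
import hashlib
import hmac


def make_record(key, payload):
    """Build a record whose HMAC tag is computed with `key`."""
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"payload": payload, "hmac": tag}


def verify_record(key, record):
    """Return True if the record's HMAC tag matches `key`."""
    expected = hmac.new(key, record["payload"], hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["hmac"])


def simulate(wipe_memcache):
    """Replay the scenario; `wipe_memcache` models running the requested tool."""
    db, memcache = {}, {}            # hypothetical stand-ins for the two stores

    k1 = b"\x01" * 32                # Client A uploads key K1
    db["keys"] = k1
    memcache["tabs"] = make_record(k1, b"R1")   # Client A uploads K1(R1)

    db.clear()                       # partial wipe: DB-stored data is gone
    if wipe_memcache:
        memcache.clear()             # what the cleanup tool would add

    k2 = b"\x02" * 32                # Client B generates and uploads K2
    db["keys"] = k2

    record = memcache.get("tabs")    # Client B downloads the tab record
    if record is None:
        return "no stale record"
    return "ok" if verify_record(k2, record) else "hmac mismatch"


print(simulate(wipe_memcache=False))
print(simulate(wipe_memcache=True))
```

With the memcache wipe in place, Client B never sees the record written under K1, which is the situation the tool is meant to guarantee.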
to clarify - "server is partially wiped" meaning that we do the wiping, due to a disk problem or migration?
(oops.) Yes, that was the known cause. There could be other ways to trigger that scenario, but this tool was requested specifically to address the one scenario where we know we're causing the problem.
Created attachment 539934 [details]
Script to clean out a user's memcache. Designed to work in the ops node purge chain

    cat data | memcache_clean.py server1 server2 server3 server4

Will output something like:

    Cleaning id: 1 (OK)
    Cleaning id: 2 (OK)
    Cleaning id: 3 (OK)
    ...

It's an ops script, so I don't think it belongs in any of our hg repos. Just wherever you have the rest of this chain.
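For readers without access to the attachment, the general shape of such a filter might look like the following. This is a guess at the structure, not the attached script: `clean_ids`, `RecordingClient`, and the `("tabs",)` suffix list are hypothetical names chosen for illustration, the `id:suffix` key format is assumed, and "(OK)" here only means the deletes were issued.

```python
def clean_ids(lines, client, key_suffixes):
    """Delete every '<id>:<suffix>' key for each id read from `lines`.

    `client` is any object with a delete(key) method. Yields one status
    line per id, in the style of the output shown above.
    """
    for line in lines:
        uid = line.strip()
        if not uid:
            continue                      # skip blank input lines
        for suffix in key_suffixes:
            client.delete("%s:%s" % (uid, suffix))
        yield "Cleaning id: %s (OK)" % uid


class RecordingClient(object):
    """Hypothetical stand-in for a memcache client; records deleted keys."""

    def __init__(self):
        self.deleted = []

    def delete(self, key):
        self.deleted.append(key)
        return True


client = RecordingClient()
for message in clean_ids(["1\n", "2\n", "3\n"], client, ("tabs",)):
    print(message)
```

In a real run, `lines` would be `sys.stdin` and the client would be a memcache client built from the server names given on the command line.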
It doesn't actually care about the data, so unless the keys are encoded differently (which would shock me) or the hashing algo to determine a server is different (which wouldn't), should work for either.
Linkage: Bug 646269. atoll, if this is r+, how close is this to RESOLVED FIXED?
(In reply to comment #8)
> Linkage: Bug 646269.
>
> atoll, if this is r+, how close is this to RESOLVED FIXED?

This bug appears to live outside of process (no hg, no developer r?, no qa), so there's no clear answer to your question. I would expect no sooner than Python sync is deployed to production.
(In reply to comment #7)
> It doesn't actually care about the data, so unless the keys are encoded
> differently (which would shock me) or the hashing algo to determine a server
> is different (which wouldn't), should work for either.

"just python", then. Thanks!
(In reply to comment #9)
> This bug appears to live outside of process (no hg, no developer r?, no qa),
> so there's no clear answer to your question.

I assume you're going to take it and put it in the same location as the rest of the chain it's part of. This is really just an ops script that happened to be written by a developer, so I'd put it through the same process you put the rest of that script through.
Maybe https://hg.mozilla.org/services/admin-scripts/ could be the place where we collect all those scripts. In the future, it could be packaged and contain more python goodies to deal with the various servers, use core, etc.
Comment on attachment 539934 [details]
Script to clean out a user's memcache. Designed to work in the ops node purge chain

Since we did not retain the memcached python/php compatibility, the keys are different in python. The keys to wipe are:

"UID:tabs"
"UID:meta:global"
"UID:size"
"UID:collections:stamps:NAME"
"UID:stamps"

with UID = user id, NAME = collection name
scratch this one (typo) "UID:collections:stamps:NAME"
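Putting the two comments above together, the remaining key names for one user could be generated as follows. This is a sketch: it assumes "scratch this one" means the `collections:stamps:NAME` entry is dropped and the other four stay, and `KEY_SUFFIXES` and `keys_for_user` are hypothetical names, not taken from the attachment.

```python
# Suffixes from the list above, minus the entry that was withdrawn as a typo.
KEY_SUFFIXES = ("tabs", "meta:global", "size", "stamps")


def keys_for_user(uid):
    """Return the memcache keys to wipe for one user id."""
    return ["%s:%s" % (uid, suffix) for suffix in KEY_SUFFIXES]


for key in keys_for_user("2222"):
    print(key)
```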
Created attachment 542199 [details]
revised version with python memcache names

Attaching one with the new names. We should run this against an actual python install, though I don't know how we'd pick up ones that we missed.
Created attachment 542204 [details] revised version with python memcache names
While testing to see if it can handle uidnumbers, I determined that it cannot:

[firstname.lastname@example.org petef]# echo 2222 | ./memcache_dump.py
Traceback (most recent call last):
  File "./memcache_dump.py", line 21, in <module>
    print "%s:%s\t%s" % (username, key, memc.get("%s:%s" % (id, key)))
  File "/usr/lib/python2.6/site-packages/memcache.py", line 793, in get
    return self._get('get', key)
  File "/usr/lib/python2.6/site-packages/memcache.py", line 762, in _get
    server, key = self._get_server(key)
  File "/usr/lib/python2.6/site-packages/memcache.py", line 296, in _get_server
    server = self.buckets[serverhash % len(self.buckets)]
ZeroDivisionError: integer division or modulo by zero
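A note on reading that traceback: the failing expression, `serverhash % len(self.buckets)`, can only raise ZeroDivisionError when the bucket list is empty, which suggests the client was created without any servers rather than a problem with the id format. That is an inference from the traceback alone. A minimal sketch of the same selection step with an explicit guard, where `pick_bucket` is a hypothetical helper and not part of the memcache library:

```python
def pick_bucket(server_hash, buckets):
    """Choose a server bucket by hash, failing clearly if none are configured."""
    if not buckets:
        raise ValueError("no memcache servers configured")
    return buckets[server_hash % len(buckets)]


print(pick_bucket(7, ["server1", "server2", "server3"]))

try:
    pick_bucket(7, [])
except ValueError as exc:
    print("error: %s" % exc)
```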
above comment in wrong bug, sorry
Script in use and seems happy.
lives in sysadmins/svc/scripts/ now, which only ops can see.