tool to wipe all memcache data for given user(s)

VERIFIED FIXED

Status

Cloud Services
Server: Sync
VERIFIED FIXED
6 years ago
6 years ago

People

(Reporter: atoll, Assigned: telliott)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [qa+])

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

6 years ago
As part of solving the HMAC mismatch bug, it turns out that we're not wiping the user's data in memcache.  So let's do that.  We need a tool that can wipe out the data in memcache for a given list of userids on STDIN.  Userids can be in any format, just let us know.  (Username, numeric userid, ldap dn, whatever.)

This is also known as the "HMAC mismatch bug prevention tool", decide ETA accordingly.
Background explanation: an HMAC mismatch occurs when a client downloads a record which was encrypted with a different key than the one they are holding for that collection.

In this case, it's possible for the following to happen:

* Client A uploads key K1.
* Client A uploads tab record K1(R1).
* Server is partially wiped: DB-stored data (K1) is eliminated.
* Client B generates and uploads new key K2.
* Client B downloads tab record K1(R1).
* HMAC mismatch.
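
The sequence above can be reproduced in miniature with Python's standard hmac module. This is only a sketch of the principle, not Sync's record format: the key values, the record bytes, and the sign/verify helper names are all made up for illustration, and SHA-256 is an assumed digest.

```python
import hashlib
import hmac

# Hypothetical fixed keys standing in for K1 (Client A) and K2 (Client B).
K1 = b"K1" * 16
K2 = b"K2" * 16


def sign(key, payload):
    """Return the HMAC-SHA256 tag of payload under key."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(key, payload, tag):
    """Check the tag in constant time; False means an HMAC mismatch."""
    return hmac.compare_digest(sign(key, payload), tag)


if __name__ == "__main__":
    record = b"tab record R1"
    tag = sign(K1, record)            # Client A uploads K1(R1)
    # The server loses K1; Client B generates K2 and downloads the old record.
    print("A verifies:", verify(K1, record, tag))
    print("B verifies:", verify(K2, record, tag))
```

The record left behind in memcache is still tagged with K1, so any client holding only K2 fails verification until that stale record is removed.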

Builds after Bug 650208 are no longer susceptible to this bug, because when a new key is uploaded, all server data is deleted. However, this won't land for another 3 or 4 months (Firefox 6), so it would be valuable to avoid the situation. The most direct way to do that is to kill the contents of memcache for a migrated user.

(A workaround for tab HMAC problems is to temporarily disable, sync, then re-enable tab sync.)
(Assignee)

Comment 2

6 years ago
To clarify: does "server is partially wiped" mean that we do the wiping, due to a disk problem or migration?
(Reporter)

Comment 3

6 years ago
yes
(Reporter)

Comment 4

6 years ago
(oops.)

Yes, that was the known cause.  There could be other ways to trigger that scenario, but this tool was requested specifically to address the one scenario where we know we're causing the problem.

Updated

6 years ago
Group: services-infra
(Assignee)

Updated

6 years ago
Assignee: nobody → telliott
(Assignee)

Comment 5

6 years ago
Created attachment 539934 [details]
Script to clean out a user's memcache. Designed to work in the ops node purge chain

cat data | memcache_clean.py server1 server2 server3 server4

Will output something like:

Cleaning id: 1 (OK)
Cleaning id: 2 (OK)
Cleaning id: 3 (OK)
...


It's an ops script, so I don't think it belongs in any of our hg repos. Just wherever you have the rest of this chain.
Attachment #539934 - Flags: review?(rsoderberg)
(Reporter)

Updated

6 years ago
Attachment #539934 - Flags: review?(rsoderberg) → review+
(Reporter)

Comment 6

6 years ago
Curious if attachment 539934 [details] works for python and php, or just python.
(Assignee)

Comment 7

6 years ago
It doesn't actually care about the data, so unless the keys are encoded differently (which would shock me) or the hashing algo to determine a server is different (which wouldn't), should work for either.
Linkage: Bug 646269.

atoll, if this is r+, how close is this to RESOLVED FIXED?
OS: Mac OS X → All
Hardware: x86 → All
(Reporter)

Comment 9

6 years ago
(In reply to comment #8)
> Linkage: Bug 646269.
> 
> atoll, if this is r+, how close is this to RESOLVED FIXED?

This bug appears to live outside of process (no hg, no developer r?, no qa), so there's no clear answer to your question.

I would expect no sooner than Python sync is deployed to production.
(Reporter)

Comment 10

6 years ago
(In reply to comment #7)
> It doesn't actually care about the data, so unless the keys are encoded
> differently (which would shock me) or the hashing algo to determine a server
> is different (which wouldn't), should work for either.

"just python", then. Thanks!
(Assignee)

Comment 11

6 years ago
(In reply to comment #9)

> This bug appears to live outside of process (no hg, no developer r?, no qa),
> so there's no clear answer to your question.

I assume you're going to take it and put it in the same location as the rest of the chain it belongs to. This is really just an ops script that happened to be written by a developer, so I'd put it through the same process as the rest of that chain.
Maybe https://hg.mozilla.org/services/admin-scripts/ could be the place where we collect all those scripts.

In the future, it could be packaged and contain more python goodies to deal with the various servers, use core, etc.
Comment on attachment 539934 [details]
Script to clean out a user's memcache. Designed to work in the ops node purge chain

Since we did not retain the memcached python/php compatibility, the keys are different in python.

The keys to wipe are:

"UID:tabs"
"UID:meta:global"
"UID:size"
"UID:collections:stamps:NAME"
"UID:stamps"

with UID = user id, NAME = collection name
Scratch this one (typo): "UID:collections:stamps:NAME"
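
For reference, a minimal sketch of how a cleaner could walk those keys. It is not the attached script: keys_for_user, clean_users and FakeClient are hypothetical names, and the retracted "UID:collections:stamps:NAME" key is left out. The client only needs a delete(key) method, which python-memcached's memcache.Client provides; the in-memory fake is there so the sketch runs without a server.

```python
# Key suffixes from the list above, minus the retracted typo entry.
KEY_SUFFIXES = ("tabs", "meta:global", "size", "stamps")


def keys_for_user(uid):
    """Build the "UID:suffix" memcache keys for one user id."""
    return ["%s:%s" % (uid, suffix) for suffix in KEY_SUFFIXES]


def clean_users(client, lines):
    """Delete every known key for each user id in lines; return the ids seen."""
    cleaned = []
    for line in lines:
        uid = line.strip()
        if not uid:
            continue  # ignore blank input lines
        for key in keys_for_user(uid):
            client.delete(key)
        cleaned.append(uid)
    return cleaned


class FakeClient(object):
    """Hypothetical in-memory stand-in for a memcache client."""

    def __init__(self, data):
        self.data = dict(data)

    def delete(self, key):
        self.data.pop(key, None)
        return True


if __name__ == "__main__":
    demo = FakeClient({"1:tabs": "...", "1:size": 10, "2:stamps": "..."})
    for uid in clean_users(demo, ["1\n", "2\n"]):
        # Mirrors the sample output format; delete results are not checked here.
        print("Cleaning id: %s (OK)" % uid)
```

In real use the lines would come from sys.stdin and the client would be built from the server list given on the command line. Keys outside the known suffixes are untouched, so anything missed from the list stays in memcache.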
(Assignee)

Comment 15

6 years ago
Created attachment 542199 [details]
revised version with python memcache names

Attaching one with the new names. We should run this against an actual python install, though I don't know how we'd pick up ones that we missed.
(Assignee)

Comment 16

6 years ago
Created attachment 542204 [details]
revised version with python memcache names
Attachment #542199 - Attachment is obsolete: true

Updated

6 years ago
Attachment #542204 - Flags: review+
(Reporter)

Comment 17

6 years ago
While testing to see if it can handle uidnumbers, I determined that it cannot:

[root@wp-web01.phx.weave petef]# echo 2222 | ./memcache_dump.py
Traceback (most recent call last):
  File "./memcache_dump.py", line 21, in <module>
    print "%s:%s\t%s" % (username, key, memc.get("%s:%s" % (id, key)))
  File "/usr/lib/python2.6/site-packages/memcache.py", line 793, in get
    return self._get('get', key)
  File "/usr/lib/python2.6/site-packages/memcache.py", line 762, in _get
    server, key = self._get_server(key)
  File "/usr/lib/python2.6/site-packages/memcache.py", line 296, in _get_server
    server = self.buckets[serverhash % len(self.buckets)]
ZeroDivisionError: integer division or modulo by zero
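
The last frame of the traceback takes serverhash modulo len(self.buckets), so the bucket list was empty when the key lookup ran. The command above passes no server arguments, which may be why, though that is an assumption. A minimal sketch of a guard, where pick_server is a hypothetical helper and not part of python-memcached:

```python
def pick_server(servers, serverhash):
    """Pick a server by hash, failing clearly when the list is empty."""
    if not servers:
        # Without this check, the modulo below raises ZeroDivisionError.
        raise ValueError("no memcache servers given")
    return servers[serverhash % len(servers)]


if __name__ == "__main__":
    print(pick_server(["server1:11211", "server2:11211"], 3))
    try:
        pick_server([], 3)
    except ValueError as exc:
        print("error:", exc)
```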
(Reporter)

Comment 18

6 years ago
above comment in wrong bug, sorry
(Assignee)

Comment 19

6 years ago
Script in use and seems happy.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(Reporter)

Comment 20

6 years ago
lives in sysadmins/svc/scripts/ now, which only ops can see.
Whiteboard: [qa+]
Status: RESOLVED → VERIFIED