Closed Bug 688623: Opened 14 years ago, Closed 12 years ago

Storage server is matching on 'client', not 'clients'

Categories

(Cloud Services Graveyard :: Server: Sync, defect, P1)

x86
macOS

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: telliott, Assigned: rfkelly)

Details

(Whiteboard: [qa+][sync 2.0])

Attachments

(2 files, 4 obsolete files)

http://hg.mozilla.org/services/server-storage/file/9823a3f4b519/syncstorage/storage/sql.py#l61

s/client/clients/

Right now, every user is creating an entry in the collections table for 'clients', which is supposed to be a standard collection.
Assignee: nobody → telliott
Attached patch changes over to 'clients' (obsolete) — Splinter Review
Attachment #561896 - Flags: review?(tarek)
OK, here's the full scoop:

As a DB optimization, we use numeric codes to identify collections. Because we want to support collections beyond the 'official' ones (or did; perhaps this needs revisiting), we needed a way to map those collection names to numbers. Thus was born the collections table. Official collections are stored in a constant list with ids 1-12. User-generated collections get added to the collections db, starting with 100. Since most users never go beyond the standard collections, this is pretty efficient.

However, there was a typo in the transition from php to python. The 'clients' collection was listed in the constants hash as 'client' (as, it turns out, was 'key'/'keys'). The result of this when we moved over to python is that every user tried to get their clients collection, discovered it didn't exist and reuploaded everything (this, along with the memcache reset, was probably why the python transition was so melty). When they uploaded to 'clients', the server happily complied, giving them a row in the collections table. This, by the way, is why it never showed up in tests - everything performed just fine; the only (minor) hit was to efficiency when starting from empty.

So, the vast majority of our users have a row in the collections table that looks like:

<userid> | 100 | clients

Of course, the collectionid will be different for a few users, especially if they created an unofficial collection during the php era.

The easiest fix is as follows:

1) Down the servers.
2) Push updated code.
3) For each wbo table run:

update wboX as w join collections as c on w.username = c.userid and c.name = 'clients' and w.collection = c.collectionid set w.collection = 1;

and

update wboX as w join collections as c on w.username = c.userid and c.name = 'keys' and w.collection = c.collectionid set w.collection = 5;

4) Bring the servers back up. Things should work as expected.
5) Once we're happy, delete all rows in the collections table that refer to 'keys' and 'clients'.

A couple other notes that will get their own bugs:

* Early on, the ability to migrate a user was very important. This is why collectionids are per-user for custom collections. Since migrations have become less important I've been toying with moving this to a per-collection value, and this is just more impetus to do so. Things will still be different between nodes, but all the users on a node will be constant.
* There also appears to be a problem in the python code where custom codes start right after the constants, rather than with 100. This gives us no room to add new constants in the future and will need to be fixed.
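To make the scheme described in the comment above concrete, here is a minimal sketch of the name-to-id resolution. It is not the actual syncstorage code; the helper, the per-user dict, and any ids other than clients=1 and keys=5 are illustrative assumptions.

# Illustration only: simplified name -> id resolution as described above.
# Constant names/ids other than clients=1 and keys=5 are assumptions.
STANDARD_COLLECTIONS = {1: "clients", 5: "keys"}   # ids 1-12 in the real list
STANDARD_IDS = dict((name, _id) for _id, name in STANDARD_COLLECTIONS.items())

FIRST_CUSTOM_ID = 100   # per-user custom collections start here

def collection_id(user_custom_ids, name):
    """Resolve a collection name to its numeric id for a single user."""
    if name in STANDARD_IDS:
        return STANDARD_IDS[name]            # constant; never touches the db
    if name not in user_custom_ids:          # stand-in for the collections table
        next_id = max(user_custom_ids.values() or [FIRST_CUSTOM_ID - 1]) + 1
        user_custom_ids[name] = next_id
    return user_custom_ids[name]

With the typo'd constant 'client', a lookup for 'clients' misses the first branch and falls through to the custom path, which is exactly how nearly every user ended up with a <userid> | 100 | clients row.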
Attachment #561896 - Attachment is obsolete: true
Attachment #561896 - Flags: review?(tarek)
Attachment #562200 - Flags: review?(tarek)
Attachment #562200 - Flags: review?(tarek) → review+
OK, figured out a way to do the fixes with no/minimal downtime:

1) Set up node assignment such that new assignments are all going to a node or two that have no current users on them.
2) Pull a webhead out of standard rotation and use it to only serve those nodes. Install the updated version on it.
3) Do the following sql on all the other nodes:

update wboX as w join collections as c on w.username = c.userid and c.name = 'clients' and w.collection = c.collectionid set w.collection = 1, c.collectionid = 1;

update wboX as w join collections as c on w.username = c.userid and c.name = 'keys' and w.collection = c.collectionid set w.collection = 5, c.collectionid = 5;

At this point, the collections tables will have entries for those collections that correspond to the 'correct' constants, and the wbo tables will match the collections table, thus looking up correctly.

4) Install the updated server code on all webheads. The 'correct' constants are still used, but now they come out of the constants hash rather than the collections table.
5) Undo steps 1 and 2.
6) Execute the cleanup sql:

delete from collections where collectionid = 1;
delete from collections where collectionid = 5;
actually, reverse 1 and 2 here, but close enough
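For reference, a minimal sketch of running the repair SQL from the plan above over every wboX shard on one node. The shard count, the DB-API connection, and the FIXUPS list are assumptions for illustration, not the production tooling.

# Sketch only: apply the clients/keys repair from the comments above to each
# wboX shard on one node. Shard count and connection handling are assumptions.
FIXUPS = [("clients", 1), ("keys", 5)]

REPAIR_SQL = (
    "UPDATE wbo%d AS w "
    "JOIN collections AS c "
    "  ON w.username = c.userid "
    " AND c.name = %%s "
    " AND w.collection = c.collectionid "
    "SET w.collection = %%s, c.collectionid = %%s"
)

def repair_node(conn, num_shards):
    """conn is any DB-API connection to the node's MySQL database."""
    cur = conn.cursor()
    try:
        for shard in range(num_shards):
            for name, const_id in FIXUPS:
                cur.execute(REPAIR_SQL % shard, (name, const_id, const_id))
        conn.commit()
    finally:
        cur.close()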
Toby - is this planned for any upcoming deployment, maintenance window, or change window?
(In reply to James Bonacci [:jbonacci] from comment #6)
> Toby - is this planned for any upcoming deployment, maintenance window, or
> change window?

Ops and Dev have not scheduled a date for this to go to production. We'll know more next week once Pete is back and we've all had a chance to discuss.
Another potential solution that doesn't require putting users on new nodes: we add a config variable that says "fixed = true" if the node has been fixed. In the storage code, when we initialize the node, we set the constant list for that instance to the corrected ones where fixed is true. Once all nodes are fixed, we push code without the conditional code path, then drop fixed = true. The downside of this approach is that the node will need to be downed while the sql is being run.
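A rough sketch of how that flag might be wired, assuming an init-time boolean read from config; the class and constant names here are stand-ins, not the real module.

# Sketch of the "fixed = true" idea: nodes whose data has been repaired
# switch to the corrected constants. Names here are stand-ins.
BROKEN_COLLECTIONS = {1: "client", 5: "key"}      # pre-fix constants (typo'd)
CORRECT_COLLECTIONS = {1: "clients", 5: "keys"}   # corrected constants

class SQLStorage(object):
    def __init__(self, sqluri, fixed=False, **kw):
        self.sqluri = sqluri
        # Only nodes flagged fixed=true in their config resolve the corrected
        # names as constants; the rest keep the old behaviour until repaired.
        self._collections_by_id = CORRECT_COLLECTIONS if fixed else BROKEN_COLLECTIONS
        self._collections_by_name = dict(
            (name, _id) for _id, name in self._collections_by_id.items())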
Whiteboard: [qa+]
I have backed out the patch for this bug at telliott's recommendation: http://hg.mozilla.org/services/server-storage/rev/e71423ce22de

We need a plan for fixing the data before we can deploy this change, so we don't want it in the tree delaying the release of other fixes.
Assignee: telliott → rkelly
Whiteboard: [qa+] → [qa+][sync 2.0]
2.0 will fix this and 1.1 will eventually diaf.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
Worth considering for the AWS version? We're on separate webservers there, so there's no danger of clobbering old data, and we get to avoid a ton of collections lookups. On the downside, we'd be running a forked version on AWS. Thoughts?
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
(In reply to Toby Elliott [:telliott] from comment #11)
> Worth considering for the AWS version? We're on separate webservers there,
> so there's no danger of clobbering old data, and we get to avoid a ton of
> collections lookups. On the downside, we'd be running a forked version on
> AWS. Thoughts?

It's funny, we've probably re-installed just about every DB machine on which this bug was present by now - if only we'd known we could have actually fixed the problem!

Rather than forking an aws-specific version of the code, we could make a new config switch that's off by default, but switched on in AWS. It would be something awful like "[storage]extra_standard_collections = true", but it would nicely solve this issue without introducing too much management overhead. Attaching a quick attempt at something along those lines...
Ugh. Awful from code POV, but nice from management POV. The code might look nicer if I merged STANDARD_COLLECTIONS and EXTRA_STANDARD_COLLECTIONS into a local instance attribute. Anyway, thoughts?
Attachment #759530 - Flags: feedback?(telliott)
(In reply to Ryan Kelly [:rfkelly] from comment #12)
> It's funny, we've probably re-installed just about every DB machine on which
> this bug was present by now - if only we'd known we could have actually
> fixed the problem!

While this is true, it unfortunately wasn't sufficient - they had to all be done at the same time. A config option to fix it is just fine from my perspective.
Comment on attachment 759530 [details] [diff] [review]
quick hack to demo how extra_standard_collections might work

Review of attachment 759530 [details] [diff] [review]:
-----------------------------------------------------------------

This will work. See below for a couple of questions.

::: syncstorage/storage/sql.py
@@ +373,4 @@
>          table.create(checkfirst=True)
>          self.engine_name = self._engine.name
>          self.standard_collections = standard_collections
> +        self.extra_standard_collections = extra_standard_collections

Better to check two hashes than simply add the extras into the collections hash on init?

@@ +545,5 @@
> 
> +        if self.extra_standard_collections:
> +            ids = EXTRA_STANDARD_COLLECTIONS.keys()
> +            min_id = max(ids) + 100
> +        elif self.standard_collections:

The lines below are the other big error, but is this a better fix than the original (php) min_id = 100? Unlikely to ever come up given the state of the project, but it seems like it would be easier for ops to have a consistent starting point in the collections table across all instances.
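To illustrate the min_id question raised above, here are the two allocation strategies side by side; the dict contents are illustrative, not the patch.

# Two ways to pick the first id for user-created collections, as discussed
# in the review above. Ids shown are illustrative only.
EXTRA_STANDARD_COLLECTIONS = {1: "clients", 5: "keys"}  # plus the other constants

# As in the quoted patch: the floor moves with the largest constant id.
min_id_patch = max(EXTRA_STANDARD_COLLECTIONS.keys()) + 100

# As in the original php code and the review suggestion: a constant floor,
# identical on every instance, with ids below 100 reserved for new constants.
min_id_php = 100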
Attachment #759530 - Flags: feedback?(telliott) → feedback+
It occurs to me that if we're going to do this, we should add 'addons' to the fixed collection lists. That's getting some uptake.
Given that we're not actively working on this, I think an approach like this might work. (Note: untested. My server-storage is horked. Will chat about that this afternoon)
Attachment #761086 - Flags: feedback?(rfkelly)
(Fixed an error)
Attachment #761086 - Attachment is obsolete: true
Attachment #761086 - Flags: feedback?(rfkelly)
Attachment #761089 - Flags: feedback?(rfkelly)
Toby and I discussed another benefit of this approach IRL, and I want to note it here. Since the "fixed_collections" setting is per Storage backend, we can selectively enable it for some databases and leave it disabled for others. This gives us a nice path towards fixing the issue on the current production machines: take a db down, flag it as fixed_collections in the config, bring it back up and let everyone re-upload their data into the newly-fixed database. Repeat one at a time for all dbs.
While purging currently-ok dbs may be overkill, this is definitely something we should do on any new node we spin up.
Comment on attachment 761089 [details] [diff] [review]
Another approach to the different collections

Review of attachment 761089 [details] [diff] [review]:
-----------------------------------------------------------------

I like this tweaked approach, will start from your patch and add a couple of tests etc.
Attachment #761089 - Flags: feedback?(rfkelly) → feedback+
OK, here's an updated patch with a few tweaks:

* take a local reference to the appropriate dicts, to avoid lots of duplicate code that switches between STANDARD_COLLECTIONS and FIXED_COLLECTIONS (a rough sketch of this follows below)
* enable fixed_collections=true in the tests
* remove an apparently-useless call to set_collection()
* and a little bit of formatting while I was in there
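A minimal sketch of the "local reference" tweak from the first bullet; it is not the patch itself, and the method, attribute names, and dict contents are assumptions for illustration.

# Sketch only: take a reference to the right dict once at init, so callers
# don't keep switching on fixed_collections. Dict contents are illustrative.
STANDARD_COLLECTIONS = {1: "client", 5: "key"}    # legacy constants, abbreviated
FIXED_COLLECTIONS = {1: "clients", 5: "keys"}     # corrected constants, abbreviated

class SQLStorage(object):
    def __init__(self, sqluri, fixed_collections=False, **kw):
        source = FIXED_COLLECTIONS if fixed_collections else STANDARD_COLLECTIONS
        self._collections_by_id = dict(source)
        self._collections_by_name = dict(
            (name, _id) for _id, name in source.items())

    def get_collection_id(self, user_id, name):
        # Constants resolve locally; only genuinely custom names would need
        # a collections-table lookup (omitted in this sketch).
        try:
            return self._collections_by_name[name]
        except KeyError:
            raise NotImplementedError("custom collection lookup omitted")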
Attachment #759530 - Attachment is obsolete: true
Attachment #761089 - Attachment is obsolete: true
Attachment #762536 - Flags: review?(telliott)
Comment on attachment 762536 [details] [diff] [review]
patch to allow use of fixed collection names

Man, I spent forever tracing through to try to figure out what that set_collection was trying to do. Wrapping it in that if statement makes no sense at all! Anyway, this does seem like the best of all worlds. Looks good.
Attachment #762536 - Flags: review?(telliott) → review+
Committed in http://hg.mozilla.org/services/server-storage/rev/6f2ea2dbcc08

I think that's about as "fixed" as this is going to get, so closing out the bug.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 12 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard