Closed Bug 781727 Opened 12 years ago Closed 12 years ago

Migrate sync1.1 to umemcache instead of pylibmc

Categories

(Cloud Services Graveyard :: Server: Sync, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED WONTFIX

People

(Reporter: rfkelly, Assigned: rfkelly)

Details

(Whiteboard: [qa+])

Attachments

(3 files)

Per Bug 774544, pylibmc blocks the gevent event loop.  This is usually fine because memcache returns very quickly, but we've seen it cause problems on stage when the couchbase server gets bogged down.
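A minimal sketch of the failure mode described above. It uses asyncio from the standard library in place of gevent, a swapped-in event loop chosen only so the example runs without extra packages. A call that blocks without yielding (here `time.sleep`, standing in for pylibmc's C-level socket I/O) stalls every other task on the loop, while a cooperative wait does not. The `heartbeat` and `run` names are hypothetical.

```python
import asyncio
import time


async def heartbeat(ticks, stop):
    # Records a timestamp roughly every 10 ms, but only when the loop lets it run.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def run(blocking):
    """Return how many heartbeats fired during a 0.2 s 'memcache call'."""
    ticks = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # give the heartbeat its first turn
    if blocking:
        # Stand-in for a C extension call that never yields to the loop.
        time.sleep(0.2)
    else:
        # Cooperative wait: the loop keeps running other tasks meanwhile.
        await asyncio.sleep(0.2)
    stop.set()
    await hb
    return len(ticks)


if __name__ == "__main__":
    print("blocking:", asyncio.run(run(True)))
    print("cooperative:", asyncio.run(run(False)))
```

With the blocking call the heartbeat only gets its initial turn; with the cooperative wait it keeps ticking, which is the behaviour a gevent-friendly client needs when the memcache backend is slow.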

We should change to use umemcache, which explicitly uses the python socket module for gevent-compatibility.
I've had a first go at this, attaching the results in two parts.  The first is a copy of the mozsvc/storage/mcclient.py file to syncstorage/mcclient.py, with some light edits for compatibility with the pre-pyramid codebase.

We have plans to split this out as part of a utility lib, which could be shared between mozsvc-based apps and server-core-based ones.  But that's not ready yet.  Building on a local copy of the same API seems like a good way to move forward while planning to merge implementations in the future.
Attaching changes to the existing server-storage code to use the mcclient.py functionality.

It's a bit fiddly because we have to emulate the serialization behaviour of pylibmc/python-memcache, which store strings and ints directly but pickle more complex objects.  I personally find this a bit weird, but we could consider making it the default behaviour of mcclient.py in the name of compatibility.
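A rough sketch of what that emulation involves, assuming a client that lets us send an arbitrary byte payload plus an integer flags field with each value. The flag values are modelled on the python-memcached convention (bit 0 for pickled, bit 1 for integer) and are an assumption here; check them against the client you need to interoperate with. `serialize` and `deserialize` are hypothetical helper names, and Python 3 `bytes` stand in for Python 2 `str`.

```python
import pickle

# Assumed flag values, modelled on python-memcached (bit 0 = pickled,
# bit 1 = integer). Verify against the client whose data you must read.
FLAG_PICKLE = 1 << 0
FLAG_INTEGER = 1 << 1


def serialize(value):
    """Return (payload, flags) for a value about to be stored in memcache."""
    if isinstance(value, bytes):
        # Byte strings (Python 2 "str") go over the wire untouched.
        return value, 0
    if isinstance(value, int) and not isinstance(value, bool):
        # Integers are stored as decimal text, so memcache incr/decr still work.
        return str(value).encode("ascii"), FLAG_INTEGER
    # Everything else (dicts, lists, bools, unicode text...) is pickled.
    return pickle.dumps(value), FLAG_PICKLE


def deserialize(payload, flags):
    """Invert serialize() using the flags memcache returned with the value."""
    if flags & FLAG_PICKLE:
        return pickle.loads(payload)
    if flags & FLAG_INTEGER:
        return int(payload)
    return payload
```

For example, `serialize(42)` gives `(b"42", FLAG_INTEGER)`, and `deserialize(*serialize({"a": [1, 2]}))` gives back the original dict.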

And a minor annoyance: umemcache doesn't have an API for the flush_all() command, so the unittest cleanup code gets a bit uglier.
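One way the test cleanup could work around the missing API is to speak the memcache text protocol directly: the server accepts `flush_all\r\n` and replies `OK\r\n`. The sketch below is hypothetical test-helper code, not part of umemcache, and it opens its own connection (default memcached port 11211) rather than reusing the client's socket. Because it only uses the standard `socket` module, gevent's socket monkey-patching would make these calls cooperative as well.

```python
import socket


def flush_all(sock):
    """Send the memcache text-protocol "flush_all" command on a connected
    socket and return True if the server answered "OK"."""
    sock.sendall(b"flush_all\r\n")
    reply = b""
    # Read until the reply line is complete.
    while not reply.endswith(b"\r\n"):
        chunk = sock.recv(1024)
        if not chunk:
            raise ConnectionError("memcache server closed the connection")
        reply += chunk
    return reply == b"OK\r\n"


def flush_server(host="127.0.0.1", port=11211, timeout=5.0):
    """Hypothetical test helper: open a throwaway connection and flush it."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return flush_all(sock)
```

A unittest `tearDown` could then call `flush_server()` against the test memcached instance instead of relying on a client-level `flush_all()`.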
OK, three parts.  We also need some custom build commands because the umemcache sdist from pypi doesn't play nicely with pypi2rpm.
I haven't flagged these patches for review because it all seems a little bit messy, so I'm not sure this is necessarily the best way to go.  I'd be happy for them to be committed as-is; there's similar code in mozsvc already.  But it's a lot of changes == a lot of chances for subtle bugs or incompatibilities.

We want to commit *something* that's gevent compatible so we can push ahead with load testing on stage.  But I'm wondering if it would be simpler to fall back to python-memcached for now, and spend a bit more time on the ultramemcached changes.  Thoughts?
I'm a bit nervous about python-memcached. Despite the fact that it should be non-blocking, other folks who have tested w/ it have shown it to perform less well under gevent than w/o:

https://groups.google.com/forum/#!msg/gevent/Rm1Bsd7qpJs/jJ5UG6Z3OJkJ[1-25]
Whiteboard: [qa+]
I think getting involved with the ultramemcached folks looks like a good long-term solution, but I don't want us to get hung up in the short run. So, the question for me is "does python-memcached impact our current level of performance?"

If the locking just means the webheads can handle 3x our current traffic rather than 10x, this is a high-class problem and shouldn't block getting out all those other goodies that we really want out there. If it's actually hurting performance, then we obviously need to look at other solutions.
Let's not do this, it's too much code churn in pursuit of a problem we don't 100% understand yet.  We can try going back to python-memcached if it proves worthwhile in stage (Bug 782577) and will look to incorporating a proposed new umemcached-based library once we have it a little more ready.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Status: RESOLVED → VERIFIED
Reopening this after the rather dismal failure of our initial attempt to use python-memcached.

I have pushed this into the 1.13-release branch on mercurial, so that we can try it out on stage in the hopes of getting to the bottom of our problems:

  http://hg.mozilla.org/services/server-storage/rev/1021a4016599

It's marked with a big "DO NOT MERGE" since obviously this hasn't been reviewed or even decided upon.  Hopefully we'll be able to try this out tomorrow and get some data.
Status: VERIFIED → REOPENED
Resolution: WONTFIX → ---
Please be sure to have a look at what tarek did about umemcache (we needed it for zamboni): https://github.com/tarekziade/ultramemcache/
:alexis thanks for the heads-up!

Bug 786569 reports the results of our experimental deployment of this.  Summary: umemcache worked fine; I suspect some bugginess in my queueing implementation.  We'll coordinate with the stuff you guys are doing for zamboni and try to bring it in as a dependency in some future release.

Returning to WONTFIX.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → WONTFIX
OK. Returning to Verified.
Status: RESOLVED → VERIFIED
Product: Cloud Services → Cloud Services Graveyard