Closed Bug 782577 Opened 12 years ago Closed 12 years ago

Migrate sync1.1 to python-memcached instead of pylibmc

Categories

(Cloud Services Graveyard :: Server: Sync, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED WONTFIX

People

(Reporter: rfkelly, Unassigned)

Details

(Whiteboard: [qa+])

Attachments

(2 files)

This is a potential alternative to Bug 781727.

In my tests, python-memcached is approximately twice as slow as pylibmc but it does not block the gevent event loop.

When we had the always-very-fast response guarantees of memcached I think pylibmc was clearly the better option.  But with couchbase, we've been seeing that heavy write loads can cause it to start responding very slowly, making pylibmc block for a second or more.  I'd like to evaluate python-memcached in stage to see how it handles this scenario.
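To make the trade-off concrete, here is a minimal sketch of the difference between a blocking wait and a cooperative one. It uses the standard-library asyncio loop as a stand-in for gevent (an assumption made so the example runs without extra packages), and `time.sleep` / `asyncio.sleep` as stand-ins for a slow server reply. The names `max_loop_gap`, `blocking_call` and `cooperative_call` are hypothetical and do not come from the sync1.1 code. The likely mechanism is that pylibmc does its socket I/O inside a C library where the event loop cannot intercept it, while python-memcached uses Python sockets that gevent can make cooperative.

```python
import asyncio
import time


async def max_loop_gap(client_call, tick=0.01):
    """Run client_call while a ticker measures the longest event-loop stall."""
    worst = 0.0
    done = False

    async def ticker():
        nonlocal worst
        last = time.monotonic()
        while not done:
            await asyncio.sleep(tick)
            now = time.monotonic()
            worst = max(worst, now - last)
            last = now

    task = asyncio.create_task(ticker())
    await asyncio.sleep(0)  # let the ticker start before the call
    await client_call()
    done = True
    await task
    return worst


async def blocking_call():
    # Stand-in for a C-extension client waiting on a slow server:
    # the whole thread sleeps, so no other task can run.
    time.sleep(0.3)


async def cooperative_call():
    # Stand-in for a pure-Python client on cooperative sockets:
    # the wait yields to the loop, so other tasks keep running.
    await asyncio.sleep(0.3)


def measure():
    """Return (worst stall with blocking call, worst stall with cooperative call)."""
    blocked = asyncio.run(max_loop_gap(blocking_call))
    cooperative = asyncio.run(max_loop_gap(cooperative_call))
    return blocked, cooperative
```

Calling `measure()` should show a stall of roughly the full wait for the blocking call and only about one tick for the cooperative call. Every other request sharing the loop pays for that stall.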

Posting for review, but I don't think we should commit it until we've tried a fully-healthy stage setup with the loadtest changes from Bug 782002 and seen what effect that has on couchbase performance.
Attachment #651670 - Flags: review?(telliott)
Whiteboard: [qa+]
Attachment #651670 - Flags: review?(telliott) → review?(rmiller)
Attachment #651670 - Flags: review?(rmiller) → review+
Ugh, forgot to change the import in test_memcachedsql.py to match.
Attachment #653593 - Flags: review?(rmiller)
Attachment #653593 - Flags: review?(rmiller) → review+
Committed in:

http://hg.mozilla.org/services/server-storage/rev/f2964668e926
http://hg.mozilla.org/services/server-storage/rev/65e3698f2433

Leaving the bug open while we see whether this helps us in stage.
Just FYI - I verified that these changes made it to Stage as part of Bug 784229.
Initial deployment of this to stage did not go well.  Under modest load we are getting "SERVER ERROR proxy downstream timeout" errors from moxi, which make python-memcached barf in strange ways.

Moxi gives these errors when the downstream couchbase server does not respond within 5s, so it might not be python-memcached's fault.  But I don't think we were seeing these errors with pylibmc.  So more investigation is needed.

We're also still getting some reports that the gevent event loop is being blocked, although I think they might be spurious.  Bug 786176 adds a better detector which should give us a more definitive answer in that regard.
OK. So marking this appropriately, since we, in effect, rolled back.
See bug 786479
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → VERIFIED
I'm going to re-open this since it is still on default branch but was backed out in the release branch.  We need to remember to back it out on default branch once we're happy that this was the right move.
Status: VERIFIED → REOPENED
Resolution: INCOMPLETE → ---
We've been having good results from pylibmc with the adjusted funkload suite, and there doesn't seem to be an easy way to work around the lack-of-connection-pooling issues we hit when trying to push this to stage.  So, I'm marking this WONTFIX.

The high-level goal to replace pylibmc will be pursued via umemcache and a custom pooling layer.
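For readers unfamiliar with what a pooling layer involves, the following is a rough sketch only, not the layer that was eventually written. It assumes a hypothetical `factory` callable that opens one client connection, and uses a hypothetical `FakeConnection` class in place of a real memcached client. The standard-library `queue` stands in for whatever cooperative queue a gevent deployment would use.

```python
import contextlib
import queue


class ConnectionPool:
    """Hand out at most `maxsize` connections, reusing idle ones."""

    def __init__(self, factory, maxsize=4, timeout=1.0):
        self._factory = factory  # hypothetical: opens one client connection
        self._timeout = timeout
        # Pre-fill with None placeholders; each slot is opened lazily.
        self._slots = queue.LifoQueue(maxsize)
        for _ in range(maxsize):
            self._slots.put(None)

    @contextlib.contextmanager
    def reserve(self):
        # Raises queue.Empty if every connection stays checked out
        # for longer than the timeout.
        conn = self._slots.get(timeout=self._timeout)
        try:
            if conn is None:
                conn = self._factory()
            yield conn
        except Exception:
            # After an error the connection state is unknown, so drop it
            # and let the slot be reopened on the next checkout.
            conn = None
            raise
        finally:
            self._slots.put(conn)


class FakeConnection:
    """Hypothetical stand-in for a real memcached client connection."""

    opened = 0

    def __init__(self):
        FakeConnection.opened += 1
        self.store = {}
```

A caller would wrap each operation in `with pool.reserve() as conn:` so that a connection is always returned to the pool, and so that the number of open sockets to the proxy is capped at `maxsize` regardless of how many requests are in flight.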

Backed out on default branch in three steps:

http://hg.mozilla.org/services/server-storage/rev/cd93acfa2cbe
http://hg.mozilla.org/services/server-storage/rev/c0b6bde29ea6
http://hg.mozilla.org/services/server-storage/rev/74800783777e
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
OK. Once again, marking this Verified.
Status: RESOLVED → VERIFIED
Product: Cloud Services → Cloud Services Graveyard