Closed Bug 738381 · Opened 12 years ago · Closed 12 years ago

Setup memcache for bedrock/mozilla.org

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

Hardware: x86
OS: macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlong, Assigned: nmaul)

References

Details

We could really use memcache on mozilla.org to cache things that should only be generated once. Is it possible to set it up by next Thursday, when we are hoping to push a bunch of bedrock stuff live?
How much cache do you think you might need? Seamicro Atom nodes and/or VMs should be able to do this pretty well, and we have plenty of them available right now, but the total RAM might be somewhat limited. That might still be enough to get you started.

We can do a "real" memcache node (several GB of usable cache space), but that might involve ordering hardware.

We also have some ready-to-go "generic" memcache nodes, but using them would violate the idea that www.mozilla.org should be somewhat isolated from other things, so as to avoid resource contention causing issues with it (very important site, don't want random-thing.mozilla.org breaking it). So that's non-ideal, IMO.


Any comments, cshields? I don't think the Seamicro Xeons will be ready any time soon, since that's still blocked on so many power/network things. So it seems to me we should either spin up some Atoms or VMs, or allocate/buy some blades.
At least at first, we'll just use it to cache the output of RSS feeds from blogs and things like that. I can't imagine needing very much RAM. It will be a global cache, exactly the same for every single user, refreshed every 30 minutes or so.

I would bet even 256 or 512MB would be enough, but I can't say for sure until we profile it.
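
To give a concrete picture, the usage is basically: a cron job writes each parsed feed into the cache, and the views just read it back. This is an illustrative sketch only, not actual bedrock code; the key name and helper are made up:

# Illustrative sketch: the ~30-minute cron repopulates the key, views only read it.
from django.core.cache import cache

FEED_CACHE_KEY = 'blog_feed:hacks'  # hypothetical key
FEED_TTL = 60 * 60                  # keep entries a bit longer than the refresh interval

def fetch_and_parse_feed():
    # Placeholder: the real job fetches and parses the RSS feed here.
    return {'items': []}

def update_feed_cache():
    cache.set(FEED_CACHE_KEY, fetch_and_parse_feed(), FEED_TTL)

def get_blog_feed():
    return cache.get(FEED_CACHE_KEY)  # may be None if the cron hasn't run yet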
Blocks: 736338
Any word on getting this done by Thursday? We'd like to use it to cache a blog feed.
Hardware for this is allocated:

https://inventory.mozilla.org/en-US/systems/show/5042/

@phong, can you have someone kickstart/puppetize this? RHEL6, x86-64... I can get its puppet manifests straightened out once it's online.

Thanks!
Assignee: server-ops → phong
Thanks Jake! I'll assume it'll be ready by Thursday. If not, I think I can switch to a filesystem-based cache.
Kickstarting right now.
bedrock-memcache1.webapp.phx1.mozilla.com is kickstarted, puppetized and added in Nagios with generic checks.
Reassigning to Jakem.
Assignee: phong → nmaul
We're not relying on this for tomorrow's release, though it would be great to get up and running soon.
No longer blocks: 736338
Working on this today.
This system is up and should be usable, at least for prod (need to open a netops ACL bug for dev/stage).

Let me know what settings you need in place to start using this.
Jake: Were you able to hook up memcache for dev/stage yet too? Those don't need to be too beefy, they just need to work :)

jlongster: Can you tell Jake what settings need to be put in place for memcache? Standard django stuff, I presume.
I think this is all you need, right?

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}

Replace the LOCATION with the appropriate thing, of course.
can LOCATION be a list?
(In reply to Fred Wenzel [:wenzel] from comment #13)
> can LOCATION be a list?

I don't see anything about that: https://docs.djangoproject.com/en/dev/ref/settings/#std:setting-CACHES-LOCATION
(In reply to James Long (:jlongster) from comment #14)
> (In reply to Fred Wenzel [:wenzel] from comment #13)
> > can LOCATION be a list?
> 
> I don't see anything about that:
> https://docs.djangoproject.com/en/dev/ref/settings/#std:setting-CACHES-LOCATION

It should work; we use it on a variety of sites, e.g.

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': [
            'memcache-generic01:11211',
            'memcache-generic02:11211',
        ],
        'KEY_PREFIX': 'mozillalabs_stage'
    }
}
Yeah, seems like a documentation gap. A list should indeed work:

http://stackoverflow.com/questions/6876250/how-does-django-handle-multiple-memcached-servers

All right, use a list then! :)
How should we go about testing this? Should we add the settings to www-dev.allizom.org and test it there, and then roll the code to production? Should www-dev continue using the production memcache (probably not)?
I'm not sure of the status of memcache servers for bedrock itself, given it's served out of two DCs, etc.

302 jake and maybe corey on where that's at
(In reply to James Long (:jlongster) from comment #17)
> Should www-dev continue using the production memcache (probably not)?

God no :)
I'm actually okay with dev using the prod memcache, because comment 15 indicates you can specify a KEY_PREFIX, which should effectively prevent any collisions. I've never known memcache to be noticeably slower under extra load (at least until you have kernel-level problems), so the only real concern is cache size. There's around 10GB of cache space, and we can always add more if needed.
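
For example, dev could safely share the prod node with something like this (the prefix values here are just illustrative):

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': ['bedrock-memcache1.webapp.phx1.mozilla.com:11211'],
        'KEY_PREFIX': 'bedrock_dev',  # 'bedrock_prod' in the production settings
    }
}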

In any case, there seems to be some sort of issue with this. Once this setting is in place, manage.py jobs no longer work. This is for www-dev.allizom.org:

[root@bedrockadm.private.phx1 bedrock]# python manage.py compress_assets 2>&1 1> /dev/null | grep -v 'old-style Playdoh layout'
Error: No module named memcache

Any ideas?
Prod will be a bit more of a pain: we'll still need to spin up a memcache node there, and use puppet to set an /etc/hosts entry that points to the right place.

This means memcache would not be suitable for session info, as users will bounce between DCs. Is that okay, or do we need to come up with something different?
I disabled this on www-dev.allizom.org to stop the emails. Let us know when the module is installed (something in vendor I guess) and we'll uncomment this setting.
(In reply to Jake Maul [:jakem] from comment #20)
> 
> [root@bedrockadm.private.phx1 bedrock]# python manage.py compress_assets
> 2>&1 1> /dev/null | grep -v 'old-style Playdoh layout'
> Error: No module named memcache
> 
> Any ideas?

The memcache package wasn't installed. I just installed it and pushed to dev.
Heh... well, I just installed the python-memcached package as well (RPM). I don't know which of our fixes did it, but it's fixed now and turned back on.
(In reply to Jake Maul [:jakem] from comment #21)
> Prod will be a bit more of a pain, we'll need to spin up a memcache node
> there still and use puppet to set a /etc/hosts entry to use that goes to the
> right place.
> 
> This means memcache would not be suitable for session info, as users will
> bounce between DC's. Is that okay or do we need to come up with something
> different?

Yep, for now at least. We usually can't depend on sessions because that inherently means we can't cache the page.


(In reply to Jake Maul [:jakem] from comment #22)
> I disabled this on www-dev.allizom.org to stop the emails. Let us know when
> the module is installed (something in vendor I guess) and we'll uncomment
> this setting.

Sorry about that, I was trying to get to it earlier but some other stuff came up.
For stage (www.allizom.org, which still needs to be moved over to the bedrock cluster), we can use the same memcache node as here. No problem: stage is only in PHX1, like dev.

For prod, we'll need to set up another memcache node in SCL3. Then we'll set the configs up to look at a simple/short hostname (just "memcache" or something), and then use puppet to deploy a proper /etc/hosts record for that to each cluster. That'll work just fine for the web nodes accessing memcache normally.


This brings up a problem with the cron job though... it will need to be able to import things to *both* sets of memcache nodes. I'd rather not set up a separate admin node for bedrock in SCL3, just because that seems likely to make things more complicated than they need to be (they might get out of sync).

So let's open this up for discussion: how can we have the one admin node write to both sets of memcache nodes, and yet still have the settings_local.py file only point to the "local" set of memcache nodes?

Lots of ideas come to mind:

1) Maintain separate settings for the update_feeds cron, so that it knows about both sets of memcache nodes, and can set keys on both of them.

2) Perhaps the update_feeds cron could take an argument indicating which set of nodes to update, and we could just call it twice... once with each argument. This would still require some setting somewhere so that it would know which nodes are which (or perhaps the argument could be the whole node list for a cluster).

3) Make a separate cron that builds on the update_feeds job... fetch the keys from the PHX1 memcache and insert them into SCL3. This could potentially be done entirely outside of Django as a shell script or something (see the rough sketch after this list). Same settings problem as #1 and #2 though.

4) Put the update_feeds cron on one or more of the web nodes, instead of the admin node. Pretty strange organization (generally our web nodes have only system-level crons on them). Also, either some nodes become "special", or they all get the cron and we do extra work.

5) Ultimately eschew memcache and put these records in MySQL, which should eventually be a master/master cluster, in which case writes anywhere will be available in both places. Obviously this would be dependent on having MySQL up and running.

6) Find some way to mirror data between the 2 memcache clusters. Maybe there's a generic solution that will work.

7) Don't do this in cron. Instead, have the nodes pull down this information if it's not already there, and also set a 1-hour timestamp key, and automatically refresh when it's hit and the value is past (or nearing) TTL. Basically, treat it more like a traditional memcache installation.
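
To make #3 concrete, here's a rough sketch using the python-memcached client. The SCL3 hostname and the key list are invented; the real keys would be whatever update_feeds writes:

# Sketch of idea #3 only: copy known feed keys from the PHX1 memcache to SCL3.
import memcache

source = memcache.Client(['bedrock-memcache1.webapp.phx1.mozilla.com:11211'])
dest = memcache.Client(['bedrock-memcache1.webapp.scl3.mozilla.com:11211'])  # hypothetical SCL3 node

FEED_KEYS = ['bedrock_prod:blog_feed:hacks']  # placeholder key list

for key in FEED_KEYS:
    value = source.get(key)
    if value is not None:
        dest.set(key, value, time=60 * 60)  # re-set with a 1-hour TTL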

Thoughts?
(In reply to Jake Maul [:jakem] from comment #26)
> 1) Maintain separate settings for the update_feeds cron, so that it knows
> about both sets of memcache nodes, and can set keys on both of them.
> 
> 2) Perhaps the update_feeds cron could take an argument indicating which set
> of nodes to update, and we could just call it twice... once with each
> argument. This would still require some setting somewhere so that it would
> know which nodes are which (or perhaps the argument could be the whole node
> list for a cluster).

manage.py does indeed have a --settings option (sometimes used to run tests, for example):

  --settings=SETTINGS   The Python path to a settings module, e.g.
                        "myproject.settings.main". If this isn't provided, the
                        DJANGO_SETTINGS_MODULE environment variable will be
                        used.

We could use that to feed two different sets of memcache settings into the settings file (all else being equal), then call the cron job twice.
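
Something along these lines might do it (the module name and SCL3 hostname are made up for illustration):

# Hypothetical override module, e.g. settings/memcache_scl3.py: same settings,
# but CACHES points at the SCL3 memcache node instead of the local one.
from settings.local import *

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': ['memcache1-prod-scl3:11211'],  # hypothetical SCL3 host entry
        'KEY_PREFIX': 'bedrock_prod',
    }
}

# The admin node's crontab would then run the job once per cluster, e.g.:
#   python manage.py update_feeds
#   python manage.py update_feeds --settings=settings.memcache_scl3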
Blocks: 759564
Blocks: 753566
Jakem: ping. :)
No longer blocks: 753566
Blocks: 753566
I have bedrock ready to go for prod... not counting the cron job, which is a separate issue to be handled in bug 753566.

Puppet manages a "memcache1-prod" entry in /etc/hosts, which correctly points to the local memcache node in SCL3 and PHX1. There is no synchronization between PHX1/SCL3 memcache, but this should pose no issues for normal memcache usage patterns. If it becomes a problem we can investigate Couchbase licensing, which theoretically implements this.
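
(As a quick sanity check from a web node, something like this should work with the python-memcached module, going through the memcache1-prod hosts entry:)

# Illustrative connectivity check against the memcache1-prod /etc/hosts entry.
import memcache

mc = memcache.Client(['memcache1-prod:11211'])
mc.set('bedrock_prod:ping', 'pong', time=60)
print(mc.get('bedrock_prod:ping'))  # expect 'pong'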


Please let me know if I should enable the new CACHES block. It looks like this:

#CACHES = {
#    'default': {
#        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
#        'LOCATION': [
#            'memcache1-prod:11211',
#        ],
#        'KEY_PREFIX': 'bedrock_prod'
#    }
#}


The current CACHES block looks like this:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'translations'
    }
}
We should enable it on stage (www.allizom.org) first. Please ping me before doing it since we don't want www.allizom.org to be broken as we are preparing for the Firefox 14 launch.
This is deployed for stage, and we confirmed that keys are being set properly. I have PTO tomorrow and Monday, and can enable this for prod on Tuesday... or you can have someone else from webops take care of it. The new CACHES block is already in the settings/local.py file, just commented out. Simply remove the CACHES block for locmem and uncomment the one for memcache.

Note that this bug is purely about making memcache work and enabling it in Django... it's not about setting up any crons to write to memcache. There are other bugs for that (notably bug 759564 and bug 753566).
Blocks: 760570
This is completed! So far everything seems fine.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
No longer blocks: 753566
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard