Closed Bug 753566 (opened 12 years ago, closed 11 years ago)

Set up databases for bedrock

Categories

(Data & BI Services Team :: DB: MySQL, task)

Hardware: x86
OS: macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlong, Assigned: nmaul)

References

Details

(Whiteboard: [triaged 20120904])

We need a cron job which updates all the feeds on mozilla.org (caches them in memcache). It can run every hour, and should execute this:

cd bedrock && ./manage.py cron update_feeds

The dev site is the only one that has this cron job right now, and is the only one with memcache, so just set it up there first and we'll test it. If all is good, we will set up memcache in prod, roll out the updates, and then set up the cron job.
Checked into puppet yesterday. I see it in crontab, and it hasn't emailed me any errors...
Assignee: server-ops → nmaul
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Hm, there *should* be posts listed under "in the news"...

http://www-dev.allizom.org/en-US/

Can you check to see when cron was last run? Or check that memcache is set up correctly and the dev server is pointing to it?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Checking in on this: I still don't see the feeds on the home page. Any chance you could poke around it some more?
The cron is:

0       * * * * root cd /data/bedrock-dev/src/www-dev.allizom.org-django/bedrock; python manage.py cron update_feeds 2>&1 1> /dev/null | grep -v 'old-style Playdoh layout'

You, me, and Fred are in the MAILTO for it, so it'd be emailing us if there were any output. I just ran it by hand, and the only output is that old-style warning that gets filtered out.

What file(s) is this supposed to write to, and/or what DB/table does it put data in? I can check to see if there's any data actually present.
It should be dumping a pickled Python object to memcache under the key "feeds-mozilla".

Make sure this is in settings/base.py:

FEEDS = {
    'mozilla': 'http://blog.mozilla.org/feed/'
}

Also make sure the cache backend in settings/local.py is set to memcache and all those settings are correct.

Thanks for looking into this!
I see that FEEDS block in settings/base.py.

In settings/local.py, I have this CACHES block:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': [
            'bedrock-memcache1.webapp.phx1.mozilla.com:11211',
        ],
        'KEY_PREFIX': 'bedrock_dev'
    }
}

This is similar (but not identical) to what SUMO has in their settings_local.py.

I looked into memcache itself a bit:

> stats items
STAT items:2:number 4
STAT items:2:age 1665586
STAT items:2:evicted 0
STAT items:2:evicted_nonzero 0
STAT items:2:evicted_time 0
STAT items:2:outofmemory 0
STAT items:2:tailrepairs 0
STAT items:2:reclaimed 3255

> stats cachedump 2 100
ITEM bedrock_dev:1:dotlang-en-US-research/emscripten [6 b; 1337035506 s]
ITEM bedrock_dev:1:dotlang-en-US-newsletter [6 b; 1337035506 s]
ITEM bedrock_dev:1:dotlang-en-US-base [6 b; 1337035506 s]
ITEM bedrock_dev:1:dotlang-en-US-main [6 b; 1337035506 s]

So there are only 4 keys in memcache, and none of them are "feeds-mozilla".


Is there a suitable way to test memcache connectivity with manage.py somehow? Perhaps I can try that locally and see what happens.
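One minimal way to check connectivity from the bedrock checkout, sketched here on the assumption that ./manage.py shell loads the same settings/local.py the cron uses (the key name is just a throwaway):

from django.core.cache import cache

# Round-trip a throwaway key through the configured cache backend.
cache.set('connectivity-test', 'ok', 60)  # hypothetical key, 60-second TTL
print(cache.get('connectivity-test'))     # 'ok' on success, None on failure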
You can run the cronjob yourself:

./manage.py cron update_feeds

I'm not sure how to specifically test memcache. If it helps, I can modify the cron job to print output, or you can do that in /apps/mozorg/cron.py. The job is very simple: it just uses feedparser to load a feed and then sets it in the cache (which should set it in memcache).
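For reference, the job described above amounts to roughly this (a sketch only; the real code lives in /apps/mozorg/cron.py, and the exact names here are assumptions):

import feedparser
from django.conf import settings
from django.core.cache import cache

def update_feeds():
    # For each configured feed, fetch/parse it and pickle the result into
    # the cache under "feeds-<name>" (e.g. "feeds-mozilla").
    for name, url in settings.FEEDS.items():
        parsed = feedparser.parse(url)
        cache.set('feeds-%s' % name, parsed)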
It doesn't throw any errors, but I've discovered the problem. We never opened an ACL between bedrockadm and the memcache node for port 11211 (memcache)... just SSH for controlling it. Bug 756179 has been opened to rectify this. Once it's fixed, this should magically start working on the next cron interval (hourly, on the hour).
Unfortunately I don't see the feed appearing yet: http://www-dev.allizom.org/en-US/

Can you list the memcache keys again to see if it's at least in there?
I see feeds there now.

stats cachedump 29 100
ITEM bedrock_dev:1:feeds-mozilla [50924 b; 1337717112 s]
END

Perhaps the TTL on the memcache entry is shorter than the cron interval, and the cache entry gets expired? The cron runs hourly, on the hour. I don't know what the key's TTL is.

This would be another reason I think this might ultimately be better as a more normal memcache usage pattern (apart from the whole "how to keep scl3 and phx1 in sync" issue)... in a normal memcache pattern, the code would realize the key doesn't exist and would go fetch/generate it, and then store the results.
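For comparison, that "normal" pattern would look roughly like this (get_feed is a hypothetical helper, not existing bedrock code):

import feedparser
from django.core.cache import cache

def get_feed(name, url, ttl=3600):
    # Cache-aside: serve from memcache when the key exists; on a miss,
    # fetch/parse the feed and repopulate the cache for `ttl` seconds.
    feed = cache.get('feeds-%s' % name)
    if feed is None:
        feed = feedparser.parse(url)
        cache.set('feeds-%s' % name, feed, ttl)
    return feed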
> Perhaps the TTL on the memcache entry is shorter than the cron interval, and
> the cache entry gets expired? The cron runs hourly, on the hour. I don't
> know what the key's TTL is.

Ah, that's got to be it. It looks like the key TTL is only 5 minutes. I'll change that and hope to see feeds appearing. Thanks.
Unfortunately I still don't see feeds under "In The News" on http://www-dev.allizom.org/en-US/

If it's in memcache though, it might be something on the Python side. I'll try to look into it soon.
Jake, I just ssh'ed into the dev box and did this:

[jlong@node273.seamicro.phx1 ~]$ telnet bedrock-memcache1.webapp.phx1.mozilla.com 11211
Trying 10.8.81.90...
Connected to bedrock-memcache1.webapp.phx1.mozilla.com.
Escape character is '^]'.
stats items
END

It appears that there are no items? I may have done this wrong.
Nope, you did that right. You can do just 'stats' to get a more general overview. One of the things there is "curr_items", which *should* be fairly accurate... although I don't think it's an actual honest count of items, so it might not be perfect.

I ran the cron by hand, and now I get this:

[root@bedrockadm.private.phx1 bedrock]# telnet bedrock-memcache1.webapp.phx1.mozilla.com 11211
Trying 10.8.81.90...
Connected to bedrock-memcache1.webapp.phx1.mozilla.com.
Escape character is '^]'.
stats
<snip>
STAT curr_items 1
<snip>
END
stats items
STAT items:29:number 1
STAT items:29:age 2955192
STAT items:29:evicted 0
STAT items:29:evicted_nonzero 0
STAT items:29:evicted_time 0
STAT items:29:outofmemory 0
STAT items:29:tailrepairs 0
STAT items:29:reclaimed 66
END
stats cachedump 29 100
ITEM bedrock_dev:1:feeds-mozilla [50892 b; 1338325352 s]
END
I just checked it myself and saw some data in there, but after a little while it disappeared. I think we're still dealing with the same problem: the TTL on the key is too short, so it gets dropped before the next cron run.

I just made it set the key for 1 year, since the cron job has its own interval. This also means that if the cron job fails, the content will still appear on the site.
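In Django cache terms, the change amounts to passing an explicit timeout instead of relying on the backend default (a sketch; the actual edit is in the cron code):

from django.core.cache import cache

ONE_YEAR = 60 * 60 * 24 * 365  # seconds

def cache_feed(name, parsed_feed):
    # Keep the key for a year; the hourly cron overwrites it long before
    # then, and the content survives even if a cron run fails.
    cache.set('feeds-%s' % name, parsed_feed, ONE_YEAR)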
It looks like this is all working now. The dev site is displaying the feed.

We'll need to set this up in production and get memcache ready for production now.
Jake, how much more work is it to make sure memcache is set up for production usage?
Jakem: ping. If we can get this set up sooner rather than later, it would save IT from having to do a push every time a news item link on the homepage needs updating. Same thing for bug 738381. Thanks!
Depends on: 738381
No longer depends on: 738381
Depends on: 738381
I have rolled out this cron for www.allizom.org:

0 * * * * root echo "cd /data/www/www.allizom.org-django/bedrock; python manage.py cron update_feeds" | /usr/bin/issue-multi-command bedrock-stage

I'm not entirely confident it will work, but we shall see. I suspect it may be a bit noisy.

Please let me know if it seems to work properly.
So I think we need to change our approach here. I don't like how we use memcache as a database.

What about a cron that copies the feeds onto the hard drives of the webheads? Our code would then parse the local feed and put it into memcache.
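Sketched, that alternative could look like this (the path and helper name are hypothetical):

import feedparser
from django.core.cache import cache

def load_local_feed(name, path):
    # feedparser accepts local file paths as well as URLs, so the web code
    # can parse the cron-fetched copy and cache the parsed result.
    parsed = feedparser.parse(path)
    cache.set('feeds-%s' % name, parsed)
    return parsed

load_local_feed('mozilla', '/data/feeds/mozilla.xml')  # hypothetical path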
Jake: Question in comment 20 was for you. (in case I was not clear)
With Chief in place, I'm wondering if we still need this at all, or if you're comfortable just doing a deploy whenever this needs updating.

I'm also wondering if we can just start using an actual database. What do you think about that? It's a little heavyweight for just this, but having a database is (or at least was) a planned dependency for Bedrock anyway, and we did lay a bit of infrastructure for it some time ago.
Chief does reduce the number of people that need to be involved by one (no IT), but it still involves webdev. In the end, we want only the blogger to be involved.

A database sure sounds nice. I was not involved in the original discussions for Bedrock infra. Was there any issue with a database?
The only significant concern is making the Bedrock database work in both locations, but I don't think that's likely to be a problem. CC'ing :sheeri to see if she'd like to comment. I believe the PHX1 side is already set up, and we'd need to do the SCL3 side.

Apart from that I believe it was just a general desire not to add too many new dependencies too quickly. Adding a DB at some point has always been part of the plan. :jlongster and :wenzel might know more on the dev side of this.
Actually, we have a database cluster for bedrock in scl3: bedrock1.db.scl3.mozilla.org and bedrock2.db.scl3.mozilla.org. Right now there are no DBs and no users (other than nagios, root, and replication users), but it's ready to be in production (it's already being backed up and monitored) whenever you're ready for content.

I'm not sure where in phx1 we have bedrock set up, but I don't believe we have a separate cluster like we do in scl3, with 24 GB RAM, 200+ GB disk space, and a 12-core 2.27 GHz CPU. Jake - let me know what you were thinking regarding the phx1 side of things.
@sheeri: this should be pretty simple, I think, as the needs are pretty small and not likely to become massive in the near future. 2 servers on each end, replicating with each other, should do fine. I don't really know how hard that would be. The specs you've listed should be more than adequate in all regards... feel free to halve them if it helps speed things up any. :)


The dev env exists only in PHX1... we can use the existing dev db cluster in PHX1 for that. In fact Rik, if you want I can probably fork off a separate bug to get that set up soon-ish... I know you also want 3 extra environments for feature presentations, we can probably do that in dev as well, also on the dev db cluster...

Stage exists in PHX1 and SCL3. However the usage is very light/sporadic, and I'm okay with simply piggybacking that on the prod cluster, with different DB names and grants. Someday we may want a separate stage db cluster that exists in both locations, but I don't foresee that being a concern in the near future.
Whiteboard: [waiting][dba][webdev]
When you say "2 servers on each end" do you mean 2 in phx and 2 in scl3?

If so, we have generic clusters in both phx and scl3, do you want to just use those? Or did we need the separate bedrock cluster in scl3? (if we don't need it, is it something we can/should reclaim? or keep around in case?)
Also, we don't piggyback stage onto prod. We have a stage setup in scl3, and in phx folks use the dev machine for both dev and stage.
(In reply to Sheeri Cabral [:sheeri] from comment #27)
> When you say "2 servers on each end" do you mean 2 in phx and 2 in scl3?

Yes. The interesting part is that all 4 machines should always have the same data (excepting replication delay, of course). To me this implies a master/slave pair in each DC, with the masters replicating with each other... but I don't presume to dictate implementation... you certainly know better than I. :)

> If so, we have generic clusters in both phx and scl3, do you want to just
> use those? Or did we need the separate bedrock cluster in scl3? (if we don't
> need it, is it something we can/should reclaim? or keep around in case?)

It should be separate hardware. Bedrock (www.mozilla.org) is one of the few properties that is supposed to have dedicated resources, to avoid it being impacted by other properties.


(In reply to Sheeri Cabral [:sheeri] from comment #28)
> Also, we don't piggyback stage onto prod. We have a stage setup in scl3, and
> in phx folks use the dev machine for both dev and stage.

We can definitely use the dev cluster in PHX1 for dev stuff.

However, I don't know that it's feasible to use the stage cluster in SCL3... stage is active/active in PHX1 and SCL3, so we need the data replicated to both. We could possibly host the stage DB in only one place or the other, and just live with cross-datacenter DB accesses from the remote side.
Depends on: 779947
Found an existing bug (linked to this one now) to order hardware for bedrock in phx1. 

I'll think about dev tomorrow, it's time for dinner :D
Jake - let's talk today in IRC about stage options in phx1.
Sorry for the delay, lots of other fires since last week. Thanks for working on this.
Renaming because the purpose of this bug changed.

Jake: Sure I definitely would like to see that set up on dev :)

Notes for when we start using this in the codebase (so not a blocker for setting up the databases):
- Update our chief script to include south migrations.
- Stop auto-updating stage.
Summary: Set up a cron job to update feeds on bedrock → Set up databases for bedrock
Sheeri and I discussed comment 29 in IRC. The consensus is as follows:

Dev DB will be in PHX1. There is already a suitable cluster, so it's just a matter of some network ACLs, making a DB, and GRANTing a user. The Dev site lives only in PHX1, so there's no concerns here. Easy peasy.

Stage DB will be in SCL3. Same scenario, except we'll need some cross-DC ACLs for this, so that the PHX1 stage node can reach it.

Prod DB will be in both locations, as discussed. Puppet will push out an appropriate /etc/hosts entry to point to the "correct" DB node for the web nodes in each datacenter. This is how we've dealt with this sort of thing in the past, so this is not a one-off / unique config situation.

I will open the appropriate NetOps ACL bugs and have them block this bug. I'll also open a bug for DBA's to create the databases and users in all the relevant places.

Once those are all resolved, it's a simple matter of adding the appropriate config stuff to settings/local.py.

Thanks all!
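For reference, the settings/local.py piece would look roughly like this (a sketch; the DB name, user, and host alias are placeholders, not the real values):

# The HOST is a per-datacenter alias: Puppet pushes an /etc/hosts entry so
# each web node resolves it to the "correct" DB node in its own datacenter.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'bedrock',             # placeholder
        'USER': 'bedrock',             # placeholder
        'PASSWORD': '<from-config>',   # placeholder
        'HOST': 'bedrock-db',          # placeholder alias via /etc/hosts
        'PORT': '3306',
    }
}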
No longer depends on: 738381
Whiteboard: [waiting][dba][webdev] → [triaged 20120904][waiting][webops][dba]
Blocks: 750912
Whiteboard: [triaged 20120904][waiting][webops][dba] → [triaged 20120904][waiting][webops][dba hardware]
Rik / Mike / Chris: How should we handle databases for the 3 www-demoX.allizom.org sites? I suppose the most complete solution is to have 3 separate DBs.

Sheeri: any word on the DB hardware for prod? I think you had some nodes already, or had a bug for them...
Component: Server Operations: Web Operations → Server Operations: Database
Whiteboard: [triaged 20120904][waiting][webops][dba hardware] → [triaged 20120904][waiting][dba hardware]
According to :csheilds, hardware has been ordered. We do have the old sfx01 hardware, but it might already be appropriated (datazilla stage/dev)... plus, for something like bedrock, we probably want the latest and greatest anyway.
Waiting for confirmation for the dev and demoX sites

DB and GRANTs completed for staging on stage1.db.scl3... will use SCL3 Zeus VIP

Config in place for stage, but left commented out in case it actually breaks anything at the moment.


Will open ACLs all at once... easier on NetOps/OpSec if they can see the whole picture.
Sorry for the late answer, I was focused on other tasks and I'm now not working on Bedrock anymore.

CCing devs who are currently discussing the issue.
(In reply to Jake Maul [:jakem] from comment #36)
> Waiting for confirmation for the dev and demoX sites

We'd like separate databases for each, but a single server will be fine. For the demo servers, we plan to wipe and rebuild the database on each push to ensure that we have good sample data included.

We'd like to do this with a commander script in the github repo, and we'd need some way of determining if the server is a demo, dev, stage, or prod server in the script. 

Once the database is in place for demo I can start work on adding the commander script in the repo. Is there anything special I need to do besides mimicking what other sites do? Can I get a copy of whatever script/commands chief runs when it pushes out bedrock?
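A rough sketch of what such a commander task might look like (hypothetical: it assumes the oremj/commander deploy library used by other Mozilla sites, core Django management commands, and South):

from commander.deploy import task

BEDROCK_DIR = '/data/bedrock/src'  # placeholder path

@task
def reset_demo_db(ctx):
    """Wipe and rebuild the demo DB so each push ships fresh sample data."""
    with ctx.lcd(BEDROCK_DIR):
        ctx.local('python manage.py flush --noinput')     # wipe existing rows
        ctx.local('python manage.py syncdb --noinput')    # create missing tables
        ctx.local('python manage.py migrate')             # run South migrations
        ctx.local('python manage.py loaddata demo_data')  # hypothetical fixture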
Whiteboard: [triaged 20120904][waiting][dba hardware] → [triaged 20120904][waiting][dba build]
Whiteboard: [triaged 20120904][waiting][dba build] → [triaged 20120904][waiting][sre kickstart]
Whiteboard: [triaged 20120904][waiting][sre kickstart] → [triaged 20120904]
Whiteboard: [triaged 20120904] → [triaged 20120904][waiting][netflows]
I have set up the databases in phx to slave off scl3, so there is data flowing and the data is usable.

The architecture is:

bedrock1.db.scl3 -> bedrock1.db.phx1 -> bedrock2.db.phx1
      |  ^
      v  |
bedrock2.db.scl3

That is, the failover for bedrock1 in scl3 is bedrock2 in scl3, right now. We can change this to bedrock1.db.scl3 <-> bedrock1.db.phx if desired.


I have set up bug 823252 for monitoring and bug 823254 for backups.
Blocks: 823252, 823254
Whiteboard: [triaged 20120904][waiting][netflows] → [triaged 20120904]
Resolving in favor of bug 901078.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Data & BI Services Team