Bug 622752 - Build out SUMO in PHX
Status: VERIFIED FIXED
Whiteboard: [02/10/2011 @ 4pm][tracker]
Product: mozilla.org Graveyard
Classification: Graveyard
Component: Server Operations
Version: other
Hardware: x86 Other
Importance: -- normal
Target Milestone: ---
Assigned To: Corey Shields [:cshields]
QA Contact: matthew zeier [:mrz]
Mentors:
Duplicates: 613323
Depends on: 615547 623267 624819 625646 626819
Blocks: 613323

Reported: 2011-01-03 18:12 PST by Corey Shields [:cshields]
Modified: 2015-03-12 08:17 PDT
CC: 11 users
Flags: mzeier: needs-downtime+
See Also:
QA Whiteboard:
Iteration: ---
Points: ---

Attachments

Description Corey Shields [:cshields] 2011-01-03 18:12:18 PST
1 - we will build the following systems in PHX:

5 app servers
2 db servers
1 celery server

2 - puppetize them

3 - install SUMO, copy data

4 - zeus config

5 - test
Comment 1 Corey Shields [:cshields] 2011-01-03 18:16:27 PST
A status update to this...  All of the servers for step #1 are physically present in PHX.  2 of the app servers have been installed today, the other 3 will be reinstalled tomorrow (replacing RHEL5.5 with RHEL6) along with the celery server.  The db servers are already set up and will remain as-is on RHEL5.5
Comment 2 James Socol [:jsocol, :james] 2011-01-03 18:29:16 PST
Is going from 4 database servers (1 master + 3 slaves) to 2 going to increase the risk of downtime? That seems like a lot less redundancy and load balancing.
Comment 3 Corey Shields [:cshields] 2011-01-03 19:05:13 PST
Ahh..  This is my mistake, I took what servers we already had allocated for sumo in phx (being 3 app servers and 2 db servers) and added 2 more app servers and a celery server..  I think we will be okay though, I believe we still have another 2 systems matching the db specs down there.  I will double check tomorrow and add them to the list.

Thanks for pointing that out, I was going on the assumption that the current setup in phx was duplicating what we already had.
Comment 4 James Socol [:jsocol, :james] 2011-01-03 19:25:44 PST
We only added the third slave fairly recently, but the total, for the record, now is:

4 app servers (one running celery) (pm-app-sumo0{1-4})
1+3 database servers (tm-sumo01-master01, tm-sumo01-slave0{1-3})

and we depend on pm-app-sphinx01/02 and pm-app-memcache(01-03, I think).

pm-app-sphinx01/02 serves a number of sites, and seems to be doing fine, but is there a plan to move some shared sphinx/memcache infrastructure to PHX so we're not doing the 1400-mile round-trip during requests? Or is that not worth worrying about at this point?
Comment 5 Shyam Mani [:fox2mike] 2011-01-03 20:40:10 PST
(In reply to comment #4)
> pm-app-sphinx01/02 serves a number of sites, and seems to be doing fine, but is
> there a plan to move some shared sphinx/memcache infrastructure to PHX so we're
> not doing the 1400-mile round-trip during requests? Or is that not worth
> worrying about at this point?

We already have sphinx in phx, spinning up memcache servers shouldn't be too hard.
Comment 6 James Socol [:jsocol, :james] 2011-01-04 06:41:10 PST
(In reply to comment #5)
> We already have sphinx in phx, spinning up memcache servers shouldn't be too
> hard.

Awesome! Just making sure everything's recorded.
Comment 7 Corey Shields [:cshields] 2011-01-04 14:49:03 PST
app servers are kickstarted..  memcache server has been kickstarted.  I'll kickstart the db and celery hosts tomorrow, and start to puppetize them.
Comment 8 Phong Tran [:phong] 2011-01-05 11:05:09 PST
*** Bug 613323 has been marked as a duplicate of this bug. ***
Comment 9 Justin Dow [:jabba] 2011-01-10 14:18:28 PST
Missing packages from rhel6:

pylibmc
libmemcached 0.38-1 (only 0.31-1 is available)
php-eaccelerator
php-pear-HTML-Common
php-pear-HTML-QuickForm
PIL

I've added conditionals to the puppet manifests to distinguish between RHEL5 and RHEL6 and have removed the offending packages for now to make puppet stop complaining. Some of these might not be needed anymore, but if they are, they will need to be built from SRPM, dropped into mrepo, and then re-added to puppet.
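The RHEL5/RHEL6 conditionals described above might look something like this in a Puppet manifest. This is an illustrative sketch only: the class name, fact, and exact package split are assumptions, not SUMO's actual manifests.

```puppet
# Hypothetical sketch; real class/package names in the SUMO manifests may differ.
class sumo::packages {
  case $operatingsystemrelease {
    /^5/: {
      # Full RHEL5 set, including packages that have no RHEL6 build yet.
      package { ['pylibmc', 'libmemcached', 'php-eaccelerator', 'PIL']:
        ensure => installed,
      }
    }
    /^6/: {
      # RHEL6: omit the packages missing from the repos until SRPM
      # rebuilds land in mrepo.
      package { ['python-memcached']:
        ensure => installed,
      }
    }
  }
}
```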
Comment 10 Justin Dow [:jabba] 2011-01-10 15:30:25 PST
Removed php-pecl-fileinfo as well, since it was conflicting with php-common. There also seem to be some Apache issues. I've tried replacing httpd.conf with a RHEL6 version, but I see lots of wsgi-related errors in error_log; it probably needs some debugging and testing from someone who knows wsgi.
Comment 11 James Socol [:jsocol, :james] 2011-01-11 07:22:55 PST
If it saves you time, we don't need PHP at all.
Comment 12 Justin Dow [:jabba] 2011-01-11 07:29:43 PST
That saves loads of time actually. And I think it will be best to re-write the puppet class altogether. Is there a comprehensive list of all the requirements for SUMO, including apache modules and such?
Comment 13 James Socol [:jsocol, :james] 2011-01-11 07:47:38 PST
Requirements (from our docs/requirements files):

Python and compiled Python packages (we should have RPMs already):

* Python 2.6
* MySQL-Python 1.2.3
* Jinja2 >= 2.5.2
* PIL 1.1.7
* lxml 2.2.6

Apache:

* mod_wsgi
* mod_rewrite
* mod_expires

Honestly I think those are our only Apache requirements.

Misc:

* RabbitMQ (assuming it runs on the celery box, it only needs to be on that one).
* Sphinx Search (already exists in PHX [comment 5]).
* Memcached (comment 5).

All the other requirements that I can think of are pure Python and in our vendor library.
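Collected into a pip-style requirements file, the Python portion of that list would look roughly like this (pins inferred from the versions above; SUMO's actual requirements files may differ):

```
MySQL-python==1.2.3
Jinja2>=2.5.2
PIL==1.1.7
lxml==2.2.6
```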
Comment 14 James Socol [:jsocol, :james] 2011-01-11 07:51:56 PST
Oh, I lied a little: PIL has additional requirements:

* libjpeg
* zlib (and/or whatever it takes for PIL to compile with PNG support)
Comment 15 Justin Dow [:jabba] 2011-01-11 08:26:54 PST
Is libmemcached required to be version 0.38-1? That is what the current puppet manifest calls for, and I suspect that is an RPM we built ourselves for RHEL5. Also, PIL doesn't seem to be in the RHEL6 repos, so we'll have to build that one as well. Not sure yet about Jinja; I need to check versions of the other stuff.
Comment 16 James Socol [:jsocol, :james] 2011-01-11 08:55:37 PST
Strictly speaking, I don't believe we need libmemcached, either. We aren't using pylibmc, we're using python-memcached, which is pure-Python.

If we do go with pylibmc, which AMO is using, I think we need >= 0.34. They should have RPMs that are at least the correct version of libmemcached, if not the correct RHEL.
Comment 17 Justin Dow [:jabba] 2011-01-11 08:58:26 PST
Great, pylibmc isn't in the repo either, so I'm happy skipping that dependency and libmemcached...
Comment 18 Justin Dow [:jabba] 2011-01-11 09:00:37 PST
[root@support1 ~]# rpm -qa |grep MySQL
MySQL-python-1.2.3-0.3.c1.1.el6.x86_64
[root@support1 ~]# rpm -qa |grep Jinja
Jinja2-2.5.2-2.x86_64
[root@support1 ~]# rpm -qa |grep lxml
python-lxml-2.2.3-1.1.el6.x86_64
[root@support1 ~]# rpm -qa |grep mod_wsgi
mod_wsgi-3.2-1.el6.x86_64

is that version of lxml going to work?
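Besides `rpm -qa`, a quick way to see which lxml a given interpreter actually picks up is to ask lxml itself. This is a generic check, not SUMO-specific:

```python
# Report the lxml version visible to this Python; useful for comparing
# a RHEL6 webhead against the existing SJC cluster.
try:
    from lxml import etree
    lxml_version = etree.__version__  # version string, e.g. "2.2.3"
except ImportError:
    lxml_version = None  # lxml missing entirely

print(lxml_version)
```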
Comment 19 James Socol [:jsocol, :james] 2011-01-11 09:08:59 PST
I'll need time to test that version. Are you sure 2.2.6 isn't available anywhere?
Comment 20 Corey Shields [:cshields] 2011-01-11 12:25:11 PST
There is a 2.2.6 rpm in fedora 13.  Sounds like if you need 2.2.6 we should build our own, which might be easier than troubleshooting against an older version.
Comment 21 James Socol [:jsocol, :james] 2011-01-11 12:27:02 PST
(In reply to comment #20)
> There is a 2.2.6 rpm in fedora 13.  Sounds like if you need 2.2.6 we should
> build our own, which might be easier than troubleshooting against an older
> version.

What version are we running on the pm-app-sumo cluster? Clearly it works.
Comment 22 Corey Shields [:cshields] 2011-01-11 12:28:57 PST

(In reply to comment #21)
> What version are we running on the pm-app-sumo cluster? Clearly it works.

[root@pm-app-sumo01 ~]# rpm -qa |grep lxml
lxml-2.2.6-1

so yeah...
Comment 23 Corey Shields [:cshields] 2011-01-13 19:49:10 PST
Quick update: database servers are all kickstarted and puppetized.  Replication is running between 1 and 2; I'll set up 3 & 4 tomorrow.

James, we should chat soon about what we will need to do DB wise when it comes time to cutover.
Comment 24 Corey Shields [:cshields] 2011-01-18 14:28:49 PST
James,

we are going to re-do the webheads with rhel5.5 like the others, since it will mean a quicker time to migration.  We don't have the bandwidth to fix and test a new stack with everything else going on right now.
Comment 25 Corey Shields [:cshields] 2011-02-02 07:35:07 PST
Current plan is for SUMO to move to PHX during the 2/10 maintenance window.  We have the RHEL5.5 environment "up" but untested, and the admin functions still need to be moved over.  As discussed on a call this morning, jabba is going to keep working on a RHEL6 + puppet environment for SUMO in PHX.  The drop-dead date for this to be working is EOB Friday (2/4).  If we get it working in time we will switch over the other webheads, which should be no problem using puppet.
Comment 26 Corey Shields [:cshields] 2011-02-10 10:07:19 PST
Steps for today:

1)  add the following to settings_local.py to enable read-only mode:  read_only_mode(globals())  then run /data/sumo/deploy
 
2)  dump the openfire_chat and support_mozilla_com databases from tm-sumo01-master01.mozilla.org
 
3)  copy and import those databases to support1.db.phx1.mozilla.com
 
4)  change the rw and ro db VIPs in SJC to point to the 4 servers in PHX (necessary for dm-chat01 and metrics)
 
5)  Have QA test and check phx1 setup
 
6)  change support.mozilla.com dns from 63.245.209.132 to 63.245.217.50
 
7)  remove read_only_mode(globals()) from settings_local.py, then run /data/sumo/deploy
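Steps 2 and 3 amount to a standard dump-and-import. A rough sketch, using the hosts and database names from the plan above; the flags and credentials handling here are assumptions, not the commands actually run:

```shell
# Illustrative only: dump both databases from the SJC master,
# then create and load them on the PHX db host.
SRC=tm-sumo01-master01.mozilla.org
DST=support1.db.phx1.mozilla.com

for db in openfire_chat support_mozilla_com; do
  # --single-transaction takes a consistent InnoDB snapshot without locking
  mysqldump -h "$SRC" --single-transaction --routines "$db" | gzip > "$db.sql.gz"
done

for db in openfire_chat support_mozilla_com; do
  mysqladmin -h "$DST" create "$db"          # create the empty database first
  gunzip < "$db.sql.gz" | mysql -h "$DST" "$db"
done
```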
Comment 27 James Socol [:jsocol, :james] 2011-02-10 10:23:11 PST
(In reply to comment #26)
> 1)  set the following in settings_local.py to set read_only: 
> read_only_mode(globals())  then /data/sumo/deploy

Note that this must be the *LAST* line of settings_local.py.
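The reason it must be last: a helper called as read_only_mode(globals()) works by mutating the settings module's global namespace, so any assignment after the call could silently override what it just set. A minimal sketch of how such a helper might behave; the names and logic here are illustrative, not SUMO's actual implementation:

```python
# Hypothetical sketch of a read_only_mode() helper like the one
# settings_local.py calls; not SUMO's real code.

def read_only_mode(namespace):
    """Flip a settings namespace into read-only mode by mutating it."""
    namespace['READ_ONLY'] = True
    # Point default DB traffic at a read-only slave, if one is configured.
    slaves = namespace.get('SLAVE_DATABASES', [])
    if slaves and 'DATABASES' in namespace:
        namespace['DATABASES']['default'] = namespace['DATABASES'][slaves[0]]

# Simulated settings_local.py: the call must come *last*, because any
# assignment after it would clobber what read_only_mode() just set.
settings = {
    'READ_ONLY': False,
    'DATABASES': {'default': {'HOST': 'master'}, 'slave': {'HOST': 'slave'}},
    'SLAVE_DATABASES': ['slave'],
}
read_only_mode(settings)
print(settings['READ_ONLY'])             # True
print(settings['DATABASES']['default'])  # {'HOST': 'slave'}
```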
Comment 28 Corey Shields [:cshields] 2011-02-10 20:20:57 PST
SUMO is now live in phx1 and tested successfully..

We will have a postmortem for the move tomorrow at 16:00.
Comment 29 Stephen Donner [:stephend] - PTO; back on 5/28 2011-02-15 18:55:22 PST
All dependent bugs are fixed, and the move has "long-since" been verified.

Verified FIXED.
