Bug 622752: Build out SUMO in PHX
Opened 14 years ago • Closed 14 years ago
Status: VERIFIED FIXED
Category: mozilla.org Graveyard :: Server Operations (task)
Tracking: not tracked
Reporter: cshields • Assignee: cshields
Whiteboard: [02/10/2011 @ 4pm][tracker]
1 - we will build the following systems in PHX:
5 app servers
2 db servers
1 celery server
2 - puppetize them
3 - install SUMO, copy data
4 - zeus config
5 - test
Comment 1 (Assignee) • 14 years ago
A status update on this: all of the servers for step #1 are physically present in PHX. Two of the app servers were installed today; the other three will be reinstalled tomorrow (replacing RHEL5.5 with RHEL6) along with the celery server. The db servers are already set up and will remain as-is on RHEL5.5.
Comment 2 • 14 years ago
Is going from 4 database servers (1 master + 3 slaves) to 2 going to increase the risk of downtime? That seems like a lot less redundancy and load balancing.
Comment 3 (Assignee) • 14 years ago
Ah, this is my mistake: I took the servers we already had allocated for SUMO in PHX (3 app servers and 2 db servers) and added 2 more app servers and a celery server. I think we will be okay, though; I believe we still have another 2 systems matching the db specs down there. I will double-check tomorrow and add them to the list.
Thanks for pointing that out; I was going on the assumption that the current setup in PHX was duplicating what we already had.
Comment 4 • 14 years ago
We only added the third slave fairly recently, but the total, for the record, now is:
4 app servers (one running celery) (pm-app-sumo0{1-4})
1+3 database servers (tm-sumo01-master01, tm-sumo01-slave0{1-3})
and we depend on pm-app-sphinx01/02 and pm-app-memcache(01-03, I think).
pm-app-sphinx01/02 serves a number of sites, and seems to be doing fine, but is there a plan to move some shared sphinx/memcache infrastructure to PHX so we're not doing the 1400-mile round-trip during requests? Or is that not worth worrying about at this point?
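To put the 1400-mile round trip mentioned above in perspective, a back-of-envelope calculation gives the pure propagation delay. The two-thirds-of-c fiber speed below is a rule-of-thumb assumption, not a measurement of the actual SJC-PHX path:

```python
# Rough propagation delay for a 1400-mile network round trip.
# Assumes light in fiber travels at roughly 200,000 km/s (about 2/3 c);
# real latency adds queuing, switching, and routing overhead on top.
miles_round_trip = 1400
km = miles_round_trip * 1.609
delay_ms = km / 200_000 * 1000  # km / (km/s) -> seconds -> milliseconds
print(round(delay_ms, 1))  # roughly 11 ms of pure propagation per round trip
```

That cost is paid on every cache miss or search query that crosses the datacenter boundary, which is why colocating sphinx/memcache matters.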
Comment 5 • 14 years ago
(In reply to comment #4)
> pm-app-sphinx01/02 serves a number of sites, and seems to be doing fine, but is
> there a plan to move some shared sphinx/memcache infrastructure to PHX so we're
> not doing the 1400-mile round-trip during requests? Or is that not worth
> worrying about at this point?
We already have sphinx in phx; spinning up memcache servers shouldn't be too hard.
Comment 6 • 14 years ago
(In reply to comment #5)
> We already have sphinx in phx, spinning up memcache servers shouldn't be too
> hard.
Awesome! Just making sure everything's recorded.
Comment 7 (Assignee) • 14 years ago
App servers are kickstarted, and the memcache server has been kickstarted. I'll kickstart the db and celery hosts tomorrow and start to puppetize them.
OS: Mac OS X → Other
Comment 9 • 14 years ago
Missing packages from rhel6:
pylibmc
libmemcached 0.38-1 (only 0.31-1 is available)
php-eaccelerator
php-pear-HTML-Common
php-pear-HTML-QuickForm
PIL
I've put conditionals in the puppet manifests to distinguish between rhel5 and rhel6 and have removed the offending packages for now to make puppet stop complaining. Some of these might not be needed anymore, but if they are, they will need to be built from SRPM, dropped into mrepo, and re-added to puppet.
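The release-conditional package handling described above can be sketched in plain Python. The RHEL5-only package names come from this comment; the "common" set is illustrative, and the structure mirrors the puppet conditional rather than quoting the actual manifest:

```python
# Sketch of the rhel5/rhel6 conditional in the puppet manifests: common
# packages are declared everywhere, while packages missing from the RHEL6
# repos are only declared on RHEL5 hosts.
COMMON = ["httpd", "mod_wsgi"]  # illustrative; the real common set is larger
RHEL5_ONLY = [
    "pylibmc",
    "libmemcached",  # 0.38-1, built in-house; RHEL6 only ships 0.31-1
    "php-eaccelerator",
    "php-pear-HTML-Common",
    "php-pear-HTML-QuickForm",
    "PIL",
]

def packages_for(release_major):
    """Return the package list for a RHEL major release ('5' or '6')."""
    return COMMON + (RHEL5_ONLY if release_major == "5" else [])

print(len(packages_for("5")), len(packages_for("6")))  # 8 2
```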
Comment 10 • 14 years ago
Removed php-pecl-fileinfo as well, since it was conflicting with php-common. There also seem to be some Apache issues: I've tried replacing httpd.conf with a RHEL6 version, but I see lots of wsgi-related errors in error_log. It probably needs some debugging and testing from someone who knows wsgi.
Comment 11 • 14 years ago
If it saves you time, we don't need PHP at all.
Comment 12 • 14 years ago
That saves loads of time, actually. And I think it will be best to rewrite the puppet class altogether. Is there a comprehensive list of all the requirements for SUMO, including Apache modules and such?
Comment 13 • 14 years ago
Requirements (from our docs/requirements files):
Python and compiled Python packages (we should have RPMs already):
* Python 2.6
* MySQL-Python 1.2.3
* Jinja2 >= 2.5.2
* PIL 1.1.7
* lxml 2.2.6
Apache:
* mod_wsgi
* mod_rewrite
* mod_expires
Honestly I think those are our only Apache requirements.
Misc:
* RabbitMQ (assuming it runs on the celery box, it only needs to be on that one).
* Sphinx Search (already exists in PHX [comment 5]).
* Memcached (comment 5).
All the other requirements that I can think of are pure Python and in our vendor library.
Comment 14 • 14 years ago
Oh, I lied a little: PIL has additional requirements:
* libjpeg
* zlib (and/or whatever it takes for PIL to compile with PNG support)
Comment 15 • 14 years ago
Is libmemcached required to be version 0.38-1? That is what the current puppet manifest calls for, and I suspect it is an RPM that we built ourselves for RHEL5. Also, PIL doesn't seem to be in the RHEL6 repos, so we'll have to build that one as well. Not sure yet about Jinja; I need to check versions of the other stuff.
Comment 16 • 14 years ago
Strictly speaking, I don't believe we need libmemcached, either. We aren't using pylibmc, we're using python-memcached, which is pure-Python.
If we do go with pylibmc, which AMO is using, I think we need >= 0.34. They should have RPMs that are at least the correct version of libmemcached, if not the correct RHEL.
Comment 17 • 14 years ago
Great, pylibmc isn't in the repo either, so I'm happy skipping that dependency and libmemcached...
Comment 18 • 14 years ago
[root@support1 ~]# rpm -qa |grep MySQL
MySQL-python-1.2.3-0.3.c1.1.el6.x86_64
[root@support1 ~]# rpm -qa |grep Jinja
Jinja2-2.5.2-2.x86_64
[root@support1 ~]# rpm -qa |grep lxml
python-lxml-2.2.3-1.1.el6.x86_64
[root@support1 ~]# rpm -qa |grep mod_wsgi
mod_wsgi-3.2-1.el6.x86_64
Is that version of lxml going to work?
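The version question above (installed lxml 2.2.3 vs. the 2.2.6 in the requirements) is the usual numeric-tuple comparison. A minimal sketch, with versions taken from this thread; comparing the raw strings would work here, but fails as soon as a component reaches two digits:

```python
# Minimal version comparison for the RPM versions discussed above.
# The required versions are from the requirements list in comment 13;
# the installed versions are from the rpm -qa output.
def vtuple(v):
    """'2.2.6' -> (2, 2, 6) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in v.split("."))

installed = {"Jinja2": "2.5.2", "lxml": "2.2.3"}
required = {"Jinja2": "2.5.2", "lxml": "2.2.6"}

for name in required:
    ok = vtuple(installed[name]) >= vtuple(required[name])
    print(name, "ok" if ok else "too old")  # Jinja2 ok / lxml too old
```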
Comment 19 • 14 years ago
I'll need time to test that version. Are you sure 2.2.6 isn't available anywhere?
Comment 20 (Assignee) • 14 years ago
There is a 2.2.6 rpm in fedora 13. Sounds like if you need 2.2.6 we should build our own, which might be easier than troubleshooting against an older version.
Comment 21 • 14 years ago
(In reply to comment #20)
> There is a 2.2.6 rpm in fedora 13. Sounds like if you need 2.2.6 we should
> build our own, which might be easier than troubleshooting against an older
> version.
What version are we running on the pm-app-sumo cluster? Clearly it works.
Comment 22 (Assignee) • 14 years ago
(In reply to comment #21)
> What version are we running on the pm-app-sumo cluster? Clearly it works.
[root@pm-app-sumo01 ~]# rpm -qa |grep lxml
lxml-2.2.6-1
so yeah...
Comment 23 (Assignee) • 14 years ago
Quick update: database servers are all kickstarted and puppetized. Replication is running between servers 1 and 2; I'll set up 3 and 4 tomorrow.
James, we should chat soon about what we will need to do DB wise when it comes time to cutover.
Comment 24 (Assignee) • 14 years ago
James,
we are going to re-do the webheads with rhel5.5 like the others, since it will mean a quicker time to migration. We don't have the bandwidth to fix and test a new stack with everything else going on right now.
Comment 25 (Assignee) • 14 years ago
Current plan is for SUMO to move to PHX during the 2/10 maintenance window. We have the RHEL5.5 environment "up" but untested, and the admin functions still need to be moved over. As discussed on a call this morning, jabba is going to keep working on a RHEL6 + puppet environment for SUMO in PHX. The drop-dead date for this to be working is EOB Friday (2/4). If we get it working in time, we will switch over the other webheads, which should be no problem using puppet.
Updated • 14 years ago
Flags: needs-downtime+
Whiteboard: [tracker] → [tracker][02/10/2011]
Updated • 14 years ago
Whiteboard: [tracker][02/10/2011] → [02/10/2011 @ 4pm][tracker]
Comment 26 (Assignee) • 14 years ago
Steps for today:
1) Enable read-only mode by adding read_only_mode(globals()) to settings_local.py, then run /data/sumo/deploy
2) dump the openfire_chat and support_mozilla_com databases from tm-sumo01-master01.mozilla.org
3) copy and import those databases to support1.db.phx1.mozilla.com
4) change the rw and ro db VIPs in SJC to point to the 4 servers in PHX (necessary for dm-chat01 and metrics)
5) Have QA test and check phx1 setup
6) change support.mozilla.com dns from 63.245.209.132 to 63.245.217.50
7) Remove read_only_mode(globals()) from settings_local.py, then run /data/sumo/deploy
Comment 27 • 14 years ago
(In reply to comment #26)
> 1) set the following in settings_local.py to set read_only:
> read_only_mode(globals()) then /data/sumo/deploy
Note that this must be the *LAST* line of settings_local.py.
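A sketch of why the call has to come last. The read_only_mode(globals()) call is quoted from this bug; the helper body below is a guess at the pattern (mutating the settings module's namespace), not SUMO's actual implementation:

```python
# Hypothetical stand-in for SUMO's read_only_mode helper. It mutates the
# settings namespace passed to it, which is why the call must be the *last*
# line of settings_local.py: it can only act on settings defined above it.
def read_only_mode(env):
    env["READ_ONLY"] = True  # assumed flag name; the real helper may set more

# ... all other settings are defined above this point ...
DEBUG = False

read_only_mode(globals())
print(READ_ONLY)  # True -- injected into the module namespace
```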
Comment 28 (Assignee) • 14 years ago
SUMO is now live in phx1 and tested successfully.
We will have a postmortem for the move tomorrow at 16:00.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
All dependent bugs are fixed, and the move has "long-since" been verified.
Verified FIXED.
Status: RESOLVED → VERIFIED
Updated • 10 years ago
Product: mozilla.org → mozilla.org Graveyard