Closed Bug 652292 Opened 10 years ago Closed 9 years ago

Set up "bedrock" "dev-style" staging server

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

All
Other
task
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: wenzel, Assigned: nmaul)

Details

(Whiteboard: [2011q3])

Please set up a staging server for the new Django app "bedrock". This is the codename of the new mozilla.org, but its current and first focus is to serve the mozilla careers page.

The code is here:
https://github.com/mozilla/bedrock

Please set up a DB, check out the code (git clone --recursive ...), edit settings_local, allow it to auto-update by setting up a cron job for ``bin/update_site.py stage``.

The site won't do much yet, except show an empty page when going to /careers.
In the new order of things, where we'll have "dev", "stage", and "prod" environments, this is meant to be the "dev" environment, thus auto-updating from master.
Blocks: 648195
Currently, www.mozilla.org is a completely static site (no database at all). Does this new site mean that a database will be required for it? Such a large change seems likely to require IT planning to support it, especially since I believe www.mozilla.org and www.mozilla.com are on a specific static cluster that is completely separate from the other clusters. This may have changed in the year+ since I was a sysadmin, but it would still be good to check whether that's still true.
Yup, thanks for bringing that up. We're working closely with IT to make this as smooth as possible. The goal is not to use a DB at all for regular content served on mozilla.org, but to have the ability to do so for select subsites, such as the careers site (and others in the future).
Assignee: server-ops → jdow
We'd like to start iterating on the careers site pretty soon. Any chance of getting a staging server set up this week?
Severity: normal → major
Paul, yes I can get started on it. First thing we'll need is a design document. Please go to https://mana.mozilla.org/wiki/display/websites/Bedrock and edit the page, filling in as much information as you know about the project, so I'll know how to get started.

For dev and stage sites, we've been using little low power servers, and I've been setting up a dev "cluster" with a single node and a stage "cluster" with 2 nodes, where the dev cluster gets auto-updated from github or wherever your code is and the stage cluster will be treated like production and updated manually or however production gets updated. Please let me know if this setup will work.
(In reply to comment #5)
> Paul, yes I can get started on it. First thing we'll need is a design document.
> Please go to https://mana.mozilla.org/wiki/display/websites/Bedrock and edit
> the page, filling in as much information as you know about the project, so I'll
> know how to get started.

Thanks Justin. I'm getting a 401 on that page after logging in with my LDAP credentials.
That's strange. I have it set to same permissions as the Intranet. Are you able to log into the intranet?
I think I've fixed mana now. Took some tweaking of the database to work around an LDAP connection bug... Please try again :)
(In reply to comment #8)
> I think I've fixed mana now. Took some tweaking of the database to work around
> an LDAP connection bug... Please try again :)

Looks good! Thanks jabba. I'll fill out more information and update here.
(In reply to comment #5)
> Paul, yes I can get started on it. First thing we'll need is a design document.
> Please go to https://mana.mozilla.org/wiki/display/websites/Bedrock and edit
> the page, filling in as much information as you know about the project, so I'll
> know how to get started.

I've filled out some details:

https://mana.mozilla.org/wiki/display/websites/Bedrock

Pretty light right now, but hopefully enough to get you started.

> For dev and stage sites, we've been using little low power servers, and I've
> been setting up a dev "cluster" with a single node and a stage "cluster" with 2
> nodes, where the dev cluster gets auto-updated from github or wherever your
> code is and the stage cluster will be treated like production and updated
> manually or however production gets updated. Please let me know if this setup
> will work.

That should be fine.
fyi this is going to sit for a bit until we get db servers for it going (656189)
Component: Server Operations → Server Operations: Projects
Assignee: jdow → jeremy.orem+bugs
Whiteboard: [2011q3]
Hey Corey, any ETA on this? Are the db servers up? (I can't access bug 656189)
Unassigning Jeremy who is ooto, and CCing JakeM who was working on the mozilla.org merge: Can we get this started? I can also file a new bug if this is getting too crufty.

What we are looking for is to co-host this with the existing www.mozilla.org staging site. I *think* we'd want to hook up bedrock under:
www-dev.mozilla.org/b/...

This is much like what AMO/Zamboni has been doing: all of bedrock would be available under that directory. On a case-by-case basis, we'd then make directory aliases, like:
([-a-z]+/)?/careers -> /b/...

Perhaps we should discuss a way to add such directories without needing to involve IT, but it won't be often, so it might be fine. Let me know if we should have a meeting or start an Etherpad or something to discuss this further?

Thanks!
Assignee: jeremy.orem+bugs → nobody
Moving out of projects and into the Web Ops component. Also dropping prio to keep from paging.

I *think* we can set this up fairly easily for the current dev and stage environments, but not so easily for the current prod, which is older and lacks any pre-existing django stuff.

It might be a better idea to relocate the current dev/stage environments over to the generic cluster in PHX1, which should be comparatively good-to-go, and will likely be the final resting place for the prod site later on anyway. This would obviously be a bit more involved, but will be better in the long run. It would also probably solve the "dev/stage for www.mozilla.org is down" bugs we've been getting every few days, due to mrapp-stage04 apparently having some trouble.
Assignee: nobody → server-ops
Severity: major → normal
Component: Server Operations: Projects → Server Operations: Web Operations
QA Contact: mrz → cshields
(In reply to Jake Maul [:jakem] from comment #14)
> I *think* we can set this up fairly easily for the current dev and stage
> environments, but not so easily for the current prod, which is older and
> lacks any pre-existing django stuff.

That makes sense. Let's figure this out as soon as possible, so we can make the necessary changes over the course of the quarter and not turn into panic mode halfway through December. We're in the lucky position here that this won't have to go live with a "bang" so we should be able to make infrastructure changes gradually and magically end up with bedrock in production (for a handful of paths) without major disruptions.

What will be the best way for you to proceed for this? Should we have a call, or start a wiki page outlining all the necessary steps, and file bugs as appropriate?

> It might be a better idea to relocate the current dev/stage environments
> over to the generic cluster in PHX1, which should be comparatively
> good-to-go, and will likely be the final resting place for the prod site
> later on anyway. This would obviously be a bit more involved, but will be
> better in the long run. It would also probably solve the "dev/stage for
> www.mozilla.org is down" bugs we've been getting every few days, due to
> mrapp-stage04 apparently having some trouble.

I'll trust your judgment there: If the other cluster has Python and a DB that we can hook into, then this sounds like a good idea. Will you be able to transfer the merged mozilla.org staging instance over somewhat efficiently?
(In reply to Fred Wenzel [:wenzel] from comment #15)
> That makes sense. Let's figure this out as soon as possible, 

mrapp-stage04 does have the WSGI stuff, so it would be capable of running bedrock if we wanted to. Still thinking it's better to just move to generic in phx1 and be done with it, though.

> What will be the best way for you to proceed for this? Should we have a
> call, or start a wiki page outlining all the necessary steps, and file bugs
> as appropriate?

First step is just to migrate www-dev.allizom.org and www.allizom.org from mrapp-stage04 to the generic cluster in PHX1. Should be relatively easy, especially since there's no database or anything to migrate just yet.

Next step will be to have you check in whatever django-type stuff you want into the repo, or possibly maintain a separate repo that has only bedrock stuff in it... not really sure. We might want to follow in the footsteps of either AMO or MDN on this... they both have PHP+Python websites.

Yeah, maybe a wiki page to keep track of the whole project and outline steps would be a good idea. Want to make one with what you know of and I'll fill in the things I can think of?

> I'll trust your judgment there: If the other cluster has Python and a DB
> that we can hook into, then this sounds like a good idea. Will you be able
> to transfer the merged mozilla.org staging instance over somewhat
> efficiently?

Should be easy since at the moment they're just simple PHP sites with no DB backend. And yes, that cluster does have access to a DB cluster, so that'll be taken care of too.
(In reply to Jake Maul [:jakem] from comment #16)
> Yeah, maybe a wiki page to keep track of the whole project and outline steps
> would be a good idea. Want to make one with what you know of and I'll fill
> in the things I can think of?

Sounds good, jlong and I will make one and get back to you. Thanks!
One change to my comment 14 and comment 16: This will not actually be on the "generic" cluster in PHX1, but rather a new cluster we're going to spin up just for this. We'd discussed this a couple months ago during the IT Puppet All-Hands week (you/me/cshields), but I'd forgotten about it.

There are a few reasons for a separate cluster, one being that most all of our top-line sites have their own cluster (AMO, SUMO, Input, MDN, and the current www.mozilla.org... which btw needs a cute acronym too). This gives good performance and security isolation... won't be affected by other sites. The dev/stage spin-up will be in PHX1, but later we'll spin up a new "prod" cluster in PHX1 and SJC1 (or SCL3, depending on the timing).

Fortunately we've pretty well nailed down how to roll out new clusters with puppet, and we have available Seamicro nodes to do it in PHX1 right now. That means rolling out a new admin node and cluster is actually pretty quick, so this won't delay us much at all. :)
Admin node (for now): bedrockadm.seamicro.phx1.mozilla.com
dev: bedrock[1-2].dev.seamicro.phx1.mozilla.com
stage: bedrock[1-4].stage.seamicro.phx1.mozilla.com
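
For anyone scripting against these nodes, here is a minimal Python sketch that expands the bracket shorthand above into individual hostnames. It assumes `[1-2]` and `[1-4]` denote consecutively numbered hosts; `expand` is a hypothetical helper, not part of any existing tooling.

```python
import re
from typing import List


def expand(pattern: str) -> List[str]:
    """Expand a single "[start-end]" numeric range in a hostname pattern."""
    match = re.search(r"\[(\d+)-(\d+)\]", pattern)
    if match is None:
        # No range in the pattern, so it already names exactly one host.
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    head, tail = pattern[:match.start()], pattern[match.end():]
    return [f"{head}{n}{tail}" for n in range(start, end + 1)]


if __name__ == "__main__":
    for host in expand("bedrock[1-2].dev.seamicro.phx1.mozilla.com"):
        print(host)
```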

New bedrock files have been added to the webapp module in puppet; you should be able to start from there, Brian. I've started updating mana as well. Web configs for the dev and stage sites will need to be created, but I've already seeded the trees, so on bedrockadm you can go into the dev or stage /src/www.mozilla.org/ tree, trash the bedrock dir and check out their repo as the new bedrock dir, then run deploy to sync and push it out.

Fred, please let me know of devs (other than yourself) who should have ssh access to dev and stage.
Thanks for getting that ball rolling, cshields, awesome!

Brian/Jake: Let us know if you need to discuss comment 13 more. While the idea is to test bedrock, dev/stage will only be meaningful in combination with the existing PHP setup.

(In reply to Corey Shields [:cshields] from comment #19)
> Fred, please let me know of devs (other than yourself) who should have ssh
> access to dev and stage.

James Long (jlong), for now.
For dev and stage, you can start with the Apache configs on mrapp-stage04 for www-dev.allizom.org and www.allizom.org. Those should more or less "just work", adjusting for any path differences.

You can also make tarballs of those 2 sites in /data on mrapp-stage04 and put them on bedrockadm. That's probably how I'd do it... less to break. The update tasks for them are in /etc/cron.d as usual... those'll need to come over also.

At that point we'll need to set up a Zeus VIP in PHX1 to point at it, then change DNS for one of the 2 sites (dev or stage) to point at that VIP and make sure it seems to be working, then change the other.
Have any database nodes been provisioned?
Additionally, I copied www.allizom.org and www-dev.allizom.org from mrapp-stage04 to bedrockadm, which filled up /:

[localhost] err: fatal: Unable to create '/data/bedrock-stage/www/.git/index.lock': No space left on device
[localhost] err: fatal: Unable to create '/data/bedrock-stage/www/.git/index.lock': No space left on device

Is there anything that can be trimmed to save disk space?

[root@node70.seamicro.phx1 cron.d]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_node70-lv_root
                      9.5G  9.2G     0 100% /
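
A failure like the one above could be caught before the update starts. The following is a rough Python sketch of a pre-flight free-space check. `has_room` and the 1 GiB threshold are illustrative assumptions; nothing like this is known to exist in the actual update tasks.

```python
import shutil


def has_room(path: str, needed_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    # Hypothetical threshold: refuse to update with less than 1 GiB free.
    if not has_room("/", 1024 ** 3):
        raise SystemExit("not enough free space; skipping update")
```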
Assignee: server-ops → bhourigan
:digi - a httpd restart freed ~5.5 gigs on mrapp-stage04 earlier today.
Okay, this isn't going to work. Even if we can trim the docroot, we're still going to grow 2 more copies later on... prod. I took a quick look and didn't see anything that's obviously removable to my eyes.

I'll see about getting a VM for bedrockadm.

As far as databases go, dev/stage will use the dev database cluster already up and running in PHX1... the cluster is up, but no database or user has been allocated yet. Prod will get its own cluster with master-master replication to SCL3 later this year. This will probably need to be purchased, unless we happen to have spare blades and whatnot in PHX1. That'll happen around the same time we set up a prod cluster in PHX1 for bedrock.
(In reply to Fred Wenzel [:wenzel] from comment #17)
> (In reply to Jake Maul [:jakem] from comment #16)
> > Yeah, maybe a wiki page to keep track of the whole project and outline steps
> > would be a good idea. Want to make one with what you know of and I'll fill
> > in the things I can think of?
> 
> Sounds good, jlong and I will make one and get back to you. Thanks!

In response to this, I made a wiki page here: https://wiki.mozilla.org/Mozilla.com/Bedrock/Staging

If you get blocked on something, or something difficult comes up, can you all write a quick note there? It's an easier way to track everything going on than bugs. We'll probably use that as a place to keep track of creating the production instance too at some point.
Dev and stage databases are ready to go... I know that's a ways off, but I was already there setting up DBs for basket dev/stage, so I just did it. I have the credentials and will store them on the new admin node when it comes online. The admin node is what's currently blocking any further IT work on bedrock.
Woot!
Assignee: bhourigan → nmaul
The new bedrockadm is mostly ready to go (just needs some cron jobs and ACLs). I opened a new bug for NetOps regarding the ACLs, and am working on the crons.

On the actual web nodes (2 dev, 4 stage!), I'm working on getting them set up to at least run the current site... Then we can start toying with integrating bedrock similarly to how AMO integrates remora (PHP) and zamboni (django).
Hey Jake, any other update here? We're starting to actually build out pages now, so a dev/staging server would be really helpful. How much more work is there? If it's just a few more things to set up, can we schedule a time for that to happen?
Doing datacenter work early this week, hope to get working on this more in the latter half of the week.
Have done some work on this. I need to know what URLs you would like to have sent to the Django app so I can have Apache rewrite them to it. Thanks!
(In reply to Jake Maul [:jakem] from comment #32)
> Have done some work on this. I need to know what URLs you would like to have
> sent to the Django app so I can have Apache rewrite them to it. Thanks!

Great! Can you do what zamboni does and have all URLs under /b/ be handled by the django app?

So:

/b/channel
/b/home

bedrock would handle these URLs with the /b/ stripped.
www-dev.allizom.org is moved over to the bedrock cluster, and Django handling of the /b/ path is configured.

Any URL you request with /b/ in the path should "just work". There is a slight problem at the moment, where it seems like PHP and/or .htaccess is hijacking links to /b/ and inserting the locale before them (/en-US/b/) resulting in a 404. jlongster is working on this.

If there are any URLs you'd like to have go to django without having the /b/ in the path, let me know and I should be able to rewrite them to go there. AMO does this with a lot of URLs... they rewrite to /z$1, so that certain URLs are managed by django but the user doesn't see the /z/ in the path.
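
To illustrate the idea, here is a minimal Python sketch of the rewrite decision, written outside Apache so it can be tested on its own. The `/b` prefix and the `/careers` path come from this bug. `DJANGO_PATHS` and `rewrite` are hypothetical names, and the real rule would live in the Apache config, which is not shown here.

```python
import re

# Hypothetical list of public paths that should be served by Django.
# Only /careers is named in this bug; anything else would be added here.
DJANGO_PATHS = re.compile(r"^/(careers)(/.*)?$")


def rewrite(path: str, prefix: str = "/b") -> str:
    """Return the internal path a request would be rewritten to."""
    if path.startswith(prefix + "/"):
        return path  # already under the Django mount point
    if DJANGO_PATHS.match(path):
        return prefix + path  # same idea as AMO rewriting to /z$1
    return path  # left for the existing PHP site


if __name__ == "__main__":
    for p in ("/careers", "/careers/listing", "/b/home", "/firefox/"):
        print(p, "->", rewrite(p))
```

The user keeps seeing /careers in the address bar; only the internal dispatch changes.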
I committed a test patch in r97927 to see if I could tell the prefetch file to ignore /b/.

Jake, how exactly is the URL rewriting set up? Does it go Apache -> (process .htaccess file) -> Django ?

I'm not sure how to stop all the handlers from running and pass it through to Django. There's the "AddHandler init prefetch.php" command so that prefetch handles all requests, and there's 404 handlers too.
This has been taken care of.

The issue was a typo in WSGIScriptAlias. This does take precedence over .htaccess processing in the main PHP app, so that was a red herring. Once that was fixed the app started working, mostly.
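
For reference, a mod_wsgi mount made with WSGIScriptAlias hands the mount point to the application as SCRIPT_NAME and the rest of the URL as PATH_INFO, which is how the app sees /b/channel as /channel. The tiny WSGI app below is only a sketch for checking that split; it is not bedrock's actual WSGI entry point, and the Apache directive itself is not reproduced here.

```python
def application(environ, start_response):
    """Tiny WSGI app that reports how the request path was split."""
    script_name = environ.get("SCRIPT_NAME", "")  # mount point, e.g. "/b"
    path_info = environ.get("PATH_INFO", "")      # what the app routes on
    body = f"mount={script_name} path={path_info}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Pointing a mount at a file like this is a quick way to confirm that requests reach WSGI at all before debugging the Django side.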

After we fixed this, we needed to improve the update script, as we were not pulling in vendor submodules. This is fixed as well.
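
As a sketch of what an update pass that includes submodules might look like (this is not the actual update script, whose contents are not shown in this bug; `update_commands` and `run_update` are hypothetical names, and the `master` default follows the "auto-updating from master" description above):

```python
import subprocess
from typing import List


def update_commands(branch: str = "master") -> List[List[str]]:
    """Commands for one update pass, including the submodule step."""
    return [
        ["git", "pull", "origin", branch],
        # Pick up any changed submodule URLs from .gitmodules.
        ["git", "submodule", "sync"],
        # Without this, vendor submodules stay at whatever commit was cloned.
        ["git", "submodule", "update", "--init", "--recursive"],
    ]


def run_update(repo_dir: str, branch: str = "master") -> None:
    """Run each command inside repo_dir, stopping at the first failure."""
    for cmd in update_commands(branch):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Keeping the command list separate from the runner makes it easy to log or dry-run the steps before touching a checkout.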

After that, we needed to set up an Alias for /media, to go to the django bedrock/media directory. This is fixed as well.

At this point I think Infra's involvement in this is completed. Our next step will be to migrate www.allizom.org at some point in the future, but that's out of scope of this bug (which is for dev).
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
There's one last issue which is that the media needs to be compressed using the jingo-minify command `./manage.py compress_assets` on the server. Jake is doing that now and it should fix it.
Everything is working. Thanks Jake!
Status: RESOLVED → VERIFIED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard