Closed Bug 799733 Opened 12 years ago Closed 11 years ago

[socorro staging] need 2 web servers (apache mod_wsgi for django)

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task, P4)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rhelmer, Assigned: nmaul)

References

Details

(Whiteboard: [triaged 20121019][waiting][803589])

We're getting ready to ship a new django frontend for Socorro, and need to be able to test it in staging.

These will need access to the middleware servers and also to postgres (via the loadbalancer is fine/preferred).

The current Socorro config has memcached installed on each webhead; we can continue that for the moment, although I think it'd be ideal to have separate dedicated memcached boxes someday.
Assignee: server-ops → server-ops-webops
Component: Server Operations → Server Operations: Web Operations
QA Contact: jdow → cshields
Whiteboard: [pending triage]
Depends on: 803589
Admin node for socorro-staging is socorroadm.stage.private.phx1.mozilla.com... hopefully we can just plug this in as a new environment.
Priority: -- → P4
Whiteboard: [pending triage] → [triaged 20121019][waiting][803589]
Blocks: 788003
Hey, so is this ready to go and just waiting on us for info on how to load the app?
Depends on: 802692
(In reply to Robert Helmer [:rhelmer] from comment #2)
> Hey, so is this ready to go and just waiting on us for info on how to load
> the app?

Probably needs a little Puppet love, will bring up in the webops meeting tomorrow
Assignee: server-ops-webops → nmaul
ACL bug opened for access between admin and new web nodes

new hostgroup added to Commander on admin node (puppet)


The big tricks are going to be Puppet manifests for the new nodes, and how to deploy this properly.

For the former, I'm hoping we can bypass most of what's already in puppet and start fresh. However, there's an awful lot here already, and I really don't have a good grasp on what will be needed for the new django app. It'll take a good bit of digging.

For deployment, I suspect that will be a fairly simple django app deployment, which we have dozens (hundreds?) of already. The main question to you (:rhelmer) will be, what is needed to deploy in terms of git / manage.py magic? Repos, branches, submodules, schematic/south, etc. If you've already got a script that does everything, we might be able to crib from it to bootstrap this part quicker.
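For illustration, here's a hedged sketch of the kind of git / manage.py steps such a deploy script typically covers. The repo URL, branch, and target directory below are placeholders, not the real Socorro setup:

```python
# Hypothetical sketch of a typical Django deploy sequence; the repo URL,
# branch, and target directory are placeholders, not the real Socorro config.
def build_deploy_commands(repo, branch, target):
    """Return the shell commands a typical git + manage.py deploy runs."""
    return [
        "git clone --branch %s %s %s" % (branch, repo, target),
        "cd %s && git submodule update --init --recursive" % target,
        "cd %s && python manage.py syncdb --noinput" % target,
        "cd %s && python manage.py collectstatic --noinput" % target,
    ]

for cmd in build_deploy_commands(
        "https://github.com/example/crashstats.git", "master", "/data/app"):
    print(cmd)
```

A real script would also pin dependencies (or unpack a pre-built tarball from Jenkins, as it turns out Socorro does) rather than pip-installing on the node.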
(In reply to Jake Maul [:jakem] from comment #4)
> ACL bug opened for access between admin and new web nodes
> 
> new hostgroup added to Commander on admin node (puppet)
> 
> 
> The big tricks are going to be Puppet manifests for the new nodes, and how
> to deploy this properly.
> 
> For the former, I'm hoping we can bypass most of what's already in puppet
> and start fresh. However, there's an awful lot here already, and I really
> don't have a good grasp on what will be needed for the new django app. It'll
> take a good bit of digging.
> 
> For deployment, I suspect that will be a fairly simple django app
> deployment, which we have dozens (hundreds?) of already. The main question
> to you (:rhelmer) will be, what is needed to deploy in terms of git /
> manage.py magic? Repos, branches, submodules, schematic/south, etc. If
> you've already got a script that does everything, we might be able to crib
> from it to bootstrap this part quicker.

cc'ing lonnen, who worked on the deployment for the django app.

Right now the django app is in a separate repo, and all dependencies are installed and bundled together into a build like this:

https://ci.mozilla.org/job/socorro-crashstats/lastSuccessfulBuild/artifact/socorro-crashstats.tar.gz

You can see the steps it does in https://github.com/mozilla/socorro-crashstats/blob/master/bin/jenkins.sh

The only DB interaction is for authentication/sessions ("./manage.py syncdb --noinput" once manually should be enough) because Socorro uses a REST middleware, and the PostgreSQL schema is managed separately.
I patterned the build + deploy (to dev) process after what we already do with the main socorro repo. Since it's a separate repo we can be flexible, though. If there is a standard way to set this up with an amo- or sumo-style red button for deployment, that is probably preferred.
External resources the web servers will need access to:

Socorro Postgres server
Socorro Middleware server 

Memcache of some sort
Mozilla's BrowserID server
(In reply to Chris Lonnen :lonnen from comment #7)
> External resources the web servers will need access to:
> 
> Socorro Postgres server

If it matters, this can be "any" postgres server. We're using it to store authentication details for people who sign in. It does not read from (or write to) any other tables that are part of Socorro.
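In Django settings terms, that means a single `default` connection is enough; a minimal sketch, where the database name, user, and host are placeholders rather than the real staging values:

```python
# Hypothetical settings fragment: the django app only needs one Postgres
# connection, used for Django's auth and session tables.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "crashstats",      # placeholder database name
        "USER": "crashstats",      # placeholder user
        "PASSWORD": "",            # filled in by config management
        "HOST": "pg-lb.example",   # e.g. the Postgres load balancer VIP
        "PORT": "5432",
    }
}
```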
Nodes are online and I've built a skeleton puppet config for the cluster.


TODO:

Admin node needs code to deploy (from jenkins), Chief, Chief deploy script, some simple/convenient way to override files within the app (settings/local.py)
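The usual Django pattern for that override hook is a try/except import at the bottom of the shipped settings file; a minimal sketch (the module path and setting names here are assumptions, not the real crashstats layout):

```python
# End of a hypothetical settings.py: let a per-environment
# settings/local.py override anything defined above.
DEBUG = False
CACHE_MIDDLEWARE_SECONDS = 600

try:
    from settings.local import *  # noqa: per-environment overrides
except ImportError:
    pass  # no local overrides present; ship with the defaults above
```

Ops then only has to drop a `local.py` onto each node (or template it via puppet) without touching the deployed code.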

ACLs between admin node and web nodes

Create admin node internal git repos and clone on web nodes... or switch to rsync deploys; either way should work for this.

Create apache config to run the app
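On the app side, that apache config would point mod_wsgi's `WSGIScriptAlias` at a WSGI entry point; a sketch of what that file might look like, assuming a project module named `crashstats` (the actual name may differ):

```python
# Hypothetical wsgi.py for running the app under apache mod_wsgi;
# the settings module name is an assumption about the project layout.
import os

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "crashstats.settings")

# Django 1.4+ exposes a WSGI application factory:
from django.core.wsgi import get_wsgi_application

application = get_wsgi_application()
```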

Set up VIPs (bug 830515)

ACLs between web nodes and stuff in comment 7

Question to answer: do we need/want new memcache node(s) for this? It's django-1.4, so we can use KEY_PREFIX to have dev/stage/prod safely share memcache nodes. We can use existing nodes, if any existing ones are sensible. Does socorro already have some?
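The Django 1.4 side of that sharing would be a `CACHES` setting along these lines (node names and the prefix value are placeholders):

```python
# Hypothetical cache settings: shared memcached nodes, kept safe across
# environments by a per-environment KEY_PREFIX (available since Django 1.4).
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": ["memcache1.example:11211", "memcache2.example:11211"],
        "KEY_PREFIX": "socorro-stage",  # dev/stage/prod each set their own
    }
}
```

With distinct prefixes, dev/stage/prod keys can't collide even on the same memcached nodes.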

Anything else?
@jakem - In the current Socorro deploy, memcached is running on each of the webheads. Each webhead uses all the other webheads to form a memcached pool. I imagine we shouldn't reuse those, but I'll defer to you on that.

I think rsync deploys would work best with our current setup.

Lastly, I think subdomains will be needed for the django site; crash-stats-new.allizom.org and crash-stats-new.mozilla.com should work.
jakem: could you please cc me on bug 838905?
This is mostly set up now, and we're wading through configuration.

https://crash-stats-django.allizom.org/ is where it lives (such as it is at the moment).
@jake: this is great! Did you update mana with hostnames? I'd like to log in and investigate that 500.
Depends on: 842619
This is all working now.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard