Closed Bug 708035 Opened 13 years ago Closed 12 years ago

Setup production servers for bedrock

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlong, Assigned: mburns)

References

Details

(Whiteboard: [1/31])

This bug tracks the progress of setting up servers in the production environment for hosting bedrock, the new Python mozilla.org site.

We have plans to get one or two simple, low-traffic pages serving by the end of Q4. Jake has said we can work towards that.
Jake, over to you? :) 

Or let one of us know the plan, we'll be happy to help.
Assignee: server-ops → nmaul
Please note that we really don't need a database for this at first. We would like one soon (maybe Q1 of next year, if that) and would like to leverage the work you guys are doing for multi-cluster dbs, but the main site will never use a database.

The databases will be used on smaller pages and subsites, which is not an urgent need.
Jake, you've mentioned that all you need is a date for this to go live. Fred and I talked and we settled on January 31st. We want to have just one or two pages by then, something like the 404 page or http://www.mozilla.org/en-US/about/participate/.

Again, no database required. Just a basic Python setup. Please let me know if this works.
Whiteboard: [1/31]
Here's the plan. For new CC's, 'bedrock' is the new Django app for www.mozilla.org.

Spin up bedrock prod nodes in PHX1. We need to scrounge up hardware for this. This will be the sole source for www.mozilla.org for a couple of weeks (apart from caching nodes; my understanding is that a large portion of the content will be cacheable). We'll do caching nodes in at least AMS1 (for EU performance) and probably SJC1 (for redundancy). This should let us hit the 1/31 date.

We'll spin up a new prod cluster in SCL3, as soon as we can. This will probably be mid-to-late February. This will replace the caching node in SJC1.

Once we have prod stuff in SCL3, we'll start work on the database replication issues. Unknown exactly how long this may take, but hopefully not too long. We have a DB set up in PHX1 already, we'll need to set up the SCL3 end and the replication between them... so, the hard part. :)

By sometime in March, we aim to be completely "full-production" in both places.



Assigning to cshields for node allocation for PHX1 prod cluster. oremj mentioned it would be okay to take 3 nodes from the new rack of 64 that should hopefully be online soon-ish. There is already an admin/dev/stage cluster to plug this into, but might need a little bit of puppet tweaking for prod.

Adding WebOps folks to CC list so they know what's up with this project, just so they're up to speed on the overall plan.
Assignee: nmaul → cshields
OS: Mac OS X → All
Hardware: x86 → All
The new rack of blades is online, so we should be able to allocate the 3 we need shortly. I believe Phong's team is still working on getting the chassis all squared away first (HP Onboard Admin stuff), then we can snag 3 and get to work.
OS: All → Mac OS X
Hardware: All → x86
Assigning to Phong for action:

cshields says we can re-allocate pp-zlb03 and pp-zlb04 for this, as well as spin up a VM in PHX1 (we really want to have 3 nodes). The sooner we can have these re-kicked and puppetized, the better: we want to have live content on them by Monday.

OS should be RHEL6, 64-bit

Names should be:
bedrock[123].webapp.phx1.mozilla.com

Thanks!
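For anyone scripting against these names, here is a minimal sketch of expanding the bracket shorthand above into the three full hostnames. The helper name `expand_hosts` is hypothetical and not part of any existing tooling; it only handles a single bracket group of single characters.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one [chars] group: 'web[12].example' -> ['web1.example', 'web2.example']."""
    match = re.search(r"\[([^\]]+)\]", pattern)
    if match is None:
        # No bracket group, so the pattern is already a single hostname.
        return [pattern]
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    # Each character inside the brackets yields one hostname.
    return [f"{prefix}{ch}{suffix}" for ch in match.group(1)]


hosts = expand_hosts("bedrock[123].webapp.phx1.mozilla.com")
for host in hosts:
    print(host)
```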
Assignee: cshields → phong
Severity: normal → major
OS: Mac OS X → All
Hardware: x86 → All
Just a ping here. We want to push this next week. We were thinking Monday, but with the Firefox release on Tuesday, maybe we should push it to Thursday or something. What do you think?
Blocks: 721878
I've created DNS entries for all 3 hosts and created a VM for the third one:

bedrock1.webapp.phx1.mozilla.com has address 10.8.81.86
bedrock2.webapp.phx1.mozilla.com has address 10.8.81.87
bedrock3.webapp.phx1.mozilla.com has address 10.8.81.88 (this is the VM)
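If it helps with a quick sanity check, the lines above can be parsed into a name-to-address map. This is only a sketch: the function name `parse_host_output` is made up for illustration, and it assumes the "NAME has address IP" line shape shown in this comment rather than doing a live DNS lookup.

```python
import ipaddress
import re

# Matches lines of the form "<name> has address <ip>"; trailing notes are ignored.
HOST_LINE = re.compile(r"^(?P<name>\S+) has address (?P<addr>\d+\.\d+\.\d+\.\d+)")


def parse_host_output(text: str) -> dict[str, ipaddress.IPv4Address]:
    """Build a hostname -> IPv4Address map from 'has address' lines."""
    records = {}
    for line in text.splitlines():
        match = HOST_LINE.match(line.strip())
        if match:
            records[match.group("name")] = ipaddress.IPv4Address(match.group("addr"))
    return records


sample = """\
bedrock1.webapp.phx1.mozilla.com has address 10.8.81.86
bedrock2.webapp.phx1.mozilla.com has address 10.8.81.87
bedrock3.webapp.phx1.mozilla.com has address 10.8.81.88 (this is the VM)
"""

records = parse_host_output(sample)
for name, addr in records.items():
    # All three are expected to sit on internal (RFC 1918) address space.
    print(name, addr, "private" if addr.is_private else "public")
```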
Assignee: phong → mburns
\o/ Thanks guys. Thursday's looking good. The push bug is the blocked bug.
After a bit of poking and prodding, these hosts have all been set up with RHEL6 + puppet.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
How much power do these servers have? Zeus will cache these pages for 15 minutes, so I'm assuming we can still handle a good bit of load.
Should be quite powerful. Only 3 nodes, but 2 of them are ex-Zeus nodes with ridiculous amounts of RAM and CPU power.

If we need more, we should be able to get a blade or two like we were originally planning... but I don't see that being very likely.
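As a rough back-of-the-envelope for the caching question, here is a sketch of an upper bound on the request rate reaching the bedrock nodes. It assumes an idealized model that is not stated in this bug: every page is cacheable, each cache node refetches a given URL at most once per TTL, and there are no stampedes or cache bypasses. The URL and cache-node counts below are made-up illustration values; only the 15-minute TTL comes from the comment above.

```python
def max_origin_requests_per_second(
    distinct_urls: int, cache_nodes: int, ttl_seconds: int
) -> float:
    """Upper bound on origin load when each cache node refetches each URL once per TTL."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return distinct_urls * cache_nodes / ttl_seconds


# Hypothetical inputs: 1000 distinct URLs and 3 cache nodes; TTL is 15 minutes.
bound = max_origin_requests_per_second(
    distinct_urls=1000, cache_nodes=3, ttl_seconds=15 * 60
)
print(f"origin sees at most ~{bound:.2f} requests/second under these assumptions")
```

Under this model the bound does not depend on visitor traffic at all, only on how many distinct URLs exist and how many caches sit in front of the origin.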
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard