Closed Bug 705760 Opened 13 years ago Closed 12 years ago

Migrate MXR from sjc1 to scl3

Categories: Developer Services :: General
Type: task
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: nmaul, Assigned: nmaul
Whiteboard: SCL3

+++ This bug was initially created as a clone of Bug #704374 +++

This should go in PHX1 or SCL3. It would be nice if we could bring this up before SJC1 goes away, so we can avoid physically moving the existing MXR installation altogether. However that blocks on some code revisions in MXR, so that may not work out even if Infra is good-to-go on it.

At least 1 web node to run the web interfaces.

At least 1 processing node to take care of backend indexing. With care, this could be overlaid onto the web nodes, and/or the admin node.

NFS storage between them... 350GB should be enough. Current usage for MXR is 165GB, but that's missing some trees that would be useful to add. Current DXR usage appears to be around 50GB.

Zeus frontend VIP. I don't know how much caching we can do, but we should be consistent regardless. We may need 2 VIPs... one for MXR, one for DXR. One should suffice if we use a wildcard or SAN cert.

MXR will need a MySQL database + user. Its data will need to be converted from Berkeley DB files to MySQL as part of the move, but it will need the database either way. Unknown what DXR needs as far as databases, but I suspect a MySQL DB/user would be appropriate there as well.
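
For reference, the conversion could be as simple as dumping each Berkeley DB file into a key/value table. A minimal sketch in Python, assuming the files are plain hash databases; the table name (mxr_index), columns, path, and credentials are hypothetical, and the real MXR schema may look quite different:

# Sketch only: dump a Berkeley DB hash file into a MySQL key/value table.
# The path, table name, column names, and credentials below are hypothetical.
import bsddb3          # legacy bsddb interface (third-party on modern Pythons)
import MySQLdb         # mysqlclient / MySQL-python

db = bsddb3.hashopen("/data/mxr/some-tree.db", "r")   # hypothetical path
conn = MySQLdb.connect(host="localhost", user="mxr", passwd="...", db="mxr")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS mxr_index "
            "(k VARBINARY(512) PRIMARY KEY, v LONGBLOB)")

# Fine for a sketch; a real migration would insert in batches.
rows = [(k, db[k]) for k in db.keys()]
cur.executemany("INSERT INTO mxr_index (k, v) VALUES (%s, %s)", rows)
conn.commit()
conn.close()
db.close()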

Possibly a node to run the "glimpseserver" package - this keeps clients from having to read the index themselves, which saves I/O and probably scales better to multiple servers. I don't know if MXR and DXR could coexist on the same glimpseserver box... probably.

An admin node. This would handle the updating/indexing. That is, it would either do the work itself, or farm it out to the other systems. GNU parallel can do this natively, if we want.

If we centralize processing, the admin node should be relatively more powerful than we normally spec out. CPU is likely to be the limiting factor (assuming good storage performance), so 8+ cores. If we instead farm out work to the web nodes, this would be unnecessary and we should beef *them* up instead. 4-8 cores each would be nice.
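
To make the "farm it out" option concrete: something like the following could run from the admin node, pushing per-tree index runs to the web nodes over ssh. This is a sketch only; the host list, tree names, and the mxr-update-tree command are hypothetical placeholders for whatever the real cron jobs invoke, and GNU parallel (e.g. with --sshlogin) would accomplish the same thing without any custom code.

# Sketch: fan per-tree indexing out from the admin node to worker hosts.
# Host names, tree names, and the "mxr-update-tree" command are hypothetical.
import subprocess
from concurrent.futures import ThreadPoolExecutor

HOSTS = ["mxr-web1.webapp.scl3.mozilla.com"]          # add more nodes as built
TREES = ["mozilla-central", "comm-central", "mozilla-release"]

def index_tree(job):
    host, tree = job
    # Run the remote indexer over ssh; report the tree and its exit status.
    result = subprocess.run(["ssh", host, "mxr-update-tree", tree],
                            capture_output=True, text=True)
    return tree, result.returncode

jobs = [(HOSTS[i % len(HOSTS)], tree) for i, tree in enumerate(TREES)]
with ThreadPoolExecutor(max_workers=4) as pool:
    for tree, rc in pool.map(index_tree, jobs):
        print("%s: %s" % (tree, "ok" if rc == 0 else "failed (%d)" % rc))
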
(In reply to Jake Maul [:jakem] from comment #0)
> consistent regardless. may need 2 VIPs... one MXR, one DXR. One should
> suffice if we use a wildcard or SAN cert.

We can also use 1 VIP and multiple SSL certs based on hostnames if needed. As long as the backend VirtualHosts are set up right, this will work fine.
IIRC (it's been a long time since I touched it) DXR won't share nicely with other stuff; that's why it's on its own box now. DXR actually builds the source it's indexing and instruments it with debug code it can hook into for parsing where symbols wind up after it builds, or something along those lines, so it'll need hardware capable of building Firefox without interfering with other stuff on the box (compilers tend to eat CPU).

I suspect it'd probably be sufficient to put the build portion of DXR on a separate box and have it share the results with the web servers in the cluster somehow...
jcramer doesn't know why he's on this bug, and emailed me in reply to my last comment here asking me to remove him from it.
He would have been on it because somewhere along the line o' bugs, someone forgot that jcranmer has an n.
This is no doubt already factored in, but just in case: the current MXR working directory is newer than what is in the mercurial repo, as there are quite a few uncommitted changes, if bug 681197 was anything to go by. (Wouldn't want some of the recent changes to be lost in the move).
Assignee: server-ops → nmaul
Hardware for this should be wrapped up by the end of next week; then we'll work on configuring that hardware and building out a cluster.

Then the fun of migrating these sites will begin.
Assignee: nmaul → server-ops-devservices
Component: Server Operations: Web Operations → Server Operations: Developer Services
QA Contact: cshields → shyam
Waiting on the hardware to be set up.
Whiteboard: SCL3
Jake's going to be leading this effort; we're going to have some hardware kicked first.
Assignee: server-ops-devservices → nmaul
Summary: Build out new "Tools" cluster, initially for MXR and DXR → Migrate MXR and DXR from sjc1 to scl3
Summary: Migrate MXR and DXR from sjc1 to scl3 → Migrate MXR from sjc1 to scl3
Renaming because DXR is already in PHX, so it's not as time-constrained... MXR needs to be moved more quickly because it's in SJC1. :)
Jake, 

Machines are online but not puppetized:

mxr-web1.webapp.scl3.mozilla.com
mxr-processor1.private.scl3.mozilla.com

Let me know if you need anything else from me.
Depends on: 748900
These are set up and largely working... puppetized and all. I've posted to mozilla.dev.planning with the new IP (63.245.215.42) inviting folks to test it. If nothing major turns up in a few days, we'll cut over to it.

One other minor thing that needs to be done... the old system emailed justdave and me when it did updates... the new one just emails me for now. When we cut over, I need to remember to add justdave back to MAILTO. :)
Depends on: 751305
Depends on: 751636
Completed.

dm-webtools04 is still online, but I'm fairly sure nothing else of value runs on it. I disabled the MXR crons on it, and added justdave to the mxr-processor1 job for now... this should ultimately go to a new "IT Developer Services" mailing list or something, I think... but this'll do for now.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Component: Server Operations: Developer Services → General
Product: mozilla.org → Developer Services