Closed Bug 401005 Opened 17 years ago Closed 17 years ago

Move Bugzilla into the web cluster

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: justdave, Assigned: justdave)

References

Details

(Whiteboard: waiting for patches)

We've been doing some preliminary work on moving Bugzilla into our web cluster.  Although not technically necessary yet (we're not straining the hardware it's currently running on), this will get us some redundancy and allow for future server upgrades (kernel security patches and so forth) without requiring any downtime: just do one backend server at a time and swap them in and out of rotation as they get completed.

There are still a few concerns that need to be resolved before we can move it, but it looks like this might actually be doable without too much trouble with the 3.0.x versions of Bugzilla.

There's another bug to upgrade Bugzilla to a newer version.  That upgrade should *not* happen in the same outage window, to minimize the number of things that could go wrong.
Flags: needs-downtime+
The current plan involves NFS-mounting the data directory and the graphs directory from each of the webheads.

One of my concerns with this so far is that we have a number of DoS mitigation tools on the current server that depend on the webserver knowing who it's actually serving traffic to, which isn't going to happen if it's behind a load balancer.  The real solution to this problem is to fix the parts of the app that are DoSable.

Frequent targets currently appear to be showdependencygraph.cgi and running queries that retrieve every public bug in the database.  The point of strain in all of these cases is the database server, not the webserver, and moving Bugzilla into the cluster won't make these DoS problems go away.  So if we're going to lose the tools that let us cut off the people abusing these features, we probably need to hunt the features down in Bugzilla and either disable them or make them more performant so they don't take out the database server.
Another thing we need to check on is that because of mod_perl, Bugzilla uses a large amount of RAM.  recluse currently has 8 GB of RAM on it, and generally runs with about 2 GB in use, but can easily use up all 8 when someone pulls a bunch of large queries at once.  Because of the number of processes being handled, we could in theory divide this 8 GB between all of the webheads sharing the load and come out fine, so we'll probably be okay, but it's worth keeping an eye on as we test/deploy.
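As a rough sanity check on the RAM estimate above, the arithmetic can be sketched as follows. This assumes the load (and so the memory demand) spreads evenly across the webheads, which is optimistic: each mod_perl instance carries its own baseline footprint. The function name and the webhead counts are hypothetical; the thread does not say how many webheads are in the cluster.

```python
def per_webhead_budget_gb(peak_total_gb, webheads):
    """Return the RAM each webhead would need if peak usage split evenly.

    peak_total_gb: peak RAM seen on the single server (8 GB on recluse).
    webheads: number of backend servers sharing the load (hypothetical).
    """
    if webheads < 1:
        raise ValueError("need at least one webhead")
    return peak_total_gb / webheads


if __name__ == "__main__":
    # The webhead counts here are illustrative only.
    for n in (2, 3, 4):
        print(n, "webheads:", per_webhead_budget_gb(8, n), "GB each at peak")
```

If the even-split assumption does not hold (for example, one webhead catching several large queries at once), the per-webhead peak would be higher than this figure, which is why it is worth watching during testing.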
I did an initial setup on the php5 cluster.  All of the packages are installed and managed in bcfg2, an NS virtual server is set up, and it is currently pointing to the live database.

This instance can be tested by adding "63.245.209.72   bugzilla.mozilla.org" to /etc/hosts.

I have not messed with the mail setup on these boxes, so I'm guessing something needs to be done there.

We should be able to write a script that looks at the NS logs to find the top Bugzilla offenders.  After that, we can use the NS to firewall them off, as well as find their processes with 'server-status' on the webheads.  It won't be quite as easy as it is on recluse, but it should be possible.
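A minimal sketch of what such a log-scanning script might look like. The NS log format is not given in this bug, so the line layout below (client IP first, request line in quotes, as in Common Log Format) is an assumption, as are the function name and the list of scripts treated as expensive. showdependencygraph.cgi is named above; buglist.cgi is included as the script that serves bug queries.

```python
import re
from collections import Counter

# ASSUMPTION: each log line starts with the client IP and contains the
# request line in double quotes, e.g.
#   10.0.0.1 - - [date] "GET /showdependencygraph.cgi?id=1 HTTP/1.1" 200 512
# The real NS log layout may differ; adjust the pattern to match.
LINE_RE = re.compile(r'^(?P<ip>\S+) .*?"(?:GET|POST|HEAD) (?P<path>\S+)')

# Scripts considered expensive for the database (illustrative list).
EXPENSIVE = ("showdependencygraph.cgi", "buglist.cgi")


def top_offenders(lines, limit=10, scripts=EXPENSIVE):
    """Count hits on expensive scripts per client IP, most frequent first."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed layout
        # Drop the query string and any leading directories.
        script = match.group("path").split("?", 1)[0].rsplit("/", 1)[-1]
        if script in scripts:
            counts[match.group("ip")] += 1
    return counts.most_common(limit)
```

The resulting list of (IP, hit count) pairs would be the input for whatever blocking is done on the NS; that step is left out here because it depends on the load balancer's own tooling.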
Depends on: 255606
Depends on: 102622
Bug 102622 is already resolved upstream, but for 3.2/trunk only.  We'll need to backport it to 3.0 for bmo.
Whiteboard: waiting for patches
ok, this is up and running.  Was a bumpy ride, but it looks stable now. :)
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard