Closed
Bug 863268
Opened 10 years ago
Closed 9 years ago
Migrate buildapi off of kvm in scl1 and onto the releng cluster
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: arich, Assigned: dustin)
References
Details
(Whiteboard: [2013Q4])
Attachments
(1 file)
2.84 KB, patch | dustin: review+ | dustin: checked-in+
We need to migrate the following VMs off of kvm in scl1:
* buildapi01.build.scl1.mozilla.com
* redis01.build.scl1.mozilla.com
Catlee: is redis used for anything *but* buildapi?
Reporter | ||
Updated•10 years ago
|
Flags: needinfo?(catlee)
Comment 1•10 years ago
|
||
It's also used to store tokens and nonces for the signing server.
Flags: needinfo?(catlee)
Updated•10 years ago
|
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations
Reporter | ||
Updated•10 years ago
|
Whiteboard: [2013Q4]
Reporter | ||
Updated•10 years ago
|
Assignee: server-ops-releng → dustin
Assignee | ||
Comment 2•10 years ago
|
||
BuildAPI has unimplemented support for using Memcached as a backend. So most likely we can just implement that and no longer use redis for Buildapi.
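For illustration only, here is a minimal sketch of what a pluggable key/value backend could look like, so that the application code does not care which store sits behind it. The class and function names (`InMemoryStore`, `make_store`) and the backend name `"memory"` are hypothetical and are not taken from BuildAPI; a real backend would wrap a memcached or redis client instead of a dict.

```python
import time


class InMemoryStore:
    """Stand-in backend with the same get/set shape a cache client would offer.

    Hypothetical: a real deployment would wrap a memcached or redis client.
    """

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl=None):
        # ttl is in seconds; None means the entry never expires.
        expires = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires = entry
        if expires is not None and time.monotonic() >= expires:
            # Expired entries behave as if they were never stored.
            del self._data[key]
            return default
        return value


def make_store(backend):
    """Pick a backend by name (the names are hypothetical config values)."""
    backends = {"memory": InMemoryStore}
    try:
        return backends[backend]()
    except KeyError:
        raise ValueError("unknown k/v backend: %r" % (backend,))
```

With this shape, switching stores would be a configuration change in `make_store` rather than a change to every caller.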
Assignee | ||
Comment 3•10 years ago
|
||
I'm going to morph this slightly, since we don't actually want to migrate redis. We want to kill it. Also, we need to migrate signing off of redis, which will be a new bug.
Summary: Migrate redis and buildapi off of kvm in scl1 → Migrate buildapi off of kvm in scl1 and onto the releng cluster
Assignee | ||
Comment 5•10 years ago
|
||
Per bhearsum in bug 804334: Note to whomever does this: please update the deployment docs at https://wiki.mozilla.org/ReleaseEngineering/BuildAPI#Updating_code
Assignee | ||
Comment 6•10 years ago
|
||
Bug 946334 switches the messaging backend to the new backend, without moving the web service or changing the k/v store.
Assignee | ||
Comment 7•10 years ago
|
||
Since using mod_wsgi is new, I want to test it out in a staging environment first, so I'm knocking out bug 841345.
Depends on: 841345
Assignee | ||
Comment 8•10 years ago
|
||
OK, the dep bugs have a bunch of patches that get buildapi into a shape where, for me at least, it works on mod_wsgi and talks reliably to RabbitMQ.
A few other code notes:
* it's OK if mod_wsgi spawns multiple processes that are all running LoggingJobDoneConsumers, as the consumers are all equivalent (they just log the job completion in the DB). Whether kombu correctly survives such forks, I don't know, and will test.
* logging needs to go somewhere
And deployment notes:
* the buildapi DB needs to be hosted on MySQL, not SQLite as it is now
  * this can be done from buildapi01
* crontasks need to get pulled out and run on the admin host
  * once the DB is in MySQL, these can be set up before the rest is migrated
If we play our cards carefully, we can actually run the old and new buildapi instances in parallel, and transition from one to the other (and back, if necessary) with Apache config changes.
Assignee | ||
Comment 10•10 years ago
|
||
Kombu seems fine with mod_wsgi. Logging is set up (paster's logging config was full of fail, so I just configured it directly in buildapi.wsgi). I have the deployment largely figured out in staging. I'm going to add a fake selfserve agent that will run on the admin node, so we can test the kombu stuff in staging. Other than that, I'm waiting on flows for the DB and crontask changes above.
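As a sketch of what "configured it directly in buildapi.wsgi" can look like, the fragment below sets up Python logging in the WSGI script itself instead of relying on paster's logging config. It is not the actual buildapi.wsgi: the `application` callable is a placeholder standing in for the real app, and the logger name and format string are assumptions. Under mod_wsgi, output written to stderr ends up in the Apache error log.

```python
import logging
import sys


def configure_logging(level=logging.INFO):
    """Attach a stderr handler to the root logger and return it."""
    root = logging.getLogger()
    root.setLevel(level)
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    root.addHandler(handler)
    return handler


# Configure logging once, at import time, before the app handles requests.
configure_logging()
log = logging.getLogger("buildapi.wsgi")  # hypothetical logger name


def application(environ, start_response):
    # Placeholder app: the real script would hand off to the buildapi app here.
    log.info("request for %s", environ.get("PATH_INFO", ""))
    body = b"ok\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```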
Assignee | ||
Comment 11•10 years ago
|
||
I've disabled the automatic updates of buildapi01 from hg, before landing the patches in the dep bug.
Assignee | ||
Comment 12•10 years ago
|
||
Attachment #8363708 -
Flags: review?
Assignee | ||
Comment 13•10 years ago
|
||
Comment on attachment 8363708 [details] [diff] [review] bug863268.patch I'll need to apply this on the old instance, too, so that it can talk to the buildapi DB.
Assignee | ||
Comment 14•10 years ago
|
||
Comment on attachment 8363708 [details] [diff] [review] bug863268.patch r+ via irc (with an added comment)
Attachment #8363708 -
Flags: review? → review+
Assignee | ||
Updated•10 years ago
|
Attachment #8363708 -
Flags: checked-in+
Assignee | ||
Comment 15•10 years ago
|
||
OK, deployed on the production instance. It's now storing all data outside of the VM - either in MySQL, Redis, or RabbitMQ.
Assignee | ||
Comment 16•10 years ago
|
||
Oh, and the new job IDs are 600000 and higher, so you can recognize them.
Assignee | ||
Comment 17•10 years ago
|
||
I enabled mod_wsgi on the production cluster, and set up buildapi at /buildapi_new. It needs some flows before it will actually show data, but everything up to the point of connecting to the DB works fine.
Assignee | ||
Comment 18•10 years ago
|
||
(And an extra note to self: the prod instance is configured to use the staging instance's AMQP vhost, to avoid it consuming from the prod queue and making a mess. Once everything else looks good, I'll fix that and verify proper consumption)
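A small sketch of the vhost arrangement described here, assuming the instance picks its AMQP vhost from configuration. The vhost names, credentials, and the `cutover_done` flag are all hypothetical; the only point is that the prod instance stays on the staging vhost until it has been verified. The URL has the `amqp://user:password@host:port/vhost` shape that AMQP clients such as kombu accept.

```python
from urllib.parse import quote

# Hypothetical vhost names; the real ones are not given in this bug.
VHOSTS = {"staging": "buildapi-staging", "prod": "buildapi-prod"}


def consumer_vhost(instance, cutover_done):
    """Return the vhost an instance should consume from.

    Until the cutover is verified, prod consumes from the staging vhost
    so that it cannot drain the production queue.
    """
    if instance == "prod" and not cutover_done:
        return VHOSTS["staging"]
    return VHOSTS[instance]


def amqp_url(user, password, host, vhost, port=5672):
    """Build an AMQP URL, percent-encoding each component."""
    return "amqp://%s:%s@%s:%d/%s" % (
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(vhost, safe=""),
    )
```

Flipping `cutover_done` (or the equivalent config value) is then the whole change needed to start consuming from the production queue.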
Assignee | ||
Comment 19•10 years ago
|
||
I just switched the prod instance to use the prod AMQP vhost, and changed the /buildapi URI to point to that instance. nginx and paster are still running on buildapi01 as I believe some of the crontasks talk to localhost. With that, buildapi01 is also consuming jobrequest-finished messages and recording them in the DB. Redis is also still running, because some of the crontasks talk to it.
Assignee | ||
Comment 20•10 years ago
|
||
I had missed adding `allowed_origins` to the paster config. That's fixed. It seems there was some caching somewhere along the line that caused that fix to take a while to "sink in".
Assignee | ||
Comment 21•10 years ago
|
||
Ergh, and gviz_api isn't installed either. Fixing.
Assignee | ||
Comment 22•10 years ago
|
||
I just created a 'buildapi' user in LDAP so that we can assign ownership to it on the relengweb netapp share.
Assignee | ||
Comment 23•9 years ago
|
||
At this point, most of the crontasks are running in parallel on the releng web admin host. I can't run the tasks that talk to the buildapi HTTP service until bug 970513 is closed (there's always something). Once that's done, we can make the cutover to serve this content directly from the webheads, at which point buildapi01 and redis01 can go away.
Assignee | ||
Comment 24•9 years ago
|
||
OK, I believe we're ready for the cutover. There are essentially two copies of the 'buildjson' directory now - one on buildapi01, which the production builddata URLs are proxying to, and one on an NFS volume, which the staging builddata URLs are proxying to:
http://builddata-pub-build.allizom.org/builddata/buildjson
https://secure-pub-build.allizom.org/builddata/buildjson
(you can tell it's the netapp share by the presence of '@@@NETAPP' at the top - just a flag I added temporarily for my own sanity).
It's just an apache change to "cut over" production to the NFS share. I'll organize when we can do that.
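The marker check described above is easy to script. This is a minimal sketch, assuming the '@@@NETAPP' flag appears within the first kilobyte of the response; the function names and the 1024-byte window are my own choices, not part of any existing tooling.

```python
from urllib.request import urlopen

MARKER = "@@@NETAPP"


def served_from_netapp(body, marker=MARKER, head_chars=1024):
    """Return True if the marker appears near the top of the page body."""
    return marker in body[:head_chars]


def check_url(url, timeout=10):
    """Fetch the start of a page and report whether it carries the marker."""
    with urlopen(url, timeout=timeout) as resp:
        body = resp.read(1024).decode("utf-8", "replace")
    return served_from_netapp(body)
```

Running `check_url` against a builddata URL before and after the Apache change would show which copy of 'buildjson' is being served, for as long as the temporary flag stays in place.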
Assignee | ||
Comment 25•9 years ago
|
||
The switch is complete, with no ill effects that I can see.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED