Bug 464748 - Upgrade bouncer db master to BL460c
Product: Graveyard
Classification: Graveyard
Component: Server Operations
Version: other
Platform: All Other
Importance: P1 minor
Target Milestone: ---
Assigned To: Dave Miller [:justdave]
QA Contact: matthew zeier [:mrz]
Reported: 2008-11-13 11:46 PST by matthew zeier [:mrz]
Modified: 2015-03-12 08:17 PDT


Description matthew zeier [:mrz] 2008-11-13 11:46:47 PST
From #infra discussions, upgrade bouncer's master db to BL460c.  

Is 4GB RAM enough?  Are 146GB 10k RPM drives fast enough?
Comment 1 Dave Miller [:justdave] 2008-11-13 11:56:50 PST
Current server has 4GB RAM and 72GB 10krpm drives, so specs sound fine.

The main perf boost will probably come from going to 64-bit for the OS (in addition to having faster processors on the blade; the existing box is HT instead of dual core).
Comment 2 Dave Miller [:justdave] 2008-12-10 14:27:51 PST
I'll try to get this staged tonight.  We need to do this switchover during tomorrow's outage window to get it done in time for the next FF release.
Comment 3 Dave Miller [:justdave] 2008-12-11 11:41:44 PST
Upgrade procedure:

1) Disable the sentry cron jobs
2) Disable logging in bouncer so we have no writes to the master
3) Advise people not to mess with the bouncer admin
4) Shut down mysql on mrdb-bouncer01 (bouncer itself should keep talking to the slaves to serve downloads)
5) rsync /var/lib/mysql and /var/lib/mysql-innodb from mrdb-bouncer01 to tm-bouncer01-master01
6) Swap IP addresses between mrdb-bouncer01 and tm-bouncer01-master01 and make mrdb-bouncer01 a CNAME to tm-bouncer01-master01
7) Shut down replication streams on all the slaves
8) Bring up mysqld on tm-bouncer01-master01
9) Fix replication filenames to match the new hostname and snag the replication pointers for later reference
10) Turn sentry/bouncer logging back on
11) Fix replication config on all the slaves to point at the new master and resume replication
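
A rough sketch of how steps 5 and 11 might be turned into commands, for illustration only. The hostnames and data directories come from the procedure above. Everything else is an assumption: the function names, the choice of rsync flags (`-a --delete`), and the binlog file name and position, which are placeholders for whatever SHOW MASTER STATUS reports on the new master in step 9. The script only prints what it would run and does not touch any server.

```python
import shlex

# Hostnames and data directories are taken from the procedure above.
# Function names, rsync flags and the example pointer values are assumptions.
OLD_MASTER = "mrdb-bouncer01"
NEW_MASTER = "tm-bouncer01-master01"
DATA_DIRS = ["/var/lib/mysql", "/var/lib/mysql-innodb"]


def rsync_commands(dest_host, data_dirs):
    """Build one rsync command per data directory (step 5), to run on the old master."""
    commands = []
    for path in data_dirs:
        # Trailing slash: copy the directory's contents, not the directory itself.
        src = path.rstrip("/") + "/"
        argv = ["rsync", "-a", "--delete", src, f"{dest_host}:{src}"]
        commands.append(" ".join(shlex.quote(arg) for arg in argv))
    return commands


def repoint_slave_sql(master_host, log_file, log_pos):
    """Build the statements a slave would run to follow the new master (step 11)."""
    return [
        "STOP SLAVE;",
        f"CHANGE MASTER TO MASTER_HOST='{master_host}', "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={int(log_pos)};",
        "START SLAVE;",
    ]


if __name__ == "__main__":
    for cmd in rsync_commands(NEW_MASTER, DATA_DIRS):
        print(cmd)
    # Placeholder pointer values; the real ones would be recorded in step 9.
    for stmt in repoint_slave_sql(NEW_MASTER, "mysql-bin.000001", 4):
        print(stmt)
```

Printing the plan first makes it easy to review the exact commands before the outage window.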
Comment 4 matthew zeier [:mrz] 2008-12-11 14:36:39 PST
fyi, this will impact bouncer's logging.  Copying interested parties.
Comment 5 Daniel Einspanjer [:dre] [:deinspanjer] 2008-12-11 14:40:26 PST
So to make sure I understand the impact correctly:
For the period of time between step 2 and step 10, we will have no record of downloads served out through bouncer, and when I parse the log files for that day, I will see a gap for that time duration that should be considered "legitimate"?
Comment 6 matthew zeier [:mrz] 2008-12-11 14:42:04 PST
I'm not sure how you get logs but this is just logging in the database, not web access_logs.  Maybe you already knew that.
Comment 7 Jeremy Orem [:oremj] 2008-12-11 14:43:49 PST
The logging in the database is on and off anyways, but I think there are a few scripts that do use the data there.  Off the top of my head I can only think of the sfx download counter feed.
Comment 8 Daniel Einspanjer [:dre] [:deinspanjer] 2008-12-11 14:47:07 PST
Oh, no, I did not understand correctly.  I record Mozilla product download statistics from the bouncer access logs retrieved from im-log02/stats/logs/.  So if those requests will continue to be logged, it should not have any impact on me.
Comment 9 Dave Miller [:justdave] 2008-12-11 18:49:58 PST
I forgot to flag the machine I was using for this as used in Inventory, and someone else snagged it and reused it in the meantime. :(  All of the config I had done so far is in puppet though, so redeploy will be painless once I have a box to put it on.  Working on acquiring that now, but the database move is going to be delayed pending getting the replacement blade reloaded.
Comment 10 Dave Miller [:justdave] 2008-12-11 19:28:21 PST
I didn't get much sleep last night and am actually having trouble staying awake at the moment while waiting for the machine to become available.  I believe we're still waiting on logs from what is currently on there to be copied off before we can wipe the new machine and reload it, so I'll pick this up in the morning after I get some sleep.  Given the lack of public notice on this, and the fact that we can do it with no user-visible downtime other than the admin panel, I assume I can get away with that timing.
Comment 11 Dave Miller [:justdave] 2008-12-15 10:51:27 PST
This is completed.  Downtime of the DB itself was under a minute.  Logging was off for about 5 minutes due to propagation times pushing the config change to disable/enable it into the cluster.

rsync of the data, including replication log files, took about 15 seconds to complete (small DB, minimal changes between passes; I ran a pass first before taking everything down, then ran a follow-up pass with it down to get the last-minute changes).  It turns out the replication logs weren't being stored with a hostname-specific filename, meaning the slave pointers didn't have to be touched.  As far as they knew, they were still talking to the same master.  I had the DHCP and DNS changes for the IP swap staged and triggered those right as the rsync was starting.  The rsync and network restarts on the boxes were scripted ahead of time, and I sent gratuitous ARPs out from the new box immediately after it took over the IP address.  As far as the network and apps are concerned, the box went down for about 30 seconds and then came back.
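
The two-pass copy described above (one pass with the database still up, then a short catch-up pass with it down) could be sketched roughly as follows. This is not the script that was actually used. The function names are made up for illustration, the rsync flags are an assumption, and the sync and stop actions are passed in as callables so the ordering can be tried without touching a real server.

```python
import subprocess
import time


def rsync_runner(src, dest):
    """Return a callable that copies src to dest with rsync (flags are an assumption)."""
    def run():
        subprocess.run(["rsync", "-a", "--delete", src, dest], check=True)
    return run


def two_pass_sync(sync, stop_service):
    """Warm copy while the service is up, then a catch-up copy with it down.

    Returns the number of seconds spent in the second pass, which is the
    part of the copy that counts as downtime.
    """
    sync()           # pass 1: bulk copy, service still running
    stop_service()   # writes stop here
    started = time.monotonic()
    sync()           # pass 2: only the changes made since pass 1
    return time.monotonic() - started


if __name__ == "__main__":
    # Dry run with stand-in callables instead of real rsync and service stops.
    log = []
    took = two_pass_sync(lambda: log.append("sync"), lambda: log.append("stop"))
    print(log, f"second pass took {took:.3f}s")
```

Because the first pass moves almost all of the data, the second pass only has to transfer what changed in between, which is why the copy with the database down can finish in seconds.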

In case anyone needs to rebuild download counts from the apache logs (don't know if you care or not) the downtime period for logging was 10:24am to 10:29am.
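
For anyone rebuilding counts, a small sketch of how Apache log entries falling inside that window might be picked out. The times come from the note above. The date (2008-12-15, the date of this comment), the PST offset, the Apache common-log timestamp format, and the function names are all assumptions, and the end of the window is treated as exclusive.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8))

# Times are from the comment; the date and time zone are assumptions.
GAP_START = datetime(2008, 12, 15, 10, 24, tzinfo=PST)
GAP_END = datetime(2008, 12, 15, 10, 29, tzinfo=PST)


def parse_apache_time(field):
    """Parse a timestamp such as '15/Dec/2008:10:25:01 -0800'."""
    return datetime.strptime(field, "%d/%b/%Y:%H:%M:%S %z")


def in_logging_gap(field):
    """True if the request happened while database logging was switched off."""
    when = parse_apache_time(field)
    return GAP_START <= when < GAP_END


if __name__ == "__main__":
    for stamp in ("15/Dec/2008:10:23:59 -0800", "15/Dec/2008:10:25:01 -0800"):
        print(stamp, in_logging_gap(stamp))
```

Requests flagged this way would be the ones missing from the database counts and recoverable only from the access logs.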

mrdb-bouncer01-old is now at in case anything needs to be snagged from it.  I'll decommission it and mark it as spare in the next couple of days if nothing is found. (HP DL360 G4)
