
Upgrade bouncer db master to BL460c


Product: Graveyard
Component: Server Operations
Opened: 9 years ago
Modified: 2 years ago


(Reporter: mrz, Assigned: justdave)





9 years ago
Per #infra discussions, upgrade bouncer's master DB to a BL460c blade.

Is 4GB RAM enough? Are 146GB 10k RPM drives fast enough?


9 years ago
Assignee: server-ops → justdave
The current server has 4GB RAM and 72GB 10k RPM drives, so those specs sound fine.

The main performance boost will probably come from moving the OS to 64-bit, in addition to the faster processors on the blade (the existing box uses Hyper-Threading rather than dual-core CPUs).


9 years ago
Priority: -- → P1
I'll try to get this staged tonight; we need to do the switchover during tomorrow's outage window so it's in place in time for the next FF release.
Upgrade procedure:

1) Disable the sentry cron jobs
2) Disable logging in bouncer so we have no writes to the master
3) Advise people not to mess with the bouncer admin
4) Shut down mysql on mrdb-bouncer01 (bouncer itself should keep talking to the slaves to serve downloads)
5) rsync /var/lib/mysql and /var/lib/mysql-innodb from mrdb-bouncer01 to tm-bouncer01-master01
6) Swap IP addresses between mrdb-bouncer01 and tm-bouncer01-master01, and make mrdb-bouncer01 a CNAME for tm-bouncer01-master01
7) Shut down replication streams on all the slaves
8) Bring up mysqld on tm-bouncer01-master01
9) Fix replication filenames to match the new hostname and snag the replication pointers for later reference
10) Turn sentry/bouncer logging back on
11) Fix the replication config on all the slaves to point at the new master and resume replication
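Steps 9 and 11 amount to recording the new master's binlog coordinates and pointing each slave at them. A minimal sketch of the statements involved, assuming standard MySQL `CHANGE MASTER TO` syntax; the `ReplicationPointer` and `repoint_slave_sql` names and the `mysql-bin.000001` log name are hypothetical placeholders, not taken from this bug. (The closing comment below notes this step turned out to be unnecessary in practice.)

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPointer:
    """Binlog coordinates noted on the new master in step 9 (hypothetical helper)."""
    log_file: str  # the File column of SHOW MASTER STATUS
    log_pos: int   # the Position column of SHOW MASTER STATUS


def repoint_slave_sql(master_host: str, ptr: ReplicationPointer) -> list[str]:
    """Statements a slave would run in step 11.

    Replication is assumed to be stopped already (step 7), so this only
    changes the master coordinates and restarts the slave threads.
    Values are interpolated without escaping: pass trusted names only.
    """
    if ptr.log_pos < 4:
        # The first event in a binary log starts after a 4-byte header.
        raise ValueError("binlog position must be at least 4")
    return [
        "CHANGE MASTER TO "
        f"MASTER_HOST='{master_host}', "
        f"MASTER_LOG_FILE='{ptr.log_file}', "
        f"MASTER_LOG_POS={ptr.log_pos};",
        "START SLAVE;",
    ]


if __name__ == "__main__":
    # Placeholder coordinates; the real ones would come from step 9.
    ptr = ReplicationPointer("mysql-bin.000001", 4)
    for stmt in repoint_slave_sql("tm-bouncer01-master01", ptr):
        print(stmt)
```

Generating the SQL rather than executing it keeps the sketch free of any particular MySQL client library; the output can be pasted into each slave's console.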
Group: infra
Whiteboard: Scheduled for Thurs 12/11 7pm

Comment 4

9 years ago
FYI, this will impact bouncer's logging. Copying interested parties.

So, to make sure I understand the impact correctly: for the period between step 2 and step 10, we will have no record of downloads served through bouncer, and when I parse the log files for that day, I will see a gap for that period that should be considered "legitimate"?

Comment 6

9 years ago
I'm not sure how you get your logs, but this is only the logging in the database, not the web access_logs. Maybe you already knew that.

Comment 7

9 years ago
The logging in the database is on and off anyway, but I think a few scripts do use the data there. Off the top of my head, I can only think of the sfx download counter feed.

Oh, no, I did not understand correctly. I record Mozilla product download statistics from the bouncer access logs retrieved from im-log02/stats/logs/, so as long as those requests continue to be logged, this should have no impact on me.

I forgot to flag the machine I was using for this as used in Inventory, and someone else snagged it and reused it in the meantime. :( All of the config I had done so far is in Puppet, though, so redeploying will be painless once I have a box to put it on. I'm working on acquiring one now, but the database move will be delayed until the replacement blade is reloaded.
Whiteboard: Scheduled for Thurs 12/11 7pm → Scheduled for Thurs 12/11 ?pm

I didn't get much sleep last night and am having trouble staying awake while waiting for the machine to become available. I believe we're still waiting for the logs from whatever was previously on it to be copied off before we can wipe and reload it, so I'll pick this up in the morning after I get some sleep. Given the lack of public notice on this, and the fact that we can do it with no user-visible downtime other than the admin panel, I'm assuming I can get away with that timing.
Whiteboard: Scheduled for Thurs 12/11 ?pm → Scheduled for Fri 12/12 ?am

This is complete. Downtime of the DB itself was under a minute. Logging was off for about 5 minutes because of the time it took to propagate the config changes that disabled and re-enabled it across the cluster.

The rsync of the data, including the replication log files, took about 15 seconds (small DB, minimal changes between passes: I ran one pass before taking everything down, then a follow-up pass with it down to pick up the last-minute changes). It turns out the replication logs weren't being stored with a hostname-specific filename, so the slave pointers didn't have to be touched; as far as the slaves knew, they were still talking to the same master. I had the DHCP and DNS changes for the IP swap staged and triggered them right as the rsync was starting. The rsync and the network restarts on the boxes were scripted ahead of time, and gratuitous ARPs went out from the new box immediately after it took over the IP address. As far as the network and apps are concerned, the box went down for about 30 seconds and then came back.
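The two-pass copy is what kept the DB downtime under a minute: the bulk transfer happens while mysqld is still serving, and only the small delta is copied with it stopped. A minimal sketch of that ordering, assuming the data directories from step 5; the `-a --delete` flags, the function names, and the `stop_old_master`/`start_new_master` callbacks are illustrative assumptions, since the actual script was not posted.

```python
import subprocess
from typing import Callable, Sequence

# Data directories named in step 5 of the upgrade procedure.
DATA_DIRS = ("/var/lib/mysql/", "/var/lib/mysql-innodb/")


def rsync_commands(dest_host: str, dirs: Sequence[str] = DATA_DIRS) -> list[list[str]]:
    """One rsync invocation per data directory.

    The trailing slash on each source path copies the directory's contents
    into the same path on the destination host. -a preserves permissions,
    timestamps and (when run as root) ownership; --delete removes files on
    the destination that no longer exist on the source.
    """
    return [["rsync", "-a", "--delete", d, f"{dest_host}:{d}"] for d in dirs]


def two_pass_copy(
    dest_host: str,
    stop_old_master: Callable[[], None],   # hypothetical: stops mysqld on the old box
    start_new_master: Callable[[], None],  # hypothetical: starts mysqld on the new box
    run: Callable[..., object] = subprocess.run,
) -> None:
    cmds = rsync_commands(dest_host)
    # Pass 1: bulk copy while mysqld is still running. The copy may be
    # internally inconsistent at this point; pass 2 fixes that.
    for cmd in cmds:
        run(cmd, check=True)
    # Stop mysqld so the data files stop changing.
    stop_old_master()
    # Pass 2: only files changed since pass 1 need transferring, so it is quick.
    for cmd in cmds:
        run(cmd, check=True)
    start_new_master()
```

Passing `run` in as a parameter lets the ordering be dry-run with a stub before touching real hosts.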

In case anyone needs to rebuild download counts from the Apache logs (I don't know if you care or not), the logging downtime was 10:24am to 10:29am.
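Rebuilding counts for that window means selecting the access-log requests whose timestamps fall inside it. A minimal sketch, assuming Apache common/combined log format and treating the window as 10:24:00 through 10:29:59 local wall-clock time (the time zone offset is ignored). The bug does not give the year, so the caller supplies the date; `in_gap` and `gap_lines` are hypothetical names.

```python
import re
from datetime import date, datetime, time
from typing import Iterable, Iterator

# Bracketed Apache timestamp, i.e. [DD/Mon/YYYY:HH:MM:SS +ZZZZ].
TS_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}):(\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")

# The logging gap quoted above, read as 10:24:00 through 10:29:59.
GAP_START = time(10, 24, 0)
GAP_END = time(10, 29, 59)


def in_gap(line: str, day: date) -> bool:
    """True if the line's timestamp falls inside the logging gap on `day`."""
    m = TS_RE.search(line)
    if m is None:
        return False  # not a line we know how to parse
    # %b expects English month abbreviations (the default C locale).
    stamp = datetime.strptime(f"{m.group(1)} {m.group(2)}", "%d/%b/%Y %H:%M:%S")
    return stamp.date() == day and GAP_START <= stamp.time() <= GAP_END


def gap_lines(lines: Iterable[str], day: date) -> Iterator[str]:
    """Yield only the access-log lines served during the gap."""
    return (line for line in lines if in_gap(line, day))
```

Because `gap_lines` accepts any iterable of strings, an open log file can be passed straight in and counted with `sum(1 for _ in ...)`.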

mrdb-bouncer01-old is now at in case anything needs to be snagged from it. I'll decommission it and mark it as a spare in the next couple of days if nothing is found. (HP DL360 G4)
Last Resolved: 9 years ago
Resolution: --- → FIXED
Whiteboard: Scheduled for Fri 12/12 ?am
Product: → Graveyard