Closed Bug 560659 Opened 14 years ago Closed 13 years ago

[tracking] AMO DB uplift

Categories

(mozilla.org Graveyard :: Server Operations, task)

Platform: All (Other)
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mrz, Assigned: oremj)

References

Details

(Whiteboard: [1/11 @ 4PM])

No description provided.
Assignee: server-ops → tellis
There are some servers in MPT for AMO. Create another AMO cluster and, during an outage, fail over to those servers. The old AMO hardware will then be available for re-use.
There are actual prod problems with AMO right now.
Severity: minor → critical
How to PXE boot a machine (an example session follows the steps):

(1) ssh mozillaadmin@10.2.8.101 ((bizarro password typed next up))
(2) show server names
(3) connect server X (where X is the slot number) will connect to a blade
(4) power on <-- powers the blade on; wait ~30 seconds
(5) vsp <-- attaches the virtual serial console
(6) Press ENTER for OS install menu
(7) <osNum> text console=ttyS0 hostname=<hostname>
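
For reference, an illustrative session (prompts are approximate, and the bay number, OS menu choice, and hostname below are placeholders, not real values):

    ssh mozillaadmin@10.2.8.101
    OA> show server names
    OA> connect server 12
    hpiLO-> power on          <-- wait ~30 seconds
    hpiLO-> vsp               <-- virtual serial console
    Press ENTER for OS install menu
    1 text console=ttyS0 hostname=tm-amo02-master01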
These are the boxes slated for AMO02:

Bay    Hostname           ID    Hardware                             Tags              OA IP
10.09  tm-sfx01-slave02   2634  HP - BL460c G6 (1x L5520 24GB BBWC)  None              10.2.8.101
10.10  tm-amo01-slave04   2635  HP - BL460c G6 (1x L5520 24GB BBWC)  infra : database  10.2.8.101
10.11  tm-amo02-slave02   2636  HP - BL460c G6 (1x L5520 24GB BBWC)  None              10.2.8.101
10.12  tm-amo02-master01  2637  HP - BL460c G6 (1x L5520 24GB BBWC)  None              10.2.8.101
10.13  tm-amo02-slave01   2638  HP - BL460c G6 (1x L5520 24GB BBWC)  None              10.2.8.101

I plan to use tm-amo01-slave04 to build the other 3 servers (if not from backups).
Use bbcp to create the new slaves; the old way is super-slow. Make sure my.cnf exists on each slave and is valid for that slave before deploying it.
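
Roughly, the copy looks like this (a sketch only; the stream count, window size, and paths are illustrative, and mysqld should be stopped or the datadir snapshotted on the donor first):

    # clone the datadir from a stopped donor slave to a new slave
    bbcp -P 10 -s 16 -w 2M -r /var/lib/mysql/ tm-amo02-slave01:/var/lib/
    # then confirm my.cnf on the target has a unique server-id before
    # starting mysqld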
Depends on: 572518
Group: infra
Read-only mode is enabled by putting this at the very bottom of settings_local.py:
           read_only_mode(globals())
and restarting apache.
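
On the shell side, the enable step is roughly this (the path and restart command are placeholders for whatever the AMO hosts actually use):

    echo 'read_only_mode(globals())' >> $AMO_ROOT/settings_local.py
    apachectl graceful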

Then I assume I pull every slave out of the pool but one, rebuild the rest of the cluster, bring AMO back into read/write mode, and then pull out the R/O slave and rebuild it while AMO is back online.
Whiteboard: Needs outage: 24 July 2010, NOON.
Sounds reasonable.  I don't know if AMO will run on one slave, but y'all are a better judge of that than me.
Whiteboard: Needs outage: 24 July 2010, NOON. → [blocked]
Whiteboard: [blocked]
Assignee: tellis → mrz
Assignee: mrz → justdave
This is ready to go.

The new slaves have been built and already added to the existing pool; we can pull the old slaves out whenever we feel like it.  The new master is currently acting as a slave to the existing master.  We'll move it into place via slave promotion.  This will probably require dropping AMO into read-only mode for about 5 minutes to make sure all the replication pointers sync up before making the switch.
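
For the record, the promotion sequence is roughly this (the old master's hostname isn't recorded in this bug, so it's a placeholder below; credentials omitted):

    # 1. stop writes on the old master
    mysql -h <old-master> -e "SET GLOBAL read_only = ON"
    # 2. note its binlog position
    mysql -h <old-master> -e "SHOW MASTER STATUS\G"
    # 3. wait for the new master to execute up to that position, then
    #    stop it slaving from the old master
    mysql -h tm-amo02-master01 -e "SHOW SLAVE STATUS\G"
    mysql -h tm-amo02-master01 -e "STOP SLAVE; RESET SLAVE"
    # 4. open the new master for writes and repoint the app at it
    mysql -h tm-amo02-master01 -e "SET GLOBAL read_only = OFF"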
Flags: needs-downtime+
Whiteboard: 09/21/2010
Blocks: 599922
Whiteboard: 09/21/2010 → 09/30/2010 @ 4pm
Whiteboard: 09/30/2010 @ 4pm → 10/07/2010 @ 4pm
I'd like to get the VIP for this moved into Zeus to match the setup in PHX while we're doing this.  Currently we have VIPs both in the Netscaler and on the ACE, and both are actually being used (I'm not sure by what).  I think the actual addons site might be using one and samo/vamo the other?  It would be good to get them all pointed at the same place.  Right now it's a pain when I want to take a slave out, because there's more than one place it has to be disabled.
Whiteboard: 10/07/2010 @ 4pm → 10/14/2010 @ 4pm
Did this happen today?
Whiteboard: 10/14/2010 @ 4pm → 11/09/2010 @ 4pm
Blocks: 610299
Is this in progress?  Done?
I must have missed the bugmail when this date was set on the whiteboard, and for some reason I failed to notice it since then.  Today is my birthday and I'm not even supposed to be here today; I would have vetoed it way back then if I'd noticed.  No, this hasn't been touched.
I don't know that this needs to be coordinated with another downtime; the required time for it is minimal, and we've had network failures that have taken the site out for longer than this will.  The current master just needs to be dropped into read-only mode long enough for the replication pointers to sync up on all of the slaves (probably a matter of seconds in most cases), and then the app config needs to be changed to talk to the new database VIP for the master.
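
Checking that the pointers have synced is quick (hostnames are placeholders): compare the master's binlog position against each slave's executed position.

    mysql -h <master> -e "SHOW MASTER STATUS\G" | grep -E 'File|Position'
    for s in <slave1> <slave2> <slave3>; do
      mysql -h $s -e "SHOW SLAVE STATUS\G" \
        | grep -E 'Master_Log_File|Exec_Master_Log_Pos|Seconds_Behind_Master'
    done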
Whiteboard: 11/09/2010 @ 4pm → 11/18/2010 @ 6pm
OK, all of the slaves have now been re-parented.  The app is still talking to the original master, but the only thing slaving from it now is the new master.  I'd prefer the app config changes be made by oremj, because it looks like stuff is still using IP addresses instead of the fake hostnames, so I'm not sure how that'll break Phoenix, or how many places it'll need to be changed, etc.  For example, the IP address of the old master appears in at least 8 files on mradm02 that I've found so far.

The new database IPs should be:

read-write: 10.2.70.12
read-only:  10.2.70.19
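
The old master's IP isn't written down in this bug, but finding the files on mradm02 that still hardcode it is a one-liner (substitute the real IP, and adjust the search roots to wherever the configs live):

    grep -rl '<old-master-ip>' /etc /data 2>/dev/null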
Assignee: justdave → jeremy.orem+bugs
btw: I'm happy to do this myself if someone gives me a good complete list of all of the places it actually needs to be changed.  I'd appreciate having app folks on hand when it happens though ;)  More eyes to find if anything breaks, etc.
Flags: needs-downtime+
Whiteboard: 11/18/2010 @ 6pm
Let's switch the IPs on Tuesday. I'm scared.
So it is safe to change to these IPs at any time?
Can you set a date to get this wrapped up?
Need an answer to comment 18.
over to dave for comment.
Assignee: jeremy.orem+bugs → justdave
Should be safe to switch them; the IPs are active.  If you can run a test connection from the mysql command line on one of the webheads and it works, then you're good.  If anything, we may need a firewall hole poked; that might be the only blocker.  You'll be able to figure that out by testing from a webhead, I'd imagine.
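
Something like this from a webhead should settle it (the username is a placeholder):

    mysql -h 10.2.70.12 -u <amo-user> -p -e "SELECT 1"   # read-write VIP
    mysql -h 10.2.70.19 -u <amo-user> -p -e "SELECT 1"   # read-only VIP

If either one hangs rather than connecting, port 3306 probably needs a firewall hole.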
Assignee: justdave → jeremy.orem+bugs
Will plan on doing this next week.
Severity: critical → normal
Whiteboard: [12/20 @ 4PM]
Flags: needs-downtime+
Whiteboard: [12/20 @ 4PM] → [12/30 @ 4PM]
Whiteboard: [12/30 @ 4PM] → [1/11 @ 4PM]
I've switched the IPs for SAMO and AMO.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard