virtualize syslog1.private.(scl3|phx1)

RESOLVED FIXED

Status

Infrastructure & Operations
MOC: Service Requests
4 years ago
3 years ago

People

(Reporter: gcox, Assigned: gcox)

Tracking

({p2v})

Details

(Whiteboard: :VS [vm-p2v:2])

(Assignee)

Description

4 years ago
These boxes came out of warranty on 2014-04-30 and are lightly loaded, which makes them potential candidates for p2v conversion.

I'm opening this to track the discussion of converting them to virtual (if we do, we'll need an hour or two each to convert them), or to debate whether they should remain physical.
Whiteboard: :VS
So, just wanted to throw some details in, in case we're interested in moving this to a VM. Here's what I can see on the physical hosts; let me know if I've missed the mark, and we can discuss adjusting the numbers. Also let me know any maintenance-window preferences, or whether this isn't going virtual at all.


CPU - 2 or 3 (don't shudder, it's an option). Its load average says "1-2 CPU", but looking moment by moment, there's lots of concurrency that flirts with saturating 1 CPU, and I worry that dropping to one would result in a backlog.
RAM - LOTS of buffers/cache here. I'm gonna go with 4G, since without the buffers/cache we're only looking at 1.5G in use.
Disk - Ahhh... here's where things go wonky in SCL3. Note that due to the sheer amount of data being copied, things will likely be down for more than the standard hour.
  
   / - resize down to the standard VM 40G
   /data - currently using 245G of 2.9T; any chance we can drop that to, say, 600G?
   /data2 - currently using 180M of 55G; any chance we can drop that a bit?

For disk in PHX1, I'd say leave it as it is.
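A quick back-of-envelope sketch of the SCL3 disk numbers above, in Python (not part of the original discussion): it computes how full /data would be at the current 2.9T versus the proposed 600G, and how long copying the 245G in use would take at a few sustained copy rates. The copy rates are assumptions, not measurements; G/T are treated as GiB/TiB, and the helper names are hypothetical.

```python
"""Back-of-envelope sizing for the SCL3 /data volume (hypothetical helpers).

Used/size figures come from the comment above; copy rates are assumptions.
"""


def utilization(used_gib: float, size_gib: float) -> float:
    """Fraction of a volume that the current data would occupy."""
    return used_gib / size_gib


def copy_hours(used_gib: float, throughput_mib_s: float) -> float:
    """Hours needed to copy used_gib at a sustained throughput in MiB/s."""
    seconds = used_gib * 1024 / throughput_mib_s
    return seconds / 3600


if __name__ == "__main__":
    used = 245.0  # /data in use today, per the comment
    for size in (2.9 * 1024, 600.0):  # current 2.9T vs. proposed 600G
        print(f"{size:7.0f} GiB volume: {utilization(used, size):.0%} full")
    for rate in (50, 100, 200):  # assumed sustained copy rates, MiB/s
        print(f"{rate:3d} MiB/s: {copy_hours(used, rate):.1f} h to copy {used:.0f} GiB")
```

Whether the copy fits in the standard hour depends entirely on the real sustained rate between the old host and its new storage, which this bug doesn't record.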
Is it possible to start with syslog2.scl3 first? It's much less critical, carries much less traffic, and may make it easier to troubleshoot any errors/issues from the p2v.

These all still have arcsight agents running on them as well, so we should remove the arcsight agents (finally, once and for all) and take another look at the load afterward; I bet it will drop considerably.

I'm a bit booked next week at Vegas/hacker summer camp/DEF CON/Black Hat, but I'll be able to work on this the following week (Aug 13th or so).
(Assignee)

Comment 3

4 years ago
We'll be in PDX that week; let's chat over beer.
most excellent
(Assignee)

Comment 5

4 years ago
Per discussion at workweek: 
For phx1
* make a syslog2 using existing nfs
* monkey with IPs for keepalive futures
* burn syslog1 to the ground, recreate

For scl3
* make an nfs
* nuke /data2 on syslog1
* move /data of syslog1 onto nfs
* make syslog2 vm using nfs
* monkey with IPs for keepalive
* burn syslog1 to the ground, recreate
(Assignee)

Updated

4 years ago
Depends on: 1057408
(Assignee)

Updated

4 years ago
Depends on: 1057527
What do we need to do to set up a VIP per data center for logging?
(Assignee)

Comment 7

4 years ago
:atoll had a few different ideas on that one, so I'm going to spin off sub-bugs for getting the VIP set up.

State of the world:
phx1 syslog2 exists in a minimal setup with NFS, so we're ready to have that pair VIP'ed; I'll open a blocking bug.

scl3:
syslog1 is now running on the nfs mount in /data.
syslog2, I forget what our step here was.  I recall there being some arcsight junk in there; was the goal here to destroy syslog2 NOW and make a fresh VM, or are we waiting for more of arcsight to go away beforehand?
(Assignee)

Updated

4 years ago
Depends on: 1058627
(Assignee)

Updated

4 years ago
Flags: needinfo?(jbryner)
syslog2.scl3 can go p2v if you like. Arcsight is off and there is very little traffic to the machine.
Flags: needinfo?(jbryner)
(Assignee)

Updated

4 years ago
Depends on: 1059861
(Assignee)

Comment 9

4 years ago
syslog2.scl3 p2v'ed. Because it's in active use, it will need to be merged carefully when it gets the failover IPs specified in bug 1059861.

At this point we're on hold waiting on those blocker bugs. Once syslog2 in either site is absorbing the load over its VIP and we can take syslog1 down, we'll p2v syslog1.
Summary: virtualize(?) syslog1.private.(scl3|phx1) → virtualize syslog1.private.(scl3|phx1)
(Assignee)

Comment 10

3 years ago
scl3 server dropped a drive, bug 1115901

Updated

3 years ago
Group: infra
Component: Server Operations → MOC: Service Requests
Product: mozilla.org → Infrastructure & Operations
QA Contact: shyam → lypulong
(Assignee)

Comment 11

3 years ago
Since syslog1 logs to NFS, the period during which we can run both copies in overlap mode (virtual copy done, physical copy still up) could be extended while we waited for the new VM's DHCP entry to propagate from inventory.

syslog[12].private.(scl3|phx1) are VMs now, and consistently sized.

The 'blockers' (the failover IP, and recreating a syslog with the new-syslog1.private.phx1 VM) can proceed at whatever pace they warrant.

The failover exposed a race condition on bringup of syslog vs nfs, bug 1131314.
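Bug 1131314 tracks the actual fix, and this bug doesn't say what it was. As a hypothetical illustration only, one way to guard against a syslog-vs-NFS bringup race is a pre-start check that waits for the NFS-backed /data mount point to appear before the syslog daemon starts, so logs don't land on the local disk underneath the mount. The function name, timeout, and polling interval are assumptions; on a systemd host, unit ordering (for example `RequiresMountsFor=`) is the more conventional route.

```python
"""Hypothetical pre-start check: wait for an NFS-backed mount before syslog starts."""
import os
import time


def wait_for_mount(path: str, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll until `path` is a mount point or `timeout_s` elapses.

    Returns True if the mount appeared, False on timeout. A missing path is
    treated as "not mounted yet" because os.path.ismount returns False for it.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.ismount(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A wrapper script could call `wait_for_mount("/data")` and refuse to start the daemon if it returns False, rather than letting syslog write into an empty, unmounted directory.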
Assignee: server-ops → gcox
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Whiteboard: :VS → :VS [vm-p2v:2]