Closed Bug 1022893 Opened 10 years ago Closed 10 years ago

New VM for Observium, observium2.private.scl3.mozilla.com

Categories

(Infrastructure & Operations :: Virtualization, task)

x86
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ericz, Assigned: cknowles)

Details

(Whiteboard: [vm-create:1])

observium2.private.scl3.mozilla.com
RHEL 6, x86-64
2GB RAM
80GB disk
Only a couple of questions - how many CPUs do you need?  Based on RAM, I'm guessing one, but wanted you to weigh in before I went too far.

Also, do you want that 80G all in /, or do you want the extra 40G (default template is 40G) in a different path?
Assignee: server-ops-virtualization → cknowles
I'd say we can try 1 cpu and see how it goes, adding more later if need be.  I'd like 80 all in / or we could do the default 40 on / plus 60GB on a different filesystem if that's easier.
Yup, we can certainly add a little more cpu later - noting that it does require a quick reboot to effect that change.  

As for the HDD, both are equally difficult, so I leave it to you to decide whichever makes your life easier.
Let's just do 80 in / then as I'm uncertain where data and logs will end up.
Alright, observium2.private.scl3.mozilla.com, a RHEL6 VM with 80G in /, 1 CPU, and 2G RAM, has been created.

Initial stubs in puppet and nagios have been done, and it should be ready for your customization. 

Let me know if there are any problems or concerns.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
To make sure that box will handle the load, here are a few metrics/comments:

About disk IO:
From http://www.observium.org/wiki/Hardware_Scaling
"A single 7200RPM drive will handle the RRD I/O of about 5,000 ports. This can be increased by using RAID-0 or faster (10k, 15k) drives. An SSD will allow much, much faster I/O, but may be susceptible to corruption due to the heavy write load."
Usually a "port" is an RRD file containing several data sources (DS). We're currently at 13000 live ports (28000 ports in total). Pretty much all the network infra is in there, and we will probably be adding more devices (for example, servers).

About CPU:
"A quad-core 2.13GHz Xeon E5506 should scale to ~20k ports." and "2-8 pollers per CPU so as not to waste CPU cycle". I guess that by "CPU" they mean cores here.

You can find disk, CPU, RAM, and other metrics for the current server at:
https://observium.private.scl3.mozilla.com/device/device=20/tab=health/
and
https://observium.private.scl3.mozilla.com/device/device=20/tab=graphs/group=system/
Note that nfsen and mysql are also on that box and probably use some resources as well.
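As a rough back-of-the-envelope check, the rules of thumb quoted above can be turned into numbers for the port counts mentioned here. This is only a sketch: it assumes the Observium figures scale linearly, which the wiki does not promise, and the function and constant names are made up for illustration.

```python
import math

# Rules of thumb quoted above from the Observium hardware scaling page.
PORTS_PER_7200RPM_DRIVE = 5_000   # RRD I/O one 7200RPM drive can handle
PORTS_PER_QUAD_CORE = 20_000      # "quad-core ... should scale to ~20k ports"
POLLERS_PER_CORE = (2, 8)         # "2-8 pollers per CPU", read as per core


def drive_equivalents(ports):
    """7200RPM-drive equivalents of RRD I/O (assumes linear scaling)."""
    return ports / PORTS_PER_7200RPM_DRIVE


def cores_needed(ports):
    """Whole cores needed (assumes the quad-core figure scales linearly)."""
    ports_per_core = PORTS_PER_QUAD_CORE / 4
    return math.ceil(ports / ports_per_core)


def poller_range(cores):
    """Lowest and highest suggested poller count for a given core count."""
    low, high = POLLERS_PER_CORE
    return cores * low, cores * high


if __name__ == "__main__":
    for label, ports in (("live", 13_000), ("total", 28_000)):
        cores = cores_needed(ports)
        print(f"{label}: {ports} ports, "
              f"{drive_equivalents(ports):.1f} drive equivalents, "
              f"{cores} cores, pollers {poller_range(cores)}")
```

Under those assumptions, the 13000 live ports work out to roughly 2.6 drives' worth of RRD I/O and about 3 cores, before counting any devices added later.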
Hrm.

Have some concerns - that original box is pretty easily using 50% of all 8 cores, and disk IO is a concern.  While the datastore that the VMs live on is on fast disks, it's still accessed over NFS and through VMware.  So you've lost a lot of the promptness that you get from raw hardware.

So, my upshot is, give it a try. We *can* bump you to 4 cores (however, the sweet spot for ESX scheduling is 2-4 cores; any higher and you start to lose through inefficiencies).  But I'm gonna wager that you're going to find the disk and other performance of the VM to be disappointing.  (I've had similar experiences at $lastjob with another monitoring system with not dissimilar hardware requirements.)  If this doesn't go well for your needs, you'll probably be better served moving to hardware instead.  Not because all VMs suck, just because they're not good at super high performance disk and high CPU count tasks.

Let me know if you have questions.
I'm not too concerned about the CPU, as most of what the existing box is currently doing is mysql, which is being moved to its own VMs (already set up).  We may need to bump it up but I don't see Observium doing lots with the CPU.

I/O is a little more concerning to me.  nfsen and mysql also contribute to current I/O load and won't be on the VM, but writing those RRD files is certainly non-trivial.  The way I see it, it's like Graphite but with a small fraction of the volume and load.  Graphite is not suitable for VMs and we're going to try this and see if it is.  I think it may just work.  If not, we'll definitely spin up a dedicated server -- our plan is to test this out.  Thanks for the VM!
Ah... OK, this was your sinister plan all along.  Future note: feel free to clue us in on the purpose/plan in the initial request so that we don't go all tizzy-like on responses.

Good luck with the testing, and let us know if there's anything we can help diag/etc.
Whiteboard: [vm-create:1]
Product: mozilla.org → Infrastructure & Operations