Closed Bug 559262 Opened 15 years ago Closed 15 years ago

Add aggregate graphs to Munin

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: clouserw, Assigned: jabba)

Details

It's hard to get a feel for the overall performance trend of the AMO boxes (app, db, gearman, memcache). Graphs tracking the averages across all the boxes would help. No need for every graph; memory, CPU, and load would be a good start, I think. This is an example of how to create an aggregate graph: http://munin-monitoring.org/wiki/aggregate_examples
While I suck at math, this is something I can try to figure out :D Will try with load first.
Assignee: server-ops → shyam
Justin will poke at this.
Assignee: shyam → jdow
Wil, I poked around a bit and used that example to create a few aggregate graphs. So far I'm having a hard time figuring out how to make graphs with more than one field work. If you take a look at our Munin page now, you should see an AggregateGraphs section, which currently shows stacked and summed load average and stacked and summed CPU; however, the CPU graph only shows the cpu user field. The load average plugin has only one field, so it is trivial. CPU, however, has system, user, nice, idle, iowait, irq, softirq and steal fields, and memory also has a long list of fields. I'll keep poking at it to see if I can figure it out. In the meantime, can you take a look at the graphs to see if they are what you are after? I put both the stacked version and the summed version on there, but I'm not sure both are needed. Let me know more specifically what you need, and I'll see what I can find in the Munin documentation about creating a more detailed aggregate graph for plugins with more than one field.
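One possible way to handle a multi-field plugin is to emit one summed field per plugin field. The Python sketch below generates such a munin.conf stanza. It assumes the `.sum` aggregate directive and `host:plugin.field` source syntax from the aggregate examples page; the group name, graph name, and host names are hypothetical placeholders, not the real AMO configuration, and the output should be checked against the installed Munin version before use.

```python
# Sketch: build a munin.conf "virtual host" stanza that sums every field of a
# multi-field plugin across several hosts. All names are illustrative.

CPU_FIELDS = ["system", "user", "nice", "idle", "iowait", "irq", "softirq", "steal"]


def aggregate_stanza(group, graph, plugin, fields, hosts):
    """Return munin.conf text with one summed field per plugin field."""
    lines = [
        f"[{group};Aggregate]",
        "  update no",  # virtual host: there is no node to poll
        f"  {graph}.graph_title Summed {plugin} across {len(hosts)} hosts",
        f"  {graph}.graph_order {' '.join(fields)}",
    ]
    for field in fields:
        # Each source is written as host:plugin.field (assumed syntax).
        sources = " ".join(f"{host}:{plugin}.{field}" for host in hosts)
        lines.append(f"  {graph}.{field}.sum {sources}")
        lines.append(f"  {graph}.{field}.label {field}")
    return "\n".join(lines)


# Hypothetical host names, used only to show the shape of the output.
print(aggregate_stanza("amo", "cpu_total", "cpu", CPU_FIELDS,
                       ["app1.example.com", "app2.example.com"]))
```

Generating the stanza keeps the long field lists (CPU, memory) consistent across hosts instead of relying on hand-edited lines.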
Adding zandr to CC, in case he knows more about the configuration.
These look great. For the original bug I was looking for the summed graphs to get general trending, but the stacked ones are interesting for seeing whether one box is freaking out. Can you divide the summed graph by the number of boxes so the load isn't ~50ish? Also, putting this on the db-amo01 graphs would be awesome, as we're about to land some code on Thursday that I hope reduces overall db load.
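Dividing the summed value by the number of boxes gives the per-box average. A minimal Python sketch of that arithmetic follows, along with the RPN expression one might put in a Munin `cdef` attribute. Whether `cdef` can be applied to a `.sum` field is an assumption to verify against the Munin version in use, and the load values are made-up samples, not real AMO data.

```python
# Sketch: per-host average from a summed load, plus the matching RPN string.


def average_cdef(field, host_count):
    """Return an RPN expression dividing a summed field by the host count."""
    if host_count < 1:
        raise ValueError("host_count must be at least 1")
    return f"{field},{host_count},/"


sample_loads = [2.0, 3.0, 1.0, 2.0]   # made-up load averages for four hosts
total = sum(sample_loads)             # what the summed graph shows
average = total / len(sample_loads)   # what the averaged graph would show

print(total, average)
print("load.cdef", average_cdef("load", len(sample_loads)))
```

The averaged graph stays on the same scale as a single box's load graph, so it can be compared directly with the per-host graphs.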
I've added aggregate graphs for the amo web servers and the amo db servers for load average, user cpu, and user memory.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard