Closed Bug 1040961 Opened 11 years ago Closed 11 years ago

Graphite data for hgweb* incomplete and/or incorrect

Categories

(Infrastructure & Operations :: Tools, task)

task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hwine, Assigned: ericz)

References

Details

Some or all of the data for the hgweb*.dmz.scl3.m.c hosts is missing. For example, look at hosts.hgweb*_dmz_scl3_mozilla_com.apache.apache80.apache_requests.count from May to now: it shows all traffic on hgweb1, then no traffic, then all traffic on hgweb6.
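A rough sketch of how the gap described above could be quantified, assuming the series are fetched from Graphite's render API with format=json (a list of objects, each with a `target` and a `datapoints` list of [value, timestamp] pairs). The function name `coverage` and the sample data are hypothetical and only illustrate the shape of the check.

```python
def coverage(series_list):
    """Map each target to (first_ts, last_ts, non_null_count).

    Timestamps are None when the series has no non-null datapoints.
    """
    report = {}
    for series in series_list:
        stamps = [ts for value, ts in series["datapoints"] if value is not None]
        if stamps:
            report[series["target"]] = (min(stamps), max(stamps), len(stamps))
        else:
            report[series["target"]] = (None, None, 0)
    return report


# Hypothetical sample shaped like a render?format=json response;
# the values are made up and are not data from this bug.
sample = [
    {"target": "hgweb1", "datapoints": [[10.0, 100], [12.0, 160], [None, 220]]},
    {"target": "hgweb6", "datapoints": [[None, 100], [None, 160], [7.0, 220]]},
]

for target, (first, last, count) in sorted(coverage(sample).items()):
    print(target, first, last, count)
```

A host whose first non-null timestamp is late, or whose last one is early, is a candidate for the kind of dropout reported here.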
I'm told that this is related to work in bug 1040222, so marking that as a 'See Also'. Moving to the triage queue, in any case.
Assignee: infra → server-ops
Component: Infrastructure: Monitoring → Server Operations
Product: Infrastructure & Operations → mozilla.org
QA Contact: jdow → shyam
Component: Server Operations → Tools
Product: mozilla.org → Infrastructure & Operations
Eric, can you please take a look at this? I believe these hosts have been rebuilt multiple times in the past... not sure if that's affecting this in any way.
Assignee: server-ops → eziegenhorn
fwiw, we _just_ rebuilt hgweb 1,2,4,8 over the last 2 weeks. We will be rebuilding the other 4 soon. Please let us know if there is a missing step in the rebuild. These metrics are key to us. Thanks!
Eric, can I get an ETA on this, please? We need this data to better diagnose failures and the effectiveness of remediation on bug 1042210 and related bugs. I understand the concern about bug 1040222, but that can't be a hard block for this work at this time. I'll try to get that moved along.
Blocks: 1042210
Severity: normal → major
I've been poking at hgweb1 and it is simply not sending two metrics:
hosts.hgweb1_dmz_scl3_mozilla_com.apache.apache80.apache_bytes.count
hosts.hgweb1_dmz_scl3_mozilla_com.apache.apache80.apache_requests.count
There is an apache plugin for collectd that should be sending those metrics. It's a bit opaque, but the configuration manual (http://collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_apache) says that the ExtendedStatus Apache directive needs to be enabled, and I don't see that in any of the Apache configs on hgweb1. Can we try adding that and reloading apache? On the other hand, I don't see that directive enabled on hgweb6 either, which does appear to be sending those metrics, so I'll keep looking for other issues with that plugin.
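For checking the other hosts, the two metric names can be generated rather than typed by hand. This is a minimal sketch: the naming scheme (dots in the FQDN replaced by underscores, `apache80` as the instance name) is inferred only from the two names quoted in this bug, and `apache_metric_names` is a hypothetical helper.

```python
def apache_metric_names(fqdn, instance="apache80"):
    """Build the two Graphite metric names discussed in this bug for one host.

    Assumption: the prefix is 'hosts.<fqdn with dots as underscores>.apache.<instance>',
    as seen in the names quoted above.
    """
    prefix = "hosts." + fqdn.replace(".", "_") + ".apache." + instance
    return [prefix + ".apache_bytes.count", prefix + ".apache_requests.count"]


for name in apache_metric_names("hgweb1.dmz.scl3.mozilla.com"):
    print(name)
```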
For what it's worth: hgweb1 stopped sending the metrics in question around 7/8/14. I'm guessing this is around when the box was rebuilt. hgweb6 _started_ sending those metrics around 7/16/14.
Actually I believe the ExtendedStatus apache config is likely the culprit. The URL collectd is querying returns less information on hgweb1 than on hgweb6. Output from curl http://localhost:80/server-status?auto:
hgweb6:
Total Accesses: 263701
Total kBytes: 245849944
CPULoad: .280456
Uptime: 44770
ReqPerSec: 5.89013
BytesPerSec: 5623190
BytesPerReq: 954681
BusyWorkers: 5
IdleWorkers: 15
Scoreboard: ______WW____W_W_W___............................................................................................................................................................................................................................................
hgweb1:
BusyWorkers: 5
IdleWorkers: 17
Scoreboard: _____._WW________._WWW__........................................................................................................................................................................................................................................
Note the missing BytesPerSec and ReqPerSec fields, among others, for hgweb1.
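The comparison above can be automated with a small parser for the `?auto` output. This is a sketch under one assumption: that `Total Accesses` and `Total kBytes` are the fields behind the apache_requests and apache_bytes metrics. The function names are hypothetical, and the sample strings are abbreviated from the output above (Scoreboard lines omitted).

```python
# Assumption: these are the extended-status fields the missing metrics depend on.
EXTENDED_KEYS = ("Total Accesses", "Total kBytes")


def parse_status(text):
    """Parse 'Key: value' lines from server-status?auto output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def missing_extended(text):
    """Return the extended-status keys that are absent from the output."""
    fields = parse_status(text)
    return [key for key in EXTENDED_KEYS if key not in fields]


HGWEB6_STATUS = """Total Accesses: 263701
Total kBytes: 245849944
ReqPerSec: 5.89013
BytesPerSec: 5623190
BusyWorkers: 5
IdleWorkers: 15
"""

HGWEB1_STATUS = """BusyWorkers: 5
IdleWorkers: 17
"""

print("hgweb6 missing:", missing_extended(HGWEB6_STATUS))
print("hgweb1 missing:", missing_extended(HGWEB1_STATUS))
```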
Yep.
hgweb1.dmz.scl3# grep -ir 'extendedstatus on$' conf*
conf/httpd.conf:#ExtendedStatus On
hgweb6.dmz.scl3# grep -ir 'extendedstatus on$' conf*
conf/httpd.conf:ExtendedStatus On
hgweb6's httpd was tweaked on the 16th to block a security issue. I'll drop the change into puppet and do rolling gracefuls.
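Note that the grep pattern above matches the commented-out line as well, so the two results have to be told apart by eye. A hedged sketch of a stricter check, which only accepts an uncommented directive, might look like this; `extended_status_enabled` is a hypothetical helper and it inspects config text passed in as a string rather than reading any particular path.

```python
import re

# Matches an active (uncommented) 'ExtendedStatus On' line, case-insensitively.
ACTIVE_DIRECTIVE = re.compile(
    r"^[ \t]*ExtendedStatus[ \t]+On[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extended_status_enabled(conf_text):
    """Return True if the config text contains an uncommented 'ExtendedStatus On'."""
    return ACTIVE_DIRECTIVE.search(conf_text) is not None


print(extended_status_enabled("#ExtendedStatus On\n"))  # commented out, as on hgweb1
print(extended_status_enabled("ExtendedStatus On\n"))   # active, as on hgweb6
```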
Change is in place.
Those missing metrics are now flowing for hgweb[1-5].
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED