Closed
Bug 1040961
Opened 11 years ago
Closed 11 years ago
Graphite data for hgweb* incomplete and/or incorrect
Categories
(Infrastructure & Operations :: Tools, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: hwine, Assigned: ericz)
References
Details
Some or all of the data for hgweb*.dmz.scl3.m.c is missing.
For example, look at
hosts.hgweb*_dmz_scl3_mozilla_com.apache.apache80.apache_requests.count
for May to now.
The graph shows all traffic on hgweb1, then no traffic, then all traffic on hgweb6.
I'm told that this is related to work in bug 1040222, so marking that as a 'See Also'.
Moving to the triage queue, in any case.
Assignee: infra → server-ops
Component: Infrastructure: Monitoring → Server Operations
Product: Infrastructure & Operations → mozilla.org
QA Contact: jdow → shyam
Updated•11 years ago
Component: Server Operations → Tools
Product: mozilla.org → Infrastructure & Operations
Comment 2•11 years ago
Eric,
Can you please take a look at this? I believe these hosts have been rebuilt multiple times in the past; not sure if that's affecting this in any way.
Assignee: server-ops → eziegenhorn
fwiw, we _just_ rebuilt hgweb 1,2,4,8 over the last 2 weeks. We will be rebuilding the other 4 soon.
Please let us know if there is a missing step in the rebuild. These metrics are key to us.
Thanks!
Eric, can I get an ETA on this, please? We need this data to better diagnose failures and the effectiveness of remediation on bug 1042210 and related bugs.
I understand the concern about bug 1040222, but that can't be a hard block for this work at this time. I'll try to get that moved along.
Blocks: 1042210
Severity: normal → major
Comment 5•11 years ago (Assignee)
I've been poking at hgweb1 and it is simply not sending two metrics:
hosts.hgweb1_dmz_scl3_mozilla_com.apache.apache80.apache_bytes.count
hosts.hgweb1_dmz_scl3_mozilla_com.apache.apache80.apache_requests.count
collectd has an Apache plugin that should be sending those metrics. The documentation is a bit opaque, but the configuration manual (http://collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_apache) says the ExtendedStatus Apache directive needs to be enabled, and I don't see it in any of the Apache configs on hgweb1. Can we try adding it and reloading Apache? On the other hand, I don't see that directive enabled on hgweb6 either, and hgweb6 does appear to be sending those metrics, so I'll keep looking for other issues with that plugin.
Comment 6•11 years ago (Assignee)
For what it's worth:
hgweb1 stopped sending the metrics in question around 7/8/14. I'm guessing this is around when the box was rebuilt.
hgweb6 _started_ sending those metrics around 7/16/14.
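A rough way to pin down dates like these is to scan the series for runs of non-null datapoints. The sketch below assumes the data has already been fetched from Graphite's render API with format=json, which returns datapoints as [value, timestamp] pairs; the helper names and the sample values are made up for illustration and are not the real hgweb data.

```python
from datetime import datetime, timezone


def data_spans(datapoints):
    """Return (first_ts, last_ts) pairs for each run of non-null values.

    `datapoints` is a list of [value, timestamp] pairs, the shape found
    under "datapoints" in Graphite's JSON render output.
    """
    spans = []
    start = prev = None
    for value, ts in datapoints:
        if value is not None:
            if start is None:
                start = ts  # a new run of data begins here
            prev = ts
        elif start is not None:
            spans.append((start, prev))  # the run ended at the previous point
            start = None
    if start is not None:
        spans.append((start, prev))
    return spans


def fmt(ts):
    """Format a Unix timestamp as a UTC date string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")


# Hypothetical daily series: two days of data, a two-day gap, then data again.
sample = [
    [10.0, 1404691200],
    [12.0, 1404777600],
    [None, 1404864000],
    [None, 1404950400],
    [7.0, 1405036800],
]

for first, last in data_spans(sample):
    print(fmt(first), "to", fmt(last))
```

Running this per host would show where each host's series stops and restarts, which can then be compared against the rebuild dates.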
Comment 7•11 years ago (Assignee)
Actually, I believe the ExtendedStatus Apache config is likely the culprit. The URL collectd is querying returns less information on hgweb1 than on hgweb6.
Output from curl http://localhost:80/server-status?auto
hgweb6:
Total Accesses: 263701
Total kBytes: 245849944
CPULoad: .280456
Uptime: 44770
ReqPerSec: 5.89013
BytesPerSec: 5623190
BytesPerReq: 954681
BusyWorkers: 5
IdleWorkers: 15
Scoreboard: ______WW____W_W_W___............................................................................................................................................................................................................................................
hgweb1:
BusyWorkers: 5
IdleWorkers: 17
Scoreboard: _____._WW________._WWW__........................................................................................................................................................................................................................................
Note the missing BytesPerSec and ReqPerSec fields among others for hgweb1.
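The comparison above can be automated with a small parser for the "Key: value" lines that server-status?auto returns. This is only a sketch: the function names are hypothetical, and the sample strings are copied from the output above with the Scoreboard lines shortened.

```python
def parse_status(text):
    """Parse 'Key: value' lines from server-status?auto into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            fields[key.strip()] = value.strip()
    return fields


def missing_fields(reference, candidate):
    """Return keys present in `reference` but absent from `candidate`."""
    ref = parse_status(reference)
    cand = parse_status(candidate)
    return [key for key in ref if key not in cand]


HGWEB6 = """Total Accesses: 263701
Total kBytes: 245849944
CPULoad: .280456
Uptime: 44770
ReqPerSec: 5.89013
BytesPerSec: 5623190
BytesPerReq: 954681
BusyWorkers: 5
IdleWorkers: 15
Scoreboard: ______WW____W_W_W___"""

HGWEB1 = """BusyWorkers: 5
IdleWorkers: 17
Scoreboard: _____._WW________._WWW__"""

print(missing_fields(HGWEB6, HGWEB1))
```

For the two samples this lists the seven fields from Total Accesses through BytesPerReq, which are the ones hgweb1 is not reporting.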
Comment 8•11 years ago
Yep.
hgweb1.dmz.scl3# grep -ir 'extendedstatus on$' conf*
conf/httpd.conf:#ExtendedStatus On
hgweb6.dmz.scl3# grep -ir 'extendedstatus on$' conf*
conf/httpd.conf:ExtendedStatus On
hgweb6's httpd was tweaked on the 16th to block a security issue. I'll drop the change into puppet and do rolling gracefuls.
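The grep above matches both the commented-out and the active directive, so the difference only shows up when reading the output. A minimal sketch of a stricter check, assuming the config text has already been read into a string: the function name is hypothetical, and it ignores Include files and any later "ExtendedStatus Off" that would override an earlier setting.

```python
import re

# Matches an active (uncommented) "ExtendedStatus On" line, case-insensitively.
ACTIVE_DIRECTIVE = re.compile(r"^\s*ExtendedStatus\s+On\s*$", re.IGNORECASE)


def extended_status_enabled(conf_text):
    """Return True if any uncommented line enables ExtendedStatus."""
    return any(ACTIVE_DIRECTIVE.match(line) for line in conf_text.splitlines())


# The two lines reported by grep above.
print(extended_status_enabled("#ExtendedStatus On"))  # hgweb1
print(extended_status_enabled("ExtendedStatus On"))   # hgweb6
```

A check along these lines could run after each rebuild to catch a host that comes back without the directive enabled.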
Comment 9•11 years ago
The change is in place.
Comment 10•11 years ago (Assignee)
Those missing metrics are now flowing for hgweb[1-5].
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED