On July 22, the server developer3.webapp.scl3.mozilla.com did not report to New Relic from 8:47 to 8:52 AM CDT (13:47 - 13:52 UTC), and an alert was raised. John filed an incident report, and we would like to understand the cause of this failure: https://docs.google.com/document/d/1PlRqtBVYaZg_FCGVeKUMOFY7ApT_ZVq8M2GopRBb_7A/edit
I had a quick look through the doc and grepped through the logs myself, and I can't find any indication that the proxy is at fault. I would ask what led you to that conclusion (other than the error in the doc), but I don't think this is worth much more effort since it was a blip. We also had some New Relic key changes that week that might have caused it. Re-open or log more here if needed, but it looks like we haven't seen this since.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WORKSFORME
As it says in the incident report, the error in /var/log/newrelic/nrsysmond.log is:

2016-07-22 11:01:47.542 (7734) error: RPM cmd='metric_data' for 'Infrastructure' failed: Proxy CONNECT aborted due to timeout

This is a rolling log file, and there are no recent entries in today's file. On developer3, there were 91 Proxy CONNECT errors on 7/28, which is about 6% of the once-a-minute attempts (91 of 1,440) failing.
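For anyone who wants to reproduce the count on another host or day, here is a rough Python sketch of how the per-day tally and failure rate could be computed. It assumes every error line in nrsysmond.log follows the same layout as the one quoted above, and that the daemon makes one attempt per minute (1,440 per day). The function names are made up for illustration and are not part of any New Relic tooling.

```python
import re
from collections import Counter

# Assumed layout, based only on the single log line quoted in this bug:
# "2016-07-22 11:01:47.542 (7734) error: <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2}\.\d+ \(\d+\) error: (.*)$"
)
ERROR_TEXT = "Proxy CONNECT aborted due to timeout"


def count_proxy_errors(lines):
    """Return a Counter mapping date string -> number of proxy timeout errors."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and ERROR_TEXT in match.group(2):
            counts[match.group(1)] += 1
    return counts


def failure_rate(errors, attempts_per_day=24 * 60):
    """Fraction of attempts that failed, assuming one attempt per minute."""
    return errors / attempts_per_day


if __name__ == "__main__":
    sample = [
        "2016-07-22 11:01:47.542 (7734) error: RPM cmd='metric_data' "
        "for 'Infrastructure' failed: Proxy CONNECT aborted due to timeout",
    ]
    for day, errors in sorted(count_proxy_errors(sample).items()):
        print(f"{day}: {errors} errors, {failure_rate(errors):.1%} of attempts")
```

To run it against the real file, pass an open file handle instead of the sample list, e.g. count_proxy_errors(open("/var/log/newrelic/nrsysmond.log")). Since the log rolls over, older days would need the rotated files as well.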