Closed Bug 1288759 Opened 8 years ago Closed 8 years ago

Investigate Root Cause Of developer3.webapp.scl3.mozilla.com not reporting to New Relic

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: bensternthal, Assigned: rwatson)

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/3242])

On July 22, the server developer3.webapp.scl3.mozilla.com did not report to New Relic from 8:47 to 8:52 AM CDT (13:47 to 13:52 UTC), and an alert was raised.

An incident report was filed by John and we would like to understand the cause of this failure.

https://docs.google.com/document/d/1PlRqtBVYaZg_FCGVeKUMOFY7ApT_ZVq8M2GopRBb_7A/edit
Component: WebOps: Engagement → WebOps: Other
Assignee: server-ops-webops → rwatson
I had a quick look through the doc and grepped through the logs myself, and I can't find any indication that the proxy was the cause.
I would ask what led you to that conclusion (other than the error in the doc), but I don't think this is worth much more effort since it was a blip. We also had some New Relic key changes that week that might have caused it. Re-open or log more here if needed, but it looks like we haven't seen this since.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
As it says in the incident report, the error in /var/log/newrelic/nrsysmond.log is:

2016-07-22 11:01:47.542 (7734) error: RPM cmd='metric_data' for 'Infrastructure' failed: Proxy CONNECT aborted due to timeout

This is a rolling log file, and there are no recent entries in today's file. On developer3, there were 91 Proxy CONNECT errors on 7/28, which means about 6% of the once-a-minute attempts (91 of 1,440) failed.
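For anyone who wants to repeat the count on another host or day, here is a minimal sketch of how it could be tallied. It assumes every entry looks like the single nrsysmond.log line quoted above and that the agent makes exactly one attempt per minute. The function names are made up for illustration and are not part of any New Relic tooling.

```python
import re
from collections import Counter

# Pattern inferred from the one log line quoted in this bug; other
# nrsysmond versions may format entries differently (assumption).
PROXY_TIMEOUT_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2}\.\d+ \(\d+\) error: "
    r".*Proxy CONNECT aborted due to timeout"
)

# Assumes one reporting attempt per minute, as described in the comment above.
ATTEMPTS_PER_DAY = 24 * 60


def proxy_timeouts_per_day(lines):
    """Count Proxy CONNECT timeout errors per calendar date (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = PROXY_TIMEOUT_RE.match(line)
        if match:
            counts[match.group("date")] += 1
    return counts


def failure_rate(errors, attempts=ATTEMPTS_PER_DAY):
    """Fraction of attempts that failed (hypothetical helper)."""
    return errors / attempts
```

Passing the open file object for /var/log/newrelic/nrsysmond.log to `proxy_timeouts_per_day` gives a per-date count, and `failure_rate(91)` reproduces the 91 of 1,440 figure for 7/28.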
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard