HTTP - /ganglia on ganglia1.dmz.phx1.mozilla.com is CRITICAL: CRITICAL - Socket timeout after 45 seconds

RESOLVED WONTFIX

Status

mozilla.org Graveyard
Server Operations: MOC
RESOLVED WONTFIX
3 years ago
3 years ago

People

(Reporter: MOC Nagios API, Unassigned)

Tracking

Details

(Whiteboard: [id=nagios1.private.phx1.mozilla.com:335757])

Description

3 years ago
Automated alert report from nagios1.private.phx1.mozilla.com:

Hostname: ganglia1.dmz.phx1.mozilla.com
Service:  HTTP - /ganglia
State:    CRITICAL
Output:   CRITICAL - Socket timeout after 45 seconds

Runbook:  http://m.allizom.org/HTTP+-+/ganglia
[root@ganglia1.dmz.phx1 log]# tail messages
Jun 12 04:24:33 ganglia1 /usr/sbin/gmetad[1550]: RRD_update (/var/lib/ganglia/rrds/generic-web/__SummaryInfo__/part_max_used.rrd): /var/lib/ganglia/rrds/generic-web/__SummaryInfo__/part_max_used.rrd: illegal attempt to update using time 1402572272 when last update time is 1402572272 (minimum one second step)
Jun 12 04:24:33 ganglia1 /usr/sbin/gmetad[1550]: RRD_update (/var/lib/ganglia/rrds/generic-web/__SummaryInfo__/tcp_lastack.rrd): /var/lib/ganglia/rrds/generic-web/__SummaryInfo__/tcp_lastack.rrd: illegal attempt to update using time 1402572272 when last update time is 1402572272 (minimum one second step)
Jun 12 04:24:33 ganglia1 /usr/sbin/gmetad[1550]: data_thread() got no answer from any [addons.memcached] datasource
Jun 12 04:24:34 ganglia1 /usr/sbin/gmetad[1550]: data_thread() got no answer from any [Dist] datasource
Jun 12 04:24:34 ganglia1 /usr/sbin/gmetad[1550]: data_thread() got no answer from any [github-sync] datasource
Jun 12 04:24:35 ganglia1 /usr/sbin/gmetad[1550]: data_thread() got no answer from any [Addons] datasource
Jun 12 04:24:36 ganglia1 /usr/sbin/gmetad[1550]: RRD_update (/var/lib/ganglia/rrds/Socorro/__SummaryInfo__/ds_KB_read.rrd): /var/lib/ganglia/rrds/Socorro/__SummaryInfo__/ds_KB_read.rrd: not a simple unsigned integer: '-523992250'
Jun 12 04:24:38 ganglia1 /usr/sbin/gmetad[1550]: data_thread() got no answer from any [IT Elastic Search] datasource
Jun 12 04:24:38 ganglia1 /usr/sbin/gmetad[1550]: data_thread() got no answer from any [Snippets Grid] datasource
Jun 12 04:24:39 ganglia1 /usr/sbin/gmetad[1550]: data_thread() got no answer from any [addons.elasticsearch] datasource
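
For triage, the gmetad messages above can be grouped by kind: datasources that returned no answer, and RRD files whose update failed. The following is a minimal sketch, not a tool used in this bug. The function name `summarize`, the `SAMPLE` list, and both regular expressions are assumptions written against the message shapes shown in the excerpt; other gmetad versions may word these messages differently.

```python
import re
from collections import Counter

# Assumed pattern: the "no answer" message, capturing the bracketed datasource name.
NO_ANSWER = re.compile(
    r"data_thread\(\) got no answer from any \[(?P<source>[^\]]+)\] datasource"
)
# Assumed pattern: an RRD_update failure, capturing the .rrd path in parentheses.
RRD_ERROR = re.compile(r"RRD_update \((?P<path>[^)]+)\): ")


def summarize(lines):
    """Count silent datasources and failing RRD files in gmetad syslog lines."""
    silent = Counter()
    rrd_failures = Counter()
    for line in lines:
        match = NO_ANSWER.search(line)
        if match:
            silent[match.group("source")] += 1
            continue
        match = RRD_ERROR.search(line)
        if match:
            rrd_failures[match.group("path")] += 1
    return silent, rrd_failures


# Three lines copied from the excerpt above, used only as sample input.
SAMPLE = [
    "Jun 12 04:24:33 ganglia1 /usr/sbin/gmetad[1550]: data_thread() got no answer from any [addons.memcached] datasource",
    "Jun 12 04:24:34 ganglia1 /usr/sbin/gmetad[1550]: data_thread() got no answer from any [Dist] datasource",
    "Jun 12 04:24:36 ganglia1 /usr/sbin/gmetad[1550]: RRD_update (/var/lib/ganglia/rrds/Socorro/__SummaryInfo__/ds_KB_read.rrd): /var/lib/ganglia/rrds/Socorro/__SummaryInfo__/ds_KB_read.rrd: not a simple unsigned integer: '-523992250'",
]

if __name__ == "__main__":
    silent, rrd_failures = summarize(SAMPLE)
    for source, count in silent.most_common():
        print(f"no answer: {source} x{count}")
    for path, count in rrd_failures.most_common():
        print(f"rrd update failed: {path} x{count}")
```

Many datasources going silent within a few seconds, as in the excerpt, would be consistent with gmetad itself being slow to respond, though the log alone does not confirm that as the cause of the HTTP timeout.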
Assignee: nobody → server-ops-webops
Blocks: 1024417
Component: Server Operations: MOC → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
QA Contact: bpannabecker → nmaul
nagios-phx1: Usul: cturra has been paged with the message "ganglia1.dmz.phx1.mozilla.com:HTTP - /ganglia is CRITICAL: CRITICAL is down can you have a look ?(Usul)"
[1:28pm] Usul: and that’s bug 102443
of course it cleared just after I paged :(
Whiteboard: [id=nagios1.private.phx1.mozilla.com:335757] → [kanban:https://kanbanize.com/ctrl_board/4/136] [id=nagios1.private.phx1.mozilla.com:335757]

Comment 4

3 years ago
I don't believe WebOps owns ganglia, so I'm moving this back to Server Operations: MOC. However, my own feeling is that we can close this out... seems like a one-time event. :)
Assignee: server-ops-webops → nobody
Component: WebOps: Other → Server Operations: MOC
Product: Infrastructure & Operations → mozilla.org
QA Contact: nmaul → dmoore
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/136] [id=nagios1.private.phx1.mozilla.com:335757] → [id=nagios1.private.phx1.mozilla.com:335757]

Updated

3 years ago
Whiteboard: [id=nagios1.private.phx1.mozilla.com:335757] → [kanban:https://kanbanize.com/ctrl_board/4/518] [id=nagios1.private.phx1.mozilla.com:335757]
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/518] [id=nagios1.private.phx1.mozilla.com:335757] → [id=nagios1.private.phx1.mozilla.com:335757]

Updated

3 years ago
Whiteboard: [id=nagios1.private.phx1.mozilla.com:335757] → [kanban:https://kanbanize.com/ctrl_board/4/523] [id=nagios1.private.phx1.mozilla.com:335757]
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/523] [id=nagios1.private.phx1.mozilla.com:335757] → [id=nagios1.private.phx1.mozilla.com:335757]

Updated

3 years ago
Whiteboard: [id=nagios1.private.phx1.mozilla.com:335757] → [kanban:https://kanbanize.com/ctrl_board/4/525] [id=nagios1.private.phx1.mozilla.com:335757]
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/525] [id=nagios1.private.phx1.mozilla.com:335757] → [id=nagios1.private.phx1.mozilla.com:335757]
Closing old Nagios alerts.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → WONTFIX

Updated

3 years ago
Product: mozilla.org → mozilla.org Graveyard