Closed
Bug 837521
Opened 12 years ago
Closed 12 years ago
jenkins1.dmz.phx1 nagios alerts on memcached
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ericz, Assigned: ericz)
(Not sure if this is webops or dev services)
Got this alert:
< nagios-phx1> | Sun 10:01:40 PST [199] jenkins1.dmz.phx1.mozilla.com:memcached is CRITICAL: Unable to set memcached nagios (http://m.allizom.org/memcached)
Following the documentation, I checked memcached and it seemed to be running fine. The alert wouldn't clear, so I restarted memcached, but the alert still wouldn't clear. 45 minutes later it cleared on its own for no obvious reason. This alert has flapped four times in the last week. Can we stop it from flapping, or at least have the documentation explain better what can be done about it?
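For anyone trying to reproduce the "Unable to set" condition by hand, here is a minimal sketch of a set-style probe. The actual Nagios plugin is not shown in this bug, so this is an assumption about what such a check might do: it speaks the memcached text protocol directly and stores a key named "nagios". The function names, the default host and port, and the 60-second expiry are all hypothetical.

```python
import socket


def build_set_command(key: str, value: bytes, exptime: int = 60) -> bytes:
    """Build a memcached text-protocol 'set' request (flags fixed at 0)."""
    header = f"set {key} 0 {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def reply_is_stored(reply: bytes) -> bool:
    """memcached answers a successful 'set' with the single line STORED."""
    return reply.strip() == b"STORED"


def check_set(host: str = "localhost", port: int = 11211, timeout: float = 5.0):
    """Try to store a throwaway key; return (ok, message) in Nagios-like wording."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(build_set_command("nagios", b"1"))
            reply = conn.recv(1024)
    except OSError as exc:  # covers refused connections and timeouts
        return False, f"CRITICAL: connection failed: {exc}"
    if reply_is_stored(reply):
        return True, "OK: set succeeded"
    return False, f"CRITICAL: unexpected reply {reply!r}"
```

Running check_set() on the host itself and again from the monitoring host may help tell a memcached problem apart from a network or proxy problem between the two.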
Comment 1•12 years ago
Supporting info from when it alerted before I restarted memcached:
[eziegenhorn@jenkins1.dmz.phx1 ~]$ service memcached status
memcached (pid 20310) is running...
[eziegenhorn@jenkins1.dmz.phx1 ~]$ ps aux | grep memcached
nobody 20310 0.0 0.0 360368 19232 ? Ssl 2012 8:32 memcached -d -p 11211 -u nobody -m 256 -c 10024 -P /var/run/memcached/memcached.pid
1892 29008 0.0 0.0 103244 860 pts/0 S+ 10:04 0:00 grep memcached
[eziegenhorn@jenkins1.dmz.phx1 ~]$ memcached-tool localhost:11211 stats
#localhost:11211 Field Value
accepting_conns 1
auth_cmds 0
auth_errors 0
bytes 303872
bytes_read 308246457
bytes_written 291362392
cas_badval 0
cas_hits 165948
cas_misses 0
cmd_flush 24837
cmd_get 662191
cmd_set 541256
cmd_touch 0
conn_yields 0
connection_structures 79
curr_connections 10
curr_items 607
decr_hits 0
decr_misses 0
delete_hits 30125
delete_misses 175694
evicted_unfetched 0
evictions 0
expired_unfetched 27706
get_hits 455285
get_misses 206906
hash_bytes 524288
hash_is_expanding 0
hash_power_level 16
incr_hits 9841
incr_misses 5299
libevent 1.4.13-stable
limit_maxbytes 268435456
listen_disabled_num 0
pid 20310
pointer_size 64
reclaimed 99203
reserved_fds 20
rusage_system 286.289477
rusage_user 226.680539
threads 4
time 1359914716
total_connections 159917
total_items 551097
touch_hits 0
touch_misses 0
uptime 13596805
version 1.4.14
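The stats above are easier to judge when reduced to a few numbers. The following is a rough sketch, not part of any existing tooling: it parses "Field Value" lines as printed by memcached-tool and derives a hit ratio, memory usage, and uptime. The helper names and the choice of fields are assumptions; the sample values are copied from the output above.

```python
SAMPLE = """\
#localhost:11211 Field Value
bytes 303872
evictions 0
get_hits 455285
get_misses 206906
limit_maxbytes 268435456
listen_disabled_num 0
uptime 13596805
version 1.4.14
"""


def parse_stats(text: str) -> dict:
    """Turn 'name value' lines into a dict of strings, skipping the header."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if line.startswith("#") or len(parts) != 2:
            continue
        stats[parts[0]] = parts[1]
    return stats


def summarize(stats: dict) -> dict:
    """Derive a few health indicators from the raw counters."""
    hits = int(stats["get_hits"])
    misses = int(stats["get_misses"])
    lookups = hits + misses
    return {
        "hit_ratio": hits / lookups if lookups else 0.0,
        "memory_used_fraction": int(stats["bytes"]) / int(stats["limit_maxbytes"]),
        "evictions": int(stats["evictions"]),
        # Times the server stopped accepting because it hit its connection limit.
        "conn_limit_hits": int(stats["listen_disabled_num"]),
        "uptime_days": int(stats["uptime"]) / 86400,
    }


if __name__ == "__main__":
    for name, value in summarize(parse_stats(SAMPLE)).items():
        print(name, value)
```

With zero evictions, no connection-limit hits, and well under 1% of the 256 MB limit in use, nothing in these counters points at memcached itself being unhealthy at the time of the alert.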
Comment 2•12 years ago
It alerted again.
Comment 3•12 years ago
The stats this time:
[eziegenhorn@jenkins1.dmz.phx1 ~]$ memcached-tool localhost:11211 stats
#localhost:11211 Field Value
accepting_conns 1
auth_cmds 0
auth_errors 0
bytes 15346
bytes_read 320191
bytes_written 299773
cas_badval 0
cas_hits 163
cas_misses 0
cmd_flush 9
cmd_get 627
cmd_set 511
cmd_touch 0
conn_yields 0
connection_structures 68
curr_connections 10
curr_items 65
decr_hits 0
decr_misses 0
delete_hits 26
delete_misses 180
evicted_unfetched 0
evictions 0
expired_unfetched 6
get_hits 430
get_misses 197
hash_bytes 524288
hash_is_expanding 0
hash_power_level 16
incr_hits 13
incr_misses 7
libevent 1.4.13-stable
limit_maxbytes 268435456
listen_disabled_num 0
pid 29825
pointer_size 64
reclaimed 28
reserved_fds 20
rusage_system 1.391788
rusage_user 1.411785
threads 4
time 1359928521
total_connections 115
total_items 524
touch_hits 0
touch_misses 0
uptime 13740
version 1.4.14
Comment 4•12 years ago
It resolved on its own again (without a restart), but much more quickly this time.
Comment 5•12 years ago
This alerted once last night and once this morning. Both times it looked fine and recovered on its own. I've not seen this alert be useful yet.
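One common way to quiet an alert that fails briefly and then recovers is to notify only after several consecutive failures, which is the idea behind Nagios's max_check_attempts setting. The sketch below only illustrates that idea; it is not the configuration used for this check, and the function name and the threshold of 3 are assumptions.

```python
def should_notify(results, max_check_attempts=3):
    """Decide whether to notify, given check results ordered oldest first.

    Each entry is True for a passing check and False for a failing one.
    Notify only when the most recent max_check_attempts checks all failed.
    """
    if len(results) < max_check_attempts:
        return False
    return not any(results[-max_check_attempts:])


if __name__ == "__main__":
    # A single blip followed by recovery stays quiet.
    print(should_notify([True, False, True]))
    # Three failures in a row trigger a notification.
    print(should_notify([True, False, False, False]))
```

The trade-off is a slower first notification for a real outage, since the alert waits for the retries to complete.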
Comment 6•12 years ago
Per IRC, let's not have this page oncall, but just show up in IRC for now.
Assignee: bburton → server-ops
Component: Server Operations: Web Operations → Server Operations
QA Contact: nmaul → shyam
Comment 7•12 years ago
This alerted again but, per comment 6, it shows in IRC and doesn't page. I'm going to note in the documentation that this alert usually clears on its own and consider this closed.
Assignee: server-ops → eziegenhorn
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 8•12 years ago
This actually paged me just now so I'll investigate the nagios config.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 9•12 years ago
Changed to #sysalertsonly.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard