Closed Bug 1280498 Opened 8 years ago Closed 8 years ago

generic3.webapp.phx1 Using 512 out of 512 Clients

Categories

(Infrastructure & Operations :: IT-Managed Tools, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mdevney, Unassigned)

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/3137])

< nagios-phx1> Thu 16:13:16 UTC [1017] generic3.webapp.phx1.mozilla.com:httpd max clients is WARNING: Using 512 out of 512 Clients (http://m.mozilla.org/httpd+max+clients)


From http://generic3.webapp.phx1.mozilla.com/server-status: yup, confirmed that it's using all 512 client slots, and all 8 threads of each child are busy.

The vast majority of the slots listed are for pastebin.mozilla.org, in state 'L' (Logging).
Example:
Srv	PID	Acc	M	CPU 	SS	Req	Conn	Child	Slot	Client	VHost	Request
0-8	5972	24/558/79354	L 	12.16	22194	0	1425.2	147.96	20421.02 	172.6.192.161	pastebin.mozilla.org	GET /?dl=8877534 HTTP/1.1
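A quick way to tally that from mod_status (a sketch; the grep simply counts scoreboard rows mentioning the vhost, and ?auto gives the machine-readable worker counts, assuming a slot frees up to serve the request at all):

curl -s 'http://generic3.webapp.phx1.mozilla.com/server-status' | grep -c 'pastebin.mozilla.org'
curl -s 'http://generic3.webapp.phx1.mozilla.com/server-status?auto' | grep -E 'BusyWorkers|IdleWorkers'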


Despite all these threads trying to log, iostat doesn't show much disk activity:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          27.35    0.00    5.41    0.40    0.00   66.83

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb              15.60       558.40        36.80       2792        184
sda               0.00         0.00         0.00          0          0
dm-0             18.00       558.40        36.80       2792        184
dm-1              0.00         0.00         0.00          0          0
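For the record, to watch utilization and wait times over a few intervals rather than a single sample, something like this works, assuming sysstat's extended stats are available on this host:

iostat -x 5 3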
Error reported on zlb:

Node 10.8.81.93:80: Monitor failed. A Monitor that was assigned to this node failed. First failed 17 seconds ago.  (This error was reported by zlb1.internal.private.phx1.mozilla.com)
System RAM is full, so we can't just increase MaxClients; that would make it fall over.
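For context, those limits live in httpd.conf's MPM section; an illustrative worker-MPM block consistent with the 512-slot / 8-threads-per-child picture above (the host's actual MPM and values aren't shown in this ticket) would look like:

<IfModule worker.c>
    # illustrative values only, not copied from this host
    ServerLimit          64
    ThreadsPerChild       8
    MaxClients          512
</IfModule>

Raising MaxClients also means raising ServerLimit, i.e. more ~85 MB children, which is exactly what we don't have the RAM for.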
No hung or missing NFS mounts.
Not seeing historical data on the zlb as the runbook suggests, but given where we are in the work week it's very likely there's unusually high load right now.
A bunch of this stuff in httpd's error log:

/usr/bin/diff3: standard output: Broken pipe
/usr/bin/diff3: write failed
/usr/bin/diff3: standard output: Broken pipe
/usr/bin/diff3: write failed
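To gauge how often that is being hit, a quick count (assuming the stock RHEL log path /var/log/httpd/error_log, which isn't confirmed in this ticket):

grep -c diff3 /var/log/httpd/error_log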
Highest RAM users:

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                               
16155 apache    20   0  880m  90m 3716 S  0.0  0.8  12:03.33 httpd                                 
16165 apache    20   0  869m  87m 3660 S  0.0  0.7  44:16.49 httpd                                 
16153 apache    20   0  880m  85m 3684 S  0.0  0.7  12:18.08 httpd                                 
16154 apache    20   0  874m  82m 3684 S  0.0  0.7  12:17.30 httpd                                 
16164 apache    20   0  805m  81m 3656 S  0.0  0.7  46:43.14 httpd                                 
16160 apache    20   0  805m  80m 3664 S  0.0  0.7  45:25.08 httpd                   
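As a rough cross-check of total httpd memory from here (a sketch; summing RSS double-counts pages shared between children, so treat it as an upper bound):

ps -C httpd -o rss= | awk '{sum+=$1} END {printf "%.0f MB\n", sum/1024}'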

[root@generic3.webapp.phx1 httpd]# ps -ef | grep httpd | wc -l
585
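(That count includes the grep itself; for a cleaner process count, plus a thread count given the threaded MPM, something like:)

pgrep -c httpd
ps -eLf | grep '[h]ttpd' | wc -l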
jedi restarted httpd and the alerts cleared.
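The exact command isn't recorded here; on a RHEL-era webhead that would typically be something like:

service httpd restart

(or apachectl graceful to recycle children more gently, though a full restart is the surer way to release the memory).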
Memory use had been climbing steadily since this morning until it ran out, so as long as it doesn't start rising again we should be OK. So far it looks stable and in line with the other nodes, so hopefully this was a one-off event.  https://graphite-scl3.mozilla.org/dashboard/#generic-prod-webheads
All clear since then. Calling this one fixed.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED