Closed Bug 1123901 Opened 10 years ago Closed 10 years ago

aus4 backends max client checks time out relatively frequently

Categories

(Infrastructure & Operations Graveyard :: WebOps: Product Delivery, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1126825

People

(Reporter: bhearsum, Assigned: nmaul)

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/316] )

Catlee noticed this check failing today:

16:35:51 nagios-phx1 | Tue 13:35:51 PST [1341] aus2.webapp.phx1.mozilla.com:httpd max clients is CRITICAL: (Service Check Timed Out) (http://m.mozilla.org/httpd+max+clients)

I originally thought we might actually be hitting load issues, but then I noticed the history of these checks failing: https://bugzilla.mozilla.org/buglist.cgi?quicksearch=ALL%20httpd%20aus%20webapp%20comp%3AMOC&list_id=11886145

Most of them are timeouts, not actually hitting max clients. All of the ones in the list above happened prior to enabling the release channel (which has the vast majority of the traffic), and some happened after caching was enabled. Given that, and the fact that there are no other service checks timing out, it makes me wonder if there's a problem with the max clients check itself. E.g., the nagios plugin on the servers can't respond in time for some reason other than load.

Regardless, we may want to bump max clients up since network/cpu/memory usage is all pretty much under control - I imagine the nodes can take more than 256 clients at a time (I am not a sysadmin/webop though, so I might be overlooking something!).
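To illustrate the distinction drawn above between a check that times out and a check that really sees max clients being hit, here is a minimal sketch of a Nagios-style check. The plugin actually deployed on these nodes is not shown in this bug, so everything here is an assumption: the function names, the thresholds, and the URL are hypothetical, and the sketch assumes Apache's mod_status is enabled and serving its machine-readable `?auto` output. It reports a timeout as UNKNOWN rather than CRITICAL, so the two failure modes are not conflated.

```python
import socket
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_busy_workers(status_text):
    """Return BusyWorkers from mod_status '?auto' output, or None if absent."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "BusyWorkers":
            try:
                return int(value.strip())
            except ValueError:
                return None
    return None


def evaluate(busy, max_clients=256, warn_ratio=0.8, crit_ratio=0.95):
    """Map a busy-worker count to a (status, message) pair.

    The 256 default mirrors the figure mentioned in this bug; the
    ratios are illustrative assumptions, not the real thresholds.
    """
    if busy is None:
        return UNKNOWN, "could not find BusyWorkers in status output"
    msg = f"{busy}/{max_clients} workers busy"
    ratio = busy / max_clients
    if ratio >= crit_ratio:
        return CRITICAL, msg
    if ratio >= warn_ratio:
        return WARNING, msg
    return OK, msg


def check(url="http://localhost/server-status?auto", timeout=5.0, max_clients=256):
    """Fetch the status page (hypothetical URL) and evaluate it.

    A timeout or connection failure yields UNKNOWN, not CRITICAL,
    because it says nothing about how many workers are busy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            text = resp.read().decode("ascii", errors="replace")
    except (urllib.error.URLError, socket.timeout, OSError) as exc:
        return UNKNOWN, f"status page unreachable or timed out: {exc}"
    return evaluate(parse_busy_workers(text), max_clients)
```

With a check shaped like this, a run of UNKNOWN results alongside healthy load graphs would point at the check path (or whatever is serving the status page) rather than at worker exhaustion.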
(In reply to Ben Hearsum [:bhearsum] from comment #0)
> Regardless, we may want to bump max clients up since network/cpu/memory
> usage is all pretty much under control - I imagine the nodes can take more
> than 256 clients at a time (I am not a sysadmin/webop though, so I might be
> overlooking something!).

I'm curious what limit we set on the nodes behind aus3.m.o.
As a point of information, it looks like the older cluster is set up with a MaxClients of 260 while the newer cluster is set up with a MaxClients of 256. Not a huge difference.
This is being worked on over in bug 1126825. Dup'ing this one to that. :) TL;DR: I think there's an issue with the version of mod_wsgi on some of the nodes, and we're attempting to validate that.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
Assignee: server-ops-webops → nmaul
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard