Closed
Bug 1123901
Opened 10 years ago
Closed 10 years ago
aus4 backends max client checks time out relatively frequently
Categories
(Infrastructure & Operations Graveyard :: WebOps: Product Delivery, task)
x86_64
Linux
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 1126825
People
(Reporter: bhearsum, Assigned: nmaul)
Details
(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/316] )
Catlee noticed this check failing today:
16:35:51 nagios-phx1 | Tue 13:35:51 PST [1341] aus2.webapp.phx1.mozilla.com:httpd max clients is CRITICAL: (Service Check Timed Out) (http://m.mozilla.org/httpd+max+clients)
I originally thought we might be actually hitting load issues, but then I noticed the history of these checks failing:
https://bugzilla.mozilla.org/buglist.cgi?quicksearch=ALL%20httpd%20aus%20webapp%20comp%3AMOC&list_id=11886145
Most of them are timeouts, not actually hitting max clients. All of the ones in the list above happened prior to enabling the release channel (which has the vast majority of the traffic), and some happened after caching was enabled. Given that, and the fact that there are no other service checks timing out, it makes me wonder if there's a problem with the max clients check itself. E.g., the nagios plugin on the servers can't respond in time for some reason other than load.
Regardless, we may want to bump max clients up since network/cpu/memory usage is all pretty much under control - I imagine the nodes can take more than 256 clients at a time (I am not a sysadmin/webop though, so I might be overlooking something!).
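To make the distinction between "check timed out" and "actually at max clients" concrete, here is a minimal sketch of how busy workers could be compared against the limit. It is not the nagios plugin used on these nodes (that plugin is not shown in this bug). It assumes Apache's mod_status is enabled and that its machine-readable output (the `server-status?auto` format, which includes `BusyWorkers` and `IdleWorkers`) is available as text. The function names, the `warn_ratio` threshold, and the sample values are hypothetical; 256 is the MaxClients figure mentioned above.

```python
def parse_server_status(text):
    """Parse 'Key: value' lines from mod_status machine-readable output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no 'Key: value' shape
            fields[key.strip()] = value.strip()
    return fields


def classify(fields, max_clients=256, warn_ratio=0.8):
    """Return a Nagios-style state from the busy worker count.

    max_clients defaults to the 256 mentioned in this bug;
    warn_ratio is a hypothetical threshold chosen for illustration.
    """
    busy = int(fields["BusyWorkers"])
    if busy >= max_clients:
        return "CRITICAL"
    if busy >= max_clients * warn_ratio:
        return "WARNING"
    return "OK"


# Hypothetical sample values, not real data from the aus nodes.
sample = "BusyWorkers: 40\nIdleWorkers: 216\n"
state = classify(parse_server_status(sample))
print(state)
```

A check shaped like this only reports CRITICAL when the worker count really reaches the limit, so a timeout in the plugin itself would show up as a separate failure mode rather than as a max clients alert.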
Comment 1•10 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #0)
> Regardless, we may want to bump max clients up since network/cpu/memory
> usage is all pretty much under control - I imagine the nodes can take more
> than 256 clients at a time (I am not a sysadmin/webop though, so I might be
> overlooking something!).
I'm curious what limit we set on the nodes behind aus3.m.o.
Comment 2•10 years ago
As a point of information, it looks like the older cluster is set up with a MaxClients of 260 while the newer cluster is set up with a MaxClients of 256. Not a huge difference.
Assignee | ||
Comment 3•10 years ago
This is being worked on over in bug 1126825. Dup'ing this one to that. :)
TL;DR: I think there's an issue with the version of mod_wsgi on some of the nodes, and we're attempting to validate that.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
Updated•9 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard