Closed Bug 1123901 Opened 10 years ago Closed 10 years ago

aus4 backends max client checks time out relatively frequently

Tracking

(Not tracked)

Status:

RESOLVED DUPLICATE of bug 1126825

People

(Reporter: bhearsum, Assigned: nmaul)

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/316] )

bhearsum@mozilla.com (:bhearsum)

Reporter

Description

10 years ago

Catlee noticed this check failing today:

16:35:51 nagios-phx1 | Tue 13:35:51 PST [1341] aus2.webapp.phx1.mozilla.com:httpd max clients is CRITICAL: (Service Check Timed Out) (http://m.mozilla.org/httpd+max+clients)

I originally thought we might actually be hitting load issues, but then I noticed the history of these checks failing:
https://bugzilla.mozilla.org/buglist.cgi?quicksearch=ALL%20httpd%20aus%20webapp%20comp%3AMOC&list_id=11886145

Most of them are timeouts, not actually hitting max clients. All of the ones in the list above happened prior to enabling the release channel (which has the vast majority of the traffic), and some happened after caching was enabled.

Given that, and the fact that there are no other service checks timing out, it makes me wonder if there's a problem with the max clients check itself, e.g., the nagios plugin on the servers can't respond in time for some reason other than load.

Regardless, we may want to bump max clients up since network/cpu/memory usage is all pretty much under control - I imagine the nodes can take more than 256 clients at a time (I am not a sysadmin/webop though, so I might be overlooking something!).
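To illustrate the distinction drawn above between a check that times out and one that really reaches max clients, here is a minimal sketch in Python. It is not the nagios plugin actually deployed on these servers. It assumes the check reads the machine-readable output of Apache mod_status (the `BusyWorkers:` line from `server-status?auto`), and the function names, the default limit of 256, and the warning/critical ratios are all hypothetical choices made for illustration.

```python
import re

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_busy_workers(status_text):
    """Return the BusyWorkers count from mod_status '?auto' text, or None."""
    match = re.search(r"^BusyWorkers:\s*(\d+)\s*$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def classify(status_text, max_clients=256, warn_ratio=0.8, crit_ratio=0.95):
    """Map a status page (or None when the fetch timed out) to a Nagios state.

    Hypothetical policy: a missing or unparseable response is UNKNOWN, so a
    slow check is not reported the same way as a saturated server.
    """
    if status_text is None:
        return UNKNOWN, "no response from status endpoint (timed out?)"
    busy = parse_busy_workers(status_text)
    if busy is None:
        return UNKNOWN, "BusyWorkers not found in status output"
    message = f"{busy}/{max_clients} workers busy"
    if busy >= crit_ratio * max_clients:
        return CRITICAL, message
    if busy >= warn_ratio * max_clients:
        return WARNING, message
    return OK, message
```

In this sketch the caller fetches the status page with its own timeout and passes `None` on failure, so "the check could not answer in time" and "the server is at max clients" end up as different states.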

:kanban

Updated

10 years ago

Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/316]

Nick Thomas [:nthomas] (UTC+12)

Comment 1

10 years ago

(In reply to Ben Hearsum [:bhearsum] from comment #0)
> Regardless, we may want to bump max clients up since network/cpu/memory
> usage is all pretty much under control - I imagine the nodes can take more
> than 256 clients at a time (I am not a sysadmin/webop though, so I might be
> overlooking something!).

I'm curious what limit we set on the nodes behind aus3.m.o.

C. Liang [:cyliang]

Comment 2

10 years ago

As a point of information, it looks like the older cluster is set up with a MaxClients of 260 while the newer cluster is set up with a MaxClients of 256. Not a huge difference.

Jake Maul [:jakem]

Assignee

Comment 3

10 years ago

This is being worked on over in bug 1126825. Dup'ing this one to that. :)

TL;DR: I think there's an issue with the version of mod_wsgi on some of the nodes, and we're attempting to validate that.

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → DUPLICATE

:kanban

Updated

10 years ago

Assignee: server-ops-webops → nmaul

Updated

9 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard

Categories

(Infrastructure & Operations Graveyard :: WebOps: Product Delivery, task)
