Change builds-4hr.js.gz nagios check to use HEAD request instead of GET, and increase timeout

RESOLVED FIXED

Status

mozilla.org Graveyard
Server Operations
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: catlee, Assigned: rbryce)

Details

(Reporter)

Description

4 years ago
We're getting some socket timeout failures for the nagios check for builds-4hr.js.gz.

We should be able to do the check with a HEAD request instead of a GET.

While we're at it, let's increase the timeout to 20 seconds.
Assignee: relops → server-ops
Component: RelOps → Server Operations
Product: Infrastructure & Operations → mozilla.org
QA Contact: arich → shyam
Bump? This is causing sheriffs to close the trees in bug 960054.
(Assignee)

Comment 2

4 years ago
(In reply to Chris AtLee [:catlee] from comment #0)
> We're getting some socket timeout failures for the nagios check for
> builds-4hr.js.gz.
> 
> We should be able to do the check with a HEAD request instead of a GET.
> 
> While we're at it, let's increase the timeout to 20 seconds.

The HEAD request doesn't supply the document modification time.  Instead, I suppressed the body from being downloaded, which significantly lowers the size of the response and supplies the mod time. I also added a 20 sec timeout.  

I believe this will alleviate the timeout issues you are seeing.
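
For illustration, here is a minimal Python sketch of the approach described in this comment: issue a normal GET, read only the response headers to get the modification time, and close the connection without downloading the body, all under a 20 second timeout. This is not the actual Nagios check configuration, which is not shown in this bug. The function names, the warning/critical thresholds, and any URL passed in are hypothetical.

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_age(last_modified, now, warn_seconds, crit_seconds):
    """Map a Last-Modified header value to (status, age in seconds)."""
    if not last_modified:
        return UNKNOWN, None
    try:
        modified = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        # Header was present but not a parseable HTTP date.
        return UNKNOWN, None
    if modified.tzinfo is None:
        # Treat a date without zone information as UTC.
        modified = modified.replace(tzinfo=timezone.utc)
    age = (now - modified).total_seconds()
    if age >= crit_seconds:
        return CRITICAL, age
    if age >= warn_seconds:
        return WARNING, age
    return OK, age


def fetch_last_modified(url, timeout=20):
    """GET the URL but read only the headers.

    urlopen() returns once the status line and headers have arrived.
    The body is never read here, and the connection is closed when the
    with-block exits, so at most a buffered fragment is transferred.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.headers.get("Last-Modified")


def run_check(url, warn_seconds=900, crit_seconds=1800, timeout=20):
    """Hypothetical freshness check; the thresholds are assumptions."""
    try:
        header = fetch_last_modified(url, timeout=timeout)
    except OSError:
        # Covers URLError and socket timeouts.
        return CRITICAL, None
    return classify_age(header, datetime.now(timezone.utc),
                        warn_seconds, crit_seconds)
```

Calling `run_check()` with the URL of the monitored file returns a Nagios-style status code together with the file's age in seconds (or `None` if the age could not be determined).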
Assignee: server-ops → rbryce
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
(In reply to Rick Bryce [:rbryce] from comment #2)
> The HEAD request doesn't supply the document modification time.  Instead, I
> suppressed the body from being downloaded, which significantly lowers the
> size of the response and supplies the mod time. I also added a 20 sec
> timeout.  
> 
> I believe this will alleviate the timeout issues you are seeing.

The Nagios alerts in bug 960054 comment 11 suggest that the timeout is still 10 seconds, and that the payload is still including the body (see bytes returned in the recovery email).

I don't suppose you could take another look? :-)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
By the way, this check was disabled a few days ago while debugging the underlying issue, and re-enabled yesterday (Monday).
(Assignee)

Comment 5

4 years ago
(In reply to Ed Morley [:edmorley UTC+0] from comment #3)
> (In reply to Rick Bryce [:rbryce] from comment #2)
> > The HEAD request doesn't supply the document modification time.  Instead, I
> > suppressed the body from being downloaded, which significantly lowers the
> > size of the response and supplies the mod time. I also added a 20 sec
> > timeout.  
> > 
> > I believe this will alleviate the timeout issues you are seeing.
> 
> The Nagios alerts in bug 960054 comment 11 suggest that the timeout is still
> 10 seconds, and that the payload is still including the body (see bytes
> returned in the recovery email).
> 
> I don't suppose you could take another look? :-)

This is odd. I double-checked my work to make sure I didn't misconfigure the timeout. It is correct and should work as is. I have removed the timeout setting from the check. I'm hoping this will force it to use the timeout of 60 seconds.
(Assignee)

Comment 6

4 years ago
I'm hoping this will force the use of the default NRPE timeout, 60 seconds.
(Assignee)

Comment 7

4 years ago
Ed, is this working out for you?
(In reply to Rick Bryce [:rbryce] from comment #7)
> Ed, is this working out for you?

I haven't seen any timeouts since, happy to call this fixed for now :-)
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard