Check monitoring for aus5.mozilla.org

RESOLVED FIXED

Status

Infrastructure & Operations Graveyard
WebOps: Product Delivery
--
major
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: nthomas, Assigned: cyliang)

Tracking

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2076] )

(Reporter)

Description

2 years ago
Last week aus5.mozilla.org wasn't working - connections were being refused at the ZLBs in all the phx1 evac fun - and nagios missed it. We're serving nightly, aurora, and beta using aus5.mozilla.org, and once 42.0 ships that'll be on it too.

Please check we have nagios monitoring in place to catch problems, copying over all the checks we have on aus4.mozilla.org.

Updated

2 years ago
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2076]
(Assignee)

Updated

2 years ago
Assignee: server-ops-webops → cliang
(Assignee)

Comment 1

2 years ago
I've added a check for aus5.mozilla.org that is similar to the general check for aus4.mozilla.org, which verifies that the VIP for aus5.mozilla.org is up and the SSL cert for that site is good (not set to expire soon).
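
For anyone reading along later, here is a rough sketch of what a check of this kind does. It is not the actual nagios configuration or plugin used for aus5.mozilla.org; the function names and the 30/7 day thresholds are assumptions chosen for illustration. It only uses the Python standard library.

```python
import socket
import ssl
import time

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_expiry(not_after, now, warn_days=30, crit_days=7):
    """Map a certificate 'notAfter' string to a Nagios-style status.

    warn_days and crit_days are illustrative thresholds, not the real ones.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    days_left = (expires - now) / 86400
    if days_left < crit_days:
        return CRITICAL, days_left
    if days_left < warn_days:
        return WARNING, days_left
    return OK, days_left


def check_host(host, port=443, timeout=10):
    """Confirm the host accepts TLS connections and classify cert expiry."""
    ctx = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except OSError as exc:
        # Covers refused connections, timeouts, and TLS handshake failures.
        return CRITICAL, f"connection failed: {exc}"
    status, days_left = classify_expiry(cert["notAfter"], time.time())
    return status, f"certificate expires in {days_left:.0f} days"
```

Calling `check_host("aus5.mozilla.org")` returns a `(status, message)` pair. A refused connection, like the one nagios missed, would come back as CRITICAL.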

For aus4.mozilla.org, there appears to be an additional check verifying that a specific file is available:
    https://aus4.mozilla.org/update/3/Firefox/14.0a1/20120222174716/WINNT_x86-msvc/en-US/nightly/Windows%205.1/default/default/update.xml

I didn't know if it made sense to have a similar check for aus5.mozilla.org or not (given that they currently share the same back-end infrastructure).  If so, please let me know which URL I should use (the one given for the aus4.mozilla.org check or a new one).  If not, let me know and/or close out this bug.
Flags: needinfo?(nthomas)
(Reporter)

Comment 2

2 years ago
Let's add a new check on aus5, since that's the main domain going forward. Here's an updated URL for that:

https://aus5.mozilla.org/update/3/Firefox/45.0a1/20151114030404/WINNT_x86-msvc-x86/en-US/nightly/Windows_NT%206.0.2.0%20%28x86%29/default/default/update.xml

Thanks for setting this up.
Flags: needinfo?(nthomas)
(Assignee)

Comment 3

2 years ago
I've added a check for the update.xml file:

   HTTP OK: HTTP/1.1 200 OK - 884 bytes in 0.118 second response time
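
As an illustration only, a file-availability check like this can be approximated with the sketch below. It is not the plugin that produced the output above; `summarize` and `check_url` are hypothetical names, and the message format is only loosely modelled on that output.

```python
import time
import urllib.error
import urllib.request

# Conventional Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def summarize(http_status, reason, body_len, elapsed):
    """Build a Nagios-style (state, message) pair from an HTTP result."""
    state = OK if http_status == 200 else CRITICAL
    label = "HTTP OK" if state == OK else "HTTP CRITICAL"
    message = f"{label}: {http_status} {reason} - {body_len} bytes in {elapsed:.3f}s"
    return state, message


def check_url(url, timeout=10):
    """Fetch url and report whether it answered with 200 OK."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
            return summarize(resp.status, resp.reason, len(body),
                             time.monotonic() - start)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        return summarize(exc.code, exc.reason, 0, time.monotonic() - start)
    except OSError as exc:
        # Refused connection, DNS failure, timeout, or TLS error.
        return CRITICAL, f"HTTP CRITICAL: {exc}"
```

Passing the aus5 update.xml URL from comment 2 to `check_url` would exercise the same path as the nagios check: anything other than a 200 response, or no response at all, is reported as CRITICAL.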

If we find that additional checks are needed or they need to be re-routed, please open a new bug. =)
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard