Closed
Bug 608889
Opened 15 years ago
Closed 14 years ago
add tegras to nagios
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: joduinn, Assigned: arich)
References
Details
(Whiteboard: [added to DNS/inventory])
These tegra machines need to be added to nagios as we move them to production.
tegra-001.build.m.o
...
tegra-013.build.m.o
These should be treated in the same way as the n810s and n900 - slower reboot times, etc.
Comment 1•15 years ago (Reporter)
(In reply to comment #1)
> These tegra machines need to be added to nagios as we move them to production.
>
> tegra-001.build.m.o
> ...
> tegra-013.build.m.o
>
> These should be treated in the same way as the n810s and n900 - slower reboot
> times, etc.
Aki just reminded me of a donated machine from Joel, so this should be tegra-001...tegra-014.
Blocks: 608747
Updated•15 years ago
Assignee: server-ops → jdow
Comment 2•15 years ago
These haven't been added to DNS, DHCP or Inventory yet. I'll need a list of MAC addresses and serial numbers before I can proceed.
Comment 3•15 years ago
These still seem to be missing from reverse DNS, which is needed for nagios to work.
Assignee: jdow → jlazaro
Updated•15 years ago
Whiteboard: [needs to be added to inventory/DNS first]
Updated•14 years ago
Whiteboard: [needs to be added to inventory/DNS first] → [added to DNS/inventory]
Comment 4•14 years ago
I think arr is doing the nagios work for releng these days. I'm not sure what the current status of this is, but I don't think jlaz will be getting to it, due to his shift to services ops.
Amy, can you get these tegras added to nagios? I imagine just a ping check with some kind of lazy_host or very_lazy_host directive.
Assignee: jlazaro → arich
Component: Server Operations → Server Operations: RelEng
QA Contact: mrz → zandr
Comment 5•14 years ago
This is now tegra-001 through tegra-093.
Comment 6•14 years ago (Assignee)
Aki: In addition to a ping check, zandr mentioned that port 20701 might be useful to monitor. Dustin was under the impression that this might not be stable enough to check yet, though. Do you want anything other than a basic slow ping check?
Status: NEW → ASSIGNED
Comment 7•14 years ago
You can do a simple socket open check on 20701. The agent will respond if it's alive.
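For illustration, a socket open check of this kind could look roughly like the Python sketch below. It is not the check that was deployed (comment 8 points at the nagios check_tcp plugin for that); the function name `port_is_open` and the 10 second timeout are assumptions made for the example.

```python
import socket


def port_is_open(host, port=20701, timeout=10.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: the name and the default timeout are not from
    this bug, only the port number is.
    """
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

Calling `port_is_open("tegra-001.build.m.o")` would then report whether the agent port accepts connections, without reading anything from it.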
Comment 8•14 years ago (Assignee)
I've added the ping check and matched it to the n900s. If someone could please write up the information for the 20701 check and add it to this ticket, I'll implement that as well.
For reference:
http://nagiosplugins.org/man/check_tcp
And the check timing numbers I'd need:
normal_check_interval
retry_check_interval
max_check_attempts
first_notification_delay
Thanks!
Updated•14 years ago (Assignee)
Assignee: arich → bear
Comment 9•14 years ago
The port 20701 check should respond with
$ telnet tegra-001 20701
Trying 10.250.48.251...
Connected to tegra-001.build.mtv1.mozilla.com.
Escape character is '^]'.
$>
(I imagine the $> is the response, and the rest is telnet output)
The frequency should probably match the ping check.
(Bear can add any further information or clarification)
Comment 10•14 years ago
(In reply to comment #9)
> The port 20701 check should respond with
>
> $ telnet tegra-001 20701
> Trying 10.250.48.251...
> Connected to tegra-001.build.mtv1.mozilla.com.
> Escape character is '^]'.
> $>
>
> (I imagine the $> is the response, and the rest is telnet output)
Aki is correct: you will only see "$>", with no CRLF after it.
Any response sent back should be "quit\n", but IMO that is not required, as closing the socket works as well.
>
> The frequency should probably match the ping check.
>
> (Bear can add any further information or clarification)
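The exchange described in this comment could be checked along the lines of the Python sketch below: connect, read exactly the two prompt bytes, compare them with "$>", and optionally send "quit\n" before closing. The function name `agent_prompt_ok` and the timeout value are assumptions for the example, not anything taken from the agent or from nagios.

```python
import socket

PROMPT = b"$>"  # the agent sends this with no trailing CRLF


def agent_prompt_ok(host, port=20701, timeout=10.0):
    """Return True if the agent on host:port greets with the expected prompt.

    Hypothetical helper: the name and default timeout are illustrative.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            data = b""
            # Read until we have as many bytes as the prompt, or the peer closes.
            while len(data) < len(PROMPT):
                chunk = conn.recv(len(PROMPT) - len(data))
                if not chunk:
                    break
                data += chunk
            if data != PROMPT:
                return False
            # Polite but optional: closing the socket works as well.
            conn.sendall(b"quit\n")
            return True
    except OSError:
        # Refused connection, timeout while waiting for the prompt, etc.
        return False
```

Because the prompt has no line terminator, the sketch reads a fixed number of bytes instead of waiting for a newline.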
Updated•14 years ago
Assignee: bear → arich
Comment 11•14 years ago (Assignee)
Since the prompt on the tegras is two metacharacters and does not play well with the nagios config file, I'm doing a simple tcp connection to the port. If we change the prompt to be something easier to check in the future, we can open a new bug and I can go back and rework the check. For now, I've rolled the simple tcp socket check out to all of the tegras. I would expect several of the tegras that are down to send out notifications later today once they finally accumulate enough failed tries.
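As a rough illustration of why the notifications lag behind the rollout: assuming the usual nagios behaviour of rechecking a failing service every retry_check_interval until max_check_attempts is reached, the wait between the first failed check and the final one can be estimated as in the sketch below. The function name and the example numbers are hypothetical; the actual timing values are not recorded in this bug, and first_notification_delay can push the notification out further.

```python
def minutes_until_final_attempt(max_check_attempts, retry_check_interval):
    """Estimate minutes from the first failed check to the last retry.

    Assumes intervals are expressed in minutes (the nagios default time
    unit) and that every retry fails. Hypothetical helper for illustration.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    # The first failure counts as attempt 1; each further attempt waits
    # one retry interval.
    return (max_check_attempts - 1) * retry_check_interval


# Hypothetical values for a slow-to-reboot device: 5 attempts, 10 minutes apart.
print(minutes_until_final_attempt(5, 10))
```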
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 12•14 years ago
Notifications were seen for the tegras that were offline. Thanks!
Status: RESOLVED → VERIFIED
Updated•12 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations