Closed Bug 406267 Opened 17 years ago Closed 16 years ago

Releases_Lag nagios test should automatically add/remove servers from DNS

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
minor

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: justdave, Assigned: reed)

References

Details

We currently have a nagios test which monitors the servers in the releases.mozilla.org round-robin and pages when they get too far behind or become unreachable or too slow.  It also provides an overview and statistics at https://nagios.mozilla.org/ftplag/

We should fix this test so that it automatically adds/removes the mirrors from the pool instead of paging about them, and only pages when fewer than a certain percentage of them are left in the pool (say half).

This would reduce work for the sysadmins, while still letting us know when there are real problems.
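
As a rough illustration of the add/remove decision described above, here is a minimal Python sketch. The `MirrorHealth` fields, the `plan_pool_changes` name, and the lag and response thresholds are all assumptions made for the example; the real nagios test and its limits are not shown in this bug.

```python
from dataclasses import dataclass


@dataclass
class MirrorHealth:
    """Hypothetical per-mirror measurements taken by the monitoring test."""
    reachable: bool
    lag_seconds: float
    response_seconds: float


def plan_pool_changes(in_pool, health, max_lag=3600.0, max_response=5.0):
    """Return (to_enable, to_disable) host lists.

    in_pool: set of hostnames currently in the round-robin.
    health:  dict mapping hostname -> MirrorHealth.
    The default thresholds are placeholders, not values from the real test.
    """
    to_enable, to_disable = [], []
    for host, h in sorted(health.items()):
        healthy = (
            h.reachable
            and h.lag_seconds <= max_lag
            and h.response_seconds <= max_response
        )
        if healthy and host not in in_pool:
            to_enable.append(host)    # recovered mirror goes back in
        elif not healthy and host in in_pool:
            to_disable.append(host)   # lagging or unreachable mirror comes out
    return to_enable, to_disable
```

Mirrors whose state already matches their health are left alone, so a run where nothing changed produces two empty lists and no DNS update is needed.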
Assignee: server-ops → nobody
Component: Server Operations → Server Operations: Projects
Can we just put this behind the netscaler?
Putting it behind the netscaler would end up routing all the traffic through us, and I'm not sure we actually have that kind of bandwidth.  We're talking about roughly 8 Gbit of traffic during releases here, I'm guessing.
Good point.
I wonder if we could use the global load balancing stuff to do this.
GSLB on the Netscaler requires that the IPs returned in a query live behind the Netscaler.

You could probably do something with bind to query an external something that had an array of "up" boxes.  Actually, I bet something like http://mysql-bind.sourceforge.net/ would be something to look at and would be an interesting candidate for NDB!
Given we only have 4-5 hosts in releases, a far simpler approach might be just to have a script re-generate the mozilla.org-soa file and update the serial # every 15 minutes...adding mysql seems to add a lot of complexity for a 4 element array.
There's 14 actually, not 4.
And yeah, that's what I was thinking of doing if I ever got time to mess with it: take the releases section of the mozilla.org-ftp zone file, make it an $INCLUDE of a file generated by the monitoring script, and have the script update the serial in the soa file every time it modifies that file.
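
A minimal Python sketch of that idea, for illustration only. The function names, the record layout, and the YYYYMMDDNN serial convention are assumptions; the actual zone files and serial scheme are not shown in this bug, and the addresses in the usage note are documentation placeholders.

```python
import datetime


def render_include(owner, records):
    """Render the generated $INCLUDE file body.

    owner:   record owner name, e.g. "releases" (assumed layout).
    records: list of (rtype, address) tuples for the mirrors that are "up".
    """
    lines = [f"{owner} IN {rtype} {addr}" for rtype, addr in records]
    return "\n".join(lines) + "\n"


def bump_serial(serial, today):
    """Return a new SOA serial strictly greater than the old one.

    Assumes the common YYYYMMDDNN convention: start at today's date with
    revision 00, or add 1 if the current serial is already at or past that.
    """
    candidate = int(today.strftime("%Y%m%d") + "00")
    return str(max(candidate, int(serial) + 1))
```

For example, `render_include("releases", [("A", "192.0.2.1")])` yields a single `releases IN A 192.0.2.1` line. The script would write that file, store the bumped serial in the soa file, and then reload the zone, and it should only do so when the set of "up" mirrors has actually changed.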
Assignee: nobody → mrz
Whiteboard: pending geodns
Depends on: 432518
Assignee: mrz → reed
Changing QA Contact.
QA Contact: justin → mrz
Component: Server Operations: Projects → Server Operations
Whiteboard: pending geodns
Whiteboard: Waiting on geodns.pl mods from xb95
So, the script already allowed enable/disable by IP.  I've updated it so it can now operate on hostnames (descriptions) as well:

[root@geodns01 ~]# ./geodns.pl --list | grep sand
45 ) US GLOBAL irc-mozilla-org.geo.mozilla.com       A 63.245.208.159  60     sand.mozilla.org

[root@geodns01 ~]# ./geodns.pl --enable sand.mozilla.org
No change necessary.

[root@geodns01 ~]# ./geodns.pl --disable sand.mozilla.org
Disabling 45.

Rebuilding views...
    building CN from CC = CN
    building US from region = North America
    building GLOBAL from global entries
    building JP from global entries (no enabled, matching entries found)
    building EU from region = Europe
Rebuilt views.

If you try to specify something not distinct enough, it will error:

[root@geodns01 ~]# ./geodns.pl --enable mozilla
Found more than one matching description.

Does this work?
(In reply to comment #10)
> Does this work?

Looking good, but what about descriptions with spaces in them?

[root@geodns01 ~]# ./geodns.pl --enable "trillian.gtlib.gatech.edu - ipv6"
Unable to find host by ID, IP, or description.
I assumed your script wouldn't know to put " - ipv6" on there as the IP would resolve to just trillian.gtlib.gatech.edu.  In effect it's doing a match against the first word.

Do you need it to support matching the entire phrase?  That seems like it'd be something a human would type.  In that case you presumably already have the id and can use that?
(In reply to comment #12)
> Do you need it to support matching the entire phrase?  That seems like it'd be
> something a human would type.  In that case you presumably already have the id
> and can use that?

90 ) US GLOBAL releases.geo.mozilla.com              A 128.61.111.9    60     trillian.gtlib.gatech.edu - ipv4
91 ) US GLOBAL releases.geo.mozilla.com              AAAA 2610:148:fd80:3d6f:209:3dff:fe12:7bf9 60     trillian.gtlib.gatech.edu - ipv6

I need to be able to differentiate those two descriptions... "trillian.gtlib.gatech.edu - ipv4" and "trillian.gtlib.gatech.edu - ipv6".
Alright, fixed it so it works as expected now:

[root@geodns01 ~]# ./geodns.pl --disable "trillian.gtlib.gatech.edu - ipv6"
Disabling 91.

Rebuilding views...
    building CN from CC = CN
    building US from region = North America
    building GLOBAL from global entries
    building JP from global entries (no enabled, matching entries found)
    building EU from region = Europe
Rebuilt views.

I also fixed a bug where it was only selecting A records; it now includes AAAA and CNAME as well.
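
For readers following along, the lookup behaviour shown in the sessions above can be approximated with the Python sketch below. This is not the geodns.pl code (which is Perl and not included in this bug); the `resolve_entry` name, the entry layout, and the exact-then-partial matching order are assumptions. Only the two error strings are taken from the output above.

```python
def resolve_entry(entries, selector):
    """Map a selector (ID, IP, or description) to a single entry ID.

    entries: list of dicts with "id", "ip", and "description" keys
             (a hypothetical layout chosen for this sketch).
    """
    # ID or IP: take the first direct hit.
    for e in entries:
        if selector == str(e["id"]) or selector == e["ip"]:
            return e["id"]

    # Full description, spaces included, so "host - ipv4" and
    # "host - ipv6" can be told apart.
    exact = [e for e in entries if e["description"] == selector]
    if len(exact) == 1:
        return exact[0]["id"]

    # Fall back to a partial match, but only if it is unambiguous.
    partial = [e for e in entries if selector in e["description"]]
    if len(partial) == 1:
        return partial[0]["id"]
    if len(exact) > 1 or len(partial) > 1:
        raise LookupError("Found more than one matching description.")
    raise LookupError("Unable to find host by ID, IP, or description.")
```

With the three entries listed earlier (45, 90, 91), the selector "trillian.gtlib.gatech.edu - ipv6" resolves to 91, while a bare "trillian" is rejected as ambiguous.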
Whiteboard: Waiting on geodns.pl mods from xb95 → Working on new script to use xb95's new features
Whiteboard: Working on new script to use xb95's new features → Script made and working; need to add count percentage limit check
Pages critical if percentage of active servers goes under 40%.
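
The threshold check might look roughly like the Python sketch below. The function name and message format are assumptions, since the actual script is not attached to this bug; the exit codes are the standard Nagios plugin return values (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
OK, CRITICAL, UNKNOWN = 0, 2, 3  # standard Nagios plugin exit codes


def check_active_percentage(active, total, critical_pct=40.0):
    """Return (exit_code, message) for the share of servers still in the pool.

    CRITICAL only when the active percentage is strictly under critical_pct.
    """
    if total <= 0:
        return UNKNOWN, "UNKNOWN: no servers configured"
    pct = 100.0 * active / total
    summary = f"{active}/{total} servers active ({pct:.0f}%)"
    if pct < critical_pct:
        return CRITICAL, f"CRITICAL: {summary}"
    return OK, f"OK: {summary}"
```

With 14 servers in the pool, 6 active (about 43%) stays OK, while 5 active (about 36%) goes critical and pages.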
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Whiteboard: Script made and working; need to add count percentage limit check
Product: mozilla.org → mozilla.org Graveyard