Closed Bug 432518 Opened 16 years ago Closed 16 years ago

geodns deployment

Categories

(mozilla.org Graveyard :: Server Operations, task)

Platform: All / Other
Type: task
Priority: Not set
Severity: minor

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mrz, Assigned: mrz)

References

Details

tracking
Assignee: server-ops → nobody
Component: Server Operations → Server Operations: Projects
Flags: needs-downtime+
Whiteboard: Tuesday 2008/06/03 @ 8pm
Assignee: nobody → mrz
Blocks: 406267
Whiteboard: Tuesday 2008/06/03 @ 8pm → Tuesday 2008/05/27 @ 8pm
Punting to Dave for Nagios: need MySQL, replication (geodns02), and process checks (named, mysql).
Assignee: mrz → justdave
I bailed on this last night for a few reasons:

1. no Nagios monitoring
2. Release_Lag monitor: not sure how it works or what it would do after the RRs change.
3. Didn't tell oncall this was happening, and didn't want to spring this change on them at 10pm.

Component: Server Operations: Projects → Server Operations
Whiteboard: Tuesday 2008/05/27 @ 8pm → needs-monitoring
The Release_Lag monitor isn't a problem:

my ($name, $aliases, $addrtype, $length, @addrs) = gethostbyname('releases.mozilla.org');

will continue to work and grab every address. It may need to change at some point to make sure it grabs all the global mirrors, but that can be done as part of bug 406267.
13:08 <@nagios> geodns01.sj:DNS is CRITICAL: CRITICAL - Plugin timed out after 4 seconds

The DNS check tries to resolve www.mozilla.com, but those two boxes have recursion turned off. Turn it on, or make a new check?
The test was changed to let us specify a host to test in the service definition.
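For reference, a Nagios command and service definition that takes the name to look up from the service definition might look like the sketch below. It assumes the stock check_dns plugin (-H for the name to resolve, -s for the DNS server to query) and an existing generic-service template; the command name check_dns_host is hypothetical, not the definition actually deployed here. Querying the geodns box directly for a name it is authoritative for means the check works with recursion left off.

```
# Hypothetical command: resolve the name passed in the service definition ($ARG1$)
# against the monitored host itself ($HOSTADDRESS$), so recursion is not needed.
define command {
    command_name    check_dns_host
    command_line    $USER1$/check_dns -H $ARG1$ -s $HOSTADDRESS$
}

# Example service on one geodns box; generic-service is assumed to be an
# existing template supplying the required intervals, periods and contacts.
define service {
    use                     generic-service
    host_name               geodns02.nl
    service_description     DNS
    check_command           check_dns_host!releases.geo.mozilla.com
}
```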

I've finished double-checking all of the service checks, and everything that's still red in Nagios is a real problem; the tests themselves are okay.

named on geodns02 is indeed not resolving releases.geo.mozilla.com, probably because MySQL isn't replicating so it hasn't picked it up yet.  Nagios can't connect to the server to check MySQL because the ACLs to allow nagios in haven't replicated either.  I'm betting that fixing the replication problem will resolve both of the other two as well.
Assignee: justdave → mrz
Whiteboard: needs-monitoring
Restored replication on geodns02. It required a full MySQL re-init; it seems that somehow /etc/my.cnf on geodns01 lost the log-bin line, so binary logging was no longer running.
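A replica can only follow a master that writes a binary log. A minimal sketch of the relevant /etc/my.cnf lines on geodns01, assuming a log basename of mysql-bin and a server-id of 1 (neither value comes from this bug; they are illustrative only):

```
[mysqld]
# Write the binary log that geodns02 reads in order to replicate;
# without this line there is nothing for the replica to pull.
log-bin   = mysql-bin
# Each server in a replication setup needs a unique, non-zero server-id.
server-id = 1
```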

Anyway, this is resolved and tested, and you can now set up Nagios alerts against it if you want; it should all be good.
Nagios alerts were already there; they were just all red before. :)  DNS is still red: https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?host=geodns02.nl

Is named talking to mysql correctly?
Deployed.  Thanks!
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard