Closed
Bug 762816
Opened 12 years ago
Closed 12 years ago
Nagios change: take away 1 am downtime on intranet2
Categories
(Data & BI Services Team :: DB: MySQL, task)
Tracking
(Not tracked)
VERIFIED
INVALID
People
(Reporter: scabral, Assigned: afernandez)
Details
Between 1 and 2 am Pacific (4 to 5 am Eastern) I get paged about intranet2 being behind in replication. This is likely due to jabba's defrag, which runs nightly. Let's schedule Nagios downtime for the replication lag check during this maintenance. We have to look up exactly when the maintenance runs, then check the Nagios history to see how long the downtime should be. The downtime should cover replication lag only.
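A minimal sketch of what scheduling such a downtime could look like, assuming the standard Nagios external command SCHEDULE_SVC_DOWNTIME is used. The host name "intranet2", the service description "MySQL Replication", and the helper name schedule_svc_downtime are assumptions for illustration only; the real service name and window would have to come from the Nagios config and history.

```python
from datetime import datetime, timedelta, timezone


def schedule_svc_downtime(host, service, start, minutes, author, comment, now=None):
    """Build one Nagios external-command line for a fixed service downtime.

    Format: [time] SCHEDULE_SVC_DOWNTIME;host;service;start;end;fixed;
            trigger_id;duration;author;comment
    """
    if now is None:
        now = datetime.now(timezone.utc)
    end = start + timedelta(minutes=minutes)
    start_ts = int(start.timestamp())
    end_ts = int(end.timestamp())
    fields = [
        "SCHEDULE_SVC_DOWNTIME",
        host,
        service,
        str(start_ts),
        str(end_ts),
        "1",                     # fixed downtime: starts and ends exactly as given
        "0",                     # trigger_id 0: not triggered by another downtime
        str(end_ts - start_ts),  # duration in seconds (only used for flexible downtime)
        author,
        comment,
    ]
    for value in fields:
        # Semicolons and newlines would corrupt the command line.
        if ";" in value or "\n" in value:
            raise ValueError("field must not contain ';' or a newline: %r" % value)
    return "[{}] {}".format(int(now.timestamp()), ";".join(fields))
```

The returned line would be written to the Nagios command file; because the maintenance is nightly, something would have to call this once per day, or the window would be handled in the check's configuration instead.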
Assignee
Updated•12 years ago
Assignee: server-ops-database → afernandez
Reporter
Comment 1•12 years ago
[root@puppetdashboard1.private.phx1 cron.d]# hostname
puppetdashboard1.private.phx1.mozilla.com
[root@puppetdashboard1.private.phx1 cron.d]# crontab -l
# HEADER: This file was autogenerated at Wed May 30 17:08:40 -0700 2012 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: homeclean
MAILTO=infra-notices@mozilla.com
0 3 * * * /usr/local/bin/homeclean.sh > /dev/null
# Puppet Name: prune-reports
0 1 * * * cd /usr/share/puppet-dashboard/; /usr/bin/rake RAILS_ENV=production reports:prune upto=4 unit=day
# Puppet Name: optimize-db
0 4 * * 0 cd /usr/share/puppet-dashboard/; /usr/bin/rake RAILS_ENV=production db:raw:optimize
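To narrow down which job lines up with the 1 am pages, the crontab output above can be filtered by hour. This is only a rough sketch: the helper name jobs_in_hour is made up for illustration, it understands only a literal hour or "*" in the hour field (no ranges, lists, or steps), and it assumes the cron host runs on Pacific time, which the -0700 offset in the file header suggests but does not prove.

```python
CRONTAB = """\
# Puppet Name: homeclean
MAILTO=infra-notices@mozilla.com
0 3 * * * /usr/local/bin/homeclean.sh > /dev/null
# Puppet Name: prune-reports
0 1 * * * cd /usr/share/puppet-dashboard/; /usr/bin/rake RAILS_ENV=production reports:prune upto=4 unit=day
# Puppet Name: optimize-db
0 4 * * 0 cd /usr/share/puppet-dashboard/; /usr/bin/rake RAILS_ENV=production db:raw:optimize
"""


def jobs_in_hour(crontab_text, hour):
    """Return (puppet_name, schedule, command) for jobs that start in `hour`."""
    jobs = []
    name = None
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if line.startswith("# Puppet Name:"):
            # Puppet labels each job with a comment on the preceding line.
            name = line.split(":", 1)[1].strip()
            continue
        if not line or line.startswith("#"):
            continue
        if "=" in line.split()[0]:
            # Environment assignment such as MAILTO=..., not a job.
            continue
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        minute, hr, dom, mon, dow, command = fields
        if hr == "*" or hr == str(hour):
            jobs.append((name, " ".join(fields[:5]), command))
        name = None
    return jobs


for job in jobs_in_hour(CRONTAB, 1):
    print(job)
```

With the crontab pasted above, the only job starting in the 1 am hour is prune-reports; optimize-db runs at 4 am on Sundays only.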
Reporter
Comment 2•12 years ago
Is there an ETA on this? I get paged around 4:45 am Eastern every morning, and it's getting a bit tiresome.
Assignee
Comment 3•12 years ago
:sheeri, for the time being I increased the replication lag threshold, so it shouldn't page you at ~4 am EST. This is basically the same change that was made in Bug 760789. I will fix it properly on Monday.
Assignee
Comment 4•12 years ago
Just a general update: as far as paging goes, it's working as intended. There's no rush to fix it properly, but it's on the TODO list.
Reporter
Updated•12 years ago
Summary: downtime intranet2 during the 1 am hour maintenance → Nagios change: downtime intranet2 during the 1 am hour maintenance
Comment 5•12 years ago
Is this regular downtime still required? Since the last update, the puppetdashboard DB servers are no longer under DBA monitoring. I'm not sure whether that is intentional; I'm just adding the observation here.
Reporter
Comment 6•12 years ago
Good point. In fact, intranet doesn't even house puppetdashboard any more. We should probably take the increased replication lag threshold off intranet2.
Reporter
Updated•12 years ago
Summary: Nagios change: downtime intranet2 during the 1 am hour maintenance → Nagios change: take away 1 am downtime on intranet2
Assignee
Comment 7•12 years ago
The actual change that I made was on puppetdashboard2.db.phx1.mozilla.com, which used "mysql-lazy-repl". It seems these hosts were reinstalled some time ago, and the current checks only have "generic" and "hp-servers". As for the purpose of this bug, there is nothing else to do.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INVALID
Updated•10 years ago
Product: mozilla.org → Data & BI Services Team