Closed Bug 1132939 Opened 9 years ago Closed 9 years ago

buildbot-master67's time gets reset when it reboots

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: jlund)

Details

Feb  9 13:29:10 buildbot-master67.bb.releng.use1.mozilla.com ntpd[12710]: 0.0.0.0 c61c 0c clock_step -17999.988448 s
Feb 12 09:50:26 buildbot-master67.bb.releng.use1.mozilla.com ntpdate[1102]: step time server 10.26.75.40 offset -28799.497552 sec

That's not so good!  The clock was stepped back by about 5 hours on Feb 9 and by about 8 hours on Feb 12.  I suspect that this is because the underlying host hardware's timezone is misconfigured (an AWS bug).  Should we (a) report to AWS, or (b) terminate and re-create this master in hopes of finding a better instance?
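For reference, a minimal sketch that pulls the step size out of the two log lines above and compares it to the nearest whole hour. The regex and the helper names (`step_seconds`, `nearest_whole_hours`) are my own assumptions for illustration, not part of any releng tooling, and the pattern is only known to match these two message formats.

```python
import re

# The two log lines quoted in this bug.
LOG_LINES = [
    "Feb  9 13:29:10 buildbot-master67.bb.releng.use1.mozilla.com "
    "ntpd[12710]: 0.0.0.0 c61c 0c clock_step -17999.988448 s",
    "Feb 12 09:50:26 buildbot-master67.bb.releng.use1.mozilla.com "
    "ntpdate[1102]: step time server 10.26.75.40 offset -28799.497552 sec",
]

# Matches the step reported by ntpd ("clock_step N s") or ntpdate ("offset N sec").
STEP_RE = re.compile(r"(?:clock_step|offset)\s+(-?\d+(?:\.\d+)?)\s+s(?:ec)?\b")


def step_seconds(line):
    """Return the clock step in seconds, or None if the line has no step."""
    match = STEP_RE.search(line)
    return float(match.group(1)) if match else None


def nearest_whole_hours(seconds):
    """Return (nearest whole number of hours, leftover seconds)."""
    hours = round(seconds / 3600)
    return hours, seconds - hours * 3600


for line in LOG_LINES:
    secs = step_seconds(line)
    hours, residual = nearest_whole_hours(secs)
    print(f"{secs:.3f} s is {hours:+d} h with {residual:+.3f} s left over")
```

Both steps land within a second of a whole number of hours (5 and 8). That is the pattern a timezone-style misconfiguration would produce, though it does not by itself prove the cause.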
Assignee: relops → dustin
Flags: needinfo?(rail)
We had something similar in bug 962099. I'm not sure how common this is in AWS, though. :/
Flags: needinfo?(rail)
OK, let's terminate and re-create.
Flags: cab-review?
I can't find where I said it before, but I had the same issue with my new BMs (120..123): ntpdate stepped the clock by many hours, just as in this line:

Feb 12 09:50:26 buildbot-master67.bb.releng.use1.mozilla.com ntpdate[1102]: step time server 10.26.75.40 offset -28799.497552 sec

In those cases it always seemed to happen after buildbot was up and running.
I won't be available during the TCW, so someone else will need to take care of this.
Assignee: dustin → relops
Assignee: relops → jlund
/me takes

steps:

1) disable in slavealloc
2) python buildfarm/maintenance/manage_masters.py -f buildfarm/maintenance/production-masters.json -H bm67-tests1-linux64 graceful_stop
3) terminate in console
4) aws_create_instance -c configs/buildbot-master -r us-east-1 -s aws-releng -k /builds/aws_manager/secrets/aws-secrets.json --ssh-key ~/.ssh/aws-ssh-key -i ./instance_data/us-east-1.instance_data_master.json buildbot-master67
5) python buildfarm/maintenance/manage_masters.py -f buildfarm/maintenance/production-masters.json -H bm67-tests1-linux64 start
6) enable in slavealloc
7) mark as done and celebrate Saturday
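The steps above can be sketched as a dry-run plan, which may be handy for review before the TCW. This only prints the commands in order and runs nothing; the command strings are copied from the steps, while the `plan` helper and the constant names are hypothetical. The slavealloc and console steps stay manual.

```python
MASTER = "bm67-tests1-linux64"

# Common prefix of the manage_masters.py invocations from steps 2 and 5.
MANAGE = [
    "python", "buildfarm/maintenance/manage_masters.py",
    "-f", "buildfarm/maintenance/production-masters.json",
    "-H", MASTER,
]

# The aws_create_instance invocation from step 4, verbatim.
CREATE = [
    "aws_create_instance", "-c", "configs/buildbot-master",
    "-r", "us-east-1", "-s", "aws-releng",
    "-k", "/builds/aws_manager/secrets/aws-secrets.json",
    "--ssh-key", "~/.ssh/aws-ssh-key",
    "-i", "./instance_data/us-east-1.instance_data_master.json",
    "buildbot-master67",
]


def plan():
    """Return ordered (description, command) pairs; None marks a manual step."""
    return [
        ("disable in slavealloc", None),
        ("graceful stop", MANAGE + ["graceful_stop"]),
        ("terminate in console", None),
        ("re-create instance", CREATE),
        ("start master", MANAGE + ["start"]),
        ("enable in slavealloc", None),
    ]


for number, (description, command) in enumerate(plan(), 1):
    # A plain join is enough here because no argument contains whitespace.
    shown = " ".join(command) if command else "(manual)"
    print(f"{number}) {description}: {shown}")
```

Keeping the plan as data rather than a shell script makes it easy to diff against the steps in this comment before anything is actually run.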
Flags: cab-review? → cab-review+
The master has been disabled, stopped, and terminated, and is currently being re-created. Puppetizing should complete soon.
The master has come back up. I noticed that it had been reset to an m3.medium, so I stopped it, bumped it back up to m3.large, and ensured it had swap.

ni: myself to come back to this master and check that its last few jobs look good
Flags: needinfo?(jlund)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
looks healthy
Flags: needinfo?(jlund)
Change Request: --- → approved
Flags: cab-review+