buildbot-master73.srv.releng.usw2.mozilla.com is DOWN

RESOLVED WORKSFORME

Status

Release Engineering
Buildduty
P2
normal
RESOLVED WORKSFORME
3 years ago
3 years ago

People

(Reporter: Callek, Assigned: coop)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

3 years ago
Just now, got this in #buildduty:

[18:58:50]	nagios-releng	Tue 15:58:53 PDT [4202] buildbot-master73.srv.releng.usw2.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
[19:04:09]	nagios-releng	Tue 16:04:12 PDT [4204] buildbot-master73.srv.releng.usw2.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%

No other usw2 errors.

Per the AWS console, the host is running:
  10.132.49.181  /// 54.186.55.202

centos-6-x86_64-server-2013-06-24-21-57 (ami-516cfc61)

No scheduled events.
Launch time: March 6, 2014 5:14:18 AM UTC-5 (5172 hours)

System reachability check failed at October 7, 2014 7:06:00 PM UTC-4 (1 minutes ago)

I did a direct "reboot" via the AWS console, and I'm hoping this comes back fine.
(Reporter)

Comment 1

3 years ago
The reboot didn't clear the nagios alert, so I did "stop" --> wait a minute after AWS says it's done --> "start".
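
For reference, the same stop --> wait --> start sequence could be scripted instead of clicked through in the console. This is only a rough sketch, not what was done here: it assumes a boto3-style EC2 client is passed in, and the helper name `stop_then_start` is made up for illustration. The `stop_instances`, `start_instances` and `get_waiter` calls are standard boto3 EC2 client methods.

```python
def stop_then_start(ec2, instance_id):
    """Stop an instance, wait until it is stopped, then start it again.

    `ec2` is assumed to be a boto3 EC2 client (or anything with the same
    methods). A full stop/start, unlike a reboot, is what the AWS mail
    quoted in comment 2 says moves the instance to new hardware.
    Note: anything not stored on an EBS volume is lost on stop.
    """
    ids = [instance_id]

    # Ask EC2 to stop the instance and block until it reports "stopped".
    ec2.stop_instances(InstanceIds=ids)
    ec2.get_waiter("instance_stopped").wait(InstanceIds=ids)

    # Start it again and block until it reports "running".
    ec2.start_instances(InstanceIds=ids)
    ec2.get_waiter("instance_running").wait(InstanceIds=ids)
```

Usage would look roughly like `stop_then_start(boto3.client("ec2", region_name="us-west-2"), "i-xxxxxxxx")`, where the instance ID is a placeholder. The master should still be disabled and gracefully shut down first.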
(Reporter)

Comment 2

3 years ago
And right after I did that, I got an e-mail, which corresponds to this instance:

Dear Amazon EC2 Customer,

One or more of your Amazon EC2 instances in the us-west-2 region is scheduled for retirement. The following instance(s) will be shut down after 12:00 AM UTC on 2014-10-22.

  i-6e3f5a67

We recommend that you launch a replacement for each retiring instance and begin migrating to it. You can do this by stopping and re-starting your instance, or by terminating it and launching a new one in its place.

You can see more information on the instances scheduled for retirement in the AWS Management Console at https://console.aws.amazon.com/ec2/home?region=us-west-2#s=Events

For more information about scheduled retirement events, please see Monitoring Scheduled Events in the EC2 user guide: http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/monitoring-instances-status-check_sched.html

If your instance's root device is an EBS volume, the instance will be stopped after the retirement date, and you can start it again at any time. You can prevent retirement for this instance by issuing a stop and start from the AWS Management Console. Doing so will migrate your instance to new hardware and help reduce unforeseen downtime. For more information about how to stop and start your instance please see Stopping and Starting Instances in the EC2 user guide: http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/starting-stopping-instances.html

In case of difficulties stopping your instance, please see the Instance FAQ: http://aws.amazon.com/instance-help/#ebs-stuck-stopping

If your instance's root device is an instance store, it will be terminated after the retirement date. We recommend that you launch a replacement instance from your most recent AMI and migrate all necessary data to the replacement instance before this time.

Warning: Any data not stored in an EBS volume will be lost when your instance is stopped or terminated.

If you have any questions or concerns, you can contact the AWS Support Team on the community forums and via AWS Premium Support at: http://aws.amazon.com/support

Sincerely,
Amazon Web Services

This message was produced and distributed by Amazon Web Services LLC, 410 Terry Avenue North, Seattle, Washington 98109-5210

Reference: cd757166-bdc9-4682-b2f9-8065feeb7b8e
(Assignee)

Comment 3

3 years ago
(In reply to Justin Wood (:Callek) from comment #2)
> One or more of your Amazon EC2 instances in the us-west-2 region is
> scheduled for retirement. The following instance(s) will be shut down after
> 12:00 AM UTC on 2014-10-22.
> 
>   i-6e3f5a67

That's the instance ID for the one you just created. It looks like we were just unlucky and ended up on hardware that's going away.

I'll try to recreate the master tomorrow.
Assignee: nobody → coop
Status: NEW → ASSIGNED
Priority: -- → P2
Don't need to recreate - just disable in slavealloc + graceful shutdown, then shut down and restart.
(Assignee)

Comment 5

3 years ago
This planned event disappeared from the AWS console, so I'm marking this as WORKSFORME.
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → WORKSFORME
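
As a side note, whether an instance still has a scheduled event (such as the retirement above) can also be checked without the console. A minimal sketch, again assuming a boto3-style EC2 client; the helper name `scheduled_events` is hypothetical, while `describe_instance_status` and the `InstanceStatuses` / `Events` response keys are part of the boto3 EC2 client API.

```python
def scheduled_events(ec2, instance_id):
    """Return (code, description, not_before) for each scheduled event.

    `ec2` is assumed to be a boto3 EC2 client. An empty list means
    EC2 reports no scheduled events for the instance.
    """
    resp = ec2.describe_instance_status(
        InstanceIds=[instance_id],
        IncludeAllInstances=True,  # also report instances that are not running
    )
    events = []
    for status in resp.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            events.append(
                (event.get("Code"), event.get("Description"), event.get("NotBefore"))
            )
    return events
```

An empty result for the instance would match what the console showed when this was resolved.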