Closed Bug 1043530 Opened 10 years ago Closed 10 years ago

Loan an ami-6a395a5a instance to Aaron Klotz

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
Windows Server 2008
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bugzilla, Assigned: bugzilla)

Go figure, right after I say that it's okay to close bug 974634, it turns out I need a loaner Amazon instance again so that I can run win32-build tests on Windows Server 2012 64-bit.
Assignee: nobody → jlund
I was not able to get to this in my queue today. I will set this up tomorrow. Thank you for your patience.
I am guessing based off past work on this: https://bugzilla.mozilla.org/show_bug.cgi?id=974634#c2 we actually want an AMI of ami-c6c8a6f6?
Yeah I basically need the same thing as in that bug.
I have been fighting this for quite a while. I ended up hacking on some old scripts we had for this AMI, which I believe were only used on our previous EC2 instance-creation host.

jhopkins is no longer with us, and this previous loaner setup is not heavily documented. Nonetheless, I think I have an instance running to the requested spec. However, it seems I do not have the keys to it and cannot ssh in.

I am polling some people from releng/relops to try to figure out what is going on. Apologies for the delay.

(aws_manager)[buildduty@aws-manager1.srv.releng.scl3.mozilla.com scripts]$ python aws_manage_instances.py status tst-win64-ec2-aklotz
2014-07-25 20:17:14,074 - INFO - Found tst-win64-ec2-aklotz (i-3879b537)...
   Name: tst-win64-ec2-aklotz
     ID: i-3879b537
     IP: 10.132.58.95
Enabled: True
  State: running
   Tags: Name -> tst-win64-ec2-aklotz, created -> 2014-07-26 03:00:02 PST, moz-type -> tst-win64, FQDN -> tst-win64-ec2-aklotz.test.releng.usw2.mozilla.com, moz-state -> ready, moz-bug -> 1043530, moz-loaned-to -> aklotz@mozilla.com
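For anyone scripting against this status output, the Tags line can be split into key/value pairs. This is only a minimal sketch: `parse_tags` is a hypothetical helper, not part of cloud-tools, and it assumes the `key -> value, key -> value` layout shown above with no commas inside values.

```python
def parse_tags(tags_field):
    """Split a 'key -> value, key -> value' string into a dict.

    Assumes values never contain ', ' (true for the output shown above).
    """
    tags = {}
    for pair in tags_field.split(", "):
        key, sep, value = pair.partition(" -> ")
        if sep:  # skip fragments that are not 'key -> value'
            tags[key.strip()] = value.strip()
    return tags
```

With the line above as input, `parse_tags(...)["moz-loaned-to"]` would give the loanee, which is handy when auditing who an instance is loaned to.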
OK, it looks like the cloud-tools repo has hints about using the deploypass for access to the machine. I used that with the Administrator account and was able to get in. Progress...

Now in terms of cleaning and prepping for loaner, usually for win machines we file a relops bug: https://wiki.mozilla.org/ReleaseEngineering/How_To/Loan_a_Slave#t-xp32-ix.2C_t-w732-ix.2C_t-w864-ix.2C_and_w64-ix-slave

but as this is an ec2 instance and is set up abnormally, I'm a bit lost as to what to do.

arr: do you have any thoughts on the above? At the very least, I would like to do whatever we do here: https://wiki.mozilla.org/ReleaseEngineering/How_To/Loan_a_Slave#t-xp32-ix.2C_t-w732-ix.2C_t-w864-ix.2C_w64-ix-slave
Flags: needinfo?(arich)
and after all that, I was just going through cleaning old lost aws instances and it turns out your previous one was never properly destroyed.

previous instance fqdn that is now running: rbt-w64-ec2-aklotz.test.releng.usw2.mozilla.com

I resurrected it and it should be running as before. I am going to create a tracking bug for this instance so when you are done with it here, it can be found and terminated properly.

Once you confirm you can log into your old instance, I'll terminate the one I created for you in this bug and give up on that effort.

I am leaving the needinfo open as I would like to document the process of creating new windows ec2 instances so this will be faster in the future.
Aaron, when time permits, can you confirm that rbt-w64-ec2-aklotz.test.releng.usw2.mozilla.com works for you as before? I'll then assign this bug to you for tracking. Thanks :)
Flags: needinfo?(aklotz)
Since it doesn't involve hardware, relops hasn't ever been involved with the loan process for AWS nodes.  I'd say you want to wipe the credentials, just like you would do on hardware, then just delete the node when you're done.
Flags: needinfo?(arich)
ssh: rbt-w64-ec2-aklotz.test.releng.usw2.mozilla.com: no address associated with name
Flags: needinfo?(aklotz) → needinfo?(jlund)
oh fun. I am going to dump debugging steps here so they are documented...

you can see below that querying aws I get an fqdn of: rbt-w64-ec2-aklotz.test.releng.usw2.mozilla.com and ip 10.132.59.91

(aws_manager)[buildduty@aws-manager1.srv.releng.scl3.mozilla.com scripts]$ python aws_manage_instances.py status rbt-w64-ec2-aklotz
2014-07-29 12:29:20,664 - INFO - Found rbt-w64-ec2-aklotz (i-88da3d80)...
   Name: rbt-w64-ec2-aklotz
     ID: i-88da3d80
     IP: 10.132.59.91
Enabled: True
  State: running
   Tags: Name -> rbt-w64-ec2-aklotz, created -> 2014-03-17 17:47:27 PST, moz-type -> tst-win64, FQDN -> rbt-w64-ec2-aklotz.test.releng.usw2.mozilla.com, moz-state -> ready, moz-loaned-to -> tabraldes
========================================================================

but if I look at inventory I get: fqdn rbt-w64-ec2-aklotz.dev.releng.usw2.mozilla.com and ip: 10.132.58.208


if I ssh with the inventory fqdn or ip, it resolves to a linux spot instance:

jlund@Hastings163:~BUC/cedar
> ssh cltbld@rbt-w64-ec2-aklotz.dev.releng.usw2.mozilla.com
The authenticity of host 'rbt-w64-ec2-aklotz.dev.releng.usw2.mozilla.com (10.132.58.208)' can't be established.
[cltbld@tst-linux32-spot-581.test.releng.usw2.mozilla.com ~]$

if I ssh with aws given fqdn:
ssh: rbt-w64-ec2-aklotz.test.releng.usw2.mozilla.com: no address associated with name

if I ssh with aws given ip:
ssh Administrator@10.132.59.91
# success!! I can get into the correct instance: rbt-w64-ec2-aklotz


So it looks like this instance was recreated. The FQDN changed in the recreation process, and the inventory records and AWS tags were never properly updated.
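The comparison done by hand above (what AWS reports versus what DNS actually returns) can be sketched as a small check. This is a rough sketch, not part of the releng tooling: `check_fqdn` and its `resolve` parameter are hypothetical names, and the expected IP is assumed to be whatever `aws_manage_instances.py status` printed.

```python
import socket


def check_fqdn(fqdn, expected_ip, resolve=socket.gethostbyname):
    """Return a short verdict on whether fqdn resolves to expected_ip.

    `resolve` is injectable so the check can be exercised without real DNS.
    """
    try:
        actual_ip = resolve(fqdn)
    except OSError:
        # socket.gaierror is an OSError; this is the
        # "no address associated with name" case.
        return "unresolved"
    if actual_ip == expected_ip:
        return "ok"
    return "mismatch (%s)" % actual_ip
```

Run against the values above with 10.132.59.91 as the expected IP, the .test name would be expected to come back as unresolved and the .dev name as a mismatch pointing at 10.132.58.208, which matches what the manual ssh attempts showed.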

Aaron, while I look into fixing up DNS, could you ssh directly with the ip of the host for now: 10.132.59.91
Flags: needinfo?(jlund)
Blocks: 974634
I'm seeing a "bad file number" error when trying to ssh to that ip. Any ideas?
Flags: needinfo?(jlund)
hmm, weird, I am not sure why. That sounds like a blocked port number or else you're getting blocked while trying to use the ip. Regardless, I deleted old inventory records and re-created new ones so DNS should correctly resolve against:

ip: 10.132.59.91
fqdn: rbt-w64-ec2-aklotz.dev.releng.usw2.mozilla.com

So now you can use the fqdn and it will point to the right machine.
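To separate a DNS problem from a blocked port (the guess above for the "bad file number" error), a plain TCP connect to the ssh port is a quick test. This is a minimal sketch: `port_open` is a hypothetical helper, and port 22 is assumed to be where sshd listens on the loaner.

```python
import socket


def port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable names.
        return False
```

If `port_open("10.132.59.91")` is False from the loanee's machine but True from a releng host, that would point at VPN or firewall access rather than the instance itself.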

I then also confirmed that user:aklotz@mozilla.com and ip:10.132.59.91 are on the VPN loaner group list.

aklotz, you should be able to ssh in via: Administrator@rbt-w64-ec2-aklotz.dev.releng.usw2.mozilla.com and the password sent via email (the one from jhopkins).

If you can still not get access to this host, please ping the person currently on buildduty (pmoore) here or on irc in #releng. At which point, I propose we continue efforts with the new loaner slave I created in comment 5.
Flags: needinfo?(jlund)
> aklotz, you should be able to ssh in via:
> Administrator@rbt-w64-ec2-aklotz.dev.releng.usw2.mozilla.com and the
> password sent via email (the one from jhopkins).
> 
> If you can still not get access to this host, please ping the person
> currently on buildduty (pmoore) here or on irc in #releng. At which point, I
> propose we continue efforts with the new loaner slave I created in comment 5.

aklotz, gentle ping. any success with this?
Flags: needinfo?(aklotz)
Yes, I'm in. Thanks!
Flags: needinfo?(aklotz)
great! I am going to assign this bug to you.

When you are finished with the loan forever, please comment stating so and mark this bug as resolved.
Assignee: jlund → aklotz
Status: NEW → ASSIGNED
FTR - the instance that did not end up being used for this loan, tst-win64-ec2-aklotz.test.releng.usw2.mozilla.com (comment 4), was terminated and its inventory records were destroyed.
Hi Aaron,

Do you still need this loaner?

Thanks,
Pete
Flags: needinfo?(aklotz)
From what I was told yesterday, I do not believe that I need this loaner anymore.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(aklotz)
Resolution: --- → FIXED
Component: Loan Requests → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard