Slave loan request for dmose

RESOLVED FIXED

Status

RESOLVED FIXED
a year ago
6 months ago

People

(Reporter: dmose, Assigned: dmose)

Details

(Whiteboard: [buildduty][capacity][buildslaves][loaner])

(Assignee)

Description

a year ago
I need a specific sort of VM which seems to be the only one that can reproduce a race condition I've got.  Unfortunately, the list of machine names at https://wiki.mozilla.org/ReleaseEngineering/How_To/Request_a_loaner#What_type_of_machines_do_I_need.3F doesn't appear to match anything shown by treeherder or the logs.

https://treeherder.mozilla.org/#/jobs?repo=pine&selectedJob=114062262 is the job in question, and the machine name appears to be the somewhat uninformative "i-03ebcb2cd4250f242".  

The only reason I need this is because the one-click loaners I get keep dying out from under me because they're spot instances.
These jobs run in Taskcluster. For example, this job ran on machine "i-03ebcb2cd4250f242", which was in fact a gecko-t-linux-medium (you can check the logs to see this in [1]).
For jobs that ran on Taskcluster you don't need a loaner; you can open an interactive shell that looks exactly like the environment where the job ran.
To do this, follow these steps:
- from treeherder, open the link to the task (in this case the task ID is SS5dsw3oShGeTocQIOV_4Q)
- go to Task Actions and select Create Interactive Task
- choose Shell or Display depending on what you want to see or test


The "i-03ebcb2cd4250f242" is in fact the worker ID. To do all these steps you will need to sign in first, and you may need some additional rights; for that, please speak with someone from the Taskcluster team on, IIRC, #taskcluster, or use NI.


[1]https://public-artifacts.taskcluster.net/A4zzHV0oQBmQxq1QVXnOjQ/0/public/logs/live_backing.log
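For anyone trying to map a log link like [1] back to a task, here is a minimal Python sketch. It assumes, based only on the URL above, that the path is laid out as task ID, run number, then artifact name; the helper name `split_artifact_url` is made up for illustration and is not part of any Taskcluster client.

```python
from urllib.parse import urlparse


def split_artifact_url(url):
    """Split an artifact URL into (task_id, run_id, artifact_name).

    Assumes the path layout /<taskId>/<runId>/<artifact name>, as seen
    in the log link quoted in this bug. This is an assumption, not a
    documented contract.
    """
    # Drop the leading slash, then split off the first two path segments.
    parts = urlparse(url).path.strip("/").split("/", 2)
    if len(parts) != 3:
        raise ValueError("unexpected artifact URL layout: %s" % url)
    task_id, run_id, artifact_name = parts
    return task_id, int(run_id), artifact_name


LOG_URL = (
    "https://public-artifacts.taskcluster.net/"
    "A4zzHV0oQBmQxq1QVXnOjQ/0/public/logs/live_backing.log"
)

task_id, run_id, artifact_name = split_artifact_url(LOG_URL)
print(task_id, run_id, artifact_name)
```

The task ID recovered this way is what you would look for in treeherder or the task inspector before using Task Actions.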
Assignee: nobody → dmose
Email sent to dmose for further instructions. 

Loaning machines: 
    - tst-linux64-ec2-dmose

Hi Dan, 

I am going to assign this to you to keep track of the loan. 

When you are finished with the loan forever, please comment stating so here in the bug, and mark the bug as RESOLVED.

By the way, now that this aws instance has been created, starting and stopping it can happen in a flash!
If you are not going to be using this machine for multiple hours, let us know in this bug and we can stop it.

Comment again when you want it started back up.
* For faster turnaround, ping #releng (look for nick with 'buildduty')
:dmose said that this is a taskcluster job, so a releng buildbot loaner is not going to help him investigate his issue. He said that the problem he's having with the taskcluster oneclick loaners is that they're killed out from underneath him because they're spot instances.
(Assignee)

Comment 4

a year ago
I'm working on [one](https://github.com/mozilla/activity-stream/issues/2830) of a small number of bugs blocking us from turning on Activity Stream as soon as possible.

I had three spot instances killed out from under me yesterday, each time destroying the work in progress.  

I don't think I'll be able to debug this race without a real instance, as I can't reproduce it anywhere else. 

Any help would be much appreciated
I've corrected the loaner LDAP information to be the correct username for :dmose. I still don't think this will get him the instance type he needs, but connection to the loaner assigned should work as soon as LDAP updates.
(Assignee)

Comment 6

a year ago
OK, dustin helped me debug, and it turns out that the problem I was having before was not spot instances being killed, but rather a taskcluster regression.  He has re-opened the bug, and I now know how to work around the issue as well.

Since this is a buildbot instance anyway, I won't be needing it.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
Instance terminated and removed from inventory.
Component: Loan Requests → Buildduty
Product: Release Engineering → Infrastructure & Operations