Just got re-imaged. It needs to be set up.
This slave needs the try keys on it and to be put back into the pool.
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #1)
> This slave needs the try keys on it and to be put back into the pool.
Anything else?
(In reply to Ben Hearsum [:bhearsum] from comment #2)
> (In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #1)
> > This slave needs the try keys on it and to be put back into the pool.
> Anything else?
I have verified that these are the only two things needed.
https://wiki.mozilla.org/ReferencePlatforms/Win64#Post-reimaging_steps
The twisted patches problem seems to be fallout affecting recently re-imaged slaves. Bug 770506 is filed to fix this.
Back in the try pool.
107 days, 18:04:26
vnc shows that we had not been able to auto-login.
Looking at these steps:
https://wiki.mozilla.org/ReferencePlatforms/Win64#Fix_auto-login_after_new_password_deployment
I wrote this:
REG ADD "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon" /v AutoAdminLogon /t REG_SZ /d 1
REG ADD "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon" /v DefaultUserName /t REG_SZ /d cltbld
REG ADD "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon" /v DefaultDomainName /t REG_SZ /d ""
REG ADD "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon" /v DefaultPassword /t REG_SZ /d YouWish
REG DELETE "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon" /v AutoLogonCount
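For anyone who wants to script the auto-login fix rather than paste the REG commands by hand, here is a minimal sketch that builds the same command lines in Python. The function name `winlogon_commands` is hypothetical, and the trailing `/f` (which makes `reg` overwrite or delete without prompting) is an addition that was not in the commands above. It only prints the commands; actually running them on the slave is left to the operator.

```python
import subprocess

# Registry key used by the auto-login steps in this bug.
WINLOGON_KEY = r"HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon"


def winlogon_commands(username, password, domain=""):
    """Return the reg.exe invocations for the auto-login fix as argument lists.

    Hypothetical helper: mirrors the REG ADD / REG DELETE lines in this bug,
    plus /f (an assumption, not in the original) to suppress prompts.
    """
    values = [
        ("AutoAdminLogon", "1"),
        ("DefaultUserName", username),
        ("DefaultDomainName", domain),
        ("DefaultPassword", password),
    ]
    cmds = [
        ["REG", "ADD", WINLOGON_KEY, "/v", name, "/t", "REG_SZ", "/d", data, "/f"]
        for name, data in values
    ]
    # AutoLogonCount limits the number of automatic logins, so it is removed.
    cmds.append(["REG", "DELETE", WINLOGON_KEY, "/v", "AutoLogonCount", "/f"])
    return cmds


if __name__ == "__main__":
    # Dry run: print each command as a Windows command line instead of running it.
    for cmd in winlogon_commands("cltbld", "YouWish"):
        print(subprocess.list2cmdline(cmd))
```

Keeping the commands as argument lists avoids quoting mistakes around the space in "Windows NT" if they are later passed to `subprocess.run` on the slave.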
I think that fixed it.
Created attachment 828826
w64-ix-slave41-screencap.png
markco: this machine was from day2 batch1 of our win64-rev2 migration. Any idea what could cause the machine to boot into this mode? Is it something we can/should disable?
2008 will boot into that after a crash or power failure. It may be worthwhile to run hardware diagnostics on this machine. We can't disable this, but it should only boot into it after a severe event.
Back in production.
Booting issues. See dep bug.
Back in production.
Disabling Slave: b-2008-ix-0020 by...
Updating slave b-2008-ix-0020 in slavealloc...Success
Gracefully shutting down slave...Success
b-2008-ix-0020 - was successfully disabled via slaveapi
Reason for disabling: bug 1026870
I can't reach this host after post-image:
jlund@Hastings163:~BUC/tools > ping b-2008-ix-0020
PING b-2008-ix-0020.wintry.releng.scl3.mozilla.com (10.26.44.24): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
^C
--- b-2008-ix-0020.wintry.releng.scl3.mozilla.com ping statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss
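If this kind of reachability check gets repeated across a batch of re-imaged slaves, the ping summary line can be parsed rather than eyeballed. This is a rough sketch only: `parse_ping_summary` and `is_reachable` are hypothetical names, and the regex assumes the summary format shown in the ping output above (with "packets" before "received" treated as optional, since some ping builds omit it).

```python
import re

# Matches e.g. "3 packets transmitted, 0 packets received, 100.0% packet loss".
SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received, ([\d.]+)% packet loss"
)


def parse_ping_summary(output):
    """Extract sent/received counts and loss percentage from ping output.

    Returns None when no summary line is found.
    """
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    return {
        "sent": int(match.group(1)),
        "received": int(match.group(2)),
        "loss_pct": float(match.group(3)),
    }


def is_reachable(output):
    """Treat the host as reachable if at least one reply came back."""
    summary = parse_ping_summary(output)
    return summary is not None and summary["received"] > 0
```

Fed the statistics line from the output above, `is_reachable` returns `False`, which matches what was seen by hand.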
Q - any idea what would be happening here after we re-imaged this?
It was up when tested. I will jump on the IPMI interface and see what is up. Also, we are now in a state where we can drop the try and build keys automatically. Where can I get and confirm those?
Looks like this machine hung after a reboot command was issued remotely. A reset of the box brought it right back. I rebooted it a few times to test, and I am running a disk check to make sure everything is good.
Machine looks good, and buildbot starts (and closes with disabled) after several different reboots. The machine needs keys and a re-enable.
I think the new GPO that automatically adds keys did not hit this slave. I could not reach the production hosts described here:
https://wiki.mozilla.org/ReleaseEngineering/How_To/Adjust_SSH_keys_on_a_slave#Production
So I tried putting keys on it so I could enable it, but I seem to be having issues reaching other slaves from it.
cltbld@B-2008-IX-0020 ~
$ rm -rf .ssh
cltbld@B-2008-IX-0020 ~
$ sftp b-2008-ix-0083.build.mozilla.org:.ssh/* .ssh/
Connecting to b-2008-ix-0083.build.mozilla.org...
ssh: connect to host b-2008-ix-0083.build.mozilla.org port 22: Bad file number
Connection closed
I am going to leave this open until it is back in prod.
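Before retrying the sftp copy, it may help to confirm whether the SSH port on the other slave is reachable at all from this host. A minimal sketch of such a check follows. `port_open` is a hypothetical helper, and it only tests that a TCP connection to port 22 can be opened; it says nothing about whether keys or logins work.

```python
import socket


def port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    Covers refused connections, timeouts, and DNS failures, all of which
    surface as OSError subclasses.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hostname taken from the sftp attempt in this bug.
    host = "b-2008-ix-0083.build.mozilla.org"
    print(host, "port 22 open:", port_open(host))
```

A `False` here would point at networking or firewalling between the slaves rather than at the keys themselves.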
Enabled and rebooted to production.
Running the wrong version of MSVC 2013. Disabled for reimaging. https://treeherder.mozilla.org/logviewer.html#?job_id=4111237&repo=try
GPO fixes in bug 1026870 are meant to fix this. Re-enabling these machines so they pick up the fix.
Apparently it didn't get the memo about fixing itself, since it's still busted in the same way. Re-disabled.
Re-imaged and re-enabled.
loaned to markco in bug 1175982
Re-imaged and returned to production (try).
deallocated from bug 1198317