Closed Bug 1205227 Opened 10 years ago Closed 10 years ago

Please take a look at t-w732-ix-194

Categories

(Infrastructure & Operations :: DCOps, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aselagea, Unassigned)

References

Details

(Whiteboard: imaging issues)

Attachments

(3 files)

For some reason, it times out every job it touches. It has been re-imaged multiple times during the past few months, and the memory and disk diagnostics did not find any issue.
Assignee: relops → server-ops-dcops
Severity: major → normal
Component: RelOps → DCOps
QA Contact: arich
Filed a ticket with IX for a burn-in test. Ticket ID: #SBK-126-27184
colo-trip: --- → scl3
Whiteboard: #SBK-126-27184
I can't find this host in inventory. Did it get renamed to something else?
:vinh if you look at svn revision 108640, new host name:
- 'talos-linux32-ix-022.test.releng.scl3.mozilla.com' => {
+ 't-w732-ix-194.wintest.releng.scl3.mozilla.com' => {
dropped off at IX
Summary: Please take a look at talos-linux32-ix-022 → Please take a look at t-w732-ix-194
Whiteboard: #SBK-126-27184 → #SBK-126-27184 - burn in test at iX
update from IX; "Hello Sal, I wanted to update the ticket to reflect the current status of the node. We placed it into our burn-in for an extended test and everything has tested smoothly. The node is in the final stages of our tests and if all continues to test smoothly, the node will be placed at Will-Call for pick-up at your convenience. An update will be posted to the ticket to confirm such. If you have any further questions or concerns, we are here to help. Thanks"
Picked up host from IX today, it passed the burn-in tests. reimaging.
Whiteboard: #SBK-126-27184 - burn in test at iX → reimaging
Noticed that the monitor is currently connected to the integrated video adapter on motherboard and not the dedicated one (see attachment)
Attached image t-w732-ix-194.PNG
Changed video output to the external adapter and reimaged, as I don't believe the external video drivers were installed properly; still reimaging.
Any known issues with w7 reimaging? It looks like I'm running into the same issue after a reimage. The host gets stuck at the Windows splash screen (it never reaches the login prompt) and hangs there. At that point the host is still pingable and sshable. However, if I reboot the host, it boots to the same Windows splash screen, then the OS crashes and the screen blacks out. After that there is no ping and no ssh, even after subsequent reboots. I saw a message that complained of a bad driver after a reimage. Let me try to get the exact message if possible.
Flags: needinfo?(q)
Flags: needinfo?(arich)
Flags: needinfo?(arich)
If we can get a pic that would be great
Flags: needinfo?(q)
Attached image bad driver
>If we can get a pic that would be great
Attached. Did a reimage; the host rebooted after the installation completed, then failed to boot up completely. Performed a startup repair to get these messages.
A remote command reimage of the machine worked (looking at it now). However, we can't load the NVidia control panel with a monitor plugged into the onboard card, since "there is no display attached to an NVidia gpu". To fool the system we have to have the onboard card set to "disabled" and no monitor plugged in.
Any updates on this? Thanks!
Flags: needinfo?(q)
Flags: needinfo?(q) → needinfo?(vle)
>However, we can't load the NVidia control panel with a monitor plugged into the onboard card, since "there is no display attached to an NVidia gpu". To fool the system we have to have the onboard card set to "disabled" and no monitor plugged in.
This is a win7 host, so the onboard card was disabled prior to the reimage, and no monitor is connected to the onboard card since no video would be redirected there. I went ahead and started another reimage and removed the monitor (the one connected to the external adapter); will check back in ~30 minutes.
Flags: needinfo?(vle)
Same issue. Rebooted and reimaged the host with no external cables attached, and the host is still hanging during the 2nd reboot (after the initial OS install). The host is unresponsive on KVM, so it seems like it's crashing somewhere.
:arr, I'm not able to ssh in to the few hosts I tested from the w7 servers you retasked this morning. Are they imaging fine for you?
mozillas-MacBook-Air-2:~ vle$ ssh !$
ssh t-w732-ix-204.wintest.releng.scl3.mozilla.com
Received disconnect from 10.26.41.254: 2: Handshake failed
Disconnected from 10.26.41.254
mozillas-MacBook-Air-2:~ vle$ ssh t-w732-ix-205.wintest.releng.scl3.mozilla.com
Received disconnect from 10.26.42.19: 2: Handshake failed
Flags: needinfo?(arich)
Attached image 20151027_152006.jpg
New error message when boot-up fails and the system attempts a repair; I don't see the bad driver error this time.
This doesn't look like it ran a new install (which should wipe the disk); it looks like it tried to pick up an incomplete previous install.
van: Some of the reimages worked, some didn't. You can see which by looking at the parent bug.
Flags: needinfo?(arich)
>This doesn't look like it ran a new install (which should wipe the disk) this looks like it tried to pickup an incomplete previous install
I went ahead and PXE-booted, chose the local disk format option, let it do its thing, and it rebooted. Confirmed the OS was no longer present by letting it complete the boot process. Rebooted and reimaged, with the same error/issues.
Whiteboard: reimaging → imaging issues
Let's try replacing the video card?
I've replaced the video card, reimaging in progress.
Host back online
vhua$ ssh t-w732-ix-194.wintest.releng.scl3.mozilla.com
The authenticity of host 't-w732-ix-194.wintest.releng.scl3.mozilla.com (10.26.42.18)' can't be established.
RSA key fingerprint is 3b:f9:39:8d:96:4f:0c:c8:4d:be:df:9c:2c:44:09:95.
Are you sure you want to continue connecting (yes/no)?
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Did that fix the issue?
Looks much better!
I rebooted the host 3 times and each time it booted up to the correct video card and resolution.
back in the slave pool