Closed
Bug 1200180
Opened 10 years ago
Closed 9 years ago
Post reimage processes failing to produce functional windows test hosts
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: vciobancai, Assigned: q)
References
Details
(Whiteboard: [windows])
Attachments
(1 file)
24.52 KB,
image/png
|
Details |
No description provided.
Reporter | ||
Comment 1•10 years ago
|
||
Started a reimage process for two slaves (t-xp32-ix-030 and t-w864-ix-020), but during the reimage I received the following error in MDT: "An error has occurred in the script on this page; ERROR: Name redefined"
Updated•10 years ago
|
Assignee: relops → q
It looks like troubleshooting XP installs this weekend also broke the w7/8 images. Please stand by.
t-w864-ix-020 was reinstalled touchless, took a mocha test, and succeeded. A pgo test is running now.
Reporter | ||
Comment 6•9 years ago
|
||
Re-image failed for the following slave t-w732-ix-117
Comment 7•9 years ago
|
||
Because there's no video output on the w7 boxes, we don't know if this is an imaging issue or a problem with the box.
Comment 8•9 years ago
|
||
:vladc: I saw that dcops rebooted t-w732-ix-117 in bug 1200531. Did you try a reimage after that?
Flags: needinfo?(vlad.ciobancai)
Comment 9•9 years ago
|
||
Do we have a list of XP machines that need reimaging? Q has been working on a fix, and it would be good to have a number of machines to test it on.
Flags: needinfo?(kmoir)
Flags: needinfo?(alin.selagea)
Reporter | ||
Comment 11•9 years ago
|
||
Below are the xp slaves that need to be re-imaged:
- t-xp32-ix-030 (bug 1198420)
- t-xp32-ix-032 (bug 1201396)
- t-xp32-ix-004 (needs a re-image; according to bug 880784, it has some problems with jobs)
- t-xp32-ix-033 (needs a re-image, according to bug 959635)
Reporter | ||
Updated•9 years ago
|
Flags: needinfo?(vlad.ciobancai)
Reporter | ||
Comment 12•9 years ago
|
||
(In reply to Amy Rich [:arr] [:arich] from comment #8)
> :vladc: I saw that dcops rebooted t-w732-ix-117 in bug 1200531. Did you try
> a reimage after that?
Amy, I re-installed the t-w732-ix-117 slave and the re-image worked without any problems. The slave has been re-enabled in slavealloc.
Comment 13•9 years ago
|
||
(In reply to Amy Rich [:arr] [:arich] from comment #8)
> :vladc: I saw that dcops rebooted t-w732-ix-117 in bug 1200531. Did you try
> a reimage after that?
Yesterday I also re-imaged t-w864-ix-158 and it worked fine.
Flags: needinfo?(alin.selagea)
Comment 14•9 years ago
|
||
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=test&type=t-w864-ix&name=t-w864-ix-158 - it didn't work fine, it did 1.5 jobs, needed a reboot, and did .5 jobs after that, and is sitting idle. https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=test&type=t-w864-ix&name=t-w864-ix-092 is the best Win8 slave which has been reimaged in the last two weeks, because once it did two jobs in a row instead of 1.5. In general, they do 1.5, sit idle, do 1.5, sit idle, and settle into doing .5 and then idling.
And t-w732-ix-117 has some sort of graphics problem, resolution or the wrong graphics card or whatever it is that causes them to fail every webgl test.
Comment 15•9 years ago
|
||
t-w732-ix-159 also got reimaged this week, also disabled for broken graphics.
Comment 16•9 years ago
|
||
vladc: hm, do you have a list of all windows machines you've reimaged since 2015-08-31? Q: we should correlate those and see if they're all failing or some subset or...
Philor: are you seeing machines with the same failure modes that haven't been reimaged since then?
Flags: needinfo?(vlad.ciobancai)
Flags: needinfo?(q)
Flags: needinfo?(philringnalda)
Comment 17•9 years ago
|
||
No, every Win8 slave which is doing 1.5 jobs and then stopping is one which has been reimaged this week or last week, every Win7 slave with busted graphics is one which has been reimaged this week or last week.
Flags: needinfo?(philringnalda)
Comment 18•9 years ago
|
||
And the reverse: to the best of my knowledge (not everyone always admits in a bug when they reimage something), every Win8 and Win7 slave which has been reimaged this week or last week is broken.
Updated•9 years ago
|
Blocks: t-w864-ix-043
Updated•9 years ago
|
Updated•9 years ago
|
Blocks: t-w732-ix-159, t-w732-ix-117
Updated•9 years ago
|
Summary: reimage failed for t-xp32-ix and t-w864-ix → reimaging failing to produce functional windows test hosts
Assignee | ||
Comment 19•9 years ago
|
||
Let's halt reimages. I will dive into a reimaged machine and start verifying state. We can roll back if we need to and start a new win 10 server.
Assignee | ||
Comment 20•9 years ago
|
||
So it isn't ALL reimages: t-w864-ix-020 was reimaged after the update, as noted in comment 5, and has been running fine with tests and has proper resolution (http://buildbot-master119.bb.releng.scl3.mozilla.com:8201/buildslaves/t-w864-ix-020). Still looking for a common link and state on broken machines.
Flags: needinfo?(q)
Assignee | ||
Comment 21•9 years ago
|
||
t-w864-ix-092 appears to be hardware locked up; it will not respond over OOB KVM even though the screen renders. VNC and ssh are not working. I am cold resetting via ipmi commands now.
Assignee | ||
Comment 22•9 years ago
|
||
It looks like t-w864-ix-158 was enabled during imaging and it rebooted before the NVidia drivers were finished installing. After the next reboot the machine came back and the install repaired itself. That allowed the resolution scripts to run, and it has the correct resolution now. I don't think we have a technological problem with reimaging on windows 8 at this point. I am diving into the windows 7 issues now.
Assignee | ||
Comment 23•9 years ago
|
||
It does appear we can get into a race condition if the NVidia driver install is interrupted. I am adding some pieces to the install bat to delete the scheduled task if the install is good, which should fix the condition.
Summary: reimaging failing to produce functional windows test hosts → Post reimage processes failing to produce functional windows test hosts
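The cleanup step described in comment 23 can be sketched as follows. This is only an illustration in Python, not the actual change to the install bat: the function name and the task name `NvidiaDriverInstall` are hypothetical, and the sketch assumes the retry is driven by a Windows scheduled task that `schtasks /Delete` can remove.

```python
from typing import List, Optional


def cleanup_command(install_ok: bool, task_name: str) -> Optional[List[str]]:
    """Build the command that removes the driver-install scheduled task.

    Returns None when the install has not been verified, so the task
    stays in place and can run again after the next reboot.
    """
    if not install_ok:
        # Install was interrupted or failed: keep the task for a retry.
        return None
    # /TN names the task, /F suppresses the confirmation prompt.
    return ["schtasks", "/Delete", "/TN", task_name, "/F"]


if __name__ == "__main__":
    # Hypothetical task name, used only for illustration.
    print(cleanup_command(True, "NvidiaDriverInstall"))
    print(cleanup_command(False, "NvidiaDriverInstall"))
```

Deleting the task only after a verified install means an interrupted install is retried on the next boot, while a completed one is not re-run.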
Assignee | ||
Comment 24•9 years ago
|
||
Confirmed that win8 hosts are looking better. Windows 7 hosts have been slower to troubleshoot due to their graphics setups, but it looks like t-w732-ix-159 still has a dell 2048wfp monitor plugged into it, causing the resolution to get set to the wrong value. t-w732-ix-117 had the wrong setting for the onboard VGA card. After some research this was my fault since I had reset the settings for testing and I did put them back before vlad reimaged. After resetting the BIOS setting 117 looks good so far.
Assignee | ||
Comment 25•9 years ago
|
||
comment 24 should read "and I did NOT put them back before vlad reimaged."
Comment 26•9 years ago
|
||
We also looked at the win7&8 slaves and found that the following ones have been re-imaged since August 31:
Windows 8:
t-w864-ix-025
t-w864-ix-092
t-w864-ix-158
t-w864-ix-020
t-w864-ix-043
Windows 7:
t-w732-ix-117
t-w732-ix-159
Flags: needinfo?(vlad.ciobancai)
Assignee | ||
Comment 27•9 years ago
|
||
XP installs are back online. I am checking through windows 7 as it also touches the 32 bit install pipeline to make sure no xp changes affected it. However, the test machine t-xp32-ix-033 is taking tests and passing after a touchless install:
http://buildbot-master119.bb.releng.scl3.mozilla.com:8201/buildslaves/t-xp32-ix-033
Comment 28•9 years ago
|
||
Also re-imaged t-xp32-ix-030, t-xp32-ix-032, t-xp32-ix-004 and returned them to the pool. Noticed that the tests are passing on each of them.
Comment 29•9 years ago
|
||
So,
Windows 8:
t-w864-ix-025 - hasn't been reenabled since 2015-09-15
t-w864-ix-092 - enabled, but hasn't taken a job or been rebootable since 2015-09-17
t-w864-ix-158 - redisabled, failing video tests
t-w864-ix-020 - redisabled, failing video tests
t-w864-ix-043 - hasn't been reenabled since 2015-09-10
Windows 7:
t-w732-ix-117 - disabled, failing video tests
t-w732-ix-159 - hasn't been enabled since 2015-09-14
Windows XP:
t-xp32-ix-030 - disabled, unable to do webgl and running at 4-bit color
t-xp32-ix-032 - so far, still enabled
t-xp32-ix-004 - so far, still enabled
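For the correlation requested in comment 16, the list in comment 29 can be tallied per pool. A minimal sketch in Python: the hostnames come from comment 29, but the short status labels are a condensed reading of those entries, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

# Status of each reimaged host, condensed from comment 29.
STATUS: Dict[str, str] = {
    "t-w864-ix-025": "not re-enabled",
    "t-w864-ix-092": "enabled but idle",
    "t-w864-ix-158": "disabled",
    "t-w864-ix-020": "disabled",
    "t-w864-ix-043": "not re-enabled",
    "t-w732-ix-117": "disabled",
    "t-w732-ix-159": "not re-enabled",
    "t-xp32-ix-030": "disabled",
    "t-xp32-ix-032": "enabled",
    "t-xp32-ix-004": "enabled",
}


def pool(host: str) -> str:
    """Strip the trailing host number: 't-w864-ix-025' -> 't-w864-ix'."""
    return host.rsplit("-", 1)[0]


def healthy_by_pool(status: Dict[str, str]) -> Dict[str, Tuple[int, int]]:
    """Map each pool to (hosts still enabled and taking jobs, hosts reimaged)."""
    totals = Counter(pool(h) for h in status)
    healthy = Counter(pool(h) for h, s in status.items() if s == "enabled")
    return {p: (healthy[p], totals[p]) for p in totals}


if __name__ == "__main__":
    for name, (ok, total) in healthy_by_pool(STATUS).items():
        print(f"{name}: {ok} of {total} reimaged hosts still enabled")
```

On this reading, none of the reimaged Windows 7 or Windows 8 hosts was in service at the time of comment 29, and two of the three XP hosts were.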
Comment 30•9 years ago
|
||
bug 1207160 is possibly related and talks about the two w7 machines mentioned in comment 29.
Comment 31•9 years ago
|
||
Can we please attach the dxdiag of all failing machines plus a desktop screenshot?
Could we also check that the info in here [1] is still valid for working machines of each pool?
Could we also attach to that wiki a dxdiag for a working machine of each pool?
[1] https://wiki.mozilla.org/ReleaseEngineering:Buildduty:Slave_Management#Working_graphical_setup
Comment 32•9 years ago
|
||
We discovered that the media feature pack is not being installed. See bug 1209577
Depends on: 1209577
Reporter | ||
Comment 33•9 years ago
|
||
While re-imaging t-w864-ix-025, a warning message appeared in the management console. I attached the warning message.
Updated•9 years ago
|
Whiteboard: [windows]
Comment 34•9 years ago
|
||
The screen capture you show of t-w864-ix-025 was due to bug 1210344.
Comment 35•9 years ago
|
||
Do we have a status on the re-imaging process at the moment?
I've been able to re-image several win7&8 slaves this week using a remote command and did not notice issues (maybe bug 1205227 is the exception, but I don't think that the issue is related to the re-image process in that case).
Thanks.
Flags: needinfo?(q)
Comment 36•9 years ago
|
||
As far as we know, the issues with imaging have been solved.
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(q)
Resolution: --- → FIXED