New WinXP slaves frequently turning purple after "Logon failure: unknown user name or bad password."

RESOLVED DUPLICATE of bug 788382

Status

Infrastructure & Operations
Buildduty
6 years ago
16 days ago

People

(Reporter: philor, Assigned: armenzg)

Description

6 years ago
Last time we saw this, the cause was a failure to get OPSI going (https://wiki.mozilla.org/ReleaseEngineering/OPSI#Check_the_hostkey), but the cause this time is so far unclear.

https://tbpl.mozilla.org/php/getParsedLog.php?id=14819704&tree=Firefox
talos-r3-xp-082
(In reply to Phil Ringnalda (:philor) from comment #0)
> Last time we saw this, the cause was a failure to get OPSI going
> (https://wiki.mozilla.org/ReleaseEngineering/OPSI#Check_the_hostkey), but
> the cause this time is so far unclear.
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=14819704&tree=Firefox
> talos-r3-xp-082

Not related to OPSI.
It seems that this issue has been seen before in bug 713326 and bug 718534.

I think it is something in the test harness:
http://social.msdn.microsoft.com/Forums/en/sqlreportingservices/thread/5a45b73e-b116-4fe3-a943-836b74faca27

More info once I get into the office.

45571 INFO SimpleTest FINISHED
INFO | automation.py | Application ran for: 0:08:42.390000
INFO | automation.py | Reading PID log: c:\docume~1\cltbld\locals~1\temp\tmp6zdmp5pidlog
==> process 3484 launched child process 3932
==> process 3484 launched child process 676
==> process 3484 launched child process 3272
==> process 3484 launched child process 1728
==> process 3484 launched child process 392
==> process 3484 launched child process 3296
==> process 3484 launched child process 3440
==> process 3484 launched child process 916
INFO | automation.py | Checking for orphan process with PID: 3932
INFO | automation.py | Checking for orphan process with PID: 676
INFO | automation.py | Checking for orphan process with PID: 3272
INFO | automation.py | Checking for orphan process with PID: 1728
INFO | automation.py | Checking for orphan process with PID: 392
INFO | automation.py | Checking for orphan process with PID: 3296
INFO | automation.py | Checking for orphan process with PID: 3440
INFO | automation.py | Checking for orphan process with PID: 916
ERROR: Logon failure: unknown user name or bad password.

ERROR: Logon failure: unknown user name or bad password.

ERROR: Logon failure: unknown user name or bad password.

WARNING | automationutils.processLeakLog() | refcount logging is off, so leaks can't be detected!

INFO | runtests.py | Running tests: end.

command timed out: 1200 seconds without output, attempting to kill
SIGKILL failed to kill process
using fake rc=-1
program finished with exit code -1

remoteFailed: [Failure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: exceptions.RuntimeError: SIGKILL failed to kill process
]
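A minimal sketch for spotting this failure signature in other logs, assuming the log text is available as a plain string. The function name and the shape of the returned dictionary are hypothetical and not part of any existing buildduty tool; the matched strings are copied from the excerpt above. It only counts occurrences and does not establish which step emits the logon error.

```python
import re

# Signatures copied from the log excerpt; everything else here is illustrative.
LOGON_FAILURE = "ERROR: Logon failure: unknown user name or bad password."
KILL_FAILURE = "SIGKILL failed to kill process"
ORPHAN_CHECK = re.compile(r"Checking for orphan process with PID: (\d+)")


def summarize_log(text):
    """Count the failure signatures and collect the PIDs checked as orphans."""
    lines = [line.strip() for line in text.splitlines()]
    orphan_pids = []
    for line in lines:
        match = ORPHAN_CHECK.search(line)
        if match:
            orphan_pids.append(int(match.group(1)))
    return {
        # Exact-line matches of the logon error
        "logon_failures": sum(1 for line in lines if line == LOGON_FAILURE),
        # True if the kill failure appears anywhere (it also shows up in the traceback)
        "kill_failed": any(KILL_FAILURE in line for line in lines),
        "orphan_pids": orphan_pids,
    }
```

A log where the number of logon failures is non-zero and `kill_failed` is true would match the pattern reported in this bug.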
For now we will disable those slaves to avoid confusion.

To resolve this bug I would do the following:
* take one of the offending slaves and try to run mochitest-other
* if that does not help, start asking beyond the #ateam channel to find out whether someone else would know

This might be a test doing something funky, an OPSI package that did not get installed (since all of these slaves are new), or a Windows setting that got missed.
FTR the slaves were re-imaged in bug 786036.
Assignee: nobody → armenzg
Depends on: 786036
Blocks: 789580
This is most likely the same root issue as in bug 788382.
The slaves were not set up using the talos-r3-xp-ref template; they were created from a blank slate, with packages marked manually.
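One way to check for this kind of drift is to compare the package list of a hand-built slave against the list from the reference template. The sketch below assumes both lists can be obtained as plain collections of package names; how to export them from OPSI is not covered here, and the function name and the package names in the usage example are hypothetical.

```python
def package_drift(reference, candidate):
    """Return packages missing from, and extra on, a candidate slave."""
    ref, cand = set(reference), set(candidate)
    return {
        "missing": sorted(ref - cand),  # in the template, absent on the slave
        "extra": sorted(cand - ref),    # on the slave, not in the template
    }


if __name__ == "__main__":
    # Hypothetical package names, for illustration only.
    template = ["pkg-a", "pkg-b", "pkg-c"]
    slave = ["pkg-a", "pkg-c", "pkg-d"]
    print(package_drift(template, slave))
```

Any non-empty "missing" list would point at packages that were skipped when marking them by hand.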
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 788382
Product: mozilla.org → Release Engineering
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations