Closed Bug 1553280 Opened 6 years ago Closed 5 years ago

windows/aarch64 - workers getting stuck deleting user directory

Categories

(Taskcluster :: Workers, defect)

Hardware: ARM64
OS: Windows
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: egao, Unassigned)

Attachments

(1 file)

Attached file generic-worker.log

Originally reported by Atte @ Bitbar in a Slack channel.

Atte has reported that generic-worker on some of the windows10-aarch64 machines hosted at Bitbar has begun reporting failures, which is causing those machines to get stuck.

An excerpt of the log is as follows:

2019/05/17 14:22:40 Running command: 'net' 'user' 'task_1558102959' 'pWd0_8Azk0bByL0EUFJNhmfXr1hUY' '/add' '/expires:never' '/passwordchg:no' '/y'
2019/05/17 14:22:40 Created new OS user!
2019/05/17 14:22:40 Running command: 'wmic' 'useraccount' 'where' 'name='task_1558102959'' 'set' 'passwordexpires=false'
Updating property(s) of '\\YOGA-018\ROOT\CIMV2:Win32_UserAccount.Domain="YOGA-018",Name="task_1558102959"'

Property(s) update successful.

2019/05/17 14:22:40 Running command: 'net' 'localgroup' 'Remote Desktop Users' '/add' 'task_1558102959'
The command completed successfully.

2019/05/17 14:22:40 Making system call LogonUserW with args: [18F0E120 18F5C010 19098040 2 0 18F5C014]
2019/05/17 14:22:40   Result: 1 0 The operation completed successfully.
2019/05/17 14:22:40 Making system call LoadUserProfileW with args: [900 18F7A060]
2019/05/17 14:22:40   Result: 1 0 An attempt was made to reference a token that does not exist.
2019/05/17 14:22:40 Creating folder C:\tasks\task_1558102959\AppData\Roaming
2019/05/17 14:22:40 Making system call SHSetKnownFolderPath with args: [BDC9E0 0 900 1904A060]
2019/05/17 14:22:40   Result: 0 19F860 The operation completed successfully.
2019/05/17 14:22:40 Making system call SHGetKnownFolderPath with args: [BDC9E0 8000 900 18F5C0B4]
2019/05/17 14:22:40   Result: 0 19FCD0 The operation completed successfully.
2019/05/17 14:22:40 Making system call CoTaskMemFree with args: [7E0F4B8]
2019/05/17 14:22:40   Result: 4 0 The operation completed successfully.
2019/05/17 14:22:40 Creating folder C:\tasks\task_1558102959\AppData\Local
2019/05/17 14:22:40 Making system call SHSetKnownFolderPath with args: [BDC9D0 0 900 18E780F0]
2019/05/17 14:22:40   Result: 80070057 0 The operation completed successfully.
2019/05/17 14:22:40 The parameter is incorrect.
2019/05/17 14:22:40 WARNING: Not able to redirect Local App Data folder to C:\tasks\task_1558102959\AppData\Local - IGNORING!
2019/05/17 14:22:40 Making system call UnloadUserProfile with args: [900 840]
2019/05/17 14:22:40   Result: 1 0 An attempt was made to reference a token that does not exist.
2019/05/17 14:22:40 Making system call CloseHandle with args: [900]
2019/05/17 14:22:40   Result: 1 776DCB00 The operation completed successfully.
2019/05/17 14:22:40 Created dir: C:\tasks\task_1558102932\generic-worker
2019/05/17 14:22:40 Making system call GetProfilesDirectoryW with args: [0 18F5C140]
2019/05/17 14:22:40   Result: 0 7FFFFFF7 The data area passed to a system call is too small.
2019/05/17 14:22:40 Making system call GetProfilesDirectoryW with args: [18F0E7A0 18F5C140]
2019/05/17 14:22:40   Result: 1 7FFFFFF7 The operation completed successfully.
2019/05/17 14:22:40 Trying to remove directory 'C:\Users\task_1558101181' via os.RemoveAll(path) call...
2019/05/17 14:22:40 WARNING: could not delete directory 'C:\Users\task_1558101181' with os.RemoveAll(path) method
2019/05/17 14:22:40 remove C:\Users\task_1558101181\AppData\Local\Microsoft\Windows\WinX\Group1: Access is denied.

Full log is attached to this bug.
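
A side note on the 80070057 result logged for the second SHSetKnownFolderPath call in the excerpt: that value is an HRESULT, and splitting it into its standard bit fields shows why the next log line reads "The parameter is incorrect." The sketch below is a generic decoder for illustration only (the hresult type and decodeHRESULT function are hypothetical names, not part of generic-worker).

```go
package main

import "fmt"

// hresult holds the fields packed into a 32-bit Windows HRESULT.
// Hypothetical type, introduced only for this sketch.
type hresult struct {
	Failure  bool   // bit 31: set when the call failed
	Facility uint32 // bits 16-26: subsystem that produced the code
	Code     uint32 // bits 0-15: facility-specific error code
}

// decodeHRESULT splits an HRESULT into severity, facility and code.
func decodeHRESULT(hr uint32) hresult {
	return hresult{
		Failure:  hr&0x80000000 != 0,
		Facility: (hr >> 16) & 0x7FF,
		Code:     hr & 0xFFFF,
	}
}

func main() {
	// Value taken from the SHSetKnownFolderPath line in the log excerpt.
	d := decodeHRESULT(0x80070057)
	fmt.Printf("failure=%v facility=%d code=%d\n", d.Failure, d.Facility, d.Code)
	// prints "failure=true facility=7 code=87"
}
```

Facility 7 is FACILITY_WIN32 and code 87 is ERROR_INVALID_PARAMETER, so the call failed with an invalid-argument error even though the same log line ends with "The operation completed successfully."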

Each of these windows10-aarch64 machines is configured using OCC and runs Taskcluster generic-worker 14.1.1.

OS: Unspecified → Windows
Hardware: Unspecified → ARM64

This seems to be a rolling issue that affects a machine for some time and then resolves itself.

I am working to understand the failure rate. Given that we have very limited hardware available for windows10-aarch64, even having 2-3 machines unavailable to run tests would mean significantly longer wait times for queued tasks.

Pete, did your recent changes fix this?

Flags: needinfo?(pmoore)

(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #2)

> Pete, did your recent changes fix this?

The recent change was for cleanup of User Profiles. This issue should get fixed with bug 1433854.

Depends on: 1433854
Flags: needinfo?(pmoore)

Hi Edwin,

Did the fix in generic-worker 25.4.0 fix this issue for you (bug 1433854)?

Thanks,
Pete

Flags: needinfo?(egao)

Similar to Bug 1565005, it's been a long time since I was involved in windows10-aarch64. :markco has been interacting with folks at Bitbar, and I recall he was attempting to bring back some of the offline machines, but it isn't clear if the issues were caused by this bug.

Let's close this bug for now and file a new one if it is warranted.

Flags: needinfo?(egao) → needinfo?(pmoore)

OK, thanks Edwin!

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(pmoore)
Resolution: --- → WORKSFORME

Mark, please advise if this needs to be reopened or if issues are still present in generic-worker 25.4.0 or later. Thanks!

Flags: needinfo?(mcornmesser)

Rgr. Side note, I am hoping to upgrade generic-worker on these in about 3 weeks.

Flags: needinfo?(mcornmesser)