Closed Bug 1547965 Opened 5 years ago Closed 5 years ago

Enable websocktunnel for windows hardware

Categories

(Infrastructure & Operations :: RelOps: Windows OS, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: markco)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

https://github.com/taskcluster/generic-worker/releases/tag/v14.1.0 has the description -- it's basically removing some config from generic-worker and adding a little more. You can confirm it works if (a) the resulting workerType still runs tasks and (b) you can see output line-by-line while a task is running.

Blocks: 1547358
Assignee: nobody → mcornmesser

In the linked pull request generic-worker is upgraded to 14.1.0 and the prescribed changes to the config file from the link in the description are in place. https://github.com/markcor/ronin_puppet/blob/BUG1547965/modules/win_generic_worker/templates/hw-generic-worker.config.epp

The result is:

May 13 12:30:40 T-W1064-MS-016.mdc1.mozilla.com generic-worker: Making system call WTSGetActiveConsoleSessionId with args: []#015
May 13 12:30:40 T-W1064-MS-016.mdc1.mozilla.com generic-worker: Result: 1 0 The operation completed successfully.#015
May 13 12:30:40 T-W1064-MS-016.mdc1.mozilla.com generic-worker: Making system call WTSQueryUserToken with args: [1 C0420786D0]#015
May 13 12:30:40 T-W1064-MS-016.mdc1.mozilla.com generic-worker: Result: 0 0 An attempt was made to reference a token that does not exist.#015

Verified the correct generic-worker version is present:
C:\>c:\generic-worker\generic-worker.exe --version
2019/05/13 19:43:07 Making system call GetProfilesDirectoryW with args: [0 C0423B7958]
2019/05/13 19:43:07 Result: 0 0 The data area passed to a system call is too small.
2019/05/13 19:43:07 Making system call GetProfilesDirectoryW with args: [C0423B57E0 C0423B7958]
2019/05/13 19:43:07 Result: 1 0 The operation completed successfully.
generic-worker 14.1.0 [ revision: https://github.com/taskcluster/generic-worker/commits/081b7c0200fb915ccc08d0cace49712ebc6e23c3 ]

However, I have not been able to verify that the worker's credentials have the scope auth:websocktunnel-token:<wstAudience>/<workerGroup>.<workerId>.*, as mentioned in the link from the description. Looking here https://tools.taskcluster.net/auth/scopes/assume%3Aworker-type%3Areleng-hardware%2F*/role:worker-type%3Areleng-hardware%2F it appears they do not.

dustin: Is this the proper place to verify the scopes? If I add the scope "auth:websocktunnel-token:<wstAudience>/<workerGroup>.<workerId>.*", will it break workers that have the older version of generic-worker? Or am I going down the wrong path here?

Flags: needinfo?(dustin)
Flags: needinfo?(dustin)

I think you're on exactly the right path. And adding that scope won't break anything.

The current scopes are:
auth:sentry:generic-worker
auth:statsum:generic-worker
queue:claim-work:releng-hardware/*
queue:create-artifact:*
queue:pending-tasks:releng-hardware/*
queue:poll-task-urls
queue:resolve-task
queue:worker-id:*
auth:websocktunnel-token:taskcluster-net/*

Should "auth:websocktunnel-token:<wstAudience>/<workerGroup>.<workerId>.*" be added, or should it replace "auth:websocktunnel-token:taskcluster-net/*"?

Flags: needinfo?(dustin)

Ah, it looks like it's already in place! The audience is taskcluster-net and * matches the workerGroup and workerId.
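
For anyone checking this later, the matching described here can be sketched in a few lines of Python. This is only an illustration of trailing-`*` scope matching, not the auth service's code; the function name `scope_satisfies` and the workerGroup/workerId values are made up for the example.

```python
def scope_satisfies(held: str, required: str) -> bool:
    """Return True if a held scope satisfies a required scope.

    A held scope satisfies a required one when the two are identical, or
    when the held scope ends in '*' and the required scope starts with
    everything that precedes that '*'.
    """
    if held == required:
        return True
    return held.endswith("*") and required.startswith(held[:-1])


held = "auth:websocktunnel-token:taskcluster-net/*"
# Hypothetical workerGroup ("mdc1") and workerId ("t-w1064-ms-016").
required = "auth:websocktunnel-token:taskcluster-net/mdc1.t-w1064-ms-016.*"

print(scope_satisfies(held, required))
```

With the audience set to taskcluster-net, the scope already held covers any workerGroup and workerId, so nothing needs to be added or replaced.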

Flags: needinfo?(dustin)

pmoore: any suggestion on this?

Generic-worker is failing to load the configuration file and has an exit code of 69.
Configuration file template: https://github.com/markcor/ronin_puppet/blob/BUG1547965/modules/win_generic_worker/templates/hw-generic-worker.config.epp . This worked with the previous version and only has the changes prescribed here: https://github.com/taskcluster/generic-worker/releases/tag/v14.1.0 .

Resulting log messages:

May 13 12:30:40 T-W1064-MS-016.mdc1.mozilla.com generic-worker: Making system call WTSGetActiveConsoleSessionId with args: []#015
May 13 12:30:40 T-W1064-MS-016.mdc1.mozilla.com generic-worker: Result: 1 0 The operation completed successfully.#015
May 13 12:30:40 T-W1064-MS-016.mdc1.mozilla.com generic-worker: Making system call WTSQueryUserToken with args: [1 C0420786D0]#015
May 13 12:30:40 T-W1064-MS-016.mdc1.mozilla.com generic-worker: Result: 0 0 An attempt was made to reference a token that does not exist.#015

Papertrail: https://papertrailapp.com/systems/3282840801/events

Flags: needinfo?(pmoore)

I have upgraded to generic-worker 14.1.1, and the behavior is the same.

(In reply to Mark Cornmesser [:markco] from comment #6)

This suggests either the Windows Service isn't running as LocalSystem, or there is a problem with the autologin. My guess is the latter.

Can you connect to the default display of the machine and see what is going on? Perhaps it is asking you to register your Windows version, or doing something else that is preventing the autologin from occurring. Which version of Windows are you running? I would be interested in seeing a screenshot - this would probably give a clue as to why the task user login might not be occurring.
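
To pick the failing call out of a longer log, something like the following may help. It is a rough sketch that pairs each "Making system call" line with the "Result" line after it and reports calls whose message is not the Windows success text. The helper name `failed_calls` is made up, and treating any other message as a failure is a heuristic: some calls, such as a buffer-size probe, are expected to report an error on the first attempt.

```python
import re

CALL_RE = re.compile(r"Making system call (\w+) with args:")
RESULT_RE = re.compile(r"Result: (\w+) (\w+) (.*)")
SUCCESS = "The operation completed successfully."


def failed_calls(lines):
    """Return (call name, message) for each system call whose result
    message differs from the Windows success message."""
    failures = []
    pending = None
    for line in lines:
        line = line.rstrip()
        # '#015' is how syslog renders a trailing carriage return.
        if line.endswith("#015"):
            line = line[:-4]
        call = CALL_RE.search(line)
        if call:
            pending = call.group(1)
            continue
        result = RESULT_RE.search(line)
        if result and pending is not None:
            if result.group(3) != SUCCESS:
                failures.append((pending, result.group(3)))
            pending = None
    return failures


log = [
    "May 13 12:30:40 T-W1064-MS-016.mdc1.mozilla.com generic-worker: Making system call WTSGetActiveConsoleSessionId with args: []#015",
    "May 13 12:30:40 T-W1064-MS-016.mdc1.mozilla.com generic-worker: Result: 1 0 The operation completed successfully.#015",
    "May 13 12:30:40 T-W1064-MS-016.mdc1.mozilla.com generic-worker: Making system call WTSQueryUserToken with args: [1 C0420786D0]#015",
    "May 13 12:30:40 T-W1064-MS-016.mdc1.mozilla.com generic-worker: Result: 0 0 An attempt was made to reference a token that does not exist.#015",
]

for name, message in failed_calls(log):
    print(name, "->", message)
```

On the lines quoted in this bug, only WTSQueryUserToken is reported, which is consistent with no user being logged in on the console session.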

Flags: needinfo?(pmoore) → needinfo?(mcornmesser)

(In reply to Mark Cornmesser [:markco] from comment #4)

Regarding queue:claim-work:releng-hardware/* see bug 1550154 comment 5.

The issue was with the autologin. LTICleanup.wsf is run at the end of the deployment. That script contained the following:

	'//----------------------------------------------------------------------------
	'//  Clear the autologon registry keys
	'//----------------------------------------------------------------------------

	oLogging.CreateEntry "Removing AutoAdminLogon registry entries", LogTypeInfo

	On Error Resume Next
	oShell.RegWrite "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\AutoAdminLogon", "0", "REG_SZ"
	oShell.RegWrite "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\DefaultUserName", "", "REG_SZ"
	oShell.RegWrite "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\DefaultDomainName", "", "REG_SZ"
	oShell.RegDelete "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\DefaultPassword"
	oShell.RegWrite "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\AutoLogonCount", &H00000000, "REG_DWORD"
	On Error Goto 0

For some reason, when the deployment or the Puppet bootstrap script cleared these registry values, generic-worker 14.x.x was unable to set the needed values. I did not see this behavior with the previous version. The remedy was to remove this code from the script.
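
To make the effect of that snippet concrete, here is a small Python model of the Winlogon values before and after the cleanup. It uses a plain dict rather than the real registry. The rule in `autologon_enabled` (AutoAdminLogon is "1", DefaultUserName is non-empty, DefaultPassword is present) is a simplifying assumption, not generic-worker's actual check, and the user name and password are placeholders.

```python
def lti_cleanup(winlogon: dict) -> dict:
    """Mirror the registry writes made by the LTICleanup.wsf snippet."""
    cleaned = dict(winlogon)
    cleaned["AutoAdminLogon"] = "0"
    cleaned["DefaultUserName"] = ""
    cleaned["DefaultDomainName"] = ""
    # RegDelete under 'On Error Resume Next': a missing value is ignored.
    cleaned.pop("DefaultPassword", None)
    cleaned["AutoLogonCount"] = 0
    return cleaned


def autologon_enabled(winlogon: dict) -> bool:
    """Simplified check (an assumption): autologon needs the flag set,
    a user name, and a stored password."""
    return (
        winlogon.get("AutoAdminLogon") == "1"
        and bool(winlogon.get("DefaultUserName"))
        and "DefaultPassword" in winlogon
    )


# Placeholder values standing in for what the worker would configure.
before = {
    "AutoAdminLogon": "1",
    "DefaultUserName": "task_user",
    "DefaultPassword": "example",
}
after = lti_cleanup(before)

print(autologon_enabled(before), autologon_enabled(after))
```

If the cleanup runs after the task user's autologon values are written, the machine comes back up with no user on the console, which would match the WTSQueryUserToken failure in the logs.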

Flags: needinfo?(mcornmesser)
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED