generic-worker: macOS multiuser engine - autologin sometimes not successful
Categories: Taskcluster :: Workers, defect
Tracking: Not tracked
People: Reporter: pmoore, Assigned: pmoore
Attachments: 1 file
The new multiuser engine for macOS in generic-worker creates per-task OS users for executing tasks. Between tasks, the worker reboots the machine and automatically logs in to the new task user's desktop. This seems to work for the first auto-login for the first task, but sometimes (perhaps for all subsequent logins?) the login window has the username field correctly prefilled with the task user's account name, while the password field is empty.
This bug prevents us from using the multiuser engine on macOS for tasks that require a graphical context (such as PGO builds).
The key to troubleshooting this problem is to study the com.apple.loginwindow.plist config and the /etc/kcpassword file in cases of failure. The login window property list can be dumped as sorted JSON with:
/usr/bin/sudo /usr/bin/plutil -convert json /Library/Preferences/com.apple.loginwindow.plist -o - | jq -S .
See kc.go for interpreting the /etc/kcpassword file.
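For readers without kc.go to hand: /etc/kcpassword is commonly documented as the autologin password XOR-ed byte by byte with a fixed 11-byte key, repeated cyclically. The Go sketch below assumes that scheme. The key bytes come from public write-ups rather than from kc.go, the function names are hypothetical, and the encoder's padding rule (a NUL terminator, then zero-padding to a multiple of 12 bytes) is an assumption that differs between published implementations. The decoder truncates at the first NUL, so it tolerates either padding style.

```go
package main

import (
	"bytes"
	"fmt"
)

// kcKey is the widely documented 11-byte XOR key for /etc/kcpassword
// (assumption: taken from public write-ups, not from kc.go).
var kcKey = []byte{0x7D, 0x89, 0x52, 0x23, 0xD2, 0xBC, 0xDD, 0xEA, 0xA3, 0xB9, 0x1F}

// xorKey XORs data with kcKey repeated cyclically. XOR is its own
// inverse, so the same function both encodes and decodes.
func xorKey(data []byte) []byte {
	out := make([]byte, len(data))
	for i, b := range data {
		out[i] = b ^ kcKey[i%len(kcKey)]
	}
	return out
}

// encodeKCPassword (hypothetical) appends a NUL terminator, zero-pads to a
// multiple of 12 bytes (assumed padding rule), then applies the XOR key.
func encodeKCPassword(password string) []byte {
	plain := append([]byte(password), 0)
	if r := len(plain) % 12; r != 0 {
		plain = append(plain, make([]byte, 12-r)...)
	}
	return xorKey(plain)
}

// decodeKCPassword (hypothetical) removes the XOR key and truncates at the
// first NUL byte, discarding any trailing padding.
func decodeKCPassword(encoded []byte) string {
	plain := xorKey(encoded)
	if i := bytes.IndexByte(plain, 0); i >= 0 {
		plain = plain[:i]
	}
	return string(plain)
}

func main() {
	enc := encodeKCPassword("hunter2")
	fmt.Printf("%d bytes: % x\n", len(enc), enc)
	fmt.Println(decodeKCPassword(enc)) // prints "hunter2"
}
```

On a failing worker, passing the bytes from os.ReadFile("/etc/kcpassword") (run as root) to decodeKCPassword shows whether the stored password matches the task user that autoLoginUser names.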
Comment 1 • 6 years ago (Assignee)
I found the problem and can reproduce it.
Example
Below I have logged the login window property list from /Library/Preferences/com.apple.loginwindow.plist (see comment 0).
I'm logged in, and generic-worker is running:
2019/06/21 17:42:58.841190 Login Window properties: map[string]interface {}{"GuestEnabled":false, "OptimizerLastRunForBuild":3.808064e+07, "OptimizerLastRunForSystem":1.68690944e+08, "RetriesUntilHint":3, "SHOWFULLNAME":false, "UseVoiceOverAtLoginwindow":false, "UseVoiceOverLegacyMigrated":true, "autoLoginUser":"task_1561164123", "lastUser":"loggedIn", "lastUserName":"pmoore", "showInputMenu":true}
I submit a task; the task completes on my machine, the machine reboots, and autologin works correctly. The login window property list is updated during the login process (lastUserName is updated):
2019/06/21 17:43:07.421697 Login Window properties: map[string]interface {}{"GuestEnabled":false, "OptimizerLastRunForBuild":3.808064e+07, "OptimizerLastRunForSystem":1.68690944e+08, "RetriesUntilHint":3, "SHOWFULLNAME":false, "UseVoiceOverAtLoginwindow":false, "UseVoiceOverLegacyMigrated":true, "autoLoginUser":"task_1561164123", "lastUser":"loggedIn", "lastUserName":"task_1561164123", "showInputMenu":true}
generic-worker has waited for that update to occur, so it then creates a new user (task_1561164183) and updates the autologin user for the next reboot ("autoLoginUser":"task_1561164183"):
2019/06/21 17:43:14.403006 Login Window properties: map[string]interface {}{"GuestEnabled":false, "OptimizerLastRunForBuild":3.808064e+07, "OptimizerLastRunForSystem":1.68690944e+08, "RetriesUntilHint":3, "SHOWFULLNAME":false, "UseVoiceOverAtLoginwindow":false, "UseVoiceOverLegacyMigrated":true, "autoLoginUser":"task_1561164183", "lastUser":"loggedIn", "lastUserName":"task_1561164123", "showInputMenu":true}
Three seconds later(!), the login process, which must still have been running, resets the change generic-worker just made to "autoLoginUser":
2019/06/21 17:43:17.320269 Login Window properties: map[string]interface {}{"GuestEnabled":false, "OptimizerLastRunForBuild":3.808064e+07, "OptimizerLastRunForSystem":1.68690944e+08, "RetriesUntilHint":3, "SHOWFULLNAME":false, "UseVoiceOverAtLoginwindow":false, "UseVoiceOverLegacyMigrated":true, "autoLoginUser":"task_1561164123", "lastUser":"loggedIn", "lastUserName":"task_1561164123", "showInputMenu":true}
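The property dumps above can be reproduced outside the worker by shelling out to plutil, using the same command as comment 0. The Go sketch below is a minimal illustration, assuming plutil writes JSON to stdout when given `-o -`; `parseLoginWindow`, `readLoginWindow` and `autoLoginClobbered` are hypothetical helpers, not generic-worker functions. Parsing is split from execution so it can be exercised without macOS.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

const loginWindowPlist = "/Library/Preferences/com.apple.loginwindow.plist"

// parseLoginWindow decodes the JSON produced by
// `plutil -convert json <file> -o -` into a generic map.
func parseLoginWindow(data []byte) (map[string]interface{}, error) {
	props := map[string]interface{}{}
	if err := json.Unmarshal(data, &props); err != nil {
		return nil, err
	}
	return props, nil
}

// readLoginWindow runs plutil (macOS only; comment 0 uses sudo, so root
// may be required).
func readLoginWindow() (map[string]interface{}, error) {
	out, err := exec.Command("/usr/bin/plutil", "-convert", "json", loginWindowPlist, "-o", "-").Output()
	if err != nil {
		return nil, err
	}
	return parseLoginWindow(out)
}

// autoLoginClobbered reports whether autoLoginUser no longer holds the user
// the worker configured, as happened at 17:43:17 in the log above.
func autoLoginClobbered(props map[string]interface{}, wantUser string) bool {
	got, _ := props["autoLoginUser"].(string)
	return got != wantUser
}

func main() {
	props, err := readLoginWindow()
	if err != nil {
		fmt.Println("could not read login window plist:", err)
		return
	}
	fmt.Printf("Login Window properties: %#v\n", props)
	fmt.Println("clobbered:", autoLoginClobbered(props, "task_1561164183"))
}
```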
Comment 2 • 6 years ago (Assignee)
So the challenge is knowing when the operating system has finished making changes to /Library/Preferences/com.apple.loginwindow.plist during the login process. Note that it seems to update the file at least twice, writing identical content each time (the content from log line 2019/06/21 17:43:07.421697 is identical to the content from log line 2019/06/21 17:43:17.320269 in comment 1).
It seems to cache the properties in memory, so any changes made to this file while the login process is ongoing are lost when the operating system rewrites the file from its cached values.
It is unfortunate that this file stores both state information (e.g. lastUser) and config settings (e.g. autoLoginUser), since in the process of updating state information, the operating system steamrolls config changes stored in the same file.
I'll see if I can find an alternative way to detect that the login process has completed and that the login window property list file will no longer be overwritten by the operating system. Note that it isn't sufficient for the worker just to wait until the file has the properties it expects, since the operating system writes the same state to the file multiple times. The worker is in fact already waiting for this file to reach the expected state before writing to it, but as explained above, the operating system then resets that state.
This may be because on my local machine I haven't (yet) disabled the "Siri enable/disable prompt" and the "Light/dark theme prompt" for first user login. Perhaps the operating system updates /Library/Preferences/com.apple.loginwindow.plist before it prompts the user for these settings and again afterwards, in which case disabling these prompts might make the problem disappear. However, in case future prompts come along, I'd rather find a more robust way to ensure the desktop login process is complete than just disable these prompts and hope this bug doesn't recur.
Comment 3 • 6 years ago (Assignee)
I've tried to find a better way to know when updates to /Library/Preferences/com.apple.loginwindow.plist have completed, but have not yet found a reliable, safe and simple way to do this.
There is a thorough overview of Daemons and Agents in Technical Note TN2083.
Chosen solution
I'm going to add a ten-second sleep after the initial update to /Library/Preferences/com.apple.loginwindow.plist that follows the task user's login. This gives the user account some time to settle before a task runs, and allows the operating system to make any further updates to this file within the first ten seconds of the login. In my experience, this file is usually updated within the first 1-2 seconds of the login, so ten seconds should hopefully be enough to cover any updates the operating system makes to it, assuming a reasonably healthy host environment.
Alternative solutions considered, but not implemented
- I've looked at /private/var/log/system.log for log entries that might indicate the login is complete, so that the worker could wait for these messages before continuing, but didn't find anything concrete.
- I monitored the process table during login to see if there were any obvious changes that could be auto-detected to know that the login was complete. I didn't find anything obvious.
- I considered listening for changes to /Library/Preferences/com.apple.loginwindow.plist and resetting it to the desired state whenever it changes. This seems like a good potential solution, with the downside that there is then a race condition between the state being reset and a failing task that causes the system to reboot before that reset gets applied. It is therefore better to only start a task once we know the operating system won't make any further updates to the login window plist file.
  There is some useful information about methods to monitor a file for changes in this stackoverflow question. Notably, launchd supports the key WatchPaths, which may be used to trigger a command when a file changes. This could, for example, be used to reset /Library/Preferences/com.apple.loginwindow.plist to its required state if the operating system overwrites it.
- I considered writing a login item for the new task user that signals back to the worker that the login process is complete, but it isn't clear that this would always run after the operating system updates /Library/Preferences/com.apple.loginwindow.plist, so it didn't seem worth the investment of time.
So the chosen solution is the simplest and ugliest, but has the advantages that:
- it is simple
- it is likely to continue working across major releases of macOS, as it doesn't rely on specific behaviour of the login subsystem
- it may provide a slightly more stable task environment, since the login process has a few seconds to "settle" before a task runs
The obvious disadvantage is that we potentially waste up to 10 seconds of processing time per task. Note, though, that this time is spent after the task user has been created and before a task is claimed, so for a worker pool with free workers there should be no additional delay before a task is claimed. The additional delay only applies when there are no free workers available.
Comment 4 • 6 years ago (Assignee)
Comment 5 • 6 years ago (Assignee)
Released in generic-worker 15.1.0.