Closed
Bug 1330718
Opened 7 years ago
Closed 7 years ago
"CreateFrozen/CreateProcess: Access is denied" on win10 with "runTasksAsCurrentUser": true
Categories
(Taskcluster :: Workers, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: pmoore, Assigned: pmoore)
Attachments
(1 file)
Windows 7 and Windows 2012 don't seem to be affected, AFAIK. When running with "runTasksAsCurrentUser": false (which is also the default value, if not specified) there is also no problem. So this is something particular to the combination of running on Windows 10 with the option to run as current user. The error is consistent:
Jan 11 19:32:02 i-0a9ef4536d54137fd.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/11 18:31:59 Cause: CreateFrozen/CreateProcess: Access is denied.
Jan 11 19:32:14 i-0b7a090dca88ac2b3.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/11 18:32:12 Cause: CreateFrozen/CreateProcess: Access is denied.
Jan 12 11:08:35 i-093e56b624e692417.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:08:19 Cause: CreateFrozen/CreateProcess: Access is denied.
Jan 12 11:14:07 i-0c70fab89a7b41a3f.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:14:06 Cause: CreateFrozen/CreateProcess: Access is denied.
Jan 12 11:16:33 i-05f5c69d56a2b2f6c.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:15:39 Cause: CreateFrozen/CreateProcess: Access is denied.
Jan 12 11:17:05 i-0770f5dff779e64c5.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:17:03 Cause: CreateFrozen/CreateProcess: Access is denied.
Jan 12 11:17:07 i-073721cc70ff9882b.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:17:03 Cause: CreateFrozen/CreateProcess: Access is denied.
Jan 12 11:24:35 i-00389a9cd10a19982.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:24:33 Cause: CreateFrozen/CreateProcess: Access is denied.
Jan 12 11:29:50 i-005c9211900f2bcf5.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:29:35 Cause: CreateFrozen/CreateProcess: Access is denied.
Jan 12 11:30:41 i-02a4808771620cda0.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:30:40 Cause: CreateFrozen/CreateProcess: Access is denied.
This currently affects only gecko-t-win10-64-beta worker type, as this is the only place where these settings are applied. I've created a win10 VM locally, and am currently installing all toolchains required etc in order to debug this locally.
Comment 1•7 years ago (Assignee)
Note, technically, this is probably a bug in the https://github.com/taskcluster/runlib library, rather than https://github.com/taskcluster/generic-worker directly.
Comment 2•7 years ago (Assignee)
Specifically, this is the syscall that has the problem: https://github.com/taskcluster/runlib/blob/29b01e7a4a5ef4d31b739dfb13f43cf5df3e9a74/subprocess/subprocess_windows.go#L203-L214
Comment 3•7 years ago (Assignee)
I've not been able to reproduce locally so far. After installing win10 x64 and setting up an environment, I was able to run a task successfully as current user: https://tools.taskcluster.net/task-inspector/#ECmqh-RTQWe78bcdso-6KA/0
Comment 4•7 years ago (Assignee)
I'll do some remote testing on a live instance. I'm wondering whether task users have access to the Z: drive while the GenericWorker account has restricted access to Z:. I could be way off, and it could be something entirely different, but debugging on a live instance should give me more insight. At least we see generic worker is theoretically capable of running in this mode on win10, so it is perhaps an environment issue.
Comment 5•7 years ago (Assignee)
Strange, I'm also able to run a task on gecko-t-win10-64-beta as the current user! I manually edited the config on the worker I hijacked, and manually ran generic worker, and was able to run a task successfully: https://tools.taskcluster.net/task-inspector/#S6Dhtsp2T3Wx6P6TtB2WhA/0 The command run ("whoami") returns "i-023850e5d715c\genericworker" to confirm the new setting was successfully applied. Next, I'll investigate if this relates to running as a scheduled item (since I ran the worker manually).
Comment 6•7 years ago (Assignee)
I'm now manually running generic worker under an administrator command shell with the same test that was consistently blue on treeherder[1]:
https://tools.taskcluster.net/task-inspector/#X_WdJv5kRjWiR4RWHNLo1g/0
So far it seems to be running ok, whereas before it wouldn't even start. So this seems to point to something specific about being run as a scheduled task. I wonder if this is job group related, such that when running as a scheduled task, a job group is implicitly assigned, and this interferes somehow, even though in win10 nested job groups are supported[2].
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf38e6ba4f7af8c4925b51e9f23c482434b90401&filter-tier=1&filter-tier=2&filter-tier=3&group_state=expanded&selectedJob=68092152
[2] https://msdn.microsoft.com/en-us/library/windows/desktop/hh448388(v=vs.85).aspx
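For reference, the "nested job groups are supported" condition can be sketched as a simple version comparison. This is only an illustration, not the runlib code: the helper name nestedJobsSupported is hypothetical, and it assumes the caller already has the real OS major/minor version. It relies on the documented fact that nested jobs were introduced in Windows 8 / Windows Server 2012 (version 6.2).

```go
package main

import "fmt"

// nestedJobsSupported is a hypothetical helper (not taken from runlib).
// It reports whether the given Windows version supports nested job objects,
// which were introduced in Windows 8 / Windows Server 2012 (version 6.2).
func nestedJobsSupported(major, minor uint32) bool {
	return major > 6 || (major == 6 && minor >= 2)
}

func main() {
	// Version numbers for a few releases mentioned in this bug.
	versions := []struct {
		name         string
		major, minor uint32
	}{
		{"Windows 7", 6, 1},
		{"Windows Server 2012", 6, 2},
		{"Windows 10", 10, 0},
	}
	for _, v := range versions {
		fmt.Printf("%-20s nested jobs: %v\n", v.name, nestedJobsSupported(v.major, v.minor))
	}
}
```

How the version numbers are obtained on a live system is deliberately left out of this sketch.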
Comment 7•7 years ago (Assignee)
(In reply to Pete Moore [:pmoore][:pete] from comment #6)
> I'm now manually running generic worker under an administrator command shell
> with the same test that was consistently blue on treeherder[1]:
>
> https://tools.taskcluster.net/task-inspector/#X_WdJv5kRjWiR4RWHNLo1g/0
>
> So far seems to be running ok, before it wouldn't even start ... So seems to
> point to something specific about being run as a scheduled task.
In the end it failed due to two test failures:
TEST-FAIL | browser/components/migration/tests/unit/test_Edge_availability.js | took 30520ms
TEST-FAIL | browser/components/sessionstore/test/unit/test_startup_invalid_session.js | took 1596ms
But that is a different issue - the tests wouldn't even run before. Will now try again as a scheduled task, to test this hypothesis.
Comment 8•7 years ago (Assignee)
Good news! Running locally, I've been able to reproduce the issue, running from a scheduled task: https://tools.taskcluster.net/task-inspector/#eu3W3DFjQbCimn4KqhO6Gg/0
Comment 9•7 years ago (Assignee)
So .... I think I've got it!

The problem is we explicitly set CREATE_BREAKAWAY_FROM_JOB in the process creation flags, which is needed on Windows 7, since Windows 7 doesn't support nested jobs, and there are firefox tests which need to create a job object and associate it to subprocesses. However, on Windows 10, we can have nested job groups, and therefore it is not necessary to break away from an inherited job.

One of the requirements of setting CREATE_BREAKAWAY_FROM_JOB is that the job itself has JOB_OBJECT_LIMIT_BREAKAWAY_OK set. On Windows 7, since nested jobs are not supported, the Task Scheduler appears to set this flag on the job group it creates for the scheduled task, and so we do not hit a problem when running CreateProcess. However, on Windows 10, presumably because nested jobs are supported, the Task Scheduler seems not to set JOB_OBJECT_LIMIT_BREAKAWAY_OK on the job group it creates for the scheduled task.

Therefore, the resolution is to make sure that we only set this flag if nested jobs are not supported on the system, which we already established in order to decide whether to run CreateProcessWithLogon or CreateProcessAsUser.

Process creation flags are documented here: https://msdn.microsoft.com/en-us/library/windows/desktop/ms684863(v=vs.85).aspx
Nested jobs are documented here: https://msdn.microsoft.com/en-us/library/windows/desktop/hh448388(v=vs.85).aspx
The explicit setting of CREATE_BREAKAWAY_FROM_JOB occurs here: https://github.com/taskcluster/runlib/blob/cfcf5ce095d45717364e03d218f99b3a8837b6c7/subprocess/subprocess_windows.go#L212
The function that determines whether we should set the flag or not is here: https://github.com/taskcluster/runlib/blob/cfcf5ce095d45717364e03d218f99b3a8837b6c7/win32/win32_windows.go#L109-L119
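A minimal sketch of the proposed resolution, to make the logic concrete. This is not the actual patch: the helper name creationFlags and the choice of base flags (CREATE_SUSPENDED and CREATE_UNICODE_ENVIRONMENT) are assumptions for illustration. Only the constant values come from the Win32 process creation flags documentation.

```go
package main

import "fmt"

// Constant values as listed in the Win32 process creation flags docs.
const (
	createSuspended          uint32 = 0x00000004
	createUnicodeEnvironment uint32 = 0x00000400
	createBreakawayFromJob   uint32 = 0x01000000
)

// creationFlags is a hypothetical helper showing the idea of the fix:
// request breakaway from the inherited job only when the OS cannot nest jobs.
// The base flags chosen here are an assumption, not taken from runlib.
func creationFlags(nestedJobsSupported bool) uint32 {
	flags := createSuspended | createUnicodeEnvironment
	if !nestedJobsSupported {
		// Without nested jobs (e.g. Windows 7), the child has to leave the
		// Task Scheduler's job before it can be assigned to a new one.
		flags |= createBreakawayFromJob
	}
	return flags
}

func main() {
	fmt.Printf("no nested jobs (e.g. Windows 7): %#08x\n", creationFlags(false))
	fmt.Printf("nested jobs (e.g. Windows 10):   %#08x\n", creationFlags(true))
}
```

With this shape, a Windows 10 worker started from a scheduled task never asks for breakaway, so it should not matter whether the Task Scheduler's job has JOB_OBJECT_LIMIT_BREAKAWAY_OK set.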
Comment 10•7 years ago (Assignee)
Hey Ted, Could you cast your eye over the reasoning in this bug and see if you think the patch makes sense? Thanks!
Comment 11•7 years ago (Assignee)
Note, I should probably have done several things before submitting the view, but was a little lazy, and wanted to get it out before the end of the working week!
1) Using Process Explorer I could test the theory that JOB_OBJECT_LIMIT_BREAKAWAY_OK is set on Windows 7 on the job created by the task scheduler for a given scheduled task.
2) Using Process Explorer I could test the theory that JOB_OBJECT_LIMIT_BREAKAWAY_OK is *not* set on Windows 10 on the job created by the task scheduler for a given scheduled task.
3) I could rebuild generic worker and explicitly test it on both Windows 7 and Windows 10, running in both modes of creating tasks as a new user and as the current user, when running from a scheduled task (i.e. all four combinations).
After this all lands, I'd like to set up a better CI to handle all these combinations automatically in future.
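As an alternative to eyeballing Process Explorer for checks 1) and 2), the job's limit flags could be decoded programmatically. The sketch below only shows the bit test. It assumes the LimitFlags value has already been read from the job's basic limit information, and the helper names and sample values are hypothetical, not observed on any worker.

```go
package main

import "fmt"

// Job object limit flag values from the Win32 documentation.
const (
	jobObjectLimitBreakawayOK       uint32 = 0x00000800
	jobObjectLimitSilentBreakawayOK uint32 = 0x00001000
)

// breakawayOK reports whether JOB_OBJECT_LIMIT_BREAKAWAY_OK is set in a
// job's LimitFlags. Hypothetical helper for illustration only.
func breakawayOK(limitFlags uint32) bool {
	return limitFlags&jobObjectLimitBreakawayOK != 0
}

// silentBreakawayOK reports whether JOB_OBJECT_LIMIT_SILENT_BREAKAWAY_OK is set.
func silentBreakawayOK(limitFlags uint32) bool {
	return limitFlags&jobObjectLimitSilentBreakawayOK != 0
}

func main() {
	// Made-up sample values, purely to exercise the helpers.
	samples := []uint32{0x00000000, 0x00000800, 0x00001000, 0x00002800}
	for _, f := range samples {
		fmt.Printf("LimitFlags %#08x: breakaway ok=%v, silent breakaway ok=%v\n",
			f, breakawayOK(f), silentBreakawayOK(f))
	}
}
```

If the theory above holds, the scheduled-task job on Windows 7 would show the breakaway bit set and the one on Windows 10 would not.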
Comment 12•7 years ago (Assignee)
s/submitting the view/submitting the review/ .... let's just pretend my keyboard can't keep up with my typing, rather than me making spelling mistakes ....
Updated•7 years ago
Attachment #8826716 - Flags: review?(ted) → review+
Comment 13•7 years ago
The GH PR attached to this bug was merged on Jan 24th. Does that mean it has now been deployed in recent releases of generic-worker?
Flags: needinfo?(pmoore)
Comment 14•7 years ago (Assignee)
Indeed! Thanks for noticing Greg. :)
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Flags: needinfo?(pmoore)
Resolution: --- → FIXED
Updated•5 years ago
Component: Generic-Worker → Workers