Closed Bug 1330718 Opened 7 years ago Closed 7 years ago

"CreateFrozen/CreateProcess: Access is denied" on win10 with "runTasksAsCurrentUser": true

Categories

(Taskcluster :: Workers, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pmoore, Assigned: pmoore)

Attachments

(1 file)

Windows 7 and Windows 2012 don't seem to be affected, AFAIK.

When running with "runTasksAsCurrentUser": false (which is also the default value if not specified), there is no problem either.

So this is something particular to the combination of running on Windows 10 with the option to run as current user.
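To illustrate the "default if not specified" behaviour, here is a minimal Go sketch of how such a setting could be decoded. The struct and function names are hypothetical and are not taken from generic-worker; only the "runTasksAsCurrentUser" key comes from this bug.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// workerConfig is a hypothetical, trimmed-down config struct. Only the
// "runTasksAsCurrentUser" key is taken from this bug report.
type workerConfig struct {
	RunTasksAsCurrentUser bool `json:"runTasksAsCurrentUser"`
}

// parseConfig decodes a JSON config. A missing key leaves the field at
// Go's zero value (false), which matches the default described above.
func parseConfig(raw []byte) (workerConfig, error) {
	var c workerConfig
	err := json.Unmarshal(raw, &c)
	return c, err
}

func main() {
	for _, raw := range []string{`{}`, `{"runTasksAsCurrentUser": true}`} {
		c, err := parseConfig([]byte(raw))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> runTasksAsCurrentUser=%v\n", raw, c.RunTasksAsCurrentUser)
	}
}
```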

Error is consistent.




Jan 11 19:32:02 i-0a9ef4536d54137fd.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/11 18:31:59 Cause: CreateFrozen/CreateProcess: Access is denied. 
Jan 11 19:32:14 i-0b7a090dca88ac2b3.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/11 18:32:12 Cause: CreateFrozen/CreateProcess: Access is denied. 
Jan 12 11:08:35 i-093e56b624e692417.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:08:19 Cause: CreateFrozen/CreateProcess: Access is denied. 
Jan 12 11:14:07 i-0c70fab89a7b41a3f.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:14:06 Cause: CreateFrozen/CreateProcess: Access is denied. 
Jan 12 11:16:33 i-05f5c69d56a2b2f6c.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:15:39 Cause: CreateFrozen/CreateProcess: Access is denied. 
Jan 12 11:17:05 i-0770f5dff779e64c5.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:17:03 Cause: CreateFrozen/CreateProcess: Access is denied. 
Jan 12 11:17:07 i-073721cc70ff9882b.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:17:03 Cause: CreateFrozen/CreateProcess: Access is denied. 
Jan 12 11:24:35 i-00389a9cd10a19982.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:24:33 Cause: CreateFrozen/CreateProcess: Access is denied. 
Jan 12 11:29:50 i-005c9211900f2bcf5.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:29:35 Cause: CreateFrozen/CreateProcess: Access is denied. 
Jan 12 11:30:41 i-02a4808771620cda0.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/12 10:30:40 Cause: CreateFrozen/CreateProcess: Access is denied. 

This currently affects only gecko-t-win10-64-beta worker type, as this is the only place where these settings are applied.
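A rough Go sketch of how log lines like the ones above could be tallied per worker type. It assumes the hostname is the fourth whitespace-separated field and that the worker type is the second dot-separated label of the hostname; both are inferred from the sample lines, not from any documented format.

```go
package main

import (
	"fmt"
	"strings"
)

// marker is the error text seen in the log lines above.
const marker = "CreateFrozen/CreateProcess: Access is denied"

// affectedByWorkerType counts distinct hosts per worker type that logged
// the marker. Assumed layout: "<Mon> <day> <time> <host> <program>: ...",
// with host shaped like "<instance>.<workerType>.<region>.mozilla.com".
func affectedByWorkerType(lines []string) map[string]int {
	seen := map[string]bool{}
	counts := map[string]int{}
	for _, line := range lines {
		if !strings.Contains(line, marker) {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 4 {
			continue // not a syslog-style line
		}
		host := fields[3]
		if seen[host] {
			continue // count each host only once
		}
		seen[host] = true
		labels := strings.Split(host, ".")
		if len(labels) < 2 {
			continue // hostname does not carry a worker type
		}
		counts[labels[1]]++
	}
	return counts
}

func main() {
	lines := []string{
		"Jan 11 19:32:02 i-0a9ef4536d54137fd.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/11 18:31:59 Cause: CreateFrozen/CreateProcess: Access is denied.",
		"Jan 11 19:32:14 i-0b7a090dca88ac2b3.gecko-t-win10-64-beta.use1.mozilla.com generic-worker: 2017/01/11 18:32:12 Cause: CreateFrozen/CreateProcess: Access is denied.",
	}
	for workerType, n := range affectedByWorkerType(lines) {
		fmt.Printf("%s: %d affected host(s)\n", workerType, n)
	}
}
```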

I've created a win10 VM locally, and am currently installing the required toolchains so that I can debug this locally.

Note, technically, this is probably a bug in the https://github.com/taskcluster/runlib library, rather than in https://github.com/taskcluster/generic-worker directly.

I've not been able to reproduce this locally so far. After installing win10 x64 and setting up an environment, I was able to run a task successfully as the current user:

https://tools.taskcluster.net/task-inspector/#ECmqh-RTQWe78bcdso-6KA/0
I'll do some remote testing on a live instance. I'm wondering whether task users have access to the Z: drive while the GenericWorker account, for some reason, has restricted access to it. Could be way off though ... could be something entirely different, but debugging on a live instance should give me more insight. At least we see generic worker is theoretically capable of running in this mode on win10, so it is perhaps an environment issue.

Strange, I'm also able to run a task on gecko-t-win10-64-beta as the current user! I manually edited the config on the worker I hijacked, manually ran generic worker, and was able to run a task successfully:

https://tools.taskcluster.net/task-inspector/#S6Dhtsp2T3Wx6P6TtB2WhA/0

The command run ("whoami") returns "i-023850e5d715c\genericworker" to confirm the new setting was successfully applied.

Next, I'll investigate if this relates to running as a scheduled item (since I ran the worker manually).
I'm now manually running generic worker under an administrator command shell with the same test that was consistently blue on treeherder[1]:

https://tools.taskcluster.net/task-inspector/#X_WdJv5kRjWiR4RWHNLo1g/0

So far seems to be running ok, before it wouldn't even start ... So seems to point to something specific about being run as a scheduled task.

I wonder if this is job group related, such that when running as a scheduled task, a job group is implicitly assigned, and this interferes somehow, even though in win10 nested job groups are supported[2].

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf38e6ba4f7af8c4925b51e9f23c482434b90401&filter-tier=1&filter-tier=2&filter-tier=3&group_state=expanded&selectedJob=68092152
[2] https://msdn.microsoft.com/en-us/library/windows/desktop/hh448388(v=vs.85).aspx
(In reply to Pete Moore [:pmoore][:pete] from comment #6)
> I'm now manually running generic worker under an administrator command shell
> with the same test that was consistently blue on treeherder[1]:
> 
> https://tools.taskcluster.net/task-inspector/#X_WdJv5kRjWiR4RWHNLo1g/0
> 
> So far seems to be running ok, before it wouldn't even start ... So seems to
> point to something specific about being run as a scheduled task.

In the end it failed due to two test failures:

TEST-FAIL | browser/components/migration/tests/unit/test_Edge_availability.js | took 30520ms
TEST-FAIL | browser/components/sessionstore/test/unit/test_startup_invalid_session.js | took 1596ms

But that is a different issue - the tests wouldn't even run before.

Will now try again as a scheduled task, to test this hypothesis.
Good news!

Running locally, I've been able to reproduce the issue, running from a scheduled task:

https://tools.taskcluster.net/task-inspector/#eu3W3DFjQbCimn4KqhO6Gg/0
So .... I think I've got it!

The problem is that we explicitly set CREATE_BREAKAWAY_FROM_JOB in the process creation flags. This is needed on Windows 7, since Windows 7 doesn't support nested jobs, and there are Firefox tests which need to create a job object and associate it with subprocesses.

However, on Windows 10, we can have nested job groups, and therefore it is not necessary to break away from an inherited job.

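One of the requirements of setting CREATE_BREAKAWAY_FROM_JOB is that the job the calling process belongs to has JOB_OBJECT_LIMIT_BREAKAWAY_OK set; otherwise CreateProcess is refused, which would explain the "Access is denied" error we are seeing.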

On Windows 7, since nested jobs are not supported, the Task Scheduler appears to set this flag on the job group it creates for the scheduled task, and so we do not hit a problem when running CreateProcess.

However, on Windows 10, presumably because nested jobs are supported, the Task Scheduler seems not to set JOB_OBJECT_LIMIT_BREAKAWAY_OK on the job group it creates for the scheduled task.
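The breakaway rule can be modelled in a few lines of Go. This is only a sketch of the documented rule, not a call into the Win32 API: the parentJob type and createProcessDenied function are hypothetical, the two constants carry the values from the Windows headers, and the three scenarios encode the suspicions voiced in this bug (including the assumption that a manually started worker is not in a job at all).

```go
package main

import "fmt"

const (
	createBreakawayFromJob    uint32 = 0x01000000 // CREATE_BREAKAWAY_FROM_JOB
	jobObjectLimitBreakawayOK uint32 = 0x00000800 // JOB_OBJECT_LIMIT_BREAKAWAY_OK
)

// parentJob is a hypothetical model of the job (if any) that the process
// calling CreateProcess belongs to.
type parentJob struct {
	inJob      bool
	limitFlags uint32
}

// createProcessDenied models only the breakaway rule: asking to break away
// from a job that does not permit breakaway is refused. Without a job, or
// without the breakaway flag, the rule does not apply.
func createProcessDenied(creationFlags uint32, job parentJob) bool {
	if creationFlags&createBreakawayFromJob == 0 {
		return false
	}
	if !job.inJob {
		return false
	}
	return job.limitFlags&jobObjectLimitBreakawayOK == 0
}

func main() {
	flags := createBreakawayFromJob
	scenarios := []struct {
		name string
		job  parentJob
	}{
		{"started manually (assumed: not in a job)", parentJob{inJob: false}},
		{"scheduled task, breakaway allowed (suspected on Windows 7)", parentJob{inJob: true, limitFlags: jobObjectLimitBreakawayOK}},
		{"scheduled task, breakaway not allowed (suspected on Windows 10)", parentJob{inJob: true}},
	}
	for _, s := range scenarios {
		fmt.Printf("%s: denied=%v\n", s.name, createProcessDenied(flags, s.job))
	}
}
```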

Therefore, the resolution is to make sure that we only set this flag if nested jobs are not supported on the system, something we already determine in order to decide whether to run CreateProcessWithLogon or CreateProcessAsUser.

Process creation flags are documented here:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms684863(v=vs.85).aspx

Nested jobs are documented here:
https://msdn.microsoft.com/en-us/library/windows/desktop/hh448388(v=vs.85).aspx

The explicit setting of CREATE_BREAKAWAY_FROM_JOB occurs here:
https://github.com/taskcluster/runlib/blob/cfcf5ce095d45717364e03d218f99b3a8837b6c7/subprocess/subprocess_windows.go#L212

The function that determines whether we should set the flag or not is here:
https://github.com/taskcluster/runlib/blob/cfcf5ce095d45717364e03d218f99b3a8837b6c7/win32/win32_windows.go#L109-L119
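A hedged Go sketch of the proposed resolution, not the actual runlib patch. The function names are hypothetical, the base flags are illustrative, and the version check is an assumption: how runlib really detects nested-job support is not shown in this bug. The check relies only on Microsoft documenting nested jobs as introduced in Windows 8 and Windows Server 2012 (version 6.2).

```go
package main

import "fmt"

const (
	createSuspended          uint32 = 0x00000004 // CREATE_SUSPENDED
	createUnicodeEnvironment uint32 = 0x00000400 // CREATE_UNICODE_ENVIRONMENT
	createBreakawayFromJob   uint32 = 0x01000000 // CREATE_BREAKAWAY_FROM_JOB
)

// nestedJobsSupported is an assumed stand-in for the real detection logic:
// it treats Windows version 6.2 and later as supporting nested jobs.
func nestedJobsSupported(major, minor uint32) bool {
	return major > 6 || (major == 6 && minor >= 2)
}

// creationFlags returns illustrative process creation flags, requesting
// breakaway from the inherited job only where nested jobs are unavailable.
func creationFlags(major, minor uint32) uint32 {
	flags := createSuspended | createUnicodeEnvironment
	if !nestedJobsSupported(major, minor) {
		flags |= createBreakawayFromJob
	}
	return flags
}

func main() {
	for _, v := range []struct {
		name         string
		major, minor uint32
	}{
		{"Windows 7", 6, 1},
		{"Windows 10", 10, 0},
	} {
		f := creationFlags(v.major, v.minor)
		fmt.Printf("%s: flags=%#08x breakaway=%v\n", v.name, f, f&createBreakawayFromJob != 0)
	}
}
```

With this shape, Windows 7 keeps the breakaway behaviour that the Firefox tests depend on, while Windows 10 simply nests any new job inside the one inherited from the Task Scheduler.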
Hey Ted,

Could you cast your eye over the reasoning in this bug and see if you think the patch makes sense?

Thanks!
Assignee: nobody → pmoore
Status: NEW → ASSIGNED
Attachment #8826716 - Flags: review?(ted)
Note, I should probably have done several things before submitting the view, but was a little lazy, and wanted to get it out before the end of the working week!

1) Using Process Explorer I could test the theory that JOB_OBJECT_LIMIT_BREAKAWAY_OK is set on Windows 7 on the job created by the task scheduler for a given scheduled task.
2) Using Process Explorer I could test the theory that JOB_OBJECT_LIMIT_BREAKAWAY_OK is *not* set on Windows 10 on the job created by the task scheduler for a given scheduled task.
3) I could rebuild generic worker and explicitly test it on both Windows 7 and Windows 10, running in both modes (creating tasks as a new user and as the current user) when running from a scheduled task (i.e. all four combinations).

After this all lands, I'd like to set up a better CI to handle all these combinations automatically in future.
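As a starting point for such a CI, the four combinations could be enumerated with a small Go helper. This is only a sketch: the scenario type and matrix function are hypothetical, and a real setup would still need to provision each OS and launch the worker from a scheduled task.

```go
package main

import "fmt"

// scenario is one hypothetical CI combination: an OS and a task user mode.
type scenario struct {
	os    string
	runAs string
}

// matrix returns every combination of OS and run mode, OS-major order.
func matrix(oses, runModes []string) []scenario {
	var out []scenario
	for _, o := range oses {
		for _, m := range runModes {
			out = append(out, scenario{os: o, runAs: m})
		}
	}
	return out
}

func main() {
	combos := matrix(
		[]string{"Windows 7", "Windows 10"},
		[]string{"new task user", "current user"},
	)
	for i, s := range combos {
		fmt.Printf("%d) %s, tasks run as %s, worker started from a scheduled task\n", i+1, s.os, s.runAs)
	}
}
```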
s/submitting the view/submitting the review/

.... let's just pretend my keyboard can't keep up with my typing, rather than me making spelling mistakes ....
Attachment #8826716 - Flags: review?(ted) → review+
The GH PR attached to this bug was merged on Jan 24th.  Does that mean it has now been deployed in recent releases of generic-worker?
Flags: needinfo?(pmoore)
Indeed! Thanks for noticing Greg. :)
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Flags: needinfo?(pmoore)
Resolution: --- → FIXED
Component: Generic-Worker → Workers