Closed Bug 1451682 Opened 6 years ago Closed 6 years ago

upgrade to new windows generic worker is causing talos xperf job to fail

Categories: Taskcluster :: Workers (defect); Type: defect; Priority: Not set; Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: jmaher; Assignee: pmoore

Details

Talos runs an xperf job that measures file I/O during a browser session. Since this measurement is not time-based, we can run it on a VM, and we have been doing so on the Windows VMs for a while.

As of yesterday, all of our values are coming back as zero; on closer inspection, this is related to the generic-worker version.

On a passing run, I see this in the log [1]:
11:02:46     INFO -  TEST-INFO | 4496: exit 0
11:02:48     INFO -  The trace you have just captured "Z:\task_1522839380\build\test.etl.kernel" may contain personally identifiable information, including but not necessarily limited to paths to files accessed, paths to registry accessed and process names. Exact information depends on the events that were logged. Please be aware of this when sharing out this trace with other people.
11:02:48     INFO -  The trace you have just captured "Z:\task_1522839380\build\test.etl.user" may contain personally identifiable information, including but not necessarily limited to paths to files accessed, paths to registry accessed and process names. Exact information depends on the events that were logged. Please be aware of this when sharing out this trace with other people.
11:02:56     INFO -  Merged Etl: test.etl
11:04:48     INFO -  c:/Program Files/Microsoft Windows Performance Toolkit/xperf: warning: applying restriction of access for trace processing
11:05:42     INFO -  reading etl filename: test.etl
11:05:42     INFO -  etlparser: in readfile: test.etl.csv
11:05:42     INFO -  extending with xperf!
11:05:42     INFO -  COUNTER 'nonmain_normal_netio': [78522413.0]


When running with the new generic-worker, I see this instead [2]:
19:40:45     INFO -  TEST-INFO | 4004: exit 0
19:40:45     INFO -  c:/Program Files/Microsoft Windows Performance Toolkit/xperf: error: NT Kernel Logger: The instance name passed was not recognized as valid by a WMI data provider. (0x1069).
19:40:45     INFO -  c:/Program Files/Microsoft Windows Performance Toolkit/xperf: error: talos_ses: The instance name passed was not recognized as valid by a WMI data provider. (0x1069).
19:40:45     INFO -  c:/Program Files/Microsoft Windows Performance Toolkit/xperf: error: test.etl: The specified path is invalid. (0xa1).
19:40:45     INFO -  c:/Program Files/Microsoft Windows Performance Toolkit/xperf: error: test.etl: The specified path is invalid. (0x800700a1).
19:40:45     INFO -  reading etl filename: test.etl
19:40:45     INFO -  etlparser: in readfile: test.etl.csv
19:40:45     INFO -  extending with xperf!
19:40:45     INFO -  COUNTER 'nonmain_normal_netio': []
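Note that the failure is silent: the browser still shows `exit 0` and the counter is simply empty. A minimal sketch of a guard that would flag this, assuming the Talos log keeps the `COUNTER 'name': [values]` format shown above; `parse_counters` and `empty_counters` are hypothetical helpers, not part of Talos.

```python
import re

# Hypothetical guard (not part of Talos). Matches log lines such as
#   COUNTER 'nonmain_normal_netio': [78522413.0]
COUNTER_RE = re.compile(r"COUNTER '(?P<name>[^']+)': \[(?P<values>[^\]]*)\]")


def parse_counters(log_text):
    """Return {counter_name: [float, ...]} for every COUNTER line in the log."""
    counters = {}
    for match in COUNTER_RE.finditer(log_text):
        raw = match.group("values").strip()
        # An empty bracket pair means xperf produced no data for this counter.
        values = [float(v) for v in raw.split(",")] if raw else []
        counters[match.group("name")] = values
    return counters


def empty_counters(log_text):
    """Names of counters that were reported with no values at all."""
    return sorted(name for name, values in parse_counters(log_text).items() if not values)


passing = "11:05:42     INFO -  COUNTER 'nonmain_normal_netio': [78522413.0]"
failing = "19:40:45     INFO -  COUNTER 'nonmain_normal_netio': []"

print(parse_counters(passing))  # {'nonmain_normal_netio': [78522413.0]}
print(empty_counters(failing))  # ['nonmain_normal_netio']
```

A job could turn a non-empty result from `empty_counters` into a test failure, so a broken xperf setup shows up immediately instead of as a stream of zero values.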


In order to access and run xperf we need administrator privileges since this is pulling counters from the core of the operating system.

:pmoore, is there a way that we can fix generic-worker to allow access to xperf?

[1] https://taskcluster-artifacts.net/BV-MNOHVRGe4nKmTY8tUfw/0/public/logs/live_backing.log
[2] https://taskcluster-artifacts.net/V_0G_mLnRnK_ODaQ0irV2g/0/public/logs/live_backing.log
Flags: needinfo?(pmoore)
Hi Joel,

It is possible to make the task user a member of the Administrators group (see "osGroups" in https://docs.taskcluster.net/reference/workers/generic-worker/docs/payload), but it isn't possible (yet) to run an elevated process; see bug 1439588.

I suspect adding Administrators to "osGroups" won't be enough, but it is worth a try.
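A minimal sketch of that suggestion, assuming the payload schema linked above: a Python dict, serialized to JSON, whose `osGroups` adds the task user to Administrators. The helper name `xperf_payload`, the command string, and the `maxRunTime` value are hypothetical placeholders, not the real Talos xperf task definition.

```python
import json


def xperf_payload(command, max_run_time=3600):
    """Build a minimal generic-worker payload dict (hypothetical helper)."""
    return {
        # On Windows, "command" is a list of command lines run in sequence.
        "command": [command],
        # Maximum task run time, in seconds.
        "maxRunTime": max_run_time,
        # Adds the task user to the Administrators group. Per this bug, that
        # does not by itself make the task process elevated.
        "osGroups": ["Administrators"],
    }


payload = xperf_payload("run-talos-xperf.bat")  # hypothetical command
print(json.dumps(payload, indent=2))
```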

If that doesn't work, we'll need to implement bug 1439588. If that takes some time, we could create a custom worker type for these WPA tasks that uses the old 8.2.0 generic-worker release, until running elevated processes is supported.

Let me know if you have any other ideas.
Flags: needinfo?(pmoore)
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #0)

> In order to access and run xperf we need administrator privileges since this
> is pulling counters from the core of the operating system.

Indeed, looking at https://msdn.microsoft.com/en-us/library/ff190971.aspx?f=255&MSPPError=-2147217396 (step 2), it looks like an elevated process is required.
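Being in Administrators and running elevated are different things, so a task could check elevation up front and fail loudly rather than report an empty counter. A best-effort sketch, assuming Python's `ctypes`; `is_elevated` is a hypothetical helper, not something Talos or generic-worker provides. On Windows it calls `shell32.IsUserAnAdmin`, which in practice returns false for a non-elevated process under UAC even when the user belongs to Administrators.

```python
import ctypes
import os
import sys


def is_elevated():
    """Best-effort check that the current process has administrative rights.

    Windows: IsUserAnAdmin() reports whether the process token actually carries
    the Administrators group; a non-elevated process under UAC gets a filtered
    token, so this returns False there. Other platforms: check for root.
    """
    if sys.platform == "win32":
        try:
            return bool(ctypes.windll.shell32.IsUserAnAdmin())
        except (AttributeError, OSError):
            return False
    return hasattr(os, "geteuid") and os.geteuid() == 0


print(is_elevated())
```

A harness could call `is_elevated()` before starting the xperf sessions and abort with a clear error when it returns `False`.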
Depends on: 1439588
Pete: I think this is done now, yes?
Assignee: nobody → pmoore
Flags: needinfo?(pmoore)
Indeed, this is fixed now, thanks for following up!
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(pmoore)
Resolution: --- → FIXED
Component: Generic-Worker → Workers