Closed Bug 1402120 Opened 8 years ago Closed 8 years ago

Screen resolution changes applied in gecko-t-win10-64 tasks persist into future tasks

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jmaher, Unassigned)

It appears that if I change the screen resolution on a win10 VM image, the change persists across jobs, so later jobs randomly run at one resolution or another. This is not good, as a single try push can cause massive test failures on all branches. While I don't have a patch ready for review (bug 1401501), I am concerned about developing this further as well as making progress on the win10 migration work. :garndt, is there any work we can do to reduce the damage from a hacker like me?
Flags: needinfo?(garndt)
Blocks: 1401501
I would assume that this happens because no other job is using the screen resolution script to change it back to a value it requires? As such the modified resolution is left as is. Maybe jobs which change the default resolution should revert it afterward? I feel that would be a good clean-up.
I think there are two things here:
1. Jobs that require a certain resolution should ensure that the environment is that way rather than making assumptions about it.
2. Settings changes like this should not persist between task runs. I thought that the entire user environment was isolated on these machines, but perhaps that applies only to particular user settings, and resolution changes are treated differently? ni? pmoore on this.
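For point 1, a task-side check might be sketched as follows in Python. This is an assumption-laden illustration: `ensure_resolution` and the two callables it takes are hypothetical names, the in-memory `display` dict replaces the real display, and the resolution values are arbitrary.

```python
def ensure_resolution(required, get_resolution, set_resolution):
    """Make sure the display is at `required` before tests start.

    Returns True if a change was needed, False if it was already correct.
    Raises RuntimeError if the change did not take effect.
    `get_resolution` and `set_resolution` are hypothetical callables.
    """
    if get_resolution() == required:
        return False
    set_resolution(required)
    actual = get_resolution()
    if actual != required:
        raise RuntimeError(f"wanted {required}, display reports {actual}")
    return True


# In-memory stand-in: pretend an earlier task left a different resolution.
display = {"resolution": (1920, 1080)}

changed = ensure_resolution(
    (1280, 1024),
    lambda: display["resolution"],
    lambda value: display.update(resolution=value),
)
```

Checking the result after setting it means a silent failure to change the display surfaces as one clear error rather than as scattered test failures.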
Flags: needinfo?(garndt) → needinfo?(pmoore)
Note that for #2, I was referring specifically to worker types that create a new user for each task. I'm making a (possibly incorrect) assumption that the user can change resolution and that after the task is complete that user environment is destroyed. That is what I would like clarification on. On a different note, there are worker types that use a global user for all tasks, very similar to how buildbot executes tasks. I am not sure what releng has put into place on buildbot machines to ensure a consistent environment between task runs, but perhaps something like that is needed here? Knowing what the consistent environment should be, and what a task might have changed, is a harder question to answer.
(In reply to Greg Arndt [:garndt] from comment #2)
> I think there are two things here:
> 1. jobs that require a certain resolution should ensure that the environment
> is that way and not making assumptions about it

Agreed.

> 2. settings changes like this should not persist between task runs. I
> thought that the entire user environment was isolated on these machines, but
> perhaps that's particular user settings and resolution changes are treated
> differently? ni? pmoore on this

In the logs of the task that ran, we see that this worker type has been configured to run all tasks as the same user:
https://tools.taskcluster.net/groups/LiDMXkG-QmaM9I-fKxBe_g/tasks/NVc_ro25Rc2CjSg-jKh-Jw/runs/0/logs/public%2Flogs%2Flive.log#L14

There are several ways to solve this:

1) OCC currently adapts the run-generic-worker.bat script:
https://github.com/mozilla-releng/OpenCloudConfig/blob/3218a73e0f4a90fe009a9fec87a4767544da2cfd/userdata/Manifest/gecko-t-win10-64.json#L533
It could therefore explicitly set screen resolution in this script:
https://github.com/mozilla-releng/OpenCloudConfig/blob/master/userdata/Configuration/GenericWorker/run-generic-worker-format-and-reboot.bat

2) Setting of screen resolution could be added as a step to the OCC manifest, since the machine is rebooted between tasks, and the OCC manifest is applied on reboot:
https://github.com/mozilla-releng/OpenCloudConfig/blob/3218a73e0f4a90fe009a9fec87a4767544da2cfd/userdata/Manifest/gecko-t-win10-64.json

3) Presumably none of this is needed once we run tasks as separate users, which will be done in bug 1399401, since I assume (hope) screen resolution is a user-specific setting.
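Options 1 and 2 both amount to re-applying a known baseline on the worker side before each task starts. The small Python simulation below illustrates why that stops one task's change from leaking into the next. It is a toy model, not OCC or generic-worker code: `BASELINE`, `run_tasks` and the two task functions are hypothetical, and the resolution values are arbitrary.

```python
BASELINE = (1280, 1024)  # arbitrary value chosen for the illustration


def run_tasks(tasks, display, reset_before_each):
    """Run tasks in order on a shared `display`; record what each one saw."""
    seen = []
    for task in tasks:
        if reset_before_each:
            # Worker-side step, analogous to setting the resolution in the
            # startup script or manifest before the task begins.
            display["resolution"] = BASELINE
        seen.append(display["resolution"])
        task(display)
    return seen


def changes_resolution(display):
    """A task that changes the resolution and never restores it."""
    display["resolution"] = (1920, 1080)


def plain_test(display):
    """A task that assumes the default resolution and changes nothing."""


tasks = [changes_resolution, plain_test]
without_reset = run_tasks(tasks, {"resolution": BASELINE}, reset_before_each=False)
with_reset = run_tasks(tasks, {"resolution": BASELINE}, reset_before_each=True)
```

Without the reset, the second task starts at the resolution the first one left behind; with it, every task starts from the baseline regardless of how the previous task exited.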
Flags: needinfo?(pmoore)
Assignee: nobody → relops
Component: Generic-Worker → RelOps
Product: Taskcluster → Infrastructure & Operations
QA Contact: arich
(In reply to Pete Moore [:pmoore][:pete] from comment #4)
> In the logs of the task that ran, we see that this worker type has been
> configured to run all tasks as the same user:

Good catch, I didn't notice that setting in the logs. I thought the worker types that run as current user had "-cu" appended to their worker type name. I'll make a mental note of this.
Assignee: relops → rthijssen
Summary: I have been working on adjusting the screen resolution for the windows 10 vm image we use and have broken all the things → Screen resolution changes applied in gecko-t-win10-64 tasks persist into future tasks
(In reply to Greg Arndt [:garndt] from comment #5)
> (In reply to Pete Moore [:pmoore][:pete] from comment #4)
> > In the logs of the task that ran, we see that this worker type has been
> > configured to run all tasks as the same user:
>
> Good catch, I didn't notice that setting in the logs. I thought the worker
> types that run as current user had "-cu" appended to their worker type name.
> I'll make a mental note of this.

That's just the taskcluster worker types used for internal taskcluster-CI of our workers, not the worker types managed by Release Operations for gecko.
in bug 1401501, mozharness was patched to set resolution for each test task.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
If we change the resolution in one task on try in the future, it will break tests on all branches. If we are going to support self serve setup, we need to support isolation. I know in a perfect world it would be reset or cleaned up at the end of the task, but often we have exceptions or coding errors which cause early termination, or timeouts, and these lead to different exit paths.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
isolation of tasks is supported in the latest generic worker. the bug to track upgrading of generic worker on windows 7 and 10 is bug 1399401. when that work is complete, the problem identified here just goes away. self serve loaners are terminated after the loaner is abandoned so can't be returned to the build/test ci pool. there's not really much we can do with generic worker 8 on the testers currently as it's only used while the effort to upgrade is still in progress, hence the wontfix.
Assignee: rthijssen → relops
got it.
Status: REOPENED → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX