Closed Bug 1831158 Opened 2 years ago Closed 7 months ago

Test Windows worker pool with generic-worker 49.1.2 for testing bug 1815711

Categories

(Infrastructure & Operations :: RelOps: Windows OS, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pmoore, Assigned: markco)

References

Details

(Whiteboard: [relops-azure])

Attachments

(1 file, 2 obsolete files)

In order to test the work done in bug 1815711 it will be useful to have a test pool created.

Depends on: 1815711
Assignee: nobody → mcornmesser
Whiteboard: [relops-azure]

pmoore: would you like this pool to be Win 10 or Win 11?

Flags: needinfo?(pmoore)

This should be the same as test-windows11-64-2009-qr/opt-mochitest-media-wmfme but with an updated generic-worker/livelog/taskcluster-proxy/start-worker (49.1.2 or later).

Original task that fails with an older generic-worker: https://firefox-ci-tc.services.mozilla.com/tasks/PnVQXgGdSQiwfMkgRCBEng

Once the new pool is set up, I believe running this task against it should confirm whether the updated generic-worker fixes the issue.

The current production pool (above) is running generic-worker version 44.13.6.

Thanks Mark!

Flags: needinfo?(pmoore)
Pushed by mcornmesser@mozilla.com: https://hg.mozilla.org/ci/ci-configuration/rev/8ab3c61dd26a Add win 11 alpha pool for Taskcluster use. r=jmoss
Attachment #9332958 - Attachment is obsolete: true

gecko-t/win11-64-2009-tc-alpha has been set up for this.

Process to get the pool up.

The first step is to set up the changes to be tested in Ronin Puppet. This can be done either by creating a new branch in the main repo or by forking the repo. In this case I created a new branch, bug1831158.

Various worker pools are based on the same Puppet role catalogue, typically grouped by OS version and location. This image is based on the win11642009azure role. Software versions for a role are stored in a YAML file named after the role, found under /data/roles within the repo.

For this the file win11642009azure.yaml was updated to reflect the 49.1.2 version.

---

win-worker:
  # generic-worker
  generic_worker:
    # File versions
    name: "generic-worker-multiuser-windows-amd64"
    version: '49.1.2'

  taskcluster:
    worker_runner:
        name: 'start-worker-windows-amd64'
        version: '49.1.2'
        provider: "azure"
        implementation: "generic-worker"
    proxy:
        name: "taskcluster-proxy-windows-amd64"
        version: '49.1.2'
    # Referencing the file directly with the version in it,
    # since there is no programmatic way to check the version of the livelog exe
    livelog:
        name: "livelog-windows-amd64"
        version: '49.1.2'
    client_id: "azure/gecko-t/win11-64-2009"
    worker_group: "test"
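Since four components have to move to the same release together, a quick check that no version field was missed can be useful. The following is only a sketch: the role data is copied by hand into a Python dict (an assumption made so that no YAML library is needed), and the helper names `collect_versions` and `mismatched` are hypothetical, not part of Ronin Puppet.

```python
# Hand-copied mirror of the version fields in win11642009azure.yaml.
# Using a plain dict is an assumption of this sketch; the real data is YAML.
ROLE_DATA = {
    "win-worker": {
        "generic_worker": {
            "name": "generic-worker-multiuser-windows-amd64",
            "version": "49.1.2",
        },
        "taskcluster": {
            "worker_runner": {
                "name": "start-worker-windows-amd64",
                "version": "49.1.2",
            },
            "proxy": {
                "name": "taskcluster-proxy-windows-amd64",
                "version": "49.1.2",
            },
            "livelog": {
                "name": "livelog-windows-amd64",
                "version": "49.1.2",
            },
        },
    }
}


def collect_versions(node, path=()):
    """Return {dotted.path: version} for every mapping that has a 'version' key."""
    found = {}
    if isinstance(node, dict):
        if "version" in node:
            found[".".join(path)] = node["version"]
        for key, value in node.items():
            found.update(collect_versions(value, path + (key,)))
    return found


def mismatched(versions, expected):
    """Return only the components whose version differs from the expected one."""
    return {name: v for name, v in versions.items() if v != expected}


versions = collect_versions(ROLE_DATA)
# An empty dict means every component is on the expected release.
print(mismatched(versions, "49.1.2"))
```

If one of the entries were left on an older release, it would show up in the printed dict under its dotted path.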

The one part that needs to be done by Release SRE is uploading the needed files to the appropriate Azure blob.

The next step is generating an image. This is currently done through cloud-image-builder with these instructions.

For this, a new YAML file was created: packer/config/win11-64-2009-tc-alpha.yaml. Each YAML file in this config directory corresponds to an image. When creating a new image based on an OS that is already in use, the existing YAML file can be copied, but the copy will need a few updates.

Under azure.locations, select the regions in which the image will be built. The vm.tags values will also need to be updated.

vm:
    size: Standard_F8s_v2
    tags:
        base_image: win11642009azure      # This needs to reflect the role name in Ronin Puppet
        worker_pool_id: win11-64-2009-tc  # This is reflected in the image name
        sourceOrganisation: mozilla-platform-ops
        sourceRepository: ronin_puppet
        sourceBranch: bug1831158          # The "source" values correspond to which repo is cloned and applied
        deploymentId: 9e9e219             # This indicates which revision of the repo is to be cloned
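To make the role of the "source" tags and deploymentId concrete, here is a rough sketch of the git commands they imply. It is not how cloud-image-builder is implemented: the helper `checkout_commands` is hypothetical, and the GitHub host is an assumption, since the tags only name an organisation, a repository, a branch, and a revision.

```python
import shlex

# Tag values copied from the vm.tags block above.
TAGS = {
    "sourceOrganisation": "mozilla-platform-ops",
    "sourceRepository": "ronin_puppet",
    "sourceBranch": "bug1831158",
    "deploymentId": "9e9e219",
}


def checkout_commands(tags, host="https://github.com"):
    """Build the clone and checkout commands implied by the tags.

    The host default is an assumption of this sketch.
    """
    url = f"{host}/{tags['sourceOrganisation']}/{tags['sourceRepository']}.git"
    return [
        # Clone the branch named by sourceBranch.
        shlex.join(["git", "clone", "--branch", tags["sourceBranch"], url]),
        # Pin the working copy to the revision named by deploymentId.
        shlex.join(
            ["git", "-C", tags["sourceRepository"], "checkout", tags["deploymentId"]]
        ),
    ]


for command in checkout_commands(TAGS):
    print(command)
```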

When adding a new image/yaml file the name needs to be added to an array in /create-image-build-tasks.py. This is done so that Packer is used for this image instead of the default tools in the repo.

The image build is triggered off of syntax in the commit message. Details on the syntax can be found in the instructions above. For this the syntax looked like: git commit -m "win 11 tc pool" -m "include environments: production" -m "include keys:win11-64-2009-tc-alpha"
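The commit above can also be assembled programmatically, which avoids shell quoting mistakes when iterating on builds. This is a minimal sketch that reproduces only the single-environment, single-key form shown above; the helper `build_commit_command` is hypothetical, and the spacing after each colon is copied from the example rather than taken from the cloud-image-builder instructions.

```python
import shlex


def build_commit_command(summary, environment, key):
    """Return a shell-quoted git commit command with the two trigger paragraphs."""
    parts = [
        "git", "commit",
        "-m", summary,
        "-m", f"include environments: {environment}",
        # No space after the colon here, matching the example in this bug.
        "-m", f"include keys:{key}",
    ]
    return shlex.join(parts)


command = build_commit_command(
    "win 11 tc pool", "production", "win11-64-2009-tc-alpha"
)
print(command)
```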

It produces an image named: win11-64-2009-tc-centralus-win11-22h2-avd-alpha

Note that images with -alpha are set up to overwrite the existing image each time they are built. This allows iterating on tests without updating ci-configuration, which would require a review and can slow down testing.

To make a new image available in ci-configuration, an image definition needs to be added to worker-images.yml. For this it looked like:

ronin-t-windows11-64-2009-tc-alpha:    # This will be the image name referenced in the pool config.
  azure2:    # This reflects the worker-manager provider name. ci-admin will determine how to build the config based on this value.
    deployment_id: alpha    # Indicates that alpha is used instead of deploymentId in the image name.
    central-us: win11-64-2009-tc-centralus-win11-22h2-avd    # This is the root of the image name. Must have one for each region.
    east-us: win11-64-2009-tc-eastus-win11-22h2-avd
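The relationship between the per-region root and the final image name can be sketched as follows. The joining rule (root, a hyphen, then deployment_id) is inferred from the image name win11-64-2009-tc-centralus-win11-22h2-avd-alpha quoted earlier; how ci-admin actually builds the name is not shown in this bug, and `image_names` is a hypothetical helper.

```python
# Provider section of the image definition, copied from worker-images.yml above.
IMAGE_DEFINITION = {
    "deployment_id": "alpha",
    "central-us": "win11-64-2009-tc-centralus-win11-22h2-avd",
    "east-us": "win11-64-2009-tc-eastus-win11-22h2-avd",
}


def image_names(definition):
    """Map each region to '<root>-<deployment_id>' (assumed joining rule)."""
    suffix = definition["deployment_id"]
    return {
        region: f"{root}-{suffix}"
        for region, root in definition.items()
        if region != "deployment_id"
    }


for region, name in image_names(IMAGE_DEFINITION).items():
    print(region, name)
```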

Additionally, a pool configuration is added to worker-pools.yml. For this pool I copied the configuration for gecko-t/win11-64-2009 and updated it as indicated below.

  - pool_id: gecko-t/win11-64-2009-tc-alpha  # New pool name.
    description: ''
    owner: relops-azure-provisioning@mozilla.com
    email_on_error: true
    provider_id: azure2
    config:
      image: ronin-t-windows11-64-2009-tc-alpha  # Update to reflect the image definition name in worker-images.yml.
      image_resource_group: rg-packer-through-cib
      implementation: generic-worker/worker-runner-windows
      worker-purpose: gecko-t
      locations: [central-us, east-us]  # Tells which regions to serve. Must reflect the regions listed in worker-images.yml.
      maxCapacity: 100
      worker-config:
        genericWorker:
          config:
            workerType: win11-64-2009-tc-alpha  # Updated with the workerType
            provisionerId: gecko-t
            idleTimeoutSecs: 5400
      tags:
        sourceScript: provisioners/windows/azure/azure-bootstrap.ps1
        sourceBranch: bug1831158  # Provide info for source repo so tags reflect where the config came from.
        sourceRepository: ronin_puppet
        sourceOrganisation: mozilla-platform-ops
        sourceRepository: ronin_puppet
        sourceOrganisation: mozilla-platform-ops
      spot: true
      vmSizes:
        - vmSize: Standard_F8s_v2
          launchConfig:
            osProfile:
              windowsConfiguration:
                timeZone: UTC
                enableAutomaticUpdates: false
            storageProfile:
              osDisk:
                osType: Windows
                createOption: FromImage
              dataDisks:
                - lun: 0
                  diskSizeGB: 30
                  managedDisk:
                    storageAccountType: Standard_LRS
                  createOption: Empty
            hardwareProfile:
              vmSize: Standard_F8s_v2
            diagnosticsProfile:
              bootDiagnostics:
                enabled: false
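The pool config has to agree with the image definition in several places (image name, provider, regions), and the pool ID has to match provisionerId/workerType. A small consistency check along these lines can catch a mismatch before review. This is a sketch only: the dicts are trimmed, hand-copied mirrors of the two files, and `check_pool` is a hypothetical helper, not part of ci-admin.

```python
# Trimmed mirror of the worker-images.yml entry.
WORKER_IMAGES = {
    "ronin-t-windows11-64-2009-tc-alpha": {
        "azure2": {
            "deployment_id": "alpha",
            "central-us": "win11-64-2009-tc-centralus-win11-22h2-avd",
            "east-us": "win11-64-2009-tc-eastus-win11-22h2-avd",
        }
    }
}

# Trimmed mirror of the worker-pools.yml entry.
POOL = {
    "pool_id": "gecko-t/win11-64-2009-tc-alpha",
    "provider_id": "azure2",
    "config": {
        "image": "ronin-t-windows11-64-2009-tc-alpha",
        "locations": ["central-us", "east-us"],
        "worker-config": {
            "genericWorker": {
                "config": {
                    "workerType": "win11-64-2009-tc-alpha",
                    "provisionerId": "gecko-t",
                }
            }
        },
    },
}


def check_pool(pool, images):
    """Return a list of human-readable problems; an empty list means consistent."""
    config = pool["config"]
    image = images.get(config["image"])
    if image is None:
        return [f"image {config['image']!r} is not defined"]

    problems = []
    regions = image.get(pool["provider_id"])
    if regions is None:
        problems.append(f"image has no entry for provider {pool['provider_id']!r}")
    else:
        for location in config["locations"]:
            if location not in regions:
                problems.append(f"no image root for location {location!r}")

    worker = config["worker-config"]["genericWorker"]["config"]
    expected_id = f"{worker['provisionerId']}/{worker['workerType']}"
    if pool["pool_id"] != expected_id:
        problems.append(f"pool_id should be {expected_id!r}")
    return problems


print(check_pool(POOL, WORKER_IMAGES))
```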
Blocks: 1815711
No longer depends on: 1815711

(In reply to Pete Moore [:pmoore][:pete] from comment #2)

I've triggered a new task the same as the one above, but using the new pool Mark has set up:

(In reply to Pete Moore [:pmoore][:pete] from comment #8)

Hit an issue, next attempt!

I can see the "ALL RESTRICTED APPLICATION PACKAGES" SID (amongst many others) on the winstation and desktop now, so it looks like the fix works as expected.
Mark, are you able to arrange for us to use the new worker version on our normal worker pools?

Flags: needinfo?(mcornmesser)
Depends on: 1834580

Mark, are you able to arrange for us to use the new worker version on our normal worker pools?

Yes. We are currently troubleshooting some issues with the Windows testers, RELOPS-562. We can look at updating once those are addressed.

Flags: needinfo?(mcornmesser)
Attachment #9386947 - Attachment is obsolete: true
Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED