Bug 1459203 Comment 10 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

(In reply to Miles Crabill [:miles] [also mcrabill@mozilla.com] from comment #8)

> I started looking into this and have a few questions:
> 
> 1. The way that logging is handled currently, it looks like we log to a file `generic-worker-service.log` adjacent to the bat script / exe. NSSM comes with some niceties (log rotation) but the go example uses a default eventlog. We can keep logging to a file and still use eventlog, but I don't think we get rotation for free. Do we need the custom logfile? Should we just use the system eventlog?
>

I wasn't aware that the current nssm integration is writing to the system event log, but funnily enough this has been requested in bug 1436274. If it is relatively straightforward to write to the event log, indeed we may wish to stop writing to the local log file, although I think we should make sure that when manually running the worker we can still write to standard out/standard error, e.g. so that you don't need to inspect the event log to find the cause of a unit test failure, etc. We would want to make sure that the generic-worker CI task logs contain the complete output of the worker, for example.

> 2. Do we need the bat script? [0] We create a bat script that essentially wraps the generic-worker.exe, passes arguments, and redirects output. My impression is that we can just include those arguments in the service creation instead, eliminating the need for the script. (though I'm not sure how input redirection works in this case)

The bat script was created mostly for the purposes of customisation. In OpenCloudConfig, [this version](https://github.com/mozilla-releng/OpenCloudConfig/blob/106f2aba497dc7adc96681dec581639723999d6c/userdata/Configuration/GenericWorker/run-generic-worker-and-reboot.bat) is swapped out with the version generated by the install command. If we were to remove it, we'd need to make sure that RelOps still had a way to customise this behaviour, or we should look at integrating their customised version, if it makes sense for it to be standard.

> 
> 3. It is mentioned here [1] that the service is updated if already installed. My impression is that doing this in Go means removing and reinstalling the service - does that sound reasonable? Do we care if the service is already installed? (i.e. do we actually want to reinstall it?) Really what we want is for the generic-worker.exe to by _up to date_, right?

This is a good question. I think on cloud instances we would never need to install more than once; if we are making system changes, we create a fresh instance and install from scratch. This isn't the case on all hardware workers - I believe they are typically not reimaged when they are upgraded. Therefore I think it probably does make sense to have an option to uninstall the worker.

In general, I think even if it is more work, providing the option to uninstall the worker would be a good thing, and we should ship that with the worker. Admittedly we don't provide a command line option for explicitly uninstalling the worker at the moment (the user would need to run something like `nssm remove 'Generic Worker'`) but once we stop using nssm, I think we should probably include a `generic-worker uninstall` target or similar.

> 4. There's a `configFile` variable passed to `deployService`. Best I can tell, it is unused. Any idea what it's there for? The worker itself obviously takes a config file, but `generic-worker service install --config $filename` doesn't seem to set that up (i.e. setting up the bash script to use the config file).
> 

Whoops, indeed that is a bug! Indeed, it should set the config file to use in the `run-generic-worker.bat` script, but at the moment it isn't passing it through to the `CreateRunGenericWorkerBatScript` method.

> 
> I have some WIP (untested because I don't have a working dev environment yet) code up here: https://github.com/taskcluster/generic-worker/compare/master...milescrabill:no-more-nssm-20190430#diff-04c6e90faac2675aa89e2176d2eec7d8

Great, I'll take a look!

> [0] https://github.com/taskcluster/generic-worker/blob/v14.1.0/plat_windows.go#L490-L520
> [1] https://github.com/taskcluster/generic-worker/blob/v14.1.0/plat_windows.go#L548
> 
> PS: You answered my 5th question (should I clean up the worker_types) in flight! Thanks! :)
(In reply to Miles Crabill [:miles] [also mcrabill@mozilla.com] from comment #8)

> I started looking into this and have a few questions:
> 
> 1. The way that logging is handled currently, it looks like we log to a file `generic-worker-service.log` adjacent to the bat script / exe. NSSM comes with some niceties (log rotation) but the go example uses a default eventlog. We can keep logging to a file and still use eventlog, but I don't think we get rotation for free. Do we need the custom logfile? Should we just use the system eventlog?
>

I wasn't aware that the current nssm integration is writing to the system event log, but funnily enough this has been requested in bug 1436274. If it is relatively straightforward to write to the event log, indeed we may wish to stop writing to the local log file, although I think we should make sure that when manually running the worker we can still write to standard out/standard error, e.g. so that you don't need to inspect the event log to find the cause of a unit test failure, etc. We would want to make sure that the generic-worker CI task logs contain the complete output of the worker, for example.

> 2. Do we need the bat script? [0] We create a bat script that essentially wraps the generic-worker.exe, passes arguments, and redirects output. My impression is that we can just include those arguments in the service creation instead, eliminating the need for the script. (though I'm not sure how input redirection works in this case)

The bat script was created mostly for the purposes of customisation. In OpenCloudConfig, [this version](https://github.com/mozilla-releng/OpenCloudConfig/blob/106f2aba497dc7adc96681dec581639723999d6c/userdata/Configuration/GenericWorker/run-generic-worker-and-reboot.bat) is swapped out with the version generated by the install command. If we were to remove it, we'd need to make sure that RelOps still had a way to customise this behaviour, or we should look at integrating their customised version, if it makes sense for it to be standard.

> 
> 3. It is mentioned here [1] that the service is updated if already installed. My impression is that doing this in Go means removing and reinstalling the service - does that sound reasonable? Do we care if the service is already installed? (i.e. do we actually want to reinstall it?) Really what we want is for the generic-worker.exe to by _up to date_, right?

This is a good question. I think on cloud instances we would never need to install more than once; if we are making system changes, we create a fresh instance and install from scratch. This isn't the case on all hardware workers - I believe they are typically not reimaged when they are upgraded. Therefore I think it probably does make sense to have an option to uninstall the worker.

In general, I think even if it is more work, providing the option to uninstall the worker would be a good thing, and we should ship that with the worker. Admittedly we don't provide a command line option for explicitly uninstalling the worker at the moment (the user would need to run something like `nssm remove 'Generic Worker'`) but once we stop using nssm, I think we should probably include a `generic-worker uninstall` target or similar.

Note, I agree that probably the only thing that changes when we install a newer version of a worker is that the generic-worker binary changes (so just swapping out the generic-worker.exe binary might be enough) but we can't easily guarantee that we won't make changes to the install footprint, so having an explicit uninstall would be nice to capture this. It also allows people who wish to play with the worker locally on a Windows machine to uninstall it after they no longer wish to use the worker / have it installed on their system.

> 4. There's a `configFile` variable passed to `deployService`. Best I can tell, it is unused. Any idea what it's there for? The worker itself obviously takes a config file, but `generic-worker service install --config $filename` doesn't seem to set that up (i.e. setting up the bash script to use the config file).
> 

Whoops, indeed that is a bug! Indeed, it should set the config file to use in the `run-generic-worker.bat` script, but at the moment it isn't passing it through to the `CreateRunGenericWorkerBatScript` method.

> 
> I have some WIP (untested because I don't have a working dev environment yet) code up here: https://github.com/taskcluster/generic-worker/compare/master...milescrabill:no-more-nssm-20190430#diff-04c6e90faac2675aa89e2176d2eec7d8

Great, I'll take a look!

> [0] https://github.com/taskcluster/generic-worker/blob/v14.1.0/plat_windows.go#L490-L520
> [1] https://github.com/taskcluster/generic-worker/blob/v14.1.0/plat_windows.go#L548
> 
> PS: You answered my 5th question (should I clean up the worker_types) in flight! Thanks! :)

Back to Bug 1459203 Comment 10