Closed
Bug 1496526
Opened 6 years ago
Closed 6 years ago
Generic-worker service is not being installed on newly deployed Windows moonshot nodes
Categories
(Infrastructure & Operations :: RelOps: OpenCloudConfig, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: markco, Assigned: markco)
Attachments
(1 file, 1 obsolete file)
58 bytes, patch | pmoore: review+
https://bugzilla.mozilla.org/show_bug.cgi?id=1493759#c6
(In reply to Zsolt Fay [:zfay] from comment #6)
> Found a few workers that aren't in TC, re-imaging them changes nothing in
> their state and the logs remain the same.
> T-W1064-MS-159 < rebooted, reimaged, GW is not running
> T-W1064-MS-211 < reimaged, GW is not running
> T-W1064-MS-214 < reimaged, GW is not running
> T-W1064-MS-478 < reimaged, GW is not running
> T-W1064-MS-543 < reimaged, GW is not running
> T-W1064-MS-581 < reimaged, GW is not running
> T-W1064-MS-589 < reimaged, GW is not running
>
> All of the above share a similarity in the logs. None have GW running.
> Spotted these log entries in a few of them:
> https://papertrailapp.com/systems/1730894031/events?focus=984350109993177146&selected=984350109993177146
I did a fresh install on ms-016. The Papertrail log starts here: https://papertrailapp.com/groups/1141234/events?focus=984562529676201988&q=ms-016&selected=984562529676201988
The node deployed and ran OCC, but it keeps rebooting with this message:
Oct 04 11:12:24 T-W1064-MS-016.mdc1.mozilla.com User32: The process C:\windows\system32\shutdown.exe (T-W1064-MS-016) has initiated the restart of computer T-W1064-MS-016 on behalf of user NT AUTHORITY\SYSTEM for the following reason: Application: Unresponsive Reason Code: 0x40005 Shutdown Type: restart Comment: reboot to rouse the generic worker#015
The generic-worker service is not present:
C:\Users\Administrator>sc queryex type= service state= all | find /i "generic"
C:\Users\Administrator>
And nssm has not been extracted to C:\:
C:\>dir
Volume in drive C is Windows
Volume Serial Number is 50AC-80E8
Directory of C:\
10/04/2018 04:37 PM <DIR> builds
10/04/2018 06:14 PM <DIR> dsc
10/04/2018 05:47 PM <DIR> generic-worker
10/04/2018 05:47 PM <DIR> hg-shared
10/04/2018 05:47 PM <SYMLINKD> home [C:\Users]
10/04/2018 04:38 PM <DIR> Intel
10/04/2018 06:29 PM <DIR> log
10/04/2018 05:47 PM <DIR> mozilla-build
03/18/2017 09:03 PM <DIR> PerfLogs
10/04/2018 05:47 PM <DIR> pip-cache
10/04/2018 05:47 PM <DIR> ProcessExplorer
10/04/2018 05:47 PM <DIR> ProcessMonitor
10/04/2018 05:51 PM <DIR> Program Files
10/04/2018 05:49 PM <DIR> Program Files (x86)
10/04/2018 05:47 PM <DIR> tooltool-cache
10/04/2018 04:34 PM <DIR> Users
10/04/2018 05:46 PM <DIR> Windows
0 File(s) 0 bytes
17 Dir(s) 44,975,190,016 bytes free
The logs show:
Oct 04 11:14:10 T-W1064-MS-016.mdc1.mozilla.com dsc-run: VERBOSE: [T-W1064-MS-016]: [[Script]CommandRun_NSSMInstall] Performing the operation "Set-TargetResource" on target "Executing the SetScript with the user supplied credential".#015
Oct 04 11:14:10 T-W1064-MS-016.mdc1.mozilla.com dsc-run: VERBOSE: [T-W1064-MS-016]: LCM: [ End Set ] [[Script]CommandRun_NSSMInstall] in 0.0100 seconds.#015
Oct 04 11:14:10 T-W1064-MS-016.mdc1.mozilla.com dsc-run: PowerShell DSC resource MSFT_ScriptResource failed to execute Set-TargetResource functionality with error message: #015
Oct 04 11:14:10 T-W1064-MS-016.mdc1.mozilla.com dsc-run: This command cannot be run due to the error: The system cannot find the file specified. #015
Oct 04 11:14:10 T-W1064-MS-016.mdc1.mozilla.com dsc-run: + CategoryInfo : InvalidOperation: (:) [], CimException#015
Oct 04 11:14:10 T-W1064-MS-016.mdc1.mozilla.com dsc-run: + FullyQualifiedErrorId : ProviderOperationExecutionFailure#015
Oct 04 11:14:10 T-W1064-MS-016.mdc1.mozilla.com dsc-run: + PSComputerName : localhost#015
Oct 04 11:14:10 T-W1064-MS-016.mdc1.mozilla.com dsc-run: PowerShell DSC resource MSFT_ScriptResource failed to execute#015
Oct 04 11:14:10 T-W1064-MS-016.mdc1.mozilla.com dsc-run: Set-TargetResource functionality with error message: This command cannot be#015
Oct 04 11:14:10 T-W1064-MS-016.mdc1.mozilla.com dsc-run: run due to the error: The system cannot find the file specified.#015
Oct 04 11:14:10 T-W1064-MS-016.mdc1.mozilla.com dsc-run: + CategoryInfo : InvalidOperation: (:) [], CimException#015
Oct 04 11:14:10 T-W1064-MS-016.mdc1.mozilla.com dsc-run: + FullyQualifiedErrorId : ProviderOperationExecutionFailure#015
Oct 04 11:14:10 T-W1064-MS-016.mdc1.mozilla.com dsc-run: + PSComputerName : localhost#015
Oct 04 11:14:10 T-W1064-MS-016.mdc1.mozilla.com dsc-run: #015
But the file is downloaded:
Oct 04 11:14:10 T-W1064-MS-016.mdc1.mozilla.com dsc-run: VERBOSE: [T-W1064-MS-016]: [[Log]Log_FileDownload_NSSMDownload] FileDownload: NSSMDownload, completed
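One way to confirm which DSC resource a failure belongs to is to pair each "failed to execute" message with the most recent "LCM: [ End Set ]" line before it. The following is a minimal Python sketch, assuming the Papertrail lines have been saved as plain text. The function name failing_resources and the adjacency heuristic are illustrative assumptions, not part of OCC or DSC.

```python
import re

# Matches DSC resource identifiers such as [[Script]CommandRun_NSSMInstall].
RESOURCE = re.compile(r"\[\[(\w+)\](\w+)\]")
FAILURE = "failed to execute"


def failing_resources(lines):
    """Return names of resources whose 'End Set' line most recently preceded a failure.

    Heuristic only (an assumption): the failure message is taken to follow
    the End Set line of the resource that caused it, as in the excerpt above.
    """
    last_resource = None
    failed = []
    for line in lines:
        if "LCM:" in line and "End Set" in line:
            match = RESOURCE.search(line)
            if match:
                last_resource = match.group(2)
        elif FAILURE in line and last_resource and last_resource not in failed:
            # The same failure is often logged twice; record each resource once.
            failed.append(last_resource)
    return failed


sample = [
    "dsc-run: VERBOSE: [T-W1064-MS-016]: LCM: [ End Set ] [[Script]CommandRun_NSSMInstall] in 0.0100 seconds.",
    "dsc-run: PowerShell DSC resource MSFT_ScriptResource failed to execute Set-TargetResource functionality with error message:",
    "dsc-run: This command cannot be run due to the error: The system cannot find the file specified.",
    "dsc-run: PowerShell DSC resource MSFT_ScriptResource failed to execute",
]
print(failing_resources(sample))
```

Run over a full log, this would give a short list of resources to inspect first, though any match still needs to be checked against the surrounding lines.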
Assignee
Updated•6 years ago
Assignee: nobody → mcornmesser
Assignee
Comment 1•6 years ago
There are multiple errors caused by a file or path not being found:
Oct 04 21:40:17 T-W1064-MS-211.mdc1.mozilla.com dsc-run: VERBOSE: [T-W1064-MS-211]: LCM: [ End Set ] [[Script]CommandRun_OpenSshUnzip] in 0.0100 seconds.#015
Oct 04 21:40:17 T-W1064-MS-211.mdc1.mozilla.com dsc-run: PowerShell DSC resource MSFT_ScriptResource failed to execute Set-TargetResource functionality with error message: #015
Oct 04 21:40:17 T-W1064-MS-211.mdc1.mozilla.com dsc-run: This command cannot be run due to the error: The system cannot find the file specified. #015
Oct 04 21:40:17 T-W1064-MS-211.mdc1.mozilla.com dsc-run: + CategoryInfo : InvalidOperation: (:) [], CimException#015
I also found an error specific to OCC-Validate:
Oct 04 21:40:18 T-W1064-MS-543.mdc2.mozilla.com dsc-run: WARNING: [T-W1064-MS-543]: [[Script]InstallSupportingModules] The names of some imported commands from the module 'OCC-Validate' include unapproved verbs that might make them less discoverable. To find the commands with unapproved verbs, run the Import-Module command again with the Verbose parameter. For a list of approved verbs, type Get-Verb.#015
Oct 04 21:40:18 T-W1064-MS-543.mdc2.mozilla.com dsc-run: VERBOSE: [T-W1064-MS-543]: [[Script]InstallSupportingModules] The 'Log-Validation' command in the OCC-Validate' module was imported, but because its name does not include an approved verb, it might be difficult to find. For a list of approved verbs, type Get-Verb.#015
Oct 04 21:40:18 T-W1064-MS-543.mdc2.mozilla.com dsc-run: VERBOSE: [T-W1064-MS-543]: [[Script]InstallSupportingModules] Importing function 'Log-Validation'.#015
Oct 04 21:40:18 T-W1064-MS-543.mdc2.mozilla.com dsc-run: VERBOSE: [T-W1064-MS-543]: [[Script]InstallSupportingModules] The 'Validate-All' command in the OCC-Validate' module was imported, but because its name does not include an approved verb, it might be difficult to find. For a list of approved verbs, type Get-Verb.#015
Oct 04 21:40:18 T-W1064-MS-543.mdc2.mozilla.com dsc-run: VERBOSE: [T-W1064-MS-543]: [[Script]InstallSupportingModules] Importing function 'Validate-All'.#015
Oct 04 21:40:18 T-W1064-MS-543.mdc2.mozilla.com dsc-run: VERBOSE: [T-W1064-MS-543]: [[Script]InstallSupportingModules] The 'Validate-CommandsReturnOrNotRequested' command in the OCC-Validate' module was imported, but because its name does not include an approved verb, it might be difficult to find. For a list of approved verbs, type Get-Verb.#015
Oct 04 21:40:18 T-W1064-MS-543.mdc2.mozilla.com dsc-run: VERBOSE: [T-W1064-MS-543]: [[Script]InstallSupportingModules] Importing function 'Validate-CommandsReturnOrNotRequested'.#015
Oct 04 21:40:18 T-W1064-MS-543.mdc2.mozilla.com dsc-run: VERBOSE: [T-W1064-MS-543]: [[Script]InstallSupportingModules] The 'Validate-FilesContainOrNotRequested' command in the OCC-Validate' module was imported, but because its name does not include an approved verb, it might be difficult to find. For a list of approved verbs, type Get-Verb.#015
Oct 04 21:40:18 T-W1064-MS-543.mdc2.mozilla.com dsc-run: VERBOSE: [T-W1064-MS-543]: [[Script]InstallSupportingModules] Importing function 'Validate-FilesContainOrNotRequested'.#015
Oct 04 21:40:18 T-W1064-MS-543.mdc2.mozilla.com dsc-run: VERBOSE: [T-W1064-MS-543]: [[Script]InstallSupportingModules] The 'Validate-PathsExistOrNotRequested' command in the OCC-Validate' module was imported, but because its name does not include an approved verb, it might be difficult to find. For a list of approved verbs, type Get-Verb.#015
Oct 04 21:40:18 T-W1064-MS-543.mdc2.mozilla.com dsc-run: VERBOSE: [T-W1064-MS-543]: [[Script]InstallSupportingModules] Importing function 'Validate-PathsExistOrNotRequested'.#015
Oct 04 21:40:18 T-W1064-MS-543.mdc2.mozilla.com dsc-run: VERBOSE: [T-W1064-MS-543]: [[Script]InstallSupportingModules] The 'Validate-PathsNotExistOrNotRequested' command in the OCC-Validate' module was imported, but because its name does not include an approved verb, it might be difficult to find. For a list of approved verbs, type Get-Verb.#015
I am going to test with an older version of the file.
Assignee
Comment 2•6 years ago
This pull request seems to be the cause: https://github.com/mozilla-releng/OpenCloudConfig/commit/a64ffcf36957a16d5e880966deebe58023ec7bdf
Without this code, DSC was able to install the needed packages.
I think we should roll this back until Rob returns from PTO.
Attachment #9014932 - Flags: review?(pmoore)
Updated•6 years ago
Attachment #9014932 - Attachment is patch: true
Attachment #9014932 - Attachment mime type: text/x-github-pull-request → text/plain
Attachment #9014932 - Flags: review?(pmoore) → review-
Comment 3•6 years ago
Attachment #9014932 - Attachment is obsolete: true
Attachment #9015281 - Flags: review?(pmoore)
Updated•6 years ago
Attachment #9015281 - Attachment is patch: true
Attachment #9015281 - Attachment mime type: text/x-github-pull-request → text/plain
Attachment #9015281 - Flags: review?(pmoore) → review+
Comment 4•6 years ago
Apologies, Mark. That patch (https://github.com/mozilla-releng/OpenCloudConfig/commit/a64ffcf36957a16d5e880966deebe58023ec7bdf) was indeed faulty.
I got back from PTO today and noticed the problem (in Papertrail), but didn't see this bug. I spent the day fixing it with a number of patches (debugging and testing) and had the errors fixed and committed with this push:
https://github.com/mozilla-releng/OpenCloudConfig/commit/4c6ea88bbf09ec9210e3e6b4f4ff598a6b874e49
Unfortunately, because I wasn't aware of this bug and hadn't posted here, revert merges were made:
- https://github.com/mozilla-releng/OpenCloudConfig/commit/96991ff8c218cb34e03de05c7813af977307e9e6
- https://github.com/mozilla-releng/OpenCloudConfig/commit/3464c36537c1a6449606db9ef824fdbcdaac3205
These didn't take into account that I had already completed and merged working patches so the reverts actually broke things again.
The breakages are most easily seen with these searches:
- hardware:
https://papertrailapp.com/groups/1958653/events?q=program%3Adsc-run%20%22SendConfigurationApply%20function%20did%20not%20succeed%22
- ec2:
https://papertrailapp.com/groups/2488493/events?q=program%3Adsc-run%20%22SendConfigurationApply%20function%20did%20not%20succeed%22
As soon as I reverted the reverts (https://github.com/mozilla-releng/OpenCloudConfig/commit/43e7daf2cd1e33c618cd24f1e0462944e6f3e708), the problem was sorted again.
Please let me know if you see any issues.
Note that the "approved verb" errors mentioned in comment 1 are safe to ignore. They're just warnings indicating that PowerShell prefers the use of approved verbs in function names, e.g. Start-Something instead of Begin-Something ("Start" is an approved verb; "Begin" isn't). Those messages don't indicate that something is broken, just that the approved verb naming convention has been ignored.
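When reading saved dsc-run output, it can help to drop these naming warnings so that the real failures stand out. A minimal Python sketch follows, assuming the log lines are available as plain text. The helper name drop_verb_warnings is hypothetical, and the two phrases it matches are taken from the messages quoted in comment 1.

```python
import re

# Phrases taken from the approved-verb messages quoted in comment 1.
VERB_NOISE = re.compile(r"unapproved verbs|does not include an approved verb")


def drop_verb_warnings(lines):
    """Return only the lines that are not approved-verb naming warnings."""
    return [line for line in lines if not VERB_NOISE.search(line)]


sample = [
    "WARNING: The names of some imported commands from the module 'OCC-Validate' include unapproved verbs that might make them less discoverable.",
    "VERBOSE: The 'Validate-All' command was imported, but because its name does not include an approved verb, it might be difficult to find.",
    "This command cannot be run due to the error: The system cannot find the file specified.",
]
for line in drop_verb_warnings(sample):
    print(line)
```

Only the "cannot find the file specified" line survives the filter here, which is the kind of message that pointed at the defective patch.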
The problem in the original patch was my failure to use the new unique filename (introduced by the patch) everywhere that the file is referenced. This was corrected here:
https://github.com/mozilla-releng/OpenCloudConfig/commit/4c6ea88bbf09ec9210e3e6b4f4ff598a6b874e49
Comment 5•6 years ago
i've been monitoring the restarting hardware nodes and see that the procmon and procexp errors are no more (these were caused by the earlier defective patch, then fixed as per comment 4).
there is another error relating to the install of the Windows SDK. i don't know yet if this is a new error, or something that's been around for a while. this error is easiest to spot with this search:
https://papertrailapp.com/groups/1958653/events?q=SendConfigurationApply%20ExeInstall_Windows_SDK
one thing that used to cause us problems with SDK installs was that they don't always return an exit code of 0. we'll need to check if this is what's going on by manually running the sdk installer on a hardware instance and then checking the exit code. eg:
sdk-install.exe /q
echo %errorlevel%
if the exit code is not 0 but the install was a success, we can simply modify the manifest component to allow whatever the exit code was.
eg if the exit code we want to allow is "7":
{
  "ComponentName": "Windows_SDK",
  ...
  "AllowedExitCodes": [
    "7"
  ],
  ...
}
if the exit code is 0, this wasn't the issue.
if the manual install fails, we might learn why the dsc install is failing.
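The check described above can be sketched in Python. This is an illustration under stated assumptions, not how OCC itself evaluates exit codes: the helper names are hypothetical, exit code 0 is assumed to always count as success, and the real installer is replaced by a stand-in process that exits with 7.

```python
import subprocess
import sys
from typing import Sequence


def exit_code_allowed(component: dict, exit_code: int) -> bool:
    """Return True if exit_code is 0 or listed in the component's AllowedExitCodes.

    Assumption: 0 is always accepted. The manifest stores codes as strings
    ("7"), so they are converted to integers before comparing.
    """
    allowed = {int(code) for code in component.get("AllowedExitCodes", [])}
    return exit_code == 0 or exit_code in allowed


def run_and_get_exit_code(command: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on non-zero."""
    return subprocess.run(list(command), check=False).returncode


component = {"ComponentName": "Windows_SDK", "AllowedExitCodes": ["7"]}

# Stand-in for the real installer: a child process that simply exits with 7.
code = run_and_get_exit_code([sys.executable, "-c", "import sys; sys.exit(7)"])
print(code, exit_code_allowed(component, code))
```

On a hardware instance, the stand-in command would be replaced by the actual SDK installer invocation, and the printed code would show which value, if any, needs adding to the manifest.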
Assignee
Comment 6•6 years ago
> there is another error relating to the install of the Windows SDK. i don't
> know yet if this is a new error, or something that's been around for a
> while. this error is easiest to spot with this search:
> https://papertrailapp.com/groups/1958653/events?q=SendConfigurationApply%20ExeInstall_Windows_SDK
This is an old error that seems to be non-impacting. I am going to create a bug to keep track of it, but I don't know when I will actually get to it.
Comment 7•6 years ago
Re-imaged all 121 workers which were missing from TC:
T-W1064-MS-016 T-W1064-MS-214 T-W1064-MS-424
T-W1064-MS-020 T-W1064-MS-219 T-W1064-MS-427
T-W1064-MS-022 T-W1064-MS-222 T-W1064-MS-428
T-W1064-MS-031 T-W1064-MS-243 T-W1064-MS-429
T-W1064-MS-034 T-W1064-MS-248 T-W1064-MS-434
T-W1064-MS-041 T-W1064-MS-252 T-W1064-MS-435
T-W1064-MS-062 T-W1064-MS-255 T-W1064-MS-480
T-W1064-MS-063 T-W1064-MS-260 T-W1064-MS-497
T-W1064-MS-064 T-W1064-MS-262 T-W1064-MS-501
T-W1064-MS-066 T-W1064-MS-263 T-W1064-MS-502
T-W1064-MS-069 T-W1064-MS-266 T-W1064-MS-503
T-W1064-MS-070 T-W1064-MS-282 T-W1064-MS-504
T-W1064-MS-076 T-W1064-MS-283 T-W1064-MS-505
T-W1064-MS-077 T-W1064-MS-285 T-W1064-MS-506
T-W1064-MS-081 T-W1064-MS-289 T-W1064-MS-507
T-W1064-MS-090 T-W1064-MS-292 T-W1064-MS-508
T-W1064-MS-107 T-W1064-MS-293 T-W1064-MS-511
T-W1064-MS-108 T-W1064-MS-319 T-W1064-MS-512
T-W1064-MS-110 T-W1064-MS-326 T-W1064-MS-518
T-W1064-MS-111 T-W1064-MS-328 T-W1064-MS-547
T-W1064-MS-118 T-W1064-MS-329 T-W1064-MS-548
T-W1064-MS-120 T-W1064-MS-331 T-W1064-MS-550
T-W1064-MS-128 T-W1064-MS-332 T-W1064-MS-554
T-W1064-MS-129 T-W1064-MS-333 T-W1064-MS-556
T-W1064-MS-133 T-W1064-MS-334 T-W1064-MS-560
T-W1064-MS-152 T-W1064-MS-337 T-W1064-MS-561
T-W1064-MS-154 T-W1064-MS-338 T-W1064-MS-562
T-W1064-MS-155 T-W1064-MS-339 T-W1064-MS-564
T-W1064-MS-159 T-W1064-MS-342 T-W1064-MS-565
T-W1064-MS-164 T-W1064-MS-365 T-W1064-MS-570
T-W1064-MS-165 T-W1064-MS-367 T-W1064-MS-582
T-W1064-MS-170 T-W1064-MS-374 T-W1064-MS-586
T-W1064-MS-172 T-W1064-MS-382 T-W1064-MS-588
T-W1064-MS-173 T-W1064-MS-384 T-W1064-MS-589
T-W1064-MS-176 T-W1064-MS-406 T-W1064-MS-590
T-W1064-MS-177 T-W1064-MS-409 T-W1064-MS-593
T-W1064-MS-199 T-W1064-MS-410 T-W1064-MS-595
T-W1064-MS-201 T-W1064-MS-413 T-W1064-MS-596
T-W1064-MS-202 T-W1064-MS-418 T-W1064-MS-598
T-W1064-MS-204 T-W1064-MS-422
T-W1064-MS-205 T-W1064-MS-423
Assignee
Updated•6 years ago
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED