Closed Bug 1460178 Opened 8 years ago Closed 8 years ago

The SendConfigurationApply function did not succeed.

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: pmoore, Unassigned)

References

Details

We are getting this on a lot of workers, and for example this prevents jobs running on all staging win7 machines. Unfortunately this makes it impossible to test the generic-worker fix from bug 1458873 at the moment. > dsc-run: The SendConfigurationApply function did not succeed. > dsc-run: + CategoryInfo : NotSpecified: (root/Microsoft/...gurationManager:String) [], CimException > dsc-run: + FullyQualifiedErrorId : MI RESULT 1 > dsc-run: + PSComputerName : localhost > dsc-run: The SendConfigurationApply function did not succeed. > dsc-run: + CategoryInfo : NotSpecified: (root/Microsoft/...gurationManager:String) [], CimException > dsc-run: + FullyQualifiedErrorId : MI RESULT 1 > dsc-run: + PSComputerName : localhost > dsc-run: VERBOSE: Operation 'Invoke CimMethod' complete. > dsc-run: VERBOSE: Time taken for configuration job to complete is 31.071 seconds > dsc-run: ********************** > dsc-run: Windows PowerShell transcript end > dsc-run: End time: 20180509042811 > dsc-run: ********************** This could also be the root cause of bug 1457979, but creating a separate bug for this - this originates from https://bugzilla.mozilla.org/show_bug.cgi?id=1457979#c20
Blocks: 1457979, 1460084
Blocks: 1458873
(In reply to Pete Moore [:pmoore][:pete] from comment #0) > This could also be the root cause of bug 1457979, but creating a separate > bug for this - this originates from > https://bugzilla.mozilla.org/show_bug.cgi?id=1457979#c20 Typo - that should have been comment 19, not comment 20, i.e.: https://bugzilla.mozilla.org/show_bug.cgi?id=1457979#c19
the error shown in comment 0 indicates that some element of the dsc configuration failed to apply. it's necessary to scroll further up the logs to determine which element caused the exception. on the logs that i checked this morning, it was the download of a papertrail certificate which allows encrypted log forwarding by the instance to the papertrail servers. since this normally succeeds, somewhere in the region of 10000 times per day and only fails on a handful of instances, i am guessing that the papertrail static server which serves up the cert textfile: https://papertrailapp.com/tools/papertrail-bundle.pem is occassionally not able to cope with demand. i am copying the cert to github cdn, where we have a better track record of serving up this sort of content at volume. https://github.com/mozilla-releng/OpenCloudConfig/commit/744ea708cece623c3f727b5aad9b834d6543782b
an example of the specific dsc failure: May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: VERBOSE: [I-064E990D7D275]: LCM: [ Start Resource ] [[Script]ChecksumFileDownload_PaperTrailEncryptionCertificate] May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: VERBOSE: [I-064E990D7D275]: LCM: [ Start Test ] [[Script]ChecksumFileDownload_PaperTrailEncryptionCertificate] May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: VERBOSE: [I-064E990D7D275]: LCM: [ End Test ] [[Script]ChecksumFileDownload_PaperTrailEncryptionCertificate] in 0.0160 seconds. May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: VERBOSE: [I-064E990D7D275]: LCM: [ Start Set ] [[Script]ChecksumFileDownload_PaperTrailEncryptionCertificate] May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: VERBOSE: [I-064E990D7D275]: [[Script]ChecksumFileDownload_PaperTrailEncryptionCertificate] Performing the operation "Set-TargetResource" on target "Executing the SetScript with the user supplied credential". May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: VERBOSE: [I-064E990D7D275]: [[Script]ChecksumFileDownload_PaperTrailEncryptionCertificate] Deleted C:\windows\Temp\papertrail-bundle.pem May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: VERBOSE: [I-064E990D7D275]: [[Script]ChecksumFileDownload_PaperTrailEncryptionCertificate] GET https://papertrailapp.com/tools/papertrail-bundle.pem with 0-byte payload May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: VERBOSE: [I-064E990D7D275]: LCM: [ End Set ] [[Script]ChecksumFileDownload_PaperTrailEncryptionCertificate] in 0.4680 seconds. May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: PowerShell DSC resource MSFT_ScriptResource failed to execute Set-TargetResource functionality with error message: The May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: server committed a protocol violation. Section=ResponseStatusLine May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: + CategoryInfo : InvalidOperation: (:) [], CimException May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: + FullyQualifiedErrorId : ProviderOperationExecutionFailure May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: + PSComputerName : localhost May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com Microsoft-Windows-DSC: <11>1 2018-05-08T23:11:59.208154+00:00 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com Microsoft-Windows-DSC 2604 - [NXLOG@14506 Keywords="4611686018427387904" EventType="ERROR" EventID="4250" ProviderGuid="{50DF9E12-A8C4-4939-B281-47E1325BA63E}" Version="0" Task="0" OpcodeValue="0" RecordNumber="158" ActivityID="{85B51564-409F-0001-6E9C-98AA21E7D301}" ThreadID="1716" Channel="Microsoft-Windows-DSC/Operational" Domain="NT AUTHORITY" AccountName="SYSTEM" UserID="S-1-5-18" AccountType="User" Opcode="Info" JobId="{311458CC-5315-11E8-8BA9-0AB520447E54}" HResult="-2146233087" StackTrace=" at System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(Object input, Hashtable errorResults, Boolean enumerate) May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com Microsoft-Windows-DSC: at System.Management.Automation.Internal.PipelineProcessor.SynchronousExecute(Array input, Hashtable errorResults) May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com Microsoft-Windows-DSC: at System.Management.Automation.Runspaces.LocalPipeline.InvokeHelper() May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com Microsoft-Windows-DSC: at System.Management.Automation.Runspaces.LocalPipeline.InvokeThreadProc()" EventReceivedTime="2018-05-08 23:12:01" SourceModuleName="eventlog" SourceModuleType="im_msvistalog"] Job {311458CC-5315-11E8-8BA9-0AB520447E54} : Message The server committed a protocol violation. Section=ResponseStatusLine HResult -2146233087 StackTrack at System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(Object input, Hashtable errorResults, Boolean enumerate) at System.Management.Automation.Internal.PipelineProcessor.SynchronousExecute(Array input, Hashtable errorResults) at System.Management.Automation.Runspaces.LocalPipeline.InvokeHelper() at System.Management.Automation.Runspaces.LocalPipeline.InvokeThreadProc() May 09 02:12:02 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com Microsoft-Windows-DSC: Job {311458CC-5315-11E8-8BA9-0AB520447E54} : This event indicates that failure happens when LCM is processing the configuration. Error Id is 0x1. Error Detail is The SendConfigurationApply function did not succeed.. Resource Id is [Script]ChecksumFileDownload_PaperTrailEncryptionCertificate and Source Info is C:\windows\TEMP\xDynamicConfig.ps1::250::9::Script. Error Message is PowerShell DSC resource MSFT_ScriptResource failed to execute Set-TargetResource functionality with error message: The server committed a protocol violation. Section=ResponseStatusLine . May 09 02:12:03 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com Microsoft-Windows-DSC: Job {311458CC-5315-11E8-8BA9-0AB520447E54} : MIResult: 1 Error Message: PowerShell DSC resource MSFT_ScriptResource failed to execute Set-TargetResource functionality with error message: The server committed a protocol violation. Section=ResponseStatusLine Message ID: ProviderOperationExecutionFailure Error Category: 7 Error Code: 1 Error Type: MI May 09 02:12:22 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: PowerShell DSC resource MSFT_ScriptResource failed to execute Set-TargetResource functionality with error message: The server committed a protocol violation. May 09 02:12:22 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: Section=ResponseStatusLine May 09 02:12:22 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: + CategoryInfo : InvalidOperation: (:) [], CimException May 09 02:12:22 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: + FullyQualifiedErrorId : ProviderOperationExecutionFailure May 09 02:12:22 i-064e990d7d275af9b.gecko-t-win7-32.usw2.mozilla.com dsc-run: + PSComputerName : localhost
on further investigation i have seen that the dsc failure specific to gecko-t-win7-32-beta is related to the installation of generic worker 10.7.12. > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- May 08 22:49:28 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: VERBOSE: [I-0BC4BE0C1BAB1]: LCM: [ Start Resource ] [[Script]CommandRun_GenericWorkerInstall] May 08 22:49:28 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: VERBOSE: [I-0BC4BE0C1BAB1]: LCM: [ Start Test ] [[Script]CommandRun_GenericWorkerInstall] May 08 22:49:28 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: VERBOSE: [I-0BC4BE0C1BAB1]: [[Script]CommandRun_GenericWorkerInstall] Validate-PathsExistOrNotRequested :: 2 validations specified. May 08 22:49:28 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: VERBOSE: [I-0BC4BE0C1BAB1]: [[Script]CommandRun_GenericWorkerInstall] Path absent: C:\generic-worker\run-generic-worker.bat May 08 22:49:28 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: VERBOSE: [I-0BC4BE0C1BAB1]: [[Script]CommandRun_GenericWorkerInstall] Path present: C:\generic-worker\generic-worker.exe May 08 22:49:28 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: VERBOSE: [I-0BC4BE0C1BAB1]: [[Script]CommandRun_GenericWorkerInstall] Validations not satisfied May 08 22:49:28 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: VERBOSE: [I-0BC4BE0C1BAB1]: LCM: [ End Test ] [[Script]CommandRun_GenericWorkerInstall] in 0.0000 seconds. May 08 22:49:28 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: VERBOSE: [I-0BC4BE0C1BAB1]: LCM: [ Start Set ] [[Script]CommandRun_GenericWorkerInstall] May 08 22:49:28 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: VERBOSE: [I-0BC4BE0C1BAB1]: [[Script]CommandRun_GenericWorkerInstall] Performing the operation "Set-TargetResource" on target "Executing the SetScript with the user supplied credential". May 08 22:49:28 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: VERBOSE: [I-0BC4BE0C1BAB1]: LCM: [ End Set ] [[Script]CommandRun_GenericWorkerInstall] in 0.0460 seconds. May 08 22:49:28 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: PowerShell DSC resource MSFT_ScriptResource failed to execute Set-TargetResource functionality with error message: May 08 22:49:28 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: This command cannot be run due to the error: The parameter is incorrect. May 08 22:49:28 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: + CategoryInfo : InvalidOperation: (:) [], CimException May 08 22:49:28 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: + FullyQualifiedErrorId : ProviderOperationExecutionFailure May 08 22:49:28 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: + PSComputerName : localhost May 08 22:50:17 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: PowerShell DSC resource MSFT_ScriptResource failed to execute Set-TargetResource functionality with error message: This command cannot be run due to the May 08 22:50:17 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: error: The parameter is incorrect. May 08 22:50:17 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: + CategoryInfo : InvalidOperation: (:) [], CimException May 08 22:50:17 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: + FullyQualifiedErrorId : ProviderOperationExecutionFailure May 08 22:50:18 i-0bc4be0c1bab1c323.gecko-t-win7-32-beta.usw2.mozilla.com dsc-run: + PSComputerName : localhost
i think the problem is here: https://github.com/mozilla-releng/OpenCloudConfig/blob/744ea70/userdata/Manifest/gecko-t-win7-32-beta.json#L598 where the validation expects to find run-generic-worker.bat but has no DependsOn directive for the download of that file (ChecksumFileDownload/GenericWorkerStateWait).
Many thanks Rob! Nice sleuthing.
It looks like two different errors on different instances 1) Path absent: C:\generic-worker\run-generic-worker.bat [[Script]CommandRun_GenericWorkerInstall] Validate-PathsExistOrNotRequested :: 2 validations specified. [[Script]CommandRun_GenericWorkerInstall] Path absent: C:\generic-worker\run-generic-worker.bat [[Script]CommandRun_GenericWorkerInstall] Path present: C:\generic-worker\generic-worker.exe [[Script]CommandRun_GenericWorkerInstall] Validations not satisfied [[Script]CommandRun_GenericWorkerInstall] in 0.0000 seconds. I agree with your analysis in comment 5 with how to fix this - I'll make a patch. 2) Command: C:\generic-worker\generic-worker.exe --version => Search: generic-worker 10.7.12 => Validations not satisfied [[Script]CommandRun_GenericWorkerInstall] Validate-PathsExistOrNotRequested :: 2 validations specified. [[Script]CommandRun_GenericWorkerInstall] Path present: C:\generic-worker\run-generic-worker.bat [[Script]CommandRun_GenericWorkerInstall] Path present: C:\generic-worker\generic-worker.exe [[Script]CommandRun_GenericWorkerInstall] Validate-PathsNotExistOrNotRequested :: 0 validations specified. [[Script]CommandRun_GenericWorkerInstall] Validate-CommandsReturnOrNotRequested :: 1 validation specified. [[Script]CommandRun_GenericWorkerInstall] Command: C:\generic-worker\generic-worker.exe --version [[Script]CommandRun_GenericWorkerInstall] Search: generic-worker 10.7.12 [[Script]CommandRun_GenericWorkerInstall] Validations not satisfied [[Script]CommandRun_GenericWorkerInstall] in 0.1560 seconds. This is strange, because in this case generic worker has been installed, and the paths are present. I'll check on a machine to see if the binary is corrupted etc.
(In reply to Pete Moore [:pmoore][:pete] from comment #7) > This is strange, because in this case generic worker has been installed, and > the paths are present. I'll check on a machine to see if the binary is > corrupted etc. Indeed this is the case. C:\generic-worker\generic-worker.exe contains the content: <?xml version="1.0" encoding="UTF-8"?> <Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>31899192/232d476a-521c-11e8-9bfa-8cab171a2194</Key><RequestId>09408DDFFFFA8564</RequestId><HostId>YUSXC2bDGbmDsYkcmcsfxts6wA3DXmeU34M6nlFboCHsrio9LdsYYijcW5zGuQJTY80ru9D50zA=</HostId></Error>
I think the problem was that I uploaded the failed download.... I can see this by checking the SHA512 of the corrupt version, and seeing it matches what was put in the manifest. The SHA512 of the file on the machine: PS C:\Users\Administrator> Get-FileHash C:\generic-worker\generic-worker.exe -Algorithm SHA512 | Format-List Algorithm : SHA512 Hash : D854D13978BA83E8B0F806A3C21653C2AD76C18E8C74DA719D6F383672E76234A13A656C2F429C68CF4C94F793F3561B3E790433C83 384CA1C34AF9204979B8F Path : C:\generic-worker\generic-worker.exe This matches version in manifest: https://github.com/mozilla-releng/OpenCloudConfig/blob/b25df5f997c0bde23ca5941b80d4cea2a7afa54a/userdata/Manifest/gecko-t-win7-32-beta.json#L486 "sha512": "d854d13978ba83e8b0f806a3c21653c2ad76c18e8c74da719d6f383672e76234a13a656c2f429c68cf4c94f793f3561b3e790433c83384ca1c34af9204979b8f" So I think the problem was in the download from github, which was done by https://github.com/petemoore/myscrapbook/blob/93c517f5594d14c1270fb9ecc3f043280278344f/upgrade-gw-betas-cu.sh#L26 Probably this is an error that originated from github's blob storage provider (AWS S3?) and no doubt is due to the eventual consistency of the backing store, that this particular request did not get serviced with the real binary, as the service instance I hit had not yet seen it. I'll add a check to this upgrade-gw-betas-cu.sh script to ensure a 200 http response status code after downloading, and maybe add an extra sanity check that it is at least 2MB in size - that probably would have caught it. I'll also publish the real binary, and make a new gecko-t-win7-32-beta deployment with the updated SHA512 in the manifest....
(In reply to Rob Thijssen (:grenade UTC+2) from comment #9) > that looks like the tooltool error stating that the artefact hasn't been > uploaded. > https://api.pub.build.mozilla.org/tooltool/sha512/ > d854d13978ba83e8b0f806a3c21653c2ad76c18e8c74da719d6f383672e76234a13a656c2f429 > c68cf4c94f793f3561b3e790433c83384ca1c34af9204979b8f I believe not existing artifacts will return a Not Found (404) rather than Forbidden (403). The reason it is forbidden is because of --visibility internal option in tooltool upload, which I think we have to use, because the tooltool auth token isn't authorized to download public artifacts, so we have to upload internal ones, iirc.
if you visit https://api.pub.build.mozilla.org/tooltool and log in and then revisit https://api.pub.build.mozilla.org/tooltool/sha512/d854d13978ba83e8b0f806a3c21653c2ad76c18e8c74da719d6f383672e76234a13a656c2f429c68cf4c94f793f3561b3e790433c83384ca1c34af9204979b8f, you won't get the forbidden error. instead you will get a successful download of a text file containing: <?xml version="1.0" encoding="UTF-8"?> <Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>31899192/232d476a-521c-11e8-9bfa-8cab171a2194</Key><RequestId>09408DDFFFFA8564</RequestId><HostId>YUSXC2bDGbmDsYkcmcsfxts6wA3DXmeU34M6nlFboCHsrio9LdsYYijcW5zGuQJTY80ru9D50zA=</HostId></Error>
(In reply to Rob Thijssen (:grenade UTC+2) from comment #12) > if you visit https://api.pub.build.mozilla.org/tooltool and log in and then > revisit > https://api.pub.build.mozilla.org/tooltool/sha512/ > d854d13978ba83e8b0f806a3c21653c2ad76c18e8c74da719d6f383672e76234a13a656c2f429 > c68cf4c94f793f3561b3e790433c83384ca1c34af9204979b8f, you won't get the > forbidden error. > > instead you will get a successful download of a text file containing: > > > <?xml version="1.0" encoding="UTF-8"?> > <Error><Code>NoSuchKey</Code><Message>The specified key does not > exist.</Message><Key>31899192/232d476a-521c-11e8-9bfa-8cab171a2194</ > Key><RequestId>09408DDFFFFA8564</ > RequestId><HostId>YUSXC2bDGbmDsYkcmcsfxts6wA3DXmeU34M6nlFboCHsrio9LdsYYijcW5z > GuQJTY80ru9D50zA=</HostId></Error> Yes, this is what I wrote in comment 10.
(In reply to Pete Moore [:pmoore][:pete] from comment #7) > It looks like two different errors on different instances > > 1) Path absent: C:\generic-worker\run-generic-worker.bat > > [[Script]CommandRun_GenericWorkerInstall] Validate-PathsExistOrNotRequested > :: 2 validations specified. > [[Script]CommandRun_GenericWorkerInstall] Path absent: > C:\generic-worker\run-generic-worker.bat > [[Script]CommandRun_GenericWorkerInstall] Path present: > C:\generic-worker\generic-worker.exe > [[Script]CommandRun_GenericWorkerInstall] Validations not satisfied > [[Script]CommandRun_GenericWorkerInstall] in 0.0000 seconds. > > I agree with your analysis in comment 5 with how to fix this - I'll make a > patch. On closer inspection, comment 5 was incorrect. If the `generic-worker install` command is successful, it will create C:\generic-worker\run-generic-worker.bat - the reason it didn't, is because the binary was bad, since resolved in https://github.com/mozilla-releng/OpenCloudConfig/commit/7e4be776826709e35d7a87567535fe1d2951434c. It was therefore correct that the validation failed. The step you pointed to in comment 5 (ChecksumFileDownload/GenericWorkerStateWait) is one that replaces the default version with a customised version, which does not need to have already run for the path to exist, so I won't be adding the dependency.
closing because the exception thrown was the expected behaviour under the circumstances
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.