Closed Bug 1067756 Opened 6 years ago Closed 6 years ago

Fix Windows test slaves with broken maintenanceservice.exe

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
Windows XP
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: q)

References

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2722] )

All of the slaves which have hit bug 1027039 (other than the one it was filed about, #c0 was actually something different) have gotten a broken maintenance service, and according to bug 945192 comment 123 "The maintenanceservice.exe under Mozilla Maintenance Service\ in the x86 PROGRAMFILES needs to be restored to the one from the image."

Ordinarily, I'd say "fine, I'll disable every slave that hits 1027039, we can just reimage them" but with your choice of bug 1067062 or bug 1065853 saying that at least for Win8, probably for Win7 too, a reimage makes (or made, unclear whether bug 1067062 is saying "is now fixed" or "will be fixed") the slave completely broken, we might have to just manually fix the one broken file. And there's no guarantee that an affected slave will have already run xpcshell on a tree where someone would star it as 1027039, so picking them off one-by-one may well take longer than anyone's memory of "if I get bugmail about this one intermittent failure, it means a slave needs to be reimaged" will last.

https://tbpl.mozilla.org/?tree=Try&rev=52b2042aa7aa is one causative try push, so that's one list of slaves which will be affected, but not necessarily the only list.
As the general fix we should probably have GPO/puppet put this right, so that we avoid this issue in the future (glorious, v1.1). coop, do you have some paramiko goodness we can use in the short term?
bhearsum created a script that should be installing the old maintenance service before a test run.

Ben, can you check if it is running? Thanks
Flags: needinfo?(bhearsum)
I'm really not sure what to look at here. My original work on this was done on an older generation of machines. I'm guessing that we're supposed to have some GPO task to do something here, but I'm not sure.

Q, do you know if we have anything that deals with the mozilla maintenance service installation? If it's being used like on the old rev3 machines, it would be a scheduled task. Feel free to ping me - I know I'm not giving you much to go on here...
Flags: needinfo?(bhearsum) → needinfo?(q)
There is a GPO that installs the MMS; it only applies to Win 7 and Win XP. It triggers if the Win32 service "MozillaMaintenance" does not exist. It installs the service, adds reg keys, and then imports the MozRoot.cer. Do we need to change the versions here?

As a point of note, it is entirely possible that any paramiko goodness will get stomped by GPO on the next boot.
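
For illustration only, the trigger described in this comment (act only when the "MozillaMaintenance" service does not exist) could be sketched in Python as below. This is not the GPO itself. It assumes that `sc query` exits with the Win32 error code 1060 when a service is not installed; the function names `service_missing` and `plan_actions` are hypothetical, and the action strings are just labels for the three steps named above.

```python
import subprocess

SERVICE_NAME = "MozillaMaintenance"  # service name quoted in the comment above
ERROR_SERVICE_DOES_NOT_EXIST = 1060  # assumed exit code of `sc query` for a missing service


def service_missing(name=SERVICE_NAME, run=subprocess.run):
    """Return True when `sc query` reports that the service is not installed."""
    result = run(["sc", "query", name], capture_output=True, text=True)
    return result.returncode == ERROR_SERVICE_DOES_NOT_EXIST


def plan_actions(missing):
    """List the steps from the comment, in order, to take when the service is missing."""
    if not missing:
        return []
    return ["install service", "add registry keys", "import MozRoot.cer"]
```

The `run` parameter exists only so the check can be exercised without a Windows machine. The sketch also makes the limitation raised later in this bug visible: a service that exists but has a wrong binary produces no actions.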
Flags: needinfo?(q)
(In reply to Q from comment #4)
> There is a GPO that installs the MMS it only applies to win 7 and win XP. It
> triggers if the Win32 service "MozillaMaintenance" does not exist. It
> installs the service and adds reg keys then imports the MozRoot.cer. Do we
> need to change the versions here ?

What about Windows 8?

Even so, it looks like bug 1027039 affects all Windows platforms. Eg, there was a recent failure on t-w732-ix-045.

I don't have time to look into this any further at the moment, I might be able to come back around to it later this week or early next.
Flags: needinfo?(q)
If the script is applied by GPO it will only get applied after the system boots. We should also revert back to the original one after a test run. I can likely mitigate this partially by restoring it after a test run though it would be better if it was re-installed prior to a test run.
Q, if the script is not applied to Win 8 systems, can it be set up like Win XP and Win 7?

To mitigate bug 1027039 can the account running the test have full control over the %PROGRAMFILES(X86)%\Mozilla Maintenance Service\ directory on Win 8?

BTW: it appears that it already has the equivalent of full control of the %PROGRAMFILES(X86)%\Mozilla Maintenance Service\ directory on Win XP and Win 7.

We need this in order to land the Mac v2 signing changes (bug 1047584 and associated bugs).
I am looking at porting the script to win 8 and I will get build account access to the folders.
Flags: needinfo?(q)
Awesome and thanks!
Component: Buildduty → Platform Support
QA Contact: bugspam.Callek → coop
After some email conversations and speaking with Arr we should be able to roll this out now that the chem-spill is over. 

The question came up as to whether or not we want to add the MMS install to win 8. There may have been a reason this was not done before. Should it happen now? Anything we should look out for?
The only reason I can think of for it not being added to win8 is that, iirc, we didn't have win8 build slaves when the script was first deployed. When we have had changes to the maintenance service that required client changes before we could run tests, the script has been used to deploy those changes without requiring manual intervention or changes to build slave images, so it would be a good thing for it to also run on win8.
Full control permissions are now set on XP and 7. I am looking into side effects for win 8 right now.

Q
(In reply to Q from comment #4)
> There is a GPO that installs the MMS it only applies to win 7 and win XP. It
> triggers if the Win32 service "MozillaMaintenance" does not exist. It
> installs the service and adds reg keys then imports the MozRoot.cer. Do we
> need to change the versions here ?

It sounds like the GPO will only install if there is none already, unless the tests happen to leave the service uninstalled (seems fragile). Don't we need the service install to happen after every test? We proxy to every boot because we reboot after every test, and presumably will run GPO/puppet after every test in the 'runner' future.
(In reply to Nick Thomas [:nthomas] from comment #13)
> (In reply to Q from comment #4)
> > There is a GPO that installs the MMS it only applies to win 7 and win XP. It
> > triggers if the Win32 service "MozillaMaintenance" does not exist. It
> > installs the service and adds reg keys then imports the MozRoot.cer. Do we
> > need to change the versions here ?
> 
> It sounds like the GPO will only install if there is none already, unless
> the tests happen to leave the service uninstalled (seems fragile).
The tests don't uninstall the service. They do require that the service is present though.

> Don't we
> need the service install it to happen after every test ?
No. As long as it is installed before the tests run and we have write access to the directory that would suffice. It would be nice to test other things but having the maintenance service installed by the script as it does for non Win8 Windows and having write access would be a significant step forward.

> We proxy to every
> boot because we reboot after every test, and presumably will run GPO/puppet
> after ever test in the 'runner' future.
Do we actually reboot after every test? If so then the script doesn't appear to be running successfully since we've seen systems with a newer than expected maintenance service installed after running tests on m-c. For example, aurora would have the m-c maintenance service when it ran the tests. Write access would allow the test to put the correct maintenance service in place so this should be a moot point.
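
As a rough sketch of what write access makes possible, the step of putting the correct maintenance service in place could look like the Python below. It is not the test harness code. The function name `restore_service_binary` and both of its parameters are hypothetical; the only name taken from this bug is the file name `maintenanceservice.exe`.

```python
import filecmp
import shutil
from pathlib import Path


def restore_service_binary(known_good, service_dir):
    """Copy known_good over maintenanceservice.exe in service_dir if they differ.

    Returns True when a copy was made, False when the file already matched.
    """
    target = Path(service_dir) / "maintenanceservice.exe"
    # Compare file contents, not just size and timestamp.
    if target.exists() and filecmp.cmp(known_good, target, shallow=False):
        return False
    shutil.copy2(known_good, target)  # fails without write access to service_dir
    return True
```

Without write access to the directory, the `shutil.copy2` call raises `PermissionError`, which is the situation described for Win 8 in this bug.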
(In reply to Q from comment #12)
> Full control permissions are now set on XP and 7 I am looking into side
> effects for win 8 right now.
Hi Q, any update on Win 8?
Flags: needinfo?(q)
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2715]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2715] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2722]
Assignee: nobody → q
Flags: needinfo?(q)
The delay here was based around a discussion of whether or not this would be fixed with Puppet in a timely fashion. Getting back around to this. It is clear Puppet will not be viable anytime soon for win 8, so GPOs will need to be altered. I am working on getting a testing scenario set back up for this problem.
Is it possible to work on this again? We'd like to land multi platform MAR signing work over in bug 973933 and the last remaining part is a win8 failure due to needing a new service installed which this would fix. Thank you!
Flags: needinfo?(q)
In particular what I think we need is the ability for the maintenance service to be copied over in win8.
Blocks: 973933
With the changes in bug 973933 I expect this bug will start causing more failures such as bug 1112284. With the changes currently on Oak it will affect other branches as well.
The deployment has been tested but never signed off on. The current GPO will do the following in the wild:

* Install the Mozilla maintenance service on every boot from here (Only on win 8 for now):
  http://runtime-binaries.pvt.build.mozilla.org/updateservice.zip
* Update mozroot.cer

* Install the service on shutdown if missing

* gives access to cltbld to:
%ProgramFiles% (x86)\Mozilla Maintenance Serviceshow
%ProgramFiles%\Mozilla Maintenance Serviceshow

Will this be enough to satisfy the need?
Flags: needinfo?(q)
I suspect there was a copy / paste error and you meant that write access will be granted to
C:\Program Files (x86)\Mozilla Maintenance Service
and
C:\Program Files\Mozilla Maintenance Service

When can this be deployed?
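
For reference, granting the cltbld account access to the two directories named above could be expressed with `icacls` along the lines of this Python sketch. How the GPO actually sets the permissions is not stated in this bug, so the use of `icacls` is an assumption, and so is the choice of full control (`F`), which is borrowed from the earlier XP and Win 7 comment. `grant_commands` is a hypothetical helper that only builds the argument lists and does not run them.

```python
SERVICE_DIRS = [
    r"C:\Program Files (x86)\Mozilla Maintenance Service",
    r"C:\Program Files\Mozilla Maintenance Service",
]


def grant_commands(account="cltbld", dirs=SERVICE_DIRS):
    """Build one icacls argument list per directory, granting full control."""
    # (OI)(CI)F: full control, inherited by files and subfolders.
    return [["icacls", d, "/grant", f"{account}:(OI)(CI)F"] for d in dirs]
```

Each returned list could be passed to `subprocess.run` from an elevated session on the slave.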
Q, bug 973933 is being held up by this. Can I get an expected completion date for this bug? Thanks!
Flags: needinfo?(q)
The file permissions fix (allowing service overwrite) is already in place for both win7 and win8. I can push out the service re-installer today.

Q
Flags: needinfo?(q)
Recent tests failed because of this. When did that change go live?
Flags: needinfo?(q)
I see machines that picked up the change as of this morning. I verified t-w864-ix-005, which was not part of the st pool, as picking up the change after a reboot.
Flags: needinfo?(q)
Thanks! I did a try run to verify and everything is looking good!
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard